# 03 - Construct transaction history features for the last 3 months

This notebook creates transaction history features for the three months preceding the month in which each loan was created. For example, if a user creates a loan at `2022-05-05 23:30:48.986000`, the features are calculated from transactions that happened from `2022-02-01 00:00:00.000000` (inclusive) up to `2022-05-01 00:00:00.000000` (exclusive).
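The window arithmetic for that example can be sketched as follows. The variable names `window_start` and `window_end` are illustrative and do not appear in the notebook.

```python
import pandas as pd
from pandas.tseries.offsets import DateOffset

# Loan timestamp taken from the example in the text.
created_at = pd.Timestamp("2022-05-05 23:30:48.986000")

# End of the window: first day of the month in which the loan was created.
window_end = created_at.normalize().replace(day=1)

# Start of the window: three calendar months before the end.
window_start = window_end - DateOffset(months=3)

print(window_start, window_end)
```

Transactions are kept when they fall in the half-open interval from `window_start` up to, but not including, `window_end`.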

For that, we use 2 tables stored in the `database.db` file, located at `database/database.db`:

### Loans Table
- **id (int)**: Unique identifier for the loan.
- **user_id (int)**: Unique identifier for the user who has taken the loan.
- **amount (float)**: The amount of loan disbursed.
- **total_amount (float)**: The amount of loan, including fees.
- **due_amount (int)**: The amount owed on the due date if there are no repayments during the contract period. Useful for deriving interest rates.
- **due_date (object)**: The date by which the loan is due.
- **status (object)**: Current status of the loan (e.g., repaid, debt_collection, ongoing, debt_repaid).
    - **repaid**: A loan that was paid by the due date.
    - **debt_collection**: A loan that was not paid by the due date.
    - **debt_repaid**: A loan that was not paid by the due date but whose money was later recovered.
    - **cancelled**: A canceled loan.
    - **error**: Operational error.
- **created_at (object)**: Timestamp of when the loan record was created. <u>Treat it as the beginning of the loan</u>.

### Transactions Table
- **id (int)**: Unique identifier for the transaction.
- **user_id (int)**: The user ID associated with the transaction.
- **amount (float)**: Transaction amount.
- **status (object)**: Status of the transaction.
    - **approved**: A transaction that happened.
    - **denied**: A transaction that didn't happen because it was denied.
- **capture_method (object)**: Method of capturing the transaction.
- **payment_method (object)**: Payment method used (e.g., credit, debit).
- **installments (int)**: Number of installments for the transaction.
- **card_brand (object)**: Brand of the card used for the transaction.
- **created_at (object)**: Timestamp of when the transaction record was created. <u>Treat it as the moment the transaction happened</u>.


## Results

For this dataset, we're going to construct the features below. In every description, "the window" means the period from the first day of the third month before the loan was created (inclusive) to the first day of the month in which the loan was created (exclusive).

- `avg_amt_transactions_in_last_three_months`: The average amount of the transactions in the window.
- `max_amt_transactions_in_last_three_months`: The maximum amount of the transactions in the window.
- `most_frequent_transactions_payment_method_in_last_three_months`: The most frequent payment method among the transactions in the window.
- `avg_amt_payment_method_credit_in_last_three_months`: The average amount of the transactions in the window that use credit as the payment method.
- `avg_amt_payment_method_debit_in_last_three_months`: The average amount of the transactions in the window that use debit as the payment method.
- `avg_amt_transactions_in_visa_in_last_three_months`: The average amount of the transactions in the window that use the visa card_brand.
- `avg_amt_transactions_in_mastercard_in_last_three_months`: The average amount of the transactions in the window that use the mastercard card_brand.
- `avg_amt_transactions_in_elo_in_last_three_months`: The average amount of the transactions in the window that use the elo card_brand.
- `avg_amt_transactions_in_hipercard_in_last_three_months`: The average amount of the transactions in the window that use the hipercard card_brand.
- `avg_amt_transactions_in_amex_in_last_three_months`: The average amount of the transactions in the window that use the amex card_brand.
- `max_installments_in_last_three_months`: The maximum installments value among the transactions in the window.
- `median_installments_in_last_three_months`: The median installments value among the transactions in the window.

Keep in mind that only approved transactions in the window are considered.
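As a rough illustration of how the approved-only, three-month rule affects a feature, here is a minimal sketch on made-up data. The toy values and the names `transactions`, `selected`, `avg_amount` and `max_amount` are assumptions for the example, not the notebook's data.

```python
import pandas as pd

# Toy transactions for a single user; all values are made up.
transactions = pd.DataFrame({
    "user_id": [1, 1, 1, 1],
    "amount": [100.0, 300.0, 900.0, 50.0],
    "status": ["approved", "approved", "denied", "approved"],
    "created_at": pd.to_datetime(
        ["2022-02-10", "2022-04-30", "2022-03-15", "2022-05-02"]
    ),
})

# Window for a loan created in May 2022.
window_start = pd.Timestamp("2022-02-01")
window_end = pd.Timestamp("2022-05-01")

# Keep approved transactions inside the half-open window [start, end).
in_window = transactions["created_at"].between(
    window_start, window_end, inclusive="left"
)
approved = transactions["status"] == "approved"
selected = transactions[in_window & approved]

avg_amount = selected["amount"].mean()
max_amount = selected["amount"].max()
print(avg_amount, max_amount)
```

The denied transaction of 900.0 and the transaction made on 2022-05-02 are both excluded, so only the first two rows contribute.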


The final dataset is saved in `data/processed` as `df_transactions_history_per_user_in_last_three_months.csv`.

## 1 - Imports

In [1]:
import os 
os.chdir("../../")

In [2]:
import sqlalchemy
import pandas as pd 
import numpy as np

from pandas.tseries.offsets import DateOffset
from datetime import datetime,date
from typing import Union

## 2 - Read tables

In [3]:
engine = sqlalchemy.create_engine("sqlite:///./database/database.db", echo=True)

df_loans = pd.read_sql(
    sql="""
    SELECT * FROM loans l
    """,
    con=engine
)
df_loans_repay = pd.read_sql(
    sql="""
    SELECT * FROM loan_repayments lr
    """,
    con=engine
)

df_transactions = pd.read_sql(
    sql="""
    SELECT * FROM transactions tr
    """,
    con=engine
)

2024-04-08 17:26:13,617 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-04-08 17:26:13,618 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("
    SELECT * FROM loans l
    ")
2024-04-08 17:26:13,618 INFO sqlalchemy.engine.Engine [raw sql] ()
2024-04-08 17:26:13,620 INFO sqlalchemy.engine.Engine PRAGMA temp.table_info("
    SELECT * FROM loans l
    ")
2024-04-08 17:26:13,620 INFO sqlalchemy.engine.Engine [raw sql] ()
2024-04-08 17:26:13,622 INFO sqlalchemy.engine.Engine 
    SELECT * FROM loans l
    
2024-04-08 17:26:13,622 INFO sqlalchemy.engine.Engine [raw sql] ()
2024-04-08 17:26:13,760 INFO sqlalchemy.engine.Engine ROLLBACK
2024-04-08 17:26:13,770 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-04-08 17:26:13,773 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("
    SELECT * FROM loan_repayments lr
    ")
2024-04-08 17:26:13,786 INFO sqlalchemy.engine.Engine [raw sql] ()
2024-04-08 17:26:13,796 INFO sqlalchemy.engine.Engine PRAGMA temp.table_info("
    SELECT * FR

## 3 - Preprocessing data

In [4]:
#convert to datetime
df_loans["created_at"] = pd.to_datetime(df_loans["created_at"],utc=True,format="ISO8601")
df_loans_repay["created_at"] = pd.to_datetime(df_loans_repay["created_at"],utc=True,format="ISO8601")
df_transactions["created_at"] = pd.to_datetime(df_transactions["created_at"],utc=True,format="ISO8601")

#convert to date
df_loans["due_date"] = pd.to_datetime(df_loans["due_date"],format="%Y-%m-%d")

#create date_created 
df_loans["date_created"] = df_loans["created_at"].apply(func=lambda d:d.date())
df_loans_repay["date_created"] = df_loans_repay["created_at"].apply(func=lambda d:d.date())
df_transactions["date_created"] = df_transactions["created_at"].apply(func=lambda d:d.date())

# add reference_date in df_loans
df_loans["reference_date"] = [date(year=d.year,month=d.month,day=1)
                              for d in df_loans["date_created"]]
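The row-wise `apply` and the list comprehension above could also be written with vectorized `.dt` accessors. The following is a minimal sketch on made-up timestamps, not the notebook's own code; `reference_ts` is a hypothetical intermediate name.

```python
from datetime import date

import pandas as pd

# Two made-up, timezone-aware timestamps standing in for df_loans["created_at"].
created_at = pd.Series(pd.to_datetime(
    ["2022-05-05T23:30:48.986Z", "2022-02-02T15:36:00.574Z"], utc=True
))

# Calendar date of each timestamp, as Python date objects.
date_created = created_at.dt.date

# First day of the month: drop the timezone, truncate to the month,
# then take the start of that monthly period.
reference_ts = created_at.dt.tz_localize(None).dt.to_period("M").dt.start_time
reference_date = reference_ts.dt.date

print(list(reference_date))
```

Dropping the timezone before calling `to_period` keeps the UTC wall-clock date, which matches what `d.date()` returns for a UTC timestamp.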

## 4 - Construct features

### 4.1 - Functions to create features

In [5]:
def filter_amt_transactions_in_last_three_months(
        dataframe_loans:pd.DataFrame,
        dataframe_transactions:pd.DataFrame,
)->list:
    """
    Filters the transactions of a user that happened 
    in the last three months before the loan was created.

    Args:
        dataframe_loans (pd.DataFrame): Dataframe with loans made by a user and their 
        date created, reference date and timestamp of the moment when it was created.
        dataframe_transactions (pd.DataFrame): Dataframe with all transactions made 
        by users and their date created and timestamp of the moment when it was created.

    Returns:
        list: A list of all transactions made by users in dataframe_loans with transactions
        made in the last three months before the loan was created.
    """    
    dfs_all_transactions_per_month = []
    for ref_date in dataframe_loans["reference_date"].drop_duplicates().values:
        b_date_month = (ref_date + DateOffset(months=-3)).date()
        filter_df =  dataframe_transactions[dataframe_transactions["date_created"].between(left=b_date_month,right=ref_date,inclusive="left")].copy()
        if len(filter_df)>0:
            filter_df["reference_date"]=ref_date
            dfs_all_transactions_per_month.append(filter_df)
    
    return dfs_all_transactions_per_month

def calculate_avg_in_group_per_user(
          list_dataframes:list,
          group_by_col:Union[str,list],
          col_to_avg:Union[str,list],
          new_col_name:str
          )->pd.DataFrame:
    """
    For each dataframe calculates the average value of a column 
    in a group.

    Args:
        list_dataframes (list): A list of dataframes to calculate the average values per groups.
        group_by_col (Union[str,list]): A column to group values.
        col_to_avg (Union[str,list]): A column to calculate the average values.
        new_col_name (str): The new column name for the average values calculated.

    Returns:
        pd.DataFrame: A dataframe with all the calculated average values per group.
    """    
    dfs_avg_in_group_per_user = [
    df.groupby(by=group_by_col)[col_to_avg].mean().to_frame(name=new_col_name)
    for df in list_dataframes
    ]
    df_avg_in_group_per_user = pd.concat(dfs_avg_in_group_per_user).reset_index()
    return df_avg_in_group_per_user

def calculate_max_in_group_per_user(
          list_dataframes:list,
          group_by_col:Union[str,list],
          col_to_search_max:Union[str,list],
          new_col_name:str
          )->pd.DataFrame:
    """
    For each dataframe in list_dataframes groups their values and 
    calculates the maximum value per group.

    Args:
        list_dataframes (list): A list of dataframes to calculate the maximum values per groups.
        group_by_col (Union[str,list]): A column to group values.
        col_to_search_max (Union[str,list]): A column to search the maximum value.
        new_col_name (str): New column name for the maximum values in the result dataframe.

    Returns:
        pd.DataFrame: A dataframe with all the calculated maximum values per group.
    """    
    dfs_max_in_group_per_user  = [
    df.groupby(by=group_by_col)[col_to_search_max].max().to_frame(name=new_col_name)
    for df in list_dataframes
    ]
    df_max_in_group_per_user = pd.concat(dfs_max_in_group_per_user).reset_index()
    return df_max_in_group_per_user

def calculate_median_in_group_per_user(
          list_dataframes:list,
          group_by_col:Union[str,list],
          col_to_search_median:Union[str,list],
          new_col_name:str
          )->pd.DataFrame:
    """
    For each dataframe in list_dataframes groups their values and 
    calculates the median value per group.

    Args:
        list_dataframes (list): A list of dataframes to calculate the median values per groups.
        group_by_col (Union[str,list]): A column to group values.
        col_to_search_median (Union[str,list]): A column to search the median value.
        new_col_name (str): New column name for the median values in the result dataframe.

    Returns:
        pd.DataFrame: A dataframe with all the calculated median values per group.
    """    
    dfs_median_in_group_per_user  = [
    df.groupby(by=group_by_col)[col_to_search_median].median().to_frame(name=new_col_name)
    for df in list_dataframes
    ]
    df_median_in_group_per_user = pd.concat(dfs_median_in_group_per_user).reset_index()
    return df_median_in_group_per_user

def calculate_most_frequent(
        dataframe:pd.DataFrame,
        col_to_count:Union[str,list],
        col_to_group:Union[str,list],
        return_counts:bool=False
)->pd.DataFrame:
    """
    Find the most frequent occurrence of a value in a column of a dataframe, per group.

    Args:
        dataframe (pd.DataFrame): Initial dataframe to count.
        col_to_count (str,list): Column to count the values.
        col_to_group (str,list): Column to group the values in dataframe.
        return_counts (bool, optional): If true, returns the counts of the most frequent value.
        If false, returns only the group id and their most frequent value. Defaults to False.

    Returns:
        pd.DataFrame: A new dataframe with the most frequent value in the column by group. If
        return_counts is true, returns the frequency of the value per group. If return_counts
        is false, then, returns only the group and their most frequent value.
    """    
    df_counts_type = dataframe.groupby(by=col_to_group)[col_to_count]\
                              .value_counts()\
                              .to_frame(name="count_types")\
                              .reset_index()

    idxs_most_frequent = df_counts_type.groupby(by=col_to_group)["count_types"].idxmax()
    final_df = df_counts_type.loc[idxs_most_frequent,:].reset_index(drop=True)

    if return_counts:
        return final_df
    else:
        final_df = final_df.drop(columns="count_types")
    
    return final_df
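To see what the `value_counts` plus `idxmax` pattern in `calculate_most_frequent` produces, here is a small standalone sketch on made-up data. It repeats the same steps inline rather than calling the notebook's function, and the `toy` dataframe is an assumption for illustration.

```python
import pandas as pd

# Made-up transactions: user 1 mostly pays by credit, user 2 only by debit.
toy = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "payment_method": ["credit", "credit", "debit", "debit", "debit"],
})

# Count each payment method per user.
counts = (
    toy.groupby("user_id")["payment_method"]
    .value_counts()
    .to_frame(name="count_types")
    .reset_index()
)

# For each user, pick the row holding the largest count.
idx = counts.groupby("user_id")["count_types"].idxmax()
most_frequent = counts.loc[idx, ["user_id", "payment_method"]].reset_index(drop=True)

print(most_frequent)
```

Because `idxmax` returns the first row that reaches the maximum, a tie between two values is resolved by whichever of them comes first in the counts table.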

### 4.2 - Create average amount features

- `avg_amt_transactions_in_last_three_months`
- `avg_amt_payment_method_credit_in_last_three_months`
- `avg_amt_payment_method_debit_in_last_three_months`
- `avg_amt_transactions_in_visa_in_last_three_months`
- `avg_amt_transactions_in_mastercard_in_last_three_months`
- `avg_amt_transactions_in_elo_in_last_three_months`
- `avg_amt_transactions_in_hipercard_in_last_three_months`
- `avg_amt_transactions_in_amex_in_last_three_months`

In [6]:
df_approved_transactions = df_transactions[df_transactions["status"]=="approved"]
dfs_trans_in_last_three_mths= filter_amt_transactions_in_last_three_months(
    dataframe_loans=df_loans,
    dataframe_transactions=df_approved_transactions
)

In [7]:
## avg_amt_transactions_in_last_three_months feature
df_avg_amt_transactions_in_last_three_months = calculate_avg_in_group_per_user(
    list_dataframes=dfs_trans_in_last_three_mths,
    group_by_col=["user_id","reference_date"],
    col_to_avg="amount",
    new_col_name="avg_amt_transactions_in_last_three_months"
)
df_avg_amt_transactions_in_last_three_months.sort_values(by="user_id")

Unnamed: 0,user_id,reference_date,avg_amt_transactions_in_last_three_months
8360,0,2022-06-01,3272.833667
5684,0,2022-05-01,3494.315789
11131,0,2022-07-01,2358.467000
3178,0,2022-04-01,2406.928571
19185,0,2022-10-01,2339.392857
...,...,...,...
16687,3153,2022-08-01,1045.774194
5683,3153,2022-04-01,1054.136737
3177,3153,2022-03-01,1004.186176
11130,3153,2022-06-01,959.660194


In [8]:
## avg_amt_transactions per payment_method
df_avg_amt_transactions_per_user_in_last_three_months_by_pm = calculate_avg_in_group_per_user(
    list_dataframes=dfs_trans_in_last_three_mths,
    group_by_col=["user_id","reference_date","payment_method"],
    col_to_avg="amount",
    new_col_name="avg_amt_transactions_in_last_three_months_by_payment_method"
)
df_avg_amt_transactions_per_user_in_last_three_months_by_pm.sort_values(by="user_id")

Unnamed: 0,user_id,reference_date,payment_method,avg_amt_transactions_in_last_three_months_by_payment_method
9822,0,2022-05-01,credit,3688.388889
1905,0,2022-03-01,credit,600.000000
29899,0,2022-09-01,debit,749.000000
29898,0,2022-09-01,credit,2054.161290
24901,0,2022-08-01,debit,176.375000
...,...,...,...,...
9820,3153,2022-04-01,credit,1607.792931
5344,3153,2022-03-01,debit,194.333333
5343,3153,2022-03-01,credit,1505.523651
14687,3153,2022-05-01,debit,263.031250


In [9]:
## avg_amt_transactions per card brand
df_avg_amt_transactions_per_user_in_last_three_months_by_crdb = calculate_avg_in_group_per_user(
    list_dataframes=dfs_trans_in_last_three_mths,
    group_by_col=["user_id","reference_date","card_brand"],
    col_to_avg="amount",
    new_col_name="avg_amt_transactions_in_last_three_months_by_card_brand"
)
df_avg_amt_transactions_per_user_in_last_three_months_by_crdb.sort_values(by="user_id")

Unnamed: 0,user_id,reference_date,card_brand,avg_amt_transactions_in_last_three_months_by_card_brand
8677,0,2022-04-01,mastercard,1987.125000
56551,0,2022-10-01,elo,550.000000
49093,0,2022-09-01,visa,2234.833333
49092,0,2022-09-01,mastercard,1381.454545
49091,0,2022-09-01,elo,550.000000
...,...,...,...,...
8675,3153,2022-03-01,mastercard,773.999767
8674,3153,2022-03-01,elo,310.818182
3187,3153,2022-02-01,visa,1498.028571
24121,3153,2022-05-01,visa,1149.406250


In [10]:
## create dataframes of avg amount transactions per user by payment_method

df_avg_amt_transactions_per_user_in_last_three_months_by_credit = df_avg_amt_transactions_per_user_in_last_three_months_by_pm[
    df_avg_amt_transactions_per_user_in_last_three_months_by_pm["payment_method"]=="credit"
].copy()\
 .rename({"avg_amt_transactions_in_last_three_months_by_payment_method":
          "avg_amt_payment_method_credit_in_last_three_months"},axis=1)\
 .drop(columns=["payment_method"])

df_avg_amt_transactions_per_user_in_last_three_months_by_debit = df_avg_amt_transactions_per_user_in_last_three_months_by_pm[
    df_avg_amt_transactions_per_user_in_last_three_months_by_pm["payment_method"]=="debit"
].copy()\
 .rename({"avg_amt_transactions_in_last_three_months_by_payment_method":
          "avg_amt_payment_method_debit_in_last_three_months"},axis=1)\
 .drop(columns=["payment_method"])

In [11]:
## create dataframes of avg amount transactions per user by card_brand

df_avg_amt_transactions_per_user_in_last_three_months_with_visa = df_avg_amt_transactions_per_user_in_last_three_months_by_crdb[
    df_avg_amt_transactions_per_user_in_last_three_months_by_crdb["card_brand"]=="visa"
].copy()\
 .rename({"avg_amt_transactions_in_last_three_months_by_card_brand":
          "avg_amt_transactions_in_visa_in_last_three_months"},axis=1)\
 .drop(columns=["card_brand"])

df_avg_amt_transactions_per_user_in_last_three_months_with_mastercard = df_avg_amt_transactions_per_user_in_last_three_months_by_crdb[
    df_avg_amt_transactions_per_user_in_last_three_months_by_crdb["card_brand"]=="mastercard"
].copy()\
 .rename({"avg_amt_transactions_in_last_three_months_by_card_brand":
          "avg_amt_transactions_in_mastercard_in_last_three_months"},axis=1)\
 .drop(columns=["card_brand"])

df_avg_amt_transactions_per_user_in_last_three_months_with_elo = df_avg_amt_transactions_per_user_in_last_three_months_by_crdb[
    df_avg_amt_transactions_per_user_in_last_three_months_by_crdb["card_brand"]=="elo"
].copy()\
 .rename({"avg_amt_transactions_in_last_three_months_by_card_brand":
          "avg_amt_transactions_in_elo_in_last_three_months"},axis=1)\
 .drop(columns=["card_brand"])

df_avg_amt_transactions_per_user_in_last_three_months_with_hpcrd = df_avg_amt_transactions_per_user_in_last_three_months_by_crdb[
    df_avg_amt_transactions_per_user_in_last_three_months_by_crdb["card_brand"]=="hipercard"
].copy()\
 .rename({"avg_amt_transactions_in_last_three_months_by_card_brand":
          "avg_amt_transactions_in_hipercard_in_last_three_months"},axis=1)\
 .drop(columns=["card_brand"])

df_avg_amt_transactions_per_user_in_last_three_months_with_amex = df_avg_amt_transactions_per_user_in_last_three_months_by_crdb[
    df_avg_amt_transactions_per_user_in_last_three_months_by_crdb["card_brand"]=="amex"
].copy()\
 .rename({"avg_amt_transactions_in_last_three_months_by_card_brand":
          "avg_amt_transactions_in_amex_in_last_three_months"},axis=1)\
 .drop(columns=["card_brand"])
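The repeated filter, rename and drop steps above turn a long table (one row per card brand) into one column per brand. A possible alternative, shown here only as a sketch on made-up values, is to pivot the long table once; `long_df` and `wide_df` are hypothetical names.

```python
import pandas as pd

# Toy long-format table shaped like the per-card-brand averages (made-up values).
value_col = "avg_amt_transactions_in_last_three_months_by_card_brand"
long_df = pd.DataFrame({
    "user_id": [0, 0, 1],
    "reference_date": ["2022-04-01", "2022-04-01", "2022-04-01"],
    "card_brand": ["visa", "elo", "visa"],
    value_col: [100.0, 50.0, 70.0],
})

# One row per (user_id, reference_date), one column per card brand.
wide_df = long_df.pivot(
    index=["user_id", "reference_date"],
    columns="card_brand",
    values=value_col,
)

# Rename the brand columns to the feature naming scheme used in this notebook.
wide_df.columns = [
    f"avg_amt_transactions_in_{brand}_in_last_three_months"
    for brand in wide_df.columns
]
wide_df = wide_df.reset_index()

print(wide_df)
```

The notebook's explicit approach keeps one dataframe per brand, while the pivot yields a single wide table in which a brand a user never used appears as `NaN`.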

### 4.3 - Create max amount and installments features
- `max_installments_in_last_three_months`
- `max_amt_transactions_in_last_three_months`

In [12]:
df_max_amt_transactions_per_user_in_last_three_months = calculate_max_in_group_per_user(
    list_dataframes=dfs_trans_in_last_three_mths,
    group_by_col=["user_id","reference_date"],
    col_to_search_max="amount",
    new_col_name="max_amt_transactions_in_last_three_months"
)
df_max_amt_transactions_per_user_in_last_three_months.sort_values(by="user_id")

Unnamed: 0,user_id,reference_date,max_amt_transactions_in_last_three_months
8360,0,2022-06-01,22000.0
5684,0,2022-05-01,22000.0
11131,0,2022-07-01,22000.0
3178,0,2022-04-01,6600.0
19185,0,2022-10-01,23000.0
...,...,...,...
16687,3153,2022-08-01,8500.0
5683,3153,2022-04-01,25000.0
3177,3153,2022-03-01,25000.0
11130,3153,2022-06-01,6500.0


In [13]:
df_max_installments_per_user_in_last_three_months = calculate_max_in_group_per_user(
    list_dataframes=dfs_trans_in_last_three_mths,
    group_by_col=["user_id","reference_date"],
    col_to_search_max="installments",
    new_col_name="max_installments_in_last_three_months"
)
df_max_installments_per_user_in_last_three_months.sort_values(by="user_id")

Unnamed: 0,user_id,reference_date,max_installments_in_last_three_months
8360,0,2022-06-01,12
5684,0,2022-05-01,12
11131,0,2022-07-01,12
3178,0,2022-04-01,12
19185,0,2022-10-01,12
...,...,...,...
16687,3153,2022-08-01,12
5683,3153,2022-04-01,12
3177,3153,2022-03-01,12
11130,3153,2022-06-01,12


### 4.4 - Create median installments features

- `median_installments_in_last_three_months`

In [14]:
df_median_installments_per_user_in_last_three_months = calculate_median_in_group_per_user(
    list_dataframes=dfs_trans_in_last_three_mths,
    group_by_col=["user_id","reference_date"],
    col_to_search_median="installments",
    new_col_name="median_installments_in_last_three_months"
)
df_median_installments_per_user_in_last_three_months.sort_values(by="user_id")

Unnamed: 0,user_id,reference_date,median_installments_in_last_three_months
8360,0,2022-06-01,5.0
5684,0,2022-05-01,5.0
11131,0,2022-07-01,3.5
3178,0,2022-04-01,4.0
19185,0,2022-10-01,2.5
...,...,...,...
16687,3153,2022-08-01,4.0
5683,3153,2022-04-01,1.0
3177,3153,2022-03-01,1.0
11130,3153,2022-06-01,4.0


### 4.5 - Create most frequent features 

- `most_frequent_transactions_payment_method_in_last_three_months`

In [15]:
dfs_most_frequent_payment_per_user_in_last_three_months = [
    calculate_most_frequent(
        dataframe=df,
        col_to_count="payment_method",
        col_to_group=["user_id","reference_date"]
    )
    for df in dfs_trans_in_last_three_mths
]
df_most_frequent_payment_per_user_in_last_three_months = pd.concat(dfs_most_frequent_payment_per_user_in_last_three_months)
df_most_frequent_payment_per_user_in_last_three_months = df_most_frequent_payment_per_user_in_last_three_months.rename({"payment_method":"most_frequent_transactions_payment_method_in_last_three_months"},axis=1)
df_most_frequent_payment_per_user_in_last_three_months

Unnamed: 0,user_id,reference_date,most_frequent_transactions_payment_method_in_last_three_months
0,1,2022-02-01,credit
1,2,2022-02-01,credit
2,6,2022-02-01,credit
3,7,2022-02-01,debit
4,8,2022-02-01,credit
...,...,...,...
2295,3148,2022-10-01,credit
2296,3150,2022-10-01,credit
2297,3151,2022-10-01,credit
2298,3152,2022-10-01,credit


## 5 - Merge tables

In [16]:
df_feats_hist_trans_in_last_three_months = df_loans.merge(
    right=df_avg_amt_transactions_in_last_three_months.set_index(["user_id","reference_date"]),
    right_index=True,
    left_on=["user_id","reference_date"],
    how="left")\
    .merge(
    right=df_avg_amt_transactions_per_user_in_last_three_months_by_credit.set_index(["user_id","reference_date"]),
    right_index=True,
    left_on=["user_id","reference_date"],
    how="left"
    )\
    .merge(
    right=df_avg_amt_transactions_per_user_in_last_three_months_by_debit.set_index(["user_id","reference_date"]),
    right_index=True,
    left_on=["user_id","reference_date"],
    how="left"
    )\
    .merge(
    right=df_avg_amt_transactions_per_user_in_last_three_months_with_visa.set_index(["user_id","reference_date"]),
    right_index=True,
    left_on=["user_id","reference_date"],
    how="left"
    )\
    .merge(
    right=df_avg_amt_transactions_per_user_in_last_three_months_with_mastercard.set_index(["user_id","reference_date"]),
    right_index=True,
    left_on=["user_id","reference_date"],
    how="left"
    )\
    .merge(
    right=df_avg_amt_transactions_per_user_in_last_three_months_with_elo.set_index(["user_id","reference_date"]),
    right_index=True,
    left_on=["user_id","reference_date"],
    how="left"
    )\
    .merge(
    right=df_avg_amt_transactions_per_user_in_last_three_months_with_hpcrd.set_index(["user_id","reference_date"]),
    right_index=True,
    left_on=["user_id","reference_date"],
    how="left"
    )\
    .merge(
    right=df_avg_amt_transactions_per_user_in_last_three_months_with_amex.set_index(["user_id","reference_date"]),
    right_index=True,
    left_on=["user_id","reference_date"],
    how="left"
    )\
    .merge(
    right=df_max_amt_transactions_per_user_in_last_three_months.set_index(["user_id","reference_date"]),
    right_index=True,
    left_on=["user_id","reference_date"],
    how="left"
    )\
    .merge(
    right=df_max_installments_per_user_in_last_three_months.set_index(["user_id","reference_date"]),
    right_index=True,
    left_on=["user_id","reference_date"],
    how="left"
    )\
    .merge(
    right=df_median_installments_per_user_in_last_three_months.set_index(["user_id","reference_date"]),
    right_index=True,
    left_on=["user_id","reference_date"],
    how="left"
    )\
    .merge(
    right=df_most_frequent_payment_per_user_in_last_three_months.set_index(["user_id","reference_date"]),
    right_index=True,
    left_on=["user_id","reference_date"],
    how="left"
    )
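Since every merge in the chain above uses the same keys and `how="left"`, the chain could be folded with `functools.reduce`. This is a hedged sketch on toy frames (`loans`, `feat_a`, `feat_b` are made up), not a drop-in replacement for the cell above.

```python
from functools import reduce

import pandas as pd

keys = ["user_id", "reference_date"]

# Toy stand-ins for df_loans and two feature tables (made-up values).
loans = pd.DataFrame({
    "id": [10, 11],
    "user_id": [0, 1],
    "reference_date": ["2022-04-01", "2022-04-01"],
})
feat_a = pd.DataFrame({
    "user_id": [0],
    "reference_date": ["2022-04-01"],
    "feat_a": [1.5],
})
feat_b = pd.DataFrame({
    "user_id": [0, 1],
    "reference_date": ["2022-04-01", "2022-04-01"],
    "feat_b": [3, 4],
})

# Left-merge each feature table onto the loans, one after another.
merged = reduce(
    lambda left, right: left.merge(right, on=keys, how="left"),
    [feat_a, feat_b],
    loans,
)

# A left merge on unique keys should not change the number of loan rows.
assert len(merged) == len(loans)
print(merged)
```

One difference to keep in mind: merging with `on=` produces a fresh integer index, whereas the `left_on` plus `right_index=True` form used in the notebook passes the left index through.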

In [17]:
print("Shape of final dataset:",df_feats_hist_trans_in_last_three_months.shape)
print("Features in final dataset:",df_feats_hist_trans_in_last_three_months.columns)
df_feats_hist_trans_in_last_three_months.sort_values(by=["user_id","created_at"])

Shape of final dataset: (6746, 22)
Features in final dataset: Index(['id', 'user_id', 'amount', 'total_amount', 'due_amount', 'due_date',
       'status', 'created_at', 'date_created', 'reference_date',
       'avg_amt_transactions_in_last_three_months',
       'avg_amt_payment_method_credit_in_last_three_months',
       'avg_amt_payment_method_debit_in_last_three_months',
       'avg_amt_transactions_in_visa_in_last_three_months',
       'avg_amt_transactions_in_mastercard_in_last_three_months',
       'avg_amt_transactions_in_elo_in_last_three_months',
       'avg_amt_transactions_in_hipercard_in_last_three_months',
       'avg_amt_transactions_in_amex_in_last_three_months',
       'max_amt_transactions_in_last_three_months',
       'max_installments_in_last_three_months',
       'median_installments_in_last_three_months',
       'most_frequent_transactions_payment_method_in_last_three_months'],
      dtype='object')


Unnamed: 0,id,user_id,amount,total_amount,due_amount,due_date,status,created_at,date_created,reference_date,...,avg_amt_payment_method_debit_in_last_three_months,avg_amt_transactions_in_visa_in_last_three_months,avg_amt_transactions_in_mastercard_in_last_three_months,avg_amt_transactions_in_elo_in_last_three_months,avg_amt_transactions_in_hipercard_in_last_three_months,avg_amt_transactions_in_amex_in_last_three_months,max_amt_transactions_in_last_three_months,max_installments_in_last_three_months,median_installments_in_last_three_months,most_frequent_transactions_payment_method_in_last_three_months
2477,2477,0,6000.0,6045.28,6459000000,2022-07-25,error,2022-04-26 16:47:20.625000+00:00,2022-04-26,2022-04-01,...,1.000000,2966.666667,1987.125000,,,,6600.00,12.0,4.0,credit
86,86,1,6000.0,6045.28,6459000000,2022-05-03,debt_collection,2022-02-02 15:36:00.574000+00:00,2022-02-02,2022-02-01,...,250.000000,1358.180000,7283.333333,605.000000,,,17100.00,12.0,6.0,credit
223,223,2,6000.0,6045.28,6459000000,2022-05-05,debt_collection,2022-02-04 18:20:58.272000+00:00,2022-02-04,2022-02-01,...,,658.750000,1528.500000,3820.000000,2296.153846,,8000.00,12.0,10.0,credit
1744,1744,3,6000.0,6045.28,6458800000,2022-07-18,repaid,2022-04-18 21:46:00.032000+00:00,2022-04-18,2022-04-01,...,,855.769231,690.000000,3300.000000,,,4500.00,10.0,5.0,credit
4538,4538,3,6000.0,6045.28,6458800000,2022-10-07,debt_collection,2022-07-09 16:23:37.569000+00:00,2022-07-09,2022-07-01,...,,1645.833333,1525.000000,4266.666667,,,10000.00,10.0,5.0,credit
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1186,1186,3153,6000.0,6045.28,6458800000,2022-06-13,repaid,2022-03-15 15:28:53.048000+00:00,2022-03-15,2022-03-01,...,194.333333,1369.291667,773.999767,310.818182,,,25000.00,12.0,1.0,credit
3111,3111,3153,6000.0,6045.28,6458800000,2022-08-02,repaid,2022-05-04 11:18:29.811000+00:00,2022-05-04,2022-05-01,...,263.031250,1149.406250,938.937292,523.000000,,,6999.99,12.0,2.0,credit
3856,3856,3153,6000.0,6045.28,6458780000,2022-09-13,repaid,2022-06-15 19:31:51.132000+00:00,2022-06-15,2022-06-01,...,252.260870,1115.702703,910.339286,658.500000,,,6500.00,12.0,4.0,credit
4358,4358,3153,6000.0,6045.28,6458800000,2022-10-02,repaid,2022-07-04 15:32:00.095000+00:00,2022-07-04,2022-07-01,...,292.500000,1235.642857,972.941176,772.000000,,,8500.00,12.0,4.0,credit


## 6 - Save table

In [18]:
df_feats_hist_trans_in_last_three_months.to_csv(
    "./data/processed/df_transactions_history_per_user_in_last_three_months.csv",index=False
)
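When the saved file is read back, date columns come back as plain strings unless they are parsed explicitly. A minimal round-trip sketch, using a temporary file and a made-up one-row frame rather than the real dataset:

```python
import os
import tempfile

import pandas as pd

# Made-up one-row frame with a Python date column, like reference_date.
df = pd.DataFrame({
    "user_id": [0],
    "reference_date": [pd.Timestamp("2022-04-01").date()],
    "feat": [1.0],
})

# Write to a temporary location (hypothetical file name) without the index.
path = os.path.join(tempfile.mkdtemp(), "example.csv")
df.to_csv(path, index=False)

# parse_dates restores a datetime dtype for the listed columns.
reloaded = pd.read_csv(path, parse_dates=["reference_date"])
print(reloaded.dtypes)
```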