# 03 - Construct loan repayment history features for the last six months

This notebook builds loan repayment history features over the six months preceding the month in which each loan was created. For example, if a user creates a loan at `2022-05-05 23:30:48.986000`, we check whether they have any loans created between `2021-11-01 00:00:00.000000` (inclusive) and `2022-05-01 00:00:00.000000` (exclusive).
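
The window arithmetic can be sketched as follows. This is an illustration only: `six_month_window` is a hypothetical helper, and it assumes the reference date is the first day of the month in which the loan was created, with a look-back of six months taken from the feature names.

```python
import pandas as pd
from pandas.tseries.offsets import DateOffset


def six_month_window(created_at: str, months_back: int = 6):
    """Return (window_start, reference_date) for a loan creation timestamp.

    Hypothetical helper for illustration; months_back=6 is an assumption
    taken from the feature names.
    """
    ts = pd.Timestamp(created_at)
    # Reference date: first day of the month in which the loan was created.
    reference_date = ts.normalize().replace(day=1)
    # Window start: the reference date shifted back by `months_back` months.
    window_start = reference_date - DateOffset(months=months_back)
    return window_start, reference_date


start, end = six_month_window("2022-05-05 23:30:48.986000")
print(start.date(), end.date())
```

Loans created on or after `start` and strictly before `end` would count as history for that loan.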

For this dataset, we're going to construct these features:

- `avg_repaid_total_amt_loans_in_last_six_months`: The average `total_amount` of the repaid loans a user had in the last six months.
- `max_repaid_total_amt_loans_in_last_six_months`: The maximum `total_amount` among the repaid loans a user had in the last six months.
- `avg_pct_repaid_first_month_loans_in_last_six_months`: The average percentage repaid in the first month, for loans in the last six months.
- `avg_pct_repaid_sec_month_loans_in_last_six_months`: The average percentage repaid in the second month, for loans in the last six months.
- `avg_pct_repaid_trd_month_loans_in_last_six_months`: The average percentage repaid in the third month, for loans in the last six months.
- `pct_repaid_loans_in_last_six_months`: The share of a user's loans in the last six months that were repaid.
- `most_frequent_loans_repayment_method_in_last_six_months`: The most frequent repayment method for the loans a user made in the last six months.


The final dataset is saved to `data/processed` as `df_loans_repay_history_per_user_in_last_six_months.csv`.

## 1 - Imports

In [1]:
import os 
os.chdir("../../")

In [2]:
import sqlalchemy
import pandas as pd 
import numpy as np

from pandas.tseries.offsets import DateOffset
from datetime import datetime,date
from typing import Union

## 2 - Read tables

In [3]:
engine = sqlalchemy.create_engine("sqlite:///./database/database.db", echo=True)

df_loans = pd.read_sql(
    sql="""
    SELECT * FROM loans l
    """,
    con=engine
)

2024-04-07 15:20:28,929 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-04-07 15:20:28,929 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("
    SELECT * FROM loans l
    ")
2024-04-07 15:20:28,930 INFO sqlalchemy.engine.Engine [raw sql] ()
2024-04-07 15:20:28,931 INFO sqlalchemy.engine.Engine PRAGMA temp.table_info("
    SELECT * FROM loans l
    ")
2024-04-07 15:20:28,932 INFO sqlalchemy.engine.Engine [raw sql] ()
2024-04-07 15:20:28,933 INFO sqlalchemy.engine.Engine 
    SELECT * FROM loans l
    
2024-04-07 15:20:28,933 INFO sqlalchemy.engine.Engine [raw sql] ()
2024-04-07 15:20:28,984 INFO sqlalchemy.engine.Engine ROLLBACK


In [4]:
df_loans_hist = pd.read_csv("./data/processed/loans_repayment_history_features.csv")
print("shape of the dataset:",df_loans_hist.shape)
print("columns of the dataset:",df_loans_hist.columns)
df_loans_hist.head()

shape of the dataset: (6746, 26)
columns of the dataset: Index(['id', 'user_id', 'amount', 'total_amount', 'due_amount', 'due_date',
       'status', 'created_at', 'date_created', 'loan_id',
       'amt_paid_till_first_month', 'amt_paid_till_sec_month',
       'amt_paid_till_trd_month', 'amt_paid_first_month', 'amt_paid_sec_month',
       'amt_paid_third_month', 'pct_paid_first_month', 'pct_paid_sec_month',
       'pct_paid_third_month', 'amt_defaulted_till_first_month',
       'amt_defaulted_till_sec_month', 'amt_defaulted_till_trd_month',
       'amt_defaulted_first_month', 'amt_defaulted_sec_month',
       'amt_defaulted_third_month', 'most_frequent_repayment_type_for_loan'],
      dtype='object')


Unnamed: 0,id,user_id,amount,total_amount,due_amount,due_date,status,created_at,date_created,loan_id,...,pct_paid_first_month,pct_paid_sec_month,pct_paid_third_month,amt_defaulted_till_first_month,amt_defaulted_till_sec_month,amt_defaulted_till_trd_month,amt_defaulted_first_month,amt_defaulted_sec_month,amt_defaulted_third_month,most_frequent_repayment_type_for_loan
0,0,3070,6000.0,6045.28,6459000000,2022-05-02,repaid,2022-02-01 00:47:29.575000+00:00,2022-02-01,0.0,...,0.598707,1.024096,1.024096,,,,,,,autopilot
1,1,2546,6000.0,6045.28,6459000000,2022-05-02,repaid,2022-02-01 00:49:51.763000+00:00,2022-02-01,1.0,...,0.319065,0.751914,1.032918,,,,,,,autopilot
2,2,2413,6000.0,6045.28,6459000000,2022-05-02,repaid,2022-02-01 01:24:40.537000+00:00,2022-02-01,2.0,...,0.354946,0.354946,0.997949,,,,,,,autopilot
3,3,2585,6000.0,6045.28,6459000000,2022-05-02,debt_collection,2022-02-01 02:52:59.803000+00:00,2022-02-01,3.0,...,0.006948,0.026798,0.026798,,,,,,,autopilot
4,4,2556,6000.0,6045.28,6459000000,2022-05-02,repaid,2022-02-01 02:53:07.123000+00:00,2022-02-01,4.0,...,0.58259,0.58259,0.58259,3055.39,3055.39,3055.39,3055.39,0.0,0.0,pix


In [5]:
# check types 
df_loans_hist.dtypes

id                                         int64
user_id                                    int64
amount                                   float64
total_amount                             float64
due_amount                                 int64
due_date                                  object
status                                    object
created_at                                object
date_created                              object
loan_id                                  float64
amt_paid_till_first_month                float64
amt_paid_till_sec_month                  float64
amt_paid_till_trd_month                  float64
amt_paid_first_month                     float64
amt_paid_sec_month                       float64
amt_paid_third_month                     float64
pct_paid_first_month                     float64
pct_paid_sec_month                       float64
pct_paid_third_month                     float64
amt_defaulted_till_first_month           float64
amt_defaulted_till_s

In [7]:
## check object dtypes 

### dates variables 
for col in ["due_date","created_at","date_created"]:
    print("type:",type(df_loans_hist[col].loc[0]))

### cat variables 
for col in ["most_frequent_repayment_type_for_loan","status"]:
    print("type:",type(df_loans_hist[col].loc[0]))


type: <class 'str'>
type: <class 'str'>
type: <class 'str'>
type: <class 'str'>
type: <class 'str'>


## 3 - Preprocessing data

In [8]:
#convert to datetime
df_loans["created_at"] = pd.to_datetime(df_loans["created_at"],utc=True,format="ISO8601")
df_loans_hist["created_at"] = pd.to_datetime(df_loans_hist["created_at"],utc=True,format="ISO8601")


#convert to date
df_loans["due_date"] = pd.to_datetime(df_loans["due_date"],format="%Y-%m-%d")
df_loans_hist["due_date"] = pd.to_datetime(df_loans_hist["due_date"],format="%Y-%m-%d")

#create date_created 
df_loans_hist["date_created"] = df_loans_hist["created_at"].apply(func=lambda d:d.date())

#create reference_date in df_loans 
df_loans["reference_date"] = [
    date(year=d.date().year,month=d.date().month,day=1)
    for d in df_loans["created_at"]
]
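
As an alternative to the list comprehension above, the month-start reference date can also be derived with vectorized datetime accessors. A minimal sketch on toy data (the frame `toy` and its values are made up for illustration):

```python
import pandas as pd

# Toy timestamps in the same shape as `created_at` (values are made up).
toy = pd.DataFrame({
    "created_at": [
        "2022-02-01 00:47:29.575000+00:00",
        "2022-05-05 23:30:48.986000+00:00",
    ]
})

# Parse to timezone-aware datetimes (UTC).
toy["created_at"] = pd.to_datetime(toy["created_at"], utc=True)

# Drop the timezone, truncate to the month, and keep plain date objects so the
# column compares cleanly with other `datetime.date` values.
toy["reference_date"] = (
    toy["created_at"]
    .dt.tz_localize(None)
    .dt.to_period("M")
    .dt.to_timestamp()
    .dt.date
)
print(toy["reference_date"].tolist())
```

Keeping `reference_date` as `datetime.date` on both sides matters later, because merge keys of different types would not match.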

## 4 - Construct features

### 4.1 - Functions to create features

In [11]:
def filter_loans_hist_per_month(
        dataframe_loans_hist:pd.DataFrame,
        time_range:Union[list,pd.DatetimeIndex]
)->list:
    """
    Filters the loans repayment history that happened in the last six months
    before each reference date in the time range.

    Args:
        dataframe_loans_hist (pd.DataFrame): A dataframe with the loans repayment history
        and a "date_created" column holding the date each loan was created.
        time_range (Union[list,pd.DatetimeIndex]): The reference dates used
        to filter the loans repayment history.

    Returns:
        list: A list of dataframes, one per reference date with at least one loan,
        holding the loans created in the window before that reference date.
    """    
    dfs_all_loans_history_per_month = []
    for ref_date in time_range:
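        # NOTE: the offset below looks back 3 months, while the feature names
        # refer to six months; months=-6 would give a six-month window.
        b_date_month = (ref_date + DateOffset(months=-3)).date()
        # Half-open window: b_date_month <= date_created < ref_date.
        in_window = dataframe_loans_hist["date_created"].between(
            left=b_date_month, right=ref_date.date(), inclusive="left"
        )
        filter_df = dataframe_loans_hist[in_window].copy()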
        if len(filter_df)>0:
            filter_df["reference_date"]=ref_date.date()
            dfs_all_loans_history_per_month.append(filter_df)
    
    return dfs_all_loans_history_per_month

def calculate_avg_in_group_per_user(
          list_dataframes:list,
          group_by_col:Union[str,list],
          col_to_avg:Union[str,list],
          new_col_name:str
          )->pd.DataFrame:
    """
    For each dataframe calculates the average value of a column 
    in a group.

    Args:
        list_dataframes (list): A list of dataframes to calculate the average values per groups.
        group_by_col (Union[str,list]): A column to group values.
        col_to_avg (Union[str,list]): A column to calculate the average values.
        new_col_name (str): The new column name for the average values calculated.

    Returns:
        pd.DataFrame: A dataframe with all the calculated average values per group.
    """    
    dfs_avg_in_group_per_user = [
    df.groupby(by=group_by_col)[col_to_avg].mean().to_frame(name=new_col_name)
    for df in list_dataframes
    ]
    df_avg_in_group_per_user = pd.concat(dfs_avg_in_group_per_user).reset_index()
    return df_avg_in_group_per_user

def calculate_max_in_group_per_user(
          list_dataframes:list,
          group_by_col:Union[str,list],
          col_to_search_max:Union[str,list],
          new_col_name:str
          )->pd.DataFrame:
    """
    For each dataframe in list_dataframes groups their values and 
    calculates the maximum value per group.

    Args:
        list_dataframes (list): A list of dataframes to calculate the maximum values per groups.
        group_by_col (Union[str,list]): A column to group values.
        col_to_search_max (Union[str,list]): A column to search the maximum value.
        new_col_name (str): New column name for the maximum values in the result dataframe.

    Returns:
        pd.DataFrame: A dataframe with all the calculated maximum values per group.
    """    
    dfs_max_in_group_per_user  = [
    df.groupby(by=group_by_col)[col_to_search_max].max().to_frame(name=new_col_name)
    for df in list_dataframes
    ]
    df_max_in_group_per_user = pd.concat(dfs_max_in_group_per_user).reset_index()
    return df_max_in_group_per_user

def calculate_median_in_group_per_user(
          list_dataframes:list,
          group_by_col:Union[str,list],
          col_to_search_median:Union[str,list],
          new_col_name:str
          )->pd.DataFrame:
    """
    For each dataframe in list_dataframes groups their values and 
    calculates the median value per group.

    Args:
        list_dataframes (list): A list of dataframes to calculate the median values per groups.
        group_by_col (Union[str,list]): A column to group values.
        col_to_search_median (Union[str,list]): A column to search the median value.
        new_col_name (str): New column name for the median values in the result dataframe.

    Returns:
        pd.DataFrame: A dataframe with all the calculated median values per group.
    """    
    dfs_median_in_group_per_user  = [
    df.groupby(by=group_by_col)[col_to_search_median].median().to_frame(name=new_col_name)
    for df in list_dataframes
    ]
    df_median_in_group_per_user = pd.concat(dfs_median_in_group_per_user).reset_index()
    return df_median_in_group_per_user

def calculate_most_frequent(
        dataframe:pd.DataFrame,
        col_to_count:Union[str,list],
        col_to_group:Union[str,list],
        return_counts:bool=False
)->pd.DataFrame:
    """
    Finds the most frequent value of a column within each group of a dataframe.

    Args:
        dataframe (pd.DataFrame): Initial dataframe with the values to count.
        col_to_count (str,list): Column whose values are counted.
        col_to_group (str,list): Column used to group the values in the dataframe.
        return_counts (bool, optional): If true, returns the counts of the most frequent value.
        If false, returns only the group id and its most frequent value. Defaults to False.

    Returns:
        pd.DataFrame: A new dataframe with the most frequent value in the column by group. If
        return_counts is true, returns the frequency of the value per group. If return_counts
        is false, then, returns only the group and their most frequent value.
    """    
    df_counts_type = dataframe.groupby(by=col_to_group)[col_to_count]\
                              .value_counts()\
                              .to_frame(name="count_types")\
                              .reset_index()

    idxs_most_frequent = df_counts_type.groupby(by=col_to_group)["count_types"].idxmax()
    final_df = df_counts_type.loc[idxs_most_frequent,:].reset_index(drop=True)

    # Keep the counts column only when the caller asks for it.
    if not return_counts:
        final_df = final_df.drop(columns="count_types")

    return final_df

def calculate_pct_values_per_col(
        dataframe:pd.DataFrame,
        col_to_count:Union[str,list],
        col_to_group:Union[str,list],
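)->pd.DataFrame:
    """
    Calculates the relative frequency of each value of a column within each group.

    Args:
        dataframe (pd.DataFrame): Initial dataframe with the values to count.
        col_to_count (Union[str,list]): Column whose values are counted.
        col_to_group (Union[str,list]): Column used to group the values in the dataframe.

    Returns:
        pd.DataFrame: A dataframe with one row per group and value, with the share
        of that value within the group in the "pct" column.
    """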
    df_counts_type = dataframe.groupby(by=col_to_group)[col_to_count]\
                              .value_counts(normalize=True)\
                              .to_frame(name="pct")\
                              .reset_index()
    
    return df_counts_type
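
The half-open window used by `filter_loans_hist_per_month` can be illustrated on a tiny frame. The sketch below is self-contained; the toy data is made up, and the look-back length (`months_back = 6`) is an assumption chosen to match the feature names.

```python
import pandas as pd
from pandas.tseries.offsets import DateOffset

# Toy loan history (made-up values).
toy_hist = pd.DataFrame({
    "user_id": [1, 1, 2],
    "date_created": pd.to_datetime(["2021-10-31", "2021-11-01", "2022-05-01"]),
})

ref_date = pd.Timestamp("2022-05-01")
months_back = 6  # assumption: look-back length implied by the feature names
window_start = ref_date - DateOffset(months=months_back)

# inclusive="left" keeps window_start <= date_created < ref_date.
in_window = toy_hist["date_created"].between(window_start, ref_date, inclusive="left")
print(toy_hist[in_window])
```

Only the loan created on `2021-11-01` survives: the one from `2021-10-31` is too old, and the one created on the reference date itself is excluded, which avoids counting a loan as part of its own history.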

### 4.2 - Create mean of repayment history features

- `avg_repaid_total_amt_loans_in_last_six_months`
- `avg_pct_repaid_first_month_loans_in_last_six_months`
- `avg_pct_repaid_sec_month_loans_in_last_six_months`
- `avg_pct_repaid_trd_month_loans_in_last_six_months`

In [12]:
df_paid_hist_loans = df_loans_hist[df_loans_hist["status"]=="repaid"]
periods = pd.date_range(start=date(2022,1,2),end=date(2023,1,1),freq="MS")
dfs_paid_hist_loans_per_6mths = filter_loans_hist_per_month(
  dataframe_loans_hist=df_paid_hist_loans,
  time_range=periods
)
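
Because `freq="MS"` snaps to month starts, a `start` of 2 January 2022 yields 1 February 2022 as the first reference date. A quick check of what `periods` contains, using string dates for brevity:

```python
import pandas as pd

# Month-start reference dates between the two bounds (both bounds inclusive).
periods = pd.date_range(start="2022-01-02", end="2023-01-01", freq="MS")
print(periods[0].date(), periods[-1].date(), len(periods))
```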

In [13]:
## avg_repaid_total_amt_in_last_six_months features
df_avg_amt_loans_repay_feats_in_last_six_months = calculate_avg_in_group_per_user(
    list_dataframes=dfs_paid_hist_loans_per_6mths,
    group_by_col=["user_id","reference_date"],
    col_to_avg="total_amount",
    new_col_name="avg_repaid_total_amt_loans_in_last_six_months"
)
df_avg_amt_loans_repay_feats_in_last_six_months.sort_values(by="user_id")

Unnamed: 0,user_id,reference_date,avg_repaid_total_amt_loans_in_last_six_months
1166,3,2022-05-01,6045.28
4012,3,2022-07-01,6045.28
2605,3,2022-06-01,6045.28
6673,4,2022-09-01,6045.28
5439,4,2022-08-01,6045.28
...,...,...,...
2604,3153,2022-05-01,6045.28
1165,3153,2022-04-01,6045.28
7955,3153,2022-09-01,6045.28
478,3153,2022-03-01,6045.28


In [14]:
## avg_pct_repaid_first_month feature
df_avg_pct_first_month_loans_repay_feats_in_last_six_months = calculate_avg_in_group_per_user(
    list_dataframes=dfs_paid_hist_loans_per_6mths,
    group_by_col=["user_id","reference_date"],
    col_to_avg="pct_paid_first_month",
    new_col_name="avg_pct_repaid_first_month_loans_in_last_six_months"
)
df_avg_pct_first_month_loans_repay_feats_in_last_six_months.sort_values(by="user_id")

Unnamed: 0,user_id,reference_date,avg_pct_repaid_first_month_loans_in_last_six_months
1166,3,2022-05-01,0.102973
4012,3,2022-07-01,0.102973
2605,3,2022-06-01,0.102973
6673,4,2022-09-01,0.664800
5439,4,2022-08-01,0.301761
...,...,...,...
2604,3153,2022-05-01,0.777383
1165,3153,2022-04-01,0.777383
7955,3153,2022-09-01,0.675231
478,3153,2022-03-01,0.760385


In [15]:
## avg_pct_repaid_sec_month feature
df_avg_pct_sec_month_loans_repay_feats_in_last_six_months = calculate_avg_in_group_per_user(
    list_dataframes=dfs_paid_hist_loans_per_6mths,
    group_by_col=["user_id","reference_date"],
    col_to_avg="pct_paid_sec_month",
    new_col_name="avg_pct_repaid_sec_month_loans_in_last_six_months"
)
df_avg_pct_sec_month_loans_repay_feats_in_last_six_months.sort_values(by="user_id")

Unnamed: 0,user_id,reference_date,avg_pct_repaid_sec_month_loans_in_last_six_months
1166,3,2022-05-01,0.625281
4012,3,2022-07-01,0.625281
2605,3,2022-06-01,0.625281
6673,4,2022-09-01,1.028574
5439,4,2022-08-01,1.029307
...,...,...,...
2604,3153,2022-05-01,1.017601
1165,3153,2022-04-01,1.017601
7955,3153,2022-09-01,1.039350
478,3153,2022-03-01,1.017256


In [16]:
## avg_pct_repaid_trd_month feature
df_avg_pct_third_month_loans_repay_feats_in_last_six_months = calculate_avg_in_group_per_user(
    list_dataframes=dfs_paid_hist_loans_per_6mths,
    group_by_col=["user_id","reference_date"],
    col_to_avg="pct_paid_third_month",
    new_col_name="avg_pct_repaid_trd_month_loans_in_last_six_months"
)
df_avg_pct_third_month_loans_repay_feats_in_last_six_months.sort_values(by="user_id")

Unnamed: 0,user_id,reference_date,avg_pct_repaid_trd_month_loans_in_last_six_months
1166,3,2022-05-01,1.037472
4012,3,2022-07-01,1.037472
2605,3,2022-06-01,1.037472
6673,4,2022-09-01,1.028574
5439,4,2022-08-01,1.029307
...,...,...,...
2604,3153,2022-05-01,1.017601
1165,3153,2022-04-01,1.017601
7955,3153,2022-09-01,1.039350
478,3153,2022-03-01,1.017256


### 4.3 - Create max repaid total amount features

- `max_repaid_total_amt_loans_in_last_six_months`

In [17]:
df_max_amt_repaid_loans_per_user_in_last_six_months = calculate_max_in_group_per_user(
    list_dataframes=dfs_paid_hist_loans_per_6mths,
    group_by_col=["user_id","reference_date"],
    col_to_search_max="total_amount",
    new_col_name="max_repaid_total_amt_loans_in_last_six_months"
)
df_max_amt_repaid_loans_per_user_in_last_six_months.sort_values(by="user_id")

Unnamed: 0,user_id,reference_date,max_repaid_total_amt_loans_in_last_six_months
1166,3,2022-05-01,6045.28
4012,3,2022-07-01,6045.28
2605,3,2022-06-01,6045.28
6673,4,2022-09-01,6045.28
5439,4,2022-08-01,6045.28
...,...,...,...
2604,3153,2022-05-01,6045.28
1165,3153,2022-04-01,6045.28
7955,3153,2022-09-01,6045.28
478,3153,2022-03-01,6045.28
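
As a design note, the mean and max features over the same windows could also be computed in one pass with named aggregation. This is an alternative sketch on made-up data with shortened, hypothetical column names, not the approach the notebook takes:

```python
import pandas as pd

# Made-up, already-windowed rows: one row per repaid loan and reference date.
toy = pd.DataFrame({
    "user_id": [1, 1, 2],
    "reference_date": ["2022-05-01", "2022-05-01", "2022-05-01"],
    "total_amount": [100.0, 300.0, 50.0],
})

# Named aggregation: each keyword becomes an output column.
feats = (
    toy.groupby(["user_id", "reference_date"])
    .agg(
        avg_repaid_total_amt=("total_amount", "mean"),
        max_repaid_total_amt=("total_amount", "max"),
    )
    .reset_index()
)
print(feats)
```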


### 4.4 - Create most frequent features 

- `most_frequent_loans_repayment_method_in_last_six_months`

In [18]:
dfs_most_frequent_payment_per_user_in_last_six_months = [
    calculate_most_frequent(
        dataframe=df,
        col_to_count="most_frequent_repayment_type_for_loan",
        col_to_group=["user_id","reference_date"]
    )
    for df in dfs_paid_hist_loans_per_6mths
]
df_most_frequent_payment_per_user_in_last_six_months = pd.concat(dfs_most_frequent_payment_per_user_in_last_six_months)
df_most_frequent_payment_per_user_in_last_six_months = df_most_frequent_payment_per_user_in_last_six_months.rename({"most_frequent_repayment_type_for_loan":"most_frequent_loans_repayment_method_in_last_six_months"},axis=1)
df_most_frequent_payment_per_user_in_last_six_months

Unnamed: 0,user_id,reference_date,most_frequent_loans_repayment_method_in_last_six_months
0,9,2022-03-01,autopilot
1,36,2022-03-01,autopilot
2,38,2022-03-01,autopilot
3,43,2022-03-01,autopilot
4,44,2022-03-01,autopilot
...,...,...,...
27,2873,2023-01-01,autopilot
28,2885,2023-01-01,autopilot
29,2924,2023-01-01,autopilot
30,3042,2023-01-01,autopilot
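
The count-then-`idxmax` pattern behind `calculate_most_frequent` can be seen on a small example. The data and the `repayment_type` column name below are made up for illustration:

```python
import pandas as pd

toy = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "repayment_type": ["pix", "autopilot", "pix", "autopilot"],
})

# Count each (user_id, repayment_type) pair.
counts = (
    toy.groupby("user_id")["repayment_type"]
    .value_counts()
    .to_frame(name="count_types")
    .reset_index()
)

# For each user, pick the row holding the largest count.
idx = counts.groupby("user_id")["count_types"].idxmax()
most_frequent = counts.loc[idx, ["user_id", "repayment_type"]].reset_index(drop=True)
print(most_frequent)
```

Note that `idxmax` returns the first occurrence of the maximum, so when two methods are tied the result depends on the row order of the counted frame.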


### 4.5 - Create percentage of repaid loans feature

- `pct_repaid_loans_in_last_six_months`

In [19]:
dfs_all_hist_loans_per_6mths = filter_loans_hist_per_month(
    dataframe_loans_hist=df_loans_hist,
    time_range=periods
)

dfs_pct_status_values_in_last_six_months = [
    calculate_pct_values_per_col(
        dataframe=df,
        col_to_count="status",
        col_to_group=["user_id","reference_date"]
    )
    for df in dfs_all_hist_loans_per_6mths
]
df_pct_status_values_in_last_six_months = pd.concat(dfs_pct_status_values_in_last_six_months)
df_pct_status_values_in_last_six_months_repaid = df_pct_status_values_in_last_six_months[df_pct_status_values_in_last_six_months["status"]=="repaid"]
df_pct_status_values_in_last_six_months_repaid = df_pct_status_values_in_last_six_months_repaid.rename({"pct":"pct_repaid_loans_in_last_six_months"},axis=1)
df_pct_status_values_in_last_six_months_repaid = df_pct_status_values_in_last_six_months_repaid.drop(columns=["status"])
df_pct_status_values_in_last_six_months_repaid.sort_values(by="user_id")

Unnamed: 0,user_id,reference_date,pct_repaid_loans_in_last_six_months
3,3,2022-05-01,1.0
1,3,2022-07-01,1.0
1,3,2022-06-01,1.0
1,4,2022-09-01,1.0
1,4,2022-08-01,1.0
...,...,...,...
2421,3153,2022-05-01,1.0
1192,3153,2022-04-01,1.0
1722,3153,2022-09-01,1.0
832,3153,2022-03-01,1.0
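
One property of filtering the normalized counts on `status == "repaid"` is that a user and reference date with no repaid loans produces no row at all, so it ends up missing rather than `0.0`. If an explicit zero is wanted, the share can be computed as the mean of a boolean flag. An alternative sketch on made-up data (the output column name is hypothetical):

```python
import pandas as pd

toy = pd.DataFrame({
    "user_id": [1, 1, 2],
    "reference_date": ["2022-05-01"] * 3,
    "status": ["repaid", "debt_collection", "debt_collection"],
})

# The mean of a boolean flag is the share of True values in each group.
pct_repaid = (
    toy.assign(is_repaid=toy["status"].eq("repaid"))
    .groupby(["user_id", "reference_date"])["is_repaid"]
    .mean()
    .rename("pct_repaid_loans")
    .reset_index()
)
print(pct_repaid)
```

Whether a missing value or a zero is the better encoding depends on how the downstream model treats users without any repaid loan.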


## 5 - Merge tables

In [20]:
df_feats_loans_repay_hist_in_last_six_months = df_loans.merge(
    right=df_avg_amt_loans_repay_feats_in_last_six_months.set_index(["user_id","reference_date"]),
    right_index=True,
    left_on=["user_id","reference_date"],
    how="left")\
    .merge(
    right=df_avg_pct_first_month_loans_repay_feats_in_last_six_months.set_index(["user_id","reference_date"]),
    right_index=True,
    left_on=["user_id","reference_date"],
    how="left"
    )\
    .merge(
    right=df_avg_pct_sec_month_loans_repay_feats_in_last_six_months.set_index(["user_id","reference_date"]),
    right_index=True,
    left_on=["user_id","reference_date"],
    how="left"
    )\
    .merge(
    right=df_avg_pct_third_month_loans_repay_feats_in_last_six_months.set_index(["user_id","reference_date"]),
    right_index=True,
    left_on=["user_id","reference_date"],
    how="left"
    )\
    .merge(
    right=df_max_amt_repaid_loans_per_user_in_last_six_months.set_index(["user_id","reference_date"]),
    right_index=True,
    left_on=["user_id","reference_date"],
    how="left"
    )\
    .merge(
    right=df_most_frequent_payment_per_user_in_last_six_months.set_index(["user_id","reference_date"]),
    right_index=True,
    left_on=["user_id","reference_date"],
    how="left"
    )\
    .merge(
    right=df_pct_status_values_in_last_six_months_repaid.set_index(["user_id","reference_date"]),
    right_index=True,
    left_on=["user_id","reference_date"],
    how="left"
    )
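
The chain of left merges can be written more compactly by folding over a list of feature frames. A sketch with made-up frames (`base`, `feat_a` and `feat_b` are hypothetical), not the notebook's code:

```python
from functools import reduce

import pandas as pd

keys = ["user_id", "reference_date"]
base = pd.DataFrame({"user_id": [1, 2], "reference_date": ["2022-05-01", "2022-05-01"]})
feat_a = pd.DataFrame({"user_id": [1], "reference_date": ["2022-05-01"], "feat_a": [0.5]})
feat_b = pd.DataFrame({"user_id": [1], "reference_date": ["2022-05-01"], "feat_b": ["pix"]})

# Left-merge every feature frame onto the base frame, one after another.
# validate="many_to_one" raises if a feature frame has duplicated keys,
# which would otherwise silently duplicate loan rows.
merged = reduce(
    lambda left, right: left.merge(right, on=keys, how="left", validate="many_to_one"),
    [feat_a, feat_b],
    base,
)
print(merged)
```

Rows without a match keep their place and receive missing values, which is why loans with no history in the window show empty feature columns.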

In [21]:
print("Shape of final dataset:",df_feats_loans_repay_hist_in_last_six_months.shape)
print("final dataset columns:",df_feats_loans_repay_hist_in_last_six_months.columns)
df_feats_loans_repay_hist_in_last_six_months.sort_values(by=["user_id","reference_date"])

Shape of final dataset: (6746, 16)
final dataset columns: Index(['id', 'user_id', 'amount', 'total_amount', 'due_amount', 'due_date',
       'status', 'created_at', 'reference_date',
       'avg_repaid_total_amt_loans_in_last_six_months',
       'avg_pct_repaid_first_month_loans_in_last_six_months',
       'avg_pct_repaid_sec_month_loans_in_last_six_months',
       'avg_pct_repaid_trd_month_loans_in_last_six_months',
       'max_repaid_total_amt_loans_in_last_six_months',
       'most_frequent_loans_repayment_method_in_last_six_months',
       'pct_repaid_loans_in_last_six_months'],
      dtype='object')


Unnamed: 0,id,user_id,amount,total_amount,due_amount,due_date,status,created_at,reference_date,avg_repaid_total_amt_loans_in_last_six_months,avg_pct_repaid_first_month_loans_in_last_six_months,avg_pct_repaid_sec_month_loans_in_last_six_months,avg_pct_repaid_trd_month_loans_in_last_six_months,max_repaid_total_amt_loans_in_last_six_months,most_frequent_loans_repayment_method_in_last_six_months,pct_repaid_loans_in_last_six_months
2477,2477,0,6000.0,6045.28,6459000000,2022-07-25,error,2022-04-26 16:47:20.625000+00:00,2022-04-01,,,,,,,
86,86,1,6000.0,6045.28,6459000000,2022-05-03,debt_collection,2022-02-02 15:36:00.574000+00:00,2022-02-01,,,,,,,
223,223,2,6000.0,6045.28,6459000000,2022-05-05,debt_collection,2022-02-04 18:20:58.272000+00:00,2022-02-01,,,,,,,
1744,1744,3,6000.0,6045.28,6458800000,2022-07-18,repaid,2022-04-18 21:46:00.032000+00:00,2022-04-01,,,,,,,
4538,4538,3,6000.0,6045.28,6458800000,2022-10-07,debt_collection,2022-07-09 16:23:37.569000+00:00,2022-07-01,6045.28,0.102973,0.625281,1.037472,6045.28,autopilot,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1186,1186,3153,6000.0,6045.28,6458800000,2022-06-13,repaid,2022-03-15 15:28:53.048000+00:00,2022-03-01,6045.28,0.760385,1.017256,1.017256,6045.28,autopilot,1.0
3111,3111,3153,6000.0,6045.28,6458800000,2022-08-02,repaid,2022-05-04 11:18:29.811000+00:00,2022-05-01,6045.28,0.777383,1.017601,1.017601,6045.28,autopilot,1.0
3856,3856,3153,6000.0,6045.28,6458780000,2022-09-13,repaid,2022-06-15 19:31:51.132000+00:00,2022-06-01,6045.28,0.901349,1.013131,1.013131,6045.28,autopilot,1.0
4358,4358,3153,6000.0,6045.28,6458800000,2022-10-02,repaid,2022-07-04 15:32:00.095000+00:00,2022-07-01,6045.28,1.009462,1.009462,1.009462,6045.28,autopilot,1.0


## 6 - Save table

In [22]:
df_feats_loans_repay_hist_in_last_six_months.to_csv(
    "./data/processed/df_loans_repay_history_per_user_in_last_six_months.csv",index=False
)
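
CSV does not preserve dtypes, so date columns come back as strings unless they are parsed on load. A minimal round-trip sketch using an in-memory buffer and made-up data:

```python
import io

import pandas as pd

toy = pd.DataFrame({
    "user_id": [1],
    "due_date": pd.to_datetime(["2022-05-02"]),
})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates restores the datetime dtype when reading back.
restored = pd.read_csv(buffer, parse_dates=["due_date"])
print(restored.dtypes)
```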