# 1.0.0 Build inputs from Previous Internal Applications
### Methodology
This section builds features that capture customers' past interactions with the organization's credit products, specifically their Buy Now, Pay Later (BNPL) applications and Smartphone Financing (SF) applications.

The features explored are:
- Raw Features:
    1. Account to Application Days (previous_internal_apps__account_to_application_days): Number of days from account creation to the loan application. A short gap may signal urgency, while a long gap suggests a more planned application.
    2. Number of Smartphone Financing Applications (previous_internal_apps__n_sf_apps): Number of previous SF applications, which reflects the customer's interest in financing a smartphone and can hint at spending habits and preferences.
    - Total BNPL Applications and Approvals:
      3. Applications (previous_internal_apps__n_bnpl_apps): Total number of BNPL applications made.
      4. Approvals (previous_internal_apps__n_bnpl_approved_apps): Number of BNPL applications that were approved.
    - Credit Inquiries (inquiries to the customer's credit report from external entities):
      5. Last 3 Months (previous_internal_apps__n_inquiries_l3m): Number of inquiries in the last 3 months.
      6. Last 6 Months (previous_internal_apps__n_inquiries_l6m): Number of inquiries in the last 6 months.

- Derived Features:
    7. BNPL Approval Ratio (previous_internal_apps__ratio_bnpl_approved): The ratio of approved BNPL applications to total BNPL applications (n_bnpl_approved_apps / n_bnpl_apps), set to 0 for customers with no BNPL applications.
    8. Days from Last BNPL Application to Loan Application (previous_internal_apps__last_bnpl_app_to_application_days): The number of days between the last BNPL application and the current loan application (application_datetime - last_bnpl_app_date).
    9. Days from First BNPL Application to Loan Application (previous_internal_apps__first_bnpl_app_to_application_days): The number of days between the first BNPL application and the current loan application (application_datetime - first_bnpl_app_date).
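
To make the derived-feature definitions concrete, here is a minimal worked example for a single hypothetical customer, using only the standard library. The dates and counts are invented for illustration and are not taken from the dataset; returning 0 when there are no BNPL applications is an assumption that mirrors the zero-fill applied to the ratio in the feature-building function.

```python
from datetime import datetime

# Hypothetical values for one customer (illustrative only, not from the dataset).
application_datetime = datetime(2023, 1, 15, 8, 0)
first_bnpl_app_date = datetime(2022, 11, 1, 10, 30)
last_bnpl_app_date = datetime(2022, 12, 20, 17, 45)
n_bnpl_apps = 4
n_bnpl_approved_apps = 1

# Approval ratio; assumed to be 0 when the customer has no BNPL applications.
ratio_bnpl_approved = n_bnpl_approved_apps / n_bnpl_apps if n_bnpl_apps else 0.0

# timedelta.days keeps only whole days, so partial days are truncated.
last_bnpl_app_to_application_days = (application_datetime - last_bnpl_app_date).days
first_bnpl_app_to_application_days = (application_datetime - first_bnpl_app_date).days

print(ratio_bnpl_approved)                 # 0.25
print(last_bnpl_app_to_application_days)   # 25
print(first_bnpl_app_to_application_days)  # 74
```

Because the first application can never be later than the last one, the "first" day count should always be greater than or equal to the "last" day count for the same customer.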

In [1]:
import pandas as pd
import numpy as np
from pathlib import Path
from datetime import datetime

### 1. Loan Data

In [2]:
DATA_PATH = Path.cwd().parent / "data"
MAIN_DATASET_PATH = DATA_PATH / "raw_data/main_dataset.parquet"
pd.set_option("display.max_columns", None)

In [3]:
df = pd.read_parquet(MAIN_DATASET_PATH)
df["LOAN_ORIGINATION_DATETIME_MONTH"] = df["LOAN_ORIGINATION_DATETIME"].dt.strftime("%Y-%m")
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14454 entries, 0 to 14453
Data columns (total 18 columns):
 #   Column                           Non-Null Count  Dtype         
---  ------                           --------------  -----         
 0   customer_id                      14454 non-null  int64         
 1   loan_id                          14454 non-null  int64         
 2   ACC_CREATION_DATETIME            14454 non-null  datetime64[us]
 3   APPLICATION_DATETIME             14454 non-null  datetime64[us]
 4   LOAN_ORIGINATION_DATETIME        14454 non-null  datetime64[us]
 5   max_days_late                    14454 non-null  int64         
 6   target                           14454 non-null  int64         
 7   account_to_application_days      14454 non-null  int64         
 8   n_sf_apps                        6806 non-null   float64       
 9   first_app_date                   6806 non-null   datetime64[us]
 10  last_app_date                    6806 non-null   datetime6

In [4]:
df.describe()

Unnamed: 0,customer_id,loan_id,ACC_CREATION_DATETIME,APPLICATION_DATETIME,LOAN_ORIGINATION_DATETIME,max_days_late,target,account_to_application_days,n_sf_apps,first_app_date,last_app_date,n_bnpl_apps,n_bnpl_approved_apps,first_bnpl_app_date,last_bnpl_app_date,n_inquiries_l3m,n_inquiries_l6m
count,14454.0,14454.0,14454,14454,14454,14454.0,14454.0,14454.0,6806.0,6806,6806,8739.0,8739.0,8739,8739,9083.0,9083.0
mean,7227.5,7227.5,2022-06-17 07:24:49.443337,2022-11-28 03:42:40.896637,2022-12-28 06:04:09.504220,14.225889,0.1868,163.489,1.653982,2022-06-16 04:31:39.970614,2022-06-16 01:42:11.531002,1.221765,0.264904,2022-08-13 09:30:46.852090,2022-08-03 09:11:07.589265,10.350435,17.11483
min,1.0,1.0,2020-10-14 18:22:10,2022-04-26 07:00:00,2022-07-01 09:03:20,-7.0,0.0,0.0,1.0,2021-04-27 00:00:00,2021-04-25 00:00:00,1.0,0.0,2022-01-06 21:17:08.193000,2022-01-06 21:17:08.193000,0.0,0.0
25%,3614.25,3614.25,2022-02-21 18:46:22.250000,2022-09-15 13:00:00,2022-10-27 21:15:58.250000,0.0,0.0,0.0,1.0,2022-02-27 00:00:00,2022-02-25 00:00:00,1.0,0.0,2022-05-01 21:03:56.963500,2022-04-20 05:33:33.586000,0.0,0.0
50%,7227.5,7227.5,2022-07-19 20:29:43.500000,2022-12-20 08:00:00,2023-01-11 10:05:49.500000,2.0,0.0,103.0,1.0,2022-07-15 00:00:00,2022-07-16 00:00:00,1.0,0.0,2022-08-18 13:36:14.271000,2022-07-28 17:37:41.677000,0.0,8.0
75%,10840.75,10840.75,2022-11-13 07:37:39.250000,2023-02-04 08:00:00,2023-03-06 18:07:46.250000,20.0,0.0,271.75,2.0,2022-10-21 00:00:00,2022-10-22 00:00:00,1.0,0.0,2022-11-06 19:24:55.189500,2022-11-06 01:50:47.642000,14.0,26.0
max,14454.0,14454.0,2023-05-19 19:55:04,2023-05-26 07:00:00,2023-05-29 12:18:28,70.0,1.0,901.0,42.0,2023-05-12 00:00:00,2023-05-12 00:00:00,18.0,15.0,2023-05-20 17:15:47,2023-05-17 15:20:48,170.0,213.0
std,4172.654731,4172.654731,,,,21.738445,0.389764,181.110989,1.697131,,,0.831144,0.602481,,,19.694595,23.229088


In [5]:
# Sanity check: rows whose first_app_date falls after the loan origination (expected: none).
df[df["LOAN_ORIGINATION_DATETIME"] < df["first_app_date"]]

Unnamed: 0,customer_id,loan_id,ACC_CREATION_DATETIME,APPLICATION_DATETIME,LOAN_ORIGINATION_DATETIME,max_days_late,target,account_to_application_days,n_sf_apps,first_app_date,last_app_date,n_bnpl_apps,n_bnpl_approved_apps,first_bnpl_app_date,last_bnpl_app_date,n_inquiries_l3m,n_inquiries_l6m,LOAN_ORIGINATION_DATETIME_MONTH


In [6]:
# Sanity check: rows whose last_bnpl_app_date falls after the loan origination (expected: none).
df[df["LOAN_ORIGINATION_DATETIME"] < df["last_bnpl_app_date"]]

Unnamed: 0,customer_id,loan_id,ACC_CREATION_DATETIME,APPLICATION_DATETIME,LOAN_ORIGINATION_DATETIME,max_days_late,target,account_to_application_days,n_sf_apps,first_app_date,last_app_date,n_bnpl_apps,n_bnpl_approved_apps,first_bnpl_app_date,last_bnpl_app_date,n_inquiries_l3m,n_inquiries_l6m,LOAN_ORIGINATION_DATETIME_MONTH
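
Both filters above return empty frames, so no history date falls after the loan origination. The same check can be run over several date columns at once; the following is a minimal sketch, where the helper name `count_future_dates` and the toy rows are hypothetical and only the column names come from the dataset.

```python
import pandas as pd


def count_future_dates(frame: pd.DataFrame, reference_col: str, date_cols: list) -> dict:
    """Count, per date column, the rows dated later than the reference column."""
    # Comparisons involving NaT evaluate to False, so missing history is not counted.
    return {col: int((frame[col] > frame[reference_col]).sum()) for col in date_cols}


# Toy rows (illustrative only): the third row has a first_app_date after origination.
toy = pd.DataFrame(
    {
        "LOAN_ORIGINATION_DATETIME": pd.to_datetime(["2023-01-10", "2023-02-01", "2023-03-05"]),
        "first_app_date": pd.to_datetime(["2022-12-01", None, "2023-03-09"]),
        "last_bnpl_app_date": pd.to_datetime(["2023-01-05", "2023-01-20", None]),
    }
)

violations = count_future_dates(
    toy, "LOAN_ORIGINATION_DATETIME", ["first_app_date", "last_bnpl_app_date"]
)
print(violations)  # {'first_app_date': 1, 'last_bnpl_app_date': 0}
```

Since the features are measured relative to APPLICATION_DATETIME, pointing the helper at that column instead would be a stricter variant of the same check.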


### 2. Build features

In [7]:
def build_previous_internal_app_features(df: pd.DataFrame) -> pd.DataFrame:
    """
    Build features from the main dataset that describe the customer's history within the organization.

    The features are:
        - Ratio of approved BNPL applications to total BNPL applications (0 when there are none).
        - Days between the last BNPL application and the credit application date.
        - Days between the first BNPL application and the credit application date.
        - Days from account creation to credit application.
        - Total counts of SF and BNPL applications, including approved BNPL applications.
        - Number of inquiries to credit reports from external entities in the last 3 and 6 months.

    Missing counts are filled with 0; the BNPL day-difference features stay missing for customers
    without BNPL history.

    Parameters:
    df (pd.DataFrame): A DataFrame containing the main dataset with the customer's history within the organization.

    Returns:
    pd.DataFrame: A DataFrame containing the loan ID and the newly created features prefixed with 'previous_internal_apps__'.
    """
    
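    # Work on a renamed copy so the caller's DataFrame keeps its original column names.
    df = df.rename(columns=str.lower)

    # Take the row-wise earliest/latest of the two BNPL date columns instead of trusting
    # their labels: in the describe() output above, the mean of first_bnpl_app_date is
    # later than the mean of last_bnpl_app_date, so the labels are not reliable.
    bnpl_dates = df[["first_bnpl_app_date", "last_bnpl_app_date"]]
    first_bnpl_app_date = bnpl_dates.min(axis=1)
    last_bnpl_app_date = bnpl_dates.max(axis=1)

    df = df.assign(
        previous_internal_apps__ratio_bnpl_approved=(df["n_bnpl_approved_apps"] / df["n_bnpl_apps"]).fillna(0),
        previous_internal_apps__last_bnpl_app_to_application_days=(df["application_datetime"] - last_bnpl_app_date).dt.days,
        previous_internal_apps__first_bnpl_app_to_application_days=(df["application_datetime"] - first_bnpl_app_date).dt.days,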
        previous_internal_apps__account_to_application_days=df["account_to_application_days"],
        previous_internal_apps__n_sf_apps=df["n_sf_apps"].fillna(0),
        previous_internal_apps__n_bnpl_apps=df["n_bnpl_apps"].fillna(0),
        previous_internal_apps__n_bnpl_approved_apps=df["n_bnpl_approved_apps"].fillna(0),
        previous_internal_apps__n_inquiries_l3m=df["n_inquiries_l3m"].fillna(0),
        previous_internal_apps__n_inquiries_l6m=df["n_inquiries_l6m"].fillna(0),
    )

    features = [i for i in df.columns if "previous_internal_apps__" in i]
    
    return df[["loan_id"] + features]
    
features_df = build_previous_internal_app_features(df)
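
A few cheap invariants can catch mistakes such as swapped date columns before the features are saved. The sketch below is an illustration under stated assumptions: the helper `check_feature_invariants` and the toy frame are hypothetical, and only the feature names come from this notebook.

```python
import pandas as pd

PREFIX = "previous_internal_apps__"


def check_feature_invariants(features: pd.DataFrame) -> dict:
    """Return simple pass/fail flags for the BNPL features."""
    ratio = features[PREFIX + "ratio_bnpl_approved"]
    first_days = features[PREFIX + "first_bnpl_app_to_application_days"]
    last_days = features[PREFIX + "last_bnpl_app_to_application_days"]
    # Only compare rows where both day counts are present (customers with BNPL history).
    both = first_days.notna() & last_days.notna()
    return {
        # An approval ratio must lie between 0 and 1 inclusive.
        "ratio_in_unit_interval": bool(ratio.between(0, 1).all()),
        # The first application is at least as far in the past as the last one.
        "first_not_after_last": bool((first_days[both] >= last_days[both]).all()),
    }


# Toy feature frame (illustrative values only); the second row has no BNPL history.
toy_features = pd.DataFrame(
    {
        PREFIX + "ratio_bnpl_approved": [0.0, 0.0, 1.0],
        PREFIX + "first_bnpl_app_to_application_days": [120.0, None, 30.0],
        PREFIX + "last_bnpl_app_to_application_days": [15.0, None, 30.0],
    }
)

print(check_feature_invariants(toy_features))
```

On the real data, the same helper could be called as `check_feature_invariants(features_df)`.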

In [8]:
features_df.columns

Index(['loan_id', 'previous_internal_apps__ratio_bnpl_approved',
       'previous_internal_apps__last_bnpl_app_to_application_days',
       'previous_internal_apps__first_bnpl_app_to_application_days',
       'previous_internal_apps__account_to_application_days',
       'previous_internal_apps__n_sf_apps',
       'previous_internal_apps__n_bnpl_apps',
       'previous_internal_apps__n_bnpl_approved_apps',
       'previous_internal_apps__n_inquiries_l3m',
       'previous_internal_apps__n_inquiries_l6m'],
      dtype='object')

In [9]:
formatted_date = datetime.now().strftime("%Y%m")
features_df.to_pickle(DATA_PATH / f"intermedian/{formatted_date}_previous_internal_apps_features.pickle")
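
`to_pickle` does not create missing folders, so the save fails if the output directory does not exist yet. The following sketch shows one way to guard against that and to confirm the file round-trips; a temporary directory stands in for DATA_PATH and the toy frame is hypothetical, while the folder and file naming follow the cell above.

```python
import tempfile
from datetime import datetime
from pathlib import Path

import pandas as pd

# Toy feature frame (illustrative values only).
toy_features = pd.DataFrame(
    {"loan_id": [1, 2], "previous_internal_apps__n_bnpl_apps": [0.0, 3.0]}
)

with tempfile.TemporaryDirectory() as tmp:
    # The temporary directory stands in for DATA_PATH in this sketch.
    out_dir = Path(tmp) / "intermedian"
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing

    formatted_date = datetime.now().strftime("%Y%m")
    out_path = out_dir / f"{formatted_date}_previous_internal_apps_features.pickle"

    toy_features.to_pickle(out_path)
    restored = pd.read_pickle(out_path)

print(restored.equals(toy_features))  # True
```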