# Overview of the Competition

The goal of this competition is to predict which clients are more likely to default on their loans. The evaluation will favor solutions that are stable over time. A default occurs when a borrower stops making required payments on a debt.

## 1. Description

The absence of a credit history might mean a lot of things, including young age or a preference for cash. Without traditional data, someone with little to no credit history is likely to be denied. Consumer finance providers must accurately determine which clients can repay a loan and which cannot, and data is key. If data science could help better predict one's repayment capabilities, loans might become more accessible to those who may benefit from them the most.

Currently, consumer finance providers use various statistical and machine learning methods to predict loan risk. These models are generally called scorecards. In the real world, clients' behaviors change constantly, so every scorecard must be updated regularly, which takes time. The scorecard's stability in the future is critical, as a sudden drop in performance means that loans will be issued to worse clients on average. The core of the issue is that loan providers cannot spot potential problems until the first due dates of the affected loans have passed. Given the time it takes to redevelop, validate, and implement the scorecard, stability is highly desirable. There is a trade-off between the stability of the model and its performance, and a balance must be reached before deployment.

Founded in 1997, competition host Home Credit is an international consumer finance provider focusing on responsible lending primarily to people with little or no credit history. Home Credit broadens financial inclusion for the unbanked population by creating a positive and safe borrowing experience. We previously ran a competition with Kaggle.

Your work in helping to assess potential clients' default risks will enable consumer finance providers to accept more loan applications. This may improve the lives of people who have historically been denied due to lack of credit history.

## 2. Evaluation

Submissions are evaluated using a gini stability metric. A gini score is calculated for the predictions corresponding to each WEEK_NUM:

$$\text{gini} = 2 \cdot \text{AUC} - 1$$


A linear regression, $a \cdot x + b$, is fit through the weekly gini scores, and a falling_rate is calculated as $\min(0, a)$. This is used to penalize models that drop off in predictive ability.

Finally, the variability of the predictions is calculated by taking the standard deviation of the residuals from the above linear regression, applying a penalty to model variability.

The final metric is calculated as:

$$\text{stability metric} = \text{mean}(\text{gini}) + 88.0 \cdot \min(0,a) - 0.5 \cdot \text{std}(\text{residuals})$$
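
To make the formula concrete, here is a minimal, library-free sketch of the metric. It is not the official scoring code. The function names `auc` and `gini_stability` are hypothetical, and several details are assumptions: $x$ is taken as the position of each week in sorted order (0, 1, 2, ...), the standard deviation is the population standard deviation, and AUC is computed by brute-force pairwise comparison, which is only practical for small samples. Every week must contain both classes, and at least two weeks are needed for the fit.

```python
from statistics import mean, pstdev


def auc(targets, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Quadratic in sample size, so toy data only.
    """
    pos = [s for t, s in zip(targets, scores) if t == 1]
    neg = [s for t, s in zip(targets, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini_stability(week_nums, targets, scores, w_falling=88.0, w_std=0.5):
    """mean(gini) + w_falling * min(0, a) - w_std * std(residuals)."""
    # One gini score per week, in chronological order.
    ginis = []
    for week in sorted(set(week_nums)):
        idx = [i for i, w in enumerate(week_nums) if w == week]
        week_auc = auc([targets[i] for i in idx], [scores[i] for i in idx])
        ginis.append(2 * week_auc - 1)

    # Ordinary least squares fit of gini = a * x + b.
    # Assumption: x is the week's position (0, 1, 2, ...).
    x = list(range(len(ginis)))
    x_bar, y_bar = mean(x), mean(ginis)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    a = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, ginis)) / sxx
    b = y_bar - a * x_bar

    # Assumption: population standard deviation of the residuals.
    residuals = [yi - (a * xi + b) for xi, yi in zip(x, ginis)]
    return mean(ginis) + w_falling * min(0.0, a) - w_std * pstdev(residuals)


# Three weeks with perfectly ranked predictions: gini is 1 every week,
# the slope is 0 and the residuals are 0.
weeks = [0, 0, 1, 1, 2, 2]
labels = [0, 1, 0, 1, 0, 1]
print(gini_stability(weeks, labels, [0.1, 0.9, 0.2, 0.8, 0.3, 0.7]))  # 1.0
```

Because the slope is multiplied by 88, even a small downward trend in weekly gini can outweigh a high average. A model whose gini falls from 1 to 0 to -1 over three weeks has a mean gini of 0 but scores -88 under this sketch.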


## 3. Submission File

For each case_id in the test set, you must predict a probability for the target score. The submission file must be named submission.csv. The file should contain a header and have the following format:

    case_id,score
    57543,0.1
    57544,0.9
    57545,0.5
    etc.
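
A file in this format can be produced with Python's standard `csv` module. The sketch below is illustrative only: the helper name `write_submission` is hypothetical, and the range check on scores is an added safeguard, not a rule stated by the competition. It writes to an in-memory buffer so the result can be inspected.

```python
import csv
import io


def write_submission(predictions, out):
    """Write (case_id, score) pairs as CSV with the required header."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["case_id", "score"])
    for case_id, score in predictions:
        # Added safeguard (assumption): scores should be probabilities.
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for case_id {case_id} is out of range: {score}")
        writer.writerow([case_id, score])


buffer = io.StringIO()
write_submission([(57543, 0.1), (57544, 0.9), (57545, 0.5)], buffer)
print(buffer.getvalue())
```

To write the actual file, pass an open file handle instead of the buffer, for example `open("submission.csv", "w", newline="")`.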

# Problem Approach

# 1. Understanding the dataset (27GB)

## base files:
Base tables store the basic information about each observation, including its case_id. The case_id uniquely identifies every observation, and you need it to join the other tables to the base table.
### train_base:
case_id: unique case number. You'll need this ID to join relevant tables to the base table.

date_decision: This refers to the date when a decision was made regarding the approval of the loan.
___
WEEK_NUM: This is the week number used for aggregation. In the test sample, WEEK_NUM continues sequentially from the last training value of WEEK_NUM.

Purpose: It is used for aggregating data on a weekly basis. In this competition, the gini score is computed separately for each WEEK_NUM.

Sequential continuation: The week numbering does not restart at the beginning of the test sample. Instead, it continues from where the training sample left off. This keeps week numbering consistent across the training and test data and avoids confusion in temporal analysis.
___

MONTH: This column represents the month and is intended for aggregation purposes.

Purpose: Similar to WEEK_NUM, but for monthly aggregation. It supports examining trends, patterns, or anomalies month by month, such as seasonal effects or month-over-month changes.
___

target: This is the target value, determined after a certain period based on whether or not the client defaulted on the specific credit case (loan).


### train_applprev_1.0 important features
annuity_853A: Monthly annuity (repayment per month) for previous applications.

num_group1: This is an indexing column used for the historical records of case_id in both depth=1 and depth=2 tables.

byoccupationinc_3656910L: Applicant's income from previous applications.


### train_applprev_1.1

actualdpd_943P: Days Past Due (DPD) of previous contract (actual).

### train_applprev_2

num_group2: This is the second indexing column for depth=2 tables' historical records of case_id. The order of num_group1 and num_group2 is important and will be clarified in feature definitions.
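
Because a case_id can have several historical records (one per num_group1 value, and per num_group2 value in depth=2 tables), these tables have to be collapsed to one row per case_id before they can be joined to the base table. The following is a minimal, library-free sketch of that idea on toy rows. The helper names `aggregate_by_case` and `left_join` and the choice of aggregates (maximum and count) are illustrative assumptions. The annuity values are copied from the sample output in this notebook, while the WEEK_NUM and target values are made up.

```python
from collections import defaultdict

# Toy base table: one row per case_id (WEEK_NUM and target are made up).
base_rows = [
    {"case_id": 2, "WEEK_NUM": 0, "target": 0},
    {"case_id": 3, "WEEK_NUM": 0, "target": 0},
    {"case_id": 5, "WEEK_NUM": 1, "target": 1},
]

# Toy depth=1 table: several rows per case_id, indexed by num_group1.
applprev_rows = [
    {"case_id": 2, "num_group1": 0, "annuity_853A": 640.2},
    {"case_id": 2, "num_group1": 1, "annuity_853A": 1682.4},
    {"case_id": 3, "num_group1": 0, "annuity_853A": 6140.0},
    {"case_id": 5, "num_group1": 0, "annuity_853A": None},  # missing value
]


def aggregate_by_case(rows, column):
    """Collapse historical records to one feature dict per case_id."""
    grouped = defaultdict(list)
    for row in rows:
        if row[column] is not None:  # skip missing values
            grouped[row["case_id"]].append(row[column])
    return {
        case_id: {f"max_{column}": max(values), f"count_{column}": len(values)}
        for case_id, values in grouped.items()
    }


def left_join(base, features, feature_names):
    """Attach features to every base row; cases without history get None."""
    empty = {name: None for name in feature_names}
    return [{**row, **features.get(row["case_id"], empty)} for row in base]


features = aggregate_by_case(applprev_rows, "annuity_853A")
joined = left_join(base_rows, features, ["max_annuity_853A", "count_annuity_853A"])
for row in joined:
    print(row)
```

A left join keeps every case_id from the base table, so clients with no usable history stay in the training set with missing feature values. On the full 27GB dataset the same steps would be expressed as a group-by aggregation followed by a left join in a DataFrame library.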


In [21]:
import pandas as pd
import polars as pl
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Local directory holding the competition's training CSV files (machine-specific path)
datapath_train = '/Users/dino.dervisevic/Desktop/Portfolio/Kaggle_Comp/Credit_Home_Risk_Prague/csv_files/train'


In [22]:
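# Previous-application tables: the depth=1 table is split across two files (1_0 and 1_1)
train_data_applprev_1 = pl.read_csv(f'{datapath_train}/train_applprev_1_0.csv')
train_data_applprev_2 = pl.read_csv(f'{datapath_train}/train_applprev_1_1.csv')
# Depth=2 table, indexed by num_group1 and num_group2
train_data_applprev_3 = pl.read_csv(f'{datapath_train}/train_applprev_2.csv')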

In [31]:
train_data_applprev_1

case_id,actualdpd_943P,annuity_853A,approvaldate_319D,byoccupationinc_3656910L,cancelreason_3545846M,childnum_21L,creationdate_885D,credacc_actualbalance_314A,credacc_credlmt_575A,credacc_maxhisbal_375A,credacc_minhisbal_90A,credacc_status_367L,credacc_transactions_402L,credamount_590A,credtype_587L,currdebt_94A,dateactivated_425D,district_544M,downpmt_134A,dtlastpmt_581D,dtlastpmtallstes_3545839D,education_1138M,employedfrom_700D,familystate_726L,firstnonzeroinstldate_307D,inittransactioncode_279L,isbidproduct_390L,isdebitcard_527L,mainoccupationinc_437A,maxdpdtolerance_577P,num_group1,outstandingdebt_522A,pmtnum_8L,postype_4733339M,profession_152M,rejectreason_755M,rejectreasonclient_4145042M,revolvingaccount_394A,status_219L,tenor_203L
i64,f64,f64,str,f64,str,f64,str,f64,f64,f64,f64,str,f64,f64,str,f64,str,str,f64,str,str,str,str,str,str,str,bool,bool,f64,f64,i64,f64,f64,str,str,str,str,f64,str,f64
2,0.0,640.2,,,"""a55475b1""",0.0,"""2013-04-03""",,0.0,,,,,10000.0,"""CAL""",,,"""P136_108_173""",0.0,,,"""P97_36_170""","""2010-02-15""","""SINGLE""","""2013-05-04""","""CASH""",false,,8200.0,,0,,24.0,"""a55475b1""","""a55475b1""","""a55475b1""","""a55475b1""",,"""D""",24.0
2,0.0,1682.4,,,"""a55475b1""",0.0,"""2013-04-03""",,0.0,,,,,16000.0,"""CAL""",,,"""P136_108_173""",0.0,,,"""P97_36_170""","""2010-02-15""","""SINGLE""","""2013-05-04""","""CASH""",false,,8200.0,,1,,12.0,"""a55475b1""","""a55475b1""","""a55475b1""","""a55475b1""",,"""D""",12.0
3,0.0,6140.0,,,"""P94_109_143""",,"""2019-01-07""",,0.0,,,,,59999.8,"""CAL""",,,"""P131_33_167""",0.0,,,"""P97_36_170""","""2018-05-15""","""MARRIED""","""2019-02-07""","""CASH""",false,,11000.0,,0,,12.0,"""a55475b1""","""a55475b1""","""P94_109_143""","""a55475b1""",,"""D""",12.0
4,0.0,2556.6,,,"""P24_27_36""",,"""2019-01-08""",,0.0,,,,,40000.0,"""CAL""",,,"""P194_82_174""",0.0,,,"""a55475b1""",,,"""2019-02-08""","""CASH""",false,,16000.0,,0,,24.0,"""a55475b1""","""a55475b1""","""a55475b1""","""a55475b1""",,"""T""",24.0
5,0.0,,,,"""P85_114_140""",,"""2019-01-16""",,,,,,,,,,,"""P54_133_26""",,,,"""a55475b1""",,,,,false,,62000.0,,0,,,"""a55475b1""","""a55475b1""","""a55475b1""","""a55475b1""",,"""T""",
…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…
2651092,0.0,3000.0,"""2019-12-30""",,"""a55475b1""",,"""2019-12-30""",53300.0,53300.0,0.0,0.0,"""CL""",0.0,53998.0,"""COL""",53998.0,,"""P147_21_170""",0.0,,,"""a55475b1""",,,"""2020-01-30""","""POS""",false,,60000.0,,0,53998.0,18.0,"""P177_117_192""","""a55475b1""","""a55475b1""","""a55475b1""",,"""N""",18.0
2651092,0.0,3119.2,"""2012-12-12""",25000.0,"""a55475b1""",1.0,"""2012-12-12""",,0.0,,,,,25740.0,"""COL""",0.0,"""2012-12-20""","""P147_21_170""",0.0,,,"""P97_36_170""","""2004-10-15""","""MARRIED""","""2013-01-11""","""POS""",false,,10000.0,0.0,7,0.0,12.0,"""P149_40_170""","""a55475b1""","""a55475b1""","""a55475b1""",,"""K""",12.0
2651092,0.0,4366.0,"""2017-11-09""",,"""a55475b1""",,"""2017-11-09""",,0.0,,,,,19638.0,"""COL""",0.0,"""2017-11-15""","""P147_21_170""",0.0,"""2018-04-03""","""2018-04-03""","""a55475b1""",,"""LIVING_WITH_PA…","""2017-12-10""","""POS""",false,,40000.0,0.0,4,0.0,5.0,"""P60_146_156""","""a55475b1""","""a55475b1""","""a55475b1""",,"""K""",5.0
2651092,0.0,4496.6,"""2019-02-17""",,"""a55475b1""",,"""2019-02-17""",,0.0,,,,,20840.0,"""COL""",0.0,"""2019-02-18""","""P147_21_170""",0.0,"""2019-07-31""","""2019-07-31""","""a55475b1""",,,"""2019-03-17""","""POS""",false,,60000.0,0.0,2,0.0,6.0,"""P149_40_170""","""a55475b1""","""a55475b1""","""a55475b1""",,"""K""",6.0


In [30]:
train_data_applprev_2

case_id,actualdpd_943P,annuity_853A,approvaldate_319D,byoccupationinc_3656910L,cancelreason_3545846M,childnum_21L,creationdate_885D,credacc_actualbalance_314A,credacc_credlmt_575A,credacc_maxhisbal_375A,credacc_minhisbal_90A,credacc_status_367L,credacc_transactions_402L,credamount_590A,credtype_587L,currdebt_94A,dateactivated_425D,district_544M,downpmt_134A,dtlastpmt_581D,dtlastpmtallstes_3545839D,education_1138M,employedfrom_700D,familystate_726L,firstnonzeroinstldate_307D,inittransactioncode_279L,isbidproduct_390L,isdebitcard_527L,mainoccupationinc_437A,maxdpdtolerance_577P,num_group1,outstandingdebt_522A,pmtnum_8L,postype_4733339M,profession_152M,rejectreason_755M,rejectreasonclient_4145042M,revolvingaccount_394A,status_219L,tenor_203L
i64,f64,f64,str,f64,str,f64,str,f64,f64,f64,f64,str,f64,f64,str,f64,str,str,f64,str,str,str,str,str,str,str,bool,bool,f64,f64,i64,f64,f64,str,str,str,str,f64,str,f64
40704,0.0,7204.6,,,"""P94_109_143""",,"""2018-11-20""",,0.0,,,,,54000.0,"""CAL""",,,"""P147_6_101""",0.0,,,"""a55475b1""",,,"""2018-12-20""","""CASH""",false,,40000.0,,0,,12.0,"""P46_145_78""","""a55475b1""","""P198_131_9""","""P94_109_143""",,"""D""",12.0
40734,0.0,3870.2,,,"""P94_109_143""",,"""2019-12-26""",,0.0,,,,,50000.0,"""CAL""",,,"""P111_148_100""",0.0,,,"""a55475b1""",,,"""2020-01-26""","""CASH""",false,,50000.0,,0,,18.0,"""P149_40_170""","""a55475b1""","""P45_84_106""","""P94_109_143""",,"""D""",18.0
40737,0.0,2324.4001,,1.0,"""a55475b1""",0.0,"""2014-07-17""",,0.0,,,,,30000.0,"""CAL""",0.0,,"""a55475b1""",0.0,,,"""P97_36_170""","""2014-01-15""","""MARRIED""","""2014-08-17""","""CASH""",false,,16000.0,,0,0.0,18.0,"""P46_145_78""","""a55475b1""","""a55475b1""","""a55475b1""",,"""D""",18.0
40791,0.0,2320.8,,1.0,"""a55475b1""",0.0,"""2014-12-28""",,0.0,,,,,27830.0,"""COL""",0.0,,"""a55475b1""",0.0,,,"""P97_36_170""","""2013-04-15""","""SINGLE""","""2015-01-28""","""POS""",false,,16000.0,,1,0.0,12.0,"""P60_146_156""","""a55475b1""","""a55475b1""","""a55475b1""",,"""D""",12.0
40791,0.0,2541.2,,1.0,"""a55475b1""",0.0,"""2014-12-28""",,0.0,,,,,58239.8,"""COL""",0.0,,"""a55475b1""",0.0,,,"""P97_36_170""","""2013-04-15""","""SINGLE""","""2015-01-28""","""POS""",false,,22000.0,,2,0.0,24.0,"""P177_117_192""","""a55475b1""","""a55475b1""","""a55475b1""",,"""D""",24.0
…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…
2703453,0.0,2827.2,"""2019-12-18""",,"""a55475b1""",,"""2019-12-18""",,0.0,,,,,40000.0,"""CAL""",34550.855,"""2019-12-23""","""P123_6_84""",0.0,,"""2020-10-09""","""a55475b1""",,,"""2020-01-18""","""CASH""",false,,50000.0,0.0,1,46806.6,30.0,"""P46_145_78""","""a55475b1""","""a55475b1""","""a55475b1""",,"""A""",30.0
2703453,0.0,3197.6,"""2014-08-15""",33059.0,"""a55475b1""",0.0,"""2014-08-15""",179.424,0.0,398.02402,198.024,"""AC""",14.0,60000.0,"""CAL""",0.0,"""2014-08-15""","""P123_6_84""",0.0,"""2018-08-07""","""2018-08-07""","""P97_36_170""",,"""MARRIED""","""2014-09-15""","""CASH""",false,,28000.0,33.0,5,0.0,48.0,"""P177_117_192""","""a55475b1""","""a55475b1""","""a55475b1""",,"""K""",48.0
2703453,0.0,5981.4,"""2018-11-14""",,"""a55475b1""",,"""2018-11-14""",,0.0,,,,,123800.0,"""CAL""",0.0,"""2018-11-15""","""P123_6_84""",0.0,"""2019-12-17""","""2019-12-17""","""a55475b1""",,,"""2018-12-15""","""CASH""",false,,76000.0,0.0,2,0.0,30.0,"""P177_117_192""","""a55475b1""","""a55475b1""","""a55475b1""",,"""K""",30.0
2703454,0.0,2986.8,"""2020-06-21""",,"""a55475b1""",,"""2020-06-21""",,0.0,,,,,15998.0,"""COL""",5631.406,"""2020-06-23""","""P48_127_19""",0.0,,"""2020-10-19""","""a55475b1""",,,"""2020-07-21""","""POS""",false,,24000.0,0.0,0,5919.2,6.0,"""P177_117_192""","""a55475b1""","""a55475b1""","""a55475b1""",,"""A""",6.0


In [28]:
train_data_applprev_3

| case_id | cacccardblochreas_147M | conts_type_509L | credacc_cards_status_52L | num_group1 | num_group2 |
|---|---|---|---|---|---|
| i64 | str | str | str | i64 | i64 |
| 2 | | "PRIMARY_MOBILE…" | | 0 | 0 |
| 2 | | "EMPLOYMENT_PHO…" | | 0 | 1 |
| 2 | | "PRIMARY_MOBILE…" | | 1 | 0 |
| 2 | | "EMPLOYMENT_PHO…" | | 1 | 1 |
| 3 | | "PHONE" | | 0 | 0 |
| … | … | … | … | … | … |
| 2703454 | "a55475b1" | | | 0 | 1 |
| 2703454 | "a55475b1" | "PRIMARY_MOBILE…" | | 1 | 0 |
| 2703454 | "a55475b1" | "HOME_PHONE" | | 1 | 1 |
| 2703454 | "a55475b1" | | | 1 | 2 |
