# <center> African Credit Scoring Challenge</center>

<p align="center">
    <img src="https://www.digest.tz/wp-content/uploads/2024/11/dfeb749b-962d-4e6c-8fb0-33f54ceaffd5_880x660.webp" width="50%" height="50%">
</p>

<p style="text-align:justify">Financial institutions need to predict loan defaults to mitigate risk and optimise lending decisions. In Africa’s rapidly growing financial markets, with diverse customer demographics and dynamic economic conditions, accurately assessing default risk is more important than ever.

In this challenge, we want you to develop a robust, generalisable machine learning model to predict the likelihood of loan defaults for both existing customers and new applicants. Beyond accurate predictions, we encourage you to innovate by incorporating unique factors relevant to each financial market.

The objective of this challenge is to develop a machine learning model that accurately predicts the probability of loan default.

The top 10 winners, in addition to submitting their solution, will need to design and submit a credit scoring function, using their model's outputs and probabilities. This step involves binning model outputs into risk categories and proposing a scalable credit risk score.

By accurately predicting loan defaults, your work will enable the client organisation to create a credit scoring solution that evaluates risk more effectively, improves decision-making, reduces the financial losses associated with high-risk lending, and supports expansion into new financial markets.

The challenge provider is a private asset manager that operates in several financial markets across Africa.
</p>
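
The credit scoring step described above (binning model outputs into risk categories and deriving a score) could be sketched as follows. This is only an illustration: the 300 to 850 score range, the band edges, the band labels, and the function names `to_credit_score` and `to_risk_band` are all assumptions, not part of the challenge specification.

```python
import numpy as np
import pandas as pd


def to_credit_score(prob_default, min_score=300, max_score=850):
    """Linearly map default probabilities to a score (higher = safer).

    The 300-850 range is an assumed, illustrative scale.
    """
    p = np.clip(np.asarray(prob_default, dtype=float), 0.0, 1.0)
    return np.round(max_score - p * (max_score - min_score)).astype(int)


def to_risk_band(prob_default,
                 edges=(0.0, 0.05, 0.20, 0.50, 1.0),
                 labels=("Low", "Medium", "High", "Very high")):
    """Bin default probabilities into ordered risk categories.

    The edges and labels are hypothetical and would need tuning per market.
    """
    p = np.clip(np.asarray(prob_default, dtype=float), 0.0, 1.0)
    return pd.cut(p, bins=list(edges), labels=list(labels), include_lowest=True)


# Made-up probabilities standing in for a model's predict_proba output
probs = np.array([0.0, 0.1, 0.4, 1.0])
scores = to_credit_score(probs)
bands = to_risk_band(probs)
print(pd.DataFrame({"prob_default": probs, "score": scores, "band": bands}))
```

A linear mapping keeps the score easy to explain; in practice the band edges would more likely be chosen from observed default rates on a validation set.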

In [1]:
import os
import random
import sys
import time
import warnings

# Silence library warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn

from sklearn.model_selection import train_test_split, StratifiedKFold

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier
from catboost import CatBoostClassifier
from rgf.sklearn import RGFClassifier
from sklearn.ensemble import GradientBoostingClassifier, AdaBoostClassifier, ExtraTreesClassifier, BaggingClassifier, VotingClassifier, StackingClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, roc_auc_score, confusion_matrix
from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS
from sklearn.linear_model import RidgeClassifier

In [2]:
seed = 2022
np.random.seed(seed)
random.seed(seed)

In [3]:
# Load the train, test, and economic indicator datasets
path="../Data/"
train = pd.read_csv(path+'Train.csv')
test = pd.read_csv(path+'Test.csv')
economic_indicators = pd.read_csv(path+'economic_indicators.csv')
# Display the first few rows of the datasets and their shape
display("Train", train.head(), train.shape, "Test", test.head(), test.shape, "Economic Indicators", economic_indicators.head(), economic_indicators.shape)

'Train'

Unnamed: 0,ID,customer_id,country_id,tbl_loan_id,lender_id,loan_type,Total_Amount,Total_Amount_to_Repay,disbursement_date,due_date,duration,New_versus_Repeat,Amount_Funded_By_Lender,Lender_portion_Funded,Lender_portion_to_be_repaid,target
0,ID_266671248032267278,266671,Kenya,248032,267278,Type_1,8448.0,8448.0,2022-08-30,2022-09-06,7,Repeat Loan,120.85,0.014305,121.0,0
1,ID_248919228515267278,248919,Kenya,228515,267278,Type_1,25895.0,25979.0,2022-07-30,2022-08-06,7,Repeat Loan,7768.5,0.3,7794.0,0
2,ID_308486370501251804,308486,Kenya,370501,251804,Type_7,6900.0,7142.0,2024-09-06,2024-09-13,7,Repeat Loan,1380.0,0.2,1428.0,0
3,ID_266004285009267278,266004,Kenya,285009,267278,Type_1,8958.0,9233.0,2022-10-20,2022-10-27,7,Repeat Loan,2687.4,0.3,2770.0,0
4,ID_253803305312267278,253803,Kenya,305312,267278,Type_1,4564.0,4728.0,2022-11-28,2022-12-05,7,Repeat Loan,1369.2,0.3,1418.0,0


(68654, 16)

'Test'

Unnamed: 0,ID,customer_id,country_id,tbl_loan_id,lender_id,loan_type,Total_Amount,Total_Amount_to_Repay,disbursement_date,due_date,duration,New_versus_Repeat,Amount_Funded_By_Lender,Lender_portion_Funded,Lender_portion_to_be_repaid
0,ID_269404226088267278,269404,Kenya,226088,267278,Type_1,1919.0,1989.0,2022-07-27,2022-08-03,7,Repeat Loan,575.7,0.3,597.0
1,ID_255356300042267278,255356,Kenya,300042,267278,Type_1,2138.0,2153.0,2022-11-16,2022-11-23,7,Repeat Loan,0.0,0.0,0.0
2,ID_257026243764267278,257026,Kenya,243764,267278,Type_1,8254.0,8304.0,2022-08-24,2022-08-31,7,Repeat Loan,207.0,0.025079,208.0
3,ID_264617299409267278,264617,Kenya,299409,267278,Type_1,3379.0,3379.0,2022-11-15,2022-11-22,7,Repeat Loan,1013.7,0.3,1014.0
4,ID_247613296713267278,247613,Kenya,296713,267278,Type_1,120.0,120.0,2022-11-10,2022-11-17,7,Repeat Loan,36.0,0.3,36.0


(18594, 15)

'Economic Indicators'

Unnamed: 0,Country,Indicator,YR2001,YR2002,YR2003,YR2004,YR2005,YR2006,YR2007,YR2008,...,YR2014,YR2015,YR2016,YR2017,YR2018,YR2019,YR2020,YR2021,YR2022,YR2023
0,Ghana,"Inflation, consumer prices (annual %)",41.509496,9.360932,29.77298,18.042739,15.438992,11.679184,10.734267,16.49464,...,15.489616,17.14997,17.454635,12.371922,7.808765,7.14364,9.88729,9.971089,31.255895,38.106966
1,Cote d'Ivoire,"Inflation, consumer prices (annual %)",4.361529,3.077265,3.296807,1.457988,3.88583,2.467191,1.892006,6.308528,...,0.448682,1.2515,0.723178,0.685881,0.359409,-1.106863,2.425007,4.091952,5.276167,4.387117
2,Kenya,"Inflation, consumer prices (annual %)",5.738598,1.961308,9.815691,11.624036,10.312778,14.453734,9.75888,26.239817,...,6.878155,6.582154,6.29725,8.00565,4.689806,5.239638,5.405162,6.107936,7.659863,7.671396
3,Ghana,"Official exchange rate (LCU per US$, period av...",0.716305,0.792417,0.866764,0.899495,0.905209,0.915107,0.932619,1.052275,...,2.896575,3.714642,3.909817,4.350533,4.585325,5.217367,5.595708,5.8057,8.2724,11.020408
4,Cote d'Ivoire,"Official exchange rate (LCU per US$, period av...",732.397693,693.713226,579.897426,527.338032,527.258363,522.425625,478.633718,446.000041,...,493.75733,591.211698,592.605615,580.65675,555.446458,585.911013,575.586005,554.530675,623.759701,606.56975


(27, 25)
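
The economic indicators come in wide format (one `YR…` column per year), while loans carry a `country_id` and a `disbursement_date`. One possible way to attach the indicators to each loan is sketched below on toy data. The helper names `indicators_by_country_year` and `add_indicators`, the shortened indicator names, and the toy values are assumptions for illustration. Note that the indicator columns stop at `YR2023`, so loans disbursed in 2024 get missing values under this join.

```python
import pandas as pd


def indicators_by_country_year(econ):
    """Reshape wide YRxxxx columns into one row per (Country, year)."""
    year_cols = [c for c in econ.columns if c.startswith("YR")]
    long = econ.melt(id_vars=["Country", "Indicator"], value_vars=year_cols,
                     var_name="year", value_name="value")
    long["year"] = long["year"].str[2:].astype(int)  # "YR2022" -> 2022
    wide = long.pivot_table(index=["Country", "year"], columns="Indicator",
                            values="value")
    return wide.rename_axis(columns=None).reset_index()


def add_indicators(loans, econ):
    """Left-join indicators onto loans by country and disbursement year."""
    out = loans.copy()
    out["year"] = pd.to_datetime(out["disbursement_date"]).dt.year
    by_year = indicators_by_country_year(econ).rename(
        columns={"Country": "country_id"})
    return out.merge(by_year, on=["country_id", "year"], how="left")


# Toy frames mimicking the column layout shown above (values are made up)
econ_demo = pd.DataFrame({
    "Country": ["Kenya", "Kenya"],
    "Indicator": ["Inflation", "FX"],
    "YR2022": [7.6, 117.0],
    "YR2023": [7.7, 139.0],
})
loans_demo = pd.DataFrame({
    "country_id": ["Kenya", "Kenya"],
    "disbursement_date": ["2022-08-30", "2024-09-06"],
})
merged = add_indicators(loans_demo, econ_demo)
print(merged)
```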

## Disbursement years and loan durations

In [6]:
pd.to_datetime(train["disbursement_date"]).dt.year.value_counts()

disbursement_date
2022    64405
2024     2970
2023     1240
2021       39
Name: count, dtype: int64

In [15]:
(pd.to_datetime(train["due_date"]) - pd.to_datetime(train["disbursement_date"])).dt.days.value_counts()

7      64973
14      1567
30       958
90       249
60       201
       ...  
912        1
366        1
12         1
273        1
243        1
Name: count, Length: 64, dtype: int64
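
Since the data already has a `duration` column, a natural follow-up is to check whether it agrees with the day difference computed above. The sketch below does this on toy rows; the helper name `duration_mismatches` and the `computed_days` column are hypothetical, and whether the two actually agree in `Train.csv` is not verified here.

```python
import pandas as pd


def duration_mismatches(df):
    """Return rows where `duration` differs from due_date - disbursement_date."""
    computed = (pd.to_datetime(df["due_date"])
                - pd.to_datetime(df["disbursement_date"])).dt.days
    return df.loc[computed != df["duration"]].assign(computed_days=computed)


# Toy rows: the second one claims 7 days but spans 14
demo = pd.DataFrame({
    "disbursement_date": ["2022-08-30", "2022-07-30"],
    "due_date": ["2022-09-06", "2022-08-13"],
    "duration": [7, 7],
})
bad = duration_mismatches(demo)
print(bad)
```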