# Lion's Den ING Risk Modelling Challenge 2024
## Preliminary Task

> Imagine that you are a credit risk model developer working for a large bank. Recently, it was noticed that the predictive power of the current tool used to estimate the probability of default for customers applying for an Instalment Loan product has significantly decreased. Your direct manager has asked you to create a model to replace the current one. Your work will be vital to the company, since the retail term loan portfolio is one of the largest in the institution. Your colleague, Barbara, has already prepared the data for you.
(...)

> In this task, your aim is to build* a **logistic regression** and a challenger model that will allow you to precisely quantify the probability of default of the bank's clients, namely retail customers applying for term loans, and to present the modelling and prediction results.
### Tasks:
- build a logistic regression in Python or Julia
- build a challenger model that will quantify the __probability__ of default of the bank's clients
- present modelling and prediction results
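For reference, the logistic regression named in the tasks models the probability of default as a logistic function of a linear score. In the notation below, which is an assumption of this sketch rather than part of the brief, $x_1, \dots, x_k$ are the (possibly transformed) explanatory variables and $\beta_0, \dots, \beta_k$ are coefficients estimated by maximum likelihood:

$$
\mathrm{PD}(x) = P(\text{target} = 1 \mid x) = \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_j\right)\right)},
\qquad
\log \frac{\mathrm{PD}(x)}{1 - \mathrm{PD}(x)} = \beta_0 + \sum_{j=1}^{k} \beta_j x_j
$$

The second form shows that each coefficient is an additive effect on the log-odds of default, which is what makes the model easy to read as a scorecard.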

# Summary of collected data
Available datasets:
- development_sample.csv - training data (50000 records, 36 features), targets included. Used for model training.
- testing_sample.csv - test data (5000 records, 36 features), targets also included. Used only for validation.
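A quick sanity check after loading can confirm that each file matches the record counts quoted above and that both samples share the same columns. A minimal sketch, assuming the file names above; `check_sample` and `column_mismatch` are hypothetical helpers, not part of the challenge materials:

```python
import pandas as pd

# Record counts quoted in the dataset summary above.
EXPECTED_ROWS = {"development_sample.csv": 50_000, "testing_sample.csv": 5_000}


def check_sample(df: pd.DataFrame, name: str) -> list:
    """Hypothetical helper: list the problems found in one sample (empty if none)."""
    problems = []
    expected = EXPECTED_ROWS.get(name)
    # Only files listed in EXPECTED_ROWS are checked for their row count
    if expected is not None and len(df) != expected:
        problems.append(f"{name}: expected {expected} rows, got {len(df)}")
    if "target" not in df.columns:
        problems.append(f"{name}: no 'target' column")
    return problems


def column_mismatch(train: pd.DataFrame, test: pd.DataFrame) -> set:
    """Hypothetical helper: columns present in one sample but not in the other."""
    return set(train.columns) ^ set(test.columns)
```

Once the frames are loaded, `check_sample(train_data, "development_sample.csv")` and `column_mismatch(train_data, test_data)` should both come back empty if the files match this description.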

## Variables:
- ID — Application ID (NUMERICAL)
- customer_id — Customer ID (NUMERICAL)
- application_date — Application date (DATETIME)
- **target** — Default indicator: 1 = loan went into default, 0 = facility performing; *missing for rejected applications* (BINARY)
- Application_status — Application status: Approved/Rejected (BINARY)
- Var1 — Number of applicants (NUMERICAL)
- Var2 — Loan purpose: (CATEGORICAL NOMINAL)
    - 1 - Car Loan
    - 2 - House Renovation
    - 3 - Short Cash
- Var3 — Distribution channel: (CATEGORICAL NOMINAL)
    - 1 - Direct
    - 2 - Broker
    - 3 - Online
- Var4 — Application amount (NUMERICAL)
- Var5 — Credit duration (months) (NUMERICAL)
- Var6 — Payment frequency: (CATEGORICAL ORDINAL)
    - 1 - monthly
    - 3 - quarterly
    - 6 - bi-yearly
- Var7 — Installment amount (NUMERICAL)
- Var8 — Value of the goods (car) (NUMERICAL)
- Var9 — Application data: income of main applicant (NUMERICAL)
- Var10 — Application data: income of second applicant (NUMERICAL)
- Var11 — Application data: profession of main applicant (CATEGORICAL NOMINAL):
    - 1 - Pensioner
    - 2 - Government
    - 3 - Military
    - 4 - Self Employed
    - 5 - Employee
    - 6 - Business Owner
    - 7 - Unemployed
- Var12 — Application data: profession of second applicant (CATEGORICAL NOMINAL):
    - 1 - Pensioner
    - 2 - Government
    - 3 - Military
    - 4 - Self Employed
    - 5 - Employee
    - 6 - Business Owner
    - 7 - Unemployed
- Var13 — Application data: employment date (main applicant) (DATE)
- Var14 — Application data: marital status of main applicant (CATEGORICAL NOMINAL):
    - 0 - Single
    - 1 - Married
    - 2 - Informal relationship
    - 3 - Divorced
    - 4 - Widowed
- Var15 — Application data: number of children of main applicant (NUMERICAL)
- Var16 — Application data: number of dependants of main applicant (NUMERICAL)
- Var17 — Spending estimate (NUMERICAL)
- Var18 — Property ownership for property renovation (BINARY)
- Var19 — Classification of the vehicle (Car, Motorbike)
- Var20 — Number of requests during the last 3 months (External data) (NUMERICAL)
- Var21 — Number of requests during the last 6 months (External data) (NUMERICAL)
- Var22 — Number of requests during the last 9 months (External data) (NUMERICAL)
- Var23 — Number of requests during the last 12 months (External data) (NUMERICAL)
- Var24 — Limit on credit card (NUMERICAL)
- Var25 — Amount on current account (NUMERICAL)
- Var26 — Amount on savings account (NUMERICAL)
- Var27 — Arrears in the last 3 months (indicator)
- Var28 — Arrears in the last 12 months (indicator)
- Var29 — Credit bureau score (External data) (NUMERICAL)
- Var30 — Average income (External data) (NUMERICAL)
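The nominal variables are stored as numeric codes, so feeding them to a model as plain numbers would impose an order that does not exist. A minimal sketch of decoding them into labelled pandas categoricals, with the dictionaries transcribed from the list above; `decode_categoricals` and the dictionary names are hypothetical. Var6 is deliberately left numeric, on the assumption that its codes (1, 3, 6) are the number of months between payments and therefore carry a genuine order.

```python
import pandas as pd

# Code-to-label dictionaries transcribed from the variable list above.
VAR2_PURPOSE = {1: "Car Loan", 2: "House Renovation", 3: "Short Cash"}
VAR3_CHANNEL = {1: "Direct", 2: "Broker", 3: "Online"}
PROFESSION = {1: "Pensioner", 2: "Government", 3: "Military", 4: "Self Employed",
              5: "Employee", 6: "Business Owner", 7: "Unemployed"}
VAR14_MARITAL = {0: "Single", 1: "Married", 2: "Informal relationship",
                 3: "Divorced", 4: "Widowed"}

# Var11 and Var12 share the same profession codes.
CODE_MAPS = {"Var2": VAR2_PURPOSE, "Var3": VAR3_CHANNEL, "Var11": PROFESSION,
             "Var12": PROFESSION, "Var14": VAR14_MARITAL}


def decode_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy with coded columns replaced by labels."""
    out = df.copy()
    for col, mapping in CODE_MAPS.items():
        if col not in out.columns:
            continue
        # int(v) handles codes read as floats (e.g. 2.0); missing values and
        # codes absent from the mapping both become None, i.e. NaN in the result
        labels = out[col].map(lambda v: mapping.get(int(v)) if pd.notna(v) else None)
        out[col] = labels.astype("category")
    return out
```

Keeping the original frame untouched (`df.copy()`) makes it easy to compare decoded and raw columns, and category dtype lets most modelling libraries one-hot encode these columns directly.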


## Importing the datasets:

```python
import pandas as pd

# Development (training) and testing samples published for the challenge
train_data = pd.read_csv('https://files.challengerocket.com/files/lions-den-ing-2024/development_sample.csv')
test_data = pd.read_csv('https://files.challengerocket.com/files/lions-den-ing-2024/testing_sample.csv')
```


Displaying the first three rows of the training set, transposed so that each column appears as a row:

```python
train_data.head(3).T
```

| Column | 0 | 1 | 2 |
|---|---|---|---|
| ID | 11034977 | 11034978 | 11034979 |
| customer_id | 32537148 | 32761663 | 32701063 |
| application_date | 01Feb2010 0:00:00 | 01Feb2010 0:00:00 | 01Feb2010 0:00:00 |
| target | 0.0 | 0.0 | 0.0 |
| Application_status | Approved | Approved | Approved |
| Var1 | 1 | 1 | 2 |
| Var2 | 2.0 | 1.0 | 3.0 |
| Var3 | 1 | 2 | 1 |
| Var4 | 7800 | 11100 | 2400 |
| Var5 | 99 | 78 | 15 |
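`application_date` is loaded as text in a compact day-month-year layout (e.g. `01Feb2010 0:00:00`). A minimal sketch for converting it to proper timestamps, assuming every value follows the layout of the rows shown above and that month abbreviations are English (`%b` is locale-dependent); `parse_application_date` is a hypothetical helper.

```python
import pandas as pd

# Layout inferred from the rows shown above: day, month abbreviation, year,
# then a time whose hour is not zero-padded ("0:00:00").
APP_DATE_FORMAT = "%d%b%Y %H:%M:%S"


def parse_application_date(s: pd.Series) -> pd.Series:
    """Hypothetical helper: convert application_date strings to datetime64."""
    # errors="coerce" turns any value that does not fit the format into NaT
    return pd.to_datetime(s, format=APP_DATE_FORMAT, errors="coerce")
```

For example, `train_data["application_date"] = parse_application_date(train_data["application_date"])`. Counting `NaT` values afterwards shows whether the assumed format holds across the whole file.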


```python
test_data.head(3).T
```

| Column | 0 | 1 | 2 |
|---|---|---|---|
| ID | 36034977 | 36034978 | 36034979 |
| customer_id | 32653719 | 32832365 | 32544742 |
| application_date | 03Feb2010 0:00:00 | 04Feb2010 0:00:00 | 07Feb2010 0:00:00 |
| target | 0.0 | 0.0 | 0.0 |
| Application_status | Approved | Approved | Approved |
| Var1 | 1 | 2 | 1 |
| Var2 | 3.0 | 2.0 | 3.0 |
| Var3 | 1.0 | 1.0 | 1.0 |
| Var4 | 4800 | 6800 | 4600 |
| Var5 | 15 | 18 | 18 |
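In both samples `target` is printed as `0.0` rather than `0`, which is consistent with the column containing missing values: pandas stores an integer column that contains NaN as float64, and the variable list states that the target is missing for rejected applications. A minimal sketch of restricting a sample to approved applications with an observed outcome before fitting a PD model; `modelling_sample` and `default_rate` are hypothetical helpers, and the column names are taken from the output above.

```python
import pandas as pd


def modelling_sample(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: approved applications whose default flag is known."""
    approved = df["Application_status"].eq("Approved")
    observed = df["target"].notna()
    # With the NaNs filtered out, the 0/1 flag can go back to an integer dtype
    return df.loc[approved & observed].astype({"target": "int64"})


def default_rate(df: pd.DataFrame) -> float:
    """Share of defaults (target == 1) among the rows kept by modelling_sample."""
    return float(modelling_sample(df)["target"].mean())
```

Applying the same filter to `train_data` and `test_data` keeps the two samples comparable, and `default_rate` gives the observed default rate that a calibrated PD model should reproduce on average.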


```python
pd.__version__
```

'1.5.3'