# **Implementation of a scoring model**
## **Notebook 6/6 - Assembling TEST data for the dashboard**

This notebook is organized as follows:

**0. Set up**
- 0.1 Loading libraries and useful functions
- 0.2 Loading the dataset
- 0.3 Separation of the dataset
- 0.4 Model loading
    
**1. Model exploitation: Predictions on new data**
- 1.1 Transformation of data for consumption by the model
- 1.2 Predictions by model
- 1.3 Export of data and predictions

___
### 0. SETUP

In this first step, the working environment is set up:
- The necessary Python libraries and packages are loaded
- Useful functions are defined
- The dataset is loaded
- The dataset is split into a browse part and a new part
- The trained model is loaded
___

___
#### 0.1 LOADING LIBRARIES AND USEFUL FUNCTIONS

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import numpy as np
import pandas as pd
from joblib import load

In [3]:
from sys import path
path.append("./Resources/functions")

import helper_functions as hf

____
#### 0.2 LOADING THE DATASET

To demonstrate the entire model exploitation chain, this part uses the TEST dataset, application_test.csv, together with its related tables (bureau, bureau balance, credit card balance, installments payments, POS cash balance and previous applications).

Since these data were not used at any stage of model development, they show how the model is used on genuinely new data.
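The claim that the TEST data are unseen can be checked cheaply before scoring. The helper below is a hypothetical sketch, not part of `helper_functions`: it assumes a training table with the same `SK_ID_CURR` column is available, which this notebook does not load. It verifies that the test table has no `TARGET` label (the Home Credit TEST file ships without it), that application IDs are unique, and that no ID is shared with the training data.

```python
import pandas as pd


def assert_unseen(train: pd.DataFrame, test: pd.DataFrame, key: str = "SK_ID_CURR") -> None:
    """Raise ValueError if `test` does not look like data unseen during training.

    Hypothetical helper: `train` is assumed to hold the training applications
    with the same ID column; this notebook itself never loads them.
    """
    # The Home Credit TEST file is distributed without the TARGET label.
    if "TARGET" in test.columns:
        raise ValueError("test data unexpectedly contains a TARGET column")
    # Each application should appear only once in the main table.
    if test[key].duplicated().any():
        raise ValueError(f"duplicate {key} values in test data")
    # No application ID should be shared with the training data.
    overlap = set(train[key]).intersection(test[key])
    if overlap:
        raise ValueError(f"{len(overlap)} {key} values also appear in training data")
```

A possible call, assuming (not confirmed by this notebook) that application_train.csv sits next to the test file: `assert_unseen(pd.read_csv("./Resources/datasets/origin/application_train.csv", usecols=["SK_ID_CURR"]), app_test)`.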

In [4]:
app_test = pd.read_csv("./Resources/datasets/origin/application_test.csv")

bureau_balance = pd.read_csv("./Resources/datasets/origin/bureau_balance.csv")

bureau = pd.read_csv("./Resources/datasets/origin/bureau.csv")

card = pd.read_csv("./Resources/datasets/origin/credit_card_balance.csv")

installments = pd.read_csv("./Resources/datasets/origin/installments_payments.csv")

cash = pd.read_csv("./Resources/datasets/origin/POS_CASH_balance.csv")

prev_app = pd.read_csv("./Resources/datasets/origin/previous_application.csv")

___

#### 0.3 DATASET SEPARATION

The dataset is split into two halves:
- data whose credit scores are computed in advance in this notebook and loaded when the dashboard starts, so that they can be browsed (suffix `_browse`);
- data whose credit scores will be computed live by the dashboard (suffix `_new`).
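Both cells that follow apply the same filtering logic: the related tables keyed by `SK_ID_CURR` are filtered on the application IDs of each half, while `bureau_balance`, which has no `SK_ID_CURR` column, is filtered through the `SK_ID_BUREAU` values kept in `bureau`. As a hedged sketch, that logic could be factored into one function; `split_related_tables` and its `others` dictionary are hypothetical names, not part of the project's helpers.

```python
import pandas as pd


def split_related_tables(app, bureau, bureau_balance, others, key="SK_ID_CURR"):
    """Split `app` into a first half ("browse") and a second half ("new") by row
    position, then keep, for each half, only the related rows of every table.

    `others` maps a name to a table keyed by SK_ID_CURR (e.g. card, installments,
    cash, prev_app). bureau_balance has no SK_ID_CURR column, so it is filtered
    through the SK_ID_BUREAU values kept in bureau.
    """
    half = len(app) // 2
    parts = {}
    for label, app_part in (("browse", app.iloc[:half].copy()),
                            ("new", app.iloc[half:].copy())):
        ids = app_part[key]
        bureau_part = bureau[bureau[key].isin(ids)]
        tables = {
            "application": app_part,
            "bureau": bureau_part,
            "bureau_balance": bureau_balance[
                bureau_balance["SK_ID_BUREAU"].isin(bureau_part["SK_ID_BUREAU"])
            ],
        }
        for name, df in others.items():
            tables[name] = df[df[key].isin(ids)]
        parts[label] = tables
    return parts
```

With the notebook's variables, `split_related_tables(app_test, bureau, bureau_balance, {"card": card, "installments": installments, "cash": cash, "prev_app": prev_app})["browse"]["card"]` would correspond to `card_browse`.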

In [5]:
# First half of the applications: scores are computed below and stored for browsing
app_test_browse = app_test.iloc[:len(app_test) // 2].copy()

bureau_browse = bureau[bureau["SK_ID_CURR"].isin(app_test_browse["SK_ID_CURR"])]
bureau_balance_browse = bureau_balance[bureau_balance["SK_ID_BUREAU"].isin(bureau_browse["SK_ID_BUREAU"])]
card_browse = card[card["SK_ID_CURR"].isin(app_test_browse["SK_ID_CURR"])]
installments_browse = installments[installments["SK_ID_CURR"].isin(app_test_browse["SK_ID_CURR"])]
cash_browse = cash[cash["SK_ID_CURR"].isin(app_test_browse["SK_ID_CURR"])]
prev_app_browse = prev_app[prev_app["SK_ID_CURR"].isin(app_test_browse["SK_ID_CURR"])]

In [6]:
# Second half of the applications: scores will be computed live by the dashboard
app_test_new = app_test.iloc[len(app_test) // 2:].copy()

bureau_new = bureau[bureau["SK_ID_CURR"].isin(app_test_new["SK_ID_CURR"])]
bureau_balance_new = bureau_balance[bureau_balance["SK_ID_BUREAU"].isin(bureau_new["SK_ID_BUREAU"])]
card_new = card[card["SK_ID_CURR"].isin(app_test_new["SK_ID_CURR"])]
installments_new = installments[installments["SK_ID_CURR"].isin(app_test_new["SK_ID_CURR"])]
cash_new = cash[cash["SK_ID_CURR"].isin(app_test_new["SK_ID_CURR"])]
prev_app_new = prev_app[prev_app["SK_ID_CURR"].isin(app_test_new["SK_ID_CURR"])]

___
#### 0.4 LOADING THE MODEL

In [7]:
model = load('../lgbm_trained_model_whole_dataset.joblib')

___
### 1. EXPLOITATION OF THE MODEL: PREDICTIONS ON NEW DATA

___
#### 1.1 TRANSFORMATION OF DATA FOR CONSUMPTION BY THE MODEL

In [8]:
model_ready_df = hf.transform_data(app_test_browse, 
                                   bureau_browse,                                   
                                   bureau_balance_browse,
                                   card_browse, 
                                   cash_browse, 
                                   installments_browse,
                                   prev_app_browse)

model_ready_df.shape

Original Memory Usage: 0.02 gb.
New Memory Usage: 0.01 gb.
Original Memory Usage: 0.15 gb.
New Memory Usage: 0.08 gb.
There are 0 columns with greater than 90% missing values.
Original Memory Usage: 0.04 gb.
New Memory Usage: 0.01 gb.
There are 6 columns with greater than 90% missing values.
Original Memory Usage: 0.05 gb.
New Memory Usage: 0.03 gb.
There are 0 columns with greater than 90% missing values.
Original Memory Usage: 0.06 gb.
New Memory Usage: 0.03 gb.
There are 0 columns with greater than 90% missing values.
Original Memory Usage: 0.07 gb.
New Memory Usage: 0.04 gb.
There are 0 columns with greater than 90% missing values.


(24372, 141)
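The log above is printed by `hf.transform_data`, whose implementation is not shown in this notebook. The two helpers below are hypothetical reconstructions of the kind of steps that could produce such lines: downcasting numeric columns to smaller dtypes to cut memory, and counting columns whose share of missing values exceeds 90%. They are a sketch under those assumptions, not the project's actual code.

```python
import numpy as np
import pandas as pd


def reduce_memory(df: pd.DataFrame) -> pd.DataFrame:
    """Downcast numeric columns to the smallest dtype that holds their values."""
    before = df.memory_usage(deep=True).sum() / 1e9
    out = df.copy()
    # Integers go to the smallest signed type (e.g. int8), floats to float32.
    for col in out.select_dtypes(include=[np.integer]).columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    for col in out.select_dtypes(include=[np.floating]).columns:
        out[col] = pd.to_numeric(out[col], downcast="float")
    after = out.memory_usage(deep=True).sum() / 1e9
    print(f"Original Memory Usage: {before:.2f} gb.")
    print(f"New Memory Usage: {after:.2f} gb.")
    return out


def high_missing_columns(df: pd.DataFrame, threshold: float = 0.9) -> list:
    """Return the columns whose share of missing values exceeds `threshold`."""
    missing_share = df.isna().mean()
    cols = missing_share[missing_share > threshold].index.tolist()
    print(f"There are {len(cols)} columns with greater than {threshold:.0%} missing values.")
    return cols
```

Downcasting floats to float32 trades some precision for memory; for tree-based models such as LightGBM this is usually harmless, but it is a design choice to confirm against the training pipeline.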

___
#### 1.2 PREDICTIONS BY THE MODEL

In [9]:
# Predicted probability of the positive class (TARGET = 1, second column of predict_proba) for each application
credit_score_predictions = model.predict_proba(model_ready_df)[:, 1]

In [10]:
# Probabilities expressed in percent, stored under the "Credit Score" column
predictions_df = pd.DataFrame({"Credit Score": credit_score_predictions * 100})

In [11]:
predictions_df = pd.concat([app_test_browse["SK_ID_CURR"], predictions_df], axis=1)
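`pd.concat(..., axis=1)` aligns rows on index labels, not on positions. It works here because `app_test_browse` is the first half of `app_test` and keeps the labels 0 to 24371, the same labels `predictions_df` gets from its default `RangeIndex`. Applied to the `_new` half (labels starting at 24372), the same call would yield misaligned rows padded with NaN. The hypothetical helper below builds the table from positional arrays instead; it assumes, as the notebook implicitly does, that `hf.transform_data` keeps the input row order.

```python
import numpy as np
import pandas as pd


def scores_frame(ids: pd.Series, proba: np.ndarray) -> pd.DataFrame:
    """Pair each application ID with its predicted probability, in percent.

    Uses positions rather than index labels, so `ids` may carry any index.
    """
    if len(ids) != len(proba):
        raise ValueError("one prediction per application is expected")
    return pd.DataFrame({
        "SK_ID_CURR": ids.to_numpy(),
        "Credit Score": np.asarray(proba) * 100,
    })
```

For example, `scores_frame(app_test_browse["SK_ID_CURR"], credit_score_predictions)` would give the same two columns as the cell above, whatever the index of the application table.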

___
#### 1.3 EXPORT OF DATA AND PREDICTIONS

In [12]:
# Exporting predictions
predictions_df.to_csv("./Resources/datasets/assembled/dashboard/browse/predictions_test.csv")

In [13]:
# Replace the anomalous DAYS_EMPLOYED placeholder value 365243 with NaN before export
app_test_browse["DAYS_EMPLOYED"] = app_test_browse["DAYS_EMPLOYED"].replace({365243: np.nan})

# Exporting browse data
app_test_browse.to_csv("./Resources/datasets/assembled/dashboard/browse/application_test.csv")
bureau_browse.to_csv("./Resources/datasets/assembled/dashboard/browse/bureau.csv")
bureau_balance_browse.to_csv("./Resources/datasets/assembled/dashboard/browse/bureau_balance.csv")
card_browse.to_csv("./Resources/datasets/assembled/dashboard/browse/card.csv")
installments_browse.to_csv("./Resources/datasets/assembled/dashboard/browse/installments.csv")
cash_browse.to_csv("./Resources/datasets/assembled/dashboard/browse/cash.csv")
prev_app_browse.to_csv("./Resources/datasets/assembled/dashboard/browse/prev_app.csv")
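The notebook replaces the value 365243 in `DAYS_EMPLOYED` with NaN before exporting the browse table. The arithmetic shows why it is treated as an anomaly: 365243 days is about 999.98 years, whereas the column otherwise holds negative day counts relative to the application date. The sketch below wraps the replacement in a hypothetical `clean_days_employed` function that works on a copy and assigns the result back, rather than calling `replace(..., inplace=True)` on a column selection, which may not propagate to the DataFrame under pandas Copy-on-Write.

```python
import numpy as np
import pandas as pd

DAYS_EMPLOYED_PLACEHOLDER = 365243  # value treated as an anomaly in the notebook

# 365243 days is about 999.98 years, far beyond any real employment duration.
years = DAYS_EMPLOYED_PLACEHOLDER / 365.25


def clean_days_employed(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `df` where the placeholder in DAYS_EMPLOYED is replaced by NaN."""
    out = df.copy()
    # Assign the result back instead of replacing in place on a column selection.
    out["DAYS_EMPLOYED"] = out["DAYS_EMPLOYED"].replace({DAYS_EMPLOYED_PLACEHOLDER: np.nan})
    return out
```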

In [14]:
# Exporting new data
app_test_new.to_csv("./Resources/datasets/assembled/dashboard/new/application_test.csv")
bureau_new.to_csv("./Resources/datasets/assembled/dashboard/new/bureau.csv")
bureau_balance_new.to_csv("./Resources/datasets/assembled/dashboard/new/bureau_balance.csv")
card_new.to_csv("./Resources/datasets/assembled/dashboard/new/card.csv")
installments_new.to_csv("./Resources/datasets/assembled/dashboard/new/installments.csv")
cash_new.to_csv("./Resources/datasets/assembled/dashboard/new/cash.csv")
prev_app_new.to_csv("./Resources/datasets/assembled/dashboard/new/prev_app.csv")
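The two export cells repeat one `to_csv` call per table and assume the target folders already exist. A hedged alternative is a small loop over a name-to-DataFrame mapping that creates the folder first; `export_tables` is a hypothetical helper, and it keeps the DataFrame index in the files, as the notebook's `to_csv` calls do.

```python
from pathlib import Path

import pandas as pd


def export_tables(tables: dict, out_dir) -> list:
    """Write each DataFrame of `tables` to `<out_dir>/<name>.csv`, creating the folder if needed."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, df in tables.items():
        path = out_dir / f"{name}.csv"
        df.to_csv(path)  # index kept, as in the notebook's to_csv calls
        written.append(path)
    return written
```

For instance, `export_tables({"application_test": app_test_new, "bureau": bureau_new, "bureau_balance": bureau_balance_new, "card": card_new, "installments": installments_new, "cash": cash_new, "prev_app": prev_app_new}, "./Resources/datasets/assembled/dashboard/new")` reproduces the file names of the last cell. Because the index is written, reading the files back with `pd.read_csv(path, index_col=0)` restores it.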