# <span style='color:DarkBlue'>P7 - Implement a scoring model</span>

<div style="text-align:center">
    <img src="images/logo_proj7_credit.png" width="50%">
</div>

# <span class='bg-primary'>P7_06 - DASHBOARD</span>

This notebook prepares the fields of the test set for display in the dashboard.

In particular:
- **test set**: retrieve the fields to display in the dashboard in a human-readable form (e.g. age in years rather than in days, sex as Féminin/Masculin rather than 0/1).
    - position the client among defaulting and non-defaulting clients on the variables age, sex, socio-professional category, income, credit amount, credit duration...

## <span style='background: PowderBlue'>1. Introduction</span>

*****
**Mission**
*****
**Develop a scoring model for the probability that a client defaults on payment**, to support the decision to grant or refuse a loan to a prospective client, drawing on varied data sources (behavioural data, data from other financial institutions...).

*****
**Objectives**
*****
- analyse the dataset,
- build **a scoring model** that automatically predicts the probability that a client defaults,
- build an **interactive dashboard** that lets account managers explain the decision to grant or refuse credit transparently.


In [1]:
# Load libraries
import sys
import time
import pickle
import warnings
from collections import Counter
from datetime import datetime
from warnings import simplefilter

import numpy as np
import pandas as pd
from joblib import dump, load

# Personal libraries
import fonctions_data

# Visualisation
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
import seaborn as sns
import plotly.graph_objects as go

# Interpretation
import shap

# Warnings
simplefilter(action='ignore', category=FutureWarning)

%matplotlib inline
# Hot reload of personal libraries
%load_ext autoreload
%autoreload 2
# PEP 8 code checking
%load_ext pycodestyle_magic
# %pycodestyle_on
# %pycodestyle_off

# Versions
print('Version des librairies utilisées :')
# print('jyquickhelper         : ' + jyquickhelper.__version__)
print('Python                : ' + sys.version)
print('NumPy                 : ' + np.version.full_version)
print('Pandas                : ' + pd.__version__)
print('Matplotlib            : ' + matplotlib.__version__)
print('Seaborn               : ' + sns.__version__)
# print('Scikit-learn          : ' + sklearn.__version__)
now = datetime.now().isoformat()
print('Lancé le           : ' + now)

Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console)


Version des librairies utilisées :
Python                : 3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)]
NumPy                 : 1.24.4
Pandas                : 1.5.3
Matplotlib            : 3.7.5
Seaborn               : 0.13.2
Lancé le           : 2024-06-02T18:22:07.410458


## <span style='background:PowderBlue'>2. Initialisation (test/best model)</span>

#### <span style='background:mistyrose'>**Loading the train set without pre-processing**</span>

In [2]:
application_train = pd.read_csv('home_credit_default_risk/application_train.csv')
application_train.shape

(307511, 122)

In [3]:
# Fix outliers: replace the aberrant value 365243 with 0
application_train.loc[application_train['DAYS_EMPLOYED'] == 365243,
                      'DAYS_EMPLOYED'] = 0


In [4]:
# Fix: imputing the sex with the mode of this category is hard to justify.
# Since only 4 clients have an unspecified sex, we drop these rows.
application_train = \
    application_train[application_train['CODE_GENDER'] != 'XNA']
application_train.shape

(307507, 122)
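
The two cleaning steps above can be tried on a toy frame. This is a minimal sketch with made-up rows (the values and the name `toy` are assumptions for illustration, not data from the notebook). It uses a single `.loc` assignment for the replacement, which avoids the chained-indexing pattern that triggers `SettingWithCopyWarning`.

```python
import pandas as pd

# Toy data (illustrative values only, not taken from the real dataset)
toy = pd.DataFrame({
    'SK_ID_CURR': [1, 2, 3, 4],
    'DAYS_EMPLOYED': [-1200, 365243, -300, 365243],
    'CODE_GENDER': ['F', 'M', 'XNA', 'F'],
})

# Replace the aberrant value in place with a single .loc assignment
toy.loc[toy['DAYS_EMPLOYED'] == 365243, 'DAYS_EMPLOYED'] = 0

# Drop the rows whose sex is not specified
toy = toy[toy['CODE_GENDER'] != 'XNA']

print(toy)
```

The boolean mask selects the rows and the column label selects the target, so pandas writes directly into the original frame.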

#### <span style='background:mistyrose'>**Loading the test set without pre-processing**</span>

In [5]:
application_test = pd.read_csv('home_credit_default_risk/application_test.csv')
application_test.shape

(48744, 121)

In [6]:
# Fix outliers: replace the aberrant value 365243 with 0
application_test.loc[application_test['DAYS_EMPLOYED'] == 365243,
                     'DAYS_EMPLOYED'] = 0


#### <span style='background:mistyrose'>**Loading the train set produced by pre-processing and feature selection**</span>

- **train_set**: labelled train set (target variable = TARGET), produced by the pre-processing phase (cleaning, handling of missing values, handling of outliers, encoding, merge of all files, feature engineering and feature selection).

In [7]:
fic_sav_train_set_sans = \
    'pickle_files/train_set.pickle'
# Load train_set
with open(fic_sav_train_set_sans, 'rb') as df_appli_train_set_sans:
    train_set = pickle.load(df_appli_train_set_sans)
train_set.shape

(307507, 218)

#### <span style='background:mistyrose'>**Loading the test set produced by pre-processing and feature selection**</span>

- **test_set**: unlabelled test set (without the target variable TARGET), produced by the pre-processing phase (cleaning, handling of missing values, handling of outliers, encoding, merge of all files, feature engineering and feature selection).

In [8]:
fic_sav_test_set_sans = \
    'pickle_files/test_set.pickle'
# Load test_set
with open(fic_sav_test_set_sans, 'rb') as df_appli_test_set_sans:
    test_set = pickle.load(df_appli_test_set_sans)
test_set.shape

(48744, 217)
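
The two shapes above differ by one column (218 versus 217), which is consistent with `TARGET` being present only in `train_set`. A quick way to confirm that nothing else differs is to compare the column sets. The sketch below does this on toy frames; the names `toy_train` and `toy_test` and their values are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for train_set and test_set (illustrative values only)
toy_train = pd.DataFrame({'SK_ID_CURR': [1, 2],
                          'EXT_SOURCE_2': [0.5, 0.7],
                          'TARGET': [0, 1]})
toy_test = pd.DataFrame({'SK_ID_CURR': [3],
                         'EXT_SOURCE_2': [0.9]})

# Columns present on one side only
only_in_train = set(toy_train.columns) - set(toy_test.columns)
only_in_test = set(toy_test.columns) - set(toy_train.columns)

print(only_in_train, only_in_test)
```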

#### <span style='background:mistyrose'>**Loading the best LightGBM model**</span>

In [9]:
# Load the best model
fic_best_model = 'pickle_files/best_model.pickle'
with open(fic_best_model, 'rb') as df_best_model:
    best_model = pickle.load(df_best_model)
best_model

In [10]:
# Write the model back to the same pickle file
filename = 'pickle_files/best_model.pickle'
with open(filename, 'wb') as outfile:
    pickle.dump(best_model, outfile)

## <span style='background:PowderBlue'>3. Scoring (predictions/probabilities)</span>

In [64]:
# Copy of the test set
test = test_set.copy(deep=True)
# Set aside the client identifiers, which are not used for the predictions
id_client = test['SK_ID_CURR']
# Build the feature matrix passed to the model
X_test = test.drop('SK_ID_CURR', axis=1)

In [65]:
# Predicted probability of default (class 1)
y_proba = best_model.predict_proba(X_test)[:, 1]

In [66]:
# Predicted class for each client:
# class 0: non-defaulting, class 1: defaulting
y_pred = best_model.predict(X_test)

In [67]:
# Build the dataframe used for the merge
df_score = pd.DataFrame({'SK_ID_CURR' : id_client,
                         'PRED_CLASSE_CLIENT' : y_pred,
                         'SCORE_CLIENT' : y_proba,
                         'SCORE_CLIENT_%' : np.round(y_proba * 100, 1)})
df_score.head(3)

| | SK_ID_CURR | PRED_CLASSE_CLIENT | SCORE_CLIENT | SCORE_CLIENT_% |
| --- | --- | --- | --- | --- |
| 0 | 100001 | 0 | 0.403917 | 40.4 |
| 1 | 100005 | 0 | 0.476231 | 47.6 |
| 2 | 100013 | 0 | 0.298046 | 29.8 |
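
The relation between `SCORE_CLIENT`, `SCORE_CLIENT_%` and the predicted class can be reproduced by hand for the three rows shown above. The sketch assumes that `best_model.predict` applies a 0.5 cut-off on the class-1 probability. That is the usual default for scikit-learn-style binary classifiers, but it is an assumption about `best_model`, and the variable `threshold` is introduced here only for illustration.

```python
import numpy as np

# Class-1 probabilities copied from the three rows displayed above
y_proba = np.array([0.403917, 0.476231, 0.298046])

threshold = 0.5  # assumed default cut-off, not confirmed by the notebook
y_pred = (y_proba >= threshold).astype(int)

# Same rounding as the SCORE_CLIENT_% column
score_pct = np.round(y_proba * 100, 1)

print(y_pred)
print(score_pct)
```

Under this cut-off a client scored at 47.6% is still assigned class 0. A stricter business threshold would change the class without changing the score.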


In [68]:
# Add the predictions to the test-set dataframe
df_dashboard = df_score.merge(test_set, on='SK_ID_CURR', how='left')
df_dashboard.head(3)

Unnamed: 0,SK_ID_CURR,PRED_CLASSE_CLIENT,SCORE_CLIENT,SCORE_CLIENT_%,CREDIT_TO_ANNUITY_RATIO,AMT_REQ_CREDIT_BUREAU_MON,AMT_REQ_CREDIT_BUREAU_QRT,INSTAL_AMT_PAYMENT_DIFF_MIN,POS_CNT_INSTALMENT_MIN,PREV_WEEKDAY_APPR_PROCESS_START_SUNDAY_MEAN,...,PREV_NAME_YIELD_GROUP_MEAN,CC_CNT_DRAWINGS_ATM_CURRENT_MEAN,POS_LOAN_COMPLETED_MEAN,PREV_AMT_DECLINED_SUM,INSTAL_AMT_INSTALMENT_MIN,INSTAL_AMT_PAYMENT_RATIO_MEAN,CC_CNT_DRAWINGS_CURRENT_VAR,PREV_LOAN_RATE_MEAN,PREV_DAYS_DECISION_MEAN,EXT_SOURCE_2
0,100001,0,0.403917,40.4,0.4023,0.0,0.0,0.0,4.0,0.0,...,4.0,0.2106,0.45,1048.5,3951.0,1.0,0.3333,0.1661,-1740.0,0.924
1,100005,0,0.476231,47.6,-0.0482,0.0,0.0,0.0,9.0,0.0,...,2.0,0.2106,0.0909,4464.0,4813.2,1.0,0.3333,0.1199,-536.0,0.341
2,100013,0,0.298046,29.8,2.422,0.0,0.003832,0.0,6.0,0.0,...,2.5,0.2556,0.2893,-61051.5,67.5,0.935484,1.321,0.1257,-837.5,0.8184


In [69]:
# Restore the raw (untransformed) age, employment duration, credit amount
# and annuity values
df_dashboard['DAYS_BIRTH'] = application_test['DAYS_BIRTH']
df_dashboard['DAYS_EMPLOYED'] = application_test['DAYS_EMPLOYED']
df_dashboard['AMT_CREDIT'] = application_test['AMT_CREDIT']
df_dashboard['AMT_ANNUITY'] = application_test['AMT_ANNUITY']

In [70]:
# Save df_dashboard
fic_sav_df_dashboard = 'pickle_files/df_dashboard_1.pickle'
with open(fic_sav_df_dashboard, 'wb') as f:
    pickle.dump(df_dashboard, f, pickle.HIGHEST_PROTOCOL)

In [71]:
# Load the df_dashboard_1 dataframe
fic_sav_df_dashboard_1 = 'pickle_files/df_dashboard_1.pickle'
with open(fic_sav_df_dashboard_1, 'rb') as f:
    df_dashboard = pickle.load(f)
df_dashboard.shape

(48744, 220)
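
The save-then-reload pattern used above recurs throughout this notebook. It could be factored into two small helpers. This is a minimal sketch using only the standard library; the names `save_pickle` and `load_pickle` are hypothetical and are not part of `fonctions_data`.

```python
import pickle
import tempfile
from pathlib import Path


def save_pickle(obj, path):
    """Serialise obj to path with the highest pickle protocol."""
    with open(path, 'wb') as f:
        pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    """Read back an object written by save_pickle."""
    with open(path, 'rb') as f:
        return pickle.load(f)


# Round trip in a temporary directory, so no project file is touched
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / 'demo.pickle'
    save_pickle({'SK_ID_CURR': [100001, 100005]}, target)
    restored = load_pickle(target)

print(restored)
```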

## <span style='background:PowderBlue'>4. Processing of informational data</span>

### <span style='background:orange'>4.1. Client information</span>

The informational client data to make available and readable for the account manager:

| Variable | Description | In test_set and untransformed? |
| --- | --- | --- |
| <p style='text-align: justify; color:blue'>**AMT_INCOME_TOTAL**</p> | <p style='text-align: justify;'>Total income</p> | No => to add |
| <p style='text-align: justify; color:green'>**AMT_CREDIT**</p> | <p style='text-align: justify;'>Credit amount</p> | Yes |
| <p style='text-align: justify; color:blue'>**NAME_FAMILY_STATUS**</p> | <p style='text-align: justify;'>Family status</p> | No => to add |
| <p style='text-align: justify; color:orange'>**DAYS_BIRTH**</p> | <p style='text-align: justify;'>Age</p> | Yes, but convert days to years |
| <p style='text-align: justify; color:orange'>**DAYS_EMPLOYED**</p> | <p style='text-align: justify;'>Length of employment</p> | Yes, but convert days to years |
| <p style='text-align: justify; color:green'>**AMT_ANNUITY**</p> | <p style='text-align: justify;'>Annuity</p> | Yes |
| <p style='text-align: justify; color:blue'>**AMT_GOODS_PRICE**</p> | <p style='text-align: justify;'>Price of the goods financed by the credit</p> | No => to add |
| <p style='text-align: justify; color:blue'>**CNT_CHILDREN**</p> | <p style='text-align: justify;'>Number of children</p> | No => to add |
| <p style='text-align: justify; color:blue'>**NAME_CONTRACT_TYPE**</p> | <p style='text-align: justify;'>Loan type</p> | No (encoded) => to add |
| <p style='text-align: justify; color:blue'>**NAME_EDUCATION_TYPE**</p> | <p style='text-align: justify;'>Education level</p> | No (encoded) => to add |
| <p style='text-align: justify; color:blue'>**NAME_HOUSING_TYPE**</p> | <p style='text-align: justify;'>Housing type</p> | No (encoded) => to add |
| <p style='text-align: justify; color:blue'>**NAME_INCOME_TYPE**</p> | <p style='text-align: justify;'>Income type</p> | No (encoded) => to add |
| <p style='text-align: justify; color:orange'>**CODE_GENDER**</p> | <p style='text-align: justify;'>Sex</p> | Yes, but map '0' to 'Féminin' and '1' to 'Masculin' |

In [72]:
df_infos = application_test[['SK_ID_CURR',
                            'AMT_INCOME_TOTAL', 'NAME_FAMILY_STATUS',
                            'AMT_GOODS_PRICE', 'CNT_CHILDREN',
                            'NAME_CONTRACT_TYPE', 'NAME_EDUCATION_TYPE',
                            'NAME_HOUSING_TYPE', 'NAME_INCOME_TYPE']]
df_infos.head(3)

| | SK_ID_CURR | AMT_INCOME_TOTAL | NAME_FAMILY_STATUS | AMT_GOODS_PRICE | CNT_CHILDREN | NAME_CONTRACT_TYPE | NAME_EDUCATION_TYPE | NAME_HOUSING_TYPE | NAME_INCOME_TYPE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 100001 | 135000.0 | Married | 450000.0 | 0 | Cash loans | Higher education | House / apartment | Working |
| 1 | 100005 | 99000.0 | Married | 180000.0 | 0 | Cash loans | Secondary / secondary special | House / apartment | Working |
| 2 | 100013 | 202500.0 | Married | 630000.0 | 0 | Cash loans | Higher education | House / apartment | Working |


In [73]:
# Add the missing variables to the dashboard dataframe
df_dashboard = df_infos.merge(df_dashboard, on='SK_ID_CURR', how='left')
df_dashboard.head(3)

Unnamed: 0,SK_ID_CURR,AMT_INCOME_TOTAL_x,NAME_FAMILY_STATUS,AMT_GOODS_PRICE,CNT_CHILDREN_x,NAME_CONTRACT_TYPE_x,NAME_EDUCATION_TYPE,NAME_HOUSING_TYPE,NAME_INCOME_TYPE,PRED_CLASSE_CLIENT,...,PREV_NAME_YIELD_GROUP_MEAN,CC_CNT_DRAWINGS_ATM_CURRENT_MEAN,POS_LOAN_COMPLETED_MEAN,PREV_AMT_DECLINED_SUM,INSTAL_AMT_INSTALMENT_MIN,INSTAL_AMT_PAYMENT_RATIO_MEAN,CC_CNT_DRAWINGS_CURRENT_VAR,PREV_LOAN_RATE_MEAN,PREV_DAYS_DECISION_MEAN,EXT_SOURCE_2
0,100001,135000.0,Married,450000.0,0,Cash loans,Higher education,House / apartment,Working,0,...,4.0,0.2106,0.45,1048.5,3951.0,1.0,0.3333,0.1661,-1740.0,0.924
1,100005,99000.0,Married,180000.0,0,Cash loans,Secondary / secondary special,House / apartment,Working,0,...,2.0,0.2106,0.0909,4464.0,4813.2,1.0,0.3333,0.1199,-536.0,0.341
2,100013,202500.0,Married,630000.0,0,Cash loans,Higher education,House / apartment,Working,0,...,2.5,0.2556,0.2893,-61051.5,67.5,0.935484,1.321,0.1257,-837.5,0.8184


In [74]:
# Strip the '_x' merge suffix from the end of the column names
df_dashboard.columns = \
    df_dashboard.columns.str.replace(r'_x$', '', regex=True)

In [75]:
df_dashboard.head(3)

Unnamed: 0,SK_ID_CURR,AMT_INCOME_TOTAL,NAME_FAMILY_STATUS,AMT_GOODS_PRICE,CNT_CHILDREN,NAME_CONTRACT_TYPE,NAME_EDUCATION_TYPE,NAME_HOUSING_TYPE,NAME_INCOME_TYPE,PRED_CLASSE_CLIENT,...,PREV_NAME_YIELD_GROUP_MEAN,CC_CNT_DRAWINGS_ATM_CURRENT_MEAN,POS_LOAN_COMPLETED_MEAN,PREV_AMT_DECLINED_SUM,INSTAL_AMT_INSTALMENT_MIN,INSTAL_AMT_PAYMENT_RATIO_MEAN,CC_CNT_DRAWINGS_CURRENT_VAR,PREV_LOAN_RATE_MEAN,PREV_DAYS_DECISION_MEAN,EXT_SOURCE_2
0,100001,135000.0,Married,450000.0,0,Cash loans,Higher education,House / apartment,Working,0,...,4.0,0.2106,0.45,1048.5,3951.0,1.0,0.3333,0.1661,-1740.0,0.924
1,100005,99000.0,Married,180000.0,0,Cash loans,Secondary / secondary special,House / apartment,Working,0,...,2.0,0.2106,0.0909,4464.0,4813.2,1.0,0.3333,0.1199,-536.0,0.341
2,100013,202500.0,Married,630000.0,0,Cash loans,Higher education,House / apartment,Working,0,...,2.5,0.2556,0.2893,-61051.5,67.5,0.935484,1.321,0.1257,-837.5,0.8184
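
The `_x` suffixes appear because some columns exist in both frames being merged; pandas then also creates matching `_y` columns for the right-hand copies. The sketch below reproduces the effect on toy frames with made-up values and shows one way to keep only the left-hand copy. Dropping the `_y` columns is an assumption about what is wanted, not something this notebook does.

```python
import pandas as pd

# Toy frames sharing the key and one overlapping column
left = pd.DataFrame({'SK_ID_CURR': [1, 2], 'CNT_CHILDREN': [0, 2]})
right = pd.DataFrame({'SK_ID_CURR': [1, 2],
                      'CNT_CHILDREN': [0.0, 0.5],
                      'SCORE_CLIENT': [0.4, 0.3]})

merged = left.merge(right, on='SK_ID_CURR', how='left')
print(list(merged.columns))

# Keep the left-hand copy: drop the '_y' duplicates,
# then strip a trailing '_x' only (anchored with '$')
merged = merged.drop(columns=[c for c in merged.columns
                              if c.endswith('_y')])
merged.columns = merged.columns.str.replace(r'_x$', '', regex=True)
print(list(merged.columns))
```

Anchoring the pattern with `$` limits the replacement to the suffix, so a column that merely contains `_x` elsewhere in its name is left alone.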


**Convert durations from days to (positive) years**

In [76]:
# Convert DAYS_BIRTH to positive years
df_dashboard['YEAR_BIRTH'] = \
    np.trunc(np.abs(df_dashboard['DAYS_BIRTH'] / 365)).astype('int8')

In [77]:
# Convert DAYS_EMPLOYED to positive years
df_dashboard['YEAR_EMPLOYED'] = \
    np.trunc(np.abs(df_dashboard['DAYS_EMPLOYED'] / 365)).astype('int8')

In [78]:
# Map sex: 0 = Féminin and 1 = Masculin
df_dashboard['SEXE'] = ['Féminin' if row == 0 else 'Masculin'
                        for row in df_dashboard['CODE_GENDER']]

In [79]:
df_dashboard.head(3)

Unnamed: 0,SK_ID_CURR,AMT_INCOME_TOTAL,NAME_FAMILY_STATUS,AMT_GOODS_PRICE,CNT_CHILDREN,NAME_CONTRACT_TYPE,NAME_EDUCATION_TYPE,NAME_HOUSING_TYPE,NAME_INCOME_TYPE,PRED_CLASSE_CLIENT,...,PREV_AMT_DECLINED_SUM,INSTAL_AMT_INSTALMENT_MIN,INSTAL_AMT_PAYMENT_RATIO_MEAN,CC_CNT_DRAWINGS_CURRENT_VAR,PREV_LOAN_RATE_MEAN,PREV_DAYS_DECISION_MEAN,EXT_SOURCE_2,YEAR_BIRTH,YEAR_EMPLOYED,SEXE
0,100001,135000.0,Married,450000.0,0,Cash loans,Higher education,House / apartment,Working,0,...,1048.5,3951.0,1.0,0.3333,0.1661,-1740.0,0.924,52,6,Féminin
1,100005,99000.0,Married,180000.0,0,Cash loans,Secondary / secondary special,House / apartment,Working,0,...,4464.0,4813.2,1.0,0.3333,0.1199,-536.0,0.341,49,12,Masculin
2,100013,202500.0,Married,630000.0,0,Cash loans,Higher education,House / apartment,Working,0,...,-61051.5,67.5,0.935484,1.321,0.1257,-837.5,0.8184,54,12,Masculin
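
The three conversions above can be checked on a couple of hand-made rows. The values below are illustrative and are not taken from the dataset. For the sex label the sketch uses `Series.map`, an alternative to the list comprehension that leaves unexpected codes as missing rather than labelling them 'Masculin'.

```python
import numpy as np
import pandas as pd

# Toy rows (made-up values); day counts are negative here,
# hence the absolute value before truncation
toy = pd.DataFrame({'DAYS_BIRTH': [-19241, -18064],
                    'DAYS_EMPLOYED': [-2329, -4469],
                    'CODE_GENDER': [0, 1]})

toy['YEAR_BIRTH'] = \
    np.trunc(np.abs(toy['DAYS_BIRTH'] / 365)).astype('int8')
toy['YEAR_EMPLOYED'] = \
    np.trunc(np.abs(toy['DAYS_EMPLOYED'] / 365)).astype('int8')

# Explicit mapping: any code other than 0 or 1 becomes NaN
toy['SEXE'] = toy['CODE_GENDER'].map({0: 'Féminin', 1: 'Masculin'})

print(toy[['YEAR_BIRTH', 'YEAR_EMPLOYED', 'SEXE']])
```

Truncation gives completed years, so 52.7 years is reported as 52. `int8` is sufficient as long as the values stay below 128.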


In [81]:
# Save df_dashboard_2
fic_sav_df_dashboard = 'pickle_files/df_dashboard_2.pickle'
with open(fic_sav_df_dashboard, 'wb') as f:
    pickle.dump(df_dashboard, f, pickle.HIGHEST_PROTOCOL)

In [82]:
# Load the df_dashboard_2 dataframe
fic_sav_df_dashboard_2 = 'pickle_files/df_dashboard_2.pickle'
with open(fic_sav_df_dashboard_2, 'rb') as f:
    df_dashboard = pickle.load(f)
df_dashboard.shape

(48744, 231)

In [83]:
# Core client attributes
df_info_client = df_dashboard[['SK_ID_CURR', 'YEAR_BIRTH', 'SEXE',
                               'NAME_FAMILY_STATUS',
                               'CNT_CHILDREN', 'NAME_EDUCATION_TYPE',
                               'NAME_INCOME_TYPE', 'YEAR_EMPLOYED',
                               'AMT_INCOME_TOTAL']]

In [84]:
df_info_client = df_info_client.rename(columns = {
     'YEAR_BIRTH' : 'Âge (ans)',
     'SEXE' : 'Sexe',
     'NAME_FAMILY_STATUS' : 'Statut familial',
     'CNT_CHILDREN' : 'Nbre enfants',
     'NAME_EDUCATION_TYPE' : 'Niveau éducation',
     'NAME_INCOME_TYPE' : 'Type revenu',
     'YEAR_EMPLOYED' : 'Ancienneté emploi',
     'AMT_INCOME_TOTAL' : 'Revenus ($)'})
df_info_client.head(3)

| | SK_ID_CURR | Âge (ans) | Sexe | Statut familial | Nbre enfants | Niveau éducation | Type revenu | Ancienneté emploi | Revenus ($) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 100001 | 52 | Féminin | Married | 0 | Higher education | Working | 6 | 135000.0 |
| 1 | 100005 | 49 | Masculin | Married | 0 | Secondary / secondary special | Working | 12 | 99000.0 |
| 2 | 100013 | 54 | Masculin | Married | 0 | Higher education | Working | 12 | 202500.0 |


In [85]:
# Shrink the variable dtypes to reduce memory usage
df_info_client = fonctions_data.reduce_mem_usage(df_info_client)

-------------------------------------------------------------------------------
Memory usage du dataframe: 3.07 MB
Memory usage après optimization: 2.37 MB
Diminution de 22.7%
-------------------------------------------------------------------------------
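
`fonctions_data.reduce_mem_usage` is a helper from the author's own module and its code is not shown in this notebook. A function of this kind typically downcasts numeric columns to the smallest dtype that can hold their values. The sketch below is one possible version built on `pd.to_numeric(..., downcast=...)`; the name `downcast_numeric` is hypothetical and the toy values are illustrative.

```python
import pandas as pd


def downcast_numeric(df):
    """Return a copy of df with numeric columns downcast where possible."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast='integer')
        elif pd.api.types.is_float_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast='float')
    return out


# Toy frame (illustrative values only)
toy = pd.DataFrame({'SK_ID_CURR': [100001, 100005],
                    'Nbre enfants': [0, 2],
                    'Revenus ($)': [135000.0, 99000.0]})
small = downcast_numeric(toy)

print(toy.memory_usage(deep=True).sum(),
      small.memory_usage(deep=True).sum())
print(small.dtypes)
```

Downcasting floats to `float32` trades precision for memory, which is usually acceptable for display purposes but worth checking for large monetary amounts.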


In [86]:
# Save df_info_client
fic_sav_df_info_client = 'pickle_files/df_info_client.pickle'
with open(fic_sav_df_info_client, 'wb') as f:
    pickle.dump(df_info_client, f, pickle.HIGHEST_PROTOCOL)

In [87]:
# Load the df_info_client dataframe
fic_sav_df_info_client = 'pickle_files/df_info_client.pickle'
with open(fic_sav_df_info_client, 'rb') as f:
    df_info_client = pickle.load(f)
df_info_client.shape

(48744, 9)

In [88]:
filename = 'pickle_files/df_info_client.pickle'
with open(filename, 'wb') as outfile:
    pickle.dump(df_info_client, outfile)

In [89]:
# Main client information
client_id = 100001
client_info = df_info_client[
    df_info_client['SK_ID_CURR'] == client_id].iloc[:, 2:]
client_info.style.hide(axis='index')

| Sexe | Statut familial | Nbre enfants | Niveau éducation | Type revenu | Ancienneté emploi | Revenus ($) |
| --- | --- | --- | --- | --- | --- | --- |
| Féminin | Married | 0 | Higher education | Working | 6 | 135000.0 |
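
In the dashboard, a lookup like the one above would presumably run once per selected client. The sketch below wraps it in a helper that also handles an unknown identifier. The toy frame copies a few columns of the three rows displayed earlier; the function name `get_client_info` and its return format are assumptions.

```python
import pandas as pd

# Toy frame built from the three rows displayed earlier
df_toy = pd.DataFrame({
    'SK_ID_CURR': [100001, 100005, 100013],
    'Âge (ans)': [52, 49, 54],
    'Sexe': ['Féminin', 'Masculin', 'Masculin'],
    'Revenus ($)': [135000.0, 99000.0, 202500.0],
})


def get_client_info(df, client_id):
    """Return one client's row as a dict, or None if the id is unknown."""
    rows = df.loc[df['SK_ID_CURR'] == client_id]
    if rows.empty:
        return None
    return rows.drop(columns='SK_ID_CURR').iloc[0].to_dict()


info = get_client_info(df_toy, 100001)
missing = get_client_info(df_toy, 999999)

print(info)
print(missing)
```

Returning `None` for an unknown identifier lets the calling code show a clear message instead of an empty table.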


##### **Information on the client's loan application**

In [90]:
# Information on the loan application
df_pret_client = df_dashboard[['SK_ID_CURR', 'NAME_CONTRACT_TYPE',
                               'AMT_CREDIT',
                               'AMT_ANNUITY', 'AMT_GOODS_PRICE',
                               'NAME_HOUSING_TYPE']]

In [91]:
df_pret_client = df_pret_client.rename(columns = {
     'NAME_CONTRACT_TYPE' : 'Type de prêt',
     'AMT_CREDIT' : 'Montant du crédit ($)',
     'AMT_ANNUITY' : 'Annuités ($)',
     'AMT_GOODS_PRICE' : 'Montant du bien ($)',
     'NAME_HOUSING_TYPE' : 'Type de logement'})
df_pret_client.head(3)

| | SK_ID_CURR | Type de prêt | Montant du crédit ($) | Annuités ($) | Montant du bien ($) | Type de logement |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 100001 | Cash loans | 568800.0 | 20560.5 | 450000.0 | House / apartment |
| 1 | 100005 | Cash loans | 222768.0 | 17370.0 | 180000.0 | House / apartment |
| 2 | 100013 | Cash loans | 663264.0 | 69777.0 | 630000.0 | House / apartment |


In [92]:
# Shrink the variable dtypes to reduce memory usage
df_pret_client = fonctions_data.reduce_mem_usage(df_pret_client)

-------------------------------------------------------------------------------
Memory usage du dataframe: 2.60 MB
Memory usage après optimization: 1.86 MB
Diminution de 28.6%
-------------------------------------------------------------------------------


In [93]:
# Save df_pret_client
fic_sav_df_pret_client = 'pickle_files/df_pret_client.pickle'
with open(fic_sav_df_pret_client, 'wb') as f:
    pickle.dump(df_pret_client, f, pickle.HIGHEST_PROTOCOL)

In [94]:
# Load the df_pret_client dataframe
fic_sav_df_pret_client = 'pickle_files/df_pret_client.pickle'
with open(fic_sav_df_pret_client, 'rb') as f:
    df_pret_client = pickle.load(f)
df_pret_client.shape

(48744, 6)