# 📊 Bank Loan Approval Prediction with Logistic Regression

<div style="text-align: center;">
  <img src="image.png" alt="Centered image" >
</div>

## <span style="color:purple"> I-Data loading</span>


In [1]:
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt
import seaborn as sns


In [44]:
data=pd.read_csv("loan_dataset.csv")
data.tail()

| | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 609 | LP002978 | Female | No | 0 | Graduate | No | 2900 | 0.0 | 71.0 | 360.0 | 1.0 | Rural | Y |
| 610 | LP002979 | Male | Yes | 3+ | Graduate | No | 4106 | 0.0 | 40.0 | 180.0 | 1.0 | Rural | Y |
| 611 | LP002983 | Male | Yes | 1 | Graduate | No | 8072 | 240.0 | 253.0 | 360.0 | 1.0 | Urban | Y |
| 612 | LP002984 | Male | Yes | 2 | Graduate | No | 7583 | 0.0 | 187.0 | 360.0 | 1.0 | Urban | Y |
| 613 | LP002990 | Female | No | 0 | Graduate | Yes | 4583 | 0.0 | 133.0 | 360.0 | 0.0 | Semiurban | N |


#####  <span style="color:green">Variable descriptions</span>


- Loan_ID: loan identifier
- Gender: applicant's gender
- Married: marital status (yes or no)
- Dependents: number of dependents
- Education: education level
- Self_Employed: whether the applicant is self-employed
- ApplicantIncome: applicant's income
- CoapplicantIncome: co-applicant's income
- LoanAmount: requested loan amount
- Loan_Amount_Term: loan term
- Credit_History: credit history (1 = good, 0 = bad)
- Property_Area: area of residence (urban, semi-urban, rural)
- Loan_Status: loan status (whether the application was approved or not)

In [6]:
data.columns

Index(['Loan_ID', 'Gender', 'Married', 'Dependents', 'Education',
       'Self_Employed', 'ApplicantIncome', 'CoapplicantIncome', 'LoanAmount',
       'Loan_Amount_Term', 'Credit_History', 'Property_Area', 'Loan_Status'],
      dtype='object')

In [30]:
data.shape

(614, 13)

-  <span style="color:green">Target variable</span>: *Loan_Status*
- <span style="color:green">Explanatory variables</span>: *Gender*, *Married*, *Dependents*, *Education*, *Self_Employed*, *ApplicantIncome*, *CoapplicantIncome*, *LoanAmount*, *Loan_Amount_Term*, *Credit_History*, *Property_Area*
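
The split between target and explanatory variables can be sketched as follows. This is only an illustration: the toy frame `sample` is built from two of the rows displayed by `data.tail()` above, and the names `sample`, `TARGET`, `X` and `y` are assumptions, not names used elsewhere in this notebook.

```python
import pandas as pd

# Toy frame built from two rows shown by data.tail();
# in the notebook, the full `data` DataFrame would be used instead.
sample = pd.DataFrame({
    "Loan_ID": ["LP002978", "LP002990"],
    "Gender": ["Female", "Female"],
    "Married": ["No", "No"],
    "Dependents": ["0", "0"],
    "Education": ["Graduate", "Graduate"],
    "Self_Employed": ["No", "Yes"],
    "ApplicantIncome": [2900, 4583],
    "CoapplicantIncome": [0.0, 0.0],
    "LoanAmount": [71.0, 133.0],
    "Loan_Amount_Term": [360.0, 360.0],
    "Credit_History": [1.0, 0.0],
    "Property_Area": ["Rural", "Semiurban"],
    "Loan_Status": ["Y", "N"],
})

TARGET = "Loan_Status"

# Loan_ID is only an identifier, so it is left out of the features.
X = sample.drop(columns=["Loan_ID", TARGET])
y = sample[TARGET]

print(X.shape, y.shape)  # (2, 11) (2,)
```

The 11 remaining columns match the list of explanatory variables given above.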

<span style="color:red">❓ Problem statement: </span>
<blockquote style="color: black; font-style: italic; text-align: center;">
  How can we effectively predict the approval of a bank loan from an applicant's socio-economic and financial characteristics?
</blockquote>

<span style="color:red">🎯 Objective: </span>
<blockquote style="color: black; font-style: italic; text-align: center;">
  Develop a classification model that can anticipate the final approval decision for a loan from information such as income, credit history, or family status.
</blockquote>

## <span style="color:purple"> II-Data preparation </span>

#####  <span style="color:green">Handling missing values </span>

In [8]:
data.isnull().sum()

Loan_ID               0
Gender               13
Married               3
Dependents           15
Education             0
Self_Employed        32
ApplicantIncome       0
CoapplicantIncome     0
LoanAmount           22
Loan_Amount_Term     14
Credit_History       50
Property_Area         0
Loan_Status           0
dtype: int64


> - The variables <span style="color:green">Gender, Married, Dependents, Self_Employed, LoanAmount, Loan_Amount_Term, Credit_History</span> contain missing values.
> - To address this, we replace the missing values with a value chosen per column: the most frequent value (mode) for the categorical columns and Loan_Amount_Term, the median for LoanAmount, and a dedicated placeholder for Credit_History.

In [45]:
# Fill these columns with their most frequent value (assignment avoids chained inplace fillna).
for col in ['Gender','Married','Dependents','Self_Employed','Loan_Amount_Term']:
    data[col] = data[col].fillna(data[col].mode()[0])

In [46]:
# Fill LoanAmount with its median.
data['LoanAmount'] = data['LoanAmount'].fillna(data['LoanAmount'].median())
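
A small illustration of why the median can be a safer fill value than the mean for a skewed column. The descriptive statistics further down report a LoanAmount maximum of 700 against a median of 128, which suggests a long right tail. The values below are hypothetical and only serve to show the effect; `amounts`, `mean_value`, `median_value` and `filled` are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical loan amounts: mostly moderate values, one very large loan, one missing entry.
amounts = pd.Series([100.0, 120.0, 128.0, 150.0, 700.0, np.nan])

mean_value = amounts.mean()      # pulled upward by the single large value
median_value = amounts.median()  # depends only on the middle of the sorted values

# Fill the missing entry with the median, as done for LoanAmount above.
filled = amounts.fillna(median_value)

print(mean_value, median_value)
```

Here the mean lands well above four of the five observed values, while the median stays at a typical amount.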

The Credit_History variable contains a substantial number of missing values (50 of 614 rows). Dropping those rows would discard potentially useful information about the applicants. To avoid this, we replace the missing values with -1, so that the model can treat these cases as "unknown" instead of ignoring them entirely.

In [47]:
data['Credit_History']=data['Credit_History'].fillna(-1)
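
One point to keep in mind: stored in a numeric column, -1 is read by a linear model such as logistic regression as a point on the same scale as 0 and 1. A possible alternative, sketched here as an assumption and not as the approach taken in this notebook, is to give each of the three states its own indicator column. The values and the `CH` prefix are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical values: good history, bad history, and a missing entry.
credit = pd.Series([1.0, 0.0, np.nan, 1.0], name="Credit_History")

# Same sentinel as in the notebook: -1 marks "unknown".
credit_filled = credit.fillna(-1).astype(int)

# One indicator column per state, so "unknown" gets its own coefficient
# instead of being placed on a -1 < 0 < 1 scale.
dummies = pd.get_dummies(credit_filled, prefix="CH", dtype=int)

print(dummies.columns.tolist())  # ['CH_-1', 'CH_0', 'CH_1']
```

Whether this helps in practice would have to be checked by comparing model performance with both encodings.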

In [48]:
data.isnull().sum()

Loan_ID              0
Gender               0
Married              0
Dependents           0
Education            0
Self_Employed        0
ApplicantIncome      0
CoapplicantIncome    0
LoanAmount           0
Loan_Amount_Term     0
Credit_History       0
Property_Area        0
Loan_Status          0
dtype: int64

>  All missing values have been handled with a method suited to each column. The dataset no longer contains any missing values.

#####  <span style="color:green">Removing duplicates </span>

In [35]:
data.duplicated()

0      False
1      False
2      False
3      False
4      False
       ...  
609    False
610    False
611    False
612    False
613    False
Length: 614, dtype: bool

In [20]:
data.duplicated().sum()

0

> No duplicate rows were detected in the dataset.

#####  <span style="color:green">Encoding categorical variables </span>

In [36]:
data.head()

| | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LP001002 | Male | No | 0 | Graduate | No | 5849 | 0.0 | 128.0 | 360.0 | 1.0 | Urban | Y |
| 1 | LP001003 | Male | Yes | 1 | Graduate | No | 4583 | 1508.0 | 128.0 | 360.0 | 1.0 | Rural | N |
| 2 | LP001005 | Male | Yes | 0 | Graduate | Yes | 3000 | 0.0 | 66.0 | 360.0 | 1.0 | Urban | Y |
| 3 | LP001006 | Male | Yes | 0 | Not Graduate | No | 2583 | 2358.0 | 120.0 | 360.0 | 1.0 | Urban | Y |
| 4 | LP001008 | Male | No | 0 | Graduate | No | 6000 | 0.0 | 141.0 | 360.0 | 1.0 | Urban | Y |


In [49]:
data['Loan_Status']=data['Loan_Status'].map({'Y':1,'N':0})

In [None]:
data['Gender']=data['Gender'].map({'Male':1,'Female':0})

In [None]:
data['Married']=data['Married'].map({'Yes':1,'No':0})

In [54]:
data['Education']=data['Education'].map({'Graduate':1,'Not Graduate':0})

In [56]:
data['Self_Employed']=data['Self_Employed'].map({'Yes':1,'No':0})

In [58]:
data['Dependents']=data['Dependents'].replace('3+',3).astype(int)
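
`Series.map` with a dictionary returns NaN for every value that is not a key of that dictionary, so an unexpected or misspelled label would silently reintroduce missing values. A minimal sketch of a guard, using a hypothetical helper `encode_binary` and toy values (neither is part of the notebook):

```python
import pandas as pd

def encode_binary(series: pd.Series, mapping: dict) -> pd.Series:
    """Map labels to integers and fail loudly on labels missing from `mapping`."""
    unexpected = set(series.dropna().unique()) - set(mapping)
    if unexpected:
        raise ValueError(f"Unmapped labels in {series.name!r}: {sorted(unexpected, key=str)}")
    return series.map(mapping)

# Toy column with the two labels seen in the dataset.
gender = pd.Series(["Male", "Female", "Male"], name="Gender")
encoded = encode_binary(gender, {"Male": 1, "Female": 0})

print(encoded.tolist())  # [1, 0, 1]
```

Because `map` is not idempotent, re-running one of the encoding cells on an already encoded column would turn every value into NaN; with a guard like this one, the second run would raise an error instead.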

In [59]:
data.head()

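| | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LP001002 | 1 | 0 | 0 | 1 | 0 | 5849 | 0.0 | 128.0 | 360.0 | 1.0 | Urban | 1 |
| 1 | LP001003 | 1 | 1 | 1 | 1 | 0 | 4583 | 1508.0 | 128.0 | 360.0 | 1.0 | Rural | 0 |
| 2 | LP001005 | 1 | 1 | 0 | 1 | 1 | 3000 | 0.0 | 66.0 | 360.0 | 1.0 | Urban | 1 |
| 3 | LP001006 | 1 | 1 | 0 | 0 | 0 | 2583 | 2358.0 | 120.0 | 360.0 | 1.0 | Urban | 1 |
| 4 | LP001008 | 1 | 0 | 0 | 1 | 0 | 6000 | 0.0 | 141.0 | 360.0 | 1.0 | Urban | 1 |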


## <span style="color:purple"> III-Exploratory data analysis (EDA) </span>

#####  <span style="color:green">Descriptive statistics </span>

In [60]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 614 entries, 0 to 613
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Loan_ID            614 non-null    object 
 1   Gender             614 non-null    int64  
 2   Married            614 non-null    int64  
 3   Dependents         614 non-null    int32  
 4   Education          614 non-null    int64  
 5   Self_Employed      614 non-null    int64  
 6   ApplicantIncome    614 non-null    int64  
 7   CoapplicantIncome  614 non-null    float64
 8   LoanAmount         614 non-null    float64
 9   Loan_Amount_Term   614 non-null    float64
 10  Credit_History     614 non-null    float64
 11  Property_Area      614 non-null    object 
 12  Loan_Status        614 non-null    int64  
dtypes: float64(4), int32(1), int64(6), object(2)
memory usage: 60.1+ KB


In [61]:
data.describe()

| | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 614.0 | 614.0 | 614.0 | 614.0 | 614.0 | 614.0 | 614.0 | 614.0 | 614.0 | 614.0 | 614.0 |
| mean | 0.81759 | 0.653094 | 0.7443 | 0.781759 | 0.13355 | 5403.459283 | 1621.245798 | 145.752443 | 342.410423 | 0.692182 | 0.687296 |
| std | 0.386497 | 0.476373 | 1.009623 | 0.413389 | 0.340446 | 6109.041673 | 2926.248369 | 84.107233 | 64.428629 | 0.613633 | 0.463973 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 150.0 | 0.0 | 9.0 | 12.0 | -1.0 | 0.0 |
| 25% | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 2877.5 | 0.0 | 100.25 | 360.0 | 1.0 | 0.0 |
| 50% | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 3812.5 | 1188.5 | 128.0 | 360.0 | 1.0 | 1.0 |
| 75% | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 5795.0 | 2297.25 | 164.75 | 360.0 | 1.0 | 1.0 |
| max | 1.0 | 1.0 | 3.0 | 1.0 | 1.0 | 81000.0 | 41667.0 | 700.0 | 480.0 | 1.0 | 1.0 |
