# Loan eligibility prediction:

## Importing Libraries:

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

## Reading data:

In [5]:
loan_dataframe = pd.read_csv("./Data/loan.csv")
# Work on a copy so the raw data stays untouched
loan_df = loan_dataframe.copy()
loan_df.sample(5)

| | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 438 | Male | No | 0 | Graduate | Yes | 10416 | 0.0 | 187.0 | 360.0 | 0.0 | Urban | N |
| 234 | Male | Yes | 1 | Graduate | No | 3155 | 1779.0 | 140.0 | 360.0 | 1.0 | Semiurban | Y |
| 538 | Male | Yes | 0 | Not Graduate | No | 2917 | 536.0 | 66.0 | 360.0 | 1.0 | Rural | N |
| 376 | Male | Yes | 3+ | Graduate | No | 8750 | 4996.0 | 130.0 | 360.0 | 1.0 | Rural | Y |
| 469 | Male | Yes | 0 | Graduate | No | 4333 | 2451.0 | 110.0 | 360.0 | 1.0 | Urban | N |


In [6]:
loan_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 614 entries, 0 to 613
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Gender             601 non-null    object 
 1   Married            611 non-null    object 
 2   Dependents         599 non-null    object 
 3   Education          614 non-null    object 
 4   Self_Employed      582 non-null    object 
 5   ApplicantIncome    614 non-null    int64  
 6   CoapplicantIncome  614 non-null    float64
 7   LoanAmount         592 non-null    float64
 8   Loan_Amount_Term   600 non-null    float64
 9   Credit_History     564 non-null    float64
 10  Property_Area      614 non-null    object 
 11  Loan_Status        614 non-null    object 
dtypes: float64(4), int64(1), object(7)
memory usage: 57.7+ KB


## Data cleaning:

### A: Handling Missing Values

In [7]:
loan_df.isnull().sum()

Gender               13
Married               3
Dependents           15
Education             0
Self_Employed        32
ApplicantIncome       0
CoapplicantIncome     0
LoanAmount           22
Loan_Amount_Term     14
Credit_History       50
Property_Area         0
Loan_Status           0
dtype: int64

In [8]:
loan_df['Gender'] = loan_df['Gender'].fillna(loan_df['Gender'].mode()[0])
loan_df['Married'] = loan_df['Married'].fillna(loan_df['Married'].mode()[0])
loan_df['Dependents'] = loan_df['Dependents'].fillna(loan_df['Dependents'].mode()[0])
loan_df['Self_Employed'] = loan_df['Self_Employed'].fillna(loan_df['Self_Employed'].mode()[0])

loan_df['LoanAmount'] = loan_df['LoanAmount'].fillna(loan_df['LoanAmount'].median())
loan_df['Loan_Amount_Term'] = loan_df['Loan_Amount_Term'].fillna(loan_df['Loan_Amount_Term'].mode()[0])
loan_df['Credit_History'] = loan_df['Credit_History'].fillna(loan_df['Credit_History'].mode()[0]).astype('int')

**Handling Missing Values in loan_df**

Many scikit-learn estimators raise an error when their input contains NaN, so the gaps counted above have to be filled before modeling. The cell above fills each affected column of `loan_df` with a single statistic computed from that column.
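An alternative the notebook does not use: the `fillna` cell above computes each statistic from all 614 rows of `loan_df`. If the data is later split with `train_test_split`, rows that end up in the test set still influence the values used to fill the training rows. The sketch below swaps in scikit-learn's `SimpleImputer` (not part of the original code) to learn the statistics from the training split only and reuse them on the test split. The two-column toy frame and the `impute` helper are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Hypothetical toy frame standing in for loan_df (values are invented)
toy = pd.DataFrame({
    "Gender": ["Male", "Female", np.nan, "Male", "Male", "Female", "Male", np.nan],
    "LoanAmount": [120.0, np.nan, 150.0, 700.0, 110.0, np.nan, 130.0, 100.0],
})

train, test = train_test_split(toy, test_size=0.25, random_state=0)

# Learn the fill statistics from the training rows only
cat_imputer = SimpleImputer(strategy="most_frequent").fit(train[["Gender"]])
num_imputer = SimpleImputer(strategy="median").fit(train[["LoanAmount"]])


def impute(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply the already-fitted imputers to any split."""
    out = frame.copy()
    out[["Gender"]] = cat_imputer.transform(frame[["Gender"]])
    out[["LoanAmount"]] = num_imputer.transform(frame[["LoanAmount"]])
    return out


train_filled = impute(train)
test_filled = impute(test)
```

Wrapping the imputers in a `ColumnTransformer` or `Pipeline` gives the same fit-on-train, apply-to-test behavior without the manual helper.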

**Categorical Columns** 

For categorical columns, we fill missing values with the **mode** (the most frequent value). Mode imputation keeps each column within its existing set of categories, at the cost of slightly inflating the share of the most common one.

- **Gender**: Filled missing values with the mode of the Gender column.
- **Married**: Filled missing values with the mode of the Married column.
- **Dependents**: Filled missing values with the mode of the Dependents column.
- **Self_Employed**: Filled missing values with the mode of the Self_Employed column.
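The four categorical `fillna` calls in the cell above follow one pattern, so they can also be written as a loop over column names. A minimal sketch on a hypothetical toy DataFrame (the values are invented, not taken from `loan.csv`), which also shows how `Series.mode()` behaves on a tie:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in two categorical columns
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", np.nan, "Male"],
    "Self_Employed": ["No", "No", np.nan, "Yes", "No"],
})

categorical_cols = ["Gender", "Self_Employed"]
for col in categorical_cols:
    # mode() skips NaN by default and returns a Series; [0] takes the first mode
    toy[col] = toy[col].fillna(toy[col].mode()[0])

# On a tie, mode() returns every tied value in sorted order,
# so [0] picks the alphabetically first one: "No" before "Yes"
tie = pd.Series(["Yes", "No", np.nan])
print(tie.mode()[0])  # No
```

The tie rule matters little here because each column has a clear majority, but it means mode imputation is deterministic rather than random when counts are equal.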

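**Numerical Columns**

For numerical columns, the fill statistic depends on how the values are distributed:

- **LoanAmount**: Filled missing values with the **median** of the LoanAmount column. The median is preferred over the mean because it is less sensitive to outliers; a few very large loans would pull the mean upward.
- **Loan_Amount_Term**: Filled missing values with the **mode** of the Loan_Amount_Term column, because the term takes a small set of standard month counts rather than a continuous range (every sample row shown above is 360).
- **Credit_History**: Filled missing values with the **mode** of the Credit_History column and cast it to `int`, because it is a binary 0/1 flag; a mean would produce a fraction that is not a valid flag.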
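A small illustration of why the median suits `LoanAmount`, combined with a single `fillna` call that takes a dict of per-column fill values (median for the continuous column, mode for the two discrete ones). The toy numbers are hypothetical, chosen so that one outlier (700) is present:

```python
import numpy as np
import pandas as pd

# Hypothetical toy values (not from loan.csv) with one large LoanAmount outlier
toy = pd.DataFrame({
    "LoanAmount": [100.0, 110.0, 120.0, 130.0, 700.0, np.nan],
    "Loan_Amount_Term": [360.0, 360.0, 180.0, 360.0, np.nan, 360.0],
    "Credit_History": [1.0, 1.0, 0.0, np.nan, 1.0, 1.0],
})

print(toy["LoanAmount"].mean())    # 232.0, pulled up by the 700 outlier
print(toy["LoanAmount"].median())  # 120.0, unaffected by the outlier

# One fill value per column: median for the continuous amount,
# mode for the columns that take only a few discrete values
fill_values = {
    "LoanAmount": toy["LoanAmount"].median(),
    "Loan_Amount_Term": toy["Loan_Amount_Term"].mode()[0],
    "Credit_History": toy["Credit_History"].mode()[0],
}
filled = toy.fillna(value=fill_values)

# Credit_History is a 0/1 flag, so cast it back to int once no NaN remains
filled["Credit_History"] = filled["Credit_History"].astype(int)
```

Passing a dict to `fillna` keeps all the imputation choices in one place, which makes them easier to review than one assignment per column.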


In [11]:
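# Dependents is stored as text ('0', '1', '2', '3+'); map '3+' to 3 so the column can be cast to int
# (note: this treats "3 or more dependents" as exactly 3)
loan_df['Dependents'] = loan_df['Dependents'].replace({'3+': 3})
loan_df['Dependents'] = loan_df['Dependents'].astype(int)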

In [12]:
loan_df.isnull().sum()

Gender               0
Married              0
Dependents           0
Education            0
Self_Employed        0
ApplicantIncome      0
CoapplicantIncome    0
LoanAmount           0
Loan_Amount_Term     0
Credit_History       0
Property_Area        0
Loan_Status          0
dtype: int64

In [13]:
loan_df.sample(5)

| | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 212 | Male | Yes | 1 | Graduate | Yes | 7787 | 0.0 | 240.0 | 360.0 | 1 | Urban | Y |
| 475 | Male | Yes | 2 | Graduate | Yes | 16525 | 1014.0 | 150.0 | 360.0 | 1 | Rural | Y |
| 59 | Male | Yes | 2 | Not Graduate | No | 3357 | 2859.0 | 144.0 | 360.0 | 1 | Urban | Y |
| 194 | Male | No | 0 | Graduate | No | 4191 | 0.0 | 120.0 | 360.0 | 1 | Rural | Y |
| 218 | Male | Yes | 2 | Graduate | No | 5000 | 0.0 | 72.0 | 360.0 | 0 | Semiurban | N |
