## Loan Approval Prediction

### 1) Problem statement ###
* Design a machine learning model to predict loan approval status (`Loan_Status`) from demographic, financial, and property-related features, to help banks streamline the loan approval process and reduce manual intervention.

### 2) Data Collection ###
* Dataset Source: 'data/loan_approval_dataset.csv'
* The data consists of 12 columns and 1000 rows.

### 2.1 Import Data and Required Packages ###
### Importing the pandas, NumPy, Matplotlib, seaborn, and warnings libraries.

In [9]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.simplefilter('ignore')

In [11]:
df = pd.read_csv('../data/loan_approval_dataset.csv')
df.head()

|   | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Gender | Married | Dependents | Education | Self_Employed | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2196 | 3326 | 253 | 36 | 1 | Male | No | 0 | Graduate | Yes | Urban | Not Approved |
| 1 | 15764 | 8460 | 252 | 120 | 1 | Female | Yes | 1 | Graduate | Yes | Urban | Not Approved |
| 2 | 19994 | 3671 | 146 | 60 | 0 | Female | No | 3+ | Graduate | Yes | Semiurban | Approved |
| 3 | 7370 | 1544 | 477 | 120 | 0 | Female | Yes | 0 | Not Graduate | No | Urban | Not Approved |
| 4 | 13064 | 3025 | 477 | 120 | 0 | Female | No | 0 | Graduate | Yes | Semiurban | Approved |


In [12]:
df.shape

(1000, 12)

### 2.2 Dataset information ###
* ApplicantIncome: Income of the primary applicant.
* CoapplicantIncome: Income of the co-applicant (if any).
* LoanAmount: Loan amount requested (in thousands).
* Loan_Amount_Term: Term of the loan in months.
* Credit_History: Whether the applicant has a good credit history (1 = Yes, 0 = No).
* Gender: Gender of the applicant.
* Married: Whether the applicant is married.
* Dependents: Number of dependents the applicant has.
* Education: Education level of the applicant.
* Self_Employed: Whether the applicant is self-employed.
* Property_Area: The type of area where the property is located.
* Loan_Status: Final loan decision (whether the loan is approved or not).
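
The column roles listed above can be sketched in code as a split between the target (`Loan_Status`) and the numeric and categorical features. This is only an illustrative sketch: `sample` holds two of the rows shown by `df.head()`, and the names `TARGET`, `numeric_features`, and `categorical_features` are assumptions, not part of the original notebook.

```python
import pandas as pd

# Two of the rows shown by df.head(), for illustration only (not the full dataset).
sample = pd.DataFrame({
    "ApplicantIncome": [2196, 19994],
    "CoapplicantIncome": [3326, 3671],
    "LoanAmount": [253, 146],
    "Loan_Amount_Term": [36, 60],
    "Credit_History": [1, 0],
    "Gender": ["Male", "Female"],
    "Married": ["No", "No"],
    "Dependents": ["0", "3+"],  # stored as text because of the "3+" category
    "Education": ["Graduate", "Graduate"],
    "Self_Employed": ["Yes", "Yes"],
    "Property_Area": ["Urban", "Semiurban"],
    "Loan_Status": ["Not Approved", "Approved"],
})

TARGET = "Loan_Status"  # hypothetical constant name

# Separate the prediction target from the input features.
features = sample.drop(columns=TARGET)
target = sample[TARGET]

# Group the feature columns by dtype.
numeric_features = features.select_dtypes(include="number").columns.tolist()
categorical_features = features.select_dtypes(exclude="number").columns.tolist()

print(numeric_features)
print(categorical_features)
```

Note that `Dependents` lands in the categorical group even though it counts people, because the `3+` value keeps the column from being parsed as an integer.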

### 3) Data Checks to perform ###
* Check Missing values
* Check Duplicates
* Check data type
* Check the number of unique values of each column
* Check statistics of data set
* Check the categories present in each categorical column
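
The duplicate, unique-value, and category checks in this list could be sketched as follows. This is a minimal sketch rather than the notebook's own cells: `sample` reproduces only the five rows shown by `df.head()`, and the variable names are illustrative assumptions. On the full data, the same calls would be made on `df`.

```python
import pandas as pd

# The five rows shown by df.head(), for illustration only (not the full dataset).
sample = pd.DataFrame({
    "ApplicantIncome": [2196, 15764, 19994, 7370, 13064],
    "CoapplicantIncome": [3326, 8460, 3671, 1544, 3025],
    "LoanAmount": [253, 252, 146, 477, 477],
    "Loan_Amount_Term": [36, 120, 60, 120, 120],
    "Credit_History": [1, 1, 0, 0, 0],
    "Gender": ["Male", "Female", "Female", "Female", "Female"],
    "Married": ["No", "Yes", "No", "Yes", "No"],
    "Dependents": ["0", "1", "3+", "0", "0"],
    "Education": ["Graduate", "Graduate", "Graduate", "Not Graduate", "Graduate"],
    "Self_Employed": ["Yes", "Yes", "Yes", "No", "Yes"],
    "Property_Area": ["Urban", "Urban", "Semiurban", "Urban", "Semiurban"],
    "Loan_Status": ["Not Approved", "Not Approved", "Approved", "Not Approved", "Approved"],
})

# Check duplicates: count rows that repeat an earlier row exactly.
n_duplicates = int(sample.duplicated().sum())

# Check the number of unique values of each column.
unique_counts = sample.nunique()

# Check the categories present in each non-numeric column.
categorical_cols = sample.select_dtypes(exclude="number").columns
categories = {col: sorted(sample[col].unique()) for col in categorical_cols}

print("Duplicate rows:", n_duplicates)
print(unique_counts)
for col, values in categories.items():
    print(f"{col}: {values}")
```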

### 3.1 Check data types ###

In [13]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   ApplicantIncome    1000 non-null   int64 
 1   CoapplicantIncome  1000 non-null   int64 
 2   LoanAmount         1000 non-null   int64 
 3   Loan_Amount_Term   1000 non-null   int64 
 4   Credit_History     1000 non-null   int64 
 5   Gender             1000 non-null   object
 6   Married            1000 non-null   object
 7   Dependents         1000 non-null   object
 8   Education          1000 non-null   object
 9   Self_Employed      1000 non-null   object
 10  Property_Area      1000 non-null   object
 11  Loan_Status        1000 non-null   object
dtypes: int64(5), object(7)
memory usage: 93.9+ KB


### 3.2 Check Missing values ###

In [14]:
df.isnull().sum()

ApplicantIncome      0
CoapplicantIncome    0
LoanAmount           0
Loan_Amount_Term     0
Credit_History       0
Gender               0
Married              0
Dependents           0
Education            0
Self_Employed        0
Property_Area        0
Loan_Status          0
dtype: int64

### 3.3 Check statistics of data set ###

In [15]:
df.describe()

|   | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History |
|---|---|---|---|---|---|
| count | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 |
| mean | 11254.596 | 5161.281 | 275.685 | 134.424 | 0.459 |
| std | 5197.772287 | 2860.984079 | 131.93029 | 121.210231 | 0.498566 |
| min | 2008.0 | 4.0 | 51.0 | 12.0 | 0.0 |
| 25% | 6803.25 | 2736.25 | 162.0 | 36.0 | 0.0 |
| 50% | 11280.5 | 5290.5 | 279.0 | 60.0 | 0.0 |
| 75% | 16015.5 | 7631.5 | 388.25 | 240.0 | 1.0 |
| max | 19994.0 | 9992.0 | 500.0 | 360.0 | 1.0 |
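
Because `Credit_History` is coded as 0/1, its mean of 0.459 is the share of applicants with a good credit history (about 45.9%). By default, `describe()` summarises only the numeric columns, so the balance of the target `Loan_Status` has to be checked separately. The sketch below illustrates both ideas on the five rows shown by `df.head()`, so its numbers describe that small sample and not the full dataset; the variable names are illustrative assumptions.

```python
import pandas as pd

# Two columns from the five rows shown by df.head(), for illustration only.
sample = pd.DataFrame({
    "Credit_History": [1, 1, 0, 0, 0],
    "Loan_Status": ["Not Approved", "Not Approved", "Approved", "Not Approved", "Approved"],
})

# For a 0/1 column, the mean equals the fraction of rows with value 1.
good_credit_share = sample["Credit_History"].mean()

# Relative frequency of each target class (class balance).
status_share = sample["Loan_Status"].value_counts(normalize=True)

print(f"Share with good credit history: {good_credit_share:.1%}")
print(status_share)
```

On the full data, the same two lines applied to `df` would give the overall approval rate, which is useful to know before choosing an evaluation metric.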
