# **Loan Eligibility Prediction System**
This project is a machine learning-based web application that predicts whether a loan application will be approved or rejected based on applicant details such as income, credit history, education, employment status, and property area.
The model is trained using a Random Forest classifier with proper data preprocessing and is deployed using Streamlit to allow real-time user input and prediction.

**Problem Statement**

Financial institutions need a reliable system to determine whether a loan applicant is eligible for approval. Manual evaluation is time-consuming and error-prone. This project aims to automate loan eligibility prediction using machine learning techniques.

**Objectives**
* To analyze applicant data and identify key factors affecting loan approval
* To build a machine learning model for predicting loan eligibility
* To provide loan approval results with probability percentage
* To deploy the model using Streamlit for real-time user input


**Dataset Description**
The dataset used in this project is a loan application dataset obtained from Kaggle. It contains applicant demographic details, income information, credit history, and loan-related attributes.

**Features:**
* Applicant Income
* Coapplicant Income
* Loan Amount
* Loan Amount Term
* Credit History
* Gender, Marital Status, Education
* Property Area


**1. IMPORT REQUIRED LIBRARIES**

In [None]:
#data handling
import numpy as np
import pandas as pd

#data visualisation
import matplotlib.pyplot as plt
import seaborn as sns

#data preprocessing
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.preprocessing import StandardScaler, LabelEncoder, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

#model
from sklearn.ensemble import RandomForestClassifier

#model evaluation
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    confusion_matrix, classification_report, roc_auc_score,
    roc_curve, precision_recall_curve
)


#to save model
import joblib


#style for plot
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")




**2. LOADING DATASET**

In [None]:
#load data
df=pd.read_csv('/kaggle/input/loan-eligibility-prediction/Loan Eligibility Prediction.csv')
df.head()

**3. EXPLORATORY DATA ANALYSIS**

In [None]:

df.info()


In [None]:
df.describe()

In [None]:
df.shape


In [None]:
#checking for null values
df.isnull().sum()
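
The notebook imports `SimpleImputer` but never applies it, so any nulls reported above stay in the data. A minimal sketch of one way to fill them is shown here, assuming median imputation for amounts and most-frequent imputation for category-like columns. The toy values are made up for illustration, and the choice of strategy per column is an assumption, not something the dataset dictates.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy frame standing in for the loan data (values are made up for illustration).
toy = pd.DataFrame({
    "Loan_Amount": [120.0, np.nan, 95.0, 150.0],
    "Credit_History": [1.0, 1.0, np.nan, 0.0],
    "Gender": ["Male", np.nan, "Female", "Male"],
})

# Median for a continuous amount (robust to skewed values).
toy[["Loan_Amount"]] = SimpleImputer(strategy="median").fit_transform(
    toy[["Loan_Amount"]]
)

# Most frequent value for category-like columns, one column at a time.
for col in ["Credit_History", "Gender"]:
    toy[[col]] = SimpleImputer(strategy="most_frequent").fit_transform(toy[[col]])

print(toy.isnull().sum())
```

In a full workflow the imputers would normally be fitted on the training split only and then applied to the test split, to avoid leaking test information.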

In [None]:
#visualizing the target class distribution
sns.countplot(x='Loan_Status',data=df)
plt.title("Loan Status Distribution")

plt.show()

In [None]:
#married vs loan_status
sns.countplot(x='Married',hue='Loan_Status',data=df)
plt.show()

In [None]:
#education vs loan_status
sns.countplot(x='Education',hue='Loan_Status',data=df)
plt.show()

In [None]:
#gender vs loan_status

sns.countplot(x='Gender',hue='Loan_Status',data=df)
plt.show()

In [None]:
#Applicant_income with loan_status
plt.figure(figsize=(8,5))
sns.violinplot(x='Loan_Status', y='Applicant_Income', data=df)
plt.title("Applicant Income Distribution with Loan Status")
plt.show()


In [None]:
#relationship of credit history with loan status
credit_loan = pd.crosstab(df['Credit_History'], df['Loan_Status'], normalize='index') * 100
credit_loan.plot(kind='bar', stacked=True, figsize=(7,5))
plt.title("Loan Status Percentage by Credit History")
plt.ylabel("Percentage")
plt.xlabel("Credit History")
plt.legend(title="Loan Status")
plt.show()


In [None]:
#correlation matrix
#feature-to-feature correlation, numerical columns only
plt.figure(figsize=(10,6))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.title("Correlation Analysis")
plt.show()


**4. Feature Engineering**
Feature engineering is the process of creating, transforming, or modifying features to improve model performance and prediction accuracy.

**Feature selection**:
The EDA shows that Credit_History strongly affects Loan_Status, and that Applicant_Income and Loan_Amount also have some effect. Customer_ID carries no predictive information, so it is dropped.

In [None]:
#drop customer_id as it is not necessary for model training
df.drop('Customer_ID',axis=1,inplace=True)
df.head()


In [None]:
#adding feature
#TOTAL INCOME
df['Total_Income']=df['Applicant_Income']+df['Coapplicant_Income']


#EMI (approximate monthly installment, ignoring interest): loan amount spread over the term
df['EMI']=df['Loan_Amount']/df['Loan_Amount_Term']

#income to loan ratio
df['Income_Loan_Ratio'] = df['Total_Income'] / df['Loan_Amount']
df.head()
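
Ratio features like these can produce `inf` when a denominator is zero and `NaN` when it is missing. The sketch below shows one way to guard against that, assuming infinite ratios should be treated as missing so they can be imputed later. The toy values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy rows: one normal, one with a zero loan amount, one with a missing amount.
toy = pd.DataFrame({
    "Total_Income": [6000.0, 4500.0, 8000.0],
    "Loan_Amount": [120.0, 0.0, np.nan],
})

ratio = toy["Total_Income"] / toy["Loan_Amount"]

# Division by zero yields inf in pandas; map it to NaN so it can be handled
# together with the other missing values.
toy["Income_Loan_Ratio"] = ratio.replace([np.inf, -np.inf], np.nan)

print(toy)
```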


**5. Handle Categorical Variables (Encoding)**

In [None]:
#Convert target variable
df['Loan_Status'] = df['Loan_Status'].astype(str).str.strip()
df['Loan_Status'] = df['Loan_Status'].map({'Y': 1, 'N': 0})

df.head()


In [None]:
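#Encode categorical features, keeping one fitted encoder per column
#so the same mapping can be reused at prediction time
cat_features=   [ 'Gender', 'Married', 'Dependents', 'Education', 'Self_Employed', 'Property_Area']
encoders={}

for col in cat_features:
    le=LabelEncoder()
    df[col]=le.fit_transform(df[col])
    encoders[col]=le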
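
Because the deployed app has to turn user input into the same integer codes the model was trained on, it helps to see how `LabelEncoder` assigns them. The sketch below recovers the label-to-code mapping from a fitted encoder. The category values are illustrative assumptions and may not match the dataset's exact spelling.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Illustrative category values (assumed, not read from the dataset).
toy = pd.DataFrame({"Property_Area": ["Urban", "Rural", "Semiurban", "Urban"]})

le = LabelEncoder()
toy["Property_Area_Code"] = le.fit_transform(toy["Property_Area"])

# classes_ is sorted, and each class's position is its integer code.
mapping = {label: int(code) for code, label in enumerate(le.classes_)}
print(mapping)
```

Saving this mapping (or the fitted encoder itself) alongside the model is one way to keep training and prediction consistent.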


In [None]:
df.head()

In [None]:
#convert loan_amount_term(months) into years
df['Loan_Term_Years'] = df['Loan_Amount_Term'] / 12
df.drop('Loan_Amount_Term', axis=1, inplace=True)


In [None]:
print(df.columns)  #checking the column names
df.head()


**5. Split Features and Target**

In [None]:
X=df.drop('Loan_Status',axis=1)
y=df['Loan_Status']
X.head()

In [None]:
print(y.head())


**6. Train-Test Split**

In [None]:

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
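
If the approved and rejected classes are imbalanced, a plain random split can leave the test set with a different class ratio than the training set. The sketch below shows the `stratify` option of `train_test_split`, which keeps the ratio the same in both parts. The 70/30 toy target is an assumption for illustration, not the dataset's actual balance.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Imbalanced toy target: 70 approvals (1) and 30 rejections (0).
X = np.arange(100).reshape(-1, 1)
y = np.array([1] * 70 + [0] * 30)

# stratify=y preserves the class proportions in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(y_train.mean(), y_test.mean())
```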


**7. Train Random Forest Classifier**

In [None]:
rf = RandomForestClassifier(
    n_estimators=400,
    max_depth=12,
    min_samples_split=8,
    class_weight='balanced',
    random_state=42
)


rf.fit(X_train, y_train)
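
A single train/test split gives only one estimate of performance. `cross_val_score` is imported at the top of the notebook but not used, so the sketch below shows how it could give a more stable estimate. It runs on synthetic data from `make_classification` and uses fewer trees to stay fast, both of which are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the loan features and target.
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.3, 0.7], random_state=42
)

rf = RandomForestClassifier(
    n_estimators=50,          # fewer trees than the notebook, for speed
    max_depth=12,
    min_samples_split=8,
    class_weight="balanced",
    random_state=42,
)

# Five stratified folds, scored with F1 on the positive class.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(rf, X, y, cv=cv, scoring="f1")

print("F1 per fold:", scores.round(3))
print("Mean F1:", scores.mean().round(3))
```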

**8. Model Evaluation**

In [None]:

y_pred = rf.predict(X_test)


#Accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", round(accuracy*100,2), "%")


#Precision
precision = precision_score(y_test, y_pred)
print("Precision:", round(precision,2))


#confusion matrix
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", cm)

#F1-score
f1 = f1_score(y_test, y_pred)
print("F1 Score:", round(f1,2))


#Recall
recall = recall_score(y_test, y_pred)
print("Recall:", round(recall,2))
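
The metrics above all depend on the default 0.5 decision threshold. `roc_auc_score`, which is imported but unused, measures how well the predicted probabilities rank approvals above rejections regardless of threshold. The sketch below uses synthetic data and a smaller forest, both assumed for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the loan data.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# ROC AUC needs scores or probabilities for the positive class, not hard labels.
proba = clf.predict_proba(X_te)[:, 1]
auc = roc_auc_score(y_te, proba)

print("ROC AUC:", round(auc, 3))
```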


In [None]:
#plotting confusion matrix for visualization
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()


In [None]:
#Classification report
print(classification_report(y_test, y_pred))


**9. Add probability to DataFrame for analysis**

In [None]:
X_test_copy = X_test.copy()
X_test_copy['Actual_Loan_Status'] = y_test
X_test_copy['Approval_Probability_%'] = (rf.predict_proba(X_test)[:,1] * 100).round(2)

X_test_copy.head()
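
Once probabilities are available, the approval decision does not have to use the default 0.5 cutoff. The sketch below applies a custom threshold to a few probabilities. Both the probability values and the 0.6 threshold are hypothetical and only illustrate the mechanics.

```python
import numpy as np

# Hypothetical approval probabilities for four applicants.
approval_probability = np.array([0.91, 0.62, 0.48, 0.15])

# Hypothetical cutoff; a stricter lender might raise it.
threshold = 0.6

decisions = np.where(approval_probability >= threshold, "Approved", "Rejected")

for p, d in zip(approval_probability, decisions):
    print(f"{p * 100:.2f}% -> {d}")
```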


**10. Save the trained model**

In [None]:
joblib.dump(rf, "rf_model.pkl")
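
Before the saved file is used in the app, it is worth checking that the model loads back and predicts the same values. The sketch below does a save/load round trip with `joblib` on a small model trained on synthetic data. The temporary directory and the toy model are assumptions for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic model standing in for the trained random forest.
X, y = make_classification(n_samples=200, n_features=6, random_state=1)
model = RandomForestClassifier(n_estimators=20, random_state=1).fit(X, y)

# Save to a temporary location and load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rf_model.pkl")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should reproduce the original probabilities exactly.
same = np.array_equal(model.predict_proba(X), restored.predict_proba(X))
print("Predictions match after reload:", same)
```

The loading side should use the same scikit-learn version as the training side, since pickled estimators are not guaranteed to work across versions.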