# Loan Approval Prediction System

This project predicts whether a loan application should be approved, based on applicant data, using a logistic regression classifier.

## Steps:
1. Data Loading
2. Data Preprocessing
3. Exploratory Data Analysis (EDA)
4. Model Building
5. Model Evaluation
6. Conclusion

In [8]:
# Importing Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

In [9]:
# Load Dataset
df = pd.read_csv('loan_data.csv.csv')  # Replace with actual path to dataset
df.head()

Unnamed: 0,Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
0,LP001002,Male,No,0,Graduate,No,5849,0.0,,360.0,1.0,Urban,Y
1,LP001003,Male,Yes,1,Graduate,No,4583,1508.0,128.0,360.0,1.0,Rural,N
2,LP001005,Male,Yes,0,Graduate,Yes,3000,0.0,66.0,360.0,1.0,Urban,Y
3,LP001006,Male,Yes,0,Not Graduate,No,2583,2358.0,120.0,360.0,1.0,Urban,Y
4,LP001008,Male,No,0,Graduate,No,6000,0.0,141.0,360.0,1.0,Urban,Y
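
The first row above already shows a missing `LoanAmount`, so it can help to count missing values per column before preprocessing. The sketch below is only illustrative: `sample` is a hypothetical hand-built frame standing in for a few columns of the dataset, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few columns of the loan data (not the real file)
sample = pd.DataFrame({
    "Gender": ["Male", "Male", None, "Male"],
    "ApplicantIncome": [5849, 4583, 3000, 2583],
    "LoanAmount": [np.nan, 128.0, 66.0, 120.0],
})

# Count missing values per column and keep only columns that have any
missing_counts = sample.isna().sum()
missing_counts = missing_counts[missing_counts > 0]
print(missing_counts)
```

On the real data, the same two lines applied to `df` would show which columns the imputation step needs to cover.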


In [10]:
# Data Preprocessing: handle missing values, then encode categoricals

# Fill missing numerical values with the column median.
# Assign the result back instead of calling fillna(..., inplace=True) on a
# column selection, which pandas deprecates because it may act on a copy.
num_cols = df.select_dtypes(include=['float64', 'int64']).columns
for col in num_cols:
    df[col] = df[col].fillna(df[col].median())

# Fill missing categorical values with the column mode
cat_cols = df.select_dtypes(include=['object']).columns
for col in cat_cols:
    df[col] = df[col].fillna(df[col].mode()[0])

# Label-encode categorical features
# (cat_cols also covers Loan_ID and the target, Loan_Status)
le = LabelEncoder()
for column in cat_cols:
    df[column] = le.fit_transform(df[column])

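The encoding cell above reuses one `LabelEncoder` for every column, so only the mapping of the last column survives. A possible variation, sketched here on a hypothetical `sample` frame, keeps one encoder per column in a dictionary (`encoders` is an illustrative name) so that the integer codes can be looked up or reversed later.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for two categorical columns of the loan data
sample = pd.DataFrame({
    "Property_Area": ["Urban", "Rural", "Semiurban", "Urban"],
    "Loan_Status": ["Y", "N", "Y", "Y"],
})

# Fit a separate encoder per column and keep each one for later use
encoders = {}
for column in sample.columns:
    encoder = LabelEncoder()
    sample[column] = encoder.fit_transform(sample[column])
    encoders[column] = encoder

# classes_ lists the original labels in the order of their integer codes
print(list(encoders["Property_Area"].classes_))
# inverse_transform maps integer codes back to the original labels
print(list(encoders["Loan_Status"].inverse_transform([0, 1])))
```

Because `LabelEncoder` sorts labels before assigning codes, `Rural`, `Semiurban`, and `Urban` map to 0, 1, and 2, and `N` and `Y` map to 0 and 1.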

In [11]:
# Feature Selection
# Note: X still contains the label-encoded Loan_ID here; it is dropped in a later cell
X = df.drop(['Loan_Status'], axis=1)
y = df['Loan_Status']

In [12]:
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
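
When the classes are imbalanced, a plain random split can leave the test set with a different class ratio than the training set. One option, not used in this notebook, is the `stratify` argument of `train_test_split`. The sketch below uses synthetic data (`X_demo`, `y_demo` are hypothetical names) with a 70/30 class ratio.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced demo data: 70 positives and 30 negatives
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 3))
y_demo = np.array([1] * 70 + [0] * 30)

# stratify=y_demo keeps the class ratio the same in both partitions
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print("train positive rate:", y_tr.mean())
print("test positive rate:", y_te.mean())
```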

In [13]:
# Train the model
model = LogisticRegression()
model.fit(X_train, y_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
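
The warning above suggests either raising `max_iter` or scaling the data. A minimal sketch of both, assuming a `StandardScaler` in front of the classifier is acceptable for this problem, is shown below on synthetic data; `X_demo`, `y_demo`, and `scaled_model` are hypothetical names, and the accuracy figures reported in this notebook were not produced this way.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic demo data: one large-scale feature (income) and one 0/1 flag
rng = np.random.default_rng(0)
n = 200
income = rng.uniform(1000, 10000, size=n)
credit = rng.integers(0, 2, size=n)
X_demo = np.column_stack([income, credit])

# Label follows the credit flag, with roughly 10% of labels flipped as noise
flip = rng.random(n) < 0.1
y_demo = np.where(flip, 1 - credit, credit)

# Standardize features, then fit logistic regression with a higher iteration cap
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_demo, y_demo)

# n_iter_ reports how many solver iterations were actually needed
print("iterations used:", scaled_model[-1].n_iter_[0])
print("training accuracy:", scaled_model.score(X_demo, y_demo))
```

Putting the scaler inside the pipeline means it is fitted only on the data passed to `fit`, so the same object can be applied to a held-out test set without leaking its statistics.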


In [14]:
# Make predictions
y_pred = model.predict(X_test)

In [15]:
# Evaluation
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

Accuracy: 0.7560975609756098

Confusion Matrix:
 [[14 29]
 [ 1 79]]

Classification Report:
               precision    recall  f1-score   support

           0       0.93      0.33      0.48        43
           1       0.73      0.99      0.84        80

    accuracy                           0.76       123
   macro avg       0.83      0.66      0.66       123
weighted avg       0.80      0.76      0.72       123
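
The figures in the classification report follow directly from the confusion matrix, where rows are true classes and columns are predicted classes. The sketch below recomputes them from the matrix printed above and makes the weak spot explicit: only 14 of the 43 true class-0 applications (denied loans) are recognized as such.

```python
import numpy as np

# Confusion matrix from the output above (rows: true class, columns: predicted)
cm = np.array([[14, 29],
               [1, 79]])

# Accuracy: correct predictions (the diagonal) over all predictions
accuracy = np.trace(cm) / cm.sum()

# Recall per class: diagonal over row totals (true counts per class)
recall = np.diag(cm) / cm.sum(axis=1)

# Precision per class: diagonal over column totals (predicted counts per class)
precision = np.diag(cm) / cm.sum(axis=0)

print("accuracy:", round(accuracy, 4))
print("recall per class:", recall.round(2))
print("precision per class:", precision.round(2))
```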



In [19]:
['Gender', 'Married', 'Education', 'Self_Employed', 'ApplicantIncome', 
 'CoapplicantIncome', 'LoanAmount', 'Loan_Amount_Term', 
 'Credit_History', 'Property_Area', 'Dependents']


['Gender',
 'Married',
 'Education',
 'Self_Employed',
 'ApplicantIncome',
 'CoapplicantIncome',
 'LoanAmount',
 'Loan_Amount_Term',
 'Credit_History',
 'Property_Area',
 'Dependents']

In [21]:
print(X.columns)


Index(['Loan_ID', 'Gender', 'Married', 'Dependents', 'Education',
       'Self_Employed', 'ApplicantIncome', 'CoapplicantIncome', 'LoanAmount',
       'Loan_Amount_Term', 'Credit_History', 'Property_Area'],
      dtype='object')


In [22]:
# Drop Loan_ID: it is a row identifier, not a predictive feature
X = df.drop(['Loan_Status', 'Loan_ID'], axis=1)
y = df['Loan_Status']


In [23]:
# Retrain without Loan_ID (note: this fits on all rows, not only the training split)
model = LogisticRegression()
model.fit(X, y)


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [26]:
# Save the exact feature order used during training
feature_names = X.columns.tolist()
feature_names

['Gender',
 'Married',
 'Dependents',
 'Education',
 'Self_Employed',
 'ApplicantIncome',
 'CoapplicantIncome',
 'LoanAmount',
 'Loan_Amount_Term',
 'Credit_History',
 'Property_Area']

In [27]:
import pandas as pd

# Feature order used during training (must match X.columns exactly)
feature_names = ['Gender', 'Married', 'Dependents', 'Education', 'Self_Employed',
                 'ApplicantIncome', 'CoapplicantIncome', 'LoanAmount',
                 'Loan_Amount_Term', 'Credit_History', 'Property_Area']

# New data sample (modify values as needed)
new_data = pd.DataFrame([[
    1,    # Gender (1 = Male)
    1,    # Married (1 = Yes)
    0,    # Dependents
    0,    # Education (0 = Graduate)
    0,    # Self_Employed (0 = No)
    6000, # ApplicantIncome
    1500, # CoapplicantIncome
    120,  # LoanAmount
    360,  # Loan_Amount_Term
    1,    # Credit_History
    2     # Property_Area (2 = Urban)
]], columns=feature_names)

# Predict
prediction = model.predict(new_data)

# Result (Loan_Status was label-encoded: N = 0, Y = 1)
if prediction[0] == 1:
    print("✅ Loan Approved")
else:
    print("❌ Loan Denied")


✅ Loan Approved
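
A hard approve/deny label hides how confident the model is. As a possible extension, `predict_proba` returns a probability per class. The sketch below trains a small model on synthetic data, so `demo`, `demo_model`, and `applicant` are hypothetical names and the probabilities say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic demo data with two binary features
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "Credit_History": rng.integers(0, 2, size=200),
    "Married": rng.integers(0, 2, size=200),
})
# Hypothetical target: approval simply follows credit history
target = demo["Credit_History"].to_numpy()

demo_model = LogisticRegression().fit(demo, target)

# New applicant with the same column names and order as the training frame
applicant = pd.DataFrame([[1, 0]], columns=demo.columns)

# predict_proba returns one probability per class, ordered as in classes_
proba = demo_model.predict_proba(applicant)[0]
approval_probability = proba[list(demo_model.classes_).index(1)]
print("approval probability:", round(approval_probability, 3))
```

Looking up the column through `classes_` avoids assuming that the approved class is always at index 1.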
