# PROJECT TITLE:

# 📊 Loan Eligibility Prediction Project

This project aims to build a Machine Learning model that can predict whether a loan applicant is eligible for a loan based on details such as income, credit history, marital status, and more.

We will use a dataset from a loan company that contains customer and loan details. The final goal is to create a web application using Streamlit where users can input their data and check if they are eligible for a loan.


##  About the Dataset

The dataset contains information about individuals who applied for home loans. It includes their personal, financial, and credit-related information. These features are used to predict whether their loan application will be approved or not.

###  Features in the dataset:

#### 🧍 Personal Details:
- `Gender`: Male/Female
- `Married`: Applicant's marital status
- `Dependents`: Number of dependents
- `Education`: Graduate or Not Graduate
- `Self_Employed`: Whether the applicant is self-employed

#### 💰 Financial Details:
- `ApplicantIncome`: Income of the main applicant
- `CoapplicantIncome`: Income of the co-applicant (if any)

#### 💼 Loan Details:
- `LoanAmount`: Requested loan amount (in ₹1,000s)
- `Loan_Amount_Term`: Duration of the loan in months  
  *(e.g., 360 = 30 years, 120 = 10 years)*

#### 🧾 Credit Information:
- `Credit_History`: Whether the applicant has a good repayment history  
  *(1 = good history, 0 = no or poor history)*

#### 🏡 Property Info:
- `Property_Area`: Area type (Urban, Semiurban, or Rural)

---

### 🎯 Target Variable:
- `Loan_Status`: Whether the loan was approved  
  - `'Y'` = Approved  
  - `'N'` = Not Approved
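Most models expect a numeric target, so the `'Y'`/`'N'` labels are usually mapped to 1/0 before training. The sketch below does this on a few hypothetical toy labels. The new column name `Loan_Status_num` is illustrative only and is not part of the dataset.

```python
import pandas as pd

# Hypothetical toy labels; on the real data this would be df["Loan_Status"].
df = pd.DataFrame({"Loan_Status": ["Y", "N", "Y", "Y", "N"]})

# Map the string labels to integers: 1 = approved, 0 = not approved.
df["Loan_Status_num"] = df["Loan_Status"].map({"Y": 1, "N": 0})

print(df["Loan_Status_num"].tolist())  # [1, 0, 1, 1, 0]
```

Any label outside `{"Y", "N"}` would become `NaN` under `map`, which makes unexpected values easy to spot.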

---
### 🙋 Why I Chose This Dataset:
- I selected this dataset because:

  - Although the dataset is about home loans, I personally have a student loan, which sparked my curiosity about how loan approval systems work in general.

  - It connects machine learning with real-life decision-making in finance.

  - It’s perfect for building an end-to-end ML project — from data cleaning, analysis, and model building to deployment using Streamlit.

For a detailed explanation of each column, see the table below.



## Dataset Source
- The dataset used in this project was sourced from [Kaggle - Loan Prediction Dataset](https://www.kaggle.com/datasets/altruistdelhite04/loan-prediction-problem-dataset?select=train_u6lujuX_CVtuZ9i.csv).
- I customized the code, performed my own analysis, and deployed the project using Streamlit to understand the complete Machine Learning lifecycle.


###  Column Descriptions

| Column Name         | Description                                 |
|---------------------|---------------------------------------------|
| Loan_ID             | Unique loan ID                              |
| Gender              | Male/Female                                 |
| Married             | Applicant's marital status                  |
| Dependents          | Number of dependents                        |
| Education           | Graduate/Not Graduate                       |
| Self_Employed       | Self-employed or not                        |
| ApplicantIncome     | Income of the applicant                     |
| CoapplicantIncome   | Income of the co-applicant (if any)         |
| LoanAmount          | Loan amount requested (in thousands)        |
| Loan_Amount_Term    | Duration of loan (in months)                |
| Credit_History      | 1 = Good history, 0 = No/poor history       |
| Property_Area       | Urban / Semiurban / Rural                   |
| Loan_Status         | Target variable: Y = Approved, N = Not Approved |


## Project Workflow Overview 
1. Data Cleaning & Handling Missing Values
2. Exploratory Data Analysis (EDA)
3. Feature Engineering
4. Model Building & Evaluation
5. Web App Deployment using Streamlit
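Step 1 can be sketched as simple imputation. This version assumes two rules: the most frequent value (mode) for categorical and flag-like columns such as `Credit_History`, and the median for skewed numeric columns such as `LoanAmount`. These rules are an assumption for illustration, and the project's own cleaning step may choose differently. The toy rows are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows with gaps, mirroring a few columns of the real dataset.
df = pd.DataFrame({
    "Gender": ["Male", None, "Female", "Male"],
    "Self_Employed": ["No", "No", None, "Yes"],
    "LoanAmount": [128.0, np.nan, 66.0, 120.0],
    "Credit_History": [1.0, 0.0, np.nan, 1.0],
})

categorical = ["Gender", "Self_Employed", "Credit_History"]  # 0/1 flag treated as a category
numerical = ["LoanAmount"]

# Categorical and flag columns: fill gaps with the most frequent value.
for col in categorical:
    df[col] = df[col].fillna(df[col].mode()[0])

# Skewed numeric columns: fill gaps with the median, which outliers affect less than the mean.
for col in numerical:
    df[col] = df[col].fillna(df[col].median())

print(df.isnull().sum().sum())  # 0
```

The median is used for `LoanAmount` because a few very large loans would pull the mean upward.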

## Business Impact 
- An accurate loan eligibility prediction system can help banks and lenders reduce risk, streamline processing, and improve decision-making based on customer data. This adds value by making approval decisions more accurate and improving customer satisfaction.

###  Initial Insights

- The dataset contains 614 rows and 13 columns.
- There is a mix of categorical and numerical variables:
  - Categorical: Gender, Married, Dependents, Education, etc.
  - Numerical: ApplicantIncome, CoapplicantIncome, LoanAmount, etc.
- Missing values are present in several columns: `Gender`, `Married`, `Dependents`, `Self_Employed`, `LoanAmount`, `Loan_Amount_Term`, and `Credit_History` (to be handled on Day 3).
- `ApplicantIncome` ranges from 150 to 81,000, with a median of 3,812.5 and a mean of about 5,403. The distribution is therefore strongly right-skewed and the applicant pool is diverse.
- `LoanAmount` values look small (median 128, maximum 700) because the column is recorded in thousands (₹1,000s).
- The target variable `Loan_Status` is slightly imbalanced, with more approvals ('Y') than rejections ('N').

These insights guide the preprocessing steps and help in selecting appropriate models and evaluation metrics. Next, we'll explore the relationships between features and loan approval.
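The class balance behind the `Loan_Status` insight can be checked with `value_counts(normalize=True)`, which returns proportions instead of counts. The sketch below runs on hypothetical toy labels. On the real data, the same call would be applied to `df["Loan_Status"]`.

```python
import pandas as pd

# Hypothetical toy labels for illustration only.
labels = pd.Series(["Y", "Y", "N", "Y", "N", "Y", "Y", "N"], name="Loan_Status")

# Share of each class; normalize=True divides the counts by the total.
balance = labels.value_counts(normalize=True)
print(balance)
```

A model that always predicts the majority class would score `balance.max()` in accuracy. This is why accuracy alone can mislead on imbalanced targets.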


### Feature Relevance & Business Understanding 
- Understanding how each feature relates to loan approval is essential for building a realistic and effective ML model. Here's a breakdown:

#### 🧍 ApplicantIncome:
- Represents the main applicant’s income.
- A higher income generally suggests better ability to repay the loan.
- However, a low income doesn't always result in rejection; other factors such as CoapplicantIncome and LoanAmount also play a role.

#### 🧑‍🤝‍🧑 CoapplicantIncome
- Income from a joint applicant (e.g., spouse, family).
- Can boost total repayment capacity.
- Useful when the primary applicant has a low income.
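Because co-applicant income adds to repayment capacity, a common engineered feature is the combined household income. The sketch below uses the first four rows of the dataset shown later in this notebook. The column name `TotalIncome` is hypothetical and is not part of the raw data.

```python
import pandas as pd

# Income values from rows 0-3 of the dataset.
df = pd.DataFrame({
    "ApplicantIncome": [5849, 4583, 3000, 2583],
    "CoapplicantIncome": [0.0, 1508.0, 0.0, 2358.0],
})

# Combined income available for repayment.
df["TotalIncome"] = df["ApplicantIncome"] + df["CoapplicantIncome"]

print(df["TotalIncome"].tolist())  # [5849.0, 6091.0, 3000.0, 4941.0]
```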

#### 💰 LoanAmount
- The amount requested as a loan (in ₹1,000s).
- Should be reasonable compared to the total income.
- A high LoanAmount with a low income may reduce the chances of approval.
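"Reasonable compared to total income" can be made concrete as a loan-to-income ratio. The sketch below makes three assumptions: `LoanAmount` is converted from thousands to full units, and the income it is divided by is the combined applicant plus co-applicant income. The dataset does not state the period the income covers, so the ratio is only meaningful relative to other applicants. The names `TotalIncome` and `LoanToIncome` are hypothetical. The rows are rows 1 to 3 of the dataset, since row 0 has no `LoanAmount`.

```python
import pandas as pd

# Values from rows 1-3 of the dataset (row 0 has a missing LoanAmount).
df = pd.DataFrame({
    "ApplicantIncome": [4583, 3000, 2583],
    "CoapplicantIncome": [1508.0, 0.0, 2358.0],
    "LoanAmount": [128.0, 66.0, 120.0],  # recorded in thousands
})

df["TotalIncome"] = df["ApplicantIncome"] + df["CoapplicantIncome"]

# Loan size expressed in units of combined income; higher means a heavier burden.
df["LoanToIncome"] = df["LoanAmount"] * 1000 / df["TotalIncome"]

print(df["LoanToIncome"].round(2).tolist())
```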

#### 🕒 Loan_Amount_Term
- Duration to repay the loan (in months).
- Longer terms (e.g., 360 months) reduce the equated monthly installment (EMI), making each payment easier to manage, although the total interest paid over the loan rises.
- Shorter terms increase EMIs, requiring a higher monthly income.
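The effect of the term can be shown with the standard amortization formula `EMI = P * r * (1 + r)**n / ((1 + r)**n - 1)`. Here `P` is the principal, `r` is the monthly interest rate and `n` is the number of months. The dataset has no interest-rate column, so the 8% annual rate below is a hypothetical assumption chosen for illustration.

```python
def emi(principal: float, annual_rate: float, months: int) -> float:
    """Equated monthly installment for a fully amortized loan."""
    r = annual_rate / 12  # monthly interest rate
    if r == 0:
        return principal / months  # no interest: equal split of the principal
    growth = (1 + r) ** months
    return principal * r * growth / (growth - 1)


principal = 128 * 1000  # a LoanAmount of 128 means 128,000 (recorded in thousands)
rate = 0.08  # hypothetical annual rate; not present in the dataset

for term in (360, 180, 120):
    monthly = emi(principal, rate, term)
    print(term, round(monthly, 2), round(monthly * term, 2))  # term, EMI, total paid
```

The output shows the trade-off described above. Moving from 120 to 360 months lowers each installment but raises the total amount repaid.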

#### 📈 Credit_History
- Indicates past loan repayment behavior:

   - 1: Good repayment history.

   - 0: No or poor repayment history.

- One of the most influential features. Applicants with a credit history of 1 are far more likely to get loan approval.
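One way to check how influential `Credit_History` is: compare the approval rate for each of its values. The sketch below uses hypothetical toy rows. On the real data, the same `groupby` applies to `df`, and by default it drops rows where `Credit_History` is missing. The helper column `approved` is illustrative only.

```python
import pandas as pd

# Hypothetical toy rows for illustration only.
df = pd.DataFrame({
    "Credit_History": [1.0, 1.0, 0.0, 1.0, 0.0, 1.0],
    "Loan_Status": ["Y", "Y", "N", "N", "N", "Y"],
})

# Approval rate per Credit_History value: the mean of a boolean "approved" flag.
approval_rate = (
    df.assign(approved=df["Loan_Status"].eq("Y"))
      .groupby("Credit_History")["approved"]
      .mean()
)
print(approval_rate)
```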

#### 👨‍👩‍👧 Dependents
- Number of dependents (children or family members).
- More dependents mean greater financial responsibility and a potentially heavier repayment burden.
- Applicants with fewer dependents may be seen as less risky.
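`Dependents` is stored as text, and one of its values is the capped bucket `"3+"` (see row 610 of the data). Mapping `"3+"` to 3 turns the column numeric so that "fewer vs. more dependents" can be compared directly. Treating `"3+"` as exactly 3 is an assumption, and it understates larger families.

```python
import pandas as pd

# Hypothetical sample of Dependents values, including the "3+" bucket and a missing entry.
deps = pd.Series(["0", "1", "2", "3+", None], name="Dependents")

# Replace the capped bucket, then convert to numbers; missing entries become NaN.
deps_num = pd.to_numeric(deps.replace("3+", "3"))

print(deps_num.tolist())
```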

#### 💼 Self_Employed
- Indicates whether the applicant is self-employed.
- May imply irregular income, so some banks consider it riskier than salaried income.
- However, high-income self-employed applicants can still qualify easily.

#### 🏘️ Property_Area
- Indicates where the property is located: Urban, Semiurban, or Rural.
- Urban and semiurban areas might offer better infrastructure or resale value.
- Rural areas may need stronger financials to offset perceived risk.
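`Property_Area` has no natural order, so a common way to feed it to a model is one-hot encoding, with one indicator column per area. The sketch below uses `pd.get_dummies` on hypothetical toy rows.

```python
import pandas as pd

# Hypothetical toy rows for illustration only.
df = pd.DataFrame({"Property_Area": ["Urban", "Rural", "Semiurban", "Urban"]})

# One 0/1 column per area; dtype=int gives integers instead of booleans.
encoded = pd.get_dummies(df, columns=["Property_Area"], dtype=int)

print(encoded.columns.tolist())
```

For linear models, passing `drop_first=True` removes one redundant indicator. The dropped area is then represented by all-zero rows.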

```python
import numpy as np   # numerical arrays and math helpers
import pandas as pd  # loading and manipulating tabular data
```

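```python
# Load the raw loan data and display it (pandas shows the first and last rows)
df = pd.read_csv("Loandataset.csv")
df
```

| | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LP001002 | Male | No | 0 | Graduate | No | 5849 | 0.0 | NaN | 360.0 | 1.0 | Urban | Y |
| 1 | LP001003 | Male | Yes | 1 | Graduate | No | 4583 | 1508.0 | 128.0 | 360.0 | 1.0 | Rural | N |
| 2 | LP001005 | Male | Yes | 0 | Graduate | Yes | 3000 | 0.0 | 66.0 | 360.0 | 1.0 | Urban | Y |
| 3 | LP001006 | Male | Yes | 0 | Not Graduate | No | 2583 | 2358.0 | 120.0 | 360.0 | 1.0 | Urban | Y |
| 4 | LP001008 | Male | No | 0 | Graduate | No | 6000 | 0.0 | 141.0 | 360.0 | 1.0 | Urban | Y |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 609 | LP002978 | Female | No | 0 | Graduate | No | 2900 | 0.0 | 71.0 | 360.0 | 1.0 | Rural | Y |
| 610 | LP002979 | Male | Yes | 3+ | Graduate | No | 4106 | 0.0 | 40.0 | 180.0 | 1.0 | Rural | Y |
| 611 | LP002983 | Male | Yes | 1 | Graduate | No | 8072 | 240.0 | 253.0 | 360.0 | 1.0 | Urban | Y |
| 612 | LP002984 | Male | Yes | 2 | Graduate | No | 7583 | 0.0 | 187.0 | 360.0 | 1.0 | Urban | Y |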


```python
df.shape  # (rows, columns)
```

```
(614, 13)
```

```python
df.info()  # column dtypes and non-null counts
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 614 entries, 0 to 613
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Loan_ID            614 non-null    object 
 1   Gender             601 non-null    object 
 2   Married            611 non-null    object 
 3   Dependents         599 non-null    object 
 4   Education          614 non-null    object 
 5   Self_Employed      582 non-null    object 
 6   ApplicantIncome    614 non-null    int64  
 7   CoapplicantIncome  614 non-null    float64
 8   LoanAmount         592 non-null    float64
 9   Loan_Amount_Term   600 non-null    float64
 10  Credit_History     564 non-null    float64
 11  Property_Area      614 non-null    object 
 12  Loan_Status        614 non-null    object 
dtypes: float64(4), int64(1), object(8)
memory usage: 62.5+ KB
```


```python
df.isnull().sum()  # missing values per column
```

```
Loan_ID               0
Gender               13
Married               3
Dependents           15
Education             0
Self_Employed        32
ApplicantIncome       0
CoapplicantIncome     0
LoanAmount           22
Loan_Amount_Term     14
Credit_History       50
Property_Area         0
Loan_Status           0
dtype: int64
```
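Raw counts are easier to compare as percentages of the 614 rows. The sketch below hard-codes the non-zero counts from the output above, so it runs without the CSV. On the real DataFrame, `df.isnull().mean() * 100` gives the same figures directly.

```python
import pandas as pd

# Non-zero missing-value counts copied from the df.isnull().sum() output (614 rows).
missing = pd.Series({
    "Gender": 13, "Married": 3, "Dependents": 15, "Self_Employed": 32,
    "LoanAmount": 22, "Loan_Amount_Term": 14, "Credit_History": 50,
})
n_rows = 614

# Percentage of rows missing per column, largest first.
missing_pct = (missing / n_rows * 100).round(2).sort_values(ascending=False)
print(missing_pct)
```

`Credit_History` has the largest gap, at a little over 8% of rows. Its imputation choice therefore matters most.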

```python
df.describe()  # summary statistics for numeric columns
```

| | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History |
|---|---|---|---|---|---|
| count | 614.0 | 614.0 | 592.0 | 600.0 | 564.0 |
| mean | 5403.459283 | 1621.245798 | 146.412162 | 342.0 | 0.842199 |
| std | 6109.041673 | 2926.248369 | 85.587325 | 65.12041 | 0.364878 |
| min | 150.0 | 0.0 | 9.0 | 12.0 | 0.0 |
| 25% | 2877.5 | 0.0 | 100.0 | 360.0 | 1.0 |
| 50% | 3812.5 | 1188.5 | 128.0 | 360.0 | 1.0 |
| 75% | 5795.0 | 2297.25 | 168.0 | 360.0 | 1.0 |
| max | 81000.0 | 41667.0 | 700.0 | 480.0 | 1.0 |
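The summary confirms a long right tail in `ApplicantIncome`: the mean of about 5,403 is well above the median of 3,812.5, and the maximum is 81,000. One common way to compress such a tail is a log transform. `np.log1p` (log of 1 + x) is used here because it also handles the zero incomes seen in `CoapplicantIncome`. The sample below mixes a few values from the rows shown earlier with the reported min and max. It is a sketch of the technique, not the project's chosen transform.

```python
import numpy as np
import pandas as pd

# Sample ApplicantIncome values: rows shown earlier plus the reported min (150) and max (81000).
income = pd.Series([150, 2583, 3000, 4583, 5849, 81000], name="ApplicantIncome")

# log(1 + x) compresses large values and maps 0 to 0.
log_income = np.log1p(income)

print(round(income.skew(), 2), round(log_income.skew(), 2))
```

The skewness drops sharply after the transform. This usually helps models that are sensitive to extreme values.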
