# Why I Chose the Loan Approval Prediction Project Title

I chose the Loan Approval Prediction project title because it reflects the core objective of the project: to develop a machine learning model that predicts the approval or rejection of loan applications from factors such as borrower income, loan size, interest rate, and other financial attributes.

This project combines data science and finance, addressing a critical use case in the banking and lending industry, where automation of loan approval can greatly improve efficiency and reduce human bias. The goal of predicting loan approval with high accuracy could help lenders make faster, more informed decisions, thereby benefiting both lenders and borrowers. The title also emphasizes the practical application of machine learning in real-world financial systems.

By predicting the outcome of loan applications, this project aims to contribute to financial inclusivity, helping people with good credit but limited access to traditional banking services, while also improving risk management for lenders.

# Loan Approval Prediction - Project Report
## 1. Project Title
Loan Approval Prediction
## 2. Project Overview
The Loan Approval Prediction project builds a machine learning model that predicts whether a loan application will be approved or denied. Using a dataset with features such as loan size, interest rate, borrower income, and debt-to-income ratio, we train a Logistic Regression baseline and a Random Forest Classifier and compare the two. The primary goal is to automate the loan approval process, giving lenders more accurate and efficient decision-making.

## 3. Project Objectives

- To predict loan approval based on borrower data.
- To explore and preprocess the dataset to handle missing values and scale features.
- To train two machine learning models (Logistic Regression and Random Forest Classifier) and compare their performance.
- To deploy the trained model as an API using FastAPI to provide real-time predictions.

## 4. Technologies Used

- FastAPI: Framework for building the API that serves the model.
- Uvicorn: ASGI server for running the FastAPI application.
- Scikit-learn: Machine learning library used for building and training the models.
- Pandas: Library for data manipulation and preprocessing.
- Joblib: Used to serialize the trained model.
- Seaborn/Matplotlib: Visualization libraries for plotting graphs.
- Python 3.x: Programming language for the project.

## 5. Data Preprocessing
### Dataset Description
The dataset consists of the following features:

- `loan_size`: The size of the loan the borrower is requesting.
- `interest_rate`: The interest rate associated with the loan.
- `borrower_income`: The annual income of the borrower.
- `debt_to_income`: The ratio of the borrower’s debt to their income.
- `num_of_accounts`: The number of financial accounts the borrower has.
- `derogatory_marks`: Number of derogatory marks (e.g., bankruptcies, missed payments).
- `total_debt`: The total debt the borrower has.
- `loan_status`: Target variable (1 = Approved, 0 = Not Approved).

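### Handling Missing Values
Missing values are filled with the column mean for numerical columns and the column mode for categorical columns.

```python
# Fill numeric gaps with each column's mean; numeric_only=True skips non-numeric columns.
df.fillna(df.mean(numeric_only=True), inplace=True)
# Fill any remaining (categorical) gaps with each column's most frequent value.
df.fillna(df.mode().iloc[0], inplace=True)
```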
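
As a small illustration of the mean/mode strategy, the sketch below imputes a toy frame column by column. The `employment_type` column is hypothetical and is included only to exercise the categorical branch; the dataset described above lists numeric features only.

```python
import pandas as pd

# Toy data for illustration; `employment_type` is a hypothetical categorical column.
df = pd.DataFrame({
    "borrower_income": [50000.0, None, 70000.0],
    "employment_type": ["salaried", None, "salaried"],
})

for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        # Numeric column: fill gaps with the column mean.
        df[col] = df[col].fillna(df[col].mean())
    else:
        # Non-numeric column: fill gaps with the most frequent value.
        df[col] = df[col].fillna(df[col].mode().iloc[0])

print(df)
```

Handling each column by dtype makes the intent explicit and avoids relying on the order of two whole-frame `fillna` calls.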


### Feature Scaling

The numerical features are standardized with StandardScaler so that each has zero mean and unit variance, which makes the data ready for machine learning algorithms.

```python
from sklearn.preprocessing import StandardScaler

numerical_cols = ['loan_size', 'interest_rate', 'borrower_income',
                  'debt_to_income', 'num_of_accounts', 'derogatory_marks',
                  'total_debt']

scaler = StandardScaler()
df[numerical_cols] = scaler.fit_transform(df[numerical_cols])
```
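### Splitting the Data
The dataset is split into training (80%) and testing (20%) sets, where `X` holds the feature columns and `y` the `loan_status` target:

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```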
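
One caveat with scaling before splitting is that the scaler sees the test rows, so test-set statistics leak into training. A minimal sketch of an alternative, assuming a scikit-learn `Pipeline` is acceptable for this project: the scaler is fitted on the training split only. The data comes from `make_classification` as a synthetic stand-in for the seven loan features, and `stratify=y` is an optional addition that keeps the class balance equal across splits.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the seven loan features (assumption: not the real dataset).
X, y = make_classification(n_samples=500, n_features=7, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The pipeline fits StandardScaler on X_train only, then reuses those
# statistics whenever it transforms X_test inside predict() or score().
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.2f}")
```

A side benefit is that the fitted pipeline can be serialized as one object, so the same scaling is applied automatically at prediction time.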


## 6. Model Building
### Logistic Regression

A Logistic Regression model is trained to predict whether a loan will be approved. After training, the model is evaluated on the test set.

```python
from sklearn.linear_model import LogisticRegression

log_classifier = LogisticRegression()
log_classifier.fit(X_train, y_train)
y_pred_log = log_classifier.predict(X_test)
```

Results:

- Training Accuracy: 0.85
- Testing Accuracy: 0.83
- Classification Report: includes precision, recall, and F1-score for each class.
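
The accuracy figures and the classification report can be computed with `sklearn.metrics`. The sketch below shows one way to do it; it runs on synthetic data from `make_classification`, so its numbers will not match the results reported above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the loan dataset (assumption: not the real data).
X, y = make_classification(n_samples=500, n_features=7, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

log_classifier = LogisticRegression()
log_classifier.fit(X_train, y_train)
y_pred_log = log_classifier.predict(X_test)

# Accuracy on the data the model saw versus held-out data.
train_acc = accuracy_score(y_train, log_classifier.predict(X_train))
test_acc = accuracy_score(y_test, y_pred_log)
print(f"Training accuracy: {train_acc:.2f}")
print(f"Testing accuracy:  {test_acc:.2f}")

# Per-class precision, recall, and F1; names follow the label order [0, 1].
report = classification_report(
    y_test, y_pred_log, target_names=["Not Approved", "Approved"]
)
print(report)
```
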
### Random Forest Classifier

A Random Forest Classifier is also used to predict loan approval. This model was trained with 500 estimators and evaluated on the test set.

```python
from sklearn.ensemble import RandomForestClassifier

rf_model = RandomForestClassifier(random_state=42, n_estimators=500)
rf_model.fit(X_train, y_train)
y_pred_rf = rf_model.predict(X_test)
```

Results:

- Training Accuracy: 0.90
- Testing Accuracy: 0.87
- Classification Report: the Random Forest shows higher precision for both classes than Logistic Regression.
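
A single 80/20 split gives only one estimate of each model's accuracy. As a hedged sketch of a more robust comparison, the example below scores both models with 5-fold cross-validation. It is not part of the reported experiments: the data is synthetic, and `n_estimators` is reduced to 100 and `max_iter` raised to 1000 purely to keep the example quick and warning-free.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the loan dataset (assumption: not the real data).
X, y = make_classification(n_samples=500, n_features=7, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

cv_scores = {}
for name, model in models.items():
    # Each model is trained and scored on five different train/validation folds.
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    cv_scores[name] = scores
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```
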
## 7. Confusion Matrix and Feature Importance
### Confusion Matrix
For the Random Forest model, we plot the confusion matrix to visualize classification performance:

```python
import seaborn as sns
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred_rf)
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
```

The confusion matrix shows the counts of true positives, true negatives, false positives, and false negatives.
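
To show how the four cells relate to precision and recall, here is a small worked example. The labels are hypothetical and chosen for illustration only; they are not taken from the project's test set.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels, for illustration only (1 = Approved, 0 = Not Approved).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels [0, 1], scikit-learn lays the matrix out as
# [[TN, FP], [FN, TP]], so ravel() yields the four cells in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

precision = tp / (tp + fp)  # of the predicted approvals, the share that were correct
recall = tp / (tp + fn)     # of the actual approvals, the share that were found

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In a lending setting the two error types carry different costs: a false positive approves a risky loan, while a false negative turns away a creditworthy borrower.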

### Feature Importance
Using the Random Forest model, we computed feature importances to identify which features contribute most to the model's predictions. Importances measure how heavily the model relies on a feature, not the direction of its effect, so the directional notes below are interpretations.

```python
importances = rf_model.feature_importances_
feature_names = X.columns
# Pair each importance with its feature name and sort from most to least important.
ranked = sorted(zip(importances, feature_names), reverse=True)
```

Feature Importance Results:

- `interest_rate`: 33.9% - Most influential feature for approval.
- `borrower_income`: 16.8% - Important factor in predicting approvals.
- `total_debt`: 15.9% - High debt can lead to rejection.
- `debt_to_income`: 14.2% - A lower DTI ratio is favorable.
- `loan_size`: 13.7% - Larger loans may have a higher risk of rejection.
- `num_of_accounts`: 5.3% - Mild impact on approval chances.
- `derogatory_marks`: 0.01% - Least influential for the model.
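
A ranked table like the one above can be produced by wrapping the importances in a pandas `Series`. This is a sketch on synthetic data, so the percentages it prints are unrelated to the reported results; only the column names are taken from the dataset description.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

feature_names = ["loan_size", "interest_rate", "borrower_income",
                 "debt_to_income", "num_of_accounts", "derogatory_marks",
                 "total_debt"]

# Synthetic stand-in data (assumption); the values do not reflect real loans.
X_arr, y = make_classification(n_samples=500, n_features=7, random_state=42)
X = pd.DataFrame(X_arr, columns=feature_names)

# n_estimators is reduced from 500 to keep the example fast.
rf_model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

# Impurity-based importances are normalized to sum to 1 across features.
importance = pd.Series(rf_model.feature_importances_, index=X.columns)
importance = importance.sort_values(ascending=False)
print((importance * 100).round(1))
```

Because these importances are normalized, the percentages in any such table should total roughly 100%.
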
## 8. Deployment

The FastAPI app is deployed on Render; any comparable cloud platform, such as Heroku or Railway, could be used instead. The trained model is loaded and serves loan approval predictions through the `/predict` endpoint.

To run the app locally during development:

```shell
uvicorn main:app --reload
```

Request: Send a POST request to `/predict` with the following JSON body:

```json
{
  "loan_size": 4000.0,
  "interest_rate": 3.5,
  "borrower_income": 150000.0,
  "debt_to_income": 0.10,
  "num_of_accounts": 8,
  "derogatory_marks": 0,
  "total_debt": 2000.0
}
```

Response:

```json
{
  "prediction": "Approved",
  "approval_probability": 0.418
}
```
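
A client can send this request using only the Python standard library. The sketch below builds the POST request; the URL is an assumption (a server started locally with Uvicorn's default host and port), and the `send` helper is defined but not called so the example runs without a live server.

```python
import json
import urllib.request

# Assumption: the API runs locally on Uvicorn's default address.
API_URL = "http://127.0.0.1:8000/predict"

payload = {
    "loan_size": 4000.0,
    "interest_rate": 3.5,
    "borrower_income": 150000.0,
    "debt_to_income": 0.10,
    "num_of_accounts": 8,
    "derogatory_marks": 0,
    "total_debt": 2000.0,
}


def build_request(url: str, body: dict) -> urllib.request.Request:
    """Encode the body as JSON and wrap it in a POST request."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_request(API_URL, payload)
print(request.get_method(), request.full_url)
# With the server running: print(send(request))
```
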
## 9. Conclusion
The Loan Approval Prediction API uses machine learning to automate loan approval decisions. The Random Forest model outperforms Logistic Regression on test accuracy (0.87 vs. 0.83) and also provides feature importances, making it the more suitable choice for this problem.

This API can be deployed and used to make predictions in real-world applications, providing fast and reliable loan approval decisions. Future work can include retraining the model with a larger dataset, tuning hyperparameters, and further improving model accuracy.

## 10. Future Work
- Model Enhancement: Try gradient-boosting models such as XGBoost or LightGBM to improve accuracy.
- Hyperparameter Tuning: Use grid search or random search to optimize model parameters.
- More Training Data: Collect more data to improve model performance.
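
As a sketch of the hyperparameter tuning item, the example below runs a small grid search over a Random Forest with scikit-learn's `GridSearchCV`. The grid values, the 3-fold setting, and the synthetic data are illustrative assumptions, not settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the loan dataset (assumption: not the real data).
X, y = make_classification(n_samples=300, n_features=7, random_state=42)

# Illustrative grid; a real search would likely cover more values.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,                # 3-fold cross-validation per parameter combination
    scoring="accuracy",
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")
```

For larger grids, `RandomizedSearchCV` samples a fixed number of combinations and is usually cheaper than an exhaustive search.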