# **Logistic Regression: Predicting Loan Approval**

## Scenario
You are a data scientist at a financial institution tasked with developing a model to predict loan approval status. The insights derived from this model will help streamline the loan application process, ensuring faster and more accurate decisions.

## Dataset Overview
The dataset includes the following columns:
- `Gender`: Gender of the applicant (`1` for male, `0` for female).
- `Married`: Marital status (`1` for married, `0` for unmarried).
- `Dependents`: Number of dependents the applicant has.
- `Education`: Educational status (`1` for graduate, `0` for not graduate).
- `Self_Employed`: Employment status (`1` for self-employed, `0` for not self-employed).
- `ApplicantIncome`: Income of the primary applicant.
- `CoapplicantIncome`: Income of the co-applicant.
- `LoanAmount`: Loan amount requested (in thousands).
- `Loan_Amount_Term`: Term of the loan (in months).
- `Credit_History`: Credit history of the applicant (`1` for clear, `0` for poor history).
- `Loan_Status`: Target variable; loan approval status (`1` for approved, `0` for not approved).

## Your Challenge
Build a logistic regression model to predict whether a loan application will be approved (`Loan_Status`).

## Why Logistic Regression?
Logistic regression is well-suited for this scenario because:
1. The target variable (`Loan_Status`) is binary (approved or not approved), which is the kind of classification problem logistic regression is designed for.
2. It allows us to assess the impact of various input features on the probability of loan approval.
3. It provides interpretable coefficients, enabling better understanding and communication of insights.
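
To make the link between coefficients and probabilities concrete, here is a minimal sketch of the logistic (sigmoid) function. The intercept and coefficient values are hypothetical, chosen only for illustration; they are not fitted to the loan data.

```python
import numpy as np

def sigmoid(z):
    """Map a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values for illustration only (not fitted to the loan data)
intercept = -3.0
coef_credit_history = 3.5

# Log-odds for an applicant with poor (0) vs clear (1) credit history,
# holding every other feature at zero
log_odds_poor = intercept + coef_credit_history * 0
log_odds_clear = intercept + coef_credit_history * 1

p_poor = sigmoid(log_odds_poor)
p_clear = sigmoid(log_odds_clear)

# exp(coefficient) is the factor by which the odds of approval are multiplied
# when the feature increases by one unit
odds_ratio = np.exp(coef_credit_history)

print(f"P(approved | poor credit history):  {p_poor:.3f}")
print(f"P(approved | clear credit history): {p_clear:.3f}")
print(f"Odds ratio for Credit_History:      {odds_ratio:.1f}")
```

The model is linear in the log-odds, so each coefficient has a fixed multiplicative effect on the odds, while its effect on the probability depends on where the applicant already sits on the curve.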

## Identifying Input and Target Variables
- **Input Variables**: `Gender`, `Married`, `Dependents`, `Education`, `Self_Employed`, `ApplicantIncome`, `CoapplicantIncome`, `LoanAmount`, `Loan_Amount_Term`, `Credit_History`.
- **Target Variable**: `Loan_Status`.



## Step 1: Import Libraries

In [39]:
import pandas as pd  # For data manipulation and analysis
import numpy as np  # For numerical operations
from sklearn.model_selection import train_test_split  # For splitting the dataset
from sklearn.linear_model import LogisticRegression  # Logistic regression model
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix  # Evaluation metrics

## Step 2: Load the Dataset

In [41]:
# Load the dataset
file_path = 'loan_data.csv'
data = pd.read_csv(file_path)

# Display the first few rows
print(data.head())


    Loan_ID  Gender  Married  Dependents  Education  Self_Employed  \
0  LP001003       1        1           1          1              0   
1  LP001005       1        1           0          1              1   
2  LP001006       1        1           0          0              0   
3  LP001008       1        0           0          1              0   
4  LP001013       1        1           0          0              0   

   ApplicantIncome  CoapplicantIncome  LoanAmount  Loan_Amount_Term  \
0             4583             1508.0       12800               360   
1             3000                0.0        6600               360   
2             2583             2358.0       12000               360   
3             6000                0.0       14100               360   
4             2333             1516.0        9500               360   

   Credit_History  Loan_Status  
0               1            0  
1               1            1  
2               1            1  
3               1   

## Step 3: Data Preparation

In [43]:
# Separate input features and target variable
X = data[[
    'Gender', 'Married', 'Dependents', 'Education', 'Self_Employed',
    'ApplicantIncome', 'CoapplicantIncome', 'LoanAmount',
    'Loan_Amount_Term', 'Credit_History',
]]  # Input variables
y = data['Loan_Status']  # Target variable

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
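
One optional refinement, not used in the split above, is to pass `stratify=y` so that the proportion of approved and not-approved loans is the same in the training and test sets. The sketch below uses a small synthetic DataFrame (`demo`, with made-up values) as a stand-in for the loan data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the loan data (values are made up for illustration)
rng = np.random.default_rng(42)
n = 300
demo = pd.DataFrame({
    "ApplicantIncome": rng.integers(1500, 10000, size=n),
    "Credit_History": rng.integers(0, 2, size=n),
    "Loan_Status": rng.choice([0, 1], size=n, p=[0.3, 0.7]),
})

X_demo = demo[["ApplicantIncome", "Credit_History"]]
y_demo = demo["Loan_Status"]

# stratify=y_demo keeps the approved/not-approved ratio the same in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

print("Approval rate, full data:", round(y_demo.mean(), 3))
print("Approval rate, train:    ", round(y_tr.mean(), 3))
print("Approval rate, test:     ", round(y_te.mean(), 3))
```

With an imbalanced target, a stratified split makes the test metrics less sensitive to which rows happen to land in the test set.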


## Step 4: Train the Model

In [45]:
# Initialise and train the logistic regression model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Display coefficients and intercept
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)

Coefficients: [[ 1.08307243e-01  1.73873508e-01 -3.31518171e-02  2.97222957e-01
   4.25276371e-01 -5.14464980e-05  1.10292206e-04  4.85393910e-05
   4.99770477e-04  3.50806871e+00]]
Intercept: [-3.03875858]


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
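
The warning above indicates that the solver stopped at the iteration limit before converging, so the printed coefficients may not be the final optimum. As the message suggests, scaling the features is one way to address this: the income and loan amount columns are thousands of times larger than the binary flags. The following is a minimal sketch of that approach using `StandardScaler` in a pipeline. It runs on synthetic data with made-up values, and the names `X_demo`, `y_demo`, and `scaled_model` are illustrative, not part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with features on very different scales (made up for illustration)
rng = np.random.default_rng(0)
n = 400
X_demo = pd.DataFrame({
    "ApplicantIncome": rng.integers(1500, 10000, size=n),
    "LoanAmount": rng.integers(3000, 20000, size=n),
    "Credit_History": rng.integers(0, 2, size=n),
})
# Make the label depend mostly on credit history, plus random noise
noise = rng.normal(0, 1, size=n)
y_demo = ((2.5 * X_demo["Credit_History"] - 1.0 + noise) > 0).astype(int)

# Standardize each feature to zero mean and unit variance, then fit the model
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_demo, y_demo)

# make_pipeline names each step after its lowercased class name
log_reg = scaled_model.named_steps["logisticregression"]
print("Iterations used:", log_reg.n_iter_[0])
print("Coefficients (per standard deviation):", log_reg.coef_.round(3))
```

Because the scaler lives inside the pipeline, calling `scaled_model.predict(...)` on new rows applies the same scaling that was learned from the training data. Note that coefficients fitted this way describe the effect of a one-standard-deviation change in each feature, not a one-unit change.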


## Step 5: Evaluate the Model

In [47]:
# Make predictions
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

# Confusion matrix and classification report
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))

Accuracy: 0.80
Confusion Matrix:
 [[14 16]
 [ 3 60]]
Classification Report:
               precision    recall  f1-score   support

           0       0.82      0.47      0.60        30
           1       0.79      0.95      0.86        63

    accuracy                           0.80        93
   macro avg       0.81      0.71      0.73        93
weighted avg       0.80      0.80      0.78        93



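### Result Interpretation
The model reaches an accuracy of 0.80 on the test set: it predicts the correct approval status for 74 of the 93 held-out applications. Accuracy alone hides a gap between the two classes, though. Recall for approved loans (class `1`) is 0.95, but recall for not-approved loans (class `0`) is only 0.47, meaning 16 of the 30 applications that were actually not approved were predicted as approved. For a lender, such false approvals may well be the costlier error, so the per-class metrics deserve as much attention as the headline accuracy.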
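
The figures in the classification report can be recomputed by hand from the confusion matrix in Step 5. The short sketch below does so with NumPy, relying on scikit-learn's convention that rows are actual classes and columns are predicted classes.

```python
import numpy as np

# Confusion matrix from Step 5: rows are actual classes, columns are predicted classes
cm = np.array([[14, 16],
               [3, 60]])

# For a binary problem, ravel() yields the cells in this order
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision_approved = tp / (tp + fp)  # of predicted approvals, how many were correct
recall_approved = tp / (tp + fn)     # of actual approvals, how many were found
recall_rejected = tn / (tn + fp)     # of actual rejections, how many were found

print(f"Accuracy:                 {accuracy:.2f}")
print(f"Precision (approved):     {precision_approved:.2f}")
print(f"Recall (approved):        {recall_approved:.2f}")
print(f"Recall (not approved):    {recall_rejected:.2f}")
```

These values reproduce the 0.80, 0.79, 0.95, and 0.47 shown in the report.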

## Step 6: Make Predictions with New Data

In [63]:
# Example new data for prediction
new_data = pd.DataFrame({
    'Gender': [1, 0],
    'Married': [1, 0],
    'Dependents': [1, 0],
    'Education': [0, 1],
    'Self_Employed': [1, 0],
    'ApplicantIncome': [10000, 8000],
    'CoapplicantIncome': [2000, 8],
    'LoanAmount': [14000, 3000],
    'Loan_Amount_Term': [360, 120],
    'Credit_History':[1,0],
})
new_data


   Gender  Married  Dependents  Education  Self_Employed  ApplicantIncome  \
0       1        1           1          0              1            10000
1       0        0           0          1              0             8000

   CoapplicantIncome  LoanAmount  Loan_Amount_Term  Credit_History
0               2000       14000               360               1
1                  8        3000               120               0


- **Individual 1 (Row 1):**  
  - **Gender:** Male (1)  
  - **Married:** Yes (1)  
  - **Dependents:** 1 dependent  
  - **Education:** Not Graduate (0)  
  - **Self Employed:** Yes (1)  
  - **Applicant Income:** 10,000  
  - **Coapplicant Income:** 2,000  
  - **Loan Amount:** 14,000  
  - **Loan Amount Term:** 360 months  
  - **Credit History:** Clear (1)  


- **Individual 2 (Row 2):**  
  - **Gender:** Female (0)  
  - **Married:** No (0)  
  - **Dependents:** No dependents (0)  
  - **Education:** Graduate (1)  
  - **Self Employed:** No (0)  
  - **Applicant Income:** 8,000  
  - **Coapplicant Income:** 8  
  - **Loan Amount:** 3,000  
  - **Loan Amount Term:** 120 months  
  - **Credit History:** Poor (0)  


In [65]:
# Predict the loan status for new data
predicted_classes = model.predict(new_data)
print("Predicted Classes:", predicted_classes)

Predicted Classes: [1 0]
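
Class labels alone do not show how confident the model is. A fitted `LogisticRegression` also exposes `predict_proba`, which returns the estimated probability of each class. The sketch below illustrates this on a small synthetic model; `demo_model`, `X_demo`, and `new_demo` are illustrative names with made-up data, not the notebook's fitted model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic training data (made up for illustration; not the loan dataset)
rng = np.random.default_rng(1)
n = 300
X_demo = pd.DataFrame({
    "Credit_History": rng.integers(0, 2, size=n),
    "Dependents": rng.integers(0, 4, size=n),
})
noise = rng.normal(0, 1, size=n)
y_demo = ((2.5 * X_demo["Credit_History"] - 1.0 + noise) > 0).astype(int)

demo_model = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# Two hypothetical applicants: clear vs poor credit history
new_demo = pd.DataFrame({"Credit_History": [1, 0], "Dependents": [1, 0]})

# predict_proba returns one row per applicant: [P(class 0), P(class 1)]
proba = demo_model.predict_proba(new_demo)
approval_proba = proba[:, 1]

# For a binary model, predict() corresponds to P(class 1) > 0.5
classes = (approval_proba > 0.5).astype(int)

print("Approval probabilities:", approval_proba.round(3))
print("Classes at 0.5 threshold:", classes)
```

In the notebook itself, the equivalent call would be `model.predict_proba(new_data)[:, 1]`. Working with probabilities also allows a threshold other than 0.5 when false approvals and false rejections carry different costs.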


## What's Next?

Up next, we will learn about Decision Trees, a powerful and intuitive algorithm used for both classification and regression tasks. Stay tuned! 