# **K-Nearest Neighbours (KNN): Predicting Loan Approval**

## Scenario
You are a data analyst at a financial institution tasked with developing a model to predict loan approval status. The insights derived from this model will help streamline the loan application process, ensuring faster and more accurate decisions.

## Dataset Overview
The dataset includes the following columns:
- `Gender`: Gender of the applicant (`1` for male, `0` for female).
- `Married`: Marital status (`1` for married, `0` for unmarried).
- `Dependents`: Number of dependents the applicant has.
- `Education`: Educational status (`1` for graduate, `0` for not graduate).
- `Self_Employed`: Employment status (`1` for self-employed, `0` for not self-employed).
- `ApplicantIncome`: Income of the primary applicant.
- `CoapplicantIncome`: Income of the co-applicant.
- `LoanAmount`: Loan amount requested (in thousands).
- `Loan_Amount_Term`: Term of the loan (in months).
- `Credit_History`: Credit history of the applicant (`1` for clear, `0` for poor history).
- `Loan_Status`: Target variable; loan approval status (`1` for approved, `0` for not approved).

## Your Challenge
Build a **K-Nearest Neighbours (KNN) model** to predict whether a loan application will be approved (`Loan_Status`).

## Why K-Nearest Neighbours?
K-Nearest Neighbours is well-suited for this scenario because:
1. It is a simple and intuitive algorithm that predicts outcomes based on the most similar data points (neighbours) in the dataset.
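2. It works with mixed categorical and numerical data once the categories are encoded as numbers, as they are in this dataset. Because predictions depend on distances, features with large numeric ranges (such as income) can outweigh binary ones.
3. It is non-parametric: it assumes no particular data distribution, so it can capture complex or non-linear relationships.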
4. It provides flexibility through the choice of the number of neighbours (`k`), allowing fine-tuning for better performance.
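
To make the idea in point 1 concrete, here is a minimal pure-Python sketch of the voting step, assuming Euclidean distance and a simple majority vote. The function name `knn_predict` and the toy points are hypothetical and are not taken from the loan dataset; scikit-learn's `KNeighborsClassifier` is more general than this.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Pair each training point's Euclidean distance to the query with its label
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Sort by distance and keep the k nearest neighbours
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbours' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two features per point, label 1 = approved, 0 = not approved
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9)]
labels = [1, 1, 1, 0, 0]

near_approved = knn_predict(points, labels, query=(1.1, 1.0), k=3)
near_rejected = knn_predict(points, labels, query=(5.1, 5.0), k=3)
print(near_approved, near_rejected)
```

The second query has only two class-0 points nearby, so its third neighbour comes from the class-1 cluster; the vote is still 2 to 1 in favour of class 0.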

## Identifying Input and Target Variables
- **Input Variables**: `Gender`, `Married`, `Dependents`, `Education`, `Self_Employed`, `ApplicantIncome`, `CoapplicantIncome`, `LoanAmount`, `Loan_Amount_Term`, `Credit_History`.
- **Target Variable**: `Loan_Status`.



## Step 1: Import Libraries

In [3]:
import pandas as pd  # For data manipulation and analysis
from sklearn.model_selection import train_test_split  # For splitting the dataset
from sklearn.neighbors import KNeighborsClassifier  # KNN model
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix  # Evaluation metrics

## Step 2: Load the Dataset

In [4]:
# Load the dataset
file_path = 'loan_data.csv'
data = pd.read_csv(file_path)

# Display the first few rows
print(data.head())


    Loan_ID  Gender  Married  Dependents  Education  Self_Employed  \
0  LP001003       1        1           1          1              0   
1  LP001005       1        1           0          1              1   
2  LP001006       1        1           0          0              0   
3  LP001008       1        0           0          1              0   
4  LP001013       1        1           0          0              0   

   ApplicantIncome  CoapplicantIncome  LoanAmount  Loan_Amount_Term  \
0             4583             1508.0       12800               360   
1             3000                0.0        6600               360   
2             2583             2358.0       12000               360   
3             6000                0.0       14100               360   
4             2333             1516.0        9500               360   

   Credit_History  Loan_Status  
0               1            0  
1               1            1  
2               1            1  
3               1   

## Step 3: Data Preparation

In [7]:
# Separate input features and target variable
X = data[[
    'Gender', 'Married', 'Dependents', 'Education', 'Self_Employed',
    'ApplicantIncome', 'CoapplicantIncome', 'LoanAmount',
    'Loan_Amount_Term', 'Credit_History',
]]  # Input variables
y = data['Loan_Status']  # Target variable

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
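
KNN compares applicants by distance, so it is worth checking how much each feature contributes to that distance before training. The sketch below uses two hypothetical applicants described only by (`ApplicantIncome`, `Credit_History`) and an assumed income range of 0 to 10,000; the `rescale` helper is illustrative and is not part of the notebook.

```python
import math

# Hypothetical applicants described by (ApplicantIncome, Credit_History)
query = (4500, 1)
same_history_diff_income = (5500, 1)  # same credit history, income differs by 1000
diff_history_same_income = (4500, 0)  # same income, opposite credit history

d_income = math.dist(query, same_history_diff_income)
d_history = math.dist(query, diff_history_same_income)
print(f"Income gap distance: {d_income:.1f}")
print(f"Credit-history gap distance: {d_history:.1f}")


def rescale(row, income_max=10_000):
    """Map income to [0, 1] using an assumed maximum; credit history is already 0/1."""
    income, history = row
    return (income / income_max, history)


d_income_scaled = math.dist(rescale(query), rescale(same_history_diff_income))
d_history_scaled = math.dist(rescale(query), rescale(diff_history_same_income))
print(f"Scaled income gap distance: {d_income_scaled:.2f}")
print(f"Scaled credit-history gap distance: {d_history_scaled:.2f}")
```

On the raw values the income difference is a thousand times larger than the credit-history difference, so credit history barely influences which neighbours are chosen. In scikit-learn, `StandardScaler` or `MinMaxScaler` from `sklearn.preprocessing` can be used for this kind of rescaling; whether it improves this particular model would need to be checked on the data.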


## Step 4: Train the Model

In [9]:
# Initialise and train the KNN model
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
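
The value `n_neighbors=5` is scikit-learn's default rather than a tuned choice. One common way to pick `k` is cross-validated grid search, sketched below. This is an optional variation, not the method used in this notebook: the data is a synthetic stand-in produced by `make_classification`, and the names `X_demo`, `pipeline`, and `search` are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the loan data (assumption: 10 numeric features, binary target)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

# Scale features, then classify; inside the search the scaler is fitted on training folds only
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Candidate values of k (odd numbers avoid tied votes in binary classification)
param_grid = {"knn__n_neighbors": [1, 3, 5, 7, 9, 11]}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_tr, y_tr)

best_k = search.best_params_["knn__n_neighbors"]
print("Best k:", best_k)
print(f"Cross-validated accuracy: {search.best_score_:.2f}")
print(f"Held-out accuracy: {search.score(X_te, y_te):.2f}")
```

To try this on the loan data, `X_train` and `y_train` from Step 3 would replace `X_tr` and `y_tr`.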

## Step 5: Evaluate the Model

In [11]:
# Make predictions
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

# Confusion matrix and classification report
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))

Accuracy: 0.67
Confusion Matrix:
 [[ 5 25]
 [ 6 57]]
Classification Report:
               precision    recall  f1-score   support

           0       0.45      0.17      0.24        30
           1       0.70      0.90      0.79        63

    accuracy                           0.67        93
   macro avg       0.57      0.54      0.52        93
weighted avg       0.62      0.67      0.61        93



### Result Interpretation 
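The model reaches an accuracy of 0.67: it predicts the correct loan status for 62 of the 93 test applications. Accuracy alone is misleading here, because 63 of the 93 test cases are approvals, so always predicting "approved" would score about 0.68. The model identifies approvals well (recall 0.90) but catches only 5 of the 30 rejected applications (recall 0.17), which means it is strongly biased towards approving loans.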
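
The figures in the classification report follow directly from the confusion matrix printed in Step 5. As a small worked sketch, the arithmetic can be reproduced by hand; the variable names are illustrative, and the matrix values are copied from the output above.

```python
# Confusion matrix from Step 5: rows are actual classes, columns are predicted classes
cm = [[5, 25],
      [6, 57]]

tn, fp = cm[0]  # actual 0: predicted 0, predicted 1
fn, tp = cm[1]  # actual 1: predicted 0, predicted 1
total = tn + fp + fn + tp

accuracy = (tn + tp) / total
recall_rejected = tn / (tn + fp)          # share of actual rejections that were caught
recall_approved = tp / (tp + fn)          # share of actual approvals that were caught
baseline = max(tn + fp, fn + tp) / total  # accuracy of always predicting the majority class

print(f"Accuracy: {accuracy:.2f}")                 # 62 / 93
print(f"Recall (class 0): {recall_rejected:.2f}")  # 5 / 30
print(f"Recall (class 1): {recall_approved:.2f}")  # 57 / 63
print(f"Majority-class baseline: {baseline:.2f}")  # 63 / 93
```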

## Step 6: Make Predictions with New Data

In [13]:
# Example new data for prediction
new_data = pd.DataFrame({
    'Gender': [1, 0],
    'Married': [1, 0],
    'Dependents': [1, 0],
    'Education': [0, 1],
    'Self_Employed': [1, 0],
    'ApplicantIncome': [10000, 8000],
    'CoapplicantIncome': [2000, 8],
    'LoanAmount': [14000, 3000],
    'Loan_Amount_Term': [360, 120],
    'Credit_History':[1,0],
})
new_data


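|   | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 0 | 1 | 10000 | 2000 | 14000 | 360 | 1 |
| 1 | 0 | 0 | 0 | 1 | 0 | 8000 | 8 | 3000 | 120 | 0 |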


- **Individual 1 (Row 1):**  
  - **Gender:** Male (1)  
  - **Married:** Yes (1)  
  - **Dependents:** 1 dependent  
  - **Education:** Not Graduate (0)  
  - **Self Employed:** Yes (1)  
  - **Applicant Income:** 10,000  
  - **Coapplicant Income:** 2,000  
  - **Loan Amount:** 14,000  
  - **Loan Amount Term:** 360 months  
  - **Credit History:** Clear (1)  


- **Individual 2 (Row 2):**  
  - **Gender:** Female (0)  
  - **Married:** No (0)  
  - **Dependents:** No dependents (0)  
  - **Education:** Graduate (1)  
  - **Self Employed:** No (0)  
  - **Applicant Income:** 8,000  
  - **Coapplicant Income:** 8  
  - **Loan Amount:** 3,000  
  - **Loan Amount Term:** 120 months  
  - **Credit History:** Poor (0)  


In [15]:
# Predict the loan status for new data
predicted_classes = model.predict(new_data)
print("Predicted Classes:", predicted_classes)

Predicted Classes: [1 1]
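
Both applicants are predicted as approved, including the one with a poor credit history. To see why KNN chose a class, it can help to look at the neighbours behind each prediction. The sketch below shows one way to do that with `predict_proba` and `kneighbors`; it runs on a small hypothetical dataset (`X_demo`, `y_demo`, `demo_model` are assumptions made for the example) rather than on the loan data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training set: 20 rows, 2 features, binary labels
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 2))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

demo_model = KNeighborsClassifier(n_neighbors=5).fit(X_demo, y_demo)

new_points = np.array([[1.0, 1.0], [-1.0, -1.0]])

# Fraction of the 5 neighbours voting for each class (columns follow demo_model.classes_)
probabilities = demo_model.predict_proba(new_points)

# Distances to, and row positions of, the 5 nearest training points
distances, indices = demo_model.kneighbors(new_points)

for i, point in enumerate(new_points):
    print("Point:", point)
    print("  Vote shares:", probabilities[i])
    print("  Neighbour labels:", y_demo[indices[i]])
    print("  Neighbour distances:", distances[i].round(2))
```

The same two calls should work on the notebook's `model` and `new_data`, which would show how many of the five nearest training applicants were approved for each new applicant.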


## What's Next?

Up next, we will learn about K-Means Clustering, a popular and effective algorithm used for unsupervised learning tasks. It groups data points into clusters based on their similarity, helping us uncover hidden patterns and relationships in the data. Stay tuned!