## 📊 Dataset Overview
| Column | Description |
| ------ | ----------- |
| Age | Applicant's age |
| Income | Annual income |
| LoanAmount | Requested loan amount |
| LoanTermMonths | Loan term duration (in months) |
| CreditScore | Credit score (range: 300–850) |
| Employed | Employment status: Yes or No |
| Approved | Target variable: 1 = Approved, 0 = Rejected |

## 🧠 Goal:
Use this dataset to demonstrate hyperparameter tuning with GridSearchCV on a tree-based model such as a Decision Tree or Random Forest, where the following parameters can be optimized:

* max_depth

* min_samples_split

* n_estimators (Random Forest only)
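The goal mentions n_estimators, which only applies to a Random Forest. As a minimal sketch of that variant, the block below runs the same kind of grid search on a RandomForestClassifier. The synthetic data from make_classification and the names X_demo, y_demo, rf_param_grid, and rf_search are assumptions for illustration; they stand in for the loan dataset and are not part of the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (an assumption; this is not the loan dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=42)

# Small illustrative grid: 2 x 2 x 2 = 8 candidate combinations
rf_param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [3, None],
    "min_samples_split": [2, 10],
}

rf_search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=rf_param_grid,
    cv=5,
    scoring="accuracy",
)
rf_search.fit(X_demo, y_demo)

print("Candidates tried:", len(rf_search.cv_results_["params"]))
print("Best parameters:", rf_search.best_params_)
```

The grid is kept deliberately small because every candidate is fitted once per fold, so the cost grows with the product of all value lists.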

## ✅ Full Python Code: GridSearchCV for Decision Tree

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# Load the dataset
df = pd.read_csv("loan_approval_dataset.csv")
df.head(10)


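|   | Age | Income | LoanAmount | LoanTermMonths | CreditScore | Employed | Approved |
| - | --- | ------ | ---------- | -------------- | ----------- | -------- | -------- |
| 0 | 63  | 110798 | 47327      | 12             | 742         | Yes      | 0        |
| 1 | 20  | 32214  | 38744      | 60             | 585         | No       | 0        |
| 2 | 46  | 56937  | 28754      | 60             | 531         | Yes      | 1        |
| 3 | 52  | 100849 | 49974      | 24             | 763         | No       | 1        |
| 4 | 56  | 39006  | 19543      | 12             | 668         | No       | 1        |
| 5 | 35  | 38115  | 35325      | 48             | 466         | No       | 1        |
| 6 | 37  | 70827  | 3747       | 12             | 840         | Yes      | 0        |
| 7 | 60  | 63100  | 48956      | 12             | 665         | Yes      | 0        |
| 8 | 40  | 118296 | 24327      | 12             | 671         | No       | 1        |
| 9 | 51  | 112619 | 21550      | 12             | 475         | No       | 1        |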


In [3]:
df.shape

(200, 7)

In [4]:
# Features and target
X = df.drop("Approved", axis=1)
y = df["Approved"]

# Categorical and numerical columns
categorical = ["Employed"]
numerical = ["Age", "Income", "LoanAmount", "LoanTermMonths", "CreditScore"]

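# Preprocessing: one-hot encode the categorical column;
# the numerical columns pass through unchanged (remainder="passthrough")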
preprocessor = ColumnTransformer(
    transformers=[("cat", OneHotEncoder(drop="first"), categorical)],
    remainder="passthrough"
)

# Full modeling pipeline
pipeline = Pipeline([
    ("preprocessing", preprocessor),
    ("model", DecisionTreeClassifier(random_state=42))
])

# Define hyperparameter grid
param_grid = {
    "model__max_depth": [3, 5, 7, None],
    "model__min_samples_split": [2, 5, 10],
    "model__criterion": ["gini", "entropy"]
}

# GridSearchCV
grid_search = GridSearchCV(
    estimator=pipeline,
    param_grid=param_grid,
    cv=5,
    scoring="accuracy",
    verbose=1,
    n_jobs=-1
)

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the model
grid_search.fit(X_train, y_train)

# Evaluate the best model
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)

# Output best params and performance
print("Best Hyperparameters:", grid_search.best_params_)
print("\nClassification Report:\n", classification_report(y_test, y_pred))


Fitting 5 folds for each of 24 candidates, totalling 120 fits
Best Hyperparameters: {'model__criterion': 'gini', 'model__max_depth': 5, 'model__min_samples_split': 10}

Classification Report:
               precision    recall  f1-score   support

           0       0.33      0.07      0.12        14
           1       0.65      0.92      0.76        26

    accuracy                           0.62        40
   macro avg       0.49      0.50      0.44        40
weighted avg       0.54      0.62      0.54        40



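## 🔍 What This Code Shows:
* How to search over hyperparameters (like max_depth, min_samples_split, criterion) that affect model accuracy.

* How to use GridSearchCV for exhaustive tuning using 5-fold cross-validation (24 candidates, 120 fits in total).

* A classification report to evaluate the best-found model. Here it reaches 0.62 accuracy on the 40 test rows, but recall for class 0 (rejected) is only 0.07, so the tuned tree predicts approval for almost every applicant.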
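To see how each hyperparameter combination scored, rather than only the winner, the fitted search object exposes a cv_results_ dictionary. The sketch below loads it into a DataFrame and ranks the candidates. It uses synthetic data from make_classification and a smaller grid than the notebook, and the names X_demo, y_demo, search, and summary are assumptions for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; this is not the loan dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=0)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"max_depth": [3, 5, None], "criterion": ["gini", "entropy"]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_demo, y_demo)

# One row per candidate: its parameters, mean/std of the 5 fold scores, and rank
results = pd.DataFrame(search.cv_results_)
summary = results[
    ["param_max_depth", "param_criterion",
     "mean_test_score", "std_test_score", "rank_test_score"]
].sort_values("rank_test_score")

print(summary.to_string(index=False))
```

Comparing mean_test_score against std_test_score helps judge whether the best candidate is meaningfully better than the others or within fold-to-fold noise.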

## 🧠 What Happens When You Predict:
* Each new customer goes through the same preprocessing step (the fitted OneHotEncoder on Employed) as the training data, because the encoder is part of the pipeline.

* The trained Decision Tree (with the best hyperparameters found by GridSearchCV) makes a prediction.

* The model returns 1 for approved, 0 for rejected.
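The steps above can be sketched on a toy example. The block below fits a small pipeline of the same shape, then shows the encoded form of one new row and the resulting prediction. The randomly generated toy data, the rule used for its target (CreditScore above 600), and the names toy, toy_target, toy_pipeline, and new_row are all assumptions made up for illustration; none of it comes from the loan dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Made-up toy data (an assumption; this is not the loan dataset)
rng = np.random.default_rng(0)
n = 100
toy = pd.DataFrame({
    "CreditScore": rng.integers(300, 851, size=n),
    "Employed": rng.choice(["Yes", "No"], size=n),
})
# Artificial target rule, chosen only so the tree has something to learn
toy_target = (toy["CreditScore"] > 600).astype(int)

toy_pipeline = Pipeline([
    ("preprocessing", ColumnTransformer(
        transformers=[("cat", OneHotEncoder(drop="first"), ["Employed"])],
        remainder="passthrough",
    )),
    ("model", DecisionTreeClassifier(max_depth=3, random_state=42)),
])
toy_pipeline.fit(toy, toy_target)

new_row = pd.DataFrame([{"CreditScore": 700, "Employed": "Yes"}])

# Step 1: the fitted preprocessor encodes the new row (encoded column first)
encoded = toy_pipeline.named_steps["preprocessing"].transform(new_row)
print("Encoded row:", encoded)

# Steps 2 and 3: predict() repeats that encoding internally, then asks the tree
print("Prediction:", toy_pipeline.predict(new_row))
```

Because the encoder lives inside the pipeline, new data is passed in raw form with the original column names; encoding it by hand first would apply the transformation twice.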

## ✅ Full Working Example: New Customers to Score

| Customer | Age | Income | LoanAmount | LoanTermMonths | CreditScore | Employed |
| -------- | --- | ------ | ---------- | -------------- | ----------- | -------- |
| A        | 30  | 50,000 | 10,000     | 36             | 700         | Yes      |
| B        | 47  | 63,100 | 20,000     | 60             | 650         | No       |
| C        | 60  | 63,100 | 90,000     | 60             | 840         | No       |
| D        | 25  | 30,000 | 5,000      | 24             | 720         | Yes      |

## ✅ Code to Predict Loan Approval for These Customers:
Use the best pipeline trained earlier (best_model) to predict their approval status:

In [6]:
import pandas as pd

# Define the 4 new customers as a DataFrame
new_customers = pd.DataFrame([
    {"Age": 30, "Income": 50000, "LoanAmount": 10000, "LoanTermMonths": 36, "CreditScore": 700, "Employed": "Yes"},
    {"Age": 47, "Income": 63100, "LoanAmount": 20000, "LoanTermMonths": 60, "CreditScore": 650, "Employed": "No"},
    {"Age": 60, "Income": 63100, "LoanAmount": 90000, "LoanTermMonths": 60, "CreditScore": 840, "Employed": "No"},
    {"Age": 25, "Income": 30000, "LoanAmount": 5000,  "LoanTermMonths": 24, "CreditScore": 720, "Employed": "Yes"}
])

# Use the trained best model pipeline (from GridSearchCV)
predictions = best_model.predict(new_customers)

# Display predictions
for i, prediction in enumerate(predictions, 1):
    result = "Approved ✅" if prediction == 1 else "Rejected ❌"
    print(f"Customer {chr(64+i)}: {result}")


Customer A: Approved ✅
Customer B: Rejected ❌
Customer C: Rejected ❌
Customer D: Approved ✅
