Step 6: Supervised Machine Learning Models


This notebook trains and compares supervised learning models
to predict loan approval outcomes based on historical data.

1. IMPORTS

In [26]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    confusion_matrix,
    classification_report,
    roc_auc_score
)

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

2. LOAD CLEANED DATA

In [27]:
df = pd.read_csv(
    r"E:/ALL Documents/LEVEL 6 Completed/Projects/Week 1 AI & ML & Linux/end-to-end-explainable-ai-system/data/processed/cleaned_data.csv"
)

X = df.drop(columns=["Loan_Status"])
y = df["Loan_Status"]


3. TRAIN / TEST SPLIT

In [28]:
X = pd.get_dummies(X, drop_first=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
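
The split above is random, so the share of approved loans can differ between the training and test sets. The following is a minimal sketch of a stratified split, assuming synthetic stand-in data (`X_demo`, `y_demo` are hypothetical and not part of this notebook). It passes `stratify` to `train_test_split` so both sets keep nearly the same class balance.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the loan data (assumption): 614 rows, about 65% positive labels.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(614, 4))
y_demo = (rng.random(614) < 0.65).astype(int)

# stratify=y_demo keeps the class proportions nearly equal in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

train_rate = y_tr.mean()
test_rate = y_te.mean()
print(f"positive rate: train={train_rate:.3f}, test={test_rate:.3f}")
```

Whether stratification changes the metrics reported later in this notebook has not been checked here.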

In [29]:
X_train.dtypes

ApplicantIncome        int64
CoapplicantIncome    float64
LoanAmount           float64
Credit_History

4. MODEL DEFINITIONS

In [30]:
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "SVM": SVC(probability=True),
    "KNN": KNeighborsClassifier(n_neighbors=5)
}

5. TRAIN & EVALUATE

In [31]:
results = {}

for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]

    results[name] = {
        "Confusion Matrix": confusion_matrix(y_test, y_pred),
        "ROC AUC": roc_auc_score(y_test, y_prob),
        "Report": classification_report(y_test, y_pred)
    }

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
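
The warning above suggests scaling the data. One way to do that is to wrap the estimator in a pipeline with `StandardScaler`, so the scaler is fitted on the training data only. This is a sketch under stated assumptions: the feature columns below are synthetic stand-ins with deliberately different magnitudes, not the notebook's actual data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (assumption, for illustration only).
rng = np.random.default_rng(0)
n = 500
income = rng.normal(5000, 2000, n)
loan = rng.normal(150, 50, n)
credit = rng.integers(0, 2, n).astype(float)
X_demo = np.column_stack([income, loan, credit])
y_demo = (credit + rng.normal(0, 0.5, n) > 0.5).astype(int)

# The scaler standardizes each column before the logistic regression sees it.
scaled_lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_lr.fit(X_demo, y_demo)

# make_pipeline names each step after its lowercased class name.
n_iter = int(scaled_lr.named_steps["logisticregression"].n_iter_[0])
print("iterations used:", n_iter)
```

The same pipeline object could replace the plain `LogisticRegression` entry in the `models` dictionary. The SVM and KNN entries could be wrapped the same way.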


In [32]:
# `model` still refers to the last estimator from the loop above (KNN); this refits it.
model.fit(X_train, y_train)

6. VIEW RESULTS

In [33]:
# Use `name` for the dictionary key so the fitted estimator in `model` is not overwritten.
for name, metrics in results.items():
    print(f"\n{name}")
    print("ROC AUC:", metrics["ROC AUC"])
    print(metrics["Report"])


Logistic Regression
ROC AUC: 0.7299418604651163
              precision    recall  f1-score   support

           0       0.95      0.42      0.58        43
           1       0.76      0.99      0.86        80

    accuracy                           0.79       123
   macro avg       0.85      0.70      0.72       123
weighted avg       0.83      0.79      0.76       123


Decision Tree
ROC AUC: 0.6405523255813954
              precision    recall  f1-score   support

           0       0.62      0.42      0.50        43
           1       0.73      0.86      0.79        80

    accuracy                           0.71       123
   macro avg       0.68      0.64      0.65       123
weighted avg       0.69      0.71      0.69       123


SVM
ROC AUC: 0.5392441860465116
              precision    recall  f1-score   support

           0       0.00      0.00      0.00        43
           1       0.65      1.00      0.79        80

    accuracy                           0.65       123
   

7. SAVING OUTPUT

In [42]:
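# Note: this writes the cleaned DataFrame `df` to CSV; despite the file name, no fitted model is saved.
df.to_csv("E:/ALL Documents/LEVEL 6 Completed/Projects/Week 1 AI & ML & Linux/end-to-end-explainable-ai-system/model/logistic_regression_model.csv", index=False)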
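
If the goal is to keep a fitted estimator rather than the data, one common option is `joblib`. The sketch below is illustrative only. It fits a small model on synthetic data and writes it to a temporary directory. The file name `logistic_regression_model.joblib` is a hypothetical choice, not a path used in this project.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Small synthetic problem (assumption) so the example is self-contained.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)
clf = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# Hypothetical output location inside a temporary directory.
model_path = Path(tempfile.mkdtemp()) / "logistic_regression_model.joblib"
joblib.dump(clf, model_path)

# Reload and confirm the restored model predicts the same labels.
restored = joblib.load(model_path)
same_predictions = bool((restored.predict(X_demo) == clf.predict(X_demo)).all())
print("restored model matches:", same_predictions)
```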

8. INTERPRETATION

Model Comparison Observations

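- Logistic Regression gives the strongest results of the models reported above (ROC AUC 0.73, accuracy 0.79), but it recalls only 42% of class 0, and the convergence warning suggests the features should be scaled
- The Decision Tree is easy to interpret but scores lower (ROC AUC 0.64, accuracy 0.71)
- SVM can capture complex boundaries and is less explainable; here it predicts class 1 for every test sample (recall 0.00 on class 0, ROC AUC 0.54), so its 0.65 accuracy only matches the share of class 1 in the test set (80 of 123)
- KNN performance depends heavily on feature scaling, and the features were not scaled in this notebook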
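
A majority-class baseline is a useful reference when reading the accuracy figures in section 6. The sketch below rebuilds the test labels from the support column of the reports (43 samples of class 0, 80 of class 1) and scores a `DummyClassifier` that always predicts the most frequent class. The baseline is fitted on those same labels purely to illustrate the arithmetic. The placeholder feature matrix is an assumption and carries no information.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Labels rebuilt from the support column of the reports: 43 zeros, 80 ones.
y_true = np.array([0] * 43 + [1] * 80)
X_placeholder = np.zeros((len(y_true), 1))  # this baseline ignores the features

# Always predict the most frequent class seen during fit (class 1 here).
baseline = DummyClassifier(strategy="most_frequent").fit(X_placeholder, y_true)
baseline_acc = accuracy_score(y_true, baseline.predict(X_placeholder))
print(f"majority-class accuracy: {baseline_acc:.2f}")
```

An accuracy close to this value, as in the SVM report, suggests the model has learned little beyond the class prior.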