**Programmer: python_scripts (Abhijith Warrier)**

**PYTHON SCRIPT TO *PREDICT LOAN APPROVAL USING SUPERVISED MACHINE LEARNING ON TABULAR FINANCIAL DATA*. 🧠🏦📊**

This script demonstrates how machine learning is used in **banking and fintech** to predict whether a loan application should be approved. We model applicant risk using historical data and evaluate performance using business-relevant metrics.

---

## **📦 Install Required Packages**

**Install core ML and data processing libraries.**

In [None]:
pip install pandas numpy scikit-learn matplotlib seaborn

---

## **🧩 Load the Loan Dataset**

**We assume a UCI/Kaggle-style loan approval dataset in CSV format.**

In [None]:
import pandas as pd

df = pd.read_csv("datasets/loan_data.csv")   # typical loan dataset
df.head()

Common features include:

- applicant income
- co-applicant income
- loan amount
- credit history
- employment status
- target: loan approval, stored in `Loan_Status` (0 = rejected, 1 = approved)

---

## **🔍 Basic Data Inspection**

**Understand missing values and class distribution.**

In [None]:
print(df.info())
print(df["Loan_Status"].value_counts())

Loan datasets often contain missing values and mild class imbalance.
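
To put numbers on both points, it can help to count missing values per column and to read class proportions instead of raw counts. The sketch below runs on a small invented frame; the column names `ApplicantIncome` and `Credit_History` and all values are illustrative assumptions, not the real dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the loan data; every value here is invented.
toy = pd.DataFrame({
    "ApplicantIncome": [5000, 3000, np.nan, 4000, 6000],
    "Credit_History": [1.0, np.nan, 1.0, 0.0, np.nan],
    "Loan_Status": [1, 0, 1, 1, 0],
})

# Missing values per column, largest first.
missing_counts = toy.isna().sum().sort_values(ascending=False)
print(missing_counts)

# Proportions make class imbalance easier to read than raw counts.
class_share = toy["Loan_Status"].value_counts(normalize=True)
print(class_share)
```

On the real data, the same two calls would be applied to `df` instead of `toy`.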

---

## **🧹 Handle Missing Values**

**Fill missing numeric and categorical values.**

In [None]:
from sklearn.impute import SimpleImputer

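# Exclude the target from both lists so it is never imputed or encoded;
# errors="ignore" covers a target stored either as 0/1 or as text.
num_cols = df.select_dtypes(include="number").columns.drop("Loan_Status", errors="ignore")
cat_cols = df.select_dtypes(exclude="number").columns.drop("Loan_Status", errors="ignore")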

num_imputer = SimpleImputer(strategy="median")
cat_imputer = SimpleImputer(strategy="most_frequent")

df[num_cols] = num_imputer.fit_transform(df[num_cols])
df[cat_cols] = cat_imputer.fit_transform(df[cat_cols])
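
To see what the two strategies actually do, here is a minimal sketch on invented data. The column names `LoanAmount` and `Gender` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Invented rows with one gap in each column.
toy = pd.DataFrame({
    "LoanAmount": [100.0, np.nan, 300.0, 120.0],
    "Gender": pd.Series(["Male", "Female", np.nan, "Male"], dtype=object),
})

# Median of the observed amounts (100, 120, 300) fills the numeric gap.
num_filled = SimpleImputer(strategy="median").fit_transform(toy[["LoanAmount"]])

# The most common category fills the categorical gap.
cat_filled = SimpleImputer(strategy="most_frequent").fit_transform(toy[["Gender"]])

print(num_filled.ravel())
print(cat_filled.ravel())
```

The median is less sensitive than the mean to a few very large incomes or loan amounts, which is one reason it is a common default for financial columns.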

---

## **🔤 Encode Categorical Features**

**Convert categorical fields into numeric form.**

In [None]:
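from sklearn.preprocessing import LabelEncoder

# Fit one encoder per column and keep it, so each mapping can be reused or inverted later.
# Note: integer codes imply an ordering that nominal categories do not have.
encoders = {}
for col in cat_cols:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])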
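
Integer codes put categories in an arbitrary order, and a linear model will treat that order as meaningful. One common alternative is one-hot encoding. The sketch below uses `pd.get_dummies` on invented data; the column names `Property_Area` and `Married` are assumptions, and this is a swapped-in technique, not the encoding used in the cell above.

```python
import pandas as pd

# Invented categorical columns.
toy = pd.DataFrame({
    "Property_Area": ["Urban", "Rural", "Semiurban", "Urban"],
    "Married": ["Yes", "No", "Yes", "Yes"],
})

# One indicator column per category; drop_first=True removes one level
# per feature so the indicators are not perfectly collinear.
encoded = pd.get_dummies(
    toy,
    columns=["Property_Area", "Married"],
    drop_first=True,
    dtype=int,
)

print(encoded)
```

One-hot encoding widens the frame, so identifier-like columns with many unique values are usually dropped before applying it.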

---

## **✂️ Train/Test Split**

**Split features and labels with stratification.**

In [None]:
from sklearn.model_selection import train_test_split

X = df.drop("Loan_Status", axis=1)
y = df["Loan_Status"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.3,
    stratify=y,
    random_state=42
)
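
A quick way to confirm what `stratify=y` does is to compare the approval rate in each split. The sketch below uses 100 invented labels with a 70/30 class mix, not the loan data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Invented labels: 70 approved (1) and 30 rejected (0).
y_toy = np.array([1] * 70 + [0] * 30)
X_toy = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy,
    test_size=0.3,
    stratify=y_toy,
    random_state=42,
)

# With stratification, both splits keep the original approval rate.
train_rate = y_tr.mean()
test_rate = y_te.mean()
print(train_rate, test_rate)
```

Without stratification, a small test set can end up with a noticeably different class mix, which makes the metrics harder to compare across runs.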

---

## **🤖 Train a Loan Approval Model**

**We use Logistic Regression as a strong baseline for financial decisions.**

In [None]:
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=6260)
model.fit(X_train, y_train)

---

## **📊 Evaluate Model Performance**

**Evaluate predictions using classification metrics.**

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

y_pred = model.predict(X_test)

print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

Precision on the approved class is especially important: every false positive is a loan approved for an applicant who should have been rejected.
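
To make the link between the confusion matrix and precision concrete, here is a small worked example on invented labels, where 1 means approved and 0 means rejected.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score

# Invented ground truth and predictions.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 1, 0, 1])
y_hat = np.array([1, 1, 0, 1, 1, 0, 0, 1, 0, 1])

# For binary labels, ravel() yields counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

# Precision: of all predicted approvals, the share that were correct.
precision = tp / (tp + fp)
print(tn, fp, fn, tp, precision)

# Matches scikit-learn's own computation.
assert np.isclose(precision, precision_score(y_true, y_hat))
```

Here one of the six predicted approvals is wrong, so precision is 5/6.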

---

## **📈 ROC–AUC for Approval Decisions**

**Measure how well the model separates approved vs rejected applicants.**

In [None]:
from sklearn.metrics import roc_auc_score

y_proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, y_proba)

print("ROC–AUC:", auc)

ROC–AUC helps evaluate ranking quality independent of thresholds.
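
A small sketch of what "independent of thresholds" means in practice: the score depends only on how applicants are ordered, so any order-preserving transform of the probabilities leaves it unchanged. The labels and scores below are invented.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Invented labels and model scores.
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# 3 of the 4 (positive, negative) pairs are ranked correctly.
auc_raw = roc_auc_score(y_true, scores)

# Cubing changes every score but preserves their order.
auc_transformed = roc_auc_score(y_true, scores ** 3)

print(auc_raw, auc_transformed)
```

Both calls return the same value, which is why ROC–AUC says nothing about which cutoff to use for the final approve or reject decision.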

---

## **🧪 Why This Matters in Finance**

- Loan approval directly affects financial risk
- False positives are costly (bad loans)
- Interpretability is required for compliance
- Threshold tuning matters more than raw accuracy
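
The last point can be sketched as follows: instead of the default 0.5 cutoff, pick the lowest threshold that still meets a precision target, which keeps recall as high as possible. The labels, probabilities, and the `target_precision` value are all invented assumptions.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Invented labels and predicted approval probabilities.
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 1])
proba = np.array([0.10, 0.30, 0.35, 0.45, 0.50, 0.60, 0.65, 0.70, 0.80, 0.90])

precision, recall, thresholds = precision_recall_curve(y_true, proba)

# Hypothetical business requirement: at least 80% of approvals are good.
target_precision = 0.8

# precision has one more entry than thresholds; drop the final point.
meets_target = precision[:-1] >= target_precision

# Recall never increases with the threshold, so take the lowest one.
chosen_threshold = thresholds[meets_target].min()

# Apply the tuned cutoff in place of model.predict's default.
y_hat = (proba >= chosen_threshold).astype(int)
print(chosen_threshold, y_hat.sum())
```

In a real workflow the threshold would be chosen on a validation split, not on the test set used for the final report.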

---

## **Key Takeaways**

1. Loan approval prediction is a core fintech ML use case.
2. Handling missing and categorical data is critical.
3. Logistic Regression provides interpretable risk scores.
4. Precision and ROC–AUC matter more than accuracy.
5. ML assists decision-making; final approval rules still apply.

---