# Model Inference for Loan Approval Prediction

This notebook performs model inference using the trained XGBoost model (`xgb_pipeline_model.pkl`). The objective is to predict loan approval status (`Approved` or `Rejected`) for new, raw data representing loan applicants. The model was trained to assist banks in making informed loan approval decisions by evaluating features such as income, assets, CIBIL score, and employment status. The inference data is provided in its raw format, and the model's pipeline handles all preprocessing (e.g., outlier capping, scaling, encoding).

Dataset Context: The original dataset (`loan_approval_dataset.csv`) contains 11 features (after dropping `loan_id`): `no_of_dependents`, `education`, `self_employed`, `income_annum`, `loan_amount`, `loan_term`, `cibil_score`, `residential_assets_value`, `commercial_assets_value`, `luxury_assets_value`, and `bank_asset_value`. The target is `loan_status` (`Approved` or `Rejected`). For inference, we use a small sample of new data mimicking this structure.

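Objective: To run the trained model on new, unseen applicant data and interpret the predictions in the context of loan approval. With only three hand-made applicants, this is a sanity check of the inference workflow, not evidence that the model generalizes to real-world scenarios.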

## 1. Import Libraries

Import the libraries needed to load the model and handle the data.

In [1]:
import pandas as pd  # tabular input data
import joblib        # loads the saved pipeline


## 2. Load the Saved Model

Load the trained XGBoost model pipeline (`xgb_pipeline_model.pkl`) saved from the training notebook. The pipeline includes preprocessing steps (outlier capping, scaling, encoding) and the XGBoost classifier.
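The pipeline can accept raw values because each preprocessing step stores the parameters it learned during training and reuses them at inference time. The exact capping rule used in training is not stated in this notebook; the sketch below assumes a 1.5 × IQR rule and a hypothetical `IQRCapper` class that follows the scikit-learn `fit`/`transform` convention, purely to illustrate the idea.

```python
import pandas as pd


class IQRCapper:
    """Hypothetical outlier capper (assumed 1.5 * IQR rule), for illustration.

    fit() learns per-column bounds from training data; transform() clips new
    data to those stored bounds. Expects a DataFrame of numeric columns.
    """

    def __init__(self, factor=1.5):
        self.factor = factor

    def fit(self, X, y=None):
        q1 = X.quantile(0.25)
        q3 = X.quantile(0.75)
        iqr = q3 - q1
        self.lower_ = q1 - self.factor * iqr
        self.upper_ = q3 + self.factor * iqr
        return self

    def transform(self, X):
        # Bounds come from fit(), never from X, so a single new row is
        # treated exactly as it would have been during training
        return X.clip(lower=self.lower_, upper=self.upper_, axis=1)


# Toy training column: 100.0 is an outlier
train = pd.DataFrame({"income_annum": [1.0, 2.0, 3.0, 4.0, 100.0]})
capper = IQRCapper().fit(train)

# A new value far above the learned upper bound (7.0) is capped to it
print(capper.transform(pd.DataFrame({"income_annum": [50.0]})))
```

Scalers and encoders inside a pipeline work the same way: their means, variances, or category lists are fixed at fit time and saved with the pickled pipeline.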

In [2]:
# Load the saved model
model = joblib.load('xgb_pipeline_model.pkl')
print('Model loaded successfully.')

Model loaded successfully.
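`joblib.load` rebuilds the pipeline by importing the classes it references, so scikit-learn and xgboost must be installed, and any custom transformer class used in training (for example, a hand-written outlier capper, if there is one) must be importable under the same module path. Loading with library versions different from those used in training can fail or behave differently, and recent scikit-learn versions emit a warning in that case. A small sketch using only the standard library's `importlib.metadata` to list installed versions for comparison; the helper name `installed_versions` is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Compare these against the versions used in the training notebook
print(installed_versions(["scikit-learn", "xgboost", "joblib", "pandas"]))
```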


## 3. Prepare New Data

Create a small sample of new, raw data for inference. The data matches the original dataset’s structure: the same 11 feature columns, with the same names, excluding `loan_status`. Values are plausible for this dataset and unprocessed (no capping, scaling, or encoding), since the pipeline applies those steps itself.

In [3]:
# Define new data for inference (raw values)
new_data = pd.DataFrame({
    'no_of_dependents': [3, 1, 5],
    'education': ['Graduate', 'Not Graduate', 'Graduate'],
    'self_employed': ['No', 'Yes', 'Yes'],
    'income_annum': [5000000, 2500000, 9000000],
    'loan_amount': [12000000, 4000000, 25000000],
    'loan_term': [12, 8, 20],
    'cibil_score': [700, 400, 850],
    'residential_assets_value': [4000000, 1500000, 12000000],
    'commercial_assets_value': [1000000, 0, 4000000],
    'luxury_assets_value': [8000000, 3000000, 20000000],
    'bank_asset_value': [2000000, 800000, 6000000]
})

# Display the new data
print('New Data for Inference:')
new_data

New Data for Inference:


| | no_of_dependents | education | self_employed | income_annum | loan_amount | loan_term | cibil_score | residential_assets_value | commercial_assets_value | luxury_assets_value | bank_asset_value |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | Graduate | No | 5000000 | 12000000 | 12 | 700 | 4000000 | 1000000 | 8000000 | 2000000 |
| 1 | 1 | Not Graduate | Yes | 2500000 | 4000000 | 8 | 400 | 1500000 | 0 | 3000000 | 800000 |
| 2 | 5 | Graduate | Yes | 9000000 | 25000000 | 20 | 850 | 12000000 | 4000000 | 20000000 | 6000000 |
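Because the pipeline was fit on a specific set of feature columns, the inference frame should carry exactly those names. A minimal check, assuming the expected columns are the 11 listed in the dataset description; if the pipeline's first step recorded them during fitting, `list(model.feature_names_in_)` could supply the list instead of hard-coding it. `check_schema` is a hypothetical helper.

```python
import pandas as pd

# The 11 input features from the dataset description (loan_id and loan_status excluded)
EXPECTED_FEATURES = [
    "no_of_dependents", "education", "self_employed", "income_annum",
    "loan_amount", "loan_term", "cibil_score", "residential_assets_value",
    "commercial_assets_value", "luxury_assets_value", "bank_asset_value",
]


def check_schema(df, expected=EXPECTED_FEATURES):
    """Raise ValueError on missing or unexpected columns; otherwise return
    df with its columns in the expected order."""
    missing = [c for c in expected if c not in df.columns]
    extra = [c for c in df.columns if c not in expected]
    if missing or extra:
        raise ValueError(f"missing columns: {missing}; unexpected columns: {extra}")
    return df[list(expected)]
```

Calling `new_data = check_schema(new_data)` before predicting would surface a typo in a column name as a clear error instead of a failure deep inside the pipeline.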


## 4. Make Predictions

Use the loaded model to predict loan approval status. The pipeline automatically handles preprocessing. Convert numerical predictions (0 or 1) to categorical labels (`Rejected` or `Approved`) for interpretability.
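The conversion only works if the map matches how `loan_status` was encoded during training, which this notebook does not show. If scikit-learn's `LabelEncoder` was used, its codes follow the sorted order of the class names (stored in `classes_`, sorted as `np.unique` returns them), which would give `Approved` = 0 and `Rejected` = 1, the reverse of `{0: 'Rejected', 1: 'Approved'}`. A minimal sketch of deriving the map from the ordered classes instead of hard-coding it; `label_map_from_classes` is a hypothetical helper.

```python
import numpy as np


def label_map_from_classes(classes):
    """Build {integer code: label} from an encoder's ordered classes,
    such as a fitted LabelEncoder's `classes_` attribute."""
    return {code: str(label) for code, label in enumerate(classes)}


# np.unique sorts, mimicking the classes a LabelEncoder would learn
classes = np.unique(["Rejected", "Approved", "Approved"])
print(label_map_from_classes(classes))  # {0: 'Approved', 1: 'Rejected'}
```

If a fitted encoder was saved alongside the model (hypothetical name `label_encoder`), `label_encoder.inverse_transform(predictions)` performs the same conversion directly.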

In [4]:
# Make predictions (the pipeline preprocesses the raw features internally)
predictions = model.predict(new_data)

# Convert numerical predictions to categorical labels.
# This map must match the target encoding used in training (0 = Rejected, 1 = Approved)
label_map = {0: 'Rejected', 1: 'Approved'}
predictions_categorical = [label_map[int(pred)] for pred in predictions]

# Add predictions to a copy so new_data keeps only the model's input features
results = new_data.assign(predicted_loan_status=predictions_categorical)

# Display predictions
print('Predictions for New Data:')
results

Predictions for New Data:


| | no_of_dependents | education | self_employed | income_annum | loan_amount | loan_term | cibil_score | residential_assets_value | commercial_assets_value | luxury_assets_value | bank_asset_value | predicted_loan_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | Graduate | No | 5000000 | 12000000 | 12 | 700 | 4000000 | 1000000 | 8000000 | 2000000 | Approved |
| 1 | 1 | Not Graduate | Yes | 2500000 | 4000000 | 8 | 400 | 1500000 | 0 | 3000000 | 800000 | Rejected |
| 2 | 5 | Graduate | Yes | 9000000 | 25000000 | 20 | 850 | 12000000 | 4000000 | 20000000 | 6000000 | Approved |
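Hard labels hide how confident the model is. `XGBClassifier`, and a scikit-learn `Pipeline` ending in one, also expose `predict_proba` and `classes_`, so a per-applicant approval probability can be read from the column for the approved code. The sketch assumes, as in the label map above, that `1` encodes `Approved`; the helper name `approval_probability` is hypothetical.

```python
import numpy as np


def approval_probability(model, X, approved_code=1):
    """Return P(approved) for each row of X.

    `model` is a fitted classifier exposing `classes_` and `predict_proba`;
    `approved_code` is the assumed integer encoding of 'Approved'.
    """
    col = list(model.classes_).index(approved_code)  # column for that class
    return np.asarray(model.predict_proba(X))[:, col]
```

With the loaded pipeline, `approval_probability(model, new_data.drop(columns="predicted_loan_status", errors="ignore"))` would score the three sample applicants; probabilities close to 0.5 could be routed to manual review instead of being decided automatically.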


## 5. Interpretation of Results

The predictions show whether each applicant's loan is likely to be Approved or Rejected based on their features:
- High CIBIL Score Applicants: The applicants with CIBIL scores of 700 and 850, who also hold substantial assets, were predicted `Approved`, consistent with lower credit risk.
- Low CIBIL Score Applicants: The applicant with a CIBIL score of 400 was predicted `Rejected`. High loan amounts relative to income may also raise risk, but this sample does not isolate that effect.
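The loan-to-income idea can be checked against the three sample applicants with simple arithmetic (loan amount divided by annual income). The values are copied from the inference sample and its predictions; `loan_to_income` is an illustrative column name, not one of the 11 input features.

```python
import pandas as pd

# Relevant columns of the three sample applicants and their predicted labels
applicants = pd.DataFrame({
    "income_annum": [5000000, 2500000, 9000000],
    "loan_amount": [12000000, 4000000, 25000000],
    "cibil_score": [700, 400, 850],
    "predicted_loan_status": ["Approved", "Rejected", "Approved"],
})

# Loan amount as a multiple of annual income, rounded to 2 decimals
applicants["loan_to_income"] = (
    applicants["loan_amount"] / applicants["income_annum"]
).round(2)
print(applicants[["cibil_score", "loan_to_income", "predicted_loan_status"]])
```

The rejected applicant has the lowest ratio of the three (1.6, versus 2.4 and 2.78), so in this sample the rejection is not explained by loan size relative to income.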

Business Context: For a bank such as GrowEasy, these predictions can streamline loan approval decisions. Approved applicants can proceed to loan processing, while rejected applicants can receive feedback (e.g., to improve their credit score) that may help future applications. The model’s F1-score of 0.9841 on the held-out test data indicates strong performance on data resembling the training set; because F1 combines precision and recall, it penalizes both wrongly approved and wrongly rejected applicants. It does not guarantee the same accuracy for applicants whose profiles differ from the training data.
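F1 is the harmonic mean of precision and recall for whichever class the training notebook treated as positive, and can be written as `2*TP / (2*TP + FP + FN)`. A small sketch computing it from confusion-matrix counts; the counts below are made up for illustration and are not this model's results.

```python
def f1_from_counts(tp, fp, fn):
    """F1 score from confusion-matrix counts for the positive class."""
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only
print(round(f1_from_counts(tp=90, fp=10, fn=5), 4))  # 0.9231
```

True negatives do not enter the formula, which is why F1 is reported for a specific positive class.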

Insights:
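- The predictions are consistent with the model weighting `cibil_score` heavily, in line with EDA findings that higher scores correlated with approvals. The rejected applicant, however, also has the lowest income and assets, so three examples cannot isolate the effect of the score.
- Because preprocessing is part of the pipeline, raw inputs are transformed with the same parameters fitted during training, which avoids mismatches between training-time and inference-time preprocessing.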
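The weight the model places on `cibil_score` can be checked against the fitted model itself rather than inferred from three predictions. A minimal sketch that ranks features by importance; `rank_importances` is a hypothetical helper that takes plain names and scores so it runs without the saved model, and the example values are placeholders.

```python
import pandas as pd


def rank_importances(feature_names, importances, top=5):
    """Return the `top` features sorted by importance, highest first."""
    ranked = pd.Series(importances, index=list(feature_names), dtype=float)
    return ranked.sort_values(ascending=False).head(top)


# Placeholder names and scores, not taken from the trained model
print(rank_importances(["feature_a", "feature_b", "feature_c"], [0.1, 0.7, 0.2]))
```

Assuming the last pipeline step is the XGBoost classifier and every preprocessing step implements `get_feature_names_out` (most built-in scikit-learn transformers do in recent versions, custom ones may not), the inputs could come from `model[:-1].get_feature_names_out()` and `model[-1].feature_importances_`. Names produced by a `ColumnTransformer` may carry transformer-name prefixes, and XGBoost supports several importance types whose rankings can differ.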

Next Steps: Integrate the model into a web app for real-time predictions, allowing applicants to input details and receive instant approval decisions.
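The core of such an app is a function that turns one applicant's form input into a one-row DataFrame and returns a label. A minimal, framework-agnostic sketch; `predict_one` and its default label map (0 = Rejected, 1 = Approved, matching the conversion used earlier) are assumptions, and a real endpoint would also validate the input fields before calling the model.

```python
import pandas as pd

# Assumed target encoding, matching the label map used earlier in this notebook
DEFAULT_LABELS = {0: "Rejected", 1: "Approved"}


def predict_one(model, applicant, label_map=DEFAULT_LABELS):
    """Predict a label for one applicant given as {feature name: raw value}.

    `model` is any fitted estimator whose predict() accepts a DataFrame,
    such as the loaded pipeline, which applies its own preprocessing.
    """
    row = pd.DataFrame([applicant])  # one row; dict keys become column names
    code = int(model.predict(row)[0])
    return label_map[code]
```

For example, `predict_one(model, new_data.drop(columns="predicted_loan_status", errors="ignore").iloc[0].to_dict())` would score the first sample applicant; a web framework's request handler could call the same function with the submitted form fields.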