This project automates the classification of customer credit scores into categories (e.g., Good, Standard, Poor). Using historical financial data and banking behavior, we built a machine learning pipeline that handles data cleaning, feature engineering, and classification.
The models are trained on a financial dataset of 100,000 training records and 50,000 test records. Its features fall into four groups:
- Financials: Annual Income, Monthly In-hand Salary, Outstanding Debt.
- Banking Habits: Number of Bank Accounts, Number of Credit Cards, Interest Rate, Delay from Due Date.
- Credit History: Credit History Age, Number of Credit Inquiries, Credit Mix.
- Behavioral: Monthly Balance, Payment Behaviour, Amount Invested Monthly.
- Anomaly Detection: Cleaned structural errors in categorical fields like Occupation, SSN, and Payment_Behaviour using customer-specific mode imputation.
- Numerical Cleaning: Handled "dirty" numerical strings (e.g., stripping non-numeric characters from Age and Annual_Income).
- Feature Transformation: Converted Credit_History_Age from strings (e.g., "22 years and 1 month") into a continuous float format (22.01) for model compatibility.
- Imputation Strategy: Used median and mode values grouped by Customer_ID to preserve individual financial profiles.
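
The cleaning steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual code: the helper names (`clean_number`, `history_age_to_float`, `impute_by_customer`) and the sample records are made up for the example, and the age encoding follows the "22 years and 1 month" to 22.01 convention described above (years plus months divided by 100).

```python
import re
from collections import Counter, defaultdict
from statistics import median


def clean_number(raw):
    """Strip stray non-numeric characters, e.g. '23_' -> 23.0; None if nothing usable is left."""
    if raw is None:
        return None
    cleaned = re.sub(r"[^0-9.\-]", "", str(raw))
    try:
        return float(cleaned)
    except ValueError:
        return None


def history_age_to_float(text):
    """'22 years and 1 month' -> 22.01 (years + months / 100)."""
    match = re.search(r"(\d+)\s+years?\s+and\s+(\d+)\s+months?", str(text), re.IGNORECASE)
    if match is None:
        return None
    years, months = int(match.group(1)), int(match.group(2))
    return round(years + months / 100, 2)


def impute_by_customer(rows, key, column, how="median"):
    """Fill missing values in `column` from the same customer's known values (in place)."""
    known = defaultdict(list)
    for row in rows:
        if row[column] is not None:
            known[row[key]].append(row[column])
    for row in rows:
        values = known.get(row[key])
        if row[column] is None and values:
            if how == "median":
                row[column] = median(values)
            else:  # mode: most frequent value for this customer
                row[column] = Counter(values).most_common(1)[0][0]
    return rows


# Illustrative records only; the values are invented for this example.
records = [
    {"Customer_ID": "C1", "Age": "23", "Occupation": "Scientist"},
    {"Customer_ID": "C1", "Age": "23_", "Occupation": None},
    {"Customer_ID": "C1", "Age": None, "Occupation": "Scientist"},
    {"Customer_ID": "C2", "Age": "41", "Occupation": "Teacher"},
]

for row in records:
    row["Age"] = clean_number(row["Age"])

impute_by_customer(records, "Customer_ID", "Age", how="median")
impute_by_customer(records, "Customer_ID", "Occupation", how="mode")
```

Grouping by `Customer_ID` means a missing value is filled from that customer's own rows rather than from the whole dataset. Under the years-plus-hundredths encoding, 11 months maps to 0.11 rather than to a fraction of a year; dividing months by 12 would be an alternative if a proportional scale is preferred.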
- Random Forest Classifier
- Gradient Boosting (GBM)
- AdaBoost
- XGBoost (Optimized for high performance)
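
A comparison of these four model families could be set up roughly as follows. This is a sketch under stated assumptions: it uses synthetic three-class data from scikit-learn's `make_classification` in place of the credit dataset, the hyperparameters are arbitrary small values rather than the project's settings, and XGBoost (whose scikit-learn wrapper is `XGBClassifier`) is included only if the separate `xgboost` package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned data: 3 classes, analogous to Good / Standard / Poor.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "RandomForestClassifier": RandomForestClassifier(n_estimators=50, random_state=0),
    "GradientBoostingClassifier": GradientBoostingClassifier(n_estimators=50, random_state=0),
    "AdaBoostClassifier": AdaBoostClassifier(n_estimators=50, random_state=0),
}

try:  # xgboost is an optional, separately installed package
    from xgboost import XGBClassifier

    models["XGBClassifier"] = XGBClassifier(n_estimators=50, random_state=0)
except ImportError:
    pass

# Fit every model on the same split and record its held-out accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.4f}")
```

Training every model on the same split keeps the comparison like-for-like; the scores printed here come from synthetic data and say nothing about the credit dataset itself.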
- Accuracy: Overall correctness of the prediction.
- F1-Score: Harmonic mean of precision and recall, used to balance the two across all credit classes.
- Classification Report: Detailed breakdown of performance per class.
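
To make these metrics concrete, the sketch below computes accuracy and per-class precision, recall, and F1 from scratch, then averages them weighted by class support. The weighted averaging is an assumption (the source does not say which averaging was used), the function name `classification_summary` is hypothetical, and the labels in the example are invented.

```python
from collections import Counter


def classification_summary(y_true, y_pred):
    """Return (accuracy, support-weighted averages, per-class metrics)."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    total = len(y_true)
    pairs = list(zip(y_true, y_pred))

    per_class = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": support[label],
        }

    accuracy = sum(1 for t, p in pairs if t == p) / total
    # Weight each class's metric by how many true samples it has.
    weighted = {
        metric: sum(per_class[l][metric] * per_class[l]["support"] for l in labels) / total
        for metric in ("precision", "recall", "f1")
    }
    return accuracy, weighted, per_class


# Invented labels for illustration only.
y_true = ["Good", "Good", "Standard", "Standard", "Standard", "Poor", "Poor", "Poor"]
y_pred = ["Good", "Standard", "Standard", "Standard", "Poor", "Poor", "Poor", "Good"]

accuracy, weighted, per_class = classification_summary(y_true, y_pred)
print(f"accuracy={accuracy:.4f}")
for metric, value in weighted.items():
    print(f"weighted {metric}={value:.4f}")
```

With support-weighted averaging, the averaged recall is always equal to accuracy, so a results table built this way shows the same value in both columns.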
After data cleaning and feature engineering, four classification models were compared. XGBoost achieved the highest accuracy and F1-score (both 0.7470), while Random Forest had the highest precision (0.7658) at slightly lower accuracy (0.7386).
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| XGBoostClassifier | 0.7470 | 0.7477 | 0.7470 | 0.7470 |
| RandomForestClassifier | 0.7386 | 0.7658 | 0.7386 | 0.7426 |
| GradientBoostingClassifier | 0.7239 | 0.7255 | 0.7239 | 0.7236 |
| AdaBoostClassifier | 0.6460 | 0.6474 | 0.6484 | 0.6450 |