Credit_Score_Prediction_Machine_Learning_Project

Project Overview

This project automates the classification of customer credit scores into categories (e.g., Good, Standard, Poor). Using historical financial data and banking behavior, we built a machine learning pipeline that covers data cleaning, feature engineering, and multi-class classification.

Dataset Description

The models are trained on a financial dataset of 100,000 training records and evaluated against 50,000 test records.

Target Variable: Credit_Score

Key Features:

Financials: Annual Income, Monthly In-hand Salary, Outstanding Debt.
Banking Habits: Number of Bank Accounts, Number of Credit Cards, Interest Rate, Delay from due date.
Credit History: Credit History Age, Number of Credit Inquiries, Credit Mix.
Behavioral: Monthly Balance, Payment Behaviour, Amount Invested Monthly.

Data Preprocessing & Engineering

A significant portion of this project focuses on data quality and feature transformation:

Anomaly Detection: Cleaned structural errors in categorical fields like Occupation, SSN, and Payment_Behaviour using customer-specific mode imputation.
Numerical Cleaning: Handled "dirty" numerical strings (e.g., stripping non-numeric characters from Age and Annual_Income).
Feature Transformation: Converted Credit_History_Age from strings (e.g., "22 years and 1 month") into a float (22.01, with the month count in the decimal places) for model compatibility.
Imputation Strategy: Used median and mode values grouped by Customer_ID to preserve individual financial profiles.
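
The cleaning steps above can be sketched in a few lines. This is a minimal, dependency-free illustration rather than the notebook's actual code: the helper names (`clean_number`, `parse_history_age`, `impute_by_customer`), the sample rows, and the use of `None` as the missing-value marker are all assumptions made for the example. The age conversion follows the README's own example, where "22 years and 1 month" becomes 22.01.

```python
import re
from collections import Counter, defaultdict
from statistics import median


def clean_number(raw):
    """Strip non-numeric characters (e.g. '28_' -> 28.0); None if nothing usable remains."""
    cleaned = re.sub(r"[^0-9.\-]", "", str(raw))
    try:
        return float(cleaned)
    except ValueError:
        return None


def parse_history_age(text):
    """Turn 'N years and M months' into N + M/100, matching the README example (22.01)."""
    if not isinstance(text, str):
        return None
    match = re.fullmatch(
        r"\s*(\d+)\s+years?\s+and\s+(\d+)\s+months?\s*", text, flags=re.IGNORECASE
    )
    if match is None:
        return None
    years, months = (int(part) for part in match.groups())
    return round(years + months / 100, 2)


def impute_by_customer(rows, column, numeric):
    """Fill None values in `column` with the customer's own median (numeric) or mode."""
    observed = defaultdict(list)
    for row in rows:
        if row[column] is not None:
            observed[row["Customer_ID"]].append(row[column])
    for row in rows:
        values = observed.get(row["Customer_ID"])
        if row[column] is None and values:
            if numeric:
                row[column] = median(values)
            else:
                row[column] = Counter(values).most_common(1)[0][0]
    return rows


# Hypothetical sample rows for one customer; real column values will differ.
rows = [
    {"Customer_ID": "C1", "Age": clean_number("28_"), "Occupation": "Teacher"},
    {"Customer_ID": "C1", "Age": clean_number("_"), "Occupation": None},
    {"Customer_ID": "C1", "Age": clean_number("28"), "Occupation": "Teacher"},
]
impute_by_customer(rows, "Age", numeric=True)
impute_by_customer(rows, "Occupation", numeric=False)
history_age = parse_history_age("22 years and 1 month")
```

Grouping by Customer_ID before imputing means a missing value is filled from the same customer's other records rather than from a global statistic, which is the point of the imputation strategy described above. The same logic maps onto grouped operations in a dataframe library.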

Model Architecture

The project explores several ensemble learning techniques to find the most accurate classifier:

Random Forest Classifier
Gradient Boosting (GBM)
AdaBoost
XGBoost (Optimized for high performance)

Results & Evaluation

The models were evaluated using standard classification metrics:

Accuracy: Overall correctness of the prediction.
F1-Score: To ensure a balance between precision and recall across all credit classes.
Classification Report: Detailed breakdown of performance per class.
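
For a multi-class target, precision, recall, and F1 have to be averaged across classes. The README does not state which averaging was used; the sketch below assumes support-weighted averaging, under which recall is always equal to accuracy. The function name `weighted_metrics` and the toy labels are hypothetical.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1 for multi-class labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    total = len(y_true)
    support = Counter(y_true)      # true samples per class
    predicted = Counter(y_pred)    # predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, n in support.items():
        tp = true_pos[label]
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / n
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = n / total         # each class counts in proportion to its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    return {
        "accuracy": sum(true_pos.values()) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels, not taken from the project's data.
y_true = ["Good", "Good", "Standard", "Standard", "Standard", "Poor"]
y_pred = ["Good", "Standard", "Standard", "Standard", "Poor", "Poor"]
metrics = weighted_metrics(y_true, y_pred)
```

On the toy labels, four of six predictions are correct, so both accuracy and weighted recall come out to 4/6, while weighted precision is 0.75.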

Model Performance Evaluation

After data cleaning and feature engineering, four classification models were tested. XGBoost achieved the highest accuracy and F1-score (both 0.7470), while Random Forest reached the highest precision (0.7658) at a slightly lower accuracy.

Model	Accuracy	Precision	Recall	F1-Score
XGBoostClassifier	0.7470	0.7477	0.7470	0.7470
RandomForestClassifier	0.7386	0.7658	0.7386	0.7426
GradientBoostingClassifier	0.7239	0.7255	0.7239	0.7236
AdaBoostClassifier	0.6460	0.6474	0.6484	0.6450
