# **Project:** **Credit Card Score Prediction with Machine Learning**

## **Project Type:** Classification
### **Contribution**: Individual

# **Project Summary:**


### **About the Data:**
* Banks and credit card companies calculate a credit score to assess each customer's creditworthiness, which lets them quickly decide whether to issue loans to customers with good creditworthiness. Today, many banks and credit card companies use machine learning algorithms to classify the customers in their databases based on their credit history.

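* The project focuses on predicting credit scores with a multiclass classification approach. It uses a dataset of 150,000 records (100,000 in the training set and 50,000 in the test set). The training set has 28 columns: 27 descriptive columns plus the `Credit_Score` target, which the test set omits.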

* The dataset is already split into training and test sets, and the analysis follows the data science lifecycle. The primary goal is to explore the data, preprocess it, engineer meaningful features, and build a robust predictive model that assigns each individual to the correct credit score category.

* The project also aims to derive actionable insights into factors influencing credit scores, ensuring the model's practical relevance and interpretability.

## **Project Tasks:**

### **1. Data Understanding:**
* Become familiar with the dataset's structure, size, and feature types.
* Identify the target variable and its unique classes.
* Assess data quality, including missing values, outliers, and data distributions.
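The per-column part of this data-quality check can be summarized in a single table. The following is a minimal sketch assuming pandas. `data_quality_report` is a hypothetical helper name, and the toy frame uses made-up values under a few of this dataset's column names.

```python
import numpy as np
import pandas as pd


def data_quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column dtype, missing-value count/percentage and number of distinct values."""
    report = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(2),
        "n_unique": df.nunique(dropna=True),
    })
    # Most incomplete columns first
    return report.sort_values("n_missing", ascending=False)


# Toy frame (made-up values) reusing a few column names from the dataset
toy = pd.DataFrame({
    "Age": [19, 42, 43, 35],
    "Monthly_Inhand_Salary": [np.nan, 2808.59, 3664.41, 2667.94],
    "Credit_Mix": ["Bad", "Good", np.nan, "Good"],
    "Credit_Score": ["Poor", "Good", "Poor", "Standard"],
})
report = data_quality_report(toy)
print(report)
```

On the real data this would be called as `data_quality_report(train_cs)` once the files are loaded.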

### **2. Data Preprocessing**
**Training Data:**
* Handle missing values using appropriate imputation techniques.
* Detect and manage outliers based on statistical thresholds.
* Normalize or scale numerical features.
* Encode categorical variables effectively.
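One common statistical threshold is Tukey's IQR rule: values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR are treated as outliers. The sketch below caps (winsorizes) such values rather than dropping rows, which is one option among several. It learns the bounds on the training frame only, so the same bounds can later be applied to the test frame. The helper names are hypothetical and the toy values are made up, except that `4052` mirrors the `Interest_Rate` value visible in the training sample shown below.

```python
import pandas as pd


def iqr_bounds(s: pd.Series, k: float = 1.5) -> tuple:
    """Tukey's fences: (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def fit_clip_bounds(train: pd.DataFrame, cols, k: float = 1.5) -> dict:
    # Thresholds are learned from the training data only
    return {c: iqr_bounds(train[c], k) for c in cols}


def apply_clip(df: pd.DataFrame, bounds: dict) -> pd.DataFrame:
    # Cap values outside the fences instead of dropping rows
    out = df.copy()
    for c, (lo, hi) in bounds.items():
        out[c] = out[c].astype(float).clip(lower=lo, upper=hi)
    return out


train = pd.DataFrame({"Interest_Rate": [19, 8, 1, 3, 4052, 10, 5, 7]})
test = pd.DataFrame({"Interest_Rate": [7, 25, 5000]})

bounds = fit_clip_bounds(train, ["Interest_Rate"])
print(bounds)
print(apply_clip(test, bounds)["Interest_Rate"].tolist())  # [7.0, 23.875, 23.875]
```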

**Test Data:**
* Apply preprocessing transformations learned from the training data to ensure consistency.
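Reusing the training-set transformations on the test set maps naturally onto scikit-learn's fit/transform split. `fit_transform` learns imputation values, scaling parameters and category levels from the training data, and `transform` reuses them on the test data without refitting. The following is a minimal sketch assuming scikit-learn. The column choice, imputation strategies and toy values are illustrative assumptions, not this project's final preprocessing.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

num_cols = ["Annual_Income", "Outstanding_Debt"]
cat_cols = ["Occupation"]

preprocess = ColumnTransformer(
    [
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median")),  # fill gaps with training medians
            ("scale", StandardScaler()),                   # training mean/std
        ]), num_cols),
        ("cat", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("onehot", OneHotEncoder(handle_unknown="ignore")),  # unseen test categories -> all zeros
        ]), cat_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)

# Toy frames (illustrative values only)
train = pd.DataFrame({
    "Annual_Income": [19397.6, 35263.09, np.nan, 106520.68],
    "Outstanding_Debt": [1285.51, 856.33, 317.73, 653.6],
    "Occupation": ["Developer", "Developer", "Lawyer", np.nan],
})
test = pd.DataFrame({
    "Annual_Income": [28397.31, np.nan],
    "Outstanding_Debt": [1398.33, 209.63],
    "Occupation": ["Writer", "Lawyer"],
})

X_train = preprocess.fit_transform(train)  # learn statistics on training data
X_test = preprocess.transform(test)        # reuse them; nothing is refit here
print(X_train.shape, X_test.shape)         # (4, 4) (2, 4)
```

With `handle_unknown="ignore"`, a category seen only in the test set (here `Writer`) is encoded as all zeros instead of raising an error.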

### **3. Exploratory Data Analysis (EDA)**
**Perform comprehensive EDA on the training dataset:**
* Analyze feature distributions and relationships.
* Study the distribution and balance of the target variable.
* Visualize correlations and dependencies between features.
* Identify trends and patterns to guide feature engineering.
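Here is a minimal numeric sketch of these EDA steps, assuming pandas and a made-up toy frame that reuses a few column names. It computes class balance with `value_counts(normalize=True)`, pairwise Pearson correlations with `DataFrame.corr`, and per-class feature means with `groupby`.

```python
import pandas as pd

# Made-up toy frame using a few of the dataset's column names
df = pd.DataFrame({
    "Outstanding_Debt": [1285.5, 856.3, 317.7, 88.5, 653.6, 1398.3],
    "Delay_from_due_date": [49, 5, 22, 6, 17, 30],
    "Credit_Utilization_Ratio": [34.4, 25.1, 30.6, 35.4, 29.1, 39.0],
    "Credit_Score": ["Poor", "Good", "Poor", "Standard", "Standard", "Poor"],
})

# Target balance: share of rows in each class
balance = df["Credit_Score"].value_counts(normalize=True)
print(balance)

# Pearson correlations between numeric features
corr = df.select_dtypes("number").corr()
print(corr.round(2))

# Average of each numeric feature per credit-score class
class_means = df.groupby("Credit_Score").mean()
print(class_means.round(1))
```

For the visual version, the `corr` matrix can be passed to `sns.heatmap` (seaborn is already imported in this notebook).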

### **4. Feature Engineering**
* Engineer new features or transformations to enhance predictive power.
* Select key features based on statistical methods, correlation analysis, and domain knowledge.
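To illustrate both bullets, the sketch below adds two ratio features and then runs a univariate selection step. The ratio features `Debt_to_Income` and `EMI_to_Salary` are hypothetical examples, not features confirmed by this project. The toy rows reuse values from two of the training samples shown below. The selection step uses scikit-learn's `SelectKBest` with the ANOVA F-test (`f_classif`) on synthetic data from `make_classification`; it is one statistical method among several.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif


def add_ratio_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical engineered features; zero denominators become NaN."""
    out = df.copy()
    out["Debt_to_Income"] = out["Outstanding_Debt"] / out["Annual_Income"].replace(0, np.nan)
    out["EMI_to_Salary"] = out["Total_EMI_per_month"] / out["Monthly_Inhand_Salary"].replace(0, np.nan)
    return out


toy = pd.DataFrame({
    "Annual_Income": [35263.09, 41896.88],
    "Monthly_Inhand_Salary": [2808.590833, 3664.406667],
    "Outstanding_Debt": [856.33, 317.73],
    "Total_EMI_per_month": [41.677359, 132.318109],
})
engineered = add_ratio_features(toy)
print(engineered[["Debt_to_Income", "EMI_to_Salary"]].round(4))

# Univariate selection: keep the k features with the highest ANOVA F-statistic
X, y = make_classification(n_samples=500, n_features=10, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=0)
selector = SelectKBest(score_func=f_classif, k=3).fit(X, y)
print("selected columns:", selector.get_support(indices=True))
X_selected = selector.transform(X)
```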

### **5. Model Building**
* Experiment with suitable multiclass classification models (e.g., Logistic Regression, Random Forest, Gradient Boosting).
* Evaluate and optimize models through hyperparameter tuning.

### **6. Model Evaluation**
* Assess model performance using metrics appropriate for multiclass classification:
  * Accuracy
  * Precision, Recall, F1-score (macro/micro/weighted)
  * Confusion Matrix
  * ROC-AUC for multiclass classification (one-vs-rest or one-vs-one), for models that output class probabilities.
* Compare performance across models and select the most effective one.
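The metrics listed above map directly onto `sklearn.metrics`. In this sketch the labels, predictions and probabilities are made up. Multiclass `roc_auc_score` needs per-class probabilities (for example from `predict_proba`). Their columns must follow the sorted label order, and each row must sum to 1.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, f1_score, roc_auc_score)

labels = ["Good", "Poor", "Standard"]  # sorted order, as scikit-learn expects
y_true = np.array(["Good", "Poor", "Standard", "Standard", "Poor", "Good", "Standard", "Poor"])
y_pred = np.array(["Good", "Poor", "Standard", "Poor", "Poor", "Standard", "Standard", "Poor"])

acc = accuracy_score(y_true, y_pred)
print("accuracy:", acc)  # 0.75

f1s = {avg: f1_score(y_true, y_pred, average=avg) for avg in ("macro", "micro", "weighted")}
print({k: round(float(v), 3) for k, v in f1s.items()})

cm = confusion_matrix(y_true, y_pred, labels=labels)  # rows = true, columns = predicted
print(cm)

print(classification_report(y_true, y_pred, labels=labels))

# Made-up class probabilities; columns follow `labels`, each row sums to 1
y_proba = np.array([
    [0.7, 0.1, 0.2],
    [0.1, 0.8, 0.1],
    [0.2, 0.1, 0.7],
    [0.1, 0.5, 0.4],
    [0.2, 0.6, 0.2],
    [0.3, 0.2, 0.5],
    [0.1, 0.2, 0.7],
    [0.1, 0.7, 0.2],
])
auc = roc_auc_score(y_true, y_proba, multi_class="ovr", labels=labels)
print("ROC-AUC (one-vs-rest, macro):", round(float(auc), 3))
```

For single-label multiclass problems, micro-averaged F1 equals accuracy, so the macro and weighted averages are usually the more informative pair.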

### **7. Insights and Deployment**
* Extract and document insights related to credit score drivers from feature importance analysis.
* Prepare the model for real-world application or integration.
* Present findings and recommendations in a structured, professional format.
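The sketch below covers two steps from this section. It ranks features by a Random Forest's impurity-based importances, then persists the fitted model with `joblib` so it can be reloaded for inference. It assumes scikit-learn and joblib and uses synthetic data. The names `feature_0` to `feature_5` and the model file name are placeholders, not the project's actual features or artifacts.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the preprocessed training data
X, y = make_classification(n_samples=400, n_features=6, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholder names

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances, highest first (they sum to 1)
importances = pd.Series(model.feature_importances_, index=feature_names).sort_values(ascending=False)
print(importances.round(3))

# Save the fitted model and load it back, as a deployment step might
model_path = os.path.join(tempfile.mkdtemp(), "credit_score_model.joblib")  # hypothetical file name
joblib.dump(model, model_path)
restored = joblib.load(model_path)
print("same predictions after reload:", (restored.predict(X) == model.predict(X)).all())
```

Impurity-based importances can favor high-cardinality features. `sklearn.inspection.permutation_importance` on a held-out set is a common cross-check.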

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

**Loading the Dataset**

In [4]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
train_cs = pd.read_csv('/content/drive/MyDrive/train.csv')
test_cs = pd.read_csv('/content/drive/MyDrive/test.csv')

pd.set_option('display.max_columns', None)



In [8]:
train_cs.sample(5)

Unnamed: 0,ID,Customer_ID,Month,Name,Age,SSN,Occupation,Annual_Income,Monthly_Inhand_Salary,Num_Bank_Accounts,Num_Credit_Card,Interest_Rate,Num_of_Loan,Type_of_Loan,Delay_from_due_date,Num_of_Delayed_Payment,Changed_Credit_Limit,Num_Credit_Inquiries,Credit_Mix,Outstanding_Debt,Credit_Utilization_Ratio,Credit_History_Age,Payment_of_Min_Amount,Total_EMI_per_month,Amount_invested_monthly,Payment_Behaviour,Monthly_Balance,Credit_Score
70621,0x1b3cb,CUS_0x61f8,June,Rothackerr,19,705-61-0374,Developer,19397.6,,10,6,19,9,"Personal Loan, Home Equity Loan, Payday Loan, ...",49,22,1.08,9.0,Bad,1285.51,34.432247,19 Years and 3 Months,Yes,136.357457,57.05404079208591,Low_spent_Large_value_payments,230.53516906244164,Poor
63746,0x18b84,CUS_0x9778,March,Gerry Shiht,42,548-90-8373,Developer,35263.09,2808.590833,2,3,8,2,"Payday Loan, and Personal Loan",5,3,8.85,1000.0,Good,856.33,25.059982,28 Years and 6 Months,No,41.677359,60.32297699984833,High_spent_Medium_value_payments,428.8587472449599,Good
10325,0x527f,CUS_0x5bf5,June,Karen Freifelda,43,031-67-3915,Lawyer,41896.88,3664.406667,5,7,1,4_,"Mortgage Loan, Home Equity Loan, Not Specified...",22,9,6.9,5.0,Good,317.73,30.617052,,No,132.318109,51.3313676748882,High_spent_Medium_value_payments,432.7911903764679,Poor
63160,0x18816,CUS_0xc1e8,January,Jasonc,35,501-97-0944,Developer,35255.26,2667.938333,2,3,3,4,"Credit-Builder Loan, Auto Loan, Student Loan, ...",6,6,11.24,0.0,Good,88.52,35.406354,29 Years and 8 Months,No,76.879734,,Low_spent_Medium_value_payments,378.6939662407752,Standard
64733,0x1914b,CUS_0xbae9,June,Schombergu,31,348-98-9125,Scientist,106520.68,8789.723333,4,4,4052,2,"Mortgage Loan, and Home Equity Loan",6,16,14.39,8.0,Standard,653.6,29.076537,14 Years and 9 Months,Yes,98.337812,319.99590143913906,Low_spent_Small_value_payments,750.638620125865,Poor


In [9]:
test_cs.sample(5)

Unnamed: 0,ID,Customer_ID,Month,Name,Age,SSN,Occupation,Annual_Income,Monthly_Inhand_Salary,Num_Bank_Accounts,Num_Credit_Card,Interest_Rate,Num_of_Loan,Type_of_Loan,Delay_from_due_date,Num_of_Delayed_Payment,Changed_Credit_Limit,Num_Credit_Inquiries,Credit_Mix,Outstanding_Debt,Credit_Utilization_Ratio,Credit_History_Age,Payment_of_Min_Amount,Total_EMI_per_month,Amount_invested_monthly,Payment_Behaviour,Monthly_Balance
13614,0xb590,CUS_0x42dc,November,Yinkaz,29,517-51-2372,Writer,28397.31_,2119.4425,7,4,7,2,"Student Loan, and Not Specified",30,12.0,9.4,4.0,Standard,1398.33,39.005697,27 Years and 6 Months,No,27.767024,48.90074559268657,High_spent_Large_value_payments,375.27648020935686
7448,0x6d52,CUS_0x882f,September,Nate Raymondl,20,900-53-6542,Engineer,38847.8_,,7,7,25,7,"Debt Consolidation Loan, Mortgage Loan, Debt C...",57,19.0,19.87,,Bad,3401.56,33.427384,7 Years and 3 Months,Yes,123.916548,72.43715006608541,High_spent_Large_value_payments,365.1779681015973
10195,0x8d7d,CUS_0x6c52,December,Laurence Frostt,34,873-88-8377,Musician,100241.67,8121.4725,8,5,10,4,"Student Loan, Auto Loan, Not Specified, and Mo...",22,17.0,1.8,3.0,Good,209.63,25.491983,22 Years and 3 Months,No,260.608105,212.2054362266757,High_spent_Medium_value_payments,589.3337087375054
22485,0x11d87,CUS_0xc4f2,October,Lu Jianxinm,33,525-81-0658,_______,26886.51,2517.5425,5,3,9,6,"Student Loan, Personal Loan, Home Equity Loan,...",16,,13.88,6.0,Standard,1129.83,28.79121,,Yes,106.632566,49.94397910112815,High_spent_Large_value_payments,335.17770453670863
29630,0x17140,CUS_0x61a5,November,,40,590-89-8290,Media_Manager,116184.44,,5,7,5,1,Payday Loan,17,,12.88,3.0,Good,804.9,30.653189,,No,91.250492,__10000__,High_spent_Medium_value_payments,928.8855257123056
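The samples above already reveal several data-quality problems:

* numeric fields with trailing underscores (`28397.31_` in `Annual_Income`, `4_` in `Num_of_Loan`);
* the placeholder `_______` in `Occupation`;
* the value `__10000__` in `Amount_invested_monthly`.

Below is a minimal cleaning sketch, assuming pandas. It treats `_______` and `__10000__` as sentinels for missing values, which is an assumption rather than something documented here. `PLACEHOLDERS`, `clean_numeric` and `clean_categorical` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Assumption: these strings are sentinels for "missing", not real values
PLACEHOLDERS = {"_______", "__10000__"}


def clean_numeric(s: pd.Series) -> pd.Series:
    s = s.where(~s.isin(PLACEHOLDERS))        # sentinels -> NaN
    s = s.astype(str).str.strip("_")          # "28397.31_" -> "28397.31", "4_" -> "4"
    return pd.to_numeric(s, errors="coerce")  # anything still non-numeric -> NaN


def clean_categorical(s: pd.Series) -> pd.Series:
    return s.where(~s.isin(PLACEHOLDERS))     # "_______" -> NaN


# Raw strings of the kinds visible in the samples above
raw = pd.DataFrame({
    "Annual_Income": ["28397.31_", "38847.8_", "100241.67"],
    "Num_of_Loan": ["2", "4_", "7"],
    "Amount_invested_monthly": ["48.90074559268657", "__10000__", np.nan],
    "Occupation": ["Writer", "_______", "Musician"],
})

cleaned = raw.copy()
for col in ["Annual_Income", "Num_of_Loan", "Amount_invested_monthly"]:
    cleaned[col] = clean_numeric(cleaned[col])
cleaned["Occupation"] = clean_categorical(cleaned["Occupation"])
print(cleaned)
print(cleaned.dtypes)
```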


In [6]:
print(train_cs.shape)
print(test_cs.shape)

(100000, 28)
(50000, 27)


In [7]:
print(set(train_cs.columns) - set(test_cs.columns))

{'Credit_Score'}


# **Data Understanding:**

---

### **Columns in `train_cs` and `test_cs`:**

1. **ID**: Unique identifier for each record (can be dropped as it's not useful for modeling).
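2. **Customer_ID**: Identifier for the customer. Because records are also indexed by `Month`, a customer can appear in more than one row, so this column is not unique per record. It can be useful for grouping a customer's records (e.g., during imputation) before being dropped for modeling.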
3. **Month**: Month of the record, possibly related to time-based trends.
4. **Name**: Customer's name (typically irrelevant for modeling; likely dropped).
5. **Age**: Customer's age, a potential predictor of credit behavior.
6. **SSN**: Social Security Number (usually dropped as it’s sensitive and not useful for prediction).
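7. **Occupation**: Type of occupation (categorical), useful for understanding financial behavior. Some entries hold the placeholder `_______`, which should be treated as missing.
8. **Annual_Income**: Total annual income, a direct indicator of financial stability. Some values carry a trailing underscore (e.g., `28397.31_`), so the column loads as text and must be cleaned before it can be converted to a number.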
9. **Monthly_Inhand_Salary**: Monthly take-home salary, a key financial feature.
10. **Num_Bank_Accounts**: Number of bank accounts held by the customer.
11. **Num_Credit_Card**: Number of credit cards owned by the customer.
12. **Interest_Rate**: Applicable interest rate, potentially influencing financial behavior.
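13. **Num_of_Loan**: Number of loans the customer holds. Like `Annual_Income`, it contains values with a trailing underscore (e.g., `4_`) that need cleaning.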
14. **Type_of_Loan**: Categorical feature describing the type of loan.
15. **Delay_from_due_date**: Days of delay in payment (useful for identifying credit risk).
16. **Num_of_Delayed_Payment**: Number of delayed payments in the past.
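17. **Changed_Credit_Limit**: Numeric change in the customer's credit limit (e.g., 1.08 and 8.85 in the sample), not a yes/no flag.
18. **Num_Credit_Inquiries**: Number of recent credit inquiries; frequent inquiries can signal higher credit risk. Implausible values such as 1000.0 appear in the sample, so this column needs outlier handling.
19. **Credit_Mix**: Overall rating of the customer's mix of credit, with the values Good, Standard, and Bad in the samples.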
20. **Outstanding_Debt**: Total outstanding debt, a critical measure of financial health.
21. **Credit_Utilization_Ratio**: Ratio of used credit to available credit, an important indicator of credit risk.
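22. **Credit_History_Age**: Age of the customer's credit history, indicating financial maturity. It is stored as text (e.g., `19 Years and 3 Months`) and must be converted to a number, such as total months, before modeling.
23. **Payment_of_Min_Amount**: Whether the customer pays at least the minimum amount due (`Yes`/`No` in the samples).
24. **Total_EMI_per_month**: Total monthly EMI (equated monthly installment) payments, indicating financial commitments.
25. **Amount_invested_monthly**: Amount the customer invests monthly, showing financial discipline. The test sample contains the suspicious value `__10000__`, which appears to mark a missing or invalid entry.
26. **Payment_Behaviour**: Categorical description of spending and payment patterns, with values such as `Low_spent_Large_value_payments` and `High_spent_Medium_value_payments`.
27. **Monthly_Balance**: Customer's monthly balance, a key indicator of financial stability.
28. **Credit_Score**: Target variable, the customer's credit score category (Good, Standard, or Poor), used as the multiclass classification target.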
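`Credit_History_Age` is stored as text, so it needs converting to a number before modeling. One option is to convert it to total months with `Series.str.extract`, as sketched below. The sketch assumes every non-missing value follows the `N Years and M Months` pattern seen in the training sample above. The input values come from that sample, plus one missing value.

```python
import numpy as np
import pandas as pd

ages = pd.Series(["19 Years and 3 Months", "28 Years and 6 Months", np.nan, "14 Years and 9 Months"])

# Pull out the two integers; rows that don't match (or are missing) become NaN
parts = ages.str.extract(r"(\d+)\s+Years?\s+and\s+(\d+)\s+Months?").astype(float)
history_months = parts[0] * 12 + parts[1]
print(history_months.tolist())  # [231.0, 342.0, nan, 177.0]
```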

---

* There are three credit score categories that banks and credit card companies use to label their customers:
  * Good
  * Standard
  * Poor

* A person with a good credit score is more likely to be approved for loans by banks and other financial institutions. Credit score classification therefore requires a labelled dataset that records each customer's credit score.

* The only column missing from `test_cs` is `Credit_Score`, the target variable. This is expected: the test set contains the records to be predicted, so it carries no labels.
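Here is a minimal sketch of separating features from the target once both files are loaded. It makes two assumptions: the identifier columns discussed above (`ID`, `Customer_ID`, `Name`, `SSN`) are dropped, and the three classes are encoded ordinally as Poor=0, Standard=1, Good=2. Both are modeling choices rather than requirements. `split_features_target` is a hypothetical helper, and the toy rows reuse values from the samples above.

```python
import pandas as pd

ID_COLUMNS = ["ID", "Customer_ID", "Name", "SSN"]    # assumption: identifiers are dropped
TARGET = "Credit_Score"
SCORE_ORDER = {"Poor": 0, "Standard": 1, "Good": 2}  # assumption: ordinal encoding


def split_features_target(train: pd.DataFrame, test: pd.DataFrame):
    """Hypothetical helper: drop identifiers, separate the target, align test columns."""
    X_train = train.drop(columns=ID_COLUMNS + [TARGET], errors="ignore")
    y_train = train[TARGET].map(SCORE_ORDER)  # unknown labels become NaN
    X_test = test.drop(columns=ID_COLUMNS, errors="ignore")
    X_test = X_test[X_train.columns]          # same columns, same order (KeyError if one is missing)
    return X_train, y_train, X_test


# Toy rows reusing values from the samples above
train = pd.DataFrame({
    "ID": ["0x1b3cb", "0x18b84"],
    "Customer_ID": ["CUS_0x61f8", "CUS_0x9778"],
    "Name": ["Rothackerr", "Gerry Shiht"],
    "SSN": ["705-61-0374", "548-90-8373"],
    "Age": [19, 42],
    "Occupation": ["Developer", "Developer"],
    "Credit_Score": ["Poor", "Good"],
})
test = pd.DataFrame({
    "Occupation": ["Writer"],
    "SSN": ["517-51-2372"],
    "Age": [29],
    "Name": ["Yinkaz"],
    "Customer_ID": ["CUS_0x42dc"],
    "ID": ["0xb590"],
})

X_train, y_train, X_test = split_features_target(train, test)
print(list(X_train.columns), list(X_test.columns))  # ['Age', 'Occupation'] ['Age', 'Occupation']
print(y_train.tolist())                              # [0, 2]
```

Any label outside the three classes maps to `NaN`, which makes typos in the target easy to spot. Most scikit-learn classifiers also accept the string labels directly, so the encoding step is optional for them.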