## Credit Scoring System

### Business Understanding

#### Objective
The goal of this project is to build a credit scoring system that classifies customers into **Good**, **Standard**, or **Poor** credit categories based on their financial and behavioral data.

#### Requirements & Expectations

- Understand the key business questions and challenges related to credit risk management:
  - Who are high-risk customers likely to default on loans or payments?
  - What factors most influence the credit score?
  - How can the credit scoring model help banks reduce losses and approve credit more accurately?

- Deliverables:
  - A cleaned and preprocessed dataset ready for modeling
  - A predictive model for credit scoring with evaluated performance metrics
  - Visualizations to help interpret customer credit behaviors
  - A user-friendly interface or report for displaying credit score results

- Expected Outcome:
  - A reliable credit scoring solution that supports decision-making in financial institutions
  - Improved accuracy in identifying customers with potential payment delays or defaults

---

#### Data Overview

The dataset contains customer financial and behavioral information:

| Feature                | Description                                               |
|------------------------|-----------------------------------------------------------|
| ID                     | Unique ID of the record                                   |
| Customer_ID            | Unique ID of the customer                                 |
| Month                  | Month of the year                                        |
| Name                   | The name of the person                                   |
| Age                    | The age of the person                                    |
| SSN                    | Social Security Number                                   |
| Occupation             | The occupation of the person                             |
| Annual_Income          | The annual income                                       |
| Monthly_Inhand_Salary  | Monthly in-hand salary                                  |
| Num_Bank_Accounts      | Number of bank accounts                                  |
| Num_Credit_Card        | Number of credit cards                                  |
| Interest_Rate          | Interest rate on credit card                            |
| Num_of_Loan            | Number of loans taken                                   |
| Type_of_Loan           | Types of loans                                         |
| Delay_from_due_date    | Average days delayed from due date                      |
| Num_of_Delayed_Payment | Number of payments delayed                              |
| Changed_Credit_Limit   | Percentage change in credit card limit                  |
| Num_Credit_Inquiries   | Number of credit card inquiries                         |
| Credit_Mix             | Classification of credit mix                            |
| Outstanding_Debt       | Outstanding debt                                       |
| Credit_Utilization_Ratio | Credit utilization ratio                              |
| Credit_History_Age     | Age of credit history                                 |
| Payment_of_Min_Amount  | Whether minimum amount was paid (Yes/No)                |
| Total_EMI_per_month    | Total EMI per month                                     |
| Amount_invested_monthly | Monthly amount invested                                 |
| Payment_Behaviour      | Payment behaviour                                      |
| Monthly_Balance        | Monthly balance left                                   |
| Credit_Score           | The credit score (target variable)                      |

---

### Installation
- numpy
- pandas
- matplotlib
- seaborn
- scikit-learn

# Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
df = pd.read_csv('/content/credit.csv')
df.head()

| | ID | Customer_ID | Month | Name | Age | SSN | Occupation | Annual_Income | Monthly_Inhand_Salary | Num_Bank_Accounts | ... | Credit_Mix | Outstanding_Debt | Credit_Utilization_Ratio | Credit_History_Age | Payment_of_Min_Amount | Total_EMI_per_month | Amount_invested_monthly | Payment_Behaviour | Monthly_Balance | Credit_Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5634 | 3392 | 1 | Aaron Maashoh | 23.0 | 821000265.0 | Scientist | 19114.12 | 1824.843333 | 3.0 | ... | Good | 809.98 | 26.82262 | 265.0 | No | 49.574949 | 21.46538 | High_spent_Small_value_payments | 312.494089 | Good |
| 1 | 5635 | 3392 | 2 | Aaron Maashoh | 23.0 | 821000265.0 | Scientist | 19114.12 | 1824.843333 | 3.0 | ... | Good | 809.98 | 31.94496 | 266.0 | No | 49.574949 | 21.46538 | Low_spent_Large_value_payments | 284.629162 | Good |
| 2 | 5636 | 3392 | 3 | Aaron Maashoh | 23.0 | 821000265.0 | Scientist | 19114.12 | 1824.843333 | 3.0 | ... | Good | 809.98 | 28.609352 | 267.0 | No | 49.574949 | 21.46538 | Low_spent_Medium_value_payments | 331.209863 | Good |
| 3 | 5637 | 3392 | 4 | Aaron Maashoh | 23.0 | 821000265.0 | Scientist | 19114.12 | 1824.843333 | 3.0 | ... | Good | 809.98 | 31.377862 | 268.0 | No | 49.574949 | 21.46538 | Low_spent_Small_value_payments | 223.45131 | Good |
| 4 | 5638 | 3392 | 5 | Aaron Maashoh | 23.0 | 821000265.0 | Scientist | 19114.12 | 1824.843333 | 3.0 | ... | Good | 809.98 | 24.797347 | 269.0 | No | 49.574949 | 21.46538 | High_spent_Medium_value_payments | 341.489231 | Good |


# Understanding the data

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 28 columns):
 #   Column                    Non-Null Count   Dtype  
---  ------                    --------------   -----  
 0   ID                        100000 non-null  int64  
 1   Customer_ID               100000 non-null  int64  
 2   Month                     100000 non-null  int64  
 3   Name                      100000 non-null  object 
 4   Age                       100000 non-null  float64
 5   SSN                       100000 non-null  float64
 6   Occupation                100000 non-null  object 
 7   Annual_Income             100000 non-null  float64
 8   Monthly_Inhand_Salary     100000 non-null  float64
 9   Num_Bank_Accounts         100000 non-null  float64
 10  Num_Credit_Card           100000 non-null  float64
 11  Interest_Rate             100000 non-null  float64
 12  Num_of_Loan               100000 non-null  float64
 13  Type_of_Loan              100000 non-null  ob

- The dataset has 100000 records and 28 features. Each record corresponds to one customer in one month, so the same `Customer_ID` appears in several rows.
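
One way to check how records map to customers is to count rows and distinct months per `Customer_ID`. The sketch below uses a small made-up frame in place of `credit.csv`, and the helper name `rows_per_customer` is an assumption for illustration, not part of the original notebook.

```python
import pandas as pd


def rows_per_customer(frame: pd.DataFrame) -> pd.DataFrame:
    """Count records and distinct months for each Customer_ID."""
    return frame.groupby("Customer_ID").agg(
        n_rows=("Month", "size"),
        n_months=("Month", "nunique"),
    )


# Toy data standing in for the real dataset (values are illustrative only).
toy = pd.DataFrame(
    {
        "Customer_ID": [3392, 3392, 3392, 8625, 8625],
        "Month": [1, 2, 3, 1, 2],
    }
)

summary = rows_per_customer(toy)
print(summary)
# More than one row per customer means a record is a customer-month,
# not a single customer.
print("customers:", len(summary), "rows:", len(toy))
```

On the real data the same call would be `rows_per_customer(df)`.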

In [6]:
# summary statistics:
df.describe()

| | ID | Customer_ID | Month | Age | SSN | Annual_Income | Monthly_Inhand_Salary | Num_Bank_Accounts | Num_Credit_Card | Interest_Rate | ... | Delay_from_due_date | Num_of_Delayed_Payment | Changed_Credit_Limit | Num_Credit_Inquiries | Outstanding_Debt | Credit_Utilization_Ratio | Credit_History_Age | Total_EMI_per_month | Amount_invested_monthly | Monthly_Balance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 100000.0 | 100000.0 | 100000.0 | 100000.0 | 100000.0 | 100000.0 | 100000.0 | 100000.0 | 100000.0 | 100000.0 | ... | 100000.0 | 100000.0 | 100000.0 | 100000.0 | 100000.0 | 100000.0 | 100000.0 | 100000.0 | 100000.0 | 100000.0 |
| mean | 80631.5 | 25982.66664 | 4.5 | 33.31634 | 500461700.0 | 50505.123449 | 4197.270835 | 5.36882 | 5.53357 | 14.53208 | ... | 21.08141 | 13.31312 | 10.470323 | 5.79825 | 1426.220376 | 32.285173 | 221.22046 | 107.699208 | 55.101315 | 392.697586 |
| std | 43301.486619 | 14340.543051 | 2.291299 | 10.764812 | 290826700.0 | 38299.422093 | 3186.432497 | 2.593314 | 2.067098 | 8.74133 | ... | 14.80456 | 6.237166 | 6.609481 | 3.867826 | 1155.129026 | 5.116875 | 99.680716 | 132.267056 | 39.006932 | 201.652719 |
| min | 5634.0 | 1006.0 | 1.0 | 14.0 | 81349.0 | 7005.93 | 303.645417 | 0.0 | 0.0 | 1.0 | ... | 0.0 | 0.0 | 0.5 | 0.0 | 0.23 | 20.0 | 1.0 | 0.0 | 0.0 | 0.00776 |
| 25% | 43132.75 | 13664.5 | 2.75 | 24.0 | 245168600.0 | 19342.9725 | 1626.594167 | 3.0 | 4.0 | 7.0 | ... | 10.0 | 9.0 | 5.38 | 3.0 | 566.0725 | 28.052567 | 144.0 | 29.268886 | 27.959111 | 267.615983 |
| 50% | 80631.5 | 25777.0 | 4.5 | 33.0 | 500688600.0 | 36999.705 | 3095.905 | 5.0 | 5.0 | 13.0 | ... | 18.0 | 14.0 | 9.4 | 5.0 | 1166.155 | 32.305784 | 219.0 | 66.462304 | 45.15655 | 333.865366 |
| 75% | 118130.25 | 38385.0 | 6.25 | 42.0 | 756002700.0 | 71683.47 | 5957.715 | 7.0 | 7.0 | 20.0 | ... | 28.0 | 18.0 | 14.85 | 8.0 | 1945.9625 | 36.496663 | 302.0 | 147.392573 | 71.295797 | 463.215683 |
| max | 155629.0 | 50999.0 | 8.0 | 56.0 | 999993400.0 | 179987.28 | 15204.633333 | 11.0 | 11.0 | 34.0 | ... | 62.0 | 25.0 | 29.98 | 17.0 | 4998.07 | 50.0 | 404.0 | 1779.103254 | 434.191089 | 1183.930696 |


- The average customer age is about 33; the youngest customer is 14 and the oldest is 56.
- The average annual income is about 50505 dollars; the lowest is 7005.93 dollars and the highest is 179987.28 dollars.

In [7]:
# summary statistics for categorical columns:
df.describe(include='object')

| | Name | Occupation | Type_of_Loan | Credit_Mix | Payment_of_Min_Amount | Payment_Behaviour | Credit_Score |
|---|---|---|---|---|---|---|---|
| count | 100000 | 100000 | 100000 | 100000 | 100000 | 100000 | 100000 |
| unique | 10128 | 15 | 6261 | 3 | 3 | 6 | 3 |
| top | Jessicad | Lawyer | No Data | Standard | Yes | Low_spent_Small_value_payments | Standard |
| freq | 48 | 7096 | 11408 | 45848 | 52326 | 28616 | 53174 |


- There are 10128 unique customer names.
- The credit score (`Credit_Score`) has 3 levels.
- The most common credit score level is `Standard`, with 53174 records.


In [9]:
df["Credit_Mix"].unique()

array(['Good', 'Standard', 'Bad'], dtype=object)

In [8]:
# count the records in each Credit_Score class:
df['Credit_Score'].value_counts()

| Credit_Score | count |
|---|---|
| Standard | 53174 |
| Poor | 28998 |
| Good | 17828 |
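
Because the three classes differ in size, it can help to look at percentage shares alongside the raw counts. A minimal sketch follows; `class_balance` is a hypothetical helper name, and the label Series is rebuilt from the counts above only so the example runs without `credit.csv`.

```python
import pandas as pd


def class_balance(labels: pd.Series) -> pd.DataFrame:
    """Return the count and percentage share of each class, largest first."""
    counts = labels.value_counts()
    share = (counts / counts.sum() * 100).round(2)
    return pd.DataFrame({"count": counts, "percent": share})


# Counts copied from the value_counts() output; the Series is synthetic.
labels = pd.Series(
    ["Standard"] * 53174 + ["Poor"] * 28998 + ["Good"] * 17828,
    name="Credit_Score",
)

balance = class_balance(labels)
print(balance)
```

On the real frame the call would be `class_balance(df["Credit_Score"])`.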


# DATA PREPROCESSING

# Check for missing data

In [5]:
df.isnull().sum()

| Column | Null count |
|---|---|
| ID | 0 |
| Customer_ID | 0 |
| Month | 0 |
| Name | 0 |
| Age | 0 |
| SSN | 0 |
| Occupation | 0 |
| Annual_Income | 0 |
| Monthly_Inhand_Salary | 0 |
| Num_Bank_Accounts | 0 |


- The dataset has no missing (null) values.
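
Note that `isnull()` only detects true nulls (`NaN`/`None`). The `describe(include='object')` output lists `No Data` as the most frequent `Type_of_Loan` value, so placeholder strings may still stand in for missing entries. The sketch below counts such placeholders per column. The helper `count_placeholders` and every token other than `No Data` are assumptions for illustration, and the frame is toy data rather than the real file.

```python
import pandas as pd

# "No Data" is taken from the describe(include='object') output; the other
# tokens are assumed examples, not values confirmed in this dataset.
PLACEHOLDERS = ("No Data", "", "NA", "_")


def count_placeholders(frame: pd.DataFrame, tokens=PLACEHOLDERS) -> pd.Series:
    """Count cells per column whose stripped text equals a placeholder token."""
    as_text = frame.astype(str).apply(lambda col: col.str.strip())
    return as_text.isin(list(tokens)).sum()


# Toy data standing in for the real dataset (values are illustrative only).
toy = pd.DataFrame(
    {
        "Type_of_Loan": ["Auto Loan", "No Data", "No Data"],
        "Occupation": ["Lawyer", "Scientist", " _ "],
        "Age": [23.0, 28.0, 34.0],
    }
)

n_null = int(toy.isnull().sum().sum())  # true nulls seen by isnull()
hidden = count_placeholders(toy)        # placeholder strings isnull() misses
print("nulls:", n_null)
print(hidden)
```

On the real data, `count_placeholders(df)` would show which columns deserve a closer look before modeling.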

# Handling inconsistent data