<a href="https://colab.research.google.com/github/adinath7l/CreditScoreEDA/blob/main/Copy_of_Sample_EDA_Submission_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Credit Score EDA

The goal of this project is to understand which financial, behavioral, and demographic factors influence a customer’s credit score. Using the dataset provided, the project aims to identify relationships between income, credit utilization, delayed payments, number of loans, and overall credit health.



##### **Project Type**    - EDA
##### **Contribution**    - Individual

In [None]:
from google.colab import drive
drive.mount('/content/drive')


In [None]:
!mkdir -p '/content/drive/MyDrive/CreditScoreProject'

In [None]:
import pandas as pd

df = pd.read_csv('/content/drive/MyDrive/CreditScoreProject/dataset.csv')
df.head()

In [None]:
df.shape

In [None]:
df.columns

In [None]:
df.head(20)

In [None]:
df.tail()

In [None]:
df.describe()

In [None]:
df.info()

In [None]:
df.isnull().sum()



**Initial Data Understanding Notes**

*   The dataset has 100,000 rows and 28 columns.
*   Columns: `ID`, `Customer_ID`, `Month`, `Name`, `Age`, `SSN`, `Occupation`, `Annual_Income`, `Monthly_Inhand_Salary`, `Num_Bank_Accounts`, `Num_Credit_Card`, `Interest_Rate`, `Num_of_Loan`, `Type_of_Loan`, `Delay_from_due_date`, `Num_of_Delayed_Payment`, `Changed_Credit_Limit`, `Num_Credit_Inquiries`, `Credit_Mix`, `Outstanding_Debt`, `Credit_Utilization_Ratio`, `Credit_History_Age`, `Payment_of_Min_Amount`, `Total_EMI_per_month`, `Amount_invested_monthly`, `Payment_Behaviour`, `Monthly_Balance`, `Credit_Score`.
*   No columns have missing values.
*   No columns need datatype fixes.
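
The notes above can be checked with a single per-column summary rather than read off the raw `df.info()` output. The following is a minimal sketch on a made-up toy frame, not the real dataset; `quality_report` is a hypothetical helper name, and the income value with a trailing underscore is invented purely to show how a numeric-looking text column would surface.

```python
import numpy as np
import pandas as pd


def quality_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Summarize dtype, missing count, and unique count for every column."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_missing": frame.isnull().sum(),
        "n_unique": frame.nunique(),  # NaN is not counted as a value
    })


# Toy frame standing in for the real dataset (all values are made up).
toy = pd.DataFrame({
    "Age": [23, 35, np.nan, 41],
    "Annual_Income": ["19114.12", "34847.84_", "143162.64", "30689.89"],
    "Credit_Score": ["Good", "Standard", "Poor", "Good"],
})

report = quality_report(toy)
print(report)

# Text columns that should be numeric can be coerced;
# entries that cannot be parsed become NaN and show up as "missing".
income_numeric = pd.to_numeric(toy["Annual_Income"], errors="coerce")
n_unparseable = int(income_numeric.isnull().sum())
print("Unparseable income entries:", n_unparseable)
```

On the real data, `quality_report(df)` would be the call to run; a column reported with a text dtype but holding amounts is a candidate for the `pd.to_numeric` check.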

In [None]:
df.drop('SSN', axis=1, inplace=True)

In [None]:
df.sort_values(by='Customer_ID').loc[ : , ['Customer_ID', 'Name', 'Num_of_Delayed_Payment', 'Outstanding_Debt', 'Total_EMI_per_month', 'Monthly_Inhand_Salary', 'Credit_Score'] ]

In [None]:
df.groupby(['Customer_ID', 'Name']).agg(
    Num_of_Delayed_Payment=('Num_of_Delayed_Payment', 'mean'),
    Outstanding_Debt=('Outstanding_Debt', 'mean'),
    Total_EMI_per_month=('Total_EMI_per_month', 'mean'),
    Monthly_Inhand_Salary=('Monthly_Inhand_Salary', 'mean'),
    Credit_Score_Mode=('Credit_Score', lambda x: x.mode()[0])
)

In [None]:
df['Debt_Income_Ratio'] = df['Outstanding_Debt'] / df['Annual_Income']
df['Debt_Income_Ratio'].head()


In [None]:
df['EMI_Burden'] = df['Total_EMI_per_month'] / (df['Annual_Income'] / 12)
df['EMI_Burden'].head()
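
Both ratio features divide by income, so a zero or missing `Annual_Income` would produce `inf` or `NaN` and distort later averages and plots. The sketch below shows one way to guard the division; `safe_ratio` is a hypothetical helper and the toy values are made up.

```python
import numpy as np
import pandas as pd


def safe_ratio(numerator: pd.Series, denominator: pd.Series) -> pd.Series:
    """Divide two Series, returning NaN where the denominator is zero or missing."""
    denom = denominator.where(denominator != 0)  # zeros become NaN
    return numerator / denom


# Toy rows (made-up values); the second row has zero income on purpose.
toy = pd.DataFrame({
    "Outstanding_Debt": [800.0, 1200.0, 500.0],
    "Total_EMI_per_month": [50.0, 100.0, 25.0],
    "Annual_Income": [24000.0, 0.0, 60000.0],
})

toy["Debt_Income_Ratio"] = safe_ratio(toy["Outstanding_Debt"], toy["Annual_Income"])
toy["EMI_Burden"] = safe_ratio(toy["Total_EMI_per_month"], toy["Annual_Income"] / 12)
print(toy[["Debt_Income_Ratio", "EMI_Burden"]])
```

Rows with `NaN` ratios can then be counted or dropped explicitly instead of silently skewing a mean.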

In [None]:
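# Bin ages into groups. pd.cut intervals are right-closed by default,
# so include_lowest=True is needed to keep age 18 in the first bin.
bins = [18, 30, 40, 50, 60, 100]
labels = ['18-30','30-40','40-50','50-60','60+']
df['Age_Group'] = pd.cut(df['Age'], bins=bins, labels=labels, include_lowest=True)
df['Age_Group'].value_counts()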
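
The bin edges interact with the interval rule of `pd.cut`: intervals are right-closed by default, so a value equal to the lowest edge falls outside every bin unless `include_lowest=True` is passed, and values below the lowest edge are always left unbinned. A small sketch with made-up ages illustrates the boundary behavior.

```python
import pandas as pd

# Made-up ages chosen to sit on or next to the bin edges.
ages = pd.Series([17, 18, 30, 31, 60, 61])
bins = [18, 30, 40, 50, 60, 100]
labels = ['18-30', '30-40', '40-50', '50-60', '60+']

default_cut = pd.cut(ages, bins=bins, labels=labels)
inclusive_cut = pd.cut(ages, bins=bins, labels=labels, include_lowest=True)

comparison = pd.DataFrame({
    "age": ages,
    "default": default_cut,
    "include_lowest": inclusive_cut,
})
print(comparison)

# Rows without a bin are worth counting before plotting by age group.
print("Unbinned (default):", default_cut.isna().sum())
print("Unbinned (include_lowest):", inclusive_cut.isna().sum())
```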

# **Univariate Analysis**

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
plt.figure(figsize=(7,4))
sns.histplot(df['Annual_Income'], kde=True)
plt.title('Annual Income Distribution')
plt.xlabel('Annual Income')
plt.ylabel('Count')
plt.show()

In [None]:
plt.figure(figsize=(7,4))
sns.histplot(df['Age'], bins=20, kde=True)
plt.title('Age Distribution')
plt.xlabel('Age')
plt.ylabel('Count')
plt.show()

In [None]:
plt.figure(figsize=(7,4))
sns.histplot(df['Credit_Utilization_Ratio'], kde=True)
plt.title('Credit Utilization Ratio')
plt.xlabel('Utilization Ratio')
plt.show()

In [None]:
plt.figure(figsize=(7,4))
sns.histplot(df['Num_of_Delayed_Payment'], bins=15)
plt.title('Delayed Payments')
plt.xlabel('Number of Delayed Payments')
plt.show()

In [None]:
plt.figure(figsize=(7,4))
sns.histplot(df['Outstanding_Debt'], kde=True)
plt.title('Outstanding Debt Distribution')
plt.xlabel('Outstanding Debt')
plt.show()

Here are concise insights from the first four graphs above:

*   **Annual Income Distribution**: Shows income spread; reveals typical income brackets and skew (e.g., more lower-income individuals).
*   **Age Distribution**: Highlights dominant age groups in the customer base, valuable for demographic targeting.
*   **Credit Utilization Ratio**: Indicates how much credit is used versus available; high values suggest potential financial strain, low values indicate responsible credit use.
*   **Delayed Payments**: Measures frequency of late payments, a key risk indicator. Reveals the proportion of customers with zero, few, or many delays.
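
The points about skew and about the share of customers with zero, few, or many delays can be backed by numbers as well as plots. This is a sketch on made-up toy values; the cut points for "few" and "many" are assumptions chosen for illustration, not thresholds taken from the dataset.

```python
import pandas as pd

# Toy values (made up); one large income creates a right-skewed distribution.
toy = pd.DataFrame({
    "Annual_Income": [12000, 15000, 18000, 20000, 22000, 25000, 30000, 150000],
    "Num_of_Delayed_Payment": [0, 0, 1, 2, 3, 5, 8, 20],
})

# Mean well above the median, together with positive skew, indicates a long right tail.
summary = toy.agg(["mean", "median", "skew"]).T
print(summary)

# Share of rows with zero, few, or many delayed payments (cut points are assumptions).
delay_share = (
    pd.cut(
        toy["Num_of_Delayed_Payment"],
        bins=[-1, 0, 5, float("inf")],
        labels=["zero", "few (1-5)", "many (6+)"],
    )
    .value_counts(normalize=True)
    .sort_index()
)
print(delay_share)
```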

In [None]:
plt.figure(figsize=(10,6))
# Both variables are continuous, so a scatter plot is used instead of a bar plot
sns.scatterplot(data=df, x='Annual_Income', y='Outstanding_Debt', alpha=0.6)
plt.title('Relationship between Annual Income and Outstanding Debt')
plt.xlabel('Annual Income')
plt.ylabel('Outstanding Debt')
plt.show()

In [None]:
plt.figure(figsize=(7,4))
sns.histplot(data=df, x='Credit_Utilization_Ratio', y='Num_of_Delayed_Payment')
plt.title('Utilization Ratio vs Delayed Payments')
plt.xlabel('Utilization Ratio')
plt.ylabel('Delayed Payments')
plt.show()

In [None]:
plt.figure(figsize=(7,4))
sns.barplot(data=df, x='Age_Group', y='Outstanding_Debt', estimator='mean')
plt.title('Average Debt by Age Group')
plt.xlabel('Age Group')
plt.ylabel('Average Outstanding Debt')
plt.show()

In [None]:
plt.figure(figsize=(7,4))
monthly = df.groupby('Month')['EMI_Burden'].mean().reset_index()

sns.lineplot(data=monthly, x='Month', y='EMI_Burden', marker='o')
plt.title('Monthly Trend of EMI Burden')
plt.xlabel('Month')
plt.ylabel('Average EMI Burden')
plt.show()
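
A monthly trend line is only meaningful if the months are in calendar order. If `Month` is stored as month names (an assumption; it may already be numeric), `groupby` sorts them alphabetically. The sketch below uses made-up values and an assumed list of month names to show how an ordered categorical keeps the calendar order.

```python
import pandas as pd

# Assumed calendar order; adjust to the months actually present in the data.
month_order = ["January", "February", "March", "April",
               "May", "June", "July", "August"]

# Toy rows (made-up values).
toy = pd.DataFrame({
    "Month": ["March", "January", "February", "January", "March", "February"],
    "EMI_Burden": [0.30, 0.10, 0.20, 0.14, 0.34, 0.24],
})

# An ordered categorical makes groupby sort by calendar position, not alphabetically.
toy["Month"] = pd.Categorical(toy["Month"], categories=month_order, ordered=True)

monthly = (
    toy.groupby("Month", observed=True)["EMI_Burden"]
    .mean()
    .reset_index()
)
print(monthly)
```

The resulting `monthly` frame can be passed to `sns.lineplot` unchanged.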

## **Bivariate Analysis Insights**

*   **Income and debt show correlation**: some high earners still keep large debts.
*   **Higher credit utilization doesn't correspond with more delayed payments.**
*   **Age groups show similar debt behavior up to age 50.**
*   **EMI burden shows mild seasonality** across months.
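
Statements about whether two variables move together are easier to defend with a coefficient than with a plot alone. The sketch below compares Pearson correlation with a rank-based (Spearman-style) correlation computed by ranking both columns first; `rank_correlation` is a hypothetical helper, and the toy values are made up only to show how the two coefficients can differ, not to suggest what the real data contains.

```python
import pandas as pd


def rank_correlation(x: pd.Series, y: pd.Series) -> float:
    """Spearman-style correlation: Pearson correlation of the ranks."""
    return x.rank().corr(y.rank())


# Toy values (made up): a monotonic but non-linear relationship.
toy = pd.DataFrame({
    "Credit_Utilization_Ratio": [20.0, 25.0, 30.0, 35.0, 40.0],
    "Num_of_Delayed_Payment": [1, 2, 4, 8, 16],
})

pearson = toy["Credit_Utilization_Ratio"].corr(toy["Num_of_Delayed_Payment"])
spearman = rank_correlation(
    toy["Credit_Utilization_Ratio"], toy["Num_of_Delayed_Payment"]
)

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

A coefficient near zero on the real columns would support a "no clear relationship" reading, while a large gap between the two coefficients hints at a non-linear pattern.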

In [None]:
numeric_cols = ['Annual_Income','Outstanding_Debt','Debt_Income_Ratio',
                'Credit_Utilization_Ratio','Num_of_Delayed_Payment',
                'Total_EMI_per_month']

df[numeric_cols].corr()

In [None]:
corr_cols = [
    'Annual_Income',
    'Num_Credit_Card',
    'Num_of_Delayed_Payment',
    'Credit_Utilization_Ratio',
    'Outstanding_Debt',
    'Debt_Income_Ratio',
    'Monthly_Inhand_Salary'
]

# Calculate the correlation matrix
correlation_matrix = df[corr_cols].corr()

# Create a mask for correlations that are not strong enough
mask = (correlation_matrix.abs() < 0.4)

plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f", mask=mask)
plt.title('Correlation Matrix with Highlighted Strong Correlations (Abs > 0.4)')
plt.show()

In [None]:
df.groupby('Age_Group')['Num_of_Delayed_Payment'].mean()

In [None]:
plt.figure(figsize=(10,5))
sns.boxplot(data=df, x='Occupation', y='Num_of_Delayed_Payment')
plt.xticks(rotation=45)
plt.title('Delayed Payments by Occupation')
plt.show()

## Remarks on Correlation Analysis

*   **Num_Credit_Card vs Delayed Payments**: Correlation weaker than expected → having more cards doesn’t always mean irresponsibility.
*   **Outstanding debt vs Delayed Payments**: Moderate correlation → financial pressure increases payment delays.
*   Debt levels don't differ across occupations.
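
A per-occupation table gives the comparison across occupations a numeric basis alongside the box plot. This is a sketch on made-up toy rows; the occupation names and values are illustrative only.

```python
import pandas as pd

# Toy rows (made-up values).
toy = pd.DataFrame({
    "Occupation": ["Engineer", "Engineer", "Teacher", "Teacher", "Lawyer", "Lawyer"],
    "Num_of_Delayed_Payment": [10, 14, 12, 16, 9, 15],
    "Outstanding_Debt": [1200.0, 1500.0, 1100.0, 1700.0, 1300.0, 1400.0],
})

# Median per occupation is less sensitive to outliers than the mean.
medians = toy.groupby("Occupation")[
    ["Num_of_Delayed_Payment", "Outstanding_Debt"]
].median()
print(medians)

# Gap between the highest and lowest occupation median for each metric.
spread = medians.max() - medians.min()
print(spread)
```

A spread that is small relative to the overall scale of each column is consistent with occupations behaving alike.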


# **Project Summary -**

Write the summary here within 500-600 words.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


**Write Problem Statement Here.**

#### **Define Your Business Objective?**

Answer Here.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that, for each chart, the following format is answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries

### Dataset Loading

In [None]:
# Load Dataset

### Dataset First View

In [None]:
# Dataset First Look

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count

### Dataset Information

In [None]:
# Dataset Info

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count

In [None]:
# Visualizing the missing values

### What did you know about your dataset?

Answer Here

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns

In [None]:
# Dataset Describe

### Variables Description

Answer Here

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

### What all manipulations have you done and insights you found?

Answer Here.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the Business Objective?
Explain Briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***

## Summary:

### Data Analysis Key Findings

*   There is a direct relationship between annual income and outstanding debt, suggesting that individuals with higher incomes tend to have more debt, possibly due to greater access to credit or larger financial commitments.
*   Higher credit utilization ratios are associated with an increased number of delayed payments, indicating that customers using a significant portion of their available credit are more susceptible to financial difficulties.
*   Outstanding debt varies across different age groups, reflecting diverse life stages and associated financial obligations (e.g., educational debt for younger individuals, mortgages for middle-aged individuals).
*   The monthly trend of EMI (Equated Monthly Installment) burden shows fluctuations, which could point to seasonal patterns or specific periods of financial strain for customers.

### Insights or Next Steps

*   Further investigation into the types of debt incurred by different income and age groups could provide a more nuanced understanding of credit behavior and risk.
*   Developing targeted financial counseling or credit management programs for customers with high credit utilization ratios could help mitigate the risk of delayed payments.
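
As a starting point for the targeting idea above, customers can be flagged by their average utilization. The sketch below uses made-up toy rows; the 35.0 cutoff and the column names `mean_utilization`, `mean_delays`, and `high_utilization` are assumptions for illustration, and a real threshold would need to be chosen from the observed distribution.

```python
import pandas as pd

# Toy monthly records (made-up values), two rows per customer.
toy = pd.DataFrame({
    "Customer_ID": ["C1", "C1", "C2", "C2", "C3", "C3"],
    "Credit_Utilization_Ratio": [38.0, 42.0, 24.0, 26.0, 31.0, 33.0],
    "Num_of_Delayed_Payment": [15, 17, 3, 5, 8, 10],
})

# Collapse monthly rows to one row per customer.
per_customer = toy.groupby("Customer_ID").agg(
    mean_utilization=("Credit_Utilization_Ratio", "mean"),
    mean_delays=("Num_of_Delayed_Payment", "mean"),
)

threshold = 35.0  # hypothetical cutoff, not derived from the dataset
per_customer["high_utilization"] = per_customer["mean_utilization"] > threshold
print(per_customer)

# Compare average delays between flagged and unflagged customers.
print(per_customer.groupby("high_utilization")["mean_delays"].mean())
```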
