# Credit Card Customer Dataset - Full EDA Report

## 1. Import Required Libraries

In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# set_theme is the current name for this call; sns.set is kept as an older alias
sns.set_theme(style='whitegrid')


## 2. Load and Prepare Data

In [None]:

# Load the raw data
df = pd.read_csv('/mnt/data/CC GENERAL.csv')

# Drop every row that contains at least one missing value
df_clean = df.dropna()

# (rows, columns) remaining after the drop
df_clean.shape
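
Because `dropna()` removes every row with at least one missing value, it can help to count the missing values per column first and see how many rows the drop actually costs. The following is a minimal sketch on a small made-up frame; the values are illustrative and are not taken from the dataset, and only the column names are borrowed from it.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (values are made up)
toy = pd.DataFrame({
    "BALANCE": [40.9, 3202.5, 2495.1, 1666.7],
    "CREDIT_LIMIT": [1000.0, 7000.0, np.nan, 7500.0],
    "PAYMENTS": [201.8, 4103.0, 622.1, np.nan],
})

# Count missing values per column before deciding how to handle them
missing_per_column = toy.isna().sum()

# dropna() removes every row that has at least one missing value
toy_clean = toy.dropna()
rows_dropped = len(toy) - len(toy_clean)

print(missing_per_column)
print(f"Rows dropped: {rows_dropped} of {len(toy)}")
```

If the number of dropped rows turns out to be large, imputing values may be worth considering instead of dropping.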


## 3. Exploratory Data Analysis

### 3.1 Histograms

In [None]:

df_clean.hist(bins=30, figsize=(20, 15), edgecolor='black')
plt.suptitle('Histograms of Features', fontsize=20)
plt.show()


**Observations:**
- Most features are right-skewed.
- A large number of customers have low financial activity; a few have extremely high values.
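
The skew seen in the histograms can also be expressed as a number. The sketch below uses a synthetic lognormal sample as a stand-in for a feature such as `PURCHASES` (the data is generated, not taken from the dataset) and compares the sample skewness before and after a `log1p` transform, which is one common way to reduce right skew.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic right-skewed sample, standing in for a feature such as PURCHASES
toy = pd.DataFrame({"PURCHASES": rng.lognormal(mean=5.0, sigma=1.2, size=2000)})

# Sample skewness: values well above 0 indicate a long right tail
skew_raw = toy["PURCHASES"].skew()

# log1p compresses large values and is defined at 0
skew_log = np.log1p(toy["PURCHASES"]).skew()

print(f"skew before: {skew_raw:.2f}, after log1p: {skew_log:.2f}")
```

On the real data, `df_clean.skew(numeric_only=True)` would give the same statistic for every numeric column at once.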

### 3.2 Boxplots

In [None]:

plt.figure(figsize=(20, 10))
sns.boxplot(data=df_clean)
plt.xticks(rotation=90)
plt.title('Boxplot for All Features')
plt.show()


**Observations:**
- Outliers are clearly present in `BALANCE`, `PURCHASES`, `CASH_ADVANCE`, and `CREDIT_LIMIT`.
- Spreads differ widely across features; some are far more variable than others.
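
The points a boxplot draws beyond its whiskers follow the 1.5 × IQR rule by default, so the outliers can be counted per feature with the same rule. This is a minimal sketch; the helper name `iqr_outlier_counts` and the toy values are hypothetical and only illustrate the calculation.

```python
import pandas as pd

def iqr_outlier_counts(frame: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] for each numeric column."""
    numeric = frame.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Compare each column against its own lower and upper fence
    below = numeric.lt(q1 - k * iqr, axis="columns")
    above = numeric.gt(q3 + k * iqr, axis="columns")
    return (below | above).sum()

# Toy data: BALANCE contains one extreme value, PURCHASES contains none
toy = pd.DataFrame({
    "BALANCE": [10, 12, 11, 13, 12, 11, 500],
    "PURCHASES": [5, 6, 5, 7, 6, 5, 6],
})
counts = iqr_outlier_counts(toy)
print(counts)
```

Calling `iqr_outlier_counts(df_clean)` would apply the same rule to the cleaned dataset.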

### 3.3 Scatterplots

#### 3.3.1 Balance vs Credit Limit

In [None]:

plt.figure(figsize=(8, 6))
sns.scatterplot(x='CREDIT_LIMIT', y='BALANCE', data=df_clean)
plt.title('Balance vs Credit Limit')
plt.show()


**Observations:**
- Positive relationship: customers with higher credit limits tend to have higher balances.
- Several customers maintain low balances even with high limits.
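
One way to make the second observation concrete is a utilization ratio, `BALANCE / CREDIT_LIMIT`. The sketch below is an illustration only: the `UTILIZATION` column, the 0.1 cutoff, and the median split are assumptions introduced here, not part of the original analysis, and the values are made up.

```python
import pandas as pd

# Toy data (values are made up)
toy = pd.DataFrame({
    "BALANCE": [50.0, 4000.0, 900.0, 120.0],
    "CREDIT_LIMIT": [10000.0, 5000.0, 1000.0, 12000.0],
})

# Share of the credit limit currently in use (hypothetical derived column)
toy["UTILIZATION"] = toy["BALANCE"] / toy["CREDIT_LIMIT"]

# Customers with a high limit but a low balance; both thresholds are arbitrary
high_limit = toy["CREDIT_LIMIT"] >= toy["CREDIT_LIMIT"].median()
low_use = toy["UTILIZATION"] < 0.1
low_use_high_limit = toy[high_limit & low_use]

print(low_use_high_limit)
```

A credit limit of zero would make the ratio infinite or undefined, so such rows would need separate handling on real data.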

#### 3.3.2 Purchases vs Payments

In [None]:

plt.figure(figsize=(8, 6))
sns.scatterplot(x='PAYMENTS', y='PURCHASES', data=df_clean)
plt.title('Purchases vs Payments')
plt.show()


**Observations:**
- Purchases and payments are positively correlated.
- A few customers have payments far higher than their purchases.

## 4. Identified Relationships and Trends


- **BALANCE vs CREDIT_LIMIT:** Positive correlation (linear clusters)
- **PURCHASES vs PAYMENTS:** Positive correlation with some spread
- **General Trend:** Customers with higher financial activity also tend to have higher payments and balances.
- **Data Skewness:** Most features are right-skewed; a transformation or normalization is recommended.
- **Outliers:** Concentrated among high-value customers; they need careful handling.
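
The relationships listed above are read off the scatterplots; a correlation matrix would put numbers on them. The sketch below uses synthetic data in which `BALANCE` is constructed from `CREDIT_LIMIT`, so the positive correlation is there by design and says nothing about the real dataset. Spearman rank correlation is shown alongside Pearson because it is less sensitive to skew and outliers.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 500
limit = rng.lognormal(mean=8.0, sigma=0.6, size=n)

toy = pd.DataFrame({
    "CREDIT_LIMIT": limit,
    # Balance is a random fraction of the limit, so the two are related by construction
    "BALANCE": limit * rng.uniform(0.0, 1.0, size=n),
    # Independent column, included as a contrast
    "PURCHASES": rng.lognormal(mean=6.0, sigma=1.0, size=n),
})

# Pearson measures linear association; Spearman measures monotonic association
pearson = toy.corr(method="pearson")
spearman = toy.corr(method="spearman")

print(spearman.round(2))
```

On the real data, `df_clean.corr(numeric_only=True)` gives the Pearson matrix, which `sns.heatmap` can display.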


## 5. Conclusion and Recommendations


- Positive relationships exist between the monetary features (`BALANCE` vs `CREDIT_LIMIT`, `PURCHASES` vs `PAYMENTS`).
- Outlier handling and normalization are necessary steps before modeling.
- Customers show diverse behaviors, which suggests clustering as a segmentation approach.
- Future work: apply machine learning models for clustering or prediction tasks.
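
The preprocessing recommended above could look roughly like the following. This is a sketch under stated assumptions, not a prescribed pipeline: the helper name `preprocess`, the 99th-percentile cap, and the choice of `log1p` followed by z-scoring are all illustrative, and the data is synthetic.

```python
import numpy as np
import pandas as pd

def preprocess(frame: pd.DataFrame, upper_q: float = 0.99) -> pd.DataFrame:
    """Cap extreme values, log-transform, then standardize each numeric column."""
    numeric = frame.select_dtypes(include="number")
    # Cap each column at its own upper quantile (limits the right tail)
    capped = numeric.clip(upper=numeric.quantile(upper_q), axis=1)
    # log1p reduces right skew and handles zeros; assumes non-negative inputs
    logged = np.log1p(capped)
    # Z-score: zero mean and unit standard deviation per column
    return (logged - logged.mean()) / logged.std(ddof=0)

# Synthetic right-skewed features standing in for the real columns
rng = np.random.default_rng(2)
toy = pd.DataFrame({
    "BALANCE": rng.lognormal(7.0, 1.0, size=1000),
    "PURCHASES": rng.lognormal(6.0, 1.5, size=1000),
})
scaled = preprocess(toy)

print(scaled.describe().round(2))
```

Putting features on a common scale matters for distance-based clustering such as k-means, where unscaled large-valued columns would otherwise dominate the distances.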
