# 2. Exploratory Data Analysis - Credit Card Data

This notebook performs exploratory data analysis (EDA) on the credit card transaction dataset. The features are already heavily processed (most are PCA components), and the target classes are extremely imbalanced.

In [None]:
%load_ext autoreload
%autoreload 2

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from src.data.loading import load_creditcard_data
from src.data.cleaning import handle_missing_values, validate_data
from src.visualization.eda import (
    plot_univariate,
    plot_class_distribution,
    plot_correlation_matrix,
)

## Load Data

In [None]:
cc_df = load_creditcard_data()
print(f"Credit Card Data Shape: {cc_df.shape}")
cc_df.head()

## Data Cleaning and Validation

In [None]:
if validate_data(cc_df):
    print("Data validation passed.")
else:
    print("Data validation failed!")

# This dataset is usually clean, but let's be sure
cc_df = handle_missing_values(cc_df)
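
The internals of `validate_data` and `handle_missing_values` are not shown in this notebook, so a direct check with plain pandas can serve as an independent sanity test. The sketch below uses a small hypothetical frame (`demo_df`, made up for illustration) in place of `cc_df`; the column names mirror the dataset, but the values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for cc_df; the values are invented for illustration.
demo_df = pd.DataFrame({
    "V1": [0.1, np.nan, -1.2, 0.1],
    "Amount": [10.0, 25.5, 3.0, 10.0],
    "Class": [0, 0, 1, 0],
})

# Count missing values per column.
missing_per_column = demo_df.isna().sum()

# Count rows that exactly repeat an earlier row.
n_duplicates = int(demo_df.duplicated().sum())

print(missing_per_column)
print(f"Duplicate rows: {n_duplicates}")
```

Running the same two checks on `cc_df` before and after cleaning would show whether the helper actually changed anything.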

## Class Distribution

The positive class is expected to make up only a very small fraction of the rows. The plot below shows how extreme the imbalance is.

In [None]:
plot_class_distribution(cc_df, col='Class')
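
A bar chart of a very rare class can be hard to read, so it may help to print the counts and proportions as well. This is a minimal sketch on a hypothetical frame (`demo_df`, with an invented 20-in-10,000 split), not the real dataset; swapping in `cc_df` should work as long as the target column is named `Class`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for cc_df: 9,980 negatives and 20 positives (invented).
labels = np.concatenate([np.zeros(9_980, dtype=int), np.ones(20, dtype=int)])
demo_df = pd.DataFrame({"Class": labels})

# Absolute counts and relative frequencies per class.
counts = demo_df["Class"].value_counts().sort_index()
fractions = demo_df["Class"].value_counts(normalize=True).sort_index()
summary = pd.DataFrame({"count": counts, "fraction": fractions})
print(summary)

# Ratio of majority to minority class size.
imbalance_ratio = counts.max() / counts.min()
print(f"Imbalance ratio: {imbalance_ratio:.0f}:1")
```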

## Univariate Analysis

This section examines the distributions of the two non-PCA features, `Time` and `Amount`.

In [None]:
plot_univariate(cc_df, col='Time', kind='hist')
plot_univariate(cc_df, col='Amount', kind='hist')
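
Histograms can hide long tails, so a numeric summary is a useful companion. The sketch below assumes, purely for illustration, that `Amount` is right-skewed and models it with synthetic log-normal data; `demo_df` and its value ranges are hypothetical. It reports upper percentiles and skewness, then checks how much a `log1p` transform reduces the skew.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for cc_df; distributions are assumptions for illustration.
rng = np.random.default_rng(42)
demo_df = pd.DataFrame({
    "Time": rng.uniform(0, 100_000, size=5_000),
    "Amount": rng.lognormal(mean=3.0, sigma=1.5, size=5_000),
})

cols = ["Time", "Amount"]

# Summary statistics with upper-tail percentiles, one row per feature.
stats = demo_df[cols].describe(percentiles=[0.5, 0.95, 0.99]).T
stats["skew"] = demo_df[cols].skew()
print(stats)

# log1p handles zero amounts and compresses the right tail.
log_amount = np.log1p(demo_df["Amount"])
print(f"Skew of Amount: {demo_df['Amount'].skew():.2f}")
print(f"Skew of log1p(Amount): {log_amount.skew():.2f}")
```

If the real `Amount` column shows a similar gap between the 99th percentile and the maximum, plotting the log-transformed values may give a more readable histogram.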

## Correlation Analysis

This section explores correlations among the PCA components and between each feature and the target (`Class`).

In [None]:
plot_correlation_matrix(cc_df)
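
A full correlation heatmap can be dense, so it may be easier to pull out the target column and rank features by absolute correlation. The sketch below uses synthetic data (`demo_df` is hypothetical, with `V1` deliberately shifted for the positive class) and assumes the target column is named `Class`. Note that Pearson correlation against a binary target tends to be small in magnitude when the positive class is rare, so the ranking is usually more informative than the raw values.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for cc_df; V1 is shifted for positives on purpose.
rng = np.random.default_rng(1)
n = 2_000
target = (rng.random(n) < 0.05).astype(int)
demo_df = pd.DataFrame({
    "V1": rng.normal(size=n) - 2.0 * target,
    "V2": rng.normal(size=n),
    "Amount": rng.lognormal(mean=3.0, sigma=1.0, size=n),
    "Class": target,
})

# Pearson correlation of every feature with the target.
target_corr = demo_df.corr()["Class"].drop("Class")

# Rank by absolute value while keeping the sign visible.
order = target_corr.abs().sort_values(ascending=False).index
ranked = target_corr.reindex(order)
print(ranked)
```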

## Conclusion

- The dataset is highly imbalanced.
- The PCA components (`V1`-`V28`) are uncorrelated with one another by construction.
- Standard or robust scaling may be needed for the `Amount` feature in later modeling steps.
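
To illustrate the last point, here is a minimal sketch comparing the two scaling options on a handful of invented amounts with one large outlier. The formulas are written out in pandas; they correspond to the default behavior of scikit-learn's `StandardScaler` (mean and population standard deviation) and `RobustScaler` (median and interquartile range). The `amount` values are hypothetical.

```python
import pandas as pd

# Invented amounts with a single extreme outlier.
amount = pd.Series([1.0, 5.0, 10.0, 20.0, 50.0, 100.0, 25_000.0], name="Amount")

# Standard scaling: subtract the mean, divide by the population std (ddof=0).
standard = (amount - amount.mean()) / amount.std(ddof=0)

# Robust scaling: subtract the median, divide by the interquartile range.
q1, q3 = amount.quantile([0.25, 0.75])
robust = (amount - amount.median()) / (q3 - q1)

print(pd.DataFrame({"Amount": amount, "standard": standard, "robust": robust}))
```

With standard scaling the outlier dominates the mean and standard deviation, so the ordinary values are squeezed into a narrow band. Robust scaling keeps them spread out because the median and IQR are barely affected by a single extreme value.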