# 02 - Exploratory Data Analysis (Micro + Macro)

In this notebook, I explore the German Credit dataset (micro level) alongside Eurostat Non-Performing Loans (NPL) data (macro level) to look for patterns in credit risk. The analysis reflects my own approach and reasoning, and it is set up to run locally, on Colab, or on Kaggle.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

from pathlib import Path
DATA_DIR = Path('../data')
PROCESSED_DIR = DATA_DIR / 'processed'
RAW_DIR = DATA_DIR / 'raw'

PROCESSED_DIR.mkdir(exist_ok=True, parents=True)
RAW_DIR.mkdir(exist_ok=True, parents=True)

## 1. Load German Credit Dataset (processed)

In [None]:
german_path = PROCESSED_DIR / 'german_credit_clean.csv'
if german_path.exists():
    credit_data = pd.read_csv(german_path)
    print('Loaded German Credit dataset:', credit_data.shape)
else:
    print('German Credit dataset not found. Please run 01_data_ingestion_cleaning.ipynb first.')

## 2. Load Eurostat NPL Dataset (macro)

In [None]:
eurostat_path = RAW_DIR / 'eurostat_npl.csv'
if eurostat_path.exists():
    macro_npl = pd.read_csv(eurostat_path)
    print('Loaded Eurostat NPL dataset:', macro_npl.shape)
else:
    print('Eurostat dataset not found. Please download CSV from Eurostat and save as data/raw/eurostat_npl.csv')

## 3. Overview of Micro Dataset

In [None]:
if 'credit_data' in locals():
    # info() prints its summary itself and returns None, so it is not wrapped in display()
    credit_data.info()
    display(credit_data.describe())
    display(credit_data.head())

## 4. Micro-level Visualizations (German Credit)

In [None]:
if 'credit_data' in locals():
    numeric_cols = credit_data.select_dtypes(include=[np.number]).columns.tolist()
    # Histograms
    credit_data[numeric_cols].hist(bins=20, figsize=(12,8))
    plt.suptitle('Distribution of Numeric Features (Micro)')
    plt.show()
    
    # Correlation heatmap
    plt.figure(figsize=(10,8))
    sns.heatmap(credit_data[numeric_cols].corr(), annot=True, fmt='.2f', cmap='coolwarm')
    plt.title('Correlation Heatmap (Micro)')
    plt.show()

## 5. Macro-level Visualizations (Eurostat NPL)

In [None]:
if 'macro_npl' in locals():
    # Assuming columns: Country, Year, NPL_ratio
    if {'Country', 'Year', 'NPL_ratio'}.issubset(macro_npl.columns):
        fig = px.line(macro_npl, x='Year', y='NPL_ratio', color='Country', title='Non-Performing Loan Ratio by Country')
        fig.show()
    else:
        print('Check column names for Eurostat dataset: expected Country, Year, NPL_ratio')
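If the column check above fails, the downloaded file probably still carries Eurostat's own column names. A minimal sketch of a renaming step follows. The source names `geo`, `TIME_PERIOD`, and `OBS_VALUE` are an assumption about the export format, and `normalize_npl_columns` is a hypothetical helper, not something this notebook already defines.

```python
import pandas as pd

# Assumed source column names; adjust them to match the header of the downloaded CSV.
ASSUMED_RENAMES = {"geo": "Country", "TIME_PERIOD": "Year", "OBS_VALUE": "NPL_ratio"}
EXPECTED_COLS = ["Country", "Year", "NPL_ratio"]


def normalize_npl_columns(df, renames=ASSUMED_RENAMES):
    """Return a copy with the columns the plotting cell expects (hypothetical helper)."""
    out = df.rename(columns=renames)
    missing = [c for c in EXPECTED_COLS if c not in out.columns]
    if missing:
        raise KeyError(f"Missing columns after renaming: {missing}")
    out = out[EXPECTED_COLS].copy()
    # Coerce the ratio to a number; entries that cannot be parsed become NaN and are dropped.
    out["NPL_ratio"] = pd.to_numeric(out["NPL_ratio"], errors="coerce")
    return out.dropna(subset=["NPL_ratio"]).reset_index(drop=True)


# Toy frame standing in for the real download (values are made up for illustration).
toy = pd.DataFrame({
    "geo": ["DE", "DE", "FR"],
    "TIME_PERIOD": [2019, 2020, 2020],
    "OBS_VALUE": ["1.2", ":", "2.5"],
})
clean_toy = normalize_npl_columns(toy)
print(clean_toy)
```

With the real file, `macro_npl = normalize_npl_columns(macro_npl)` would run before the plotting cell, once the mapping has been checked against the actual header.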

## 6. My Observations and Insights
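- Check how strongly each numeric feature correlates with the default label.
- Compare default rates across the levels of each categorical variable.
- Macro NPL trends give context for the micro-level results; whether they align with periods of higher default can only be tested if the micro data carries a time dimension.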
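
The first two points can be turned into quick checks. The sketch below is one way to do it, assuming the cleaned data has a binary target coded 0/1 with 1 meaning default. The column names `default`, `purpose`, and `duration` are placeholders for illustration, and both helpers are hypothetical.

```python
import pandas as pd


def default_rate_by_category(df, cat_col, target_col="default"):
    """Share of defaults and row count per level of a categorical column."""
    return (
        df.groupby(cat_col)[target_col]
        .agg(default_rate="mean", n="count")
        .sort_values("default_rate", ascending=False)
    )


def correlation_with_target(df, target_col="default"):
    """Pearson correlation of every numeric column with the binary target."""
    numeric = df.select_dtypes(include="number")
    return numeric.corr()[target_col].drop(target_col).sort_values(ascending=False)


# Toy frame; the values are made up and only show the shape of the output.
toy = pd.DataFrame({
    "purpose": ["car", "car", "tv", "tv", "tv", "car"],
    "duration": [12, 24, 6, 36, 48, 12],
    "default": [0, 1, 0, 1, 1, 0],
})
rates = default_rate_by_category(toy, "purpose")
corrs = correlation_with_target(toy)
print(rates)
print(corrs)
```

Reporting the row count `n` next to each rate helps flag categories whose default rate rests on only a handful of observations.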