# 03 - Exploratory Data Analysis (EDA)

## Objective
Explore the cleaned price dataset to understand trends, volatility, and relationships between tickers.
Generate insights and plots that will later support dashboard visuals and project hypotheses.

## Inputs
- Cleaned dataset: `data/processed/<version>/clean_prices_<version>_latest.csv`

## Outputs
- EDA plots displayed in-notebook
- Optionally, figures saved to `outputs/<version>/figures/`
- Summary stats (returns, volatility, drawdown)

## CRISP-DM Stage
Data Understanding

In [1]:
# Make the project root importable (so `import src...` works in notebooks)
import sys
from pathlib import Path

# Assumes the notebook's working directory is jupyter_notebooks/, one level below the project root
PROJECT_ROOT = Path("..").resolve()
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

print("Project root added to sys.path:", PROJECT_ROOT)
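The relative `Path("..")` lookup above only works when the notebook is started from its own folder. As a hedged alternative, the root could be located by walking up the directory tree until a marker is found. This is a minimal sketch: `find_project_root` is a hypothetical helper, and using the `src` directory as the marker is an assumption based on the `import src...` comment.

```python
import sys
from pathlib import Path


def find_project_root(start: Path, marker: str = "src") -> Path:
    """Return the nearest ancestor of `start` (inclusive) that contains `marker`.

    `marker` is assumed to be a directory name that only exists at the
    project root; adjust it if the layout differs.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No ancestor of {start} contains {marker!r}")


def add_to_sys_path(root: Path) -> None:
    """Prepend `root` to sys.path once, so repeated cell runs do not duplicate it."""
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
```

In a notebook this would be called as `add_to_sys_path(find_project_root(Path.cwd()))`, which fails loudly with `FileNotFoundError` instead of silently adding the wrong folder.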

In [2]:
import pandas as pd
import matplotlib.pyplot as plt

from src.config import DEFAULT_VERSION, get_paths

In [3]:
# Resolve the versioned project paths from the shared config
VERSION = DEFAULT_VERSION
paths = get_paths(VERSION)

PROCESSED_DIR = paths.processed_dir
OUTPUT_FIG_DIR = paths.outputs_dir / "figures"
OUTPUT_FIG_DIR.mkdir(parents=True, exist_ok=True)  # create the figures folder if it is missing

# Load the cleaned prices, parsing the Date column as datetimes
data_path = PROCESSED_DIR / f"clean_prices_{VERSION}_latest.csv"
print("Loading:", data_path)

df = pd.read_csv(data_path, parse_dates=["Date"])

# Quick sanity check: size, tickers present, and date coverage
print("Shape:", df.shape)
print("Tickers:", sorted(df["Ticker"].unique().tolist()))
print("Date range:", df["Date"].min().date(), "to", df["Date"].max().date())
df.head()
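
The summary statistics listed under Outputs (returns, volatility, drawdown) could be computed per ticker along the following lines. This is a minimal sketch rather than the project's implementation: the `Close` column name, the 252-trading-day annualization factor, and the `summarize_prices` helper are assumptions. Only the `Date` and `Ticker` columns are confirmed by the loading cell. The small `demo` frame is synthetic and exists only to make the example runnable.

```python
import numpy as np
import pandas as pd


def summarize_prices(prices: pd.DataFrame, price_col: str = "Close",
                     periods_per_year: int = 252) -> pd.DataFrame:
    """Per-ticker total return, annualized volatility, and maximum drawdown.

    Expects long-format data with `Date`, `Ticker`, and a price column
    (`price_col` is an assumed name; adjust it to the real schema).
    """
    rows = {}
    for ticker, grp in prices.sort_values("Date").groupby("Ticker"):
        px = grp[price_col].astype(float)
        rets = px.pct_change().dropna()   # simple period-over-period returns
        drawdown = px / px.cummax() - 1.0  # decline from the running peak
        rows[ticker] = {
            "total_return": px.iloc[-1] / px.iloc[0] - 1.0,
            "ann_volatility": rets.std() * np.sqrt(periods_per_year),
            "max_drawdown": drawdown.min(),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Synthetic illustration data, not project data
demo = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"] * 2),
    "Ticker": ["AAA"] * 3 + ["BBB"] * 3,
    "Close": [100.0, 110.0, 99.0, 50.0, 50.0, 55.0],
})
summary = summarize_prices(demo)
print(summary)
```

On the real data, the call would be `summarize_prices(df, price_col=...)` with whichever price column the cleaned CSV actually contains.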