# Notebook 2: Data Cleaning

This notebook runs the data cleaning phase on the spot, futures, and options datasets.

## Objectives
- Handle missing values (forward/backward fill)
- Remove outliers using the IQR method
- Align timestamps across datasets
- Handle futures rollover
- Calculate dynamic ATM strikes
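
The first two objectives can be illustrated on toy data. The sketch below is not the `DataCleaner` implementation, which this notebook does not show; the helper names `fill_missing` and `remove_iqr_outliers`, the `close` column, and the multiplier `k = 1.5` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Forward-fill gaps, then backward-fill anything left at the start."""
    return df.ffill().bfill()


def remove_iqr_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return df[df[column].between(lower, upper)]


# Toy series with a leading gap, an interior gap, and one obvious spike.
toy = pd.DataFrame({"close": [np.nan, 100.0, 101.0, np.nan, 102.0, 500.0, 103.0]})

filled = fill_missing(toy)
cleaned = remove_iqr_outliers(filled, "close")

print(filled["close"].tolist())
print(cleaned["close"].tolist())
```

Forward fill runs first so that each gap takes the last known value; backward fill only touches leading gaps that have no earlier observation to copy from.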

In [None]:
# Add the parent directory to the import path so the `src` package resolves
import sys
sys.path.insert(0, '..')

import pandas as pd
import numpy as np
from pathlib import Path

from src.data.data_cleaner import DataCleaner

In [None]:
# Initialize cleaner
cleaner = DataCleaner()

# Clean all data
spot_clean, futures_clean, options_clean = cleaner.clean_all_data()

print(f"Cleaned spot data: {len(spot_clean)} records")
print(f"Cleaned futures data: {len(futures_clean)} records")
print(f"Cleaned options data: {len(options_clean)} records")
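
Timestamp alignment can be sketched as restricting each dataset to the timestamps they all share. This is one possible approach, assuming each frame is indexed by a `DatetimeIndex`; the helper `align_on_common_timestamps` is hypothetical and may differ from what `DataCleaner` does internally.

```python
import pandas as pd


def align_on_common_timestamps(*frames: pd.DataFrame) -> tuple:
    """Restrict every frame to the timestamps present in all of them."""
    common = frames[0].index
    for frame in frames[1:]:
        common = common.intersection(frame.index)
    # isin keeps every row at a shared timestamp, so frames with several
    # rows per timestamp (e.g. one per option strike) are handled too.
    return tuple(frame[frame.index.isin(common)] for frame in frames)


# Toy one-minute bars: the two series overlap on 09:16 and 09:17 only.
spot_toy = pd.DataFrame(
    {"close": [100.0, 101.0, 102.0]},
    index=pd.date_range("2024-01-01 09:15", periods=3, freq="min"),
)
futures_toy = pd.DataFrame(
    {"close": [101.5, 102.5, 103.5]},
    index=pd.date_range("2024-01-01 09:16", periods=3, freq="min"),
)

spot_aligned, futures_aligned = align_on_common_timestamps(spot_toy, futures_toy)
print(spot_aligned.index.tolist())
```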

In [None]:
# Check for missing values
print("Missing values after cleaning:")
print(f"  Spot: {spot_clean.isnull().sum().sum()}")
print(f"  Futures: {futures_clean.isnull().sum().sum()}")
print(f"  Options: {options_clean.isnull().sum().sum()}")

In [None]:
# Display cleaned data statistics
print("\nSpot Data Statistics:")
display(spot_clean.describe())
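
The objectives also list dynamic ATM strikes. A common way to derive them from cleaned spot prices is to round each price to the nearest listed strike. The sketch below assumes a fixed strike interval of 50, which is a placeholder; the helper `atm_strike` is hypothetical and not taken from `DataCleaner`.

```python
import pandas as pd


def atm_strike(spot_price: pd.Series, strike_step: float = 50.0) -> pd.Series:
    """Round each spot price to the nearest multiple of strike_step."""
    return (spot_price / strike_step).round() * strike_step


# Toy spot prices; the ATM strike moves as spot moves, hence "dynamic".
toy_spot = pd.Series([21412.3, 21438.0, 21476.9])
toy_atm = atm_strike(toy_spot)

print(toy_atm.tolist())
```

Prices that fall exactly halfway between two strikes follow pandas' round-half-to-even behavior, so a production version may want an explicit tie-breaking rule.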

## Summary

- Missing values handled with forward/backward fill
- Outliers removed using the IQR method
- Timestamps aligned across all datasets
- Futures rollover handled with ratio adjustment
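
For reference, ratio adjustment at a rollover can be sketched as follows: the expiring contract's history is scaled by the price ratio between the two contracts on the roll date, so the stitched series has no artificial jump. The function name `ratio_adjust`, the roll date, and the prices are illustrative assumptions, not the `DataCleaner` logic.

```python
import pandas as pd


def ratio_adjust(front: pd.Series, nxt: pd.Series, roll_date: pd.Timestamp) -> pd.Series:
    """Stitch two contracts at roll_date, rescaling the expiring contract."""
    # Both contracts must trade on the roll date to compute the ratio.
    ratio = nxt.loc[roll_date] / front.loc[roll_date]
    before = front.loc[front.index < roll_date] * ratio
    after = nxt.loc[nxt.index >= roll_date]
    return pd.concat([before, after])


roll_date = pd.Timestamp("2024-01-24")

# The expiring contract trades at 100 on the roll date, the next one at 110.
front = pd.Series(
    [96.0, 98.0, 100.0],
    index=pd.to_datetime(["2024-01-22", "2024-01-23", "2024-01-24"]),
)
nxt = pd.Series(
    [110.0, 112.0],
    index=pd.to_datetime(["2024-01-24", "2024-01-25"]),
)

continuous = ratio_adjust(front, nxt, roll_date)
print(continuous)
```

Scaling by a ratio preserves percentage returns across the roll, whereas a simple price-difference adjustment preserves absolute changes instead.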