# 🛠 02_preprocessing.ipynb  
## Feature Engineering

### 1. Objective  
Turn raw prices into **useful numbers** (`features`) that help our models learn.

> _Like cutting fruit (prices) into bite-size pieces (features) so it’s easier for the model to digest._

---

### 2. Features We Create  
1. **Log-return** = ln(Closeₜ / Closeₜ₋₁)  
   - Log returns add across periods (the sum of daily log returns equals the log return over the whole span) and are closer to stationary than raw prices.  
2. **RSI (Relative Strength Index)**  
   - A 0–100 momentum score: how “overbought” or “oversold” the market is.  
3. **MACD**  
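   - A trend indicator built from the difference between a fast and a slow exponential moving average. The code in this notebook stores the MACD histogram (MACD line minus its signal line), which is what `ta.trend.macd_diff` returns.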
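
To make the three definitions concrete, here is a minimal sketch that computes them with plain pandas on a made-up price series. The prices, the helper names (`log_returns`, `rsi_wilder`, `macd_histogram`), and the window defaults (14 for RSI, 12/26/9 for MACD) are assumptions for illustration. This is not necessarily how the `ta` library computes its values internally, so the numbers may differ slightly from it.

```python
import numpy as np
import pandas as pd


def log_returns(close: pd.Series) -> pd.Series:
    """ln(Close_t / Close_{t-1}); the first value is NaN."""
    return np.log(close / close.shift(1))


def rsi_wilder(close: pd.Series, window: int = 14) -> pd.Series:
    """RSI from Wilder-style smoothed average gains and losses (illustrative)."""
    delta = close.diff()
    gain = delta.clip(lower=0)        # positive moves, zero otherwise
    loss = -delta.clip(upper=0)       # size of negative moves, zero otherwise
    avg_gain = gain.ewm(alpha=1 / window, min_periods=window, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1 / window, min_periods=window, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)


def macd_histogram(close: pd.Series, fast: int = 12, slow: int = 26,
                   signal: int = 9) -> pd.Series:
    """MACD line (fast EMA minus slow EMA) minus its signal-line EMA."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    macd_line = ema_fast - ema_slow
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()
    return macd_line - signal_line


# Hypothetical prices: a seeded random walk, not real market data
rng = np.random.default_rng(0)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 120))))

ret = log_returns(close)
rsi = rsi_wilder(close)
hist = macd_histogram(close)

# Additivity: daily log returns sum to the log return over the whole span
total = np.log(close.iloc[-1] / close.iloc[0])
additive = bool(np.isclose(ret.sum(), total))

# RSI stays inside its 0-100 range once the warm-up rows are dropped
rsi_in_range = bool(rsi.dropna().between(0, 100).all())

print(additive, rsi_in_range)
```

The additivity check is the practical reason to prefer log returns over simple percent changes: multi-day returns can be obtained by summing the daily column.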

---

### 3. Steps  
1. **Load** `data/raw/sp500.csv`.  
2. **Compute** each feature.  
3. **Drop** the leading rows that are still `NaN` while the indicators warm up.  
4. **Save** as `data/processed/fe

In [1]:
# notebooks/02_preprocessing.ipynb

import pandas as pd
import numpy as np
import ta
from pathlib import Path

# 1) Load the raw CSV
root = Path().resolve().parent        # project root (assumes the working directory is notebooks/)
raw_path = root / "data" / "raw" / "sp500.csv"
df = pd.read_csv(raw_path, index_col="Date", parse_dates=True)

# 2) Compute daily log returns
df["return"] = np.log(df["Close"] / df["Close"].shift(1))

# 3) Add technical indicators
df["rsi"]  = ta.momentum.rsi(df["Close"], window=14)
df["macd"] = ta.trend.macd_diff(df["Close"])

# 4) Drop any rows with NaNs (the leading rows are NaN while the indicators warm up)
df = df.dropna()

# 5) Save the processed features (create the output folder if it is missing)
proc_path = root / "data" / "processed" / "features.csv"
proc_path.parent.mkdir(parents=True, exist_ok=True)
df.to_csv(proc_path)

# 6) Preview
df.head()


Date,Open,High,Low,Close,Volume,return,rsi,macd
2020-07-13,3205.08,3235.32,3149.43,3155.22,2694339000.0,-0.009407,55.238587,6.658386
2020-07-14,3141.11,3200.95,3127.66,3197.52,2638225000.0,0.013317,59.409451,7.621329
2020-07-15,3225.98,3238.28,3200.76,3226.56,2849504000.0,0.009041,62.025553,9.495176
2020-07-16,3208.36,3220.39,3198.59,3215.57,2214118000.0,-0.003412,60.43801,9.265467
2020-07-17,3224.21,3233.52,3205.65,3224.73,2219965000.0,0.002845,61.326496,8.994139
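
As a quick sanity check on the preview, the `return` column can be recomputed from the `Close` column. The sketch below copies the five previewed closes and the four returns that can be derived from them (the first row's return depends on the prior day's close, which is not shown). The tolerance is an assumption that allows for rounding in the printed values.

```python
import numpy as np

# Close values copied from the preview rows (2020-07-13 to 2020-07-17)
close = np.array([3155.22, 3197.52, 3226.56, 3215.57, 3224.73])

# `return` values shown for 2020-07-14 to 2020-07-17
shown = np.array([0.013317, 0.009041, -0.003412, 0.002845])

# ln(Close_t / Close_{t-1}) is the first difference of the log prices
recomputed = np.diff(np.log(close))

# Loose tolerance (an assumption) to absorb rounding in the displayed table
matches = bool(np.allclose(recomputed, shown, atol=1e-5))
print(matches)
```

If the check fails on a freshly processed file, the usual suspects are a misaligned `shift` or rows that were reordered before the returns were computed.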
