# 04 - Load Clean Data & Add Features

This notebook bridges the SQL cleaning stage and the modelling notebooks.

**Steps:**
- Loads the cleaned `clean_sales_final` table from SQLite
- Converts `Year_Month` to a pandas datetime
- Adds the time-based features:
  - `Year`
  - `Month`
  - `Month_Since_Start` (whole calendar months elapsed since the earliest `Year_Month` in the table; 0 for that first month)
- Saves (to `data/`):
  - `clean_sales_final.csv`
  - `clean_sales_with_features.csv`

These files are the direct input to all modelling notebooks.
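The `Month_Since_Start` arithmetic can be checked by hand. Below is a minimal pure-Python sketch of the same formula. The function `months_since_start` and the sample dates are hypothetical. The dates were chosen to cross a year boundary and to skip one month.

```python
from datetime import date


def months_since_start(dates):
    """Hypothetical helper: month offset of each date from the earliest date."""
    ref = min(dates)
    # Same formula as the notebook: year difference in months plus month difference
    return [(d.year - ref.year) * 12 + (d.month - ref.month) for d in dates]


# Hypothetical monthly observations; February 2020 is deliberately missing
sample = [date(2019, 11, 1), date(2019, 12, 1), date(2020, 1, 1), date(2020, 3, 1)]
print(months_since_start(sample))  # [0, 1, 2, 4]
```

The index counts calendar months, not rows. A missing month therefore leaves a gap in the sequence (2 jumps to 4 here) rather than being silently compressed.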


In [None]:
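import os
import sqlite3
from contextlib import closing

import pandas as pd

DB_PATH = "data/auto_sales.db"

# Read the cleaned table, sorted so each Make/Body_Type series is in time order.
# closing() releases the connection even if the query raises.
with closing(sqlite3.connect(DB_PATH)) as conn:
    df = pd.read_sql(
        "SELECT * FROM clean_sales_final ORDER BY Make, Body_Type, Year_Month;",
        conn,
    )

# SQLite stores dates as text; convert to a datetime64 column
df["Year_Month"] = pd.to_datetime(df["Year_Month"])

# Persist the cleaned table as CSV for the modelling notebooks
os.makedirs("data", exist_ok=True)
df.to_csv("data/clean_sales_final.csv", index=False)

display(df.head())  # display() is provided by IPython/Jupyter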
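SQLite has no native date type, and CSV files do not carry dtypes. As a result, `Year_Month` is text both when it leaves the database and when a saved CSV is read back. The sketch below reproduces this cell against a hypothetical in-memory table. The `Units` column, the rows, and the `YYYY-MM-DD` date format are assumptions, not the real schema. Variables are prefixed `demo_` so the sketch does not overwrite the notebook's `df`.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical stand-in for clean_sales_final (Units and the rows are illustrative)
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE clean_sales_final (Make TEXT, Body_Type TEXT, Year_Month TEXT, Units INTEGER)"
)
demo_conn.executemany(
    "INSERT INTO clean_sales_final VALUES (?, ?, ?, ?)",
    [("B", "Sedan", "2020-02-01", 7), ("A", "SUV", "2020-01-01", 5), ("A", "SUV", "2019-12-01", 4)],
)
demo_df = pd.read_sql(
    "SELECT * FROM clean_sales_final ORDER BY Make, Body_Type, Year_Month;", demo_conn
)
demo_conn.close()

# Straight out of SQLite, Year_Month is plain text
raw_is_datetime = pd.api.types.is_datetime64_any_dtype(demo_df["Year_Month"])
demo_df["Year_Month"] = pd.to_datetime(demo_df["Year_Month"])

# Round-trip through CSV in memory instead of on disk
buf = io.StringIO()
demo_df.to_csv(buf, index=False)

buf.seek(0)
naive = pd.read_csv(buf)  # Year_Month comes back as strings
buf.seek(0)
parsed = pd.read_csv(buf, parse_dates=["Year_Month"])  # datetime dtype restored

print(demo_df["Make"].tolist())  # ['A', 'A', 'B']
```

Downstream notebooks that load `data/clean_sales_final.csv` would likely need the same `parse_dates=["Year_Month"]` argument to recover the datetime column.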

In [None]:
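# Reference point: the earliest month in the whole table (shared by every Make/Body_Type series)
ref_date = df["Year_Month"].min()

# Calendar features
df["Year"] = df["Year_Month"].dt.year
df["Month"] = df["Year_Month"].dt.month

# Whole calendar months elapsed since ref_date (0 for the first month)
df["Month_Since_Start"] = (
    (df["Year_Month"].dt.year - ref_date.year) * 12
    + (df["Year_Month"].dt.month - ref_date.month)
)

# Save the feature-enriched table for the modelling notebooks
df.to_csv("data/clean_sales_with_features.csv", index=False)
print("Saved: data/clean_sales_with_features.csv")
display(df.head())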
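An equivalent way to express `Month_Since_Start` is to map each date to an absolute month count (`year * 12 + month`) and then subtract the minimum. The sketch below uses a hypothetical four-row frame, named `toy` so it does not overwrite `df`. It checks that both forms agree. It also shows that the index is shared across series rather than restarting for each `Make`/`Body_Type`.

```python
import pandas as pd

# Hypothetical toy data mirroring the Make / Body_Type / Year_Month layout
toy = pd.DataFrame({
    "Make": ["A", "A", "B", "B"],
    "Body_Type": ["SUV", "SUV", "Sedan", "Sedan"],
    "Year_Month": pd.to_datetime(["2019-11-01", "2020-01-01", "2019-12-01", "2020-03-01"]),
})

toy_ref = toy["Year_Month"].min()

# Formula used in the notebook
original = (
    (toy["Year_Month"].dt.year - toy_ref.year) * 12
    + (toy["Year_Month"].dt.month - toy_ref.month)
)

# Alternative: absolute month count minus its minimum
abs_month = toy["Year_Month"].dt.year * 12 + toy["Year_Month"].dt.month
alternative = abs_month - abs_month.min()

print(original.tolist())  # [0, 2, 1, 4]
print(bool((original == alternative).all()))  # True
```

Make B's first month maps to 1, not 0, because the reference is the global minimum. A per-series index would need a `groupby(["Make", "Body_Type"])` transform instead.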