# 02 - SQL Setup

This notebook initialises the SQLite database that powers the SQL-first ETL.

**Steps:**
- Creates `data/auto_sales.db`
- Writes:
  - `raw_sales` table from `raw_sales_verified.csv`
  - `translations` table from `translations_verified.csv`
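- Adds indexes on the `Make`, `Body_Type`, and `Year_Month` columns of `raw_sales`
- Lists the tables in the database and previews the first five rows of each to confirm the load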

After this, the database is ready for the advanced cleaning logic in Notebook 03.


In [None]:
import sqlite3
import pandas as pd

RAW_VERIFIED = "data/raw_sales_verified.csv"
TRANS_VERIFIED = "data/translations_verified.csv"
DB_PATH = "data/auto_sales.db"

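# Load the verified CSV files into DataFrames
raw = pd.read_csv(RAW_VERIFIED)
trans = pd.read_csv(TRANS_VERIFIED)

# Connecting creates the database file if it does not exist yet
conn = sqlite3.connect(DB_PATH)

# if_exists="replace" drops and recreates each table, so this cell can be rerun safely
raw.to_sql("raw_sales", conn, if_exists="replace", index=False)
trans.to_sql("translations", conn, if_exists="replace", index=False)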

print(f"Database initialised at {DB_PATH}")
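
A quick sanity check after a load like the one above is to confirm that each table holds as many rows as its source. The sketch below uses an in-memory database and two made-up rows as a stand-in for the real load; `count_rows` and the sample values are illustrative assumptions, not part of this notebook.

```python
import sqlite3


def count_rows(conn, table):
    """Return the number of rows in `table`."""
    # Table names cannot be bound as parameters, so quote the identifier instead
    quoted = '"' + table.replace('"', '""') + '"'
    return conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]


# Stand-in for the real load: an in-memory table with hypothetical rows
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE raw_sales (Make TEXT, Body_Type TEXT, Year_Month TEXT)")
sample_rows = [("Toyota", "SUV", "2023-01"), ("Honda", "Sedan", "2023-02")]
demo.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", sample_rows)
demo.commit()

loaded = count_rows(demo, "raw_sales")
if loaded != len(sample_rows):
    raise ValueError(f"expected {len(sample_rows)} rows, found {loaded}")
print(f"raw_sales: {loaded} rows loaded")
```

In the notebook itself, the equivalent comparison would be `count_rows(conn, "raw_sales")` against `len(raw)`, and likewise for `translations` and `trans`.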

In [None]:
# `with conn` commits the transaction on success and rolls back on error;
# it does not close the connection
with conn:
    conn.execute("CREATE INDEX IF NOT EXISTS idx_raw_make ON raw_sales(Make);")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_raw_body ON raw_sales(Body_Type);")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_raw_ym ON raw_sales(Year_Month);")

tables = pd.read_sql("SELECT name FROM sqlite_master WHERE type='table';", conn)
print("Tables in DB:", tables["name"].tolist())
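
The table listing above does not show whether the indexes were created or whether SQLite would use them. One possible check, sketched here against an in-memory stand-in table (the three-column schema and the `'Toyota'` filter value are assumptions for illustration), is to list the indexes from `sqlite_master` and inspect `EXPLAIN QUERY PLAN` for a filtered query.

```python
import sqlite3

# In-memory stand-in for raw_sales with only the indexed columns
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE raw_sales (Make TEXT, Body_Type TEXT, Year_Month TEXT)")
with demo:
    demo.execute("CREATE INDEX IF NOT EXISTS idx_raw_make ON raw_sales(Make);")

# sqlite_master records indexes as well as tables
index_names = [
    row[0]
    for row in demo.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type='index' AND tbl_name='raw_sales' ORDER BY name;"
    )
]
print("Indexes on raw_sales:", index_names)

# The last column of each EXPLAIN QUERY PLAN row is a human-readable step
plan = demo.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM raw_sales WHERE Make = ?;", ("Toyota",)
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
uses_index = "idx_raw_make" in plan_text
print("Query plan:", plan_text)
print("Uses idx_raw_make:", uses_index)
```

The exact wording of the plan text varies between SQLite versions, so the check looks only for the index name rather than matching the whole string.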

In [None]:
print("Sample from raw_sales:")
display(pd.read_sql("SELECT * FROM raw_sales LIMIT 5;", conn))

print("Sample from translations:")
display(pd.read_sql("SELECT * FROM translations LIMIT 5;", conn))
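
Previewing rows shows the data but not the schema that was created. As a complementary check, `PRAGMA table_info` reports each column's name and declared type. The sketch below runs it on an in-memory stand-in; the schema and the `expected` set are assumptions based on the columns indexed earlier, not the full `raw_sales` layout.

```python
import sqlite3

# In-memory stand-in; the real table has whatever columns the CSV provides
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE raw_sales (Make TEXT, Body_Type TEXT, Year_Month TEXT)")

# Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk)
columns = {row[1]: row[2] for row in demo.execute("PRAGMA table_info(raw_sales);")}
print(columns)

# Columns the later steps are assumed to rely on
expected = {"Make", "Body_Type", "Year_Month"}
missing = expected - columns.keys()
if missing:
    raise ValueError(f"raw_sales is missing columns: {sorted(missing)}")

demo.close()
```

The same pragma can be run against the notebook's own connection, for example through `pd.read_sql("PRAGMA table_info(raw_sales);", conn)`.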