# ETL Demo Notebook

This notebook demonstrates ingesting AdventureWorks data (SQL Server) and Azure Open Datasets (weather/demographics), cleaning them, and producing an analytics-ready CSV. Before running it, fill in `.env` and make sure SQL Server is reachable.

**Run order:** 1) Configure `.env`. 2) Start SQL Server (Docker or local). 3) Make the AdventureWorks database available, or use an existing one. 4) Run the cells.

In [1]:
from dotenv import load_dotenv
load_dotenv()
print('Loaded env vars from .env')

Loaded env vars from .env
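
`load_dotenv()` does not complain if `.env` is missing or incomplete, so a quick check after loading can catch gaps before the connector runs. The sketch below is only an illustration: the variable names in `REQUIRED_VARS` are hypothetical placeholders, and the real keys are whatever `get_engine_from_env` reads.

```python
import os

# Hypothetical variable names; replace with the keys your .env actually defines.
REQUIRED_VARS = [
    "SQLSERVER_HOST",
    "SQLSERVER_PORT",
    "SQLSERVER_DB",
    "SQLSERVER_USER",
    "SQLSERVER_PASSWORD",
]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All required settings are present")
```

Passing the environment as an optional argument keeps the helper easy to try against a plain dictionary without touching the real process environment.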


In [5]:
# Install dependencies (comment out once installed)
!pip3.11 install -r requirements.txt

Collecting pyodbc (from -r requirements.txt (line 3))
  Downloading pyodbc-5.2.0-cp311-cp311-macosx_11_0_arm64.whl.metadata (2.7 kB)
Collecting azure-storage-blob (from -r requirements.txt (line 5))
  Using cached azure_storage_blob-12.26.0-py3-none-any.whl.metadata (26 kB)
Collecting jupyterlab (from -r requirements.txt (line 8))
  Downloading jupyterlab-4.4.5-py3-none-any.whl.metadata (16 kB)
Collecting azure-core>=1.30.0 (from azure-storage-blob->-r requirements.txt (line 5))
  Using cached azure_core-1.35.0-py3-none-any.whl.metadata (44 kB)
Collecting cryptography>=2.1.4 (from azure-storage-blob->-r requirements.txt (line 5))
  Downloading cryptography-45.0.6-cp311-abi3-macosx_10_9_universal2.whl.metadata (5.7 kB)
Collecting isodate>=0.6.1 (from azure-storage-blob->-r requirements.txt (line 5))
  Using cached isodate-0.7.2-py3-none-any.whl.metadata (11 kB)
Collecting async-lru>=1.0.0 (from jupyterlab->-r requirements.txt (line 8))
  Downloading async_lru-2.0.5-py3-none-any.whl.meta

In [12]:
from src.connectors.sqlserver_adventureworks import get_engine_from_env, get_sales_transactions
engine = get_engine_from_env()
sales = get_sales_transactions(engine, start_date=None)
print('sales rows:', len(sales))
sales.head()

OperationalError: (pyodbc.OperationalError) ('HYT00', '[HYT00] [Microsoft][ODBC Driver 17 for SQL Server]Login timeout expired (0) (SQLDriverConnect)')
(Background on this error at: https://sqlalche.me/e/20/e3q8)
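
A `Login timeout expired` (`HYT00`) error often means the driver never reached the server at all: SQL Server is not running, the host or port in `.env` is wrong, or a firewall is blocking the connection. One way to separate network problems from credential problems is a plain TCP check, sketched below. The host `localhost` and port `1433` (SQL Server's default) are assumptions; substitute the values from your own configuration.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Assumed values: localhost and SQL Server's default port 1433.
print("SQL Server reachable:", can_reach("localhost", 1433))
```

If this prints `False`, the fix is likely on the server or network side (start the container, check the published port). If it prints `True`, the connection string, driver, or credentials are the more likely cause.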

In [None]:
# Load Azure enrichment CSVs (if present)
import pandas as pd
from pathlib import Path
w = Path('data/raw/azure/weather_daily.csv')
if w.exists():
    weather = pd.read_csv(w)
    print('weather rows:', len(weather))
    display(weather.head())
else:
    print('No weather file found at', w)


In [None]:
# Run the packaged ETL job
!python src/etl/jobs/job_seed_and_ingest.py

## Next steps

- Connect `data/analytics/analytics_ready.csv` to Power BI and design dashboards.
- Add data quality checks (Great Expectations) and scheduling (Airflow/Azure Functions).
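
Before adopting a full framework for the data quality checks mentioned above, a few basic checks can be expressed in plain pandas. This is a minimal sketch, not a replacement for Great Expectations. The helper `basic_quality_report` and the column names in the sample frame are hypothetical, and the real columns of `analytics_ready.csv` may differ.

```python
import pandas as pd


def basic_quality_report(df, key_columns, non_null_columns):
    """Return simple counts: rows, duplicate keys, and nulls per required column."""
    return {
        "rows": len(df),
        "duplicate_keys": int(df.duplicated(subset=key_columns).sum()),
        "nulls": {col: int(df[col].isna().sum()) for col in non_null_columns},
    }


# Tiny made-up frame; the column names are hypothetical.
sample = pd.DataFrame(
    {
        "order_id": [1, 2, 2],
        "order_date": ["2024-01-01", None, "2024-01-03"],
        "amount": [10.0, 20.0, 20.0],
    }
)

report = basic_quality_report(
    sample,
    key_columns=["order_id"],
    non_null_columns=["order_date", "amount"],
)
print(report)
```

To apply the same report to the real output, load it with `pd.read_csv('data/analytics/analytics_ready.csv')` and pass the key and required columns that match your schema.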