# Session 1 — Data Engineering Overview & First SQL

Understand the data engineering (DE) lifecycle and run your first SQL queries. Includes simple seaborn plots on synthetic data.


## Environment Setup

In [None]:
import sqlite3
import sys
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

print(sys.version)
sns.set_theme()

# SQLite file created in the current working directory
DB_PATH = Path('course.db')
conn = sqlite3.connect(DB_PATH)
conn.execute('PRAGMA foreign_keys=ON;')  # SQLite leaves foreign-key enforcement off by default
print('SQLite ready at', DB_PATH.resolve())

In [None]:
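def run_sql(q, params=None):
    """Run a SQL query on the shared `conn`, display the result, and return it as a DataFrame."""
    params = params or {}  # no bind parameters by default
    df = pd.read_sql_query(q, conn, params=params)
    display(df)  # `display` is provided by IPython/Jupyter
    return df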
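
As a usage sketch for the `params` argument: `pd.read_sql_query` hands `params` to the underlying sqlite3 driver, so named placeholders such as `:country` are bound by the driver instead of being pasted into the SQL string. The example below uses plain `sqlite3` and a throwaway in-memory table so it runs on its own. The names `demo_conn` and `demo_customers` are hypothetical and not part of the course database.

```python
import sqlite3

# Throwaway in-memory database; demo_customers is a hypothetical table for illustration.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE demo_customers(name TEXT, country TEXT)")
demo_conn.executemany(
    "INSERT INTO demo_customers(name, country) VALUES (?, ?)",
    [("Aria", "USA"), ("Ben", "Germany"), ("Chloe", "USA")],
)

# Named placeholder: the driver binds the value, so quotes in the input cannot alter the SQL.
query = "SELECT name FROM demo_customers WHERE country = :country ORDER BY name"
usa_names = [row[0] for row in demo_conn.execute(query, {"country": "USA"})]
print(usa_names)  # ['Aria', 'Chloe']
```

Once the `customers` table is seeded, the same pattern should work through the helper, for example `run_sql("SELECT name FROM customers WHERE country = :country", {"country": "USA"})`.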

## 1. Data Engineering in a Nutshell
**Lifecycle:** Ingest → Validate/Clean → Transform/Model → Store → Serve/Analyze → Orchestrate/Monitor  
**Why SQL?** It is the standard language for querying structured data, and the same language is used to build transformations and to check data quality.

### Simple Data Pipeline Diagram (ASCII)
```
Sources -> [Ingest] -> [Staging] -> [Clean/Validate] -> [Transform] -> [Serve (BI/Data Science)]
```
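
The stages in the diagram can be sketched end to end in a few lines of SQL. This is a minimal illustration, not a production pattern: the raw rows, the table names `staging_products` and `clean_products`, and the crude "starts with a digit" validation rule are all assumptions made for the example.

```python
import sqlite3

# Hypothetical raw records standing in for a source system (one is malformed on purpose).
raw_rows = [("Keyboard", "39.99"), ("Mouse", "19.5"), ("Broken", "n/a")]

pipe = sqlite3.connect(":memory:")

# Ingest -> Staging: land the raw text unchanged.
pipe.execute("CREATE TABLE staging_products(product TEXT, price_text TEXT)")
pipe.executemany("INSERT INTO staging_products VALUES (?, ?)", raw_rows)

# Clean/Validate + Transform: keep rows whose price text starts with a digit
# (a deliberately crude check) and cast that text to REAL.
pipe.execute("CREATE TABLE clean_products(product TEXT NOT NULL, price REAL NOT NULL)")
pipe.execute("""
    INSERT INTO clean_products(product, price)
    SELECT product, CAST(price_text AS REAL)
    FROM staging_products
    WHERE price_text GLOB '[0-9]*'
""")

# Serve: downstream consumers query only the cleaned table.
served = pipe.execute(
    "SELECT product, price FROM clean_products ORDER BY price"
).fetchall()
print(served)
```

Keeping the staging table untouched means a rejected row such as `Broken` can still be inspected later, which is one reason pipelines separate staging from cleaned data.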

In [None]:
# Seed two tiny tables
conn.executescript('''
DROP TABLE IF EXISTS customers;
DROP TABLE IF EXISTS products;
CREATE TABLE customers(
  customer_id INTEGER PRIMARY KEY,
  name TEXT NOT NULL,
  city TEXT,
  country TEXT,
  address TEXT
);
CREATE TABLE products(
  product_id INTEGER PRIMARY KEY,
  product TEXT NOT NULL,
  price REAL NOT NULL CHECK(price>=0)
);
INSERT INTO customers(name,city,country,address) VALUES
 ('Aria','Austin','USA','1 River Rd'),
 ('Ben','Berlin','Germany','Karlstr. 9'),
 ('Chloe','Chicago','USA',NULL),
 ('Dai','Denver','USA','11 Pine St');
INSERT INTO products(product,price) VALUES
 ('Keyboard',39.99),('Mouse',19.5),('Monitor',189.0),('USB-C Cable',9.99);
''')
conn.commit()
print("Seeded tables.")
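
The `NOT NULL` and `CHECK(price>=0)` constraints in the `products` definition act as a first line of data validation. The sketch below recreates that table in a separate in-memory database, so the course database is untouched, and attempts one valid and two invalid inserts. The helper `try_insert` and the sample rows are hypothetical, and the exact error text depends on the SQLite version, so no message is shown here.

```python
import sqlite3

# Separate in-memory database so the course data is not modified.
check_conn = sqlite3.connect(":memory:")
check_conn.execute("""
    CREATE TABLE products(
      product_id INTEGER PRIMARY KEY,
      product TEXT NOT NULL,
      price REAL NOT NULL CHECK(price>=0)
    )
""")

def try_insert(product, price):
    """Return 'ok' or the constraint error message for one attempted insert."""
    try:
        check_conn.execute(
            "INSERT INTO products(product, price) VALUES (?, ?)", (product, price)
        )
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

results = [
    try_insert("Webcam", 49.0),   # valid row
    try_insert("Refund", -5.0),   # violates CHECK(price>=0)
    try_insert(None, 1.0),        # violates NOT NULL on product
]
for line in results:
    print(line)

# Only the valid row was stored.
row_count = check_conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
print(row_count)  # 1
```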

## 2. First SQL Queries

In [None]:
run_sql("SELECT * FROM customers ORDER BY city, name;")
# Trailing semicolon keeps Jupyter from echoing the returned DataFrame a second time
run_sql("SELECT product, price FROM products ORDER BY price DESC;");
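
A natural next step after `SELECT ... ORDER BY` is filtering and aggregating. The sketch below copies the four seeded customer rows into a standalone in-memory table (the connection name `q_conn` is an assumption for the example) and shows a `WHERE ... IS NULL` filter and a `GROUP BY` count.

```python
import sqlite3

# Standalone copy of the seeded customer rows.
q_conn = sqlite3.connect(":memory:")
q_conn.execute("CREATE TABLE customers(name TEXT, city TEXT, country TEXT, address TEXT)")
q_conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        ("Aria", "Austin", "USA", "1 River Rd"),
        ("Ben", "Berlin", "Germany", "Karlstr. 9"),
        ("Chloe", "Chicago", "USA", None),
        ("Dai", "Denver", "USA", "11 Pine St"),
    ],
)

# Filter: NULL must be tested with IS NULL; "address = NULL" never matches.
missing_address = q_conn.execute(
    "SELECT name FROM customers WHERE address IS NULL"
).fetchall()

# Aggregate: one output row per country, largest group first.
per_country = q_conn.execute("""
    SELECT country, COUNT(*) AS n_customers
    FROM customers
    GROUP BY country
    ORDER BY n_customers DESC, country
""").fetchall()

print(missing_address)  # [('Chloe',)]
print(per_country)      # [('USA', 3), ('Germany', 1)]
```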

## 3. Quick EDA with seaborn

In [None]:
dfp = pd.read_sql_query("SELECT * FROM products;", conn)
sns.barplot(data=dfp, x="product", y="price")
plt.title("Product Prices")
plt.xlabel("Product")
plt.ylabel("Price")
plt.show()