
# Unit 2 — Team Classification (Titanic, BQML)

**Goal (team):** Build an *ops-ready* classifier in **BigQuery ML** to predict **`survived`** on the Titanic dataset. Requirements mirror the Flights notebook for comparability.
    
**Dataset:** `bigquery-public-data.ml_datasets.titanic`

**Deliver (inside this notebook):**
- One **LOGISTIC_REG** baseline + one **engineered** model (`TRANSFORM`)
- **Evaluation** via `ML.EVALUATE` and **confusion matrices** (0.5 + custom threshold)
- **Threshold choice** + 3–5 sentence ops justification (e.g., lifeboat allocation policy in a hypothetical ops setting)
- Embedded **rubric** below


In [None]:

# --- Minimal setup (edit 2 vars) ---
from google.colab import auth
auth.authenticate_user()

import os
from google.cloud import bigquery

PROJECT_ID = "YOUR_PROJECT_ID"   # e.g., mgmt-467-47888
REGION     = "us-central1"
TABLE_PATH = "bigquery-public-data.ml_datasets.titanic"

os.environ["PROJECT_ID"] = PROJECT_ID
os.environ["REGION"]     = REGION
bq = bigquery.Client(project=PROJECT_ID)

print("BQ Project:", PROJECT_ID)
print("Source table:", TABLE_PATH)


### Quick sanity check

In [None]:

bq.query(f"SELECT * FROM `{TABLE_PATH}` LIMIT 5").result().to_dataframe()



## 1) Canonical mapping (minimal)
We map to:
- `survived` (BOOL), `pclass` (INT), `sex` (STRING), `age` (NUM), `sibsp` (INT), `parch` (INT), `fare` (NUM), `embarked` (STRING)


In [None]:

CANONICAL_BASE_SQL = f'''
WITH titanic_c AS (
  SELECT
    CAST(survived AS BOOL) AS survived,
    CAST(pclass AS INT64)  AS pclass,
    CAST(sex AS STRING)    AS sex,
    CAST(age AS FLOAT64)   AS age,
    CAST(sibsp AS INT64)   AS sibsp,
    CAST(parch AS INT64)   AS parch,
    CAST(fare AS FLOAT64)  AS fare,
    CAST(embarked AS STRING) AS embarked
  FROM `{TABLE_PATH}`
  WHERE age IS NOT NULL AND fare IS NOT NULL
)
'''
print(CANONICAL_BASE_SQL[:500] + "\n...")


## 2) Split (80/20)

In [None]:

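# BigQuery's RAND() takes no seed argument, so hash each row instead.
# The hash gives a repeatable, approximately 80/20 split; identical rows
# always land in the same split.
SPLIT_CLAUSE = r'''
, split AS (
  SELECT t.*,
         CASE WHEN MOD(ABS(FARM_FINGERPRINT(TO_JSON_STRING(t))), 10) < 8
              THEN 'TRAIN' ELSE 'EVAL' END AS split
  FROM titanic_c t
)
'''
print(SPLIT_CLAUSE)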
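
A split is reproducible only if each row's assignment stays the same from one query to the next. One common way to get that is to hash a stable row key into buckets. The sketch below illustrates the idea in plain Python; the key format and the `assign_split` helper are hypothetical, and `hashlib.sha256` is only a stand-in for whatever hash function the SQL uses.

```python
import hashlib


def assign_split(row_key: str, train_fraction: float = 0.8) -> str:
    """Map a stable row key to 'TRAIN' or 'EVAL' via a hash bucket."""
    digest = hashlib.sha256(row_key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # integer bucket in 0..99
    return "TRAIN" if bucket < train_fraction * 100 else "EVAL"


# Hypothetical row keys, for illustration only.
keys = [f"passenger-{i}" for i in range(1000)]
splits = [assign_split(k) for k in keys]
train_share = splits.count("TRAIN") / len(splits)
print(f"TRAIN share: {train_share:.2f}")
```

Because the bucket depends only on the key, rerunning the assignment yields the same TRAIN and EVAL rows, which keeps training and evaluation queries consistent with each other.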



## 3) Baseline model — LOGISTIC_REG (`survived`)
Use a small set of signals (keep parity with Flights complexity).


In [None]:

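SCHEMA = f"{PROJECT_ID}.unit2_titanic"
MODEL_BASE = f"{SCHEMA}.clf_survived_base"

# CTEs are scoped to a single statement, so the WITH prefix is placed
# inside each statement that reads from the `split` CTE.
sql_baseline_train = f'''
CREATE SCHEMA IF NOT EXISTS `{SCHEMA}`;

CREATE OR REPLACE MODEL `{MODEL_BASE}`
OPTIONS (MODEL_TYPE='LOGISTIC_REG', INPUT_LABEL_COLS=['survived']) AS
{CANONICAL_BASE_SQL}
{SPLIT_CLAUSE}
SELECT
  s.survived, s.pclass, s.sex, s.age, s.sibsp, s.parch, s.fare, s.embarked
FROM split AS s
WHERE s.split='TRAIN'
;
'''
bq.query(sql_baseline_train).result()
print("Baseline model trained:", MODEL_BASE)

# Evaluate on the held-out rows and display the metrics.
sql_baseline_eval = f'''
{CANONICAL_BASE_SQL}
{SPLIT_CLAUSE}
SELECT * FROM ML.EVALUATE(
  MODEL `{MODEL_BASE}`,
  (SELECT s.survived, s.pclass, s.sex, s.age, s.sibsp, s.parch, s.fare, s.embarked
   FROM split AS s WHERE s.split='EVAL')
);
'''
bq.query(sql_baseline_eval).result().to_dataframe()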


### Confusion matrix — default 0.5 threshold

In [None]:

# ML.PREDICT passes its input columns through, so the true label is read
# from its output directly; no join is needed. The default cutoff is 0.5.
cm_default_sql = f'''
{CANONICAL_BASE_SQL}
{SPLIT_CLAUSE}
, scored AS (
  SELECT
    survived AS label,
    predicted_survived AS pred_label
  FROM ML.PREDICT(
    MODEL `{MODEL_BASE}`,
    (SELECT s.survived, s.pclass, s.sex, s.age, s.sibsp, s.parch, s.fare, s.embarked
     FROM split AS s WHERE s.split='EVAL'))
)
SELECT
  SUM(CASE WHEN label=TRUE  AND pred_label=TRUE  THEN 1 ELSE 0 END) AS TP,
  SUM(CASE WHEN label=FALSE AND pred_label=TRUE  THEN 1 ELSE 0 END) AS FP,
  SUM(CASE WHEN label=TRUE  AND pred_label=FALSE THEN 1 ELSE 0 END) AS FN,
  SUM(CASE WHEN label=FALSE AND pred_label=FALSE THEN 1 ELSE 0 END) AS TN
FROM scored;
'''
bq.query(cm_default_sql).result().to_dataframe()
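
The four confusion-matrix counts are enough to derive the usual summary metrics. The sketch below shows the arithmetic in plain Python; the `classification_metrics` helper is hypothetical and the counts are made up for illustration, not results from this notebook.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive summary metrics from confusion-matrix counts."""
    nan = float("nan")  # returned when a denominator is zero
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else nan,
        "precision": tp / (tp + fp) if (tp + fp) else nan,
        "recall": tp / (tp + fn) if (tp + fn) else nan,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else nan,
    }


# Made-up counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Precision answers "of those predicted to survive, how many did?", while recall answers "of those who survived, how many were found?".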


### Confusion matrix — your custom threshold

In [None]:

CUSTOM_THRESHOLD = 0.6   # TODO: justify in ops (e.g., conservative rescue policy)

# ML.PREDICT accepts an optional STRUCT with a `threshold` field for binary
# classifiers; predicted_survived is then computed at that cutoff.
cm_thresh_sql = f'''
{CANONICAL_BASE_SQL}
{SPLIT_CLAUSE}
, scored AS (
  SELECT
    survived AS label,
    predicted_survived AS pred_label
  FROM ML.PREDICT(
    MODEL `{MODEL_BASE}`,
    (SELECT s.survived, s.pclass, s.sex, s.age, s.sibsp, s.parch, s.fare, s.embarked
     FROM split AS s WHERE s.split='EVAL'),
    STRUCT({CUSTOM_THRESHOLD} AS threshold))
)
SELECT
  SUM(CASE WHEN label=TRUE  AND pred_label=TRUE  THEN 1 ELSE 0 END) AS TP,
  SUM(CASE WHEN label=FALSE AND pred_label=TRUE  THEN 1 ELSE 0 END) AS FP,
  SUM(CASE WHEN label=TRUE  AND pred_label=FALSE THEN 1 ELSE 0 END) AS FN,
  SUM(CASE WHEN label=FALSE AND pred_label=FALSE THEN 1 ELSE 0 END) AS TN
FROM scored;
'''
bq.query(cm_thresh_sql).result().to_dataframe()
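
To build intuition for how the cutoff moves the counts, the same tally can be sketched locally. The `confusion_at` helper is hypothetical, and the labels and scores below are made up for illustration; they are not model output.

```python
def confusion_at(labels, scores, threshold):
    """Count TP/FP/FN/TN when predicting positive for score >= threshold."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for label, score in zip(labels, scores):
        predicted = score >= threshold
        if label and predicted:
            counts["TP"] += 1
        elif not label and predicted:
            counts["FP"] += 1
        elif label and not predicted:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Made-up labels and scores, for illustration only.
labels = [True, True, True, False, False, False]
scores = [0.90, 0.65, 0.55, 0.58, 0.30, 0.10]

for threshold in (0.5, 0.6):
    print(threshold, confusion_at(labels, scores, threshold))
```

On this toy data, raising the cutoff from 0.5 to 0.6 removes the one false positive but turns one true positive into a false negative, which is the trade the ops justification should address.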



## 4) Engineered model — `TRANSFORM`
Create **family_size**, **fare_bucket**, and a **sex_pclass** interaction (categorical). Compare with baseline.


In [None]:

MODEL_XFORM = f"{SCHEMA}.clf_survived_xform"

# Only columns listed in TRANSFORM reach the model, so the label
# (`survived`) must be listed there too.
sql_xform_train = f'''
CREATE OR REPLACE MODEL `{MODEL_XFORM}`
TRANSFORM (
  survived,
  -- engineered
  (sibsp + parch + 1) AS family_size,
  CASE
    WHEN fare < 10 THEN 'low'
    WHEN fare < 50 THEN 'mid'
    ELSE 'high'
  END AS fare_bucket,
  CONCAT(sex, '_', CAST(pclass AS STRING)) AS sex_pclass,
  -- include base features too
  pclass, sex, age, sibsp, parch, fare, embarked
)
OPTIONS (MODEL_TYPE='LOGISTIC_REG', INPUT_LABEL_COLS=['survived']) AS
{CANONICAL_BASE_SQL}
{SPLIT_CLAUSE}
SELECT * FROM split AS s WHERE s.split='TRAIN'
;
'''
bq.query(sql_xform_train).result()
print("Engineered model trained:", MODEL_XFORM)

# Side-by-side evaluation on the same held-out rows.
sql_compare = f'''
{CANONICAL_BASE_SQL}
{SPLIT_CLAUSE}
SELECT 'baseline' AS model_version, * FROM ML.EVALUATE(
  MODEL `{MODEL_BASE}`,
  (SELECT s.survived, s.pclass, s.sex, s.age, s.sibsp, s.parch, s.fare, s.embarked
   FROM split AS s WHERE s.split='EVAL')
)
UNION ALL
SELECT 'engineered' AS model_version, * FROM ML.EVALUATE(
  MODEL `{MODEL_XFORM}`,
  (SELECT * FROM split AS s WHERE s.split='EVAL')
);
'''
bq.query(sql_compare).result().to_dataframe()
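
For a quick sanity check of the feature logic outside BigQuery, the three engineered features can be mirrored in plain Python. This is a sketch under the same bucket boundaries as the SQL; the `engineer_features` helper and the example row are hypothetical.

```python
def engineer_features(row: dict) -> dict:
    """Mirror the TRANSFORM logic for one passenger record."""
    fare = row["fare"]
    if fare < 10:
        fare_bucket = "low"
    elif fare < 50:
        fare_bucket = "mid"
    else:
        fare_bucket = "high"
    return {
        # The passenger plus siblings/spouses plus parents/children.
        "family_size": row["sibsp"] + row["parch"] + 1,
        "fare_bucket": fare_bucket,
        # Categorical interaction of sex and passenger class.
        "sex_pclass": f"{row['sex']}_{row['pclass']}",
    }


# Hypothetical passenger record, for illustration only.
example = {"sex": "female", "pclass": 1, "sibsp": 1, "parch": 0, "fare": 71.28}
print(engineer_features(example))
```

Checking the boundary values (a fare of exactly 10 or 50) this way makes it easier to document the bucket definitions in the write-up.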



### Write-up (concise)
- **Threshold chosen & ops rationale:** …  
- **Baseline vs engineered — changes in AUC/precision/recall:** …  
- **Risk framing:** FP vs FN trade-off in a rescue/triage-like context: which error hurts more, and why? …
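
One way to make the FP vs FN framing concrete is to attach a relative cost to each error type and compare thresholds on total cost. The sketch below assumes hypothetical costs and made-up confusion counts; both are placeholders to be replaced with your own ops assumptions and actual results.

```python
def total_cost(fp: int, fn: int, cost_fp: float, cost_fn: float) -> float:
    """Weighted error cost for one confusion matrix."""
    return fp * cost_fp + fn * cost_fn


# Made-up FP/FN counts at two thresholds, for illustration only.
candidates = {
    0.5: {"FP": 12, "FN": 8},
    0.6: {"FP": 6, "FN": 15},
}

# Hypothetical assumption: a false negative is three times as costly
# as a false positive.
COST_FP, COST_FN = 1.0, 3.0

costs = {
    threshold: total_cost(c["FP"], c["FN"], COST_FP, COST_FN)
    for threshold, c in candidates.items()
}
best_threshold = min(costs, key=costs.get)
print(costs, "-> lowest cost at threshold", best_threshold)
```

If the cost ratio is reversed, the preferred threshold can change, so the justification should state which ratio is assumed and why.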



---

## Rubric (Titanic, 100 pts)
**Team-only deliverable in this notebook**

- Baseline LOGISTIC_REG + evaluation (AUC + confusion @0.5) — **20**  
- Custom threshold confusion matrix + ops justification — **20**  
- Engineered model with `TRANSFORM` (family_size, fare_bucket, sex_pclass) — **20**  
- Comparison table (baseline vs engineered) + 3–5 sentence interpretation — **20**  
- Reproducibility: parameters clear, no hidden magic; mapping documented — **10**  
- Governance notes: assumptions/limitations + slices you would monitor — **10**

> **Strictness:** No screenshots; use actual results cells. Keep explanations concise (bullet points OK).
