# Theme 3: SQL extraction lab (Module 2)

This notebook:
- loads 5 CSV files into a local SQLite database,
- reads `m2t3_queries.sql` (3 QUERY blocks),
- runs the queries,
- exports 3 CSV files plus `m2t3_run_report.json` (proof of execution).

## Expected inputs (in the same folder as this notebook)
- `events.csv`
- `profiles.csv`
- `marketing.csv`
- `support_tickets.csv`
- `validations.csv`
- `m2t3_queries.sql` (a copy of the template, filled in by you)
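Before running the cells, it can help to confirm that all six inputs are actually present. The following is a minimal sketch, assuming the files sit in the current working directory. `missing_inputs` and `EXPECTED_INPUTS` are hypothetical names introduced here for illustration; they are not part of the lab template.

```python
from pathlib import Path

# File names taken from the list of expected inputs above.
EXPECTED_INPUTS = [
    "events.csv",
    "profiles.csv",
    "marketing.csv",
    "support_tickets.csv",
    "validations.csv",
    "m2t3_queries.sql",
]

def missing_inputs(folder="."):
    """Return the expected input files that are not found in `folder`."""
    base = Path(folder)
    return [name for name in EXPECTED_INPUTS if not (base / name).is_file()]

missing = missing_inputs()
if missing:
    print("Missing inputs:", missing)
else:
    print("All inputs found.")
```

An empty list means every input was found; anything else names the files to add before running the notebook.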

## 1) Imports + helpers

In [None]:
import json
import re
import sqlite3
import time
from datetime import datetime

import pandas as pd

# --- 1) Build the SQLite database ---
con = sqlite3.connect("koryxa_sql_lab.db")

def load_csv(table, path):
    """Load one CSV file into `table` (replacing it if present) and return its row count."""
    df = pd.read_csv(path)
    df.to_sql(table, con, if_exists="replace", index=False)
    return int(len(df))

## 2) Load the CSV files into SQLite

In [None]:
rows = {}
rows["events"] = load_csv("events", "events.csv")
rows["profiles"] = load_csv("profiles", "profiles.csv")
rows["marketing"] = load_csv("marketing", "marketing.csv")
rows["support_tickets"] = load_csv("support_tickets", "support_tickets.csv")
rows["validations"] = load_csv("validations", "validations.csv")
rows
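One way to double-check the load step is to ask SQLite which tables it now holds. The sketch below is only an illustration: `missing_tables` is a hypothetical helper, and the demo runs against a throwaway in-memory database rather than `koryxa_sql_lab.db`.

```python
import sqlite3

# Table names used by the load step above.
EXPECTED_TABLES = {"events", "profiles", "marketing", "support_tickets", "validations"}

def missing_tables(connection):
    """Return the expected table names that are absent from the database."""
    found = {
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    }
    return sorted(EXPECTED_TABLES - found)

# Demo: an in-memory database that contains only one of the five tables.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE events (id INTEGER)")
print(missing_tables(demo))
# prints ['marketing', 'profiles', 'support_tickets', 'validations']
```

In the notebook itself, calling `missing_tables(con)` after the load cell should return an empty list.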

## 3) Read the SQL file (3 QUERY blocks)

In [None]:
# Read the whole SQL file; the context manager closes the handle afterwards.
with open("m2t3_queries.sql", "r", encoding="utf-8") as f:
    sql_text = f.read()

def extract_query(name: str) -> str:
    """Return the SQL found between `-- QUERY:<name>` and the next `-- ENDQUERY`."""
    # Matches: -- QUERY:name ... -- ENDQUERY
    pattern = rf"--\s*QUERY:{re.escape(name)}\s*(.*?)\s*--\s*ENDQUERY"
    m = re.search(pattern, sql_text, flags=re.DOTALL | re.IGNORECASE)
    if not m:
        raise ValueError(f"Missing query block: {name}")
    q = m.group(1).strip()
    if not q:
        raise ValueError(f"Empty query block: {name}")
    return q

queries = {
    "q1_funnel_by_theme": extract_query("q1_funnel_by_theme"),
    "q2_completion_by_country": extract_query("q2_completion_by_country"),
    "q3_notebook48h_vs_validation": extract_query("q3_notebook48h_vs_validation"),
}
list(queries.keys())
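To make the expected block format concrete, the sketch below applies the same regular expression to a small in-line string. The SQL inside `sample_sql` and the `theme` column are placeholders made up for the demo, not the lab's actual answers or schema. `extract_from` is a hypothetical variant of `extract_query` that takes the text as an argument and returns `None` instead of raising.

```python
import re

# Hypothetical file content; table and column names are placeholders.
sample_sql = """
-- QUERY:q1_funnel_by_theme
SELECT theme, COUNT(*) AS n
FROM events
GROUP BY theme;
-- ENDQUERY

-- QUERY:q2_completion_by_country
-- ENDQUERY
"""

def extract_from(text, name):
    """Same pattern as extract_query, applied to an explicit string."""
    pattern = rf"--\s*QUERY:{re.escape(name)}\s*(.*?)\s*--\s*ENDQUERY"
    m = re.search(pattern, text, flags=re.DOTALL | re.IGNORECASE)
    return m.group(1).strip() if m else None

filled = extract_from(sample_sql, "q1_funnel_by_theme")
empty = extract_from(sample_sql, "q2_completion_by_country")
absent = extract_from(sample_sql, "q3_notebook48h_vs_validation")

print(filled)        # the three-line SELECT statement
print(repr(empty))   # '' : the block exists but was left unfilled
print(absent)        # None: no block with that name
```

The last two cases correspond to the two `ValueError` branches in `extract_query`: an unfilled block triggers "Empty query block" and an absent one triggers "Missing query block".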

## 4) Execute + export (CSV + JSON run report)

In [None]:
from datetime import timezone

report = {
    # UTC timestamp in ISO 8601 with a trailing "Z" (datetime.utcnow() is deprecated since Python 3.12)
    "created_at": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
    "tables_rows": rows,
    "queries": {},
}

# Output CSV file for each query
outputs = {
    "q1_funnel_by_theme": "m2t3_q1_funnel_by_theme.csv",
    "q2_completion_by_country": "m2t3_q2_completion_by_country.csv",
    "q3_notebook48h_vs_validation": "m2t3_q3_notebook48h_vs_validation.csv",
}

for name, q in queries.items():
    # perf_counter is a monotonic clock, better suited to measuring durations than time.time()
    t0 = time.perf_counter()
    df = pd.read_sql_query(q, con)
    elapsed = round(time.perf_counter() - t0, 4)
    out = outputs[name]
    df.to_csv(out, index=False)
    report["queries"][name] = {"rows": int(len(df)), "seconds": elapsed, "output": out}

with open("m2t3_run_report.json", "w", encoding="utf-8") as f:
    json.dump(report, f, ensure_ascii=False, indent=2)

print("✅ Exports generated:", outputs)
print("✅ Run report:", report["queries"])
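Since the run report serves as proof of execution, a quick self-check before submitting may be useful. The sketch below is one possible way to do it, not part of the lab's grading. `check_report` is a hypothetical helper, it assumes the exported CSV files sit next to the report, and the demo report contains made-up values.

```python
import json
import tempfile
from pathlib import Path

def check_report(report_path):
    """Return a list of problems found in a run report (empty list means OK)."""
    report_path = Path(report_path)
    report = json.loads(report_path.read_text(encoding="utf-8"))
    problems = []
    for name, info in report.get("queries", {}).items():
        # Assumption: each output CSV lives in the same folder as the report.
        out = report_path.parent / info["output"]
        if not out.is_file():
            problems.append(f"{name}: output file {info['output']} not found")
        if info["rows"] == 0:
            problems.append(f"{name}: query returned 0 rows")
    return problems

# Demo with a made-up report in a temporary folder (values are illustrative).
with tempfile.TemporaryDirectory() as tmp:
    demo_report = {
        "created_at": "2024-01-01T00:00:00Z",
        "tables_rows": {"events": 10},
        "queries": {
            "q1_funnel_by_theme": {"rows": 0, "seconds": 0.01, "output": "q1.csv"}
        },
    }
    path = Path(tmp) / "m2t3_run_report.json"
    path.write_text(json.dumps(demo_report), encoding="utf-8")
    demo_problems = check_report(path)

print(demo_problems)
```

Here the demo reports two problems (a missing CSV and an empty result). In the notebook, `check_report("m2t3_run_report.json")` would ideally return an empty list.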