# Visual Talking Points for Presentation

Use one section per slide. Edit wording to match your speaking style.

## Slide: ETL Overview (`images/ETL_Overview.png`)

- This visual shows the full data lifecycle: **Raw Sources -> Extract -> Transform -> Load (Supabase) -> QA Checks -> Derived Tables -> Notebook Analytics**.
- The key message is that analysis results are not direct raw outputs; they come after standardization, key harmonization, and validation.
- QA is explicitly in the pipeline (duplicate checks, missingness checks, year-range checks), which protects against misleading trend conclusions.
- Derived tables (`national_enrollment_trend`, `incident_rate_per_100k`) are purpose-built for analysis and reporting speed.
- Presentation line: *"We designed the pipeline so the analytical layer only sees validated, structurally consistent data."*
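
If someone asks what the three QA checks look like in practice, the following is a minimal sketch, not the pipeline's actual code. The function name `run_qa_checks`, the `State` column, the sample rows, and the year bounds are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def run_qa_checks(rows, key, year_field, year_min, year_max, required=()):
    """Return duplicate, missingness, and year-range findings for a list of row dicts."""
    # Duplicate check: key values that appear more than once.
    key_counts = Counter(row.get(key) for row in rows)
    duplicates = sorted(k for k, n in key_counts.items() if k is not None and n > 1)

    # Missingness check: number of None/empty values per required column.
    missing = {
        col: sum(1 for row in rows if row.get(col) in (None, ""))
        for col in required
    }

    # Year-range check: keys of rows whose year falls outside [year_min, year_max].
    out_of_range = [
        row.get(key)
        for row in rows
        if row.get(year_field) is not None
        and not (year_min <= row[year_field] <= year_max)
    ]
    return {"duplicates": duplicates, "missing": missing, "out_of_range": out_of_range}


# Hypothetical sample rows; the real schema and valid year range may differ.
sample = [
    {"Incident_ID": "A1", "year": 1999, "State": "CO"},
    {"Incident_ID": "A1", "year": 1999, "State": "CO"},  # duplicate key
    {"Incident_ID": "B2", "year": 2051, "State": ""},    # bad year, missing State
    {"Incident_ID": "C3", "year": 2018, "State": "FL"},
]
report = run_qa_checks(
    sample, key="Incident_ID", year_field="year",
    year_min=1990, year_max=2030, required=["State"],
)
print(report)
```

Returning findings instead of raising keeps the check reusable on every load, which fits the "QA in the pipeline" message.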

## Slide: Entity Relationship Diagram (`images/entity_relationship.png`)

- `incident` is the hub table keyed by `Incident_ID`.
- `shooter`, `victim`, and `weapon` are **1-to-many** child tables linked by `Incident_ID`.
- Enrollment/rate tables are linked at the year level: `national_enrollment_trend` joins to `incident_rate_per_100k` by `year`.
- This structure supports both micro-level incident detail and macro-level national trend analysis.
- Presentation line: *"The model separates event facts from participant/weapon detail while still supporting year-level policy analysis."*
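
To make the two join paths concrete, here is a small sketch that uses an in-memory SQLite database as a stand-in for Supabase. Only the table names, `Incident_ID`, and `year` come from the diagram; the `victim_row`, `enrollment`, and `rate` columns and every inserted value are placeholders assumed for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the Supabase (Postgres) database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE incident (Incident_ID TEXT PRIMARY KEY);
    CREATE TABLE victim (
        victim_row INTEGER PRIMARY KEY,
        Incident_ID TEXT REFERENCES incident(Incident_ID)
    );
    CREATE TABLE national_enrollment_trend (year INTEGER PRIMARY KEY, enrollment INTEGER);
    CREATE TABLE incident_rate_per_100k (year INTEGER PRIMARY KEY, rate REAL);

    -- Placeholder rows, not real data.
    INSERT INTO incident VALUES ('A1'), ('B2'), ('C3');
    INSERT INTO victim (Incident_ID) VALUES ('A1'), ('A1'), ('C3');
    INSERT INTO national_enrollment_trend VALUES (2018, 1000000), (2019, 2000000);
    INSERT INTO incident_rate_per_100k VALUES (2018, 0.2), (2019, 0.05);
""")

# Micro level: 1-to-many join from the hub to a child table.
# LEFT JOIN keeps incidents that have no victim rows.
victims_per_incident = conn.execute("""
    SELECT i.Incident_ID, COUNT(v.victim_row)
    FROM incident AS i
    LEFT JOIN victim AS v ON v.Incident_ID = i.Incident_ID
    GROUP BY i.Incident_ID
    ORDER BY i.Incident_ID
""").fetchall()

# Macro level: year-keyed join between the two derived tables.
yearly = conn.execute("""
    SELECT e.year, e.enrollment, r.rate
    FROM national_enrollment_trend AS e
    JOIN incident_rate_per_100k AS r ON r.year = e.year
    ORDER BY e.year
""").fetchall()

print(victims_per_incident)
print(yearly)
```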

## Slide: Row Count Comparison (`images/row_count_comparison.png`)

- Current row totals: `victim` **8,370**, `shooter` **3,542**, `weapon` **3,168**, `incident` **3,136**.
- Higher counts in child tables are expected because one incident can involve multiple victims, shooters, or weapons.
- This confirms relational expansion is occurring where expected and supports downstream join logic.
- Presentation line: *"The distribution validates our table design: incident is the anchor, and participant-level detail scales independently."*
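
As a quick back-of-envelope aid for this slide, the sketch below turns the row totals into average child rows per incident. The helper name `rows_per_incident` is hypothetical, and the ratios compare raw table sizes rather than joined results.

```python
# Row totals as shown on the slide.
row_counts = {"victim": 8370, "shooter": 3542, "weapon": 3168, "incident": 3136}


def rows_per_incident(counts, hub="incident"):
    """Average number of child-table rows per hub-table row, rounded to 2 places."""
    hub_rows = counts[hub]
    return {table: round(n / hub_rows, 2) for table, n in counts.items() if table != hub}


ratios = rows_per_incident(row_counts)
print(ratios)  # {'victim': 2.67, 'shooter': 1.13, 'weapon': 1.01}
```

Child rows that lack an `Incident_ID` would not match in a join, so these figures are an upper bound on the joined averages.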

## Slide: Column Data Types (`images/column_data_types.png`)

- `incident`: **47 categorical**, **3 numeric**, **0 datetime** columns.
- `shooter`: **8 categorical**, **0 numeric**, **0 datetime**.
- `victim`: **6 categorical**, **0 numeric**, **0 datetime**.
- `weapon`: **4 categorical**, **0 numeric**, **0 datetime**.
- Key interpretation: this is a heavily categorical schema, so most modeling and QA work depends on controlled values, encoding choices, and standardization rather than raw continuous variables.
- Presentation line: *"This is primarily a categorical data problem, which is why harmonization and schema consistency are central to analysis quality."*

## Slide: Missing Values (`images/missing_values.png`)

- Largest missingness in `victim`: `Race` **95.82%**, `Gender` **62.97%**, `School_Affiliation` **57.13%**, `Age` **56.46%**, `Incident_ID` **52.35%**, `Injury` **52.35%**.
- Largest missingness in `weapon`: `Weapon_Details` **93.72%**, `Weapon_Caliber` **49.97%**, `Weapon_Type` **6.50%**.
- Practical implication: demographic and weapon-detail analyses should be presented with caution, missing-data caveats, and sensitivity checks.
- Presentation line: *"The strongest signal in missingness is not random noise; it systematically limits interpretability for victim demographics and weapon granularity."*
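
One simple way to illustrate a sensitivity check is to bracket a category's share between two extremes: every missing value falls outside the category, or every missing value falls inside it. This worst-case bounding is an illustrative technique assumed here, not necessarily the check used in the project; `share_bounds` and the sample values are hypothetical.

```python
def share_bounds(values, category):
    """Complete-case share of `category` plus worst-case bounds under missingness.

    `None` marks a missing value. The lower bound assumes no missing value
    belongs to the category; the upper bound assumes all of them do.
    """
    total = len(values)
    if total == 0:
        raise ValueError("values must not be empty")
    observed = [v for v in values if v is not None]
    missing = total - len(observed)
    hits = sum(1 for v in observed if v == category)
    return {
        "missing_pct": round(100 * missing / total, 2),
        "complete_case": hits / len(observed) if observed else None,
        "lower": hits / total,
        "upper": (hits + missing) / total,
    }


# Hypothetical column: 18 of 20 values missing, one "A" and one "B" observed.
column = [None] * 18 + ["A", "B"]
bounds = share_bounds(column, "A")
print(bounds)
```

When the gap between `lower` and `upper` is this wide, the complete-case share says little on its own, which is the caveat this slide asks the audience to keep in mind.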

## Optional Closing Slide (Data Credibility Summary)

- Data architecture is normalized and analysis-ready.
- QA checks are integrated into the pipeline, not treated as a one-time cleanup.
- Primary risk to inference is missing detail fields, not table structure.
- Final line: *"Our conclusions are strongest at the incident and trend level, and more cautious at fine-grained demographic/weapon-detail levels."*