# BRFSS Dashboard Project — Full Workflow Explanation
### By Swapnanil (with ChatGPT)

This notebook explains the **entire workflow** behind your BRFSS Dash application:
- Data preparation
- Parquet conversion
- Merge logic (ResponseID, BreakoutID)
- Aggregation model (overall, gender, age, race, education, income, state, temporal)
- Confidence interval math
- Dashboard structure
- How to explain everything during your presentation


## 1. Data Loading & Parquet Conversion
### Why Parquet?
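- CSV is plain text: it is uncompressed, stores no column types, and has to be parsed in full on every load, which is slow for a dataset this large.
- Parquet is a compressed, columnar binary format. It typically loads several times faster than CSV (5–10× is a reasonable ballpark; the exact gain depends on the data and the engine) and lets you read only the columns you need.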

### Code used to convert CSV → Parquet

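In [None]:
import pandas as pd

# low_memory=False reads the file in one pass so each column gets a single, consistent dtype
df = pd.read_csv('BRFSS.csv', low_memory=False)
# Writing Parquet requires an engine such as pyarrow or fastparquet to be installed
df.to_parquet('brfss.parquet')
df.head()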

## 2. Merge Rules
The dataset includes multiple codes for the same categories across years.
We unify them using merge rules identical to the professor's R code.

In [None]:
from utils.merges import merge_response_id, merge_breakout_id
df['ResponseID'] = merge_response_id(df['ResponseID'])
df['BreakoutID'] = merge_breakout_id(df['BreakoutID'])
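
The bodies of `merge_response_id` and `merge_breakout_id` are not shown in this notebook. As a rough sketch of the idea only, a merge rule can be expressed as a lookup table from legacy codes to canonical codes. The codes and the helper name `merge_codes` below are hypothetical, chosen purely for illustration.

```python
import pandas as pd

# Hypothetical rules: each legacy code maps to the code it should be merged into.
# These are made-up codes, not the real BRFSS ResponseID values.
RESPONSE_MERGES = {
    "RESP_OLD_1": "RESP_1",
    "RESP_OLD_2": "RESP_2",
}


def merge_codes(codes: pd.Series, rules: dict) -> pd.Series:
    """Replace legacy codes with canonical ones; codes without a rule pass through."""
    return codes.replace(rules)


raw = pd.Series(["RESP_1", "RESP_OLD_1", "RESP_OLD_2", "RESP_3"])
merged = merge_codes(raw, RESPONSE_MERGES)
print(merged.tolist())
```

Keeping the rules in a plain dictionary makes them easy to compare line by line against the R version.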

## 3. Aggregation Model (Core Statistics)
### Percent Calculation
`percent = persons_sum * 100 / true_sample_size`

### Standard Deviation
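`sd = sqrt(percent * (100 - percent) / sample_size)`

Strictly speaking, this is the standard error of the estimated percentage, in percentage points; the notebook and code call it `sd`.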

### 95% CI
`CI = percent ± 2 * sd`

The multiplier 2 is a rounded form of the normal critical value 1.96. This exactly matches the R implementation.
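
A worked example may make the three formulas easier to explain. The sketch below plugs in made-up numbers (250 respondents out of 1000) and assumes the same sample size is used as the denominator in both the percent and the `sd` formula; the function name `percent_ci` is hypothetical.

```python
import math


def percent_ci(persons_sum: float, sample_size: float, z: float = 2.0):
    """Return (percent, sd, lower, upper) using the formulas from this section."""
    percent = persons_sum * 100 / sample_size
    sd = math.sqrt(percent * (100 - percent) / sample_size)
    return percent, sd, percent - z * sd, percent + z * sd


# Illustrative numbers only: 250 of 1000 respondents gave this response.
percent, sd, lower, upper = percent_ci(persons_sum=250, sample_size=1000)
print(f"{percent:.2f}% (sd {sd:.2f}), 95% CI [{lower:.2f}, {upper:.2f}]")
# prints: 25.00% (sd 1.37), 95% CI [22.26, 27.74]
```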

In [None]:
from utils.aggregation import aggregate_overall

# Use the first non-null question as a sample and aggregate only its rows
sample_q = df['Question'].dropna().iloc[0]
summary = aggregate_overall(df[df['Question'] == sample_q])
summary
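
The stratified panels (gender, age, race, and so on) apply the same statistics per group. The source of `utils.aggregation` is not shown here, so the following is only a sketch of how such a grouped aggregation could look. The function `aggregate_by`, the column names `Persons` and `Sample_Size`, and the choice to sum sample sizes within a group are all assumptions.

```python
import numpy as np
import pandas as pd


def aggregate_by(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Percent, sd, and 95% CI per group (hypothetical column names)."""
    out = df.groupby(group_col, as_index=False).agg(
        persons=("Persons", "sum"),
        sample_size=("Sample_Size", "sum"),  # assumption: sizes add up within a group
    )
    out["percent"] = out["persons"] * 100 / out["sample_size"]
    out["sd"] = np.sqrt(out["percent"] * (100 - out["percent"]) / out["sample_size"])
    out["ci_low"] = out["percent"] - 2 * out["sd"]
    out["ci_high"] = out["percent"] + 2 * out["sd"]
    return out


# Made-up rows for illustration only.
demo = pd.DataFrame({
    "Gender": ["Female", "Female", "Male", "Male"],
    "Persons": [30, 20, 10, 30],
    "Sample_Size": [100, 100, 100, 100],
})
by_gender = aggregate_by(demo, "Gender")
print(by_gender)
```

Passing a different `group_col` (for example an age or income column) would give the other panels without duplicating the math.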

## 4. Dash App Architecture
The application is modular and is organized into four parts:

### A. Dropdown Inputs
- Class
- Topic
- Question
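
The three dropdowns are chained: the selected Class narrows the Topic list, and the selected Topic narrows the Question list. A minimal sketch of the option-building step is shown below. The helper `dropdown_options` and the placeholder catalog values are hypothetical; the returned list of `{"label", "value"}` dictionaries is the shape that a Dash dropdown accepts for its `options`.

```python
import pandas as pd


def dropdown_options(df: pd.DataFrame, column: str, **filters) -> list:
    """Build dropdown options for `column`, restricted by earlier selections."""
    subset = df
    for col, value in filters.items():
        subset = subset[subset[col] == value]
    values = sorted(str(v) for v in subset[column].dropna().unique())
    return [{"label": v, "value": v} for v in values]


# Placeholder catalog; real Class/Topic/Question values come from the dataset.
catalog = pd.DataFrame({
    "Class": ["Class A", "Class A", "Class B"],
    "Topic": ["Topic A2", "Topic A1", "Topic B1"],
    "Question": ["Question A2", "Question A1", "Question B1"],
})

topic_opts = dropdown_options(catalog, "Topic", Class="Class A")
question_opts = dropdown_options(catalog, "Question", Class="Class A", Topic="Topic A1")
print(topic_opts)
print(question_opts)
```

In the app, a callback on the Class dropdown would return `topic_opts`, and a callback on the Topic dropdown would return `question_opts`.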

### B. Analysis Panels
- Overall
- Gender
- Age Group
- Race
- Education
- Income
- Temporal Trend
- State/Territory

### C. Interactive Filters
- Show All
- Top 3 (More)
- Bottom 3 (Less)
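
The three filter modes can be sketched with pandas as follows. The function name `filter_rows`, the mode strings, and the sample values are assumptions for illustration; the real app may implement the filter differently.

```python
import pandas as pd


def filter_rows(summary: pd.DataFrame, mode: str = "all", n: int = 3,
                value_col: str = "percent") -> pd.DataFrame:
    """Return all rows, the n highest, or the n lowest by `value_col`."""
    if mode == "top":
        return summary.nlargest(n, value_col)
    if mode == "bottom":
        return summary.nsmallest(n, value_col)
    return summary


# Made-up percentages for five placeholder states.
states = pd.DataFrame({
    "state": ["A", "B", "C", "D", "E"],
    "percent": [12.0, 30.5, 18.2, 25.1, 9.7],
})

top3 = filter_rows(states, "top")
bottom3 = filter_rows(states, "bottom")
print(top3["state"].tolist(), bottom3["state"].tolist())
```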

### D. Visualizations
- Bar charts with confidence intervals
- Line charts for trends

## 5. How to Explain the Project (Presentation Script)
Use this script when you present:

### Step 1 — Introduction
"This dashboard analyzes BRFSS health survey data using stratified summaries and confidence intervals."

### Step 2 — Data Preparation
"The dataset is huge, so we convert CSV to Parquet and apply merge rules to fix inconsistent labels."

### Step 3 — Statistical Model
"Each panel computes percentages and 95% confidence intervals using the exact formulas from class."

### Step 4 — Dashboard Design
"Selecting Class → Topic → Question updates every panel: overall, gender, age, race, education, income, temporal trend, and state."

### Step 5 — Technical Quality
"The code is modular (prepare, merges, aggregation, options, app.py) and optimized for performance."

### Step 6 — Conclusion
"The dashboard is fast, interactive, and useful for public‑health insights."


## 6. Final Notes
- This notebook is your documentation.
- Read it once fully and you will understand the entire workflow end‑to‑end.
