A Python-powered data pipeline that ingests raw U.S. polling data, computes rolling averages over a configurable window of polls, and renders interactive visualizations for election and generic ballot tracking.

    poll-aggregator-engine/
    ├── update_csv.py      # Data pipeline: rolling average computation
    ├── ballot_final.csv   # Processed output data with computed averages
    └── Visualize.html     # Interactive chart + table frontend

The pipeline reads a raw CSV of polling data with the following schema:
| Column | Description |
|---|---|
| `Pollster` | Name of the polling organization |
| `Date` | Poll end date |
| `ApprovalRating` | Republican / candidate A result (%) |
| `DisapprovalRating` | Democrat / candidate B result (%) |
| `Spread` | Raw margin between candidates |
| `Grade` | Pollster reliability grade |
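
As a minimal sketch of the input step, the snippet below reads a CSV with pandas and checks it against the schema table above. The helper name `load_polls` and the two sample rows are hypothetical and made up for illustration; the real `update_csv.py` may load and validate its input differently, and the actual format of `Date` and `Spread` is not specified here.

```python
import io

import pandas as pd

# Columns listed in the schema table above.
EXPECTED_COLUMNS = [
    "Pollster", "Date", "ApprovalRating", "DisapprovalRating", "Spread", "Grade",
]


def load_polls(source):
    """Read raw polling data and check that every expected column is present."""
    df = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Coerce the result columns to numbers; unparseable cells become NaN.
    for col in ("ApprovalRating", "DisapprovalRating"):
        df[col] = pd.to_numeric(df[col], errors="coerce")
    return df


# Hypothetical sample, newest poll first (values invented for this example).
sample_csv = io.StringIO(
    "Pollster,Date,ApprovalRating,DisapprovalRating,Spread,Grade\n"
    "Pollster A,2026-01-10,44,47,-3,B\n"
    "Pollster B,2026-01-08,45,45,0,A\n"
)
polls = load_polls(sample_csv)
print(polls[["Pollster", "ApprovalRating", "DisapprovalRating"]])
```

Failing early on a missing column keeps a malformed export from silently producing an empty or misaligned `ballot_final.csv`.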
The core of the pipeline applies a rolling window average across all polls:
- Reverses the dataframe so averaging flows oldest → newest
- Applies a configurable sliding window (default: 10 polls)
- Computes `Avg_ApprovalRating` and `Avg_DisapprovalRating` per row
- Re-inserts the computed columns into the correct position in the schema
- Outputs a clean `ballot_final.csv` ready for the frontend
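
    window_size = 10  # Configurable: 10 = congress/generic | 6 = state races | 4 = others
    avg_approval, avg_disapproval = [], []

    for i in range(len(df_rev)):
        # Trailing window: the current poll plus up to window_size - 1 earlier ones
        window = slice(max(0, i - window_size + 1), i + 1)
        # .iloc makes the slice explicitly positional
        avg_approval.append(round(df_rev['ApprovalRating'].iloc[window].mean(), 1))
        avg_disapproval.append(round(df_rev['DisapprovalRating'].iloc[window].mean(), 1))

Why reverse then re-reverse? The window is a trailing one: each row is averaged with the rows before it, so the data must be in chronological order (oldest first) for every average to cover the most recent polls up to that date. The average itself is a plain, unweighted mean. The dataframe is reversed before averaging, then flipped back so the newest polls appear at the top of the output, matching how the frontend consumes the data.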
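
The reverse, average, re-reverse sequence can also be sketched end to end with pandas' built-in `rolling`. This is an alternative formulation, not the code in `update_csv.py`: the function name `add_rolling_averages` is hypothetical, the sample numbers are made up, and the new columns are simply appended at the end here rather than placed at their final schema position. With `min_periods=1` the first few polls are averaged over whatever is available, which should match the `max(0, ...)` clamp in the loop above.

```python
import pandas as pd


def add_rolling_averages(df, window_size=10):
    """Return a copy of a newest-first frame with trailing rolling averages added."""
    # Flip to oldest-first so each window looks back in time.
    df_rev = df.iloc[::-1].reset_index(drop=True)
    for col in ("ApprovalRating", "DisapprovalRating"):
        # min_periods=1: early rows average over fewer than window_size polls.
        df_rev[f"Avg_{col}"] = (
            df_rev[col].rolling(window=window_size, min_periods=1).mean().round(1)
        )
    # Flip back so the newest poll is on top again.
    return df_rev.iloc[::-1].reset_index(drop=True)


# Made-up numbers, newest poll first.
polls = pd.DataFrame({
    "ApprovalRating": [46.0, 44.0, 42.0, 40.0],
    "DisapprovalRating": [50.0, 52.0, 54.0, 56.0],
})
result = add_rolling_averages(polls, window_size=2)
print(result)
```

With `window_size=2`, the newest row averages the two newest polls (46 and 44), while the oldest row has no earlier poll and keeps its own value.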
After processing, the CSV has two new computed columns injected between DisapprovalRating and Spread:
| Column | Description |
|---|---|
| `Avg_ApprovalRating` | 10-poll rolling average, Republican side |
| `Avg_DisapprovalRating` | 10-poll rolling average, Democrat side |
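
One way to place computed columns at an exact position is `DataFrame.insert`, which takes a zero-based column index. The sketch below is illustrative only: `insert_after` is a hypothetical helper, the single sample row is invented, and the actual script may reorder columns by other means.

```python
import pandas as pd


def insert_after(df, anchor, new_columns):
    """Insert new_columns (name -> values) right after the anchor column, in order."""
    out = df.copy()
    position = out.columns.get_loc(anchor) + 1
    for offset, (name, values) in enumerate(new_columns.items()):
        out.insert(position + offset, name, values)
    return out


# Made-up single-row frame using the input schema.
polls = pd.DataFrame({
    "Pollster": ["Pollster A"],
    "Date": ["2026-01-10"],
    "ApprovalRating": [44.0],
    "DisapprovalRating": [47.0],
    "Spread": [-3.0],
    "Grade": ["B"],
})
final = insert_after(polls, "DisapprovalRating", {
    "Avg_ApprovalRating": [44.5],
    "Avg_DisapprovalRating": [46.0],
})
print(list(final.columns))
```

Looking up the anchor by name rather than hardcoding an index keeps the insertion correct if columns are later added ahead of `DisapprovalRating`.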
The frontend reads ballot_final.csv directly in the browser via fetch() and renders:
- Line chart — plots both rolling averages over time using Chart.js, with confidence bands (±0.5%) around each line
- Data table — lists every poll with color-coded margin pills (red = R lead, blue = D lead, intensity scales with margin size)
- Responsive layout — separate rendering paths for desktop and mobile
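
The band and pill logic from the list above can be sketched as follows. The frontend itself is JavaScript; the sketch is written in Python only to stay consistent with the pipeline code. Everything beyond the ±0.5 band and the red/blue rule is an assumption: the function names, the `max_margin` cap used to scale intensity, and the "neutral" result for a tie are all hypothetical.

```python
BAND_HALF_WIDTH = 0.5  # the +/-0.5 band described in the list above


def confidence_band(avg):
    """Return (lower, upper) bounds drawn around one rolling-average point."""
    return (round(avg - BAND_HALF_WIDTH, 1), round(avg + BAND_HALF_WIDTH, 1))


def margin_pill(rep, dem, max_margin=10.0):
    """Classify a poll margin as a color plus an intensity between 0 and 1.

    max_margin is a hypothetical cap: margins at or beyond it get full intensity.
    """
    margin = rep - dem
    if margin == 0:
        return ("neutral", 0.0)  # assumption: ties get no color
    color = "red" if margin > 0 else "blue"  # red = R lead, blue = D lead
    intensity = min(abs(margin) / max_margin, 1.0)
    return (color, round(intensity, 2))


print(confidence_band(44.5))
print(margin_pill(47.0, 44.0))
print(margin_pill(40.0, 55.0))
```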
| Layer | Technology |
|---|---|
| Data pipeline | Python 3, pandas |
| Data format | CSV |
| Visualization | JavaScript, Chart.js |
| Styling | CSS3 (glassmorphism, CSS variables) |
| Deployment | Static HTML — no server required |

    pip install pandas
    python update_csv.py

Update the input/output paths at the top of update_csv.py to point to your CSV file before running.
Open Visualize.html in a browser. Make sure the pipeline output (ballot_final.csv) is where the page expects it: csvPath defaults to ../polls/2026-generic-ballot.csv, so either place the file at that path or update csvPath in the script.
Note: Because the page fetches a local CSV via fetch(), you may need to serve it from a local server to avoid CORS issues:

    python -m http.server 8000

Then open http://localhost:8000/Visualize.html
- Data wrangling — cleaning, reshaping, and reindexing DataFrames with pandas
- Rolling window statistics — custom sliding window logic with configurable size
- Schema management — dynamic column insertion at precise positions in a DataFrame
- Data visualization — Chart.js line charts with confidence bands, custom label plugins, and responsive breakpoints
- Frontend/backend integration — Python pipeline output consumed directly by a browser-based chart
- Clean code — no hardcoded magic numbers, configurable parameters, reusable structure
| Parameter | Location | Default | Notes |
|---|---|---|---|
| `window_size` | `update_csv.py` line 8 | `10` | Change to 6 for state races, 4 for smaller datasets |
| `csvPath` | `Visualize.html` | `../polls/2026-generic-ballot.csv` | Update to match your file structure |
| `axisMin` / `axisMax` | `Visualize.html` | `30` / `60` | Y-axis range for the chart |