perf(parser,db): replace lifetime cml_stats refresh with windowed stats by cchwala · Pull Request #51 · OpenSenseAction/GMDI_prototype

cchwala · 2026-05-19T19:47:19Z

The background stats timer previously called update_cml_stats() for every known CML every 60 s. That function scans the full history of cml_data, causing 20-30 s of ~100% PostgreSQL CPU usage every minute regardless of how much new data arrived.

This commit replaces that hot path with a cheap windowed variant:

Database (migration 009 / init.sql)

Add 8 new columns to cml_stats: completeness_percent_6h, total_records_6h, valid_records_6h, mean_rsl_6h, stddev_rsl_6h, completeness_percent_1h, mean_rsl_1h, stddev_rsl_1h
Add update_cml_stats_windowed(cml_id, user_id) which computes all windowed aggregates in one pass using FILTER clauses. TimescaleDB chunk exclusion limits the scan to the current uncompressed chunk (~6 h of data) irrespective of total dataset size.
GRANT EXECUTE on the new function to demo_openmrg and demo_orange_cameroun.
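
The one-pass idea behind update_cml_stats_windowed can be illustrated outside the database. The following is only a rough Python analogue, not the SQL in migration 009: the 10 s expected sampling interval, the completeness formula (valid records divided by expected records) and the helper names are assumptions made for illustration. A single loop over the rows feeds both the 6 h and the 1 h aggregates, which is what the FILTER clauses achieve inside one SELECT.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, stdev

# Assumption: one sample is expected every 10 s. The PR does not state the interval.
EXPECTED_INTERVAL_S = 10


def _completeness(valid_count, window):
    # Assumed definition: valid records as a share of the expected record count.
    expected = window.total_seconds() / EXPECTED_INTERVAL_S
    return min(100.0, 100.0 * valid_count / expected)


def _mean_and_stddev(values):
    # Sample standard deviation is undefined for fewer than two values.
    avg = mean(values) if values else None
    sd = stdev(values) if len(values) >= 2 else None
    return avg, sd


def windowed_stats(rows, now):
    """rows: iterable of (timestamp, rsl); rsl is None for an invalid sample."""
    six_h, one_h = timedelta(hours=6), timedelta(hours=1)
    total_6h = 0
    valid_6h, valid_1h = [], []
    for ts, rsl in rows:  # one pass feeds every window
        if not (now - six_h <= ts <= now):
            continue  # outer WHERE: bounds the scan to the widest window
        total_6h += 1
        if rsl is None:
            continue
        valid_6h.append(rsl)
        if ts >= now - one_h:  # plays the role of FILTER (WHERE ...)
            valid_1h.append(rsl)
    mean_6h, sd_6h = _mean_and_stddev(valid_6h)
    mean_1h, sd_1h = _mean_and_stddev(valid_1h)
    return {
        "completeness_percent_6h": _completeness(len(valid_6h), six_h),
        "total_records_6h": total_6h,
        "valid_records_6h": len(valid_6h),
        "mean_rsl_6h": mean_6h,
        "stddev_rsl_6h": sd_6h,
        "completeness_percent_1h": _completeness(len(valid_1h), one_h),
        "mean_rsl_1h": mean_1h,
        "stddev_rsl_1h": sd_1h,
    }


# Usage: two hours of constant readings, one every 10 s.
example_now = datetime(2026, 5, 19, 12, 0, tzinfo=timezone.utc)
example_rows = [(example_now - timedelta(seconds=10 * k), -50.0) for k in range(1, 721)]
example_stats = windowed_stats(example_rows, example_now)
```

The returned keys mirror the eight new cml_stats columns. In the database the outer time bound is what lets TimescaleDB skip older chunks, so the cost depends on the window size rather than on the total history.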

Parser (db_writer.py / main.py)

Add DBWriter.refresh_windowed_stats() mirroring refresh_stats() but calling update_cml_stats_windowed.
Replace both refresh_stats() call sites in the stats background thread with refresh_windowed_stats().
Wire _update_stats_for_cmls() into write_rawdata() so lifetime columns (total_records, min_rsl, max_rsl, …) stay current when new data arrives; the stats update and the data insert share one commit so they are atomic.
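
The "one commit" property of write_rawdata() can be sketched as follows. This is an illustration under assumptions, not the code in db_writer.py: SQLite stands in for PostgreSQL, the two-table schema is heavily simplified, and update_lifetime_stats is a hypothetical stand-in for _update_stats_for_cmls(). The point is that the insert and the stats update run on the same connection with a single commit, so a failure in either step rolls back both.

```python
import sqlite3

# Simplified, hypothetical schema; the real tables have more columns.
SCHEMA = """
CREATE TABLE cml_data (cml_id TEXT, rsl REAL);
CREATE TABLE cml_stats (
    cml_id TEXT PRIMARY KEY,
    total_records INTEGER,
    min_rsl REAL,
    max_rsl REAL
);
"""


def update_lifetime_stats(cur, cml_ids):
    # Stand-in for _update_stats_for_cmls(): recompute lifetime columns
    # for the CMLs that just received data, on the caller's cursor.
    for cml_id in sorted(cml_ids):
        cur.execute(
            "INSERT OR REPLACE INTO cml_stats (cml_id, total_records, min_rsl, max_rsl) "
            "SELECT cml_id, COUNT(*), MIN(rsl), MAX(rsl) "
            "FROM cml_data WHERE cml_id = ? GROUP BY cml_id",
            (cml_id,),
        )


def write_rawdata(conn, rows, update_stats=update_lifetime_stats):
    """Insert rows and refresh lifetime stats inside one transaction."""
    try:
        cur = conn.cursor()
        cur.executemany("INSERT INTO cml_data (cml_id, rsl) VALUES (?, ?)", rows)
        update_stats(cur, {cml_id for cml_id, _ in rows})
        conn.commit()  # the single commit covers insert and stats update
    except Exception:
        conn.rollback()  # neither the rows nor the stats become visible
        raise
```

Re-raising after the rollback is a choice made for this sketch; the PR does not say how write_rawdata() reports failures.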

Webserver (main.py)

/api/cml-stats: replace the complex LEFT JOIN + live STDDEV recompute with a simple SELECT from cml_stats reading the pre-computed windowed columns. Response now carries completeness_percent_6h as completeness_percent, plus completeness_percent_1h and stddev_last_60min (pre-computed 1 h stddev).
get_archive_statistics(): replace COUNT(*) FROM cml_data_secure (full hypertable scan) with SUM(total_records) FROM cml_stats (O(users)).
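
The get_archive_statistics() change relies on cml_stats.total_records being kept current, so that summing a handful of counters gives the same answer as counting every raw row. A minimal sketch of that equivalence, with SQLite standing in for PostgreSQL and a simplified two-column schema assumed for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cml_data (cml_id TEXT, rsl REAL);
    CREATE TABLE cml_stats (cml_id TEXT PRIMARY KEY, total_records INTEGER);
    """
)
conn.executemany(
    "INSERT INTO cml_data VALUES (?, ?)",
    [("a", -50.0), ("a", -51.0), ("b", -60.0)],
)
# Pre-computed counters, as maintained by the stats functions.
conn.executemany("INSERT INTO cml_stats VALUES (?, ?)", [("a", 2), ("b", 1)])


def total_records_from_stats(connection):
    # Reads one small row per stats entry; COALESCE covers an empty table.
    query = "SELECT COALESCE(SUM(total_records), 0) FROM cml_stats"
    return connection.execute(query).fetchone()[0]


def total_records_by_scan(connection):
    # The replaced approach: touches every raw data row.
    return connection.execute("SELECT COUNT(*) FROM cml_data").fetchone()[0]
```

Without the COALESCE, SUM over an empty cml_stats would return NULL rather than 0.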

Frontend (realtime.html)

Update dropdown labels: "Data Completeness" -> "Data Completeness (6h)" and "RSL Std Dev (60min)" -> "RSL Std Dev (1h)".
Update popup text to match new time-window labels.

Scripts / onboarding (generate_config.py)

Add GRANT EXECUTE on update_cml_stats_windowed to per-user SQL template so new users get access to both stats functions.

Tests

test_api_cml_stats.py: rewrite fixture for the new 9-column schema and add assertions for completeness_percent_1h and stddev_last_60min.
All 63 parser unit tests and 67 webserver unit tests pass.


codecov · 2026-05-19T19:47:55Z

Codecov Report

❌ Patch coverage is 91.02564% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.42%. Comparing base (ae79bfe) to head (ebc18b7).

Files with missing lines	Patch %	Lines
parser/db_writer.py	76.47%	4 Missing ⚠️
parser/tests/test_main.py	95.00%	2 Missing ⚠️
parser/main.py	75.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #51      +/-   ##
==========================================
+ Coverage   84.31%   85.42%   +1.10%     
==========================================
  Files          28       28              
  Lines        2940     3012      +72     
==========================================
+ Hits         2479     2573      +94     
+ Misses        461      439      -22
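
The percentages in the diff above follow from the hit and line counts. The short check below reproduces them; that Codecov truncates to two decimals rather than rounding, and that the patch figure corresponds to 71 covered lines out of 78, are inferences from the reported numbers, not documented behaviour.

```python
import math


def coverage_percent(hits, lines):
    return 100.0 * hits / lines


def truncate2(value):
    # Cut to two decimals without rounding up (inferred from 84.31%, not 84.32%).
    return math.floor(value * 100) / 100


base = coverage_percent(2479, 2940)  # main at ae79bfe
head = coverage_percent(2573, 3012)  # PR head at ebc18b7
base_shown = truncate2(base)
head_shown = truncate2(head)
delta_shown = truncate2(head - base)

# 7 missing lines at 91.02564% is consistent with 71 of 78 patch lines covered.
patch_shown = round(coverage_percent(71, 78), 5)
```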

Flag	Coverage Δ
mno_simulator	`86.12% <ø> (ø)`
parser	`91.99% <90.90%> (+0.83%)`	⬆️
scripts	`74.52% <ø> (ø)`
webserver	`73.78% <100.00%> (+2.99%)`	⬆️

Add unit tests for the new code paths introduced in the previous commit:

parser/tests/test_db_writer.py

test_refresh_windowed_stats_commits_on_success: verifies update_cml_stats_windowed is called and the transaction is committed.
test_refresh_windowed_stats_rollback_on_error: verifies the exception is swallowed and the connection is rolled back on DB failure.

parser/tests/test_main.py

_capture_stats_loop helper: runs main() with a CapturingThread so the stats_loop closure can be extracted and called synchronously.
test_stats_loop_calls_refresh_windowed_stats_on_startup
test_stats_loop_initial_refresh_error_is_swallowed
test_stats_loop_calls_refresh_windowed_stats_in_timer_loop

webserver/tests/test_api_routes.py

test_get_archive_statistics_reads_total_records_from_cml_stats: regression guard ensuring the archive stats endpoint uses COALESCE(SUM(total_records), 0) FROM cml_stats and never COUNT(*) FROM cml_data (full hypertable scan).
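
The commit-on-success and rollback-on-error behaviour that the two test_db_writer.py tests check can be sketched like this. It is a guess at the shape, not the actual DBWriter method: the free function, its parameters and the SELECT-style call are assumptions, and only the SQL function name and its (cml_id, user_id) arguments come from the PR. A MagicMock stands in for the database connection.

```python
import logging
from unittest.mock import MagicMock

logger = logging.getLogger(__name__)


def refresh_windowed_stats(conn, cml_ids, user_id):
    """Refresh windowed stats for each CML; never raise into the timer loop."""
    try:
        cur = conn.cursor()
        for cml_id in cml_ids:
            cur.execute(
                "SELECT update_cml_stats_windowed(%s, %s)", (cml_id, user_id)
            )
        conn.commit()
    except Exception:
        # Swallow the error so the background thread keeps running,
        # but leave the connection in a clean state.
        logger.exception("windowed stats refresh failed")
        conn.rollback()


def check_commits_on_success():
    conn = MagicMock()
    refresh_windowed_stats(conn, ["cml_1"], "demo_openmrg")
    conn.cursor.return_value.execute.assert_called_once_with(
        "SELECT update_cml_stats_windowed(%s, %s)", ("cml_1", "demo_openmrg")
    )
    conn.commit.assert_called_once()
    conn.rollback.assert_not_called()


def check_rollback_on_error():
    conn = MagicMock()
    conn.cursor.return_value.execute.side_effect = RuntimeError("db down")
    refresh_windowed_stats(conn, ["cml_1"], "demo_openmrg")  # must not raise
    conn.commit.assert_not_called()
    conn.rollback.assert_called_once()
```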

_capture_stats_loop called stats_loop() after the with-patch block exited. With all patches removed, DBWriter.connect() attempted a real DB connection, failed, and the retry loop spun forever because mock_event.is_set() was configured to return False. Replace _capture_stats_loop with _run_stats_loop, which invokes stats_loop() inside the patch context so DBWriter stays mocked throughout the call.
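
The failure mode is easy to reproduce in isolation. In the sketch below the DBWriter class and make_stats_loop are hypothetical stand-ins, not the project's code: a closure created inside a patch context still looks up the patched attribute at call time, so calling it after the with block exits reaches the real method.

```python
from unittest.mock import patch


class DBWriter:
    # Stand-in class: the real connect() would open a database connection.
    def connect(self):
        raise ConnectionError("no database available in this sketch")


def make_stats_loop():
    def stats_loop():
        # The attribute lookup happens when stats_loop() runs,
        # not when the closure is created.
        return DBWriter().connect()

    return stats_loop


def run_outside_patch():
    with patch.object(DBWriter, "connect", return_value="mocked"):
        loop = make_stats_loop()
    return loop()  # patch already undone: the real connect() runs


def run_inside_patch():
    with patch.object(DBWriter, "connect", return_value="mocked"):
        loop = make_stats_loop()
        return loop()  # still patched: returns "mocked"
```

run_inside_patch() corresponds to the _run_stats_loop approach, while run_outside_patch() raises ConnectionError, the analogue of the real connection attempt described above.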

cchwala added 2 commits May 19, 2026 22:39

cchwala merged commit 6835796 into main May 19, 2026
8 checks passed

cchwala merged 3 commits into main from feat/windowed-cml-stats
