Centralize materialized view registry and refactor refresh task#292
Centralize materialized view registry and refactor refresh task#292shlokgilda wants to merge 2 commits intochaoss:mainfrom
Conversation
- add collectoss/application/db/materialized_views.py: single source of truth for all 15 existing materialized view definitions (SQL + unique index columns). get_refresh_sql() helper constructs REFRESH statements. - refactor collectoss/tasks/db/refresh_materialized_views.py: replace 14 hardcoded try/except-pass blocks with a dynamic loop over the registry. uses an AUTOCOMMIT connection so REFRESH CONCURRENTLY works correctly (it cannot run inside a transaction block). raises RuntimeError if any view fails both concurrent and non-concurrent refresh. - wire alembic_utils into collectoss/application/schema/alembic/env.py: register all views from the registry so that alembic revision --autogenerate detects SQL definition changes automatically. - add alembic-utils==0.8.8 to pyproject.toml and regenerate uv.lock. closes chaoss#243 (partial — heatmap views follow in a separate PR) Signed-off-by: Shlok Gilda <gildashlok@hotmail.com>
| MATERIALIZED_VIEWS = [ | ||
| # --- View 1: legacy DDL (augur_full.sql), no unique index --- | ||
| { | ||
| "name": "issue_reporter_created_at", | ||
| "schema": "augur_data", | ||
| "sql": _ISSUE_REPORTER_CREATED_AT, | ||
| "unique_index_columns": [], # only a non-unique btree on repo_id | ||
| }, | ||
| # --- Views 2-6: from migration 4, indexes from migration 25 --- | ||
| { |
There was a problem hiding this comment.
This feels like a very different paradigm from the class-based modeling in the other SQLAlchemy models.
In developing this, did you come to realize that text-based SQL is the only way to do it? I think it would be awesome if there was a class-based approach to this (ideally passing SQLAlchemy Table's around or something, but if we have to DIY our own classes that mirror the rest of our models and just encapsulate this metadata alongside the SQL query, that could work too)
There was a problem hiding this comment.
This feels like a very different paradigm from the class-based modeling in the other SQLAlchemy models...
Fair callout. pushed a follow-up (56647597a).
On Table-based modeling: not viable here. Table describes column/constraint shape, not SELECT semantics, and several views are 100+-line UNION ALL / CTE queries (explorer_new_contributors is ~164 lines). SQLAlchemy also has no first-class materialized-view construct.
What I did: went with the DIY-class fallback. New frozen MaterializedView dataclass with fqn, refresh_sql(), and to_pg_view():
@dataclass(frozen=True)
class MaterializedView:
name: str
schema: str
sql: str
unique_index_columns: tuple[str, ...] = ()- All 15 registry entries converted from dicts to instances.
- Refresh task uses
view.fqn/view.refresh_sql(...);get_refresh_sqlremoved. env.pycollapses to[v.to_pg_view() for v in MATERIALIZED_VIEWS]. Composition over subclassingPGMaterializedViewsinceunique_index_columnsis metadata alembic_utils doesn't track.tuple(notlist) sofrozen=Trueis actually immutable. Custom__repr__keeps the SQL out of logs.
Verified the emitted REFRESH MATERIALIZED VIEW [CONCURRENTLY] ... is byte-identical to before across all 15 views, both modes. No SQL or migration changes.
Replace the list-of-dicts registry with a frozen MaterializedView dataclass exposing fqn, refresh_sql(), and to_pg_view(). Brings the registry's shape in line with the declarative ORM style used elsewhere in the codebase and gives callers attribute access + type checking instead of string-keyed dict lookups. unique_index_columns is a tuple so frozen=True actually means immutable. __repr__ is overridden to keep the multi-hundred-line view SQL out of debug logs. Refresh task and alembic env.py updated to use the new API; get_refresh_sql free function removed (only two call sites). Emitted REFRESH SQL is byte-identical to the previous version.
Description:
Introduces a single source of truth for all 15 existing PostgreSQL materialized view definitions and replaces a fragile hardcoded refresh task with a dynamic loop.
collectoss/application/db/materialized_views.py— registry of all 15 existing materialized views (SQL definitions + unique index columns).get_refresh_sql()helper buildsREFRESH MATERIALIZED VIEWstatements.collectoss/tasks/db/refresh_materialized_views.py— replaces 14 hardcodedtry/except: passblocks with a dynamic loop over the registry. Uses anAUTOCOMMITconnection soREFRESH CONCURRENTLYactually works(it cannot run inside a transaction block, which the old code silently violated). Raises
RuntimeErrorat the end if any view failed both concurrent and non-concurrent refresh, so failures are visible in Celery instead of swallowed.alembic_utilsintocollectoss/application/schema/alembic/env.py— registers all views from the registry soalembic revision --autogeneratedetects SQL definition changes automatically (phase 2 of Add new materialized views for heatmaps on 8knot (and make a system for it) #243).alembic-utils==0.8.8topyproject.tomland regenerateuv.lock.No schema changes in this PR. The 3 new heatmap views for 8Knot follow in a separate incremental PR on top of this one.
This PR fixes #243 (partial — heatmap views follow separately)
Notes for Reviewers:
The key correctness fix worth understanding:
REFRESH MATERIALIZED VIEW CONCURRENTLYwhich wraps everything inengine.begin(), so every concurrent refresh was silently failing and falling back to a blocking refresh on every run. This PR fixes that by usingengine.connect().execution_options(isolation_level="AUTOCOMMIT")directly.The
; COMMIT;embedded in the old SQL strings has also been removed — those were breaking SQLAlchemy's transaction management.First post-deploy
alembic revision --autogeneratemay propose a no-op normalization revision for views whose SQL Postgres normalizes differently from what's in the registry (Postgres reformats SQL on storage). Safe to discard that revision.Signed commits
AI Disclosure: Claude Code was used to draft this PR description and write docstrings in the new
materialized_views.pymodule. I reviewed and verified all code changes, SQL definitions, and fixes before committing.