Skip to content

Centralize materialized view registry and refactor refresh task#292

Open
shlokgilda wants to merge 2 commits intochaoss:mainfrom
shlokgilda:feature/materialized-views-registry
Open

Centralize materialized view registry and refactor refresh task#292
shlokgilda wants to merge 2 commits intochaoss:mainfrom
shlokgilda:feature/materialized-views-registry

Conversation

@shlokgilda
Copy link
Copy Markdown

Description:

Introduces a single source of truth for all 15 existing PostgreSQL materialized view definitions and replaces a fragile hardcoded refresh task with a dynamic loop.

  • add collectoss/application/db/materialized_views.py — registry of all 15 existing materialized views (SQL definitions + unique index columns). get_refresh_sql() helper builds REFRESH MATERIALIZED VIEW statements.
  • refactor collectoss/tasks/db/refresh_materialized_views.py — replaces 14 hardcoded try/except: pass blocks with a dynamic loop over the registry. Uses an AUTOCOMMIT connection so REFRESH CONCURRENTLY actually works
    (it cannot run inside a transaction block, which the old code silently violated). Raises RuntimeError at the end if any view failed both concurrent and non-concurrent refresh, so failures are visible in Celery instead of swallowed.
  • wire alembic_utils into collectoss/application/schema/alembic/env.py — registers all views from the registry so alembic revision --autogenerate detects SQL definition changes automatically (phase 2 of Add new materialized views for heatmaps on 8knot (and make a system for it) #243).
  • add alembic-utils==0.8.8 to pyproject.toml and regenerate uv.lock.

No schema changes in this PR. The 3 new heatmap views for 8Knot follow in a separate incremental PR on top of this one.

This PR fixes #243 (partial — heatmap views follow separately)

Notes for Reviewers:

The key correctness fix worth understanding: REFRESH MATERIALIZED VIEW CONCURRENTLY which wraps everything in engine.begin(), so every concurrent refresh was silently failing and falling back to a blocking refresh on every run. This PR fixes that by using engine.connect().execution_options(isolation_level="AUTOCOMMIT") directly.

The ; COMMIT; embedded in the old SQL strings has also been removed — those were breaking SQLAlchemy's transaction management.

First post-deploy alembic revision --autogenerate may propose a no-op normalization revision for views whose SQL Postgres normalizes differently from what's in the registry (Postgres reformats SQL on storage). Safe to discard that revision.

Signed commits

  • Yes, I signed my commits.

AI Disclosure: Claude Code was used to draft this PR description and write docstrings in the new materialized_views.py module. I reviewed and verified all code changes, SQL definitions, and fixes before committing.

- add collectoss/application/db/materialized_views.py: single source of
  truth for all 15 existing materialized view definitions (SQL + unique
  index columns). get_refresh_sql() helper constructs REFRESH statements.

- refactor collectoss/tasks/db/refresh_materialized_views.py: replace 14
  hardcoded try/except-pass blocks with a dynamic loop over the registry.
  uses an AUTOCOMMIT connection so REFRESH CONCURRENTLY works correctly
  (it cannot run inside a transaction block). raises RuntimeError if any
  view fails both concurrent and non-concurrent refresh.

- wire alembic_utils into collectoss/application/schema/alembic/env.py:
  register all views from the registry so that alembic revision
  --autogenerate detects SQL definition changes automatically.

- add alembic-utils==0.8.8 to pyproject.toml and regenerate uv.lock.

closes chaoss#243 (partial — heatmap views follow in a separate PR)

Signed-off-by: Shlok Gilda <gildashlok@hotmail.com>
@shlokgilda shlokgilda requested a review from MoralCode as a code owner May 5, 2026 05:02
Comment on lines +692 to +701
MATERIALIZED_VIEWS = [
# --- View 1: legacy DDL (augur_full.sql), no unique index ---
{
"name": "issue_reporter_created_at",
"schema": "augur_data",
"sql": _ISSUE_REPORTER_CREATED_AT,
"unique_index_columns": [], # only a non-unique btree on repo_id
},
# --- Views 2-6: from migration 4, indexes from migration 25 ---
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a very different paradigm from the class-based modeling in the other SQLAlchemy models.

In developing this, did you come to realize that text-based SQL is the only way to do it? I think it would be awesome if there was a class-based approach to this (ideally passing SQLAlchemy Table's around or something, but if we have to DIY our own classes that mirror the rest of our models and just encapsulate this metadata alongside the SQL query, that could work too)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a very different paradigm from the class-based modeling in the other SQLAlchemy models...

Fair callout. pushed a follow-up (56647597a).

On Table-based modeling: not viable here. Table describes column/constraint shape, not SELECT semantics, and several views are 100+-line UNION ALL / CTE queries (explorer_new_contributors is ~164 lines). SQLAlchemy also has no first-class materialized-view construct.

What I did: went with the DIY-class fallback. New frozen MaterializedView dataclass with fqn, refresh_sql(), and to_pg_view():

@dataclass(frozen=True)
class MaterializedView:
    name: str
    schema: str
    sql: str
    unique_index_columns: tuple[str, ...] = ()
  • All 15 registry entries converted from dicts to instances.
  • Refresh task uses view.fqn / view.refresh_sql(...); get_refresh_sql removed.
  • env.py collapses to [v.to_pg_view() for v in MATERIALIZED_VIEWS]. Composition over subclassing PGMaterializedView since unique_index_columns is metadata alembic_utils doesn't track.
  • tuple (not list) so frozen=True is actually immutable. Custom __repr__ keeps the SQL out of logs.

Verified the emitted REFRESH MATERIALIZED VIEW [CONCURRENTLY] ... is byte-identical to before across all 15 views, both modes. No SQL or migration changes.

Replace the list-of-dicts registry with a frozen MaterializedView dataclass
exposing fqn, refresh_sql(), and to_pg_view(). Brings the registry's shape
in line with the declarative ORM style used elsewhere in the codebase and
gives callers attribute access + type checking instead of string-keyed
dict lookups.

unique_index_columns is a tuple so frozen=True actually means immutable.
__repr__ is overridden to keep the multi-hundred-line view SQL out of
debug logs. Refresh task and alembic env.py updated to use the new API;
get_refresh_sql free function removed (only two call sites).

Emitted REFRESH SQL is byte-identical to the previous version.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add new materialized views for heatmaps on 8knot (and make a system for it)

2 participants