Strategize materialized view refresh timing relative to collection phases

Spinning out from #292 review (cc @MoralCode).

The refresh task in `collectoss/tasks/db/refresh_materialized_views.py` refreshes every view on whatever schedule Celery beat dictates, regardless of:

- Whether collection is mid-cycle for a given repo (a refresh can land mid-collection and leave the view holding partial data).
- Which collection phase (core / secondary / facade) feeds each view; we may refresh views whose source hasn't actually changed.
- Concurrent inserts. `REFRESH MATERIALIZED VIEW CONCURRENTLY` doesn't block reads, but only one refresh can run against a given view at a time, and during heavy collection windows a long-running refresh can interleave with writes in surprising ways.

Stuff worth thinking through:

- Trigger refresh after a collection phase completes for a repo group, instead of on a wall clock?
- Tag each view in the registry with the phases that feed it; only refresh views whose phases just finished?
- Track `last_refreshed_at` per view, skip if nothing changed?
- `issue_reporter_created_at` lacks a unique index, so it can only refresh non-concurrently. That refresh takes an exclusive lock that briefly blocks reads of the view. Schedule it separately, or add a unique index to bring it onto the concurrent path?
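
To make the options above concrete, here is a rough sketch of a phase-tagged registry with a `last_refreshed_at` check. Everything in it is an assumption for discussion, not what the current task does: `ViewSpec`, `views_to_refresh`, and `refresh_statement` are hypothetical names, `example_commit_summary` is a made-up view, and the phase assigned to `issue_reporter_created_at` is a guess.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ViewSpec:
    """Hypothetical registry entry for one materialized view."""
    name: str
    phases: frozenset            # collection phases that feed this view
    has_unique_index: bool = True  # needed for REFRESH ... CONCURRENTLY
    last_refreshed_at: Optional[datetime] = None


def views_to_refresh(registry, finished_phase, phase_finished_at):
    """Pick views fed by the phase that just finished and not refreshed since."""
    due = []
    for view in registry:
        if finished_phase not in view.phases:
            continue  # source data for this view did not change
        refreshed = view.last_refreshed_at
        if refreshed is not None and refreshed >= phase_finished_at:
            continue  # already refreshed after this phase completed
        due.append(view)
    return due


def refresh_statement(view):
    """Build the SQL, falling back to a blocking refresh without a unique index."""
    mode = "CONCURRENTLY " if view.has_unique_index else ""
    return f"REFRESH MATERIALIZED VIEW {mode}{view.name}"


# Illustrative registry; the phase assignments here are assumptions.
REGISTRY = [
    ViewSpec("example_commit_summary", frozenset({"facade"})),
    ViewSpec("issue_reporter_created_at", frozenset({"core"}),
             has_unique_index=False),
]

finished_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
for view in views_to_refresh(REGISTRY, "core", finished_at):
    print(refresh_statement(view))
```

With this shape, the collection task would call `views_to_refresh` when a phase completes for a repo group, and the non-concurrent views could be routed to their own schedule by checking `has_unique_index`.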

Library/fork choice for view + index management is a separate conversation; see #314.

Strategize materialized view refresh timing relative to collection phases #315
