feat: Add Topic Modeling database schema tables #3397
Merged
sgoggins merged 4 commits into chaoss:main on Nov 13, 2025
Force-pushed from 33e66cc to 0592997
Add two new tables and ORM models for Topic Modeling versioning system:
1. topic_model_meta table (Migration 35):
- Stores metadata for each trained topic model
- 21 fields including model_id (UUID PK), repo_id (FK), training parameters,
quality metrics (coherence_score, topic_diversity), and visualization data
- Enables model versioning, comparison, and intelligent retraining
2. topic_model_event table (Migration 36):
- Audit log for topic modeling events
- Tracks training lifecycle: started, completed, retrain triggered, etc.
- Provides observability for automated and manual training operations
3. TopicModelMeta ORM model:
- SQLAlchemy model definition for topic_model_meta table
- Relationships and field mappings for the application layer
These schema changes support the Topic Modeling feature that enables:
- Automated NMF-based topic extraction from repository messages
- Model version management and comparison
- Intelligent retraining based on data/quality changes
- Storage optimization via REPLACE strategy for automatic runs
Related: chaoss#3207
Signed-off-by: Xiaoha <blairjade183@gmail.com>
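A minimal sketch of how the `TopicModelMeta` ORM model described in this commit might look, written with SQLAlchemy's declarative API and PostgreSQL types. It shows only a handful of the 21 fields. `model_id` (UUID PK), `repo_id` (FK), `coherence_score`, `topic_diversity`, and the three timestamp columns are named in this PR. `num_topics`, `visualization_data`, the `augur_data` schema, the `BigInteger` key types, and the stub `Repo` class are assumptions made so the example compiles on its own.

```python
import uuid

from sqlalchemy import BigInteger, Column, DateTime, Float, ForeignKey, Integer, MetaData
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import JSONB, UUID
from sqlalchemy.orm import declarative_base
from sqlalchemy.schema import CreateTable

# Assumption: the tables live in the augur_data schema.
Base = declarative_base(metadata=MetaData(schema="augur_data"))


class Repo(Base):
    """Hypothetical minimal stand-in for Augur's repo table so the FK resolves."""

    __tablename__ = "repo"
    repo_id = Column(BigInteger, primary_key=True)


class TopicModelMeta(Base):
    """Illustrative subset of topic_model_meta (the real table has 21 fields)."""

    __tablename__ = "topic_model_meta"

    # One row per trained model version, keyed by a client-generated UUID
    model_id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    repo_id = Column(BigInteger, ForeignKey("augur_data.repo.repo_id"), nullable=False)
    num_topics = Column(Integer)  # hypothetical training-parameter column
    coherence_score = Column(Float)
    topic_diversity = Column(Float)
    visualization_data = Column(JSONB)  # hypothetical column name
    # Timezone-aware timestamps (TIMESTAMP WITH TIME ZONE on PostgreSQL)
    training_start_time = Column(DateTime(timezone=True))
    training_end_time = Column(DateTime(timezone=True))
    data_collection_date = Column(DateTime(timezone=True))


# Render the DDL for PostgreSQL without connecting to a database
ddl = str(CreateTable(TopicModelMeta.__table__).compile(dialect=postgresql.dialect()))
```

Compiling `CreateTable` against `postgresql.dialect()` renders the DDL offline, which is a cheap way to confirm that every timestamp column comes out as `TIMESTAMP WITH TIME ZONE`.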
Force-pushed from 0592997 to d20c672
- All JSON/JSONB fields in Augur have NO indexes
- Verified: repo_badging.data (JSONB), chaoss_metric_status.cm_info (JSON), etc.
- payload is used for display, not filtering
- Query performance relies on the ix_tme_repo_ts and ix_tme_event indexes

Signed-off-by: Xiaoha <blairjade183@gmail.com>
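The reasoning in this commit (leave the display-only `payload` JSONB column unindexed and rely on `ix_tme_repo_ts` and `ix_tme_event`) can be illustrated with a SQLAlchemy Core sketch of `topic_model_event`. The two index names come from the commit message; their column sets (`repo_id, ts` and the event type), the `event_id` and `event_type` column names, and the `augur_data` schema are assumptions.

```python
from sqlalchemy import BigInteger, Column, DateTime, Index, MetaData, String, Table, func
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.schema import CreateIndex, CreateTable

metadata = MetaData(schema="augur_data")  # assumed schema

topic_model_event = Table(
    "topic_model_event",
    metadata,
    Column("event_id", BigInteger, primary_key=True),  # hypothetical name
    Column("repo_id", BigInteger, nullable=False),
    Column("event_type", String(64), nullable=False),  # hypothetical, e.g. "training_started"
    Column("ts", DateTime(timezone=True), nullable=False, server_default=func.now()),
    Column("payload", JSONB),  # display-only, so intentionally left unindexed
)

# Assumed column sets for the indexes named in the commit message:
# per-repo timelines are served by (repo_id, ts); event-type filters by ix_tme_event.
Index("ix_tme_repo_ts", topic_model_event.c.repo_id, topic_model_event.c.ts)
Index("ix_tme_event", topic_model_event.c.event_type)

dialect = postgresql.dialect()
table_ddl = str(CreateTable(topic_model_event).compile(dialect=dialect))
index_ddl = [
    str(CreateIndex(ix).compile(dialect=dialect))
    for ix in sorted(topic_model_event.indexes, key=lambda ix: ix.name)
]
```

Under these assumptions, a query such as "latest events for repo X" can be answered from `ix_tme_repo_ts` alone, and `payload` is only read after the matching rows are located, so a GIN index on it would add write cost without helping those lookups.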
sgoggins previously approved these changes on Nov 12, 2025
sgoggins (Member) left a comment:
This looks like the right way to add things to the schema. @MoralCode ?
Thank you @xiaoha-cloud !!!
MoralCode (Contributor) requested changes on Nov 12, 2025:
Other than making sure you are using timezone-aware columns for all the timestamps, I don't really see a reason not to merge this. It's going to create new unused tables, but that's okay since it's part one of the topic modeling contribution, and merging it sooner is better so other database changes can be made without impacting the pending merge.
Review thread on augur/application/schema/alembic/versions/35_create_topic_model_meta_table.py (outdated, resolved)
- set training_start_time/end_time/data_collection_date to TIMESTAMPTZ
- update TopicModelMeta ORM to use timezone-aware columns
- align topic_model_event ts column with TIMESTAMPTZ requirement
- satisfies maintainer request for timezone data storage

Signed-off-by: Xiaoha <blairjade183@gmail.com>
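Storing these columns as `TIMESTAMPTZ` pays off only if the application also writes timezone-aware values. The stdlib-only sketch below uses a hypothetical `training_window` helper (not part of this PR) to show that aware datetimes recorded in different UTC offsets subtract correctly, while mixing naive and aware values fails in Python.

```python
from datetime import datetime, timedelta, timezone


def training_window(start: datetime, end: datetime) -> timedelta:
    """Hypothetical helper: elapsed time between training_start_time and training_end_time.

    Rejects naive datetimes, since TIMESTAMPTZ columns should receive aware values.
    """
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("training timestamps must be timezone-aware")
    return end - start


# Start recorded in UTC; end recorded by a worker running at UTC+8.
start = datetime(2025, 11, 12, 23, 30, tzinfo=timezone.utc)
end = datetime(2025, 11, 13, 8, 15, tzinfo=timezone(timedelta(hours=8)))  # 00:15 UTC
duration = training_window(start, end)

# A naive value cannot be combined with an aware one.
naive_start = datetime(2025, 11, 12, 23, 30)
try:
    end - naive_start
    mixed_arithmetic_fails = False
except TypeError:
    mixed_arithmetic_fails = True
```

Because both endpoints carry an offset, the 45-minute duration is computed correctly even though the wall-clock values differ by almost nine hours.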
MoralCode reviewed on Nov 12, 2025
Review thread on augur/application/schema/alembic/versions/35_create_topic_model_meta_table.py (outdated, resolved)
- switch Alembic migrations to use sa.TIMESTAMP(timezone=True)
- keeps timezone support while avoiding Postgres-specific type import

Signed-off-by: Xiaoha <blairjade183@gmail.com>
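One way to check the claim in this commit, that the generic `sa.TIMESTAMP(timezone=True)` keeps timezone support, is to compile it against SQLAlchemy's PostgreSQL dialect and compare it with the Postgres-specific type. In a migration it would appear as, e.g., `sa.Column("training_start_time", sa.TIMESTAMP(timezone=True))` inside `op.create_table(...)`; that usage is illustrative, not copied from the PR.

```python
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql

pg = postgresql.dialect()

# Generic type: needs only `import sqlalchemy as sa` in the migration
portable = sa.TIMESTAMP(timezone=True).compile(dialect=pg)
# Postgres-specific type the migrations no longer need to import
pg_specific = postgresql.TIMESTAMP(timezone=True).compile(dialect=pg)
# Forgetting timezone=True silently produces a naive column
naive = sa.TIMESTAMP().compile(dialect=pg)
```

Both typed forms emit the same `TIMESTAMP WITH TIME ZONE` DDL on PostgreSQL, so the switch changes only the import, not the resulting schema.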
MoralCode approved these changes on Nov 13, 2025
Description
This PR adds:
- Migration 35: topic_model_meta table
- Migration 36: topic_model_event table
- TopicModelMeta ORM model
Why split into two PRs?
Related: #3207
Notes for Reviewers
Signed commits