fix(Code References): Code references are slow to query #7463
Merged
10 commits
4ead702 Ditch fake retention (emyller)
32be058 Fix docs (emyller)
54744b3 Redesign code references data model (emyller)
4a59b20 Improve types (emyller)
2b0f99e Migrate code references (emyller)
e9575cd Re-enable code references stats now that queries are fast (emyller)
803642d Merge remote-tracking branch 'github/main' into fix/fast-code-references (emyller)
4fb1afb Flag non-secure md5 use as intentional (emyller)
29f3276 Even faster now (emyller)
64007c8 Fix scan timestamp (emyller)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,5 +1,2 @@ | ||
| # TODO: Implement history cleanup? | ||
| FEATURE_FLAG_CODE_REFERENCES_RETENTION_DAYS = 30 | ||
|
|
||
|
|
||
| # Linux maximum file path length, as per limits.h/PATH_MAX | ||
| MAX_FILE_PATH_LENGTH = 4096 | ||
224 changes: 224 additions & 0 deletions
api/projects/code_references/migrations/0003_introduce_per_feature_scanned_references.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,224 @@ | ||
| import hashlib | ||
| import json | ||
|
|
||
| from itertools import groupby | ||
| from operator import attrgetter | ||
| from typing import TypedDict | ||
|
|
||
| import django.db.models.deletion | ||
| from django.apps.registry import Apps | ||
| from django.db import migrations, models | ||
| from django.db.models import Max | ||
|
|
||
|
|
||
| class LegacyCodeReference(TypedDict): | ||
| feature_name: str | ||
| file_path: str | ||
| line_number: int | ||
|
|
||
|
|
||
| class StoredCodeReference(TypedDict): | ||
| file_path: str | ||
| line_number: int | ||
|
|
||
|
|
||
| def _hash_references(references: list[StoredCodeReference]) -> str: | ||
| return hashlib.md5( | ||
| json.dumps(references, sort_keys=True).encode(), | ||
| usedforsecurity=False, | ||
| ).hexdigest() | ||
|
|
||
|
|
||
| def migrate_scans_forward(apps: Apps, _: object) -> None: | ||
| """Split each legacy scan into new cardinality (per-repository and per-feature)""" | ||
|
|
||
| LegacyScan = apps.get_model("code_references", "FeatureFlagCodeReferencesScan") | ||
| PerFeatureScan = apps.get_model("code_references", "ScannedCodeReferences") | ||
| Repository = apps.get_model("code_references", "VCSRepository") | ||
| Feature = apps.get_model("features", "Feature") | ||
|
|
||
| legacy_scans_summaries = LegacyScan.objects.values( | ||
| "project_id", | ||
| "repository_url", | ||
| "vcs_provider", | ||
| ).annotate(last_scanned_at=Max("created_at")) | ||
|
|
||
| repositories = { | ||
| (summary["project_id"], summary["repository_url"]): Repository.objects.create( | ||
| project_id=summary["project_id"], | ||
| url=summary["repository_url"], | ||
| vcs_provider=summary["vcs_provider"], | ||
| last_scanned_at=summary["last_scanned_at"], | ||
| ) | ||
| for summary in legacy_scans_summaries | ||
| } | ||
|
|
||
| # Oldest-first per project so the newest scan wins on hash collisions | ||
| legacy_scans = LegacyScan.objects.order_by("project_id", "created_at").iterator() | ||
| grouped_scans = groupby(legacy_scans, key=attrgetter("project_id")) | ||
| for project_id, project_scans in grouped_scans: | ||
| features = { | ||
| (feature.project_id, feature.name): feature | ||
| for feature in Feature.objects.filter( | ||
| project_id=project_id, | ||
| deleted_at__isnull=True, # Historical models drop SoftDeleteManager | ||
| ) | ||
| } | ||
| for legacy_scan in project_scans: | ||
| repository_url = legacy_scan.repository_url | ||
| repository = repositories[project_id, repository_url] | ||
|
|
||
| references_by_feature: dict[str, list[StoredCodeReference]] = {} | ||
| for reference in legacy_scan.code_references: | ||
| feature_name = reference["feature_name"] | ||
| references_by_feature.setdefault(feature_name, []).append( | ||
| StoredCodeReference( | ||
| file_path=reference["file_path"], | ||
| line_number=reference["line_number"], | ||
| ) | ||
| ) | ||
|
|
||
| for feature_name, references in references_by_feature.items(): | ||
| if not (feature := features.get((project_id, feature_name))): | ||
| continue | ||
| PerFeatureScan.objects.update_or_create( | ||
| feature=feature, | ||
| repository=repository, | ||
| code_references_hash=_hash_references(references), | ||
| defaults={ | ||
| "revision": legacy_scan.revision, | ||
| "code_references": references, | ||
| "created_at": legacy_scan.created_at, | ||
| }, | ||
| ) | ||
|
|
||
|
|
||
| def migrate_scans_backward(apps: Apps, _: object) -> None: | ||
| """Mirror each per-feature row back into the legacy single-table layout.""" | ||
| LegacyScan = apps.get_model("code_references", "FeatureFlagCodeReferencesScan") | ||
| PerFeatureScan = apps.get_model("code_references", "ScannedCodeReferences") | ||
| LegacyScan._meta.get_field("created_at").auto_now_add = False | ||
|
|
||
| per_feature_scans = PerFeatureScan.objects.select_related( | ||
| "repository", | ||
| "feature", | ||
| ).iterator(chunk_size=200) | ||
|
|
||
| for per_feature_scan in per_feature_scans: | ||
| repository = per_feature_scan.repository | ||
| feature_name = per_feature_scan.feature.name | ||
| LegacyScan.objects.create( | ||
| project_id=repository.project_id, | ||
| repository_url=repository.url, | ||
| vcs_provider=repository.vcs_provider, | ||
| revision=per_feature_scan.revision, | ||
| code_references=[ | ||
| {"feature_name": feature_name, **reference} | ||
| for reference in per_feature_scan.code_references | ||
| ], | ||
| created_at=per_feature_scan.created_at, | ||
| ) | ||
|
|
||
|
|
||
| class Migration(migrations.Migration): | ||
| dependencies = [ | ||
| ("code_references", "0002_add_project_repo_created_index"), | ||
| ("features", "0066_constrain_feature_type"), | ||
| ("projects", "0029_bump_default_project_limits"), | ||
| ] | ||
|
|
||
| operations = [ | ||
| migrations.CreateModel( | ||
| name="VCSRepository", | ||
| fields=[ | ||
| ( | ||
| "id", | ||
| models.AutoField( | ||
| auto_created=True, | ||
| primary_key=True, | ||
| serialize=False, | ||
| verbose_name="ID", | ||
| ), | ||
| ), | ||
| ("created_at", models.DateTimeField(auto_now_add=True)), | ||
| ("url", models.URLField()), | ||
| ( | ||
| "vcs_provider", | ||
| models.CharField( | ||
| choices=[("github", "GitHub")], | ||
| max_length=50, | ||
| ), | ||
| ), | ||
| ("last_scanned_at", models.DateTimeField(null=True)), | ||
| ( | ||
| "project", | ||
| models.ForeignKey( | ||
| on_delete=django.db.models.deletion.CASCADE, | ||
| related_name="vcs_repositories", | ||
| to="projects.project", | ||
| ), | ||
| ), | ||
| ], | ||
| ), | ||
| migrations.AddConstraint( | ||
| model_name="vcsrepository", | ||
| constraint=models.UniqueConstraint( | ||
| fields=("project", "url"), | ||
| name="unique_vcs_repository", | ||
| ), | ||
| ), | ||
| migrations.CreateModel( | ||
| name="ScannedCodeReferences", | ||
| fields=[ | ||
| ( | ||
| "id", | ||
| models.AutoField( | ||
| auto_created=True, | ||
| primary_key=True, | ||
| serialize=False, | ||
| verbose_name="ID", | ||
| ), | ||
| ), | ||
| ("created_at", models.DateTimeField()), | ||
| ("revision", models.CharField(max_length=100)), | ||
| ("code_references", models.JSONField(default=list)), | ||
| ("code_references_hash", models.CharField(max_length=32)), | ||
| ( | ||
| "feature", | ||
| models.ForeignKey( | ||
| on_delete=django.db.models.deletion.CASCADE, | ||
| related_name="scanned_code_references", | ||
| to="features.feature", | ||
| ), | ||
| ), | ||
| ( | ||
| "repository", | ||
| models.ForeignKey( | ||
| on_delete=django.db.models.deletion.CASCADE, | ||
| related_name="scanned_code_references", | ||
| to="code_references.vcsrepository", | ||
| ), | ||
| ), | ||
| ], | ||
| ), | ||
| migrations.AddConstraint( | ||
| model_name="scannedcodereferences", | ||
| constraint=models.UniqueConstraint( | ||
| fields=("feature", "repository", "code_references_hash"), | ||
| name="unique_scanned_code_references", | ||
| ), | ||
| ), | ||
| migrations.AddIndex( | ||
| model_name="scannedcodereferences", | ||
| index=models.Index( | ||
| fields=("feature", "repository", "created_at"), | ||
| name="cr_feature_repo_created_idx", | ||
| ), | ||
| ), | ||
| migrations.RunPython( | ||
| code=migrate_scans_forward, | ||
| reverse_code=migrate_scans_backward, | ||
| ), | ||
| migrations.DeleteModel( | ||
| name="FeatureFlagCodeReferencesScan", | ||
| ), | ||
| ] | ||
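The forward migration above splits each legacy scan into per-feature rows and deduplicates them by a content hash. The following is a minimal, framework-free sketch of that idea. `hash_references` mirrors `_hash_references` from the migration. `split_scan`, the sample scans, and the in-memory `rows` dict are hypothetical stand-ins for the ORM's `update_or_create` call and are not part of this PR.

```python
import hashlib
import json
from typing import TypedDict


class LegacyCodeReference(TypedDict):
    feature_name: str
    file_path: str
    line_number: int


class StoredCodeReference(TypedDict):
    file_path: str
    line_number: int


def hash_references(references: list[StoredCodeReference]) -> str:
    # Same approach as the migration: non-security MD5 over canonical JSON.
    return hashlib.md5(
        json.dumps(references, sort_keys=True).encode(),
        usedforsecurity=False,
    ).hexdigest()


def split_scan(
    code_references: list[LegacyCodeReference],
) -> dict[str, list[StoredCodeReference]]:
    # Hypothetical helper: group one legacy scan's references by feature name.
    by_feature: dict[str, list[StoredCodeReference]] = {}
    for reference in code_references:
        by_feature.setdefault(reference["feature_name"], []).append(
            StoredCodeReference(
                file_path=reference["file_path"],
                line_number=reference["line_number"],
            )
        )
    return by_feature


# Made-up data: two scans of one repository, oldest first. Only "beta" changed.
scans: list[tuple[str, list[LegacyCodeReference]]] = [
    ("rev-1", [
        {"feature_name": "alpha", "file_path": "a.py", "line_number": 1},
        {"feature_name": "beta", "file_path": "b.py", "line_number": 2},
    ]),
    ("rev-2", [
        {"feature_name": "alpha", "file_path": "a.py", "line_number": 1},
        {"feature_name": "beta", "file_path": "b.py", "line_number": 9},
    ]),
]

# Stand-in for update_or_create keyed on (feature, hash): a later scan with
# identical references overwrites the revision instead of adding a row.
rows: dict[tuple[str, str], dict] = {}
for revision, code_references in scans:
    for feature_name, references in split_scan(code_references).items():
        key = (feature_name, hash_references(references))
        rows[key] = {"revision": revision, "code_references": references}
```

With this sample data, "alpha" collapses into a single row carrying the newer revision, while "beta" keeps one row per distinct set of references. This is why the migration iterates scans oldest-first.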
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,37 +1,75 @@ | ||
| from django.db import models | ||
|
|
||
| from projects.code_references.types import JSONCodeReference, VCSProvider | ||
| from projects.code_references.types import StoredCodeReference, VCSProvider | ||
|
|
||
|
|
||
| class FeatureFlagCodeReferencesScan(models.Model): | ||
| class VCSRepository(models.Model): | ||
| """ | ||
| A scan of feature flag code references in a repository | ||
| A VCS repository that is scanned for feature flag code references | ||
| """ | ||
|
|
||
| created_at = models.DateTimeField(auto_now_add=True) | ||
|
|
||
| project = models.ForeignKey( | ||
| "projects.Project", | ||
| on_delete=models.CASCADE, | ||
| related_name="code_references", | ||
| related_name="vcs_repositories", | ||
| ) | ||
|
|
||
| # Provider-agnostic URL to the web UI of the repository, e.g. https://github.flagsmith.com/backend/ | ||
| repository_url = models.URLField() | ||
| url = models.URLField() | ||
|
|
||
| vcs_provider = models.CharField( | ||
| max_length=50, | ||
| choices=VCSProvider.choices, | ||
| default=VCSProvider.GITHUB, # TODO: Remove when adding other providers | ||
| ) | ||
|
|
||
| last_scanned_at = models.DateTimeField(null=True) | ||
|
|
||
| class Meta: | ||
| constraints = [ | ||
| models.UniqueConstraint( | ||
| fields=["project", "url"], | ||
| name="unique_vcs_repository", | ||
| ), | ||
| ] | ||
|
|
||
|
|
||
| class ScannedCodeReferences(models.Model): | ||
| """ | ||
| A list of code references for a feature scanned from a VCS repository | ||
| """ | ||
|
|
||
| created_at = models.DateTimeField() | ||
|
|
||
| feature = models.ForeignKey( | ||
| "features.Feature", | ||
| on_delete=models.CASCADE, | ||
| related_name="scanned_code_references", | ||
| ) | ||
|
|
||
| repository = models.ForeignKey( | ||
| VCSRepository, | ||
| on_delete=models.CASCADE, | ||
| related_name="scanned_code_references", | ||
| ) | ||
|
|
||
| revision = models.CharField(max_length=100) | ||
| code_references = models.JSONField[list[JSONCodeReference]](default=list) | ||
|
|
||
| created_at = models.DateTimeField(auto_now_add=True, db_index=True) | ||
| code_references = models.JSONField[list[StoredCodeReference]](default=list) | ||
|
|
||
| code_references_hash = models.CharField(max_length=32) | ||
|
|
||
| class Meta: | ||
| ordering = ["-created_at"] | ||
| constraints = [ | ||
|
|
||
| models.UniqueConstraint( # Supports batch-insert with ignore-conflicts | ||
| fields=["feature", "repository", "code_references_hash"], | ||
| name="unique_scanned_code_references", | ||
| ), | ||
| ] | ||
| indexes = [ | ||
| models.Index( | ||
| fields=["project", "repository_url", "-created_at"], | ||
| name="code_ref_proj_repo_created_idx", | ||
| models.Index( # Supports finding the latest scan for a feature/repository | ||
| fields=["feature", "repository", "created_at"], | ||
| name="cr_feature_repo_created_idx", | ||
| ), | ||
| ] | ||
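The comments on the constraint and the index describe two access patterns: batch inserts that ignore conflicts, and finding the latest scan for a feature/repository pair. Below is a rough sketch of both using the standard-library `sqlite3` module instead of Django. The table layout is a simplified assumption, and `INSERT OR IGNORE` stands in for whatever statement the ORM would emit for a conflict-ignoring bulk insert on the target database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE scanned_code_references (
        id INTEGER PRIMARY KEY,
        feature_id INTEGER NOT NULL,
        repository_id INTEGER NOT NULL,
        code_references_hash TEXT NOT NULL,
        revision TEXT NOT NULL,
        created_at TEXT NOT NULL,
        UNIQUE (feature_id, repository_id, code_references_hash)
    )
    """
)
conn.execute(
    "CREATE INDEX cr_feature_repo_created_idx "
    "ON scanned_code_references (feature_id, repository_id, created_at)"
)

# Made-up batch: the second row repeats the first row's content hash.
batch = [
    (1, 1, "hash-a", "rev-1", "2025-01-01T00:00:00"),
    (1, 1, "hash-a", "rev-2", "2025-01-02T00:00:00"),  # conflicts, is skipped
    (1, 1, "hash-b", "rev-3", "2025-01-03T00:00:00"),
]
conn.executemany(
    "INSERT OR IGNORE INTO scanned_code_references "
    "(feature_id, repository_id, code_references_hash, revision, created_at) "
    "VALUES (?, ?, ?, ?, ?)",
    batch,
)

row_count = conn.execute(
    "SELECT COUNT(*) FROM scanned_code_references"
).fetchone()[0]

# Latest scan for one feature/repository pair; the composite index covers
# both the equality filters and the ordering column.
latest_revision = conn.execute(
    "SELECT revision FROM scanned_code_references "
    "WHERE feature_id = ? AND repository_id = ? "
    "ORDER BY created_at DESC LIMIT 1",
    (1, 1),
).fetchone()[0]
```

Note one difference from the data migration: ignoring a conflict keeps the existing row unchanged (here `rev-1` for `hash-a`), whereas the migration's `update_or_create` overwrites the row's revision and timestamp with the newer scan's values.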