Incoming reports clustering by similarity by ksy36 · Pull Request #86 · MozillaSecurity/WebCompatManager

ksy36 · 2026-02-04T21:58:20Z

Changes in this PR:
import_reports_from_bigquery now only saves reports to the database, without assigning a cluster_id or bucket_id.
The triage_new_reports command fetches reports that have no bucket_id and attempts to cluster and bucket them (it currently runs every hour).

I think once we import live reports the frequency of triaging can be increased.
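
To illustrate the split described above (import only stores reports, and a later triage pass assigns clusters and buckets), here is a minimal sketch in plain Python. The `Report` dataclass, `select_untriaged`, `triage`, and the `assign` callback are hypothetical stand-ins for illustration only; they are not the PR's actual models or commands.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Report:
    # Hypothetical stand-in for the real report model.
    id: int
    text: str
    cluster_id: Optional[int] = None
    bucket_id: Optional[int] = None


def select_untriaged(reports: list[Report]) -> list[Report]:
    """Return reports that were imported but have no bucket yet."""
    return [r for r in reports if r.bucket_id is None]


def triage(
    reports: list[Report],
    assign: Callable[[Report], Optional[tuple[int, int]]],
) -> int:
    """Try to cluster and bucket each untriaged report.

    `assign` is a hypothetical callback that returns (cluster_id, bucket_id),
    or None when no sufficiently similar cluster is found.
    Returns the number of reports that were assigned.
    """
    triaged = 0
    for report in select_untriaged(reports):
        result = assign(report)
        if result is None:
            continue  # stays untriaged and is picked up by a later run
        report.cluster_id, report.bucket_id = result
        triaged += 1
    return triaged


reports = [
    Report(1, "video does not play"),
    Report(2, "login broken", cluster_id=7, bucket_id=3),
]
count = triage(reports, lambda r: (1, 10) if "video" in r.text else None)
```

Because reports that cannot be assigned keep a null bucket_id, running the pass more often only re-examines the backlog plus whatever was imported since the last run.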

ksy36 · 2026-02-10T21:23:48Z

I need to find a way to run the cluster_reports command once, to cluster the existing reports on production before these changes are deployed :)

ksy36 · 2026-02-12T20:26:49Z

OK, I've added a page at /reportmanager/clustering/ where we can trigger full clustering; it also displays all runs, both full and incremental (triaging new reports). We can probably display more info in the table later, if needed.

jgraham

I mainly reviewed the backend changes so far; for the frontend I wonder if we can get away without having this feature in the first instance.

jgraham · 2026-02-23T12:29:01Z

+    )
+
+    def handle(self, *args: object, **options: object) -> None:
+        status = ClusteringJob.get_clustering_status()


How important is this lock if we can't run stuff from the UI? I guess it still makes sense, but it also makes sense that we can't run multiple import jobs in parallel for example, in which case maybe we should have a more generic locking mechanism rather than one specific to this job type? I guess we're also using it to track progress, but again it seems like there's a lot in common with what we'd want to track progress of other jobs e.g. import. Not a change for this PR.

ksy36 · 2026-02-24

Yeah, triage_new_reports runs every hour, so it's unlikely (but possible) that it will run at the same time as a full reclustering (which I'm planning to run directly with a command). Also, on its first run triage_new_reports checks whether there are any successful runs, i.e. whether an initial clustering exists, and returns early if not. I guess it could also trigger the initial clustering once (if it doesn't exist) so that I don't have to run the command manually.
But yes, I agree about a generic locking mechanism.
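
As a rough illustration of the generic locking idea discussed in this thread, the sketch below keys a lock on a job name rather than on one job type, using a database uniqueness constraint. It uses the standard-library sqlite3 module so it runs on its own; the `job_lock` table, `init_locks`, `job_lock`, and `JobAlreadyRunning` are all hypothetical names, and this is not how ClusteringJob works in the PR.

```python
import sqlite3
from contextlib import contextmanager


class JobAlreadyRunning(Exception):
    """Raised when another run already holds the lock for this job name."""


def init_locks(conn: sqlite3.Connection) -> None:
    # One row per running job; the primary key enforces mutual exclusion.
    conn.execute("CREATE TABLE IF NOT EXISTS job_lock (name TEXT PRIMARY KEY)")
    conn.commit()


@contextmanager
def job_lock(conn: sqlite3.Connection, name: str):
    """Hold a named lock for the duration of the with-block."""
    try:
        conn.execute("INSERT INTO job_lock (name) VALUES (?)", (name,))
        conn.commit()
    except sqlite3.IntegrityError:
        conn.rollback()
        raise JobAlreadyRunning(name) from None
    try:
        yield
    finally:
        # Release the lock even if the job body raised.
        conn.execute("DELETE FROM job_lock WHERE name = ?", (name,))
        conn.commit()


conn = sqlite3.connect(":memory:")
init_locks(conn)

with job_lock(conn, "clustering"):
    try:
        with job_lock(conn, "clustering"):
            second_run_blocked = False
    except JobAlreadyRunning:
        second_run_blocked = True
```

A lock row like this is not released if the process crashes, so a real implementation would presumably also need a timestamp or heartbeat to detect stale locks. Using one shared name for clustering and import would make those jobs mutually exclusive; using separate names only prevents a job from overlapping with itself.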

jgraham

I think we should land this (once the lint is fixed!) so we can start to see the results and experiment with it.

ksy36 force-pushed the incoming_clustering branch 2 times, most recently from 1ce7f27 to c515179 Compare February 10, 2026 20:43

ksy36 marked this pull request as ready for review February 10, 2026 20:47

ksy36 requested a review from jgraham February 10, 2026 20:47

ksy36 force-pushed the incoming_clustering branch from c515179 to ddfcd1a Compare February 11, 2026 15:30

ksy36 commented Feb 11, 2026

Comment thread server/reportmanager/models.py Outdated

ksy36 requested a review from denschub February 11, 2026 16:28

ksy36 marked this pull request as draft February 12, 2026 15:33

ksy36 marked this pull request as ready for review February 12, 2026 20:24

jgraham requested changes Feb 16, 2026

ksy36 force-pushed the incoming_clustering branch 2 times, most recently from e0f083e to 3ed40e0 Compare February 20, 2026 22:03

ksy36 commented Feb 20, 2026

Comment thread server/reportmanager/clustering/ClusterBucketManager.py

jgraham reviewed Feb 23, 2026

ksy36 force-pushed the incoming_clustering branch from 7c219b9 to 884f346 Compare February 26, 2026 04:08

jgraham approved these changes Feb 26, 2026

ksy36 force-pushed the incoming_clustering branch from 884f346 to 99e3155 Compare February 26, 2026 14:17

ksy36 added 5 commits February 26, 2026 14:09

257aa7d Incoming reports similarity clustering
e0f5655 Tests
991e2b3 Add UI to trigger full clustering
7748d57 Code review changes
7bb6c35 Code review changes

ksy36 force-pushed the incoming_clustering branch from 99e3155 to 7bb6c35 Compare February 26, 2026 19:10

ksy36 merged commit 2bd2d25 into main Feb 26, 2026
8 checks passed


ksy36 Feb 24, 2026
