feat(replays): Add self-serve bulk deletes #93864
Conversation
Codecov Report
Attention: Patch coverage is ❌ File not in storage
Additional details and impacted files

@@            Coverage Diff             @@
##           master   #93864       +/-   ##
===========================================
+ Coverage   53.07%   88.04%   +34.97%
===========================================
  Files       10334    10349       +15
  Lines      596983   597858      +875
  Branches    23221    23221
===========================================
+ Hits       316821   526360   +209539
+ Misses     279658    70994   -208664
  Partials      504      504
This PR has a migration; here is the generated SQL:
--
-- Create model ReplayDeletionJobModel
--
CREATE TABLE "replays_replaydeletionjob" ("id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY, "date_updated" timestamp with time zone NOT NULL, "date_added" timestamp with time zone NOT NULL, "range_start" timestamp with time zone NOT NULL, "range_end" timestamp with time zone NOT NULL, "environments" text[] NOT NULL, "organization_id" bigint NOT NULL, "project_id" bigint NOT NULL, "status" varchar NOT NULL, "query" text NOT NULL, "offset" integer NOT NULL);
CREATE INDEX CONCURRENTLY "replays_replaydeletionjob_organization_id_19b03747" ON "replays_replaydeletionjob" ("organization_id");
CREATE INDEX CONCURRENTLY "replays_replaydeletionjob_project_id_447cb4a5" ON "replays_replaydeletionjob" ("project_id"); |
# Create the deletion job
job = ReplayDeletionJobModel.objects.create(
    range_start=data["rangeStart"],
Can we use our own standardized statsPeriod params instead of rangeStart and rangeEnd? get_date_range_from_params or get_filter_params?
Will address in a follow-up.
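(For illustration only, and not part of this PR: a sketch of what the follow-up might look like. It assumes the get_date_range_from_params helper in sentry.api.utils behaves as it does in other endpoints, accepting statsPeriod or explicit start/end query params.)

from sentry.api.utils import get_date_range_from_params


def resolve_deletion_range(request):
    # Accepts either ?statsPeriod=14d or explicit ?start=...&end=..., like other
    # Sentry endpoints, and returns a (start, end) pair of datetimes.
    start, end = get_date_range_from_params(request.GET)
    return start, end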
),
),
(
    "organization_id",
is organization_id necessary on this model? or is project_id enough?
This was originally an organization endpoint so you could list the deletions for all of your projects. I simplified it to a project endpoint, but I think that organization list view could make a return. The org-id simplifies querying in that regard.
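(A rough sketch of the querying the org-id enables. The import path and function names below are made up for illustration; Project is Sentry's project model.)

from sentry.models.project import Project
from sentry.replays.models import ReplayDeletionJobModel  # import path assumed for illustration


def list_org_deletion_jobs(organization):
    # With organization_id denormalized onto the row, an org-level list view is
    # a single indexed filter.
    return ReplayDeletionJobModel.objects.filter(organization_id=organization.id)


def list_org_deletion_jobs_via_projects(organization):
    # Without the column, the same listing has to resolve the org's projects first.
    project_ids = Project.objects.filter(organization_id=organization.id).values_list("id", flat=True)
    return ReplayDeletionJobModel.objects.filter(project_id__in=list(project_ids))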
status="pending", | ||
) | ||
|
||
# We always start with an offset of 0 (obviously) but future work doesn't need to obey |
I see below that we only set in-progress if offset is 0. Does this comment mean people will soon be able to start with an offset greater than 0, and if so, should we pass a new parameter to the job, first_run or somesuch, to control the in-progress setting?
Probably just for restarting a previously run task, so in-progress should already be set. I think the offset parameter is not well used right now, so I may remove it until that use case actually comes up.
I'm going to address the offset issue in a follow-up. I want to make sure this is implemented thoughtfully and there's pressure to start deleting using this task quickly.
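(Purely illustrative, based on the behavior described in this thread rather than the PR's actual task body: only the first run, offset 0, moves the job to in-progress, so a restart with a non-zero offset leaves the existing status untouched.)

from sentry.replays.models import ReplayDeletionJobModel  # import path assumed for illustration


def run_bulk_replay_delete_job_sketch(job_id: int, offset: int) -> None:
    job = ReplayDeletionJobModel.objects.get(id=job_id)

    if offset == 0:
        # First run of the job: mark it as started.
        job.status = "in-progress"
        job.save(update_fields=["status"])

    # ... delete the next batch of replays starting at `offset`, then either
    # re-enqueue with the advanced offset or mark the job completed.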
job = ReplayDeletionJobModel.objects.create(
    range_start=data["rangeStart"],
    range_end=data["rangeEnd"],
    environments=data["environments"],
    organization_id=project.organization_id,
    project_id=project.id,
    query=data["query"],
    status="pending",
)

# We always start with an offset of 0 (obviously) but future work doesn't need to obey
# this. You're free to start from wherever you want.
run_bulk_replay_delete_job.delay(job.id, offset=0)
This is pretty similar to our existing deletions system, but we don't have a good solution for range deletions currently. 🤔
Do you have a link to the existing system?
It lives in sentry.deletions and is used for bulk deletes and large object tree deletion. It uses periodic tasks to ensure that deletions are driven to completion in spite of all the failures/disruptions our systems can have.
I will probably pick your brain on this later and we can possibly fold this into the existing system. Would be nice to re-use common systems.
Sounds good.
@@ -0,0 +1,155 @@
from __future__ import annotations
I know we have existing code for replay deletions. Is any of it able to be reused?
This is meant to replace it. The previous script causes some load issues (ClickHouse and TaskWorker). The goal was basically to copy-paste the old behavior and tweak it to be more reliable. Once it's all merged and verified to be working I'll remove the current delete behavior.
But yeah, that old code was re-used, just in a copy-paste sort of way. There are enough tweaks that abstracting it didn't make much sense. Plus we're actively using the script and I didn't want to break it while I was developing this new system.
Co-authored-by: Mark Story <mark@mark-story.com>
…m/getsentry/sentry into cmanallen/replays-add-bulk-delete
Co-authored-by: Mark Story <mark@mark-story.com>
…m/getsentry/sentry into cmanallen/replays-add-bulk-delete
The schema and task configuration look good to me.
Suspect Issues
This pull request was deployed and Sentry observed the following issues:
Did you find this useful? React with a 👍 or 👎
Enables customers to bulk delete their own data asynchronously.