feat(replays): Add self-serve bulk deletes #93864


Merged
merged 28 commits into master from cmanallen/replays-add-bulk-delete
Jun 20, 2025

Conversation

cmanallen
Member

@cmanallen cmanallen commented Jun 18, 2025

Enables customers to bulk delete their own data asynchronously.

@cmanallen cmanallen requested review from a team as code owners June 18, 2025 21:36
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Jun 18, 2025

codecov bot commented Jun 18, 2025

Codecov Report

Attention: Patch coverage is 99.10515% with 4 lines in your changes missing coverage. Please review.


Files with missing lines                  Patch %   Lines
src/sentry/replays/usecases/delete.py     94.11%    4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #93864       +/-   ##
===========================================
+ Coverage   53.07%   88.04%   +34.97%     
===========================================
  Files       10334    10349       +15     
  Lines      596983   597858      +875     
  Branches    23221    23221               
===========================================
+ Hits       316821   526360   +209539     
+ Misses     279658    70994   -208664     
  Partials      504      504               

Contributor

github-actions bot commented Jun 20, 2025

This PR has a migration; here is the generated SQL for src/sentry/replays/migrations/0006_add_bulk_delete_job.py

for 0006_add_bulk_delete_job in replays

--
-- Create model ReplayDeletionJobModel
--
CREATE TABLE "replays_replaydeletionjob" (
    "id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    "date_updated" timestamp with time zone NOT NULL,
    "date_added" timestamp with time zone NOT NULL,
    "range_start" timestamp with time zone NOT NULL,
    "range_end" timestamp with time zone NOT NULL,
    "environments" text[] NOT NULL,
    "organization_id" bigint NOT NULL,
    "project_id" bigint NOT NULL,
    "status" varchar NOT NULL,
    "query" text NOT NULL,
    "offset" integer NOT NULL
);
CREATE INDEX CONCURRENTLY "replays_replaydeletionjob_organization_id_19b03747" ON "replays_replaydeletionjob" ("organization_id");
CREATE INDEX CONCURRENTLY "replays_replaydeletionjob_project_id_447cb4a5" ON "replays_replaydeletionjob" ("project_id");
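For readers following the schema discussion below, here is a plain-Python mirror of the columns the migration creates. This is an illustrative sketch, not the actual Django model; the `status="pending"` and `offset=0` defaults are assumptions based on the conversation in this PR.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ReplayDeletionJob:
    """Illustrative mirror of the replays_replaydeletionjob table (not the real model)."""

    range_start: datetime
    range_end: datetime
    environments: list[str]  # maps to the text[] column
    organization_id: int
    project_id: int
    query: str
    status: str = "pending"  # assumed default, per the PR discussion
    offset: int = 0


job = ReplayDeletionJob(
    range_start=datetime(2025, 6, 1),
    range_end=datetime(2025, 6, 18),
    environments=["production"],
    organization_id=1,
    project_id=42,
    query="",
)
print(job.status)  # pending
```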


# Create the deletion job
job = ReplayDeletionJobModel.objects.create(
    range_start=data["rangeStart"],
Member

Can we use our own standardized statsPeriod params instead of rangeStart and rangeEnd? e.g. get_date_range_from_params or get_filter_params.

Member Author


Will address in a follow-up.

),
),
(
"organization_id",
Member


is organization_id necessary on this model? or is project_id enough?

Member Author


This was originally an organization endpoint so you could list the deletions for all of your projects. I simplified it to a project endpoint, but I think that organization list view could make a return. The org-id simplifies querying in that regard.

    status="pending",
)

# We always start with an offset of 0 (obviously) but future work doesn't need to obey
Member


i see below that we only set in-progress if offset is 0. does this comment mean soon people will be able to start with an offset greater than 0, and if so, should we pass a new parameter to the job first_run or somesuch to control the in-progress setting?

Member Author


Probably just for restarting a previously run task, in which case in-progress should already be set. I think the offset parameter is not well used right now, so I may remove it until that use case actually comes up.

Member Author
@cmanallen cmanallen Jun 20, 2025


I'm going to address the offset issue in a follow-up. I want to make sure this is implemented thoughtfully and there's pressure to start deleting using this task quickly.

Comment on lines +90 to +102
job = ReplayDeletionJobModel.objects.create(
    range_start=data["rangeStart"],
    range_end=data["rangeEnd"],
    environments=data["environments"],
    organization_id=project.organization_id,
    project_id=project.id,
    query=data["query"],
    status="pending",
)

# We always start with an offset of 0 (obviously) but future work doesn't need to obey
# this. You're free to start from wherever you want.
run_bulk_replay_delete_job.delay(job.id, offset=0)
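To illustrate the offset mechanics debated above, here is a toy version of how an offset-paged delete task could work: only the first run flips the job to in-progress, and each page re-enqueues the next. All names and the batch size are hypothetical; this is not the actual `run_bulk_replay_delete_job` implementation.

```python
# Toy sketch of offset-paged bulk deletion; hypothetical names, not Sentry's task.
BATCH_SIZE = 100


def run_bulk_delete(rows: list[dict], job: dict, offset: int = 0) -> None:
    """Delete matching rows in pages, re-running until the job is done."""
    if offset == 0:
        job["status"] = "in-progress"  # only the first run flips the status
    batch = rows[offset : offset + BATCH_SIZE]
    for row in batch:
        row["deleted"] = True  # stand-in for the real replay delete
    if len(batch) == BATCH_SIZE:
        # In production this would be an async re-enqueue with the next offset.
        run_bulk_delete(rows, job, offset + BATCH_SIZE)
    else:
        job["status"] = "completed"


rows = [{"deleted": False} for _ in range(250)]
job = {"status": "pending"}
run_bulk_delete(rows, job, offset=0)
print(job["status"], sum(r["deleted"] for r in rows))  # completed 250
```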
Member


This is pretty similar to our existing deletions system, but we don't have a good solution for range deletions currently. 🤔

Member Author
@cmanallen cmanallen Jun 20, 2025


Do you have a link to the existing system?

Member


It lives in sentry.deletions and is used for bulk deletes and large object tree deletion. It uses periodic tasks to ensure that deletions are driven to completion in spite of all the failures/disruptions our systems can have.
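The "driven to completion" pattern described here can be sketched as a periodic sweep that re-picks any job not yet finished, so a crashed worker never strands a deletion. The following is a toy, in-memory illustration of that idea, not the sentry.deletions code.

```python
import random

# Toy sketch of a periodic task driving jobs to completion despite failures.
# Not the sentry.deletions implementation.
random.seed(1)


def process(job: dict) -> None:
    if random.random() < 0.5:  # simulate transient worker failures
        raise RuntimeError("worker died")
    job["status"] = "completed"


def periodic_sweep(jobs: list[dict]) -> None:
    """Re-attempt every job that is not yet completed."""
    for job in jobs:
        if job["status"] != "completed":
            try:
                process(job)
            except RuntimeError:
                pass  # a later sweep will retry it


jobs = [{"status": "pending"} for _ in range(3)]
while any(j["status"] != "completed" for j in jobs):
    periodic_sweep(jobs)  # a scheduler would run this on a cron
print([j["status"] for j in jobs])
```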

Member Author


I will probably pick your brain on this later and we can possibly fold this into the existing system. Would be nice to re-use common systems.

Member


Sounds good.

@@ -0,0 +1,155 @@
from __future__ import annotations
Member


i know we have code existing for replays deletions, is any of it able to be reused?

Member Author


This is meant to replace it. The previous script causes some load issues (ClickHouse and TaskWorker). The goal was basically to copy-paste the old behavior and tweak it to be more reliable. Once it's all merged and verified to be working, I'll remove the current delete behavior.

But yeah that old code was re-used just in a copy-paste sort of way. There's enough tweaks that abstracting it didn't make much sense. Plus we're actively using the script and I didn't want to break it while I was developing this new system.

Member
@markstory markstory left a comment


The schema and task configuration look good to me.

@cmanallen cmanallen merged commit 49a2758 into master Jun 20, 2025
65 checks passed
@cmanallen cmanallen deleted the cmanallen/replays-add-bulk-delete branch June 20, 2025 16:49

sentry-io bot commented Jun 20, 2025

Suspect Issues

This pull request was deployed and Sentry observed the following issues:


@github-actions github-actions bot locked and limited conversation to collaborators Jul 6, 2025