feat(replays): Add self-serve bulk deletes #93864
Conversation
Codecov Report
Attention: Patch coverage is ❌ File not in storage
Additional details and impacted files

@@            Coverage Diff             @@
##           master   #93864       +/-   ##
===========================================
+ Coverage   53.07%   88.04%   +34.97%
===========================================
  Files       10334    10349       +15
  Lines      596983   597858      +875
  Branches    23221    23221
===========================================
+ Hits       316821   526360   +209539
+ Misses     279658    70994   -208664
  Partials      504      504
This PR has a migration; here is the generated SQL:
--
-- Create model ReplayDeletionJobModel
--
CREATE TABLE "replays_replaydeletionjob" ("id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY, "date_updated" timestamp with time zone NOT NULL, "date_added" timestamp with time zone NOT NULL, "range_start" timestamp with time zone NOT NULL, "range_end" timestamp with time zone NOT NULL, "environments" text[] NOT NULL, "organization_id" bigint NOT NULL, "project_id" bigint NOT NULL, "status" varchar NOT NULL, "query" text NOT NULL, "offset" integer NOT NULL);
CREATE INDEX CONCURRENTLY "replays_replaydeletionjob_organization_id_19b03747" ON "replays_replaydeletionjob" ("organization_id");
CREATE INDEX CONCURRENTLY "replays_replaydeletionjob_project_id_447cb4a5" ON "replays_replaydeletionjob" ("project_id"); |
# Create the deletion job
job = ReplayDeletionJobModel.objects.create(
    range_start=data["rangeStart"],
Can we use our own standardized statsPeriod params instead of rangeStart and rangeEnd? get_date_range_from_params or get_filter_params?
Will address in a follow-up.
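(For illustration only, and not part of this PR: a sketch of what the follow-up might look like. It assumes the get_date_range_from_params helper in sentry.api.utils behaves as it does in other endpoints, accepting statsPeriod or explicit start/end query params.)

from sentry.api.utils import get_date_range_from_params


def resolve_deletion_range(request):
    # Accepts either ?statsPeriod=14d or explicit ?start=...&end=..., like other
    # Sentry endpoints, and returns a (start, end) pair of datetimes.
    start, end = get_date_range_from_params(request.GET)
    return start, end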
),
),
(
    "organization_id",
is organization_id necessary on this model? or is project_id enough?
This was originally an organization endpoint so you could list the deletions for all of your projects. I simplified it to a project endpoint, but I think that organization list view could make a return. The org-id simplifies querying in that regard.
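(A rough sketch of the querying the org-id enables. The import path and function names below are made up for illustration; Project is Sentry's project model.)

from sentry.models.project import Project
from sentry.replays.models import ReplayDeletionJobModel  # import path assumed for illustration


def list_org_deletion_jobs(organization):
    # With organization_id denormalized onto the row, an org-level list view is
    # a single indexed filter.
    return ReplayDeletionJobModel.objects.filter(organization_id=organization.id)


def list_org_deletion_jobs_via_projects(organization):
    # Without the column, the same listing has to resolve the org's projects first.
    project_ids = Project.objects.filter(organization_id=organization.id).values_list("id", flat=True)
    return ReplayDeletionJobModel.objects.filter(project_id__in=list(project_ids))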
status="pending", | ||
) | ||
|
||
# We always start with an offset of 0 (obviously) but future work doesn't need to obey |
I see below that we only set in-progress if offset is 0. Does this comment mean people will soon be able to start with an offset greater than 0, and if so, should we pass a new parameter to the job, first_run or somesuch, to control the in-progress setting?
Probably just for restarting a previously run task, so in-progress should already be set. I think the offset parameter is not well used right now, so I may remove it until that use case actually comes up.
I'm going to address the offset issue in a follow-up. I want to make sure this is implemented thoughtfully and there's pressure to start deleting using this task quickly.
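(Purely illustrative, based on the behavior described in this thread rather than the PR's actual task body: only the first run, offset 0, moves the job to in-progress, so a restart with a non-zero offset leaves the existing status untouched.)

from sentry.replays.models import ReplayDeletionJobModel  # import path assumed for illustration


def run_bulk_replay_delete_job_sketch(job_id: int, offset: int) -> None:
    job = ReplayDeletionJobModel.objects.get(id=job_id)

    if offset == 0:
        # First run of the job: mark it as started.
        job.status = "in-progress"
        job.save(update_fields=["status"])

    # ... delete the next batch of replays starting at `offset`, then either
    # re-enqueue with the advanced offset or mark the job completed.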
job = ReplayDeletionJobModel.objects.create(
    range_start=data["rangeStart"],
    range_end=data["rangeEnd"],
    environments=data["environments"],
    organization_id=project.organization_id,
    project_id=project.id,
    query=data["query"],
    status="pending",
)

# We always start with an offset of 0 (obviously) but future work doesn't need to obey
# this. You're free to start from wherever you want.
run_bulk_replay_delete_job.delay(job.id, offset=0)
This is pretty similar to our existing deletions system, but we don't have a good solution for range deletions currently. 🤔
Do you have a link to the existing system?
It lives in sentry.deletions and is used for bulk deletes and large object tree deletion. It uses periodic tasks to ensure that deletions are driven to completion in spite of all the failures/disruptions our systems can have.
I will probably pick your brain on this later and we can possibly fold this into the existing system. Would be nice to re-use common systems.
Sounds good.
@@ -0,0 +1,155 @@
from __future__ import annotations
I know we have existing code for replay deletions. Is any of it able to be reused?
This is meant to replace it. The previous script causes some load issues (ClickHouse and TaskWorker). The goal was basically to copy-paste the old behavior and tweak it to be more reliable. Once it's all merged and verified to be working I'll remove the current delete behavior.
But yeah, that old code was re-used, just in a copy-paste sort of way. There are enough tweaks that abstracting it didn't make much sense. Plus we're actively using the script and I didn't want to break it while I was developing this new system.
Co-authored-by: Mark Story <mark@mark-story.com>
…m/getsentry/sentry into cmanallen/replays-add-bulk-delete
Co-authored-by: Mark Story <mark@mark-story.com>
…m/getsentry/sentry into cmanallen/replays-add-bulk-delete
The schema and task configuration look good to me.
Suspect Issues
This pull request was deployed and Sentry observed the following issues:
Did you find this useful? React with a 👍 or 👎
Enables customers to bulk delete their own data asynchronously.