Skip to content

Make table deletion idempotent#10625

Open
KKcorps wants to merge 2 commits intoapache:masterfrom
KKcorps:async_table_deletion
Open

Make table deletion idempotent#10625
KKcorps wants to merge 2 commits intoapache:masterfrom
KKcorps:async_table_deletion

Conversation

@KKcorps
Copy link
Copy Markdown
Contributor

@KKcorps KKcorps commented Apr 17, 2023

This PR aims to solve the issue describe in #10603

A new flag async is added to the table request.

When set, we trigger table deletion via a background job and only return a job id. The table deletion then proceeds in background via a message handler on controller. The controller now waits for EV to converge and only then triggers deletion for segments and configs.

Also added a new API /tables/deleteTableStatus/{jobId} to track the progress of deletion.

It returns the response in the following format

 {
     "jobId":"01fe31f4-39ac-44f8-baaa-33289cf818cf",
     "segmentsPendingDeletion":[],
     "segmentsPendingDeletionCount":0,
     "messageCount":"1",
     "submissionTimeMs":"1681943216398",
     "jobType":"TABLE_DELETE",
     "tableName":"table0_REALTIME",
     "tableConfigDeleted":false
}

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 17, 2023

Codecov Report

Merging #10625 (25ebf03) into master (0cb456a) will decrease coverage by 0.05%.
The diff coverage is 73.36%.

@@             Coverage Diff              @@
##             master   #10625      +/-   ##
============================================
- Coverage     70.37%   70.32%   -0.05%     
- Complexity     6495     6498       +3     
============================================
  Files          2106     2108       +2     
  Lines        113004   113638     +634     
  Branches      17026    17126     +100     
============================================
+ Hits          79521    79918     +397     
- Misses        27917    28113     +196     
- Partials       5566     5607      +41     
Flag Coverage Δ
integration1 24.41% <21.19%> (-0.13%) ⬇️
integration2 24.02% <22.28%> (+0.01%) ⬆️
unittests1 67.89% <0.00%> (-0.09%) ⬇️
unittests2 13.99% <70.10%> (+0.09%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ommon/messages/TableDeletionControllerMessage.java 0.00% <0.00%> (ø)
...apache/pinot/controller/BaseControllerStarter.java 82.93% <ø> (ø)
...er/ControllerUserDefinedMessageHandlerFactory.java 43.20% <70.00%> (+26.54%) ⬆️
...oller/api/resources/PinotTableRestletResource.java 54.58% <73.68%> (+1.38%) ⬆️
...ntroller/helix/core/PinotHelixResourceManager.java 71.62% <95.52%> (+0.67%) ⬆️
...mmon/metadata/controllerjob/ControllerJobType.java 100.00% <100.00%> (ø)

... and 85 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@KKcorps KKcorps force-pushed the async_table_deletion branch from 36486ba to cd23fdf Compare April 19, 2023 19:47
@KKcorps KKcorps force-pushed the async_table_deletion branch from cd23fdf to 25ebf03 Compare April 19, 2023 21:09
@KKcorps KKcorps requested a review from Jackie-Jiang April 19, 2023 21:46
@KKcorps KKcorps marked this pull request as ready for review April 19, 2023 21:46
@KKcorps KKcorps changed the title WIP: Make table deletion idempotent Make table deletion idempotent Apr 19, 2023
@KKcorps
Copy link
Copy Markdown
Contributor Author

KKcorps commented Apr 20, 2023

@jtao15 can you also take a look please?

Copy link
Copy Markdown
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the TableDeletionControllerMessage? Can we make it similar to other controller jobs, and use a ZK record to store the progress of table delete so that it is accessible by any controller?

@KKcorps
Copy link
Copy Markdown
Contributor Author

KKcorps commented Apr 24, 2023

I am already storing progress in the ZK. That's how the deleteStatusAPI works. Without the message thought how are we supposed to return API response immediately? Other jobs are dealing with IS update only and not EV converge which can take extremely long.

@Jackie-Jiang
Copy link
Copy Markdown
Contributor

You may take a look at how rebalance task is handled PinotTableRestletResource.rebalance(). Essentially we want to use a background executor to execute the task and keeps updating the ZK about the progress. Using Helix message will use the Helix executor, but we already have a dedicate executor in controller for such long running tasks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants