Clean up pendingSegments table #3565
Comments
OK, I will try to fix this issue.
@haoxiang47 expressed interest in submitting a PR to fix this issue - @gianm do you have any thoughts about how best to do this?
Hmm, no immediate thoughts, other than we should make sure that we don't accidentally clean up pending segments that might be used by ingestions that are either long running or have been paused for a long time.
Do you see any issue with cleaning the entries up after the first task in a set of replicas successfully completes handoff in Alternately, KafkaSupervisor could manage the entry removal once all the tasks associated with a sequenceName have been completed or stopped, but it feels cleaner doing it in FiniteAppenderatorDriver since it was the one that created the entry in the first place.
Ideally we don't rely on something acting at a single point in time for cleanup, because failures of that thing could leave rows dangling around forever. If we can remove all stale pending segments though, and not just the ones for our current sequence, that'd work.
(subject to being careful not to remove pending segments that are for still-active sequences)
Hm alright, another way we could handle this is having a cleanup thread on the overlord that periodically compares the druid_pendingSegments table to the druid_tasks table and removes any entries in druid_pendingSegments with a sequenceName that doesn't have a corresponding 'active' entry in druid_tasks.
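To make the comparison concrete, here is a minimal sketch of that check, written in Python against an in-memory SQLite database rather than the real metadata store. The schema is a simplified assumption: in particular, the `sequence_name` column on `druid_tasks` and the function names are hypothetical and chosen only for illustration, so treat this as an outline of the idea and not as Druid's implementation.

```python
import sqlite3

# Condition shared by both statements: no active task exists for the
# pending segment's sequence. Column names are assumptions for this sketch.
NO_ACTIVE_TASK = """
    NOT EXISTS (
        SELECT 1 FROM druid_tasks AS t
        WHERE t.sequence_name = druid_pendingSegments.sequence_name
          AND t.active = 1
    )
"""


def find_stale_pending_segments(conn):
    """Return ids of pending-segment rows whose sequence has no active task."""
    rows = conn.execute(
        "SELECT id FROM druid_pendingSegments WHERE" + NO_ACTIVE_TASK + "ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]


def purge_stale_pending_segments(conn):
    """Delete all stale rows in one transaction and return how many were removed."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "DELETE FROM druid_pendingSegments WHERE" + NO_ACTIVE_TASK
        )
    return cursor.rowcount


def demo_connection():
    """Build a small in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE druid_pendingSegments (id TEXT PRIMARY KEY, sequence_name TEXT NOT NULL);
        CREATE TABLE druid_tasks (id TEXT PRIMARY KEY, sequence_name TEXT NOT NULL, active INTEGER NOT NULL);
        INSERT INTO druid_pendingSegments VALUES
            ('seg1', 'seq_a'), ('seg2', 'seq_a'), ('seg3', 'seq_b'), ('seg4', 'seq_c');
        INSERT INTO druid_tasks VALUES
            ('task1', 'seq_a', 1),  -- still running: seg1 and seg2 must be kept
            ('task2', 'seq_b', 0);  -- finished: seg3 is stale; seq_c has no task at all
        """
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(find_stale_pending_segments(conn))
    print(purge_stale_pending_segments(conn))
```

Because the check runs over the whole table on each pass, a missed run only delays the cleanup and does not leave rows behind permanently. It also keeps rows for a paused ingestion, but only as long as its task is still marked active.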
Well, in our system we have lots of datasources, so the overlord creates lots of tasks. As time goes on, the table keeps growing, and that slows down the queries against MySQL. For now we have just added an index to the druid_pendingSegments table, and it makes the query a lot faster. So I think we can first simply add indexes to this table and then clean up the table automatically.
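As a rough illustration of the effect of such an index, the sketch below builds a toy table in SQLite and asks the engine for its query plan with and without the index. The comment does not say which column was indexed or which query was slow, so the `dataSource` column, the index name `idx_pending_datasource`, and the sample query are all assumptions, and SQLite stands in for MySQL here.

```python
import sqlite3


def build_demo_table(with_index):
    """Create a toy pending-segments table, optionally with an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE druid_pendingSegments ("
        "id TEXT PRIMARY KEY, dataSource TEXT NOT NULL, sequence_name TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO druid_pendingSegments VALUES (?, ?, ?)",
        [(f"seg{i}", f"ds{i % 50}", f"seq{i}") for i in range(1000)],
    )
    if with_index:
        # Hypothetical index; the column choice is an assumption.
        conn.execute(
            "CREATE INDEX idx_pending_datasource ON druid_pendingSegments (dataSource)"
        )
    return conn


def query_plan(conn):
    """Return SQLite's plan description for a lookup by datasource."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT id, sequence_name FROM druid_pendingSegments WHERE dataSource = ?",
        ("ds7",),
    ).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " ".join(row[3] for row in rows)


if __name__ == "__main__":
    print(query_plan(build_demo_table(with_index=False)))
    print(query_plan(build_demo_table(with_index=True)))
```

Without the index the plan is a full table scan; with it the plan names the index. An index only speeds up lookups, though; the table still grows until stale rows are removed.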
Addressed by #5149
The pendingSegments table should have entries purged when they're no longer required to determine sequences of segments.
See: https://groups.google.com/forum/#!topic/druid-user/O0yxORw92VM