Clean up stale entries from upgradeSegments table #15637
AmatyaAvadhanula merged 8 commits into apache:master.
kfaraz left a comment:
Using a task action to perform metadata cleanup makes perfect sense. I have left some feedback that should be addressed before this is merged.
There are a couple more concerns:
- Rolling upgrade: a task running on the upgraded version would fire the new action, but an Overlord still on the old version would not be able to recognize it.
- Should the new action be fired only for supervisor-type tasks? At least in the implementation of the task action, we should do some filtering and proceed to DELETE entries only for the relevant supervisor tasks with a REPLACE context. Otherwise, we would just be firing redundant DELETE statements against the metadata store.
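To make the filtering point concrete, here is a minimal Java sketch of a guard the task action could apply before touching the metadata store. It assumes the requested lock type is available as a string in the task context under the key `taskLockType`; the class and method names are hypothetical and not part of Druid's API.

```java
import java.util.Map;

// Hypothetical guard: decide whether a cleanup action should issue any DELETE at all.
final class UpgradeCleanupFilter
{
  // Assumed context key; Druid's actual constant may differ.
  static final String LOCK_TYPE_CONTEXT_KEY = "taskLockType";

  private UpgradeCleanupFilter()
  {
  }

  // Returns true only when the task context requests a REPLACE lock,
  // so tasks with any other (or no) lock type never reach the metadata store.
  static boolean shouldDeleteUpgradeEntries(Map<String, ?> taskContext)
  {
    if (taskContext == null) {
      return false;
    }
    Object lockType = taskContext.get(LOCK_TYPE_CONTEXT_KEY);
    return lockType != null && "REPLACE".equalsIgnoreCase(lockType.toString());
  }
}
```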
server/src/main/java/org/apache/druid/indexing/overlord/IndexerMetadataStorageCoordinator.java
   * @return Map from append Segment ID to REPLACE lock version
   */
  private Map<String, String> getAppendSegmentsCommittedDuringTask(
  @VisibleForTesting
Please don't do this. Try to find some other cleaner way to test this.
import com.fasterxml.jackson.core.type.TypeReference;
import org.apache.druid.indexing.common.task.Task;

public class CleanMetadataAction implements TaskAction<Void>
Please add a javadoc here.
I would prefer returning some kind of response object rather than Void. This response object could contain information such as whether the cleanup was successful and how many rows were deleted from which table (probably not needed in this PR, but let's return something meaningful anyway).
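One possible shape for such a response object, sketched in Java. `CleanupMetadataResult` and its accessors are hypothetical names chosen for illustration; the success flag and per-table row counts correspond to the information suggested above.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical result type for the cleanup action, returned instead of Void.
final class CleanupMetadataResult
{
  private final boolean successful;
  private final Map<String, Integer> deletedRowsByTable;

  CleanupMetadataResult(boolean successful, Map<String, Integer> deletedRowsByTable)
  {
    this.successful = successful;
    // Defensive, insertion-ordered copy so callers cannot mutate the result.
    this.deletedRowsByTable = Collections.unmodifiableMap(new LinkedHashMap<>(deletedRowsByTable));
  }

  boolean isSuccessful()
  {
    return successful;
  }

  Map<String, Integer> getDeletedRowsByTable()
  {
    return deletedRowsByTable;
  }

  // Total number of rows removed across all tables touched by the cleanup.
  int getTotalDeletedRows()
  {
    int total = 0;
    for (int count : deletedRowsByTable.values()) {
      total += count;
    }
    return total;
  }
}
```

Keying the counts by table name would let the same type report deletions from additional tables later without changing the action's signature.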
Also, rename:
public class CleanMetadataAction implements TaskAction<Void>
to:
public class CleanupMetadataAction implements TaskAction<Void>
// call it 3 times, once to update location in setup, then one for status and location in cleanup
Mockito.verify(taskActionClient, times(3)).submit(any());
// call it 3 times, once to update location in setup, then one for status and location in cleanup,
Suggested change:
// call it 3 times, once to update location in setup, then one for status and location in cleanup,
to:
// call it 4 times, once to update location in setup, then one for status and location in cleanup,
@kfaraz, thank you for the feedback.
Apologies, I had missed this. An alternative could be to invoke the cleanup from the TaskLockbox directly, when the task is removed from the set of active tasks, instead of using a new task action.
Is there any cleanup that is currently performed by the TaskLockbox?
Yes, the TaskLockbox is responsible for clearing the lock entries from the metadata store. The TaskQueue could also call a task action while removing a task from its set of active tasks. This happens on the Overlord itself, so there wouldn't be an issue with rolling upgrades.
@AmatyaAvadhanula, let's keep it simple. Since the new task action is responsible only for the cleanup of upgrade segments (something which doesn't happen in the current version anyway), let's just put the submission of the task action in a try-catch. If it succeeds, cool. If the Overlord (OL) cannot identify the task action and throws an exception, we just log the exception, saying that the OL is probably on an old version. This would ensure that the task itself doesn't fail if the OL is on an old version. Also, please make sure you test out this scenario.
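A minimal sketch of the best-effort submission described above. `CleanupSubmitter` is a hypothetical stand-in for the task action client, and `java.util.logging` is used only to keep the example self-contained; Druid uses its own logger.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for a task action client; Druid's TaskActionClient differs.
interface CleanupSubmitter
{
  void submitCleanup(String taskId) throws Exception;
}

final class BestEffortCleanup
{
  private static final Logger LOG = Logger.getLogger(BestEffortCleanup.class.getName());

  private BestEffortCleanup()
  {
  }

  // Submits the cleanup action but never lets a failure propagate, so the task
  // itself does not fail when the Overlord is still on an older version.
  static boolean tryCleanup(CleanupSubmitter submitter, String taskId)
  {
    try {
      submitter.submitCleanup(taskId);
      return true;
    }
    catch (Exception e) {
      LOG.log(
          Level.WARNING,
          "Could not clean up upgradeSegments entries for task[" + taskId
          + "]. The Overlord is probably running an older version.",
          e
      );
      return false;
    }
  }
}
```

Returning a boolean keeps the caller's control flow unchanged; the caller can ignore it or use it in tests that simulate an old Overlord.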
try {
  try {
    log.info("Removing task[%s] from activeTasks", task.getId());
    if (findLocksForTask(task).stream().anyMatch(lock -> lock.getType() == TaskLockType.REPLACE)) {
Should we use the findReplaceLocksForTask method instead? Note, though, that it would return only non-revoked locks.
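The trade-off raised here can be illustrated with a simplified lock model. `SimpleLock`, `LockType` and both helper methods are hypothetical; the second helper only mimics the behavior described for `findReplaceLocksForTask` (ignoring revoked locks).

```java
import java.util.List;

// Simplified, hypothetical lock model; Druid's TaskLock carries many more fields.
enum LockType { EXCLUSIVE, SHARED, APPEND, REPLACE }

final class SimpleLock
{
  final LockType type;
  final boolean revoked;

  SimpleLock(LockType type, boolean revoked)
  {
    this.type = type;
    this.revoked = revoked;
  }
}

final class ReplaceLockCheck
{
  private ReplaceLockCheck()
  {
  }

  // Mirrors the check in the diff: any REPLACE lock, revoked or not.
  static boolean holdsAnyReplaceLock(List<SimpleLock> locks)
  {
    return locks.stream().anyMatch(lock -> lock.type == LockType.REPLACE);
  }

  // Mimics a lookup that skips revoked locks: a task whose only REPLACE lock
  // was revoked would not be detected, so its upgradeSegments entries would remain.
  static boolean holdsActiveReplaceLock(List<SimpleLock> locks)
  {
    return locks.stream().anyMatch(lock -> lock.type == LockType.REPLACE && !lock.revoked);
  }
}
```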
kfaraz left a comment:
Minor comments; otherwise LGTM.
DataSegment retrieveSegmentForId(String id, boolean includeUnused);

/**
 * Clean entries in upgrade segments table after the corresponding replace task has ended
Suggested change:
 * Clean entries in upgrade segments table after the corresponding replace task has ended
to:
 * Delete entries from upgrade segments table after the corresponding replace task has ended
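As an illustration of the contract described by this javadoc, here is an in-memory stand-in for the upgradeSegments table. Everything in it, including the `task_id` column mentioned in the comment, is an assumption for the sketch rather than Druid's actual schema or method signature.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for the upgradeSegments table, used only to
// illustrate the delete contract. A SQL-backed version might issue something like
// DELETE FROM <upgradeSegments table> WHERE task_id = :task_id (column name assumed).
final class InMemoryUpgradeSegmentsTable
{
  static final class Row
  {
    final String taskId;     // ID of the task holding the REPLACE lock
    final String segmentId;  // pending segment allocated by an APPEND task meanwhile

    Row(String taskId, String segmentId)
    {
      this.taskId = taskId;
      this.segmentId = segmentId;
    }
  }

  private final List<Row> rows = new ArrayList<>();

  void insert(String replaceTaskId, String segmentId)
  {
    rows.add(new Row(replaceTaskId, segmentId));
  }

  // Deletes every entry recorded for the given REPLACE task and returns
  // how many rows were removed.
  int deleteUpgradeSegmentsForTask(String replaceTaskId)
  {
    int before = rows.size();
    rows.removeIf(row -> row.taskId.equals(replaceTaskId));
    return before - rows.size();
  }

  int size()
  {
    return rows.size();
  }
}
```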
@kfaraz, thank you for the review!
* Clean up stale entries from upgradeSegments table
The upgradeSegments table contains entries for pending segments allocated by tasks holding APPEND locks while a REPLACE lock was held over the same interval.
This PR cleans up these entries from the metadata store once the task holding the REPLACE lock completes, by invoking a cleanup method before the task's locks are released.
Since this is performed on the TaskLockbox directly, rolling upgrades wouldn't be affected, and the cleanup would happen automatically once the Overlord is upgraded. Also, the TaskLockbox is aware of the locks held by the task, which helps us avoid unnecessary calls to the metadata store for tasks that have no stale entries to clean.
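A simplified sketch of the flow described above: cleanup is attempted only for tasks that held a REPLACE lock, and always before the task's locks are released. `SketchLockbox` is hypothetical and far simpler than Druid's `TaskLockbox`; treating a failed cleanup as non-fatal is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical, heavily simplified lockbox that records the order of operations.
final class SketchLockbox
{
  private final Map<String, List<String>> lockTypesByTask;  // taskId -> lock types held
  private final Consumer<String> upgradeSegmentsCleaner;    // deletes entries for a task
  final List<String> events = new ArrayList<>();

  SketchLockbox(Map<String, List<String>> lockTypesByTask, Consumer<String> upgradeSegmentsCleaner)
  {
    this.lockTypesByTask = new HashMap<>(lockTypesByTask);
    this.upgradeSegmentsCleaner = upgradeSegmentsCleaner;
  }

  void remove(String taskId)
  {
    List<String> lockTypes = lockTypesByTask.getOrDefault(taskId, List.of());
    // Only tasks that held a REPLACE lock can have upgradeSegments entries,
    // so other tasks never trigger a metadata-store DELETE.
    if (lockTypes.contains("REPLACE")) {
      try {
        upgradeSegmentsCleaner.accept(taskId);
        events.add("cleanup:" + taskId);
      }
      catch (RuntimeException e) {
        // Assumption: a failed cleanup must not prevent the locks from being released.
        events.add("cleanup-failed:" + taskId);
      }
    }
    // Locks are released only after the cleanup has been attempted.
    lockTypesByTask.remove(taskId);
    events.add("unlock:" + taskId);
  }
}
```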
This PR has: