Allow kill task to mark segments as unused #11501
docs/ingestion/data-management.md (outdated)

```json
{
  "type": "kill",
  "id": <task_id>,
  "dataSource": <task_datasource>,
  "interval" : <all_segments_in_this_interval_will_die!>,
  "markAsUnused": <true|false>,
  "context": <task context>
}
```

It seems worth mentioning what the default is.

If `markAsUnused` is true, the kill task will first mark any segments within the specified interval as unused, before deleting the unused segments within the interval.
Perhaps we could make this scarier, because it cannot be undone once they delete segments with `markAsUnused` set?
I added a WARNING section to the kill task
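
The documented `markAsUnused` behavior (mark first, then delete whatever is unused in the interval) can be illustrated with a toy in-memory model. This is only a sketch: `KillTaskModel`, `Segment`, and `kill` are hypothetical names invented here, not Druid classes, and plain `long` timestamps stand in for real intervals.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the documented semantics; NOT Druid's implementation.
class KillTaskModel {
    // A segment covering the half-open interval [start, end), plus its "used" flag.
    static final class Segment {
        final long start;
        final long end;
        boolean used;

        Segment(long start, long end, boolean used) {
            this.start = start;
            this.end = end;
            this.used = used;
        }

        // True when the segment lies entirely inside [from, to).
        boolean isWithin(long from, long to) {
            return start >= from && end <= to;
        }
    }

    // Runs the modeled kill task and returns how many segments were deleted.
    static int kill(List<Segment> store, long from, long to, boolean markAsUnused) {
        if (markAsUnused) {
            // Step 1 (optional): flip every segment in the interval to unused.
            for (Segment s : store) {
                if (s.isWithin(from, to)) {
                    s.used = false;
                }
            }
        }
        // Step 2: permanently delete the unused segments in the interval.
        int before = store.size();
        store.removeIf(s -> !s.used && s.isWithin(from, to));
        return before - store.size();
    }
}
```

In this model, calling `kill` with `markAsUnused` set to `false` leaves used segments alone, while `true` removes every segment in the interval, which is why the reviewers asked for a warning: the deletion step cannot be undone.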
.createStatement(
    StringUtils.format(
        "UPDATE %s SET used=false WHERE dataSource = :dataSource "
        + "AND start >= :start AND %2$send%2$s <= :end",
Hmm, should the end be exclusive? I see `IndexerSQLMetadataStorageCoordinator.retrieveUnusedSegmentsForInterval()` uses the same filter of `"end" <= :end`, which seems to break the contract of `IndexerMetadataStorageCoordinator.retrieveUnusedSegmentsForInterval()` because the end time is exclusive for segment intervals. Maybe this hasn't caused much trouble so far because `retrieveUnusedSegmentsForInterval` returns only unused segments, even though it seems like a bug. But this method will unset the used flag and thus probably should respect the exclusivity of the end time?
Hm, I used the same query as run by the `markUnused` endpoint on the coordinator:
https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/metadata/SqlSegmentsMetadataManager.java#L836
Sorry NVM. I was confused.
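
The thread does not spell out why the concern was withdrawn. One way to check the filter is to compare it against a pointwise definition of containment for half-open intervals. The sketch below assumes integer timestamps and non-empty segments; `IntervalContainment` and its methods are hypothetical helpers, not part of Druid.

```java
// Compares the SQL-style filter with a brute-force containment check.
class IntervalContainment {
    // Mirrors the SQL predicate: start >= :start AND end <= :end.
    static boolean sqlFilterMatches(long segStart, long segEnd, long qStart, long qEnd) {
        return segStart >= qStart && segEnd <= qEnd;
    }

    // Reference definition: every instant t in [segStart, segEnd)
    // must also lie in [qStart, qEnd). Both ends are exclusive.
    static boolean containedPointwise(long segStart, long segEnd, long qStart, long qEnd) {
        for (long t = segStart; t < segEnd; t++) {
            if (t < qStart || t >= qEnd) {
                return false;
            }
        }
        return true;
    }
}
```

Under these assumptions the two checks agree for every non-empty segment: a segment whose exclusive end equals the query's exclusive end is still fully contained, so `<=` on the end column does not pull in any instant outside the requested interval.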
Co-authored-by: Jihoon Son <jihoonson@apache.org>
...-service/src/test/java/org/apache/druid/indexing/common/task/KillUnusedSegmentsTaskTest.java (resolved)
server/src/main/java/org/apache/druid/client/indexing/HttpIndexingServiceClient.java (resolved)
Looks good, just had a few comments. Also, is it doable to add an integration test for this? Can we piggyback on an existing integration test?
LGTM. Thanks @jon-wei
LGTM 🚢
This PR adds a new `markAsUnused` option for the Kill Task, which allows it to mark any segments within the specified interval as unused before deleting the unused segments within that interval. This is useful for allowing the mark unused -> delete sequence to happen with a single API call for the caller, as well as allowing the unmark action to occur under a task interval lock.
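
As a rough illustration of the "single API call" mentioned above, the sketch below builds (without sending) a task submission to the Overlord's `POST /druid/indexer/v1/task` endpoint. The class name, task id, datasource, interval, and base URL are made-up examples; the payload omits `context`, and the string formatting assumes values that need no JSON escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrative only: constructs a kill task submission request.
class KillTaskRequest {
    // Renders a kill task spec with the fields from the docs excerpt.
    // Assumes the arguments contain no characters that need JSON escaping.
    static String payload(String id, String dataSource, String interval, boolean markAsUnused) {
        return String.format(
            "{\"type\":\"kill\",\"id\":\"%s\",\"dataSource\":\"%s\",\"interval\":\"%s\",\"markAsUnused\":%b}",
            id, dataSource, interval, markAsUnused);
    }

    // Builds the POST request; the caller decides when (and whether) to send it.
    static HttpRequest build(String overlordBaseUrl, String json) {
        return HttpRequest.newBuilder(URI.create(overlordBaseUrl + "/druid/indexer/v1/task"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(json))
            .build();
    }
}
```

A caller could then pass the result of `KillTaskRequest.build("http://localhost:8090", json)` to `HttpClient.newHttpClient().send(...)`. The host and port here are assumptions about a local setup, and because the deletion is permanent, the request should only be sent against a datasource and interval that are meant to be destroyed.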
This PR has: