Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow kill task to mark segments as unused #11501

Merged
merged 4 commits into from
Jul 29, 2021
Merged

Conversation

jon-wei
Copy link
Contributor

@jon-wei jon-wei commented Jul 27, 2021

This PR adds a new markAsUnused option for the Kill Task, which allows it to mark any segments within the specified interval as unused before deleting the unused segments within an interval.

This is useful for allowing the mark unused -> delete sequence to happen with a single API call for the caller, as well as allowing the unmark action to occur under a task interval lock.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

docs/ingestion/data-management.md Outdated Show resolved Hide resolved

```json
{
"type": "kill",
"id": <task_id>,
"dataSource": <task_datasource>,
"interval" : <all_segments_in_this_interval_will_die!>,
"markAsUnused": <true|false>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems worth to mention what the default is.

"context": <task context>
}
```

If `markAsUnused` is true, the kill task will first mark any segments within the specified interval as unused, before deleting the unused segments within the interval.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could make this more scary because it cannot be undone once they delete segments with markAsUnused set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a WARNING section to the kill task

.createStatement(
StringUtils.format(
"UPDATE %s SET used=false WHERE dataSource = :dataSource "
+ "AND start >= :start AND %2$send%2$s <= :end",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm should the end be exclusive? I see IndexerSQLMetadataStorageCoordinator.retrieveUnusedSegmentsForInterval() uses the same filter of "end" <= :end, which seems to break the contract of IndexerMetadataStorageCoordinator.retrieveUnusedSegmentsForInterval() because the end time is exclusive for segment intervals. Maybe this hasn't caused much troubles so far because retrieveUnusedSegmentsForInterval returns only unused segments, even though it seems like a bug. But this method will unset the used flag and thus probably should respect the exclusivity of the end time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry NVM. I was confused.

jon-wei and others added 2 commits July 27, 2021 16:02
@zachjsh
Copy link
Contributor

zachjsh commented Jul 27, 2021

Looks good, just had a few comments. Also is it doable to add integration test for this? Can we piggy back on existiing integration test?

Copy link
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @jon-wei

Copy link
Contributor

@zachjsh zachjsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚢

@jon-wei jon-wei merged commit 9b250c5 into apache:master Jul 29, 2021
@clintropolis clintropolis added this to the 0.22.0 milestone Aug 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants