Support Kafka supervisor adopting running tasks between versions #6958

Closed
dclim opened this issue Jan 30, 2019 · 6 comments · Fixed by #7212

dclim (Contributor) commented Jan 30, 2019

Motivation

A Kafka supervisor expects to 'own' all index_kafka type tasks running on its configured datasource. When the supervisor starts up, it reads the list of active tasks from TaskStorage and, for each running Kafka task on that datasource, decides whether the task is 'adoptable' or has to be terminated because it does not match the supervisor's configuration. A task is adoptable by a supervisor if it was created with the same strategy for assigning Kafka partitions, has a set of starting offsets that matches the state described in the metadata store, and has the same data schema and tuning config. All of this information is represented in the sequenceName hash that is calculated on task creation and stored as part of the task metadata.
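
For illustration, here is a minimal sketch of how such a sequenceName hash can be derived. This is an approximation, not the actual KafkaSupervisor code; the method shape, parameter names, and the use of Guava hashing are assumptions.

```java
import com.google.common.base.Joiner;
import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

class SequenceNameSketch
{
  // Illustrative only: hash the starting offsets together with the JSON-serialized
  // data schema and tuning config, then embed a short hash in the sequence name.
  static String generateSequenceName(
      String dataSource,
      Map<Integer, Long> startingOffsets,  // Kafka partition -> starting offset
      String dataSchemaJson,               // serialized DataSchema
      String tuningConfigJson              // serialized KafkaTuningConfig
  )
  {
    // Sort the offsets so the hash does not depend on map iteration order.
    String offsetsStr = new TreeMap<>(startingOffsets).toString();
    String hash = Hashing.sha1()
        .hashString(offsetsStr + dataSchemaJson + tuningConfigJson, StandardCharsets.UTF_8)
        .toString()
        .substring(0, 15);
    return Joiner.on("_").join("index_kafka", dataSource, hash);
  }
}
```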

Currently, when checking for adoptability, the supervisor generates a hash based on its expected starting offsets and its data schema / tuning configuration, and then compares the hash to the one generated when the task was created. If they match, the supervisor adopts and starts tracking the lifecycle of the task; otherwise the task is killed and a new one is created from the supervisor's configuration.

If the definition of DataSchema or KafkaTuningConfig changes between Druid versions (for example, a new field is added), the calculated hash will differ and the task will be rejected, even if the ingestion specs are 'equivalent'. This is because the logic compares the stored hash, which was generated with the old ingestion schema definition, against one generated with the new definition, which may be backward-compatible but still produce a different hash. As a result, a rolling update of the Overlord may cause indexing tasks to terminate unnecessarily.
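
To make the failure mode concrete, here is a hypothetical example (the field names and JSON strings are invented for illustration): a backward-compatible change that adds a field with a default value still changes the serialized form, and therefore the hash.

```java
import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;

class HashDriftExample
{
  public static void main(String[] args)
  {
    // Hypothetical serializations of an *equivalent* tuning config, produced by the
    // class definition before and after an upgrade that adds a field with a default.
    String oldJson = "{\"type\":\"kafka\",\"maxRowsInMemory\":75000}";
    String newJson = "{\"type\":\"kafka\",\"maxRowsInMemory\":75000,\"logParseExceptions\":false}";

    // The ingestion behavior is identical, but the hashes (and so any sequenceName
    // computed from them) are not, so the stored hash no longer matches.
    System.out.println(Hashing.sha1().hashString(oldJson, StandardCharsets.UTF_8));
    System.out.println(Hashing.sha1().hashString(newJson, StandardCharsets.UTF_8));
  }
}
```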

Proposed Changes

Instead of the supervisor computing the sequenceName hash for its own configuration and comparing this to the hash the task was created with, the supervisor should re-compute the task's hash from the task's own data schema and tuning config. That way, changes to the data schema and tuning config class definitions are reflected in both sides of the comparison, so equivalent specs produce matching hashes and a proper comparison can be made.
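
Building on the generateSequenceName sketch above, a hedged sketch of the proposed check follows; the class, parameters, and helper are hypothetical, not Druid's actual API.

```java
import java.util.Map;

class AdoptabilityCheckSketch
{
  // Today, the supervisor compares its own computed sequence name against the name
  // stored when the task was created (produced by older class definitions). The
  // proposal instead recomputes the task's hash with the *current* classes.
  static boolean isAdoptable(
      String dataSource,
      Map<Integer, Long> taskStartingOffsets,
      String taskDataSchemaJson,        // task's own spec, re-serialized by the current classes
      String taskTuningConfigJson,
      String supervisorDataSchemaJson,  // supervisor's current spec
      String supervisorTuningConfigJson
  )
  {
    // Recompute the task's sequence name from the task's own data schema and tuning config.
    String recomputedTaskName = SequenceNameSketch.generateSequenceName(
        dataSource, taskStartingOffsets, taskDataSchemaJson, taskTuningConfigJson);

    // Compute the name the supervisor expects for its own configuration.
    String supervisorName = SequenceNameSketch.generateSequenceName(
        dataSource, taskStartingOffsets, supervisorDataSchemaJson, supervisorTuningConfigJson);

    // Both names are now produced by the same class definitions, so equivalent
    // specs compare equal even across a version upgrade.
    return supervisorName.equals(recomputedTaskName);
  }
}
```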

Changed Interfaces

None

Migration

None

Alternatives

An alternative implementation might be to exclude the data schema and tuning config from the hash altogether, meaning that a supervisor would adopt an existing running task without consideration of the ingestion parameters, and would just let it run to completion. Once that task completes, the supervisor would then spawn subsequent tasks that match the supervisor's new configuration regardless of how the previous task was configured.

I think the complexity here comes in when you consider replica tasks and what a supervisor should do if there are replicas and a task fails. Since replicas are expected to do the same thing and generate identical segments independently, the supervisor would need to spawn a replacement task that has a configuration that matches the old configuration being used by the replica rather than using its own configuration.

There may be other complications in not including the ingestion spec as part of the sequenceName hash. Ultimately, the proposed change feels more straightforward and accomplishes the objective of supporting task adoption across versions.

dclim added the Proposal label Jan 30, 2019
justinborromeo (Contributor) commented

I'm working on this issue

pdeva (Contributor) commented Mar 11, 2019

Is this the cause of #6854?

gianm (Contributor) commented Mar 11, 2019

It sounds like it probably is.

pdeva (Contributor) commented Mar 11, 2019

The documentation for rolling updates should then at least note that there can be several minutes of query downtime when doing a rolling update of the Overlord. If any configuration value can be adjusted to make this downtime shorter, it should be listed in that documentation as well.

gianm (Contributor) commented Mar 11, 2019

It looks like there is intent to fix this before the next release anyway.

pdeva (Contributor) commented Mar 11, 2019

Correct me if I am wrong, but this is scheduled for the 0.15 release, right?
Shouldn't we at least update the 0.14 release, where this issue will still exist?

pdeva added a commit to pdeva/druid that referenced this issue Mar 12, 2019
Upgrading Overlord nodes in 0.14.0 can result in a few minutes of query downtime for KIS tasks, as noticed in apache#6854. This behavior should be documented until it is fixed by apache#6958 in 0.15.
jihoonson added this to the 0.14.1 milestone May 31, 2019