KIS error when upgrading from 0.13 to 0.14 #7470

Closed
pdeva opened this issue Apr 13, 2019 · 3 comments · Fixed by #7512

@pdeva
Contributor

pdeva commented Apr 13, 2019


Affected Version

0.14

Description

2019-04-13 23:49:30,496 WARN o.a.d.i.c.a.RemoteTaskActionClient [task-runner-0-priority-0] Exception submitting action for task[index_kafka_infraserver-minute_b570a1d94fb7c21_dlmpiogd]
org.apache.druid.java.util.common.IOE: Error with status[400 Bad Request] and message[{"error":"Instantiation of [simple type, class org.apache.druid.indexing.kafka.KafkaPartitions] value failed: null (through reference chain: org.apache.druid.indexing.common.actions.TaskActionHolder[\"action\"]->org.apache.druid.indexing.common.actions.CheckPointDataSourceMetadataAction[\"previousCheckPoint\"]->org.apache.druid.indexing.kafka.KafkaDataSourceMetadata[\"partitions\"])"}]. Check overlord logs for details.
	at org.apache.druid.indexing.common.actions.RemoteTaskActionClient.submit(RemoteTaskActionClient.java:95) [druid-indexing-service-0.14.0-incubating.jar:0.14.0-incubating]
	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:695) [druid-indexing-service-0.14.0-incubating.jar:0.14.0-incubating]
	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:246) [druid-indexing-service-0.14.0-incubating.jar:0.14.0-incubating]
	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.run(SeekableStreamIndexTask.java:166) [druid-indexing-service-0.14.0-incubating.jar:0.14.0-incubating]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:419) [druid-indexing-service-0.14.0-incubating.jar:0.14.0-incubating]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:391) [druid-indexing-service-0.14.0-incubating.jar:0.14.0-incubating]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_191]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
@pdeva
Contributor Author

pdeva commented Apr 13, 2019

Seeing this NPE on the coordinator+overlord node:

2019-04-13 23:54:21,014 INFO o.a.d.i.c.a.LocalTaskActionClient [qtp248705782-98] Performing action for task[index_kafka_infraserver-minute_b570a1d94fb7c21_gjkmfbfd]: CheckPointDataSourceMetadataAction{supervisorId='infraserver-minute', baseSequenceName='index_kafka_infraserver-minute_b570a1d94fb7c21', taskGroupId='0', previousCheckPoint=KafkaDataSourceMetadata{SeekableStreamStartSequenceNumbers=SeekableStreamStartSequenceNumbers{stream='infraserver', partitionSequenceNumberMap={0=20249923360}, exclusivePartitions=[]}}, currentCheckPoint=KafkaDataSourceMetadata{SeekableStreamStartSequenceNumbers=SeekableStreamEndSequenceNumbers{stream='infraserver', partitionSequenceNumberMap={0=20254957211}}}}
2019-04-13 23:54:21,015 INFO o.a.d.i.s.s.SeekableStreamSupervisor [qtp248705782-98] Checkpointing [KafkaDataSourceMetadata{SeekableStreamStartSequenceNumbers=SeekableStreamEndSequenceNumbers{stream='infraserver', partitionSequenceNumberMap={0=20254957211}}}] for taskGroup [0]
2019-04-13 23:54:21,028 ERROR o.a.d.i.s.s.SeekableStreamSupervisor [KafkaSupervisor-infraserver-minute] SeekableStreamSupervisor[infraserver-minute] failed to handle notice: {class=org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor, exceptionType=class java.lang.NullPointerException, exceptionMessage=null, noticeClass=CheckpointNotice}
java.lang.NullPointerException
        at org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor$CheckpointNotice.handle(SeekableStreamSupervisor.java:380) ~[druid-indexing-service-0.14.0-incubating.jar:0.14.0-incubating]
        at org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor.lambda$tryInit$3(SeekableStreamSupervisor.java:724) ~[druid-indexing-service-0.14.0-incubating.jar:0.14.0-incubating]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]

@gianm
Contributor

gianm commented Apr 19, 2019

This looks like a rolling-update issue between 0.12/13 and 0.14: the checkpoint is sent from task -> supervisor with a new-form startPartitions that the older supervisor cannot understand. Things should work fine once everything is on 0.14, but we still need to fix this so rolling updates work as expected.

@gianm
Contributor

gianm commented Apr 19, 2019

By the way, I think this rolling update will work if you update the Overlord first, before any MiddleManagers. We do generally want updates to work with the Overlord either early or late (see http://druid.io/docs/latest/operations/rolling-updates.html). So please try that Overlord-first method as a workaround if you can.

gianm added a commit to implydata/druid-public that referenced this issue Apr 19, 2019
This allows them to be deserialized by older Druid versions as
KafkaPartitions objects.

Fixes apache#7470.
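
The compatibility idea in that commit message can be illustrated with a minimal sketch. This is not Druid's code: JSON is modeled as a plain `Map`, the class `LegacyCompatSketch` and its methods are hypothetical, and the legacy key names `topic` and `partitionOffsetMap` are assumptions about what an older reader expects. Only `stream` and `partitionSequenceNumberMap` are taken from the log output above. The sketch shows a reader that knows only the legacy keys rejecting a new-form payload, and accepting it once the writer also emits the values under the legacy keys.

```java
import java.util.HashMap;
import java.util.Map;

public class LegacyCompatSketch {

    // Hypothetical stand-in for an older reader that only understands the
    // (assumed) legacy keys. Throws if either legacy key is missing.
    public static String readWithLegacyReader(Map<String, Object> json) {
        Object topic = json.get("topic");
        Object offsets = json.get("partitionOffsetMap");
        if (topic == null || offsets == null) {
            throw new IllegalArgumentException(
                "legacy reader: missing topic or partitionOffsetMap");
        }
        return topic + " " + offsets;
    }

    // Writer that emits only the new-form keys seen in the logs.
    public static Map<String, Object> writeNewFormOnly(
            String stream, Map<Integer, Long> sequenceNumbers) {
        Map<String, Object> json = new HashMap<>();
        json.put("stream", stream);
        json.put("partitionSequenceNumberMap", sequenceNumbers);
        return json;
    }

    // Writer that additionally duplicates the values under the legacy keys,
    // so a reader that only knows those keys can still parse the payload.
    public static Map<String, Object> writeWithLegacyFields(
            String stream, Map<Integer, Long> sequenceNumbers) {
        Map<String, Object> json = writeNewFormOnly(stream, sequenceNumbers);
        json.put("topic", stream);
        json.put("partitionOffsetMap", sequenceNumbers);
        return json;
    }

    public static void main(String[] args) {
        Map<Integer, Long> offsets = new HashMap<>();
        offsets.put(0, 20249923360L);

        try {
            readWithLegacyReader(writeNewFormOnly("infraserver", offsets));
        } catch (IllegalArgumentException e) {
            // The legacy reader cannot interpret the new-form payload.
            System.out.println("new-form only: " + e.getMessage());
        }

        // With the legacy keys present, the same reader succeeds.
        System.out.println("with legacy fields: "
            + readWithLegacyReader(writeWithLegacyFields("infraserver", offsets)));
    }
}
```

The trade-off in this approach is a slightly larger payload in exchange for letting both old and new readers parse the same message during a rolling update.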
@fjy fjy closed this as completed in #7512 Apr 19, 2019
fjy pushed a commit that referenced this issue Apr 19, 2019
…s. (#7512)

This allows them to be deserialized by older Druid versions as
KafkaPartitions objects.

Fixes #7470.
gianm added a commit to implydata/druid-public that referenced this issue Apr 20, 2019
clintropolis pushed a commit that referenced this issue Apr 24, 2019