-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Closed
Labels
Description
Very frequently, druid tasks fail on Azure Storage with com.microsoft.azure.storage.StorageException: The specified block list is invalid.
Does anybody know what could be the issue or how to solve this? Full stack trace below:
2018-11-03T01:05:05,774 ERROR [native-content-events-2018-11-02T00:00:00.000Z-persist-n-merge] io.druid.segment.realtime.plumber.RealtimePlumber - Failed to persist merged index[native-content-events]: {class=io.druid.segment.realtime.plumber.RealtimePlumber, exceptionType=class java.lang.RuntimeException, exceptionMessage=java.io.IOException, interval=2018-11-02T00:00:00.000Z/2018-11-03T00:00:00.000Z}
java.lang.RuntimeException: java.io.IOException
at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
at io.druid.storage.azure.AzureDataSegmentPusher.push(AzureDataSegmentPusher.java:162) ~[?:?]
at io.druid.segment.realtime.plumber.RealtimePlumber$2.doRun(RealtimePlumber.java:447) [druid-server-0.12.3.jar:0.12.3]
at io.druid.common.guava.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:42) [druid-common-0.12.3.jar:0.12.3]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
Caused by: java.io.IOException
at com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:639) ~[?:?]
at com.microsoft.azure.storage.blob.BlobOutputStream.close(BlobOutputStream.java:280) ~[?:?]
at com.microsoft.azure.storage.blob.CloudBlockBlob.upload(CloudBlockBlob.java:580) ~[?:?]
at com.microsoft.azure.storage.blob.CloudBlockBlob.upload(CloudBlockBlob.java:497) ~[?:?]
at io.druid.storage.azure.AzureStorage.uploadBlob(AzureStorage.java:86) ~[?:?]
at io.druid.storage.azure.AzureDataSegmentPusher.uploadDataSegment(AzureDataSegmentPusher.java:115) ~[?:?]
at io.druid.storage.azure.AzureDataSegmentPusher$1.call(AzureDataSegmentPusher.java:155) ~[?:?]
at io.druid.storage.azure.AzureDataSegmentPusher$1.call(AzureDataSegmentPusher.java:151) ~[?:?]
at io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:63) ~[java-util-0.12.3.jar:0.12.3]
at io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:81) ~[java-util-0.12.3.jar:0.12.3]
at io.druid.storage.azure.AzureUtils.retryAzureOperation(AzureUtils.java:58) ~[?:?]
at io.druid.storage.azure.AzureDataSegmentPusher.push(AzureDataSegmentPusher.java:149) ~[?:?]
... 5 more
Caused by: com.microsoft.azure.storage.StorageException: The specified block list is invalid.
at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:89) ~[?:?]
at com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:307) ~[?:?]
at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:182) ~[?:?]
at com.microsoft.azure.storage.blob.CloudBlockBlob.commitBlockList(CloudBlockBlob.java:245) ~[?:?]
at com.microsoft.azure.storage.blob.BlobOutputStream.commit(BlobOutputStream.java:313) ~[?:?]
at com.microsoft.azure.storage.blob.BlobOutputStream.close(BlobOutputStream.java:277) ~[?:?]
at com.microsoft.azure.storage.blob.CloudBlockBlob.upload(CloudBlockBlob.java:580) ~[?:?]
at com.microsoft.azure.storage.blob.CloudBlockBlob.upload(CloudBlockBlob.java:497) ~[?:?]
at io.druid.storage.azure.AzureStorage.uploadBlob(AzureStorage.java:86) ~[?:?]
at io.druid.storage.azure.AzureDataSegmentPusher.uploadDataSegment(AzureDataSegmentPusher.java:115) ~[?:?]
at io.druid.storage.azure.AzureDataSegmentPusher$1.call(AzureDataSegmentPusher.java:155) ~[?:?]
at io.druid.storage.azure.AzureDataSegmentPusher$1.call(AzureDataSegmentPusher.java:151) ~[?:?]
at io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:63) ~[java-util-0.12.3.jar:0.12.3]
at io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:81) ~[java-util-0.12.3.jar:0.12.3]
at io.druid.storage.azure.AzureUtils.retryAzureOperation(AzureUtils.java:58) ~[?:?]
at io.druid.storage.azure.AzureDataSegmentPusher.push(AzureDataSegmentPusher.java:149) ~[?:?]
... 5 more
2018-11-03T01:05:05,844 INFO [native-content-events-2018-11-02T00:00:00.000Z-persist-n-merge] io.druid.server.coordination.BatchDataSegmentAnnouncer - Unannouncing segment[native-content-events_2018-11-02T00:00:00.000Z_2018-11-03T00:00:00.000Z_2018-11-02T00:00:18.801Z] at path[/druid/segments/10.244.9.75:8101/10.244.9.75:8101_realtime__default_tier_2018-11-02T00:00:18.927Z_5ce6a9ad766940469addc11518f593bf0]
2018-11-03T01:05:05,844 INFO [native-content-events-2018-11-02T00:00:00.000Z-persist-n-merge] io.druid.curator.announcement.Announcer - unannouncing [/druid/segments/10.244.9.75:8101/10.244.9.75:8101_realtime__default_tier_2018-11-02T00:00:18.927Z_5ce6a9ad766940469addc11518f593bf0]
2018-11-03T01:05:05,858 INFO [native-content-events-2018-11-02T00:00:00.000Z-persist-n-merge] io.druid.indexing.common.actions.RemoteTaskActionClient - Performing action for task[index_realtime_native-content-events_2018-11-02T00:00:00.000Z_0_0_icgaffmf]: LockReleaseAction{interval=2018-11-02T00:00:00.000Z/2018-11-03T00:00:00.000Z}
2018-11-03T01:05:05,859 INFO [native-content-events-2018-11-02T00:00:00.000Z-persist-n-merge] io.druid.indexing.common.actions.RemoteTaskActionClient - Submitting action for task[index_realtime_native-content-events_2018-11-02T00:00:00.000Z_0_0_icgaffmf] to overlord: [LockReleaseAction{interval=2018-11-02T00:00:00.000Z/2018-11-03T00:00:00.000Z}].
2018-11-03T01:05:05,860 INFO [native-content-events-2018-11-02T00:00:00.000Z-persist-n-merge] io.druid.java.util.http.client.pool.ChannelResourceFactory - Generating: http://10.244.2.87:8080
2018-11-03T01:05:05,881 INFO [native-content-events-2018-11-02T00:00:00.000Z-persist-n-merge] io.druid.segment.realtime.plumber.RealtimePlumber - Deleting Index File[var/druid/task/index_realtime_native-content-events_2018-11-02T00:00:00.000Z_0_0_icgaffmf/work/persist/native-content-events/2018-11-02T00:00:00.000Z_2018-11-03T00:00:00.000Z]
2018-11-03T01:05:05,904 INFO [native-content-events-2018-11-02T00:00:00.000Z-persist-n-merge] io.druid.segment.realtime.plumber.RealtimePlumber - Removing sinkKey 1541116800000 for segment native-content-events_2018-11-02T00:00:00.000Z_2018-11-03T00:00:00.000Z_2018-11-02T00:00:18.801Z
2018-11-03T01:05:05,911 ERROR [task-runner-0-priority-0] io.druid.indexing.common.task.RealtimeIndexTask - Failed to finish realtime task: {class=io.druid.indexing.common.task.RealtimeIndexTask, exceptionType=class io.druid.java.util.common.ISE, exceptionMessage=Exception occurred during persist and merge.}
io.druid.java.util.common.ISE: Exception occurred during persist and merge.
at io.druid.segment.realtime.plumber.RealtimePlumber.finishJob(RealtimePlumber.java:557) ~[druid-server-0.12.3.jar:0.12.3]
at io.druid.indexing.common.task.RealtimeIndexTask.run(RealtimeIndexTask.java:458) [druid-indexing-service-0.12.3.jar:0.12.3]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:444) [druid-indexing-service-0.12.3.jar:0.12.3]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:416) [druid-indexing-service-0.12.3.jar:0.12.3]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_111]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
2018-11-03T01:05:05,912 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[AbstractTask{id='index_realtime_native-content-events_2018-11-02T00:00:00.000Z_0_0_icgaffmf', groupId='index_realtime_native-content-events', taskResource=TaskResource{availabilityGroup='native-content-events-2018-11-02T00:00:00.000Z-0000', requiredCapacity=1}, dataSource='native-content-events', context={}}]
io.druid.java.util.common.ISE: Exception occurred during persist and merge.
at io.druid.segment.realtime.plumber.RealtimePlumber.finishJob(RealtimePlumber.java:557) ~[druid-server-0.12.3.jar:0.12.3]
at io.druid.indexing.common.task.RealtimeIndexTask.run(RealtimeIndexTask.java:458) ~[druid-indexing-service-0.12.3.jar:0.12.3]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:444) [druid-indexing-service-0.12.3.jar:0.12.3]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:416) [druid-indexing-service-0.12.3.jar:0.12.3]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_111]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
2018-11-03T01:05:05,912 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_realtime_native-content-events_2018-11-02T00:00:00.000Z_0_0_icgaffmf] status changed to [FAILED].
2018-11-03T01:05:05,913 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
"id" : "index_realtime_native-content-events_2018-11-02T00:00:00.000Z_0_0_icgaffmf",
"status" : "FAILED",
"duration" : 90289188
}
Reactions are currently unavailable