Index_parallel task fails because of error in opening zip file (running on indexers) #11478
Comments
I am running into this issue as well - posted in the druid forum here https://www.druidforum.org/t/error-in-opening-zip-file-during-ingestion/7429
Hello,
Interesting. So your indexer nodes were getting pre-empted, and that is what was causing this? So the zip file error is just a misleading message for some other underlying issue.
Yep, I guess so.
I am not - and this also reproduces consistently for me. It must be something like an OOM.
This issue has been marked as stale due to 280 days of inactivity.
This issue has been closed due to lack of activity.
Hi everyone,
Following a post on ASF Slack, I am opening a new issue here on the advice of someone from Imply.
We are running compaction tasks through indexers that randomly fail on phase 3 (partial_index_generic_merge) with the following error message (more details below): "error in opening zip file"
The reply we got on Slack:
As to the specific error, I'm not sure if it's exactly the same as what's going on in #9993, but that issue does point out an important thing, which is that if the shuffle server returns an error, the shuffle client will not actually log out that error, but it will just log this sort of obtuse zip decompression error. (Because it's trying to unzip the error message.) This isn't good error behavior, so we should adjust that to log the actual server error instead of trying to unzip the error message. Which is silly!
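To illustrate the fix the Slack reply suggests: instead of piping the shuffle server's response straight into the zip decompressor, the client could check the response status first and surface the server's error text directly. This is a minimal hypothetical sketch (the method name `checkedBody` and the exact signature are assumptions, not Druid's actual API):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ShuffleFetchSketch
{
  /**
   * Hypothetical guard: return the body stream only when the server reported
   * success; otherwise read the (small) error payload as text and throw it,
   * so the caller never tries to unzip an error message.
   */
  static InputStream checkedBody(int statusCode, InputStream body) throws IOException
  {
    if (statusCode != 200) {
      String error = new String(body.readAllBytes(), StandardCharsets.UTF_8);
      throw new IOException("Shuffle server returned HTTP " + statusCode + ": " + error);
    }
    return body;
  }

  public static void main(String[] args) throws IOException
  {
    InputStream ok = checkedBody(
        200,
        new ByteArrayInputStream("zip-bytes".getBytes(StandardCharsets.UTF_8))
    );
    // prints zip-bytes
    System.out.println(new String(ok.readAllBytes(), StandardCharsets.UTF_8));

    try {
      checkedBody(404, new ByteArrayInputStream("segment not found".getBytes(StandardCharsets.UTF_8)));
    } catch (IOException e) {
      // prints Shuffle server returned HTTP 404: segment not found
      System.out.println(e.getMessage());
    }
  }
}
```

With a guard like this, an HTTP error from the shuffle server would show up in the task log as the server's actual message rather than as a `ZipException`.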
This seems to be an indexer bug. Could you please create a bug report in the Druid GitHub project with all the details.
Affected Version
0.21.0
Description
Cluster size
1 master (coordinator/overlord)
2 routers/brokers
~10 historicals
~20 indexers (dedicated to these tasks) + ~5 indexers for realtime ingestion (kafka)
~30TB data
Configurations in use
Spec object we are using:

```json
{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "druid",
        "dataSource": "events",
        "interval": "2021-07-13T00:00:00/2021-07-14T00:00:00"
      }
    },
    "tuningConfig": {
      "type": "index_parallel",
      "partitionsSpec": {
        "type": "hashed",
        "maxRowsPerSegment": 800000
      },
      "forceGuaranteedRollup": true,
      "maxNumConcurrentSubTasks": 40,
      "totalNumMergeTasks": 20,
      "maxRetry": 10,
      "maxPendingPersists": 1,
      "maxRowsPerSegment": 800000
    },
    "dataSchema": {
      "dataSource": "events",
      "granularitySpec": {
        "type": "uniform",
        "queryGranularity": "HOUR",
        "segmentGranularity": "HOUR",
        "rollup": true
      },
      "timestampSpec": {
        "column": "__time",
        "format": "iso"
      },
      "dimensionsSpec": {},
      "metricsSpec": []
    }
  }
}
```
Steps to reproduce the problem
Happens randomly
The error message or stack traces encountered. Providing more context, such as nearby log messages or even entire logs, can be helpful.
```
{"severity": "INFO", "message": "[[partial_index_generic_merge_events_gpceoeme_2021-07-21T11:15:41.883Z]-threading-task-runner-executor-0] org.apache.druid.utils.CompressionUtils - Unzipping file[/opt/druid-data/task/partial_index_generic_merge_events_gpceoeme_2021-07-21T11:15:41.883Z/work/indexing-tmp/2021-07-20T08:00:00.000Z/2021-07-20T09:00:00.000Z/10/temp_partial_index_generate_events_ooikmkan_2021-07-21T11:00:25.016Z] to [/opt/druid-data/task/partial_index_generic_merge_events_gpceoeme_2021-07-21T11:15:41.883Z/work/indexing-tmp/2021-07-20T08:00:00.000Z/2021-07-20T09:00:00.000Z/10/unzipped_partial_index_generate_events_ooikmkan_2021-07-21T11:00:25.016Z]"}
{"severity": "ERROR", "message": "[[partial_index_generic_merge_events_gpceoeme_2021-07-21T11:15:41.883Z]-threading-task-runner-executor-0] org.apache.druid.indexing.overlord.ThreadingTaskRunner - Exception caught while running the task."}
java.util.zip.ZipException: error in opening zip file
	at java.util.zip.ZipFile.open(Native Method) ~[?:1.8.0_292]
	at java.util.zip.ZipFile.<init>(ZipFile.java:225) ~[?:1.8.0_292]
	at java.util.zip.ZipFile.<init>(ZipFile.java:155) ~[?:1.8.0_292]
	at java.util.zip.ZipFile.<init>(ZipFile.java:169) ~[?:1.8.0_292]
	at org.apache.druid.utils.CompressionUtils.unzip(CompressionUtils.java:235) ~[druid-core-0.21.0.jar:0.21.0]
	at org.apache.druid.indexing.common.task.batch.parallel.PartialSegmentMergeTask.fetchSegmentFiles(PartialSegmentMergeTask.java:224) ~[druid-indexing-service-0.21.0.jar:0.21.0]
	at org.apache.druid.indexing.common.task.batch.parallel.PartialSegmentMergeTask.runTask(PartialSegmentMergeTask.java:162) ~[druid-indexing-service-0.21.0.jar:0.21.0]
	at org.apache.druid.indexing.common.task.batch.parallel.PartialGenericSegmentMergeTask.runTask(PartialGenericSegmentMergeTask.java:41) ~[druid-indexing-service-0.21.0.jar:0.21.0]
	at org.apache.druid.indexing.common.task.AbstractBatchIndexTask.run(AbstractBatchIndexTask.java:152) ~[druid-indexing-service-0.21.0.jar:0.21.0]
	at org.apache.druid.indexing.overlord.ThreadingTaskRunner$1.call(ThreadingTaskRunner.java:211) [druid-indexing-service-0.21.0.jar:0.21.0]
	at org.apache.druid.indexing.overlord.ThreadingTaskRunner$1.call(ThreadingTaskRunner.java:151) [druid-indexing-service-0.21.0.jar:0.21.0]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_292]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_292]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_292]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
```
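The trace shows `ZipFile.open` failing on the fetched file, which means the bytes on disk are not a valid zip archive (consistent with the file actually containing a server error message or a truncated download). A cheap way to distinguish those cases before unzipping is to check the zip magic bytes. This is an illustrative sketch, not code from Druid; the helper name `looksLikeZip` is made up:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class ZipSanityCheck
{
  /**
   * A non-empty zip archive starts with the local-file-header signature
   * "PK\x03\x04". (An empty archive starts with "PK\x05\x06" instead, so
   * this check deliberately rejects it too - there is nothing to unzip.)
   */
  static boolean looksLikeZip(File file) throws IOException
  {
    if (file.length() < 4) {
      return false;
    }
    try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
      byte[] magic = new byte[4];
      raf.readFully(magic);
      return magic[0] == 'P' && magic[1] == 'K' && magic[2] == 3 && magic[3] == 4;
    }
  }

  public static void main(String[] args) throws IOException
  {
    // Simulate a fetched "segment file" that actually holds an error payload.
    File notZip = File.createTempFile("shuffle", ".tmp");
    try (FileOutputStream out = new FileOutputStream(notZip)) {
      out.write("{\"error\":\"task not found\"}".getBytes(StandardCharsets.UTF_8));
    }
    // prints false
    System.out.println(looksLikeZip(notZip));
    notZip.delete();
  }
}
```

If the check fails, the task could log the first bytes of the file (likely the server's error text) instead of the opaque "error in opening zip file".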
N/A
Any ideas on how we can resolve this?
Feel free to ask if you need anything else.
Thanks a lot