Index_parallel task fails because of error in opening zip file (running on indexers) #11478

vpeack · 2021-07-21T15:31:02Z

Hi everyone,

Following a post on the ASF Slack, I am opening a new issue here on the advice of someone from Imply.
We are running compaction tasks through indexers, and they randomly fail in phase 3 (partial_index_generic_merge) with the following error message (more details below): "error in opening zip file"

The reply we got on Slack:
As to the specific error, I'm not sure if it's exactly the same as what's going on in #9993, but that issue does point out an important thing, which is that if the shuffle server returns an error, the shuffle client will not actually log that error; it will just log this sort of obtuse zip decompression error. (Because it's trying to unzip the error message, which is silly!) This isn't good error behavior, so we should adjust that to log the actual server error instead of trying to unzip the error message.
This seems to be an indexer bug. Could you please create a bug report in the Druid GitHub project with all the details?
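To illustrate the point in the reply above, here is a minimal diagnostic sketch, assuming you can still read the fetched temp file on the indexer's disk before the task directory is cleaned up. It is a standalone Python script, not Druid code: the function name `describe_fetched_file` and the `preview_bytes` parameter are hypothetical. It checks whether a file is really a zip archive and, if not, returns its first bytes as text, which may turn out to be the server's error body rather than segment data.

```python
import zipfile
from pathlib import Path


def describe_fetched_file(path, preview_bytes=200):
    """Classify a fetched file as a zip archive or something else.

    Hypothetical helper for manual debugging; not part of Druid.
    Returns ("zip", [entry names]) for a readable archive, or
    ("not_zip", text preview of the first bytes) otherwise.
    """
    path = Path(path)
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            return "zip", archive.namelist()
    # Not an archive: decode the leading bytes so that a plain-text
    # error body, if that is what was saved, becomes readable.
    with path.open("rb") as handle:
        head = handle.read(preview_bytes)
    return "not_zip", head.decode("utf-8", errors="replace")
```

Running it against the `temp_partial_index_generate_...` file named in the "Unzipping file[...]" log line would show whether the download was a truncated or empty file, or an error message saved in place of the archive.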

Affected Version

0.21.0

Description

Cluster size
1 master (coordinator/overlord)
2 routers/brokers
~10 historicals
~20 indexers (dedicated to these tasks) + ~5 indexers for realtime ingestion (kafka)
~30TB data
Configurations in use
Spec object we are using:
{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "druid",
        "dataSource": "events",
        "interval": "2021-07-13T00:00:00/2021-07-14T00:00:00"
      }
    },
    "tuningConfig": {
      "type": "index_parallel",
      "partitionsSpec": {
        "type": "hashed",
        "maxRowsPerSegment": 800000
      },
      "forceGuaranteedRollup": true,
      "maxNumConcurrentSubTasks": 40,
      "totalNumMergeTasks": 20,
      "maxRetry": 10,
      "maxPendingPersists": 1,
      "maxRowsPerSegment": 800000
    },
    "dataSchema": {
      "dataSource": "events",
      "granularitySpec": {
        "type": "uniform",
        "queryGranularity": "HOUR",
        "segmentGranularity": "HOUR",
        "rollup": true
      },
      "timestampSpec": {
        "column": "__time",
        "format": "iso"
      },
      "dimensionsSpec": {},
      "metricsSpec": []
    }
  }
}
Steps to reproduce the problem
Happens randomly
The error message or stack traces encountered

{"severity": "INFO", "message": "[[partial_index_generic_merge_events_gpceoeme_2021-07-21T11:15:41.883Z]-threading-task-runner-executor-0] org.apache.druid.utils.CompressionUtils - Unzipping file[/opt/druid-data/task/partial_index_generic_merge_events_gpceoeme_2021-07-21T11:15:41.883Z/work/indexing-tmp/2021-07-20T08:00:00.000Z/2021-07-20T09:00:00.000Z/10/temp_partial_index_generate_events_ooikmkan_2021-07-21T11:00:25.016Z] to [/opt/druid-data/task/partial_index_generic_merge_events_gpceoeme_2021-07-21T11:15:41.883Z/work/indexing-tmp/2021-07-20T08:00:00.000Z/2021-07-20T09:00:00.000Z/10/unzipped_partial_index_generate_events_ooikmkan_2021-07-21T11:00:25.016Z]"}
{"severity": "ERROR", "message": "[[partial_index_generic_merge_events_gpceoeme_2021-07-21T11:15:41.883Z]-threading-task-runner-executor-0] org.apache.druid.indexing.overlord.ThreadingTaskRunner - Exception caught while running the task."}
java.util.zip.ZipException: error in opening zip file
    at java.util.zip.ZipFile.open(Native Method) ~[?:1.8.0_292]
    at java.util.zip.ZipFile.<init>(ZipFile.java:225) ~[?:1.8.0_292]
    at java.util.zip.ZipFile.<init>(ZipFile.java:155) ~[?:1.8.0_292]
    at java.util.zip.ZipFile.<init>(ZipFile.java:169) ~[?:1.8.0_292]
    at org.apache.druid.utils.CompressionUtils.unzip(CompressionUtils.java:235) ~[druid-core-0.21.0.jar:0.21.0]
    at org.apache.druid.indexing.common.task.batch.parallel.PartialSegmentMergeTask.fetchSegmentFiles(PartialSegmentMergeTask.java:224) ~[druid-indexing-service-0.21.0.jar:0.21.0]
    at org.apache.druid.indexing.common.task.batch.parallel.PartialSegmentMergeTask.runTask(PartialSegmentMergeTask.java:162) ~[druid-indexing-service-0.21.0.jar:0.21.0]
    at org.apache.druid.indexing.common.task.batch.parallel.PartialGenericSegmentMergeTask.runTask(PartialGenericSegmentMergeTask.java:41) ~[druid-indexing-service-0.21.0.jar:0.21.0]
    at org.apache.druid.indexing.common.task.AbstractBatchIndexTask.run(AbstractBatchIndexTask.java:152) ~[druid-indexing-service-0.21.0.jar:0.21.0]
    at org.apache.druid.indexing.overlord.ThreadingTaskRunner$1.call(ThreadingTaskRunner.java:211) [druid-indexing-service-0.21.0.jar:0.21.0]
    at org.apache.druid.indexing.overlord.ThreadingTaskRunner$1.call(ThreadingTaskRunner.java:151) [druid-indexing-service-0.21.0.jar:0.21.0]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_292]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_292]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_292]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]

Any debugging that you have already done
N/A

Any ideas on how we can resolve this?
Feel free to ask if you need anything else.

Thanks a lot


vpeack · 2021-10-14T16:27:45Z

up

ritvik-statsig · 2022-06-14T23:03:25Z

I am running into this issue as well - posted in the druid forum here https://www.druidforum.org/t/error-in-opening-zip-file-during-ingestion/7429

ThomasBarach · 2022-06-15T09:09:26Z

Hello,
FYI, we're not seeing this error anymore. We were using GCP preemptible instances back then. Once we switched to non-preemptible instances, everything was fine.

ritvik-statsig · 2022-06-15T16:18:00Z

Interesting. So your indexer nodes were getting preempted and that is what was causing this? So the zip file error is just a weird message for some other underlying issue?

ThomasBarach · 2022-06-15T16:22:38Z

Yep, I guess so.
Are you using Spot/Preemptible cloud instances as well?

ritvik-statsig · 2022-06-15T16:28:48Z

I am not - and this also repros consistently for me. Must be something like an OOM

github-actions · 2023-11-09T00:16:32Z

This issue has been marked as stale due to 280 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If this issue is still
relevant, please simply write any comment. Even if closed, you can still revive the
issue at any time or discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

github-actions · 2023-12-08T00:16:42Z

This issue has been closed due to lack of activity. If you think that
is incorrect, or the issue requires additional review, you can revive the issue at
any time.

vpeack added the Uncategorized problem report label Jul 21, 2021

github-actions bot added the stale label Nov 9, 2023

github-actions bot closed this as not planned Dec 8, 2023
