
index_parallel task fails if segmentGranularity has a timeZone #9993

Open
tarpdalton opened this issue Jun 5, 2020 · 8 comments

@tarpdalton
Contributor

Affected Version

0.18.0 and 0.18.1

Description

Cluster size

  • 1 master (coordinator/overlord)
  • 1 router/broker
  • 1 historical
  • 3-10 middleManagers

Steps to reproduce the problem

  • create and run an index_parallel task
    • must include a timeZone in the segmentGranularity in the granularitySpec in the dataSchema
    • must have maxNumConcurrentSubTasks greater than 1 in the tuningConfig
    • must have type as hashed for partitionsSpec in tuningConfig
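A minimal ingestion spec fragment combining these three conditions might look like the following sketch (field names follow the Druid docs; the `forceGuaranteedRollup` flag is included because hashed partitioning requires it, and all values here are illustrative, not taken from the reporter's actual spec):

```json
"spec": {
  "dataSchema": {
    "granularitySpec": {
      "segmentGranularity": {
        "type": "period",
        "period": "P1D",
        "timeZone": "America/New_York"
      }
    }
  },
  "tuningConfig": {
    "type": "index_parallel",
    "partitionsSpec": {
      "type": "hashed"
    },
    "forceGuaranteedRollup": true,
    "maxNumConcurrentSubTasks": 2
  }
}
```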

The error message or stack traces encountered.

The main error is the ZipException:

2020-06-04T23:39:20,955 INFO [task-runner-0-priority-0] org.apache.druid.utils.CompressionUtils - Unzipping file[var/druid/task/partial_index_merge_datasource_1_geoeiplm_2020-06-04T23:39:16.988Z/work/indexing-tmp/2020-04-24T04:00:00.000Z/2020-04-25T04:00:00.000Z/1/temp_partial_index_generate_datasource_1_ieoldkdf_2020-06-04T23:39:01.964Z] to [var/druid/task/partial_index_merge_datasource_1_geoeiplm_2020-06-04T23:39:16.988Z/work/indexing-tmp/2020-04-24T04:00:00.000Z/2020-04-25T04:00:00.000Z/1/unzipped_partial_index_generate_datasource_1_ieoldkdf_2020-06-04T23:39:01.964Z]
2020-06-04T23:39:20,956 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner - Exception while running task[AbstractTask{id='partial_index_merge_datasource_1_geoeiplm_2020-06-04T23:39:16.988Z', groupId='index_parallel_datasource_1_jjglpmkc_2020-06-04T23:38:57.541Z', taskResource=TaskResource{availabilityGroup='partial_index_merge_datasource_1_geoeiplm_2020-06-04T23:39:16.988Z', requiredCapacity=1}, dataSource='datasource_1', context={forceTimeChunkLock=true}}]
java.util.zip.ZipException: error in opening zip file
	at java.util.zip.ZipFile.open(Native Method) ~[?:1.8.0_252]
	at java.util.zip.ZipFile.<init>(ZipFile.java:225) ~[?:1.8.0_252]
	at java.util.zip.ZipFile.<init>(ZipFile.java:155) ~[?:1.8.0_252]
	at java.util.zip.ZipFile.<init>(ZipFile.java:169) ~[?:1.8.0_252]
	at org.apache.druid.utils.CompressionUtils.unzip(CompressionUtils.java:250) ~[druid-core-0.18.1.jar:0.18.1]
	at org.apache.druid.indexing.common.task.batch.parallel.PartialSegmentMergeTask.fetchSegmentFiles(PartialSegmentMergeTask.java:231) ~[druid-indexing-service-0.18.1.jar:0.18.1]
	at org.apache.druid.indexing.common.task.batch.parallel.PartialSegmentMergeTask.runTask(PartialSegmentMergeTask.java:169) ~[druid-indexing-service-0.18.1.jar:0.18.1]
	at org.apache.druid.indexing.common.task.batch.parallel.PartialHashSegmentMergeTask.runTask(PartialHashSegmentMergeTask.java:44) ~[druid-indexing-service-0.18.1.jar:0.18.1]
	at org.apache.druid.indexing.common.task.AbstractBatchIndexTask.run(AbstractBatchIndexTask.java:123) ~[druid-indexing-service-0.18.1.jar:0.18.1]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:421) [druid-indexing-service-0.18.1.jar:0.18.1]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:393) [druid-indexing-service-0.18.1.jar:0.18.1]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_252]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]

The unzip fails because findPartitionFile cannot locate the partition created during the partial_index_generate task, so getPartition returns an error message body instead of the zip file, and unzipping that response fails.

The partition file is stored with the timezone offset in the path, like this:
2020-04-24T00:00:00.000-04:00/2020-04-25T00:00:00.000-04:00

/tmp/intermediary-segments/index_parallel_datasource_1_iiocmdme_2020-06-04T23:15:56.314Z/2020-04-24T00:00:00.000-04:00/2020-04-25T00:00:00.000-04:00/1/partial_index_generate_datasource_1_cgdlipdp_2020-06-04T23:16:02.960Z

But the HTTP request to getPartition uses the UTC time:
startTime=2020-04-24T04:00:00.000Z&endTime=2020-04-25T04:00:00.000Z

2020-06-04T23:39:20,945 DEBUG [HttpClient-Netty-Worker-0] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://<hostname_removed>:8091/druid/worker/v1/shuffle/task/index_parallel_datasource_1_jjglpmkc_2020-06-04T23%3A38%3A57.541Z/partial_index_generate_datasource_1_ieoldkdf_2020-06-04T23%3A39%3A01.964Z/partition?startTime=2020-04-24T04:00:00.000Z&endTime=2020-04-25T04:00:00.000Z&partitionId=1] Got response: 404 Not Found
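The mismatch is easy to demonstrate outside Druid: the two strings denote the same instant but differ textually, so a lookup keyed on the UTC rendering can never match a path written with the offset rendering. A sketch in Python, using the timestamps from the logs above:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

# The same instant, rendered two ways. The partition path is written
# using the segment interval's own offset (-04:00), while the shuffle
# HTTP request renders the interval start in UTC, so the strings differ
# even though they denote the same point in time.
start_local = datetime(2020, 4, 24, 0, 0, tzinfo=ZoneInfo("America/New_York"))
start_utc = start_local.astimezone(timezone.utc)

print(start_local.isoformat())   # 2020-04-24T00:00:00-04:00
print(start_utc.isoformat())     # 2020-04-24T04:00:00+00:00
print(start_local == start_utc)  # True: same instant, different text
```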

Any debugging that you have already done

I'm not very familiar with the Druid code, so I'm not sure if there is a simple fix. @jihoonson might know how to fix it, since he is working on #8061.

It looks like the startTime and endTime parameters come from:

partial_index_merge
  spec
    ioConfig
      partitionLocations
        interval

Maybe the interval could be stored with the timezone offset instead of the materialized UTC time?

@jihoonson
Contributor

@tarpdalton thank you for the detailed report! I don't have a concrete idea to fix the bug right now, but will take a look.

@FrankChen021
Member

@jihoonson I don't understand the meaning of setting timeZone and origin for segmentGranularity, and I don't see any documentation about this. There is another segmentGranularity setting problem: #9894.

@jihoonson
Contributor

@FrankChen021 it's documented here. #9894 is about duration segment granularity and doesn't seem related to this issue.

@FrankChen021
Member

> @FrankChen021 it's documented here. #9894 is about duration segment granularity and doesn't seem related to this issue.

The doc is about query granularity. Although segment granularity shares the same type as query granularity, the doc does not explain why people need to care about the timezone/origin of segment granularity. I don't see any benefit from these two parameters on segment granularity.

@jihoonson
Contributor

Yes, the doc should say it can be used for segment granularity as well. However, it is at least linked from https://druid.apache.org/docs/latest/ingestion/index.html#granularityspec.

> The doc is about query granularity, although segment granularity shares the same type as query granularity, it does not explain why people need to care about timezone/origin of segment granularity. I don't see any beneficial from these two parameters on segment granularity

I'm not sure what you are suggesting. The timezone is useful when your timestamps are in a different timezone from the one where your Druid cluster is running. The origin is useful when you want to draw time bucket boundaries differently.
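For illustration, a rough sketch of what an origin does to period bucketing (plain datetime arithmetic, not Druid's actual granularity implementation; the 06:00 origin is an arbitrary example):

```python
from datetime import datetime, timedelta

# With a 1-day period, buckets normally start at midnight; an origin
# of 06:00 shifts every bucket boundary to start at 06:00 instead.
def bucket_start(ts: datetime, period: timedelta, origin: datetime) -> datetime:
    elapsed = (ts - origin) % period  # time since the last boundary
    return ts - elapsed

origin = datetime(2020, 1, 1, 6, 0)          # buckets begin at 06:00
ts = datetime(2020, 4, 24, 3, 0)             # 03:00 is before today's 06:00 boundary
print(bucket_start(ts, timedelta(days=1), origin))  # 2020-04-23 06:00:00
```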

@tarpdalton
Contributor Author

I'll share my use case for segment granularity. Here is my granularity spec for loading some data:

      "granularitySpec": {
        "segmentGranularity": {
          "type": "period",
          "period": "P1D",
          "timeZone": "America/New_York"
        },
        "queryGranularity": {
          "type": "period",
          "period": "P1D",
          "timeZone": "America/New_York"
        },
        "rollup": true,
        "intervals": [
          "2020-05-12T00:00:00-04:00/2020-05-13T00:00:00-04:00"
        ]
      },

I am rolling up into daily buckets, but offset by the timezone. The granularity is big, so the rollup is more efficient.
The event data that I am storing in Druid occurs in the EST/EDT timezone. When I query Druid to see how many events happened on March 12th, I want to see events from March 12th EDT, not March 12th UTC.
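A small sketch of why the timezone matters for daily bucketing (dates are illustrative): an evening Eastern-time event falls into the next day's bucket when days are cut at UTC midnight.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# An event at 10pm Eastern on March 12 lands in the March 13 bucket
# when truncated to UTC days, but in the March 12 bucket when
# truncated to America/New_York days.
event = datetime(2020, 3, 13, 2, 0, tzinfo=timezone.utc)  # == 2020-03-12 22:00 EDT

utc_day = event.date()
local_day = event.astimezone(ZoneInfo("America/New_York")).date()

print(utc_day)    # 2020-03-13
print(local_day)  # 2020-03-12
```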

@FrankChen021
Member

@tarpdalton I see there are some benefits to setting a timezone for segment granularity. Each segment starts at 00:00 EDT instead of 00:00 UTC, so a query for data within a local day falls exactly into one segment. But if segments start at 00:00+00:00, data for a local day may be spread across two segments.

@harshmohta

We are also facing a similar issue on Druid 0.20.2. The issue started when we moved from EC2 r5.4x instances to i3en.4x instances.

  "tuningConfig": {
    "type": "index_parallel",
    "splitHintSpec": {
      "type": "maxSize",
      "maxNumFiles": 2
    },
    "partitionsSpec": {
      "type": "hashed",
      "numShards": 35
    },
    "forceGuaranteedRollup": true,
    "totalNumMergeTasks": 100,
    "maxNumSegmentsToMerge": 100,
    "maxNumConcurrentSubTasks": 500,
    "maxRowsInMemory": 3000000,
    "maxPendingPersists": 1,
    "useCombiner": true,
    "forceExtendableShardSpecs": true,
    "indexSpec": {
      "bitmap": {
        "type": "roaring"
      },
      "dimensionCompression": "lz4",
      "metricCompression": "lz4"
    }
  }

2022-07-05T18:36:46,245 ERROR [[partial_index_generic_merge_datasource_mngkoeoe_2022-07-05T18:36:44.742Z]-threading-task-runner-executor-3] org.apache.druid.indexing.overlord.ThreadingTaskRunner - Exception caught while running the task.
java.util.zip.ZipException: error in opening zip file
	at java.util.zip.ZipFile.open(Native Method) ~[?:1.8.0_282]
	at java.util.zip.ZipFile.<init>(ZipFile.java:225) ~[?:1.8.0_282]
	at java.util.zip.ZipFile.<init>(ZipFile.java:155) ~[?:1.8.0_282]
	at java.util.zip.ZipFile.<init>(ZipFile.java:169) ~[?:1.8.0_282]
	at org.apache.druid.utils.CompressionUtils.unzip(CompressionUtils.java:250) ~[druid-core-0.20.1.jar:0.20.1]
	at org.apache.druid.indexing.common.task.batch.parallel.PartialSegmentMergeTask.fetchSegmentFiles(PartialSegmentMergeTask.java:220) ~[druid-indexing-service-0.20.1.jar:0.20.1]
	at org.apache.druid.indexing.common.task.batch.parallel.PartialSegmentMergeTask.runTask(PartialSegmentMergeTask.java:158) ~[druid-indexing-service-0.20.1.jar:0.20.1]
	at org.apache.druid.indexing.common.task.batch.parallel.PartialGenericSegmentMergeTask.runTask(PartialGenericSegmentMergeTask.java:41) ~[druid-indexing-service-0.20.1.jar:0.20.1]
	at org.apache.druid.indexing.common.task.AbstractBatchIndexTask.run(AbstractBatchIndexTask.java:140) ~[druid-indexing-service-0.20.1.jar:0.20.1]
	at org.apache.druid.indexing.overlord.ThreadingTaskRunner$1.call(ThreadingTaskRunner.java:211) [druid-indexing-service-0.20.1.jar:0.20.1]
	at org.apache.druid.indexing.overlord.ThreadingTaskRunner$1.call(ThreadingTaskRunner.java:151) [druid-indexing-service-0.20.1.jar:0.20.1]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_282]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
2022-07-05T18:36:46,246 ERROR [threading-task-runner-executor-3]
org.apache.druid.segment.realtime.appenderator.UnifiedIndexerAppenderatorsManager - Could not find datasource bundle for [datasource], task [partial_index_generic_merge_datasource_mngkoeoe_2022-07-05T18:36:44.742Z]
