
index_parallel task fails if segmentGranularity has a timeZone #9993

Open
tarpdalton opened this issue Jun 5, 2020 · 8 comments

@tarpdalton
Contributor

Affected Version

0.18.0 and 0.18.1

Description

Cluster size

  • 1 master (coordinator/overlord)
  • 1 router/broker
  • 1 historical
  • 3-10 middleManagers

Steps to reproduce the problem

  • create and run an index_parallel task
    • must include a timeZone in the segmentGranularity in the granularitySpec in the dataSchema
    • must have maxNumConcurrentSubTasks greater than 1 in the tuningConfig
    • must have type as hashed for partitionsSpec in tuningConfig
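A minimal ingestion spec fragment combining these three conditions might look like the following sketch (field names follow the Druid docs; the `forceGuaranteedRollup` flag is included because hashed partitioning requires it, and all values here are illustrative, not taken from the reporter's actual spec):

```json
"spec": {
  "dataSchema": {
    "granularitySpec": {
      "segmentGranularity": {
        "type": "period",
        "period": "P1D",
        "timeZone": "America/New_York"
      }
    }
  },
  "tuningConfig": {
    "type": "index_parallel",
    "partitionsSpec": {
      "type": "hashed"
    },
    "forceGuaranteedRollup": true,
    "maxNumConcurrentSubTasks": 2
  }
}
```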

The error message or stack traces encountered.

The main error is the ZipException:

2020-06-04T23:39:20,955 INFO [task-runner-0-priority-0] org.apache.druid.utils.CompressionUtils - Unzipping file[var/druid/task/partial_index_merge_datasource_1_geoeiplm_2020-06-04T23:39:16.988Z/work/indexing-tmp/2020-04-24T04:00:00.000Z/2020-04-25T04:00:00.000Z/1/temp_partial_index_generate_datasource_1_ieoldkdf_2020-06-04T23:39:01.964Z] to [var/druid/task/partial_index_merge_datasource_1_geoeiplm_2020-06-04T23:39:16.988Z/work/indexing-tmp/2020-04-24T04:00:00.000Z/2020-04-25T04:00:00.000Z/1/unzipped_partial_index_generate_datasource_1_ieoldkdf_2020-06-04T23:39:01.964Z]
2020-06-04T23:39:20,956 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner - Exception while running task[AbstractTask{id='partial_index_merge_datasource_1_geoeiplm_2020-06-04T23:39:16.988Z', groupId='index_parallel_datasource_1_jjglpmkc_2020-06-04T23:38:57.541Z', taskResource=TaskResource{availabilityGroup='partial_index_merge_datasource_1_geoeiplm_2020-06-04T23:39:16.988Z', requiredCapacity=1}, dataSource='datasource_1', context={forceTimeChunkLock=true}}]
java.util.zip.ZipException: error in opening zip file
	at java.util.zip.ZipFile.open(Native Method) ~[?:1.8.0_252]
	at java.util.zip.ZipFile.<init>(ZipFile.java:225) ~[?:1.8.0_252]
	at java.util.zip.ZipFile.<init>(ZipFile.java:155) ~[?:1.8.0_252]
	at java.util.zip.ZipFile.<init>(ZipFile.java:169) ~[?:1.8.0_252]
	at org.apache.druid.utils.CompressionUtils.unzip(CompressionUtils.java:250) ~[druid-core-0.18.1.jar:0.18.1]
	at org.apache.druid.indexing.common.task.batch.parallel.PartialSegmentMergeTask.fetchSegmentFiles(PartialSegmentMergeTask.java:231) ~[druid-indexing-service-0.18.1.jar:0.18.1]
	at org.apache.druid.indexing.common.task.batch.parallel.PartialSegmentMergeTask.runTask(PartialSegmentMergeTask.java:169) ~[druid-indexing-service-0.18.1.jar:0.18.1]
	at org.apache.druid.indexing.common.task.batch.parallel.PartialHashSegmentMergeTask.runTask(PartialHashSegmentMergeTask.java:44) ~[druid-indexing-service-0.18.1.jar:0.18.1]
	at org.apache.druid.indexing.common.task.AbstractBatchIndexTask.run(AbstractBatchIndexTask.java:123) ~[druid-indexing-service-0.18.1.jar:0.18.1]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:421) [druid-indexing-service-0.18.1.jar:0.18.1]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:393) [druid-indexing-service-0.18.1.jar:0.18.1]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_252]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]

The unzip fails because findPartitionFile cannot locate the partition created during the partial_index_generate task, so getPartition returns an error message body instead of the zip file, and unzipping that response fails.

The partition file is stored with the timezone offset in the path, like this:
2020-04-24T00:00:00.000-04:00/2020-04-25T00:00:00.000-04:00

/tmp/intermediary-segments/index_parallel_datasource_1_iiocmdme_2020-06-04T23:15:56.314Z/2020-04-24T00:00:00.000-04:00/2020-04-25T00:00:00.000-04:00/1/partial_index_generate_datasource_1_cgdlipdp_2020-06-04T23:16:02.960Z

But the HTTP request to getPartition uses the UTC time:
startTime=2020-04-24T04:00:00.000Z&endTime=2020-04-25T04:00:00.000Z

2020-06-04T23:39:20,945 DEBUG [HttpClient-Netty-Worker-0] org.apache.druid.java.util.http.client.NettyHttpClient - [GET http://<hostname_removed>:8091/druid/worker/v1/shuffle/task/index_parallel_datasource_1_jjglpmkc_2020-06-04T23%3A38%3A57.541Z/partial_index_generate_datasource_1_ieoldkdf_2020-06-04T23%3A39%3A01.964Z/partition?startTime=2020-04-24T04:00:00.000Z&endTime=2020-04-25T04:00:00.000Z&partitionId=1] Got response: 404 Not Found
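The mismatch is easy to demonstrate outside Druid: the two strings denote the same instant but differ textually, so a lookup keyed on the UTC rendering can never match a path written with the offset rendering. A sketch in Python, using the timestamps from the logs above:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

# The same instant, rendered two ways. The partition path is written
# using the segment interval's own offset (-04:00), while the shuffle
# HTTP request renders the interval start in UTC, so the strings differ
# even though they denote the same point in time.
start_local = datetime(2020, 4, 24, 0, 0, tzinfo=ZoneInfo("America/New_York"))
start_utc = start_local.astimezone(timezone.utc)

print(start_local.isoformat())   # 2020-04-24T00:00:00-04:00
print(start_utc.isoformat())     # 2020-04-24T04:00:00+00:00
print(start_local == start_utc)  # True: same instant, different text
```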

Any debugging that you have already done

I'm not very familiar with the Druid code, so I'm not sure if there is a simple fix. @jihoonson might know how to fix it, since he is working on #8061.

It looks like the startTime and endTime parameters come from:

partial_index_merge
  spec
    ioConfig
      partitionLocations
        interval

Maybe the interval could be stored with the timezone offset instead of the materialized UTC time?

@jihoonson
Contributor

@tarpdalton thank you for the detailed report! I don't have a concrete idea to fix the bug right now, but will take a look.

@FrankChen021
Member

@jihoonson I don't understand the meaning of setting timeZone and origin for segmentGranularity, and I don't see any documentation about this. There is another segmentGranularity setting problem: #9894.

@jihoonson
Contributor

@FrankChen021 it's documented here. #9894 is about duration segment granularity and doesn't seem related to this issue.

@FrankChen021
Member

> @FrankChen021 it's documented here. #9894 is about duration segment granularity and doesn't seem related to this issue.

The doc is about query granularity. Although segment granularity shares the same type as query granularity, the doc does not explain why people need to care about the timezone/origin of segment granularity. I don't see any benefit from these two parameters on segment granularity.

@jihoonson
Contributor

Yes, the doc should say it can be used for segment granularity as well. However, it is at least linked from https://druid.apache.org/docs/latest/ingestion/index.html#granularityspec.

> The doc is about query granularity, although segment granularity shares the same type as query granularity, it does not explain why people need to care about timezone/origin of segment granularity. I don't see any beneficial from these two parameters on segment granularity

I'm not sure what you are suggesting. The timezone is useful when your timestamps are in a different timezone from the one where your Druid cluster is running. The origin is useful when you want to draw time bucket boundaries differently.
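For illustration, a rough sketch of what an origin does to period bucketing (plain datetime arithmetic, not Druid's actual granularity implementation; the 06:00 origin is an arbitrary example):

```python
from datetime import datetime, timedelta

# With a 1-day period, buckets normally start at midnight; an origin
# of 06:00 shifts every bucket boundary to start at 06:00 instead.
def bucket_start(ts: datetime, period: timedelta, origin: datetime) -> datetime:
    elapsed = (ts - origin) % period  # time since the last boundary
    return ts - elapsed

origin = datetime(2020, 1, 1, 6, 0)          # buckets begin at 06:00
ts = datetime(2020, 4, 24, 3, 0)             # 03:00 is before today's 06:00 boundary
print(bucket_start(ts, timedelta(days=1), origin))  # 2020-04-23 06:00:00
```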

@tarpdalton
Contributor Author

I'll share my use case for segment granularity. Here is my granularity spec for loading some data:

      "granularitySpec": {
        "segmentGranularity": {
          "type": "period",
          "period": "P1D",
          "timeZone": "America/New_York"
        },
        "queryGranularity": {
          "type": "period",
          "period": "P1D",
          "timeZone": "America/New_York"
        },
        "rollup": true,
        "intervals": [
          "2020-05-12T00:00:00-04:00/2020-05-13T00:00:00-04:00"
        ]
      },

I am rolling up into daily buckets, but offset by the timezone. The granularity is big, so the rollup is more efficient.
The event data that I am storing in Druid occurs in the EST/EDT timezone. When I query Druid to see how many events happened on March 12th, I want to see events from March 12th EDT, not March 12th UTC.
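A small sketch of why the timezone matters for daily bucketing (dates are illustrative): an evening Eastern-time event falls into the next day's bucket when days are cut at UTC midnight.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# An event at 10pm Eastern on March 12 lands in the March 13 bucket
# when truncated to UTC days, but in the March 12 bucket when
# truncated to America/New_York days.
event = datetime(2020, 3, 13, 2, 0, tzinfo=timezone.utc)  # == 2020-03-12 22:00 EDT

utc_day = event.date()
local_day = event.astimezone(ZoneInfo("America/New_York")).date()

print(utc_day)    # 2020-03-13
print(local_day)  # 2020-03-12
```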

@FrankChen021
Member

@tarpdalton I see there are some benefits to setting a timezone for segment granularity. Each segment starts at 00:00 EDT instead of 00:00 UTC, so a query for data within a local day falls exactly into one segment. But if segments start at 00:00+00:00, data for a local day may be spread across two segments.

@harshmohta

We are also facing a similar issue on Druid 0.20.2. The issue started when we moved from EC2 r5.4x instances to i3en.4x instances.

  "tuningConfig": {
    "type": "index_parallel",
    "splitHintSpec": {
      "type": "maxSize",
      "maxNumFiles": 2
    },
    "partitionsSpec": {
      "type": "hashed",
      "numShards": 35
    },
    "forceGuaranteedRollup": true,
    "totalNumMergeTasks": 100,
    "maxNumSegmentsToMerge": 100,
    "maxNumConcurrentSubTasks": 500,
    "maxRowsInMemory": 3000000,
    "maxPendingPersists": 1,
    "useCombiner": true,
    "forceExtendableShardSpecs": true,
    "indexSpec": {
      "bitmap": {
        "type": "roaring"
      },
      "dimensionCompression": "lz4",
      "metricCompression": "lz4"
    }
  }

2022-07-05T18:36:46,245 ERROR [[partial_index_generic_merge_datasource_mngkoeoe_2022-07-05T18:36:44.742Z]-threading-task-runner-executor-3] org.apache.druid.indexing.overlord.ThreadingTaskRunner - Exception caught while running the task.
java.util.zip.ZipException: error in opening zip file
	at java.util.zip.ZipFile.open(Native Method) ~[?:1.8.0_282]
	at java.util.zip.ZipFile.<init>(ZipFile.java:225) ~[?:1.8.0_282]
	at java.util.zip.ZipFile.<init>(ZipFile.java:155) ~[?:1.8.0_282]
	at java.util.zip.ZipFile.<init>(ZipFile.java:169) ~[?:1.8.0_282]
	at org.apache.druid.utils.CompressionUtils.unzip(CompressionUtils.java:250) ~[druid-core-0.20.1.jar:0.20.1]
	at org.apache.druid.indexing.common.task.batch.parallel.PartialSegmentMergeTask.fetchSegmentFiles(PartialSegmentMergeTask.java:220) ~[druid-indexing-service-0.20.1.jar:0.20.1]
	at org.apache.druid.indexing.common.task.batch.parallel.PartialSegmentMergeTask.runTask(PartialSegmentMergeTask.java:158) ~[druid-indexing-service-0.20.1.jar:0.20.1]
	at org.apache.druid.indexing.common.task.batch.parallel.PartialGenericSegmentMergeTask.runTask(PartialGenericSegmentMergeTask.java:41) ~[druid-indexing-service-0.20.1.jar:0.20.1]
	at org.apache.druid.indexing.common.task.AbstractBatchIndexTask.run(AbstractBatchIndexTask.java:140) ~[druid-indexing-service-0.20.1.jar:0.20.1]
	at org.apache.druid.indexing.overlord.ThreadingTaskRunner$1.call(ThreadingTaskRunner.java:211) [druid-indexing-service-0.20.1.jar:0.20.1]
	at org.apache.druid.indexing.overlord.ThreadingTaskRunner$1.call(ThreadingTaskRunner.java:151) [druid-indexing-service-0.20.1.jar:0.20.1]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_282]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
2022-07-05T18:36:46,246 ERROR [threading-task-runner-executor-3]
org.apache.druid.segment.realtime.appenderator.UnifiedIndexerAppenderatorsManager - Could not find datasource bundle for [datasource], task [partial_index_generic_merge_datasource_mngkoeoe_2022-07-05T18:36:44.742Z]
