index_hadoop tasks fail on wrong file format when run inside indexer #8840

Closed
sixtus opened this issue Nov 7, 2019 · 18 comments · Fixed by #9059
Comments

sixtus commented Nov 7, 2019

We have seen this before: the `:` characters are not being replaced, so HDFS refuses the file because the name is illegal.

I guess this is yet another thing that was fixed in the peon but not in the indexer?

java.lang.IllegalArgumentException: Pathname /druid/indexer/foo/2019-10-29T00:00:00.000Z_2019-10-30T00:00:00.000Z/2019-11-06T19:57:56.216Z/29/index.zip.0 from hdfs://us2/druid/indexer/foo/2019-10-29T00:00:00.000Z_2019-10-30T00:00:00.000Z/2019-11-06T19:57:56.216Z/29/index.zip.0 is not a valid DFS filename.
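
For context, a minimal sketch of the sanitization that is apparently being skipped: HDFS forbids `:` in path components, so the interval and version timestamps in a segment path need their colons replaced with `_` before the segment is pushed. The helper below is hypothetical, not Druid's actual code:

```java
public class SegmentPathExample
{
  // Hypothetical helper: HDFS rejects ':' in path components, so the
  // ISO-8601 timestamps in a segment path must be rewritten before pushing.
  static String hdfsSafe(String segmentPath)
  {
    return segmentPath.replace(':', '_');
  }

  public static void main(String[] args)
  {
    String raw = "/druid/indexer/foo/2019-10-29T00:00:00.000Z_2019-10-30T00:00:00.000Z/29/index.zip";
    System.out.println(hdfsSafe(raw));
    // -> /druid/indexer/foo/2019-10-29T00_00_00.000Z_2019-10-30T00_00_00.000Z/29/index.zip
  }
}
```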

sixtus commented Nov 10, 2019

ok, I think I found something:

druid.storage.storageDirectory=/druid/indexer works for index_kafka and compact tasks, but not for index_hadoop. For index_hadoop it has to be prefixed with hdfs://, otherwise the segments end up in HDFS but are recorded as local files in the metadata database. Maybe that triggered the above, too?
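
To spell that out, here are the two variants (a sketch using the values from this thread):

```properties
# Works for index_kafka and compact tasks, but index_hadoop mis-records
# the segments in the metadata database as local files:
druid.storage.type=hdfs
druid.storage.storageDirectory=/druid/indexer

# Works for index_hadoop as well, since the scheme and authority are explicit:
druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://us2/druid/indexer
```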

sixtus commented Nov 12, 2019

OK, so I fixed the config problem and reverted to mainline. The problem is back: index_hadoop fails when trying to write a file containing a `:`.

gianm commented Nov 12, 2019

Hi @sixtus,

Is this in Druid 0.16.0?

Am I hearing you right that when you run 0.16.0 using MM + peons, everything works fine, but when you run 0.16.0 with an Indexer, you get this behavior where `:` is not replaced with `_` in segment names?

sixtus commented Nov 12, 2019

Yes, this is Druid 0.16.0. I just fixed the config bug and reverted, and the bug is back. I am not sure whether it works when running on peons (I didn't validate that), but it sure isn't working when running on the indexer.

My PR has two commits. I expected it to work after the first one, and it didn't, but I think that was just a jar caching issue. I am about to test it again.

gianm commented Nov 12, 2019

What are your druid.storage.* properties?

I am looking at the code that handles colon replacement in HDFS paths; it hasn't changed in a while, and I don't see a reason for it to behave differently on MM/peon vs. Indexer. Also, we do have other users on 0.16.0 + HDFS deep storage, so it should still work…

But if it doesn't, please feed us some more clues and we can get to the bottom of it.

sixtus changed the title from "hadoop_index tasks fail on wrong file format when run inside indexer" to "index_hadoop tasks fail on wrong file format when run inside indexer" on Nov 12, 2019
sixtus commented Nov 12, 2019

It's working for compact and index_kafka. However, with index_hadoop the segments are written by the Hadoop job itself; I think that's the problem. I just removed the broken commit from my PR.

druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://us2/druid/indexer

And yes, I have been doing this for a while. I remember filing a bug about it back in the day, and now it's back.

sixtus commented Nov 12, 2019

To be more precise, it's the Hadoop reduce task that fails.

sixtus commented Nov 12, 2019

And there is no alternative to index_hadoop for raw files in HDFS, at least none that is obvious from the documentation.

gianm commented Nov 12, 2019

Hmm, I tried the Hadoop tutorial, which uses HDFS deep storage, and it worked ok for me. I wonder if something weird is going on with your setup.

Do you have a stack trace from the reduce task that fails?

sixtus commented Nov 12, 2019

11:33:28.944 [main] ERROR org.apache.druid.indexer.JobHelper - Exception in retry loop
java.lang.IllegalArgumentException: Pathname /druid/indexer/foo/2019-11-12T04:00:00.000Z_2019-11-12T05:00:00.000Z/2019-11-12T10:04:03.778Z/0/index.zip.0 from hdfs://us2/druid/indexer/foo/2019-11-12T04:00:00.000Z_2019-11-12T05:00:00.000Z/2019-11-12T10:04:03.778Z/0/index.zip.0 is not a valid DFS filename.
	at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:217) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:476) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:473) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:473) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:414) ~[hadoop-hdfs-client-2.8.5.jar:?]
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:929) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890) ~[hadoop-common-2.8.5.jar:?]
	at org.apache.druid.indexer.JobHelper$2.push(JobHelper.java:452) [druid-indexing-hadoop-0.16.0-lqm1.jar:0.16.0-lqm1]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_222]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_222]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_222]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_222]
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409) [hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163) [hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155) [hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) [hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346) [hadoop-common-2.8.5.jar:?]
	at com.sun.proxy.$Proxy78.push(Unknown Source) [?:?]
	at org.apache.druid.indexer.JobHelper.serializeOutIndex(JobHelper.java:469) [druid-indexing-hadoop-0.16.0-lqm1.jar:0.16.0-lqm1]
	at org.apache.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:827) [druid-indexing-hadoop-0.16.0-lqm1.jar:0.16.0-lqm1]
	at org.apache.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:579) [druid-indexing-hadoop-0.16.0-lqm1.jar:0.16.0-lqm1]
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) [hadoop-mapreduce-client-core-2.8.5.jar:?]
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) [hadoop-mapreduce-client-core-2.8.5.jar:?]
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) [hadoop-mapreduce-client-core-2.8.5.jar:?]
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175) [hadoop-mapreduce-client-app-2.8.5.jar:?]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_222]
	at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_222]
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844) [hadoop-common-2.8.5.jar:?]
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169) [hadoop-mapreduce-client-app-2.8.5.jar:?]

We upgraded to Hadoop 2.8.5 right around the same time as Druid 0.16.

sixtus commented Nov 12, 2019

The tutorial example is druid.storage.type=local, btw.

sixtus commented Nov 12, 2019

I just tried my patch; it's not working.

sixtus commented Nov 18, 2019

I just noticed the kill task is also throwing an exception:

Error: java.lang.IllegalArgumentException: Pathname /druid/indexer/foo/2019-11-17T17:00:00.000Z_2019-11-17T18:00:00.000Z/2019-11-18T07:08:21.067Z/0/index.zip.1 from hdfs://us2/druid/indexer/foo/2019-11-17T17:00:00.000Z_2019-11-17T18:00:00.000Z/2019-11-18T07:08:21.067Z/0/index.zip.1 is not a valid DFS filename.
        at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:217)
        at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:476)
        at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:473)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:473)
        at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:414)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:929)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890)
        at org.apache.druid.indexer.JobHelper$2.push(JobHelper.java:452)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346)
        at com.sun.proxy.$Proxy78.push(Unknown Source)
        at org.apache.druid.indexer.JobHelper.serializeOutIndex(JobHelper.java:469)
        at org.apache.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:828)
        at org.apache.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:579)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)

I verified that there is no broken name in the metadata storage, i.e. the kill task must have generated the path itself (rather than using the path from the metadata store) and then run into the same trap.

From my limited understanding, it looks like it's not instantiating HdfsDataSegmentPusher but rather LocalDataSegmentPusher. Still puzzled, though.
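
To illustrate the suspected failure mode (Druid actually selects the pusher via Guice keyed on druid.storage.type, so this factory is a simplification, not the real wiring):

```java
import java.util.Properties;

public class PusherSelectionSketch
{
  // Simplified illustration: if the process never sees
  // druid.storage.type=hdfs, the default ("local") wins and the segment
  // path is never made HDFS-safe.
  static String selectPusher(Properties props)
  {
    String type = props.getProperty("druid.storage.type", "local");
    return "hdfs".equals(type) ? "HdfsDataSegmentPusher" : "LocalDataSegmentPusher";
  }

  public static void main(String[] args)
  {
    System.out.println(selectPusher(new Properties())); // LocalDataSegmentPusher
  }
}
```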

jon-wei commented Dec 6, 2019

It sounds like the config is specifying druid.storage.type=local somewhere; is it possible that there's a stray/stale entry for that in your runtime properties?

When the Druid processes start up, they'll log their configuration properties; for the indexer process, do you see it using druid.storage.type=hdfs at that point?

sixtus commented Dec 9, 2019 via email

sixtus commented Dec 9, 2019 via email

jon-wei added this to the 0.17.0 milestone on Dec 10, 2019
jon-wei commented Dec 10, 2019

@sixtus I was able to reproduce this. It appears to be an issue where the Hadoop mapper or reducer is not picking up the druid.storage.type property correctly; I will look into fixing this for 0.17.0.
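
The gist is that properties set on the indexer JVM are not automatically visible inside the YARN containers that run the mappers and reducers; they have to be carried into the Hadoop job Configuration. A sketch of that idea (an assumption about the mechanism, not the actual change in #9059):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobPropertyPropagationSketch
{
  // Copy the Druid storage properties into the Hadoop job Configuration so
  // the remote map/reduce tasks can read them; otherwise they fall back to
  // defaults such as druid.storage.type=local.
  public static Job configureJob() throws IOException
  {
    Configuration conf = new Configuration();
    conf.set("druid.storage.type", "hdfs");
    conf.set("druid.storage.storageDirectory", "hdfs://us2/druid/indexer");
    return Job.getInstance(conf, "druid-index-generator");
  }
}
```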

jon-wei commented Dec 17, 2019

@sixtus I've opened a fix PR for this: #9059
