
Druid processes not loading native hadoop and snappy libraries (with CDH) #3025

Closed
ottomata opened this issue May 26, 2016 · 7 comments

@ottomata

Over in https://groups.google.com/forum/#!searchin/druid-user/snappy/druid-user/WzaqgiM4SO4/7SR0-rU4CgAJ I described a problem I was having getting Druid to read snappy files out of HDFS. Note that I am using CDH 5.5.2 / Hadoop 2.6.0 for both hadoop-dependencies and for druid-hdfs-storage. I've done this by setting -Dhadoop.mapreduce.job.user.classpath.first=true in druid.indexer.runner.javaOpts, creating hadoop-dependencies/hadoop-client/cdh, and setting "hadoopDependencyCoordinates": ["org.apache.hadoop:hadoop-client:cdh"] in my index task .json. I've also swapped out the Hadoop 2.3.0 jars bundled with the druid-hdfs-storage extension for my 2.6.0 dependencies.
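For concreteness, the relevant bits of that config look roughly like this (abridged sketches; the rest of my javaOpts and task spec are omitted):

```
# middleManager runtime.properties
druid.indexer.runner.javaOpts=... -Dhadoop.mapreduce.job.user.classpath.first=true
```

```json
{
  "type" : "index_hadoop",
  "hadoopDependencyCoordinates" : ["org.apache.hadoop:hadoop-client:cdh"],
  ...
}
```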

As far as I can tell, both the Hadoop and Snappy native libs are loaded properly when I set LD_LIBRARY_PATH, since LD_LIBRARY_PATH is prepended to java.library.path.

I prepped some code to help me make sure I wasn’t doing something dumb:
https://gist.github.com/ottomata/6caf158d3b787a1c3439d936a1e28916#file-snappynativetest-java

I am able to load native hadoop and snappy using the same classpath and java.library.path that druid uses.
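The test is roughly along these lines (a simplified sketch, not necessarily the exact contents of the gist):

```java
import org.apache.hadoop.util.NativeCodeLoader;
import org.xerial.snappy.Snappy;

// Run with the same classpath and java.library.path as the Peon
// to see what actually loads.
public class SnappyNativeTest
{
  public static void main(String[] args) throws Exception
  {
    System.out.println("java.library.path = " + System.getProperty("java.library.path"));

    // True only if libhadoop.so was found and loaded from java.library.path.
    System.out.println("native hadoop loaded: " + NativeCodeLoader.isNativeCodeLoaded());

    try {
      // True only if that libhadoop.so was compiled with snappy support.
      System.out.println("libhadoop built with snappy: " + NativeCodeLoader.buildSupportsSnappy());
    }
    catch (UnsatisfiedLinkError e) {
      // Thrown if libhadoop itself never loaded.
      System.out.println("no native libhadoop: " + e);
    }

    // snappy-java loads its own native library (bundled in its jar) on first use.
    byte[] compressed = Snappy.compress("hello snappy".getBytes("UTF-8"));
    System.out.println("snappy-java round trip: " + Snappy.uncompressString(compressed));
  }
}
```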

At the bottom of this issue are the relevant middleManager logs that lead up to this error. In summary, I see:

  1. middleManager starts, uses /usr/lib/hadoop/lib/native (zookeeper too?)
  2. Peon indexing job starts, uses /usr/lib/hadoop/lib/native (zookeeper too?), but prints out 'Unable to load native-hadoop library for your platform… using builtin-java classes where applicable'
  3. YARN Hadoop indexing job is submitted and completes. I believe this writes a .snappy file somewhere into hdfs:///tmp/hadoop-indexing/…
  4. middleManager (or Peon task?) attempts to read the previously written snappy file. It errors out with 'native snappy library not available: this version of libhadoop was built without snappy support'.

So yeah, something is fishy with the Peon's java.library.path. Even though java.library.path is clearly set properly when the Peon starts up, the native shared libraries are never actually loaded, as indicated by the 'Unable to load native-hadoop library…' message.

Ultimately I was able to get around this by forcing the indexing task to write Gzip files in HDFS by setting "jobProperties" : {"mapreduce.output.fileoutputformat.compress": "org.apache.hadoop.io.compress.GzipCodec"} in my index task .json.
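In the task spec this lives under tuningConfig.jobProperties. Note that stock Hadoop 2.x normally splits output compression across a boolean key and a codec key, so a sketch would look like this (exact keys may vary with your Hadoop version):

```json
"tuningConfig" : {
  "type" : "hadoop",
  "jobProperties" : {
    "mapreduce.output.fileoutputformat.compress" : "true",
    "mapreduce.output.fileoutputformat.compress.codec" : "org.apache.hadoop.io.compress.GzipCodec"
  }
}
```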

Actual logs below. I’ve removed stuff that looked like noise. I see classpaths, extensions, and hadoop-dependencies all loading as expected.

...
2016-05-23T19:18:31,500 INFO io.druid.cli.CliMiddleManager: *                   java.library.path:/usr/lib/hadoop/lib/native:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
...
2016-05-23T19:18:32,700 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/usr/lib/hadoop/lib/native:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
...
2016-05-23T19:18:35,840 INFO org.eclipse.jetty.server.ServerConnector: Started ServerConnector@6685f71a{HTTP/1.1}{0.0.0.0:8091}
2016-05-23T19:18:35,844 INFO org.eclipse.jetty.server.Server: Started @25796ms
...
2016-05-23T19:19:40,744 INFO io.druid.cli.CliPeon: *                            java.library.path:/usr/lib/hadoop/lib/native:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
...
2016-05-23T19:19:42,894 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/usr/lib/hadoop/lib/native:/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
...

2016-05-23T19:19:43,745 INFO io.druid.indexing.worker.executor.ExecutorLifecycle: Running with task: {
  "type" : "index_hadoop",
  "id" : "index_hadoop_pageviews_2016-05-23T19:19:22.575Z",
...
2016-05-23T19:19:48,880 INFO org.eclipse.jetty.server.ServerConnector: Started ServerConnector@1371e566{HTTP/1.1}{0.0.0.0:8100}
2016-05-23T19:19:48,881 INFO org.eclipse.jetty.server.Server: Started @25487ms
...
2016-05-23T19:20:20,066 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl: Submitted application application_1463163743644_0030
...

2016-05-23T19:21:17,670 INFO io.druid.indexer.DetermineHashedPartitionsJob: Job completed, loading up partitions for intervals[Optional.of([2015-09-01T00:00:00.000Z/2015-09-02T00:00:00.000Z])].
2016-05-23T19:21:17,959 ERROR io.druid.indexing.overlord.ThreadPoolTaskRunner: Exception while running task[HadoopIndexTask{id=index_hadoop_pageviews_2016-05-23T19:19:22.575Z, type=index_hadoop, dataSource=pageviews}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
    at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:160) ~[druid-indexing-service-0.9.0.jar:0.9.0]
    at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:175) ~[druid-indexing-service-0.9.0.jar:0.9.0]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:338) [druid-indexing-service-0.9.0.jar:0.9.0]
...
Caused by: java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65) ~[?:?]
    at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:193) ~[?:?]
    at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:178) ~[?:?]
    at org.apache.hadoop.io.compress.CompressionCodec$Util.createInputStreamWithCodecPool(CompressionCodec.java:157) ~[?:?]
    at org.apache.hadoop.io.compress.SnappyCodec.createInputStream(SnappyCodec.java:163) ~[?:?]
    at io.druid.indexer.Utils.openInputStream(Utils.java:101) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
    at io.druid.indexer.Utils.openInputStream(Utils.java:77) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]

...
2016-05-23T19:21:18,084 INFO io.druid.indexing.worker.executor.ExecutorLifecycle: Task completed with status: {
  "id" : "index_hadoop_pageviews_2016-05-23T19:19:22.575Z",
  "status" : "FAILED",
  "duration" : 93849
}
drcrallen added the Bug label May 26, 2016
@drcrallen (Contributor)

Labeling this as a suspected bug. Native snappy (and other native libraries) should be available.

@HongZhu commented Mar 4, 2017

Any updates?
@ottomata - have you identified a workaround?

I believe that I have encountered the same issue while performing batch ingestion of Snappy compressed Avro files into Druid.

Here is the exception:
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1486340755723_533930_m_001362_0 - exited : org.xerial.snappy.SnappyNative.uncompressedLength(Ljava/lang/Object;II)I

For our Druid installation (druid-0.9.2.1), druid-avro-extensions ships with snappy-java-1.0.5.jar, which doesn't include native libraries for our hardware platform (x86_64). Native snappy support for Druid processes would be a good solution, just like the way snappy is integrated with Hadoop via installed native libraries.

@HongZhu commented Mar 11, 2017

FYI: recompiling the Druid codebase after upgrading the snappy-java dependency to at least 1.1.1.3 fixes the issue.

@HongZhu commented Mar 11, 2017

<dependency>
  <groupId>org.xerial.snappy</groupId>
  <artifactId>snappy-java</artifactId>
  <version>1.1.1.3</version>
  <type>jar</type>
  <scope>compile</scope>
</dependency>

@gurugv-zz

Replaced the snappy-java jar with http://central.maven.org/maven2/org/xerial/snappy/snappy-java/1.1.1.3/snappy-java-1.1.1.3.jar in:
./extensions/druid-avro-extensions/
./hadoop-dependencies/hadoop-client/2.7.3.2.6.1.0-129/
./extensions/druid-hdfs-storage

That worked for me (roughly as sketched below). What would be the recommended fix for this?
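For reference, a sketch of that swap (DRUID_HOME and the hadoop-client version directory are from this install; adjust to yours):

```sh
cd "$DRUID_HOME"
curl -LO http://central.maven.org/maven2/org/xerial/snappy/snappy-java/1.1.1.3/snappy-java-1.1.1.3.jar
# Drop the old snappy-java jar and put 1.1.1.3 everywhere Druid loads it from.
for d in extensions/druid-avro-extensions \
         extensions/druid-hdfs-storage \
         hadoop-dependencies/hadoop-client/2.7.3.2.6.1.0-129 ; do
  rm -f "$d"/snappy-java-*.jar
  cp snappy-java-1.1.1.3.jar "$d"/
done
```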

@github-actions

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

github-actions bot added the stale label May 30, 2023
@github-actions

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Jun 27, 2023