Druid processes not loading native hadoop and snappy libraries (with CDH) #3025
Labeling this as a suspected bug. Native snappy (or any other) items should be available.
Any updates? I believe I have encountered the same issue while performing batch ingestion of Snappy-compressed Avro files into Druid. Here is the exception: For our Druid installation (druid-0.9.2.1), druid-avro-extensions ships with snappy-java-1.0.5.jar, which doesn't include the native libraries for our hardware platform (x86_64). Native snappy support for Druid processes would be a good solution, just like the way snappy is integrated with Hadoop via installed native libraries.
FYI: recompiling the Druid codebase with the snappy-java dependency upgraded to at least 1.1.1.3 fixes the issue.
The dependency in question: `org.xerial.snappy:snappy-java:1.1.1.3` (jar, compile scope).
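In pom.xml terms, the override described in the comment above would look something like this (the group/artifact/version and scope are taken from the comment; the surrounding element layout is standard Maven):

```xml
<dependency>
  <groupId>org.xerial.snappy</groupId>
  <artifactId>snappy-java</artifactId>
  <version>1.1.1.3</version>
  <type>jar</type>
  <scope>compile</scope>
</dependency>
```

Pinning this in the parent pom before rebuilding Druid replaces the bundled snappy-java-1.0.5.jar, which lacks native binaries for some platforms.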
Replacing the bundled jar with http://central.maven.org/maven2/org/xerial/snappy/snappy-java/1.1.1.3/snappy-java-1.1.1.3.jar worked for me.
This issue has been marked as stale due to 280 days of inactivity.
This issue has been closed due to lack of activity.
Over in https://groups.google.com/forum/#!searchin/druid-user/snappy/druid-user/WzaqgiM4SO4/7SR0-rU4CgAJ I described a problem I was having getting Druid to read Snappy-compressed files out of HDFS. Note that I am using CDH 5.5.2 / Hadoop 2.6.0 for both hadoop-dependencies and for druid-hdfs-storage. I've done this by:
- setting `-Dhadoop.mapreduce.job.user.classpath.first=true` in `druid.indexer.runner.javaOpts`,
- creating `hadoop-dependencies/hadoop-client/cdh`, and
- setting `"hadoopDependencyCoordinates": ["org.apache.hadoop:hadoop-client:cdh"]` in my index task .json.

I've swapped out the 2.3.0 jars hard-included with the druid-hdfs-storage extension for my 2.6.0 dependencies. As far as I can tell, both Hadoop and Snappy native libs are loaded properly when I set LD_LIBRARY_PATH. LD_LIBRARY_PATH is prepended to java.library.path.
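Concretely, the index task settings described above would look roughly like this (only the relevant key is shown; the rest of the task spec is omitted, and the coordinate string is the one from this report):

```json
{
  "type": "index_hadoop",
  "hadoopDependencyCoordinates": ["org.apache.hadoop:hadoop-client:cdh"]
}
```

with `-Dhadoop.mapreduce.job.user.classpath.first=true` appended to `druid.indexer.runner.javaOpts` in the middleManager's runtime.properties, so the Peon prefers the CDH jars over Druid's bundled Hadoop classes.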
I prepped some code to help me make sure I wasn’t doing something dumb:
https://gist.github.com/ottomata/6caf158d3b787a1c3439d936a1e28916#file-snappynativetest-java
I am able to load native hadoop and snappy using the same classpath and java.library.path that druid uses.
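A minimal, self-contained probe along the same lines as the gist (this is an illustrative sketch, not the gist code) just prints `java.library.path` and attempts to load the `snappy` and `hadoop` shared libraries the way the JVM would:

```java
// NativeLibCheck.java — sanity-check whether the JVM can resolve
// libsnappy/libhadoop from java.library.path. Run with the same
// java.library.path (and LD_LIBRARY_PATH) that the Peon uses.
public class NativeLibCheck {
    public static void main(String[] args) {
        System.out.println("java.library.path=" + System.getProperty("java.library.path"));
        for (String lib : new String[] {"snappy", "hadoop"}) {
            try {
                // System.loadLibrary maps "snappy" -> libsnappy.so on Linux
                System.loadLibrary(lib);
                System.out.println(lib + ": loaded");
            } catch (UnsatisfiedLinkError e) {
                System.out.println(lib + ": not found");
            }
        }
    }
}
```

If this prints `loaded` for both libraries under the Peon's exact `java.library.path` but the Peon itself still logs "Unable to load native-hadoop library", that points at something resetting or ignoring the property inside the task process rather than at the path itself.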
At the bottom of this issue are the relevant middleManager logs that lead up to this error. In summary I see:
So ja, something is fishy with the Peon’s java.library.path. Even though the java.library.path is clearly set properly when the Peon starts up, it does not register the shared library files, as indicated by the ‘Unable to load native-hadoop library…’ message.
Ultimately I was able to get around this by forcing the indexing task to write Gzip-compressed files in HDFS by setting
"jobProperties" : {"mapreduce.output.fileoutputformat.compress": "true", "mapreduce.output.fileoutputformat.compress.codec": "org.apache.hadoop.io.compress.GzipCodec"}
in my index task .json. Actual logs below. I've removed stuff that looked like noise. I see classpaths, extensions, and hadoop-dependencies all loading as expected.