
Update to guice-4.1.0. #3222

Merged
merged 1 commit into from Jul 18, 2016

Conversation

gianm
Contributor

@gianm gianm commented Jul 6, 2016

Fixes google/guice#757 among other issues. I tried out the quickstart (batch & streaming with tranquility) and it still worked after this change.

Possibly related: https://groups.google.com/d/msg/druid-user/i1oPmt9ltR8/Z1xAIk3uBgAJ

@gianm gianm added this to the 0.9.2 milestone Jul 6, 2016
@drcrallen
Contributor

What versions did you test with? I tried updating to guice 4.0 previously, but ran into #1628

@drcrallen
Contributor

drcrallen commented Jul 6, 2016

and related #1608 (comment)

@gianm
Contributor Author

gianm commented Jul 6, 2016

Just local mode, I didn't try a remote cluster. I'll see if we can give that a shot too.

@jon-wei
Contributor

jon-wei commented Jul 6, 2016

I tested this patch (on top of druid master) against a few hadoop remote clusters, using a batch ingestion task with S3 as both input and deep storage.

Running without hadoop.mapreduce.job.user.classpath.first=true configured on the middleManager, the ingestion task failed on the following hadoop versions:

sequenceIQ docker images: 
2.3.0 through 2.5.2
Error: com.google.inject.util.Types.collectionOf(Ljava/lang/reflect/Type;)Ljava/lang/reflect/ParameterizedType;

2.6.0, 2.7.1:
Error: class com.fasterxml.jackson.datatype.guava.deser.HostAndPortDeserializer overrides final method deserialize.(Lcom/fasterxml/jackson/core/JsonParser;Lcom/fasterxml/jackson/databind/DeserializationContext;)Ljava/lang/Object;

Amazon EMR:
task succeeded

cloudera 5.7.0.0 docker (2.6.0):
Error: class com.fasterxml.jackson.datatype.guava.deser.HostAndPortDeserializer overrides final method deserialize.(Lcom/fasterxml/jackson/core/JsonParser;Lcom/fasterxml/jackson/databind/DeserializationContext;)Ljava/lang/Object;

hortonworks 2.4 sandbox (2.7.1):
Error: class com.fasterxml.jackson.datatype.guava.deser.HostAndPortDeserializer overrides final method deserialize.(Lcom/fasterxml/jackson/core/JsonParser;Lcom/fasterxml/jackson/databind/DeserializationContext;)Ljava/lang/Object;

If I configure hadoop.mapreduce.job.user.classpath.first=true, then the ingestion task succeeds on all of those versions.

@drcrallen
Contributor

hadoop.mapreduce.job.user.classpath.first doesn't strike me as something we should be advocating by default.

@gianm
Contributor Author

gianm commented Jul 6, 2016

Added https://groups.google.com/d/msg/druid-user/i1oPmt9ltR8/Z1xAIk3uBgAJ to the PR description as possibly related.

@gianm
Contributor Author

gianm commented Jul 6, 2016

@drcrallen I believe not all of those hadoop versions work even without the guice upgrade. What's your reasoning behind not wanting to recommend hadoop.mapreduce.job.user.classpath.first?

@jon-wei how many of those hadoop versions work with stock Druid but without hadoop.mapreduce.job.user.classpath.first=true?

@jon-wei
Contributor

jon-wei commented Jul 6, 2016

@gianm

Running druid 0.9.1.1 without user.classpath.first=true:

SequenceIQ hadoop docker:

  • 2.3.0, 2.4.0, 2.4.1, 2.5.0, 2.5.1, 2.5.2 work
  • 2.6.0 and 2.7.1 failed with error:
Error: class com.fasterxml.jackson.datatype.guava.deser.HostAndPortDeserializer overrides final method deserialize.(Lcom/fasterxml/jackson/core/JsonParser;Lcom/fasterxml/jackson/databind/DeserializationContext;)Ljava/lang/Object;

EMR works.

Cloudera 5.7.0.0 (2.6.0) and Hortonworks Sandbox 2.4 (2.7.1) fail with

Error: class com.fasterxml.jackson.datatype.guava.deser.HostAndPortDeserializer overrides final method deserialize.(Lcom/fasterxml/jackson/core/JsonParser;Lcom/fasterxml/jackson/databind/DeserializationContext;)Ljava/lang/Object;

@drcrallen
Contributor

I started digging to try and find what's causing the conflict, but hadoop uses org.codehaus.jackson, not com.fasterxml.jackson.

This error should only be caused by something pulling in com.fasterxml.jackson.core:jackson-databind older than 2.4.0.

I went through and excluded all the things that were even referencing jackson.core: #3225. @jon-wei, can you try that branch and see if it fixes things?

Otherwise we need to figure out what's pulling in an old version of jackson-databind.
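
For reference, one way to see which jackson-databind a JVM actually picked up is a small probe along these lines (a minimal sketch; JacksonDatabindProbe is a hypothetical class name, and it assumes jackson-databind is resolvable in whatever JVM you run it on):

import java.io.File;
import java.util.Arrays;

import com.fasterxml.jackson.databind.cfg.PackageVersion;

public class JacksonDatabindProbe
{
  public static void main(String[] args)
  {
    // The jackson-databind version this JVM actually loaded.
    System.out.println("loaded jackson-databind: " + PackageVersion.VERSION);

    // Any jackson-databind jars visible on the flat classpath (there may be several).
    Arrays.stream(System.getProperty("java.class.path").split(File.pathSeparator))
          .filter(entry -> entry.contains("jackson-databind"))
          .forEach(System.out::println);
  }
}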

@jon-wei
Contributor

jon-wei commented Jul 7, 2016

@drcrallen Sure, I can give that a try.

@gianm
Contributor Author

gianm commented Jul 7, 2016

I think it's hadoop-aws pulling in jackson; most of the distros include that.

@gianm
Contributor Author

gianm commented Jul 7, 2016

I.e., the actual hadoop machines have com.fasterxml.jackson stuff on their classpath. I think their databind replaces the one Druid wants, but then Druid adds in the jackson guava module, which doesn't work with the databind provided by the hadoop machine. (That's also why setting user classpath first helps.)

That's my story and I'm sticking to it :)
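
A quick way to test that theory on a failing task JVM would be to print which jars the databind base classes and the guava-module deserializer were actually loaded from (a minimal sketch; WhoLoadedJackson is a hypothetical name, and it assumes both classes are resolvable in that JVM):

import java.security.CodeSource;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.datatype.guava.deser.HostAndPortDeserializer;

public class WhoLoadedJackson
{
  private static String jarOf(Class<?> clazz)
  {
    CodeSource src = clazz.getProtectionDomain().getCodeSource();
    return src == null ? "(unknown)" : src.getLocation().toString();
  }

  public static void main(String[] args)
  {
    // If the theory holds, these point at jars from different places:
    // databind from the hadoop machine's lib dirs, datatype-guava from the jars Druid ships.
    System.out.println("jackson-databind:       " + jarOf(ObjectMapper.class));
    System.out.println("jackson-datatype-guava: " + jarOf(HostAndPortDeserializer.class));
  }
}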

@drcrallen
Contributor

@gianm that sounds like a reasonable theory. In such a case the proper solution is for hadoop to not pollute the application classloader with extension jars. But, if mapreduce.job.user.classpath.pollute=false doesn't work then mapreduce.job.user.classpath.first=true is about the best alternative.

@drcrallen
Contributor

If hadoop-aws indeed turns out to be the culprit, then we should file an issue against https://issues.apache.org/jira/browse/HADOOP re: conflicting jackson classes caused by hadoop-aws. Related to https://issues.apache.org/jira/browse/HADOOP-12705 and https://issues.apache.org/jira/browse/HADOOP-11074

@jon-wei
Contributor

jon-wei commented Jul 7, 2016

@drcrallen The patch you provided runs into the same errors described in my last comment #3222 (comment)

@jon-wei
Contributor

jon-wei commented Jul 7, 2016

@gianm @drcrallen Not sure if jackson-databind is only used by hadoop-aws, but I do see it on the hadoop classpath on the 2.6.0+ versions that failed with stock druid 0.9.1.1

e.g.

HDP 2.4 sandbox:

# find / -name jackson-databind*
/var/lib/atlas/server/webapp/atlas/WEB-INF/lib/jackson-databind-2.2.3.jar
/usr/hdp/2.4.0.0-169/oozie/oozie-server/webapps/oozie/WEB-INF/lib/jackson-databind-2.2.3.jar
/usr/hdp/2.4.0.0-169/oozie/lib/jackson-databind-2.2.3.jar
/usr/hdp/2.4.0.0-169/oozie/libtools/jackson-databind-2.2.3.jar
/usr/hdp/2.4.0.0-169/oozie/share/lib/spark/jackson-databind-2.2.3.jar
/usr/hdp/2.4.0.0-169/oozie/share/lib/hive/jackson-databind-2.2.3.jar
/usr/hdp/2.4.0.0-169/oozie/share/lib/oozie/jackson-databind-2.2.3.jar
/usr/hdp/2.4.0.0-169/oozie/share/lib/pig/jackson-databind-2.2.3.jar
/usr/hdp/2.4.0.0-169/oozie/share/lib/distcp/jackson-databind-2.2.3.jar
/usr/hdp/2.4.0.0-169/oozie/share/lib/hive2/jackson-databind-2.2.3.jar
/usr/hdp/2.4.0.0-169/oozie/share/lib/mapreduce-streaming/jackson-databind-2.2.3.jar
/usr/hdp/2.4.0.0-169/oozie/share/lib/sqoop/jackson-databind-2.2.3.jar
/usr/hdp/2.4.0.0-169/oozie/share/lib/hcatalog/jackson-databind-2.2.3.jar
/usr/hdp/2.4.0.0-169/oozie/libserver/jackson-databind-2.2.3.jar
/usr/hdp/2.4.0.0-169/atlas/bridge/hive/jackson-databind-2.2.3.jar
/usr/hdp/2.4.0.0-169/hadoop/lib/jackson-databind-2.2.3.jar
/usr/hdp/2.4.0.0-169/falcon/client/lib/jackson-databind-2.2.3.jar
/usr/hdp/2.4.0.0-169/falcon/webapp/falcon/WEB-INF/lib/jackson-databind-2.2.3.jar
/usr/hdp/2.4.0.0-169/hadoop-yarn/lib/jackson-databind-2.2.3.jar
/usr/hdp/2.4.0.0-169/slider/lib/jackson-databind-2.2.3.jar
/hadoop/yarn/local/filecache/143/mapreduce.tar.gz/hadoop/share/hadoop/yarn/lib/jackson-databind-2.2.3.jar
/hadoop/yarn/local/filecache/143/mapreduce.tar.gz/hadoop/share/hadoop/common/lib/jackson-databind-2.2.3.jar


# hadoop classpath
/usr/hdp/2.4.0.0-169/hadoop/conf:/usr/hdp/2.4.0.0-169/hadoop/lib/*:/usr/hdp/2.4.0.0-169/hadoop/.//*:/usr/hdp/2.4.0.0-169/hadoop-hdfs/./:/usr/hdp/2.4.0.0-169/hadoop-hdfs/lib/*:/usr/hdp/2.4.0.0-169/hadoop-hdfs/.//*:/usr/hdp/2.4.0.0-169/hadoop-yarn/lib/*:/usr/hdp/2.4.0.0-169/hadoop-yarn/.//*:/usr/hdp/2.4.0.0-169/hadoop-mapreduce/lib/*:/usr/hdp/2.4.0.0-169/hadoop-mapreduce/.//*::mysql-connector-java-5.1.17.jar:mysql-connector-java-5.1.31-bin.jar:mysql-connector-java.jar:/usr/hdp/2.4.0.0-169/tez/*:/usr/hdp/2.4.0.0-169/tez/lib/*:/usr/hdp/2.4.0.0-169/tez/conf

cloudera

# hadoop classpath
/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*

# find / -name jackson-databind*
/var/lib/sqoop2/tomcat-deployment/webapps/sqoop/WEB-INF/lib/jackson-databind-2.3.1.jar
/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/root/appcache/application_1467855701658_0001/container_1467855701658_0001_01_000002/jackson-databind-2.4.6.jar
/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/root/appcache/application_1467855701658_0001/container_1467855701658_0001_01_000001/jackson-databind-2.4.6.jar
/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/filecache/98/jackson-databind-2.4.6.jar
/usr/share/cmf/common_jars/jackson-databind-2.1.0.jar
/usr/share/cmf/cloudera-navigator-audit-server/jackson-databind-2.1.0.jar
/usr/share/cmf/lib/jackson-databind-2.1.0.jar
/usr/share/cmf/cloudera-navigator-server/libs/cdh5/jackson-databind-2.1.0.jar
/usr/share/cmf/cloudera-navigator-server/jars/jackson-databind-2.1.0.jar
/usr/lib/search/lib/jackson-databind-2.3.1.jar
/usr/lib/search/lib/search-crunch/jackson-databind-2.3.1.jar
/usr/lib/sqoop2/webapps/sqoop/WEB-INF/lib/jackson-databind-2.3.1.jar
/usr/lib/hadoop-mapreduce/jackson-databind-2.2.3.jar
/usr/lib/whirr/lib/jackson-databind-2.1.0.jar
/usr/lib/parquet/lib/jackson-databind-2.2.3.jar
/usr/lib/hive/lib/jackson-databind-2.2.2.jar
/usr/lib/hadoop/client/jackson-databind-2.2.3.jar
/usr/lib/hadoop/client/jackson-databind.jar
/usr/lib/hbase-solr/lib/jackson-databind-2.3.1.jar
/usr/lib/impala/lib/jackson-databind-2.2.3.jar
/usr/lib/sentry/lib/jackson-databind-2.2.2.jar
/usr/lib/hbase/lib/jackson-databind-2.2.3.jar
/usr/lib/oozie/libserver/jackson-databind-2.2.2.jar
/usr/lib/oozie/oozie-sharelib-yarn/lib/spark/jackson-databind-2.2.3.jar
/usr/lib/oozie/oozie-sharelib-yarn/lib/hive/jackson-databind-2.2.2.jar
/usr/lib/oozie/oozie-sharelib-yarn/lib/hcatalog/jackson-databind-2.2.2.jar
/usr/lib/oozie/oozie-sharelib-yarn/lib/sqoop/jackson-databind-2.3.1.jar
/usr/lib/oozie/libtools/jackson-databind-2.2.2.jar
/usr/lib/oozie/oozie-sharelib-mr1/lib/spark/jackson-databind-2.2.3.jar
/usr/lib/oozie/oozie-sharelib-mr1/lib/hive/jackson-databind-2.2.2.jar
/usr/lib/oozie/oozie-sharelib-mr1/lib/hcatalog/jackson-databind-2.2.2.jar
/usr/lib/oozie/oozie-sharelib-mr1/lib/sqoop/jackson-databind-2.3.1.jar
/usr/lib/hadoop-0.20-mapreduce/lib/jackson-databind-2.2.3.jar
/usr/lib/sqoop/lib/jackson-databind-2.3.1.jar
/usr/lib/kite/lib/jackson-databind-2.3.1.jar
/usr/lib/flume-ng/lib/jackson-databind-2.3.1.jar
/usr/jars/jackson-databind-2.1.0.jar
/usr/jars/jackson-databind-2.2.3.jar
/usr/jars/jackson-databind-2.3.1.jar
/usr/jars/jackson-databind-2.2.2.jar

For the later sequenceIQ generic hadoop images (2.6.0+), jackson-databind is stored at:

bash-4.1# find / -name jackson-databind*
/usr/local/hadoop-2.7.1/share/hadoop/tools/lib/jackson-databind-2.2.3.jar

and I had to add that directory to the classpath on the hadoop side to use S3 successfully.

@gianm
Contributor Author

gianm commented Jul 13, 2016

On https://hadoop.apache.org/docs/r2.7.1/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml I see a property mapreduce.job.classloader (defaults to false) that might be a nicer way of dealing with this. I suggest the following:

  • no matter what, we should update guice
  • we should document mapreduce.job.classloader = true as a good default setting for druid jobs, assuming it actually solves the problems
  • if mapreduce.job.classloader = true doesn't solve the problems, we should document mapreduce.job.user.classpath.first = true as a workaround that people can use if they run into problems with guice/jackson/guava whatever

And after doing that we should also look into updating Jackson :) [in a separate PR]
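
At the Hadoop-configuration level, the recommendation above amounts to setting the following job properties (a minimal sketch, not Druid code; in practice these keys would be supplied through the Hadoop indexing task's jobProperties or cluster-side config rather than programmatically, and the class name is illustrative):

import org.apache.hadoop.conf.Configuration;

public class JobClassloaderSettings
{
  public static Configuration withIsolatedClassloader(Configuration conf)
  {
    // Preferred: have Hadoop run the job's classes in an isolated classloader.
    conf.setBoolean("mapreduce.job.classloader", true);

    // Fallback if the isolated classloader causes trouble:
    // conf.setBoolean("mapreduce.job.classloader", false);
    // conf.setBoolean("mapreduce.job.user.classpath.first", true);
    return conf;
  }
}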

@drcrallen
Contributor

@gianm Looks like that setting has been around since at least 2.4.1.

@jon-wei is it reasonable to test if mapreduce.job.classloader=true fixes things at least as well as mapreduce.job.user.classpath.first=true?

@drcrallen
Contributor

If so, then I think @gianm's comments in #3222 (comment) are the best approach we have right now.

@jon-wei
Contributor

jon-wei commented Jul 13, 2016

@drcrallen @gianm

Ran this again with mapreduce.job.classloader=true and without the mapreduce.job.user.classpath.first setting.

SequenceIQ hadoop 2.3.0: fails with:

Error: java.lang.RuntimeException: readObject can't find class
  at org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit.readClass(TaggedInputSplit.java:136)
  at org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit.readFields(TaggedInputSplit.java:122)
  at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
  at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
  at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:371)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:403)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.ClassNotFoundException: Class io.druid.indexer.DetermineHashedPartitionsJob$DetermineCardinalityMapper not found
  at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1788)
  at org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit.readClass(TaggedInputSplit.java:134)
  ... 11 more

SequenceIQ 2.4.0 to 2.5.2, fails with:

Error: java.lang.ClassNotFoundException: javax.validation.Validator
  at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
  at org.apache.hadoop.yarn.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:158)
  at org.apache.hadoop.yarn.util.ApplicationClassLoader.loadClass(ApplicationClassLoader.java:126)
  at io.druid.guice.ConfigModule.configure(ConfigModule.java:39)
  at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:340)
  at com.google.inject.spi.Elements.getElements(Elements.java:110)
  at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:138)
  at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:104)
  at com.google.inject.Guice.createInjector(Guice.java:99)
  at com.google.inject.Guice.createInjector(Guice.java:73)
  at io.druid.guice.GuiceInjectors.makeStartupInjector(GuiceInjectors.java:59)
  at io.druid.indexer.HadoopDruidIndexerConfig.<clinit>(HadoopDruidIndexerConfig.java:99)
  at io.druid.indexer.HadoopDruidIndexerMapper.setup(HadoopDruidIndexerMapper.java:48)
  at io.druid.indexer.DetermineHashedPartitionsJob$DetermineCardinalityMapper.setup(DetermineHashedPartitionsJob.java:223)
  at io.druid.indexer.DetermineHashedPartitionsJob$DetermineCardinalityMapper.run(DetermineHashedPartitionsJob.java:281)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

SequenceIQ 2.6.0, 2.7.1 succeeds

Cloudera Quickstart succeeds

Hortonworks Sandbox 2.4 succeeds

EMR 4.7.1 succeeds


Looks like mapreduce.job.classloader=true works for Hadoop versions 2.6.0+.

Running now with both mapreduce.job.classloader=true and mapreduce.job.user.classpath.first=true...

@drcrallen
Contributor

@jon-wei Those are odd; it makes me think SequenceIQ did something weird to Hadoop classloading, especially since Hortonworks 2.4 works.

@jon-wei
Contributor

jon-wei commented Jul 13, 2016

@drcrallen Hortonworks 2.4 is based on hadoop 2.7.1, I believe.

@drcrallen
Contributor

Ah, I see.

@gianm
Contributor Author

gianm commented Jul 13, 2016

Based on those results I'm pretty happy with recommending mapreduce.job.classloader=true by default, and suggesting that on older Hadoop versions you might need mapreduce.job.user.classpath.first=true.

If that sounds good then I will update the docs on this PR.

@jon-wei
Contributor

jon-wei commented Jul 13, 2016

@gianm I just got some results from running with both mapreduce.job.classloader=true and mapreduce.job.user.classpath.first=true.

Having mapreduce.job.classloader=true still causes failures (same kind as before) on the < 2.6.0 versions I tried, so the user will probably have to set mapreduce.job.classloader=false and mapreduce.job.user.classpath.first=true for those older versions.
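
Put together, these results suggest a simple decision rule, roughly as sketched below (a hypothetical helper, not Druid code; note that VersionInfo reports the Hadoop version of the jars on the local classpath, which may differ from the remote cluster's version):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.VersionInfo;

public class ClassloaderWorkaround
{
  public static void apply(Configuration conf)
  {
    // e.g. "2.7.1", or a vendor string like "2.7.1.2.4.0.0-169"
    String[] v = VersionInfo.getVersion().split("\\.");
    int major = Integer.parseInt(v[0]);
    int minor = Integer.parseInt(v[1]);

    if (major > 2 || (major == 2 && minor >= 6)) {
      // Hadoop 2.6.0+: the isolated job classloader worked in the tests above.
      conf.setBoolean("mapreduce.job.classloader", true);
    } else {
      // Older Hadoop: fall back to preferring the user classpath.
      conf.setBoolean("mapreduce.job.classloader", false);
      conf.setBoolean("mapreduce.job.user.classpath.first", true);
    }
  }
}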

@gianm
Contributor Author

gianm commented Jul 17, 2016

@jon-wei @drcrallen I updated the hadoop docs in #3252 as discussed (see the section "Tip 2: Classloader modification on Hadoop").

@nishantmonu51
Member

👍

@drcrallen drcrallen merged commit 13d8d96 into apache:master Jul 18, 2016
@gianm gianm deleted the upgrade-guice branch July 18, 2016 20:18
gianm added a commit to implydata/druid-public that referenced this pull request Jul 22, 2016
@gianm gianm mentioned this pull request Sep 23, 2016
seoeun25 pushed a commit to seoeun25/incubator-druid that referenced this pull request Jan 10, 2020