
ADAM version 0.19.0 will not run on Spark version 2.0.0 #1093

Closed
heuermh opened this Issue Jul 29, 2016 · 11 comments

@heuermh
Member

heuermh commented Jul 29, 2016

We knew this would happen. With Spark version 2.0.0 installed from Homebrew and ADAM version 0.19.0 installed from Homebrew Science:

$ spark-submit --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/

Branch 
Compiled by user jenkins on 2016-07-19T21:16:09Z
Revision 
Url 
Type --help for more information.

$ adam-submit --version
Using ADAM_MAIN=org.bdgenomics.adam.cli.ADAMMain
Using SPARK_SUBMIT=/usr/local/bin/spark-submit
java.lang.NoClassDefFoundError: org/apache/spark/Logging
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:225)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:686)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 19 more
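For context, this `NoClassDefFoundError` is the canonical Spark 1.x/2.x binary break: `org.apache.spark.Logging` was removed from Spark's public API in 2.0 (it became internal), so any jar compiled against Spark 1.x that loads it fails at class-load time, exactly as above. A minimal shell sketch of the classpath difference (the directories below are illustrative stand-ins for extracted jar contents, not real Spark layouts on disk):

```shell
# ADAM 0.19.0 was compiled against Spark 1.x and references
# org/apache/spark/Logging.class, which Spark 2.0.0 no longer ships.
check_logging() {
  # succeeds iff the "extracted jar" directory ships the Spark 1.x Logging class
  [ -f "$1/org/apache/spark/Logging.class" ]
}

spark1=$(mktemp -d)
spark2=$(mktemp -d)
mkdir -p "$spark1/org/apache/spark" "$spark2/org/apache/spark"
touch "$spark1/org/apache/spark/Logging.class"   # present only in Spark 1.x

check_logging "$spark1" && echo "Spark 1.x: Logging present, ADAM 0.19.0 loads"
check_logging "$spark2" || echo "Spark 2.0.0: Logging missing -> NoClassDefFoundError"
```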

ADAM git HEAD (version 0.19.1-SNAPSHOT) runs OK:

$ ./bin/adam-submit --version
Using ADAM_MAIN=org.bdgenomics.adam.cli.ADAMMain
Using SPARK_SUBMIT=/usr/local/bin/spark-submit

       e         888~-_          e             e    e
      d8b        888   \        d8b           d8b  d8b
     /Y88b       888    |      /Y88b         d888bdY88b
    /  Y88b      888    |     /  Y88b       / Y88Y Y888b
   /____Y88b     888   /     /____Y88b     /   YY   Y888b
  /      Y88b    888_-~     /      Y88b   /          Y888b

ADAM version: 0.19.1-SNAPSHOT
Commit: c23aace7b1739e29935cd3f46781ae0befc0a756 Build: 2016-07-29
Built for: Scala 2.10 and Hadoop 2.6.0

Note that the Spark binary tarball used by Homebrew was built against Scala 2.11 and Hadoop 2.7
https://www.apache.org/dyn/closer.lua?path=spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz

whereas we still build for Hadoop 2.6.0 by default and use the Scala 2.10 binary tarball for the Homebrew Science formula.
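The mismatch can be stated as a tiny compatibility check (the function and version variables below are illustrative, not ADAM tooling): Scala minor versions are not binary compatible, so a Scala 2.10 ADAM build cannot run on a Scala 2.11 Spark.

```shell
# Sketch of the version matrix at play: the Spark 2.0.0 binary tarball is
# built for Scala 2.11, while the Homebrew Science ADAM 0.19.0 formula ships
# a Scala 2.10 build, so the pair cannot work together.
compatible() {
  [ "$1" = "$2" ] && echo "compatible" || echo "mismatch: Spark=$1 ADAM=$2"
}

spark_scala=2.11   # spark-2.0.0-bin-hadoop2.7.tgz
adam_scala=2.10    # ADAM 0.19.0 from Homebrew Science
compatible "$spark_scala" "$adam_scala"   # prints "mismatch: Spark=2.11 ADAM=2.10"
```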

@heuermh
Member

heuermh commented Aug 13, 2016

Due to this we're in a catch-22 for Homebrew; see Homebrew/homebrew-science#3849.

@fnothaft @jpdna Should we cherry-pick the logging fix to the 0.19.0 release branches and cut a 0.19.1?

@tomwhite
Member

tomwhite commented Aug 23, 2016

@heuermh that would be very useful, as we're seeing more people who want to use ADAM with Spark 2.0. (broadinstitute/gatk#2073 is one example; there are others.)

@fnothaft
Member

fnothaft commented Aug 23, 2016

Hi @tomwhite! OOC, the GATK ticket you cross-referenced looks like a Scala 2.11 issue, not a Spark 2.0 issue. Did I misread?

Not that I disagree with getting Spark 2.0 support up and going! However, we run CI on both Scala 2.10 and 2.11, so if our CI is missing a Scala 2.11 bug, we must have a test gap somewhere; hence my confusion!

@fnothaft
Member

fnothaft commented Aug 23, 2016

> @fnothaft @jpdna Should we cherry pick the logging fix to the 0.19.0 release branches and cut a 0.19.1?

I am OK with that. However, we have one more logging fix to drop in! I will open a PR momentarily.

@tomwhite
Member

tomwhite commented Aug 24, 2016

Thanks @fnothaft. However, it looks like it's still a Spark 2.0 issue (see my comment on the GATK ticket). So a 0.19.1 release would be very welcome!

@fnothaft
Member

fnothaft commented Aug 24, 2016

@tomwhite thanks for pinging back! After you pinged the ticket yesterday, I started looking into this with ADAM ToT, and we have a few areas where we run into issues with Spark 2. However, they're mostly straightforward; I've been hacking through them and have them pretty close to resolved. I'll open a PR to support Spark 2 on ADAM today. The fixes need an upstream change (bigdatagenomics/utils#77), but that is fairly straightforward.

From a distribution perspective, there's no way to distribute an ADAM binary that works for both Spark 1.x and 2.x because of underlying changes to the Spark APIs (sneaky ones, too; I spent an hour yesterday chasing hard-to-trace IncompatibleClassChangeErrors), so my plan is to have the main org.bdgenomics.adam:adam-{core,apis,cli}_2.1[0,1] artifacts stay on Spark 1.x, and to then also push org.bdgenomics.adam:adam-{core,apis,cli}-spark2_2.1[0,1] artifacts up to Maven. I assume this would work fine on your end?

Otherwise, I think we're planning to move to Spark 2 and Scala 2.11 as our default development versions in the new year, after we tag a 1.0.0 release.
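From a downstream POM, the split fnothaft proposes would look roughly like this (the `-spark2` coordinates are the proposal above, not yet published at this point in the thread, and the version number is illustrative):

```xml
<!-- Spark 1.x build (existing coordinates) -->
<dependency>
  <groupId>org.bdgenomics.adam</groupId>
  <artifactId>adam-core_2.10</artifactId>
  <version>0.19.1</version>
</dependency>

<!-- Spark 2.x build (proposed coordinates) -->
<dependency>
  <groupId>org.bdgenomics.adam</groupId>
  <artifactId>adam-core-spark2_2.11</artifactId>
  <version>0.19.1</version>
</dependency>
```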

@heuermh
Member

heuermh commented Aug 24, 2016

That proposal looks reasonable to me, though we should check with other projects downstream of Spark to see what they plan to do with artifact naming. Is there any advice from the Spark team?

fnothaft added a commit to fnothaft/adam that referenced this issue Aug 24, 2016

[ADAM-1093] Move to support Spark 2.0.0.
Relies on bigdatagenomics/utils#78. Resolves #1093:

* Clean up logging in VariantContextRDD
* Add move_to_spark_2.sh script and CI hooks
@heuermh
Member

heuermh commented Aug 24, 2016

Posted question on artifact name/qualifiers to #apache-spark on IRC and dev@spark.apache.org.

@fnothaft
Member

fnothaft commented Aug 25, 2016

@heuermh did we get any resolution RE: artifact name?

@heuermh
Member

heuermh commented Aug 25, 2016

No, other than advice to try <qualifier> instead of embedding spark2 in the artifact name.

Since <qualifier> is also used by the Maven release plugin for the javadoc and sources artifacts, I'm hesitant to mess with it.
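The slot being discussed here is what a dependency declaration calls a classifier; a sketch of the collision heuermh is worried about (coordinates illustrative):

```xml
<!-- A classifier-based variant would share the slot the Maven release
     plugin already uses for the javadoc and sources artifacts: -->
<dependency>
  <groupId>org.bdgenomics.adam</groupId>
  <artifactId>adam-core_2.10</artifactId>
  <version>0.19.1</version>
  <classifier>spark2</classifier> <!-- vs. the conventional javadoc, sources -->
</dependency>
```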

@fnothaft
Member

fnothaft commented Aug 25, 2016

If all is OK with you then, perhaps let's proceed with the Spark 2 changes upstream? Can you merge bigdatagenomics/utils#78?

@heuermh heuermh modified the milestone: 0.20.0 Sep 7, 2016

@heuermh heuermh referenced this issue Sep 7, 2016

Closed

Release ADAM version 0.20.0 #1048

47 of 61 tasks complete

fnothaft added a commit to fnothaft/adam that referenced this issue Sep 7, 2016

[ADAM-1093] Move to support Spark 2.0.0.
Relies on bigdatagenomics/utils#78. Resolves #1093:

* Clean up logging in VariantContextRDD
* Add move_to_spark_2.sh script and CI hooks

fnothaft added a commit to fnothaft/adam that referenced this issue Sep 8, 2016

[ADAM-1093] Move to support Spark 2.0.0.
Relies on bigdatagenomics/utils#78. Resolves #1093:

* Clean up logging in VariantContextRDD
* Add move_to_spark_2.sh script and CI hooks

@heuermh heuermh closed this in #1123 Sep 8, 2016
