Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-12807] [YARN] Spark External Shuffle not working in Hadoop clusters with Jackson 2.2.3 #10780

Conversation

steveloughran
Copy link
Contributor

Patch to

  1. Shade jackson 2.x in spark-yarn-shuffle JAR: core, databind, annotation
  2. Use maven antrun to verify the JAR has the renamed classes

Being Maven-based, I don't know if the verification phase kicks in on an SBT/jenkins build. It will on a mvn install

@steveloughran
Copy link
Contributor Author

Updated patch which has shading of com.fasterxml.jackson -> org.spark-project.com.fasterxml.jackson

There's a test for this working; it uses the ant plugin to probe the JAR for .class files. It works, and I've verified that it stops working if you don't have the classes in. This validation runs in the verify phase, immediately before the install action: you can't install this package without the shading working. Accordingly: its a regression test on the feature.

This test does not (obviously) verify that this fixes the conflict between YARN NM and Shuffle. This is going to need external tests

@steveloughran steveloughran changed the title [SPARK-12807] [YARN] [WIP] Spark External Shuffle not working in Hadoop clusters with Jackson 2.2.3 [SPARK-12807] [YARN] Spark External Shuffle not working in Hadoop clusters with Jackson 2.2.3 Jan 16, 2016
@SparkQA
Copy link

SparkQA commented Jan 16, 2016

Test build #49508 has finished for PR 10780 at commit 94d68a5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 16, 2016

Test build #49510 has finished for PR 10780 at commit c365551.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@steveloughran
Copy link
Contributor Author

Jenkins, retest this please

@SparkQA
Copy link

SparkQA commented Jan 16, 2016

Test build #49522 has finished for PR 10780 at commit c365551.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@steveloughran
Copy link
Contributor Author

The latest patch

  1. relocates everything except leveldb. Leveldb uses JNI stuff, which doesn't relocate, and on JVMs with multiple classloaders, can still only be loaded once. Jackson, netty and javax.annotation all go under org.spark-project.
  2. Added a leveldb-provided profile which excludes leveldb from both the shuffle and hadoop-hdfs dependencies. (The latter could be excluded in the root spark POM; it's server-side only).

When built with leveldb-provided, the relocation stops version conflict in simple java classes; leveldb will be needed to be added to the NM classpath —Hadoop 2.6+(?) certainly does that.

Maybe instead of leveldb-provided, the profile could just become hadoop-2.6, which is already in the top-level

@SparkQA
Copy link

SparkQA commented Jan 19, 2016

Test build #49698 has finished for PR 10780 at commit 1813955.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 20, 2016

Test build #49736 has finished for PR 10780 at commit 98f18aa.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@steveloughran steveloughran force-pushed the stevel/patches/SPARK-12807-master-shuffle branch from 98f18aa to 0ae0955 Compare February 3, 2016 19:56
@SparkQA
Copy link

SparkQA commented Feb 3, 2016

Test build #50675 has finished for PR 10780 at commit 0ae0955.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor

vanzin commented Feb 4, 2016

@steveloughran given the discussion on the bug, can you remove shading for anything but Jackson?

In particular, shading the javax annotations is wrong because that would probably break tools that rely on them to do checks.

@steveloughran
Copy link
Contributor Author

Can do

@steveloughran
Copy link
Contributor Author

I've just pushed out a new version which leaves only jackson and guava as the relocations: the ones that are most trouble. I could turn off the guava relocate, though that introduces a risk of the shuffle failing if the hadoop bundled guava 11.x gets picked up first. The hadoop codebase is designed to work with later guavas, so can cope with the spark one being picked up.

leveldb can be omitted with a Pleveldb-provided profile; this is disabled by default (i.e. leveldb is bundled)

@steveloughran steveloughran force-pushed the stevel/patches/SPARK-12807-master-shuffle branch from 0ae0955 to 43d2b34 Compare February 5, 2016 12:47
@SparkQA
Copy link

SparkQA commented Feb 5, 2016

Test build #50815 has finished for PR 10780 at commit 43d2b34.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

<relocations>
<relocation>
<pattern>com.fasterxml.jackson</pattern>
<shadedPattern>org.spark-project.com.fasterxml.jackson</shadedPattern>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the shade property you created here too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one is {{org/spark-project}}, the other {{org.spark-project}}; you'd have to do s/r renaming to automatically derive one from the other. With hindsight, it's a common enough use case we should have added it to Ant 15 years ago, but its too late now.

@steveloughran steveloughran force-pushed the stevel/patches/SPARK-12807-master-shuffle branch from 43d2b34 to 947a798 Compare February 8, 2016 12:08
@SparkQA
Copy link

SparkQA commented Feb 8, 2016

Test build #50918 has finished for PR 10780 at commit 947a798.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

<relocation>
<pattern>com.fasterxml.jackson</pattern>
<shadedPattern>org.spark-project.com.fasterxml.jackson</shadedPattern>
<includes>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: I wonder if this is needed; we have it for Jetty but not for Guava, and both end up being relocated the same way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maven shading is a dark mystery to me. I don't understand what it does; I don't trust it. What you see there is cargo-cult cut and paste backed by a bit of ant to verify the final outcome of the C&P matches what I want.

@vanzin
Copy link
Contributor

vanzin commented Feb 8, 2016

LGTM.

@vanzin
Copy link
Contributor

vanzin commented Feb 9, 2016

Merging this.

asfgit pushed a commit that referenced this pull request Feb 9, 2016
…ters with Jackson 2.2.3

Patch to

1. Shade jackson 2.x in spark-yarn-shuffle JAR: core, databind, annotation
2. Use maven antrun to verify the JAR has the renamed classes

Being Maven-based, I don't know if the verification phase kicks in on an SBT/jenkins build. It will on a `mvn install`

Author: Steve Loughran <stevel@hortonworks.com>

Closes #10780 from steveloughran/stevel/patches/SPARK-12807-master-shuffle.

(cherry picked from commit 34d0b70)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
@asfgit asfgit closed this in 34d0b70 Feb 9, 2016
@steveloughran
Copy link
Contributor Author

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants