Allow a single published artifact to work with multiple Hadoop versions #58

JoshRosen · 2015-08-27T21:46:02Z

This commit allows us to publish one spark-redshift artifact which is built against a fixed Hadoop version but which works with both Hadoop 1.x and 2.x. In the past, we published separate artifacts for Hadoop 1.x and 2.x in order to work around a binary incompatibility in TaskAttemptContext (see #19). This patch works around the incompatibility using reflection, similar to apache/spark#6599.

In order to make this testable, this patch also modifies our SBT build and Travis configuration so that the test Spark and Hadoop dependencies can be configured separately from the compile dependencies.
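As background on the reflection workaround described above: `TaskAttemptContext` is a class in Hadoop 1.x and an interface in Hadoop 2.x, so a call compiled directly against one version can fail at runtime against the other. Looking the method up by name at runtime avoids that. The following is only a minimal sketch of the idea, not the code from this patch: `HadoopCompat` and `FakeTaskAttemptContext` are hypothetical names, and the stand-in class replaces the real Hadoop type so the example runs without Hadoop on the classpath.

```java
import java.lang.reflect.Method;

public class HadoopCompat {

    // Hypothetical stand-in for Hadoop's TaskAttemptContext, used only so
    // this sketch is self-contained. The real type is a class in Hadoop 1.x
    // and an interface in Hadoop 2.x.
    public static class FakeTaskAttemptContext {
        private final String id;

        public FakeTaskAttemptContext(String id) {
            this.id = id;
        }

        public String getTaskAttemptID() {
            return id;
        }
    }

    // Looks up getTaskAttemptID by name on the runtime class and invokes it.
    // Because no direct call is compiled in, the bytecode does not commit to
    // either a class-style or an interface-style method invocation.
    public static Object getTaskAttemptID(Object context) {
        try {
            Method method = context.getClass().getMethod("getTaskAttemptID");
            return method.invoke(context);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "Could not call getTaskAttemptID reflectively", e);
        }
    }

    public static void main(String[] args) {
        Object context = new FakeTaskAttemptContext("attempt_0001");
        System.out.println(getTaskAttemptID(context));
    }
}
```

The trade-off is that the compiler no longer checks the call, so a misspelled method name only surfaces at runtime. That is one reason to test against more than one Hadoop version.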

marmbrus · 2015-08-27T21:57:20Z

LGTM

codecov-io · 2015-08-27T22:18:39Z

Current coverage is `88.77%`

Merging #58 into master will increase coverage by +0.03% as of f3a4d84

@@            master     #58   diff @@
======================================
  Files           10      10       
  Stmts          391     392     +1
  Branches        93      93       
  Methods          0       0       
======================================
+ Hit            347     348     +1
  Partial          0       0       
  Missed          44      44

Review entire Coverage Diff as of f3a4d84

Powered by Codecov. Updated on successful CI builds.

JoshRosen · 2015-08-27T22:33:23Z

Merging this now.

This commit allows us to publish one `spark-avro` artifact which is built against a fixed Hadoop version but which works with both Hadoop 1.x and 2.x. In the past, we published separate artifacts for Hadoop 1.x and 2.x in order to work around a binary incompatibility in `TaskAttemptContext` (see #19). This patch works around the incompatibility using reflection, similar to apache/spark#6599.

In order to make this testable, this patch also modifies our SBT build and Travis configuration so that the test Hadoop dependencies can be configured separately from the compile dependency.

I made a similar fix in `spark-redshift`: databricks/spark-redshift#58

Author: Josh Rosen <joshrosen@databricks.com>

Closes #79 from JoshRosen/multiple-hadoop-version-support.

JoshRosen added 2 commits August 27, 2015 14:41

Fix compile dependency versions and vary test deps.

461cc47

Bump to 1.5.0-RC2

e9aa8cb

JoshRosen added the enhancement label Aug 27, 2015

JoshRosen added this to the 0.5 milestone Aug 27, 2015

JoshRosen force-pushed the hadoop-version-fixes branch from 5e16d4c to e9aa8cb Compare August 27, 2015 21:46

JoshRosen added 2 commits August 27, 2015 15:01

Only test against Hadoop 2.2.0 with Spark 1.4.1.

0ec7f6a

Use reflection to enable multi-Hadoop-version compatibility.

d2618b5

JoshRosen changed the title ~~[WIP] Allow a single published artifact to work with multiple Hadoop versions~~ Allow a single published artifact to work with multiple Hadoop versions Aug 27, 2015

Set spIgnoreProvided to allow sbt assembly to succeed.

f2f1d4a

JoshRosen closed this in c6cdca2 Aug 27, 2015

JoshRosen deleted the hadoop-version-fixes branch August 27, 2015 22:35

JoshRosen mentioned this pull request Aug 28, 2015

Allow a single published artifact to work with multiple Hadoop versions databricks/spark-avro#79

Closed

JoshRosen mentioned this pull request Oct 9, 2015

Test against multiple Spark and Hadoop versions in Travis; fix Hadoop 1.x incompatibility harsha2010/magellan#25

Merged
