Allow a single published artifact to work with multiple Hadoop versions #58

Closed
wants to merge 5 commits into from

JoshRosen
Contributor

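This commit allows us to publish one `spark-redshift` artifact that is built against a fixed Hadoop version but works with both Hadoop 1.x and 2.x. In the past, we published separate artifacts for Hadoop 1.x and 2.x to work around a binary incompatibility in `TaskAttemptContext` (see #19): it is a class in Hadoop 1.x but an interface in Hadoop 2.x, so bytecode that calls its methods directly and was compiled against one version fails with `IncompatibleClassChangeError` when run against the other. This patch works around the incompatibility using reflection, similar to apache/spark#6599.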
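A minimal sketch of the reflection workaround, assuming no Hadoop on the classpath: `Configuration`, `Hadoop1TaskAttemptContext`, `Hadoop2TaskAttemptContext`, `Hadoop2TaskAttemptContextImpl`, and `HadoopCompat.getConfiguration` are hypothetical stand-ins for illustration, not the code in this patch. Because `getConfiguration` is looked up on the runtime class of the context, the caller's bytecode contains no `invokevirtual` or `invokeinterface` instruction for it, so the same compiled call works whether the type is a class or an interface.

```scala
import java.lang.reflect.Method

// Hypothetical stand-in for org.apache.hadoop.conf.Configuration, so the sketch runs without Hadoop.
final class Configuration(val props: Map[String, String]) {
  def get(key: String): Option[String] = props.get(key)
}

// Hypothetical stand-in for Hadoop 1.x, where TaskAttemptContext is a concrete class:
// a direct `ctx.getConfiguration` call would compile to invokevirtual.
class Hadoop1TaskAttemptContext(conf: Configuration) {
  def getConfiguration: Configuration = conf
}

// Hypothetical stand-in for Hadoop 2.x, where TaskAttemptContext is an interface:
// a direct call would compile to invokeinterface.
trait Hadoop2TaskAttemptContext {
  def getConfiguration: Configuration
}

class Hadoop2TaskAttemptContextImpl(conf: Configuration) extends Hadoop2TaskAttemptContext {
  override def getConfiguration: Configuration = conf
}

object HadoopCompat {
  /** Resolves the public no-arg `getConfiguration` method on the context's runtime class
    * and invokes it, so the caller never commits to class-style or interface-style dispatch. */
  def getConfiguration(context: AnyRef): Configuration = {
    val method: Method = context.getClass.getMethod("getConfiguration")
    method.invoke(context).asInstanceOf[Configuration]
  }
}
```

Passing either stand-in returns the same `Configuration` instance. `getMethod` also finds inherited public methods, which matters with real Hadoop 1.x, where `getConfiguration` is declared on the `JobContext` superclass.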

In order to make this testable, this patch also modifies our SBT build and Travis configuration so that the test Spark and Hadoop dependencies can be configured separately from the compile dependencies.

@JoshRosen JoshRosen added this to the 0.5 milestone Aug 27, 2015
@marmbrus
Contributor

LGTM

@JoshRosen JoshRosen changed the title from "[WIP] Allow a single published artifact to work with multiple Hadoop versions" to "Allow a single published artifact to work with multiple Hadoop versions" Aug 27, 2015
@codecov-io

Current coverage is 88.77%

Merging #58 into master will increase coverage by +0.03% as of f3a4d84

@@            master     #58   diff @@
======================================
  Files           10      10       
  Stmts          391     392     +1
  Branches        93      93       
  Methods          0       0       
======================================
+ Hit            347     348     +1
  Partial          0       0       
  Missed          44      44       

@JoshRosen
Contributor Author

Merging this now.

@JoshRosen JoshRosen closed this in c6cdca2 Aug 27, 2015
@JoshRosen JoshRosen deleted the hadoop-version-fixes branch August 27, 2015 22:35
JoshRosen added a commit to databricks/spark-avro that referenced this pull request Aug 29, 2015
This commit allows us to publish one `spark-avro` artifact which is built against a fixed Hadoop version but which works with both Hadoop 1.x and 2.x. In the past, we published separate artifacts for Hadoop 1.x and 2.x in order to work around a binary incompatibility in `TaskAttemptContext` (see #19). This patch works around the incompatibility using reflection, similar to apache/spark#6599.

In order to make this testable, this patch also modifies our SBT build and Travis configuration so that the test Hadoop dependencies can be configured separately from the compile dependency.

I made a similar fix in `spark-redshift`: databricks/spark-redshift#58

Author: Josh Rosen <joshrosen@databricks.com>

Closes #79 from JoshRosen/multiple-hadoop-version-support.