[SPARK-10330] Use SparkHadoopUtil TaskAttemptContext reflection methods in more places #8499
Conversation
I discovered this issue while trying to extend our … I grepped to find all of the places that were using the non-reflective calls, but it's possible that I missed a callsite or two. I'm opening this PR now for testing and initial feedback.
Test build #41738 has finished for PR 8499 at commit
/cc @pwendell also.
My understanding is that, yes, without reflection this would not work at runtime with Hadoop 1.x. At least, that was what prompted the original change.
Yes, what Sean says. We may want to qualify that the "without hadoop" tarball is only compatible with Hadoop 2.x (and even if we fix the binary compatibility issues, some parts, such as the YARN backend, will only work on Hadoop 2.x anyway).
I'd like to pull this into branch-1.5 since this change will simplify certain compatibility checks for the …
The code looks fine, but do you want to pull this into 1.5.0? Wouldn't that mean a "-1" vote and a new RC?
(Just to clarify: pushing to branch-1.5 now means the change will make it into 1.5.0 if there's a new RC. If it's not really meant to go into 1.5.0, we should wait until the release vote passes before pushing to the branch.)
I'll add a Scalastyle rule to catch these patterns and will fix these cases.
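A rule like this is typically defined as a regex check in `scalastyle-config.xml`. The sketch below is hypothetical — the `customId`, regex, and message are illustrative placeholders, not the rule that was actually merged in the follow-up PR:

```xml
<!-- Hypothetical sketch: flag direct calls to the incompatible accessor and
     point contributors at the SparkHadoopUtil reflection helpers. The regex
     and message here are illustrative, not the merged rule. -->
<check customId="getconfiguration"
       level="error"
       class="org.scalastyle.file.RegexChecker"
       enabled="true">
  <parameters>
    <parameter name="regex">\.getConfiguration\(</parameter>
  </parameters>
  <customMessage>
    Use SparkHadoopUtil's reflective accessor instead of calling
    getConfiguration directly; the direct call is binary-incompatible
    across Hadoop 1.x and 2.x.
  </customMessage>
</check>
```

A regex-based check is a blunt instrument (it will also flag unrelated methods with the same name), which is why such rules usually come with an inline escape hatch for intentional uses.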
Regarding when to push to 1.5.x, my $0.02: in theory, if a change is suitable for 1.5.1 it's suitable for 1.5.0, and if it's too risky for 1.5.0, it's probably too risky for 1.5.1. In practice I appreciate there's an extra level of caution between RCs, but implementing that means either branching the branch so things can go into 1.5.1 without going into 1.5.0, or just holding back for a couple of days. The RC process is still taking weeks in general, so the holding-off approach is not that tenable.
This is a pretty common technique throughout the codebase and broken compatibility is pretty annoying for library writers, so I'd be inclined to include this in branch-1.5 (before the next RC is cut).
Talked to @rxin, going to merge. |
[SPARK-10330] Use SparkHadoopUtil TaskAttemptContext reflection methods in more places

SparkHadoopUtil contains methods that use reflection to work around TaskAttemptContext binary incompatibilities between Hadoop 1.x and 2.x. We should use these methods in more places.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #8499 from JoshRosen/use-hadoop-reflection-in-more-places.

(cherry picked from commit 6a6f3c9)
Signed-off-by: Michael Armbrust <michael@databricks.com>
…obContext methods

This is a followup to #8499 which adds a Scalastyle rule to mandate the use of SparkHadoopUtil's JobContext accessor methods and fixes the existing violations.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #8521 from JoshRosen/SPARK-10330-part2.
SparkHadoopUtil contains methods that use reflection to work around TaskAttemptContext binary incompatibilities between Hadoop 1.x and 2.x. We should use these methods in more places.
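The underlying problem is that `TaskAttemptContext` changed from a class in Hadoop 1.x to an interface in Hadoop 2.x, so a direct method call compiled against one version fails to link against the other even though the source code is identical. The sketch below illustrates the reflection technique under stated assumptions: `FakeTaskAttemptContext` and `invokeByName` are hypothetical names standing in for the Hadoop type and SparkHadoopUtil's actual helpers, which this example does not reproduce.

```java
import java.lang.reflect.Method;

public class ReflectionSketch {

    // Hypothetical stand-in for Hadoop's TaskAttemptContext. In Hadoop 1.x it
    // was a class and in 2.x an interface; bytecode compiled against one
    // (invokevirtual vs. invokeinterface) fails to link against the other.
    public static class FakeTaskAttemptContext {
        public String getConfiguration() {
            return "conf";
        }
    }

    // The general technique: resolve the method by name at runtime instead of
    // emitting a direct call instruction, so the same compiled code runs
    // against either shape of the target type.
    public static Object invokeByName(Object target, String methodName) throws Exception {
        Method m = target.getClass().getMethod(methodName);
        return m.invoke(target);
    }

    public static void main(String[] args) throws Exception {
        Object result = invokeByName(new FakeTaskAttemptContext(), "getConfiguration");
        System.out.println(result); // prints "conf"
    }
}
```

The trade-off is that reflective calls bypass compile-time checking and carry some overhead, which is why it makes sense to centralize them in one utility (as this PR does) rather than scattering direct calls across the codebase.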