[SPARK-40141][CORE] Remove unnecessary TaskContext addTaskXxxListener overloads #37573
Conversation
dongjoon-hyun
left a comment
Hi, @ryan-johnson-databricks.
Apache Spark uses the PR contributor's GitHub Actions resources instead of Apache Spark's GitHub Actions resources. Please enable GitHub Actions in your repository; currently, it seems to be disabled in your repo.
This reminds me of the many paired methods we have for both Scala- and Java-friendliness... there are even several such methods in DataFrame. When we migrated to Scala 2.12 we ran into this issue, and our decision was to leave them alone so as not to break anything. (The decision leaned more toward never removing any existing public API.) But I agree it would be nice to reconsider removing the member of each pair that Scala can now handle by itself. It's quite annoying to need workarounds (e.g. explicitly returning Unit? null?) because of the ambiguity.
Another thing to check is whether this change forces end users to rebuild their app jars (binary compatibility). If we all agree that end users should rebuild their app jars against each Spark version, then that is OK, but we probably don't want to force this on every minor version upgrade, and if we do have to break compatibility, we should clarify the benefits.
```diff
-      reader.close()
-    }
-  }
+  Option(TaskContext.get()).foreach(_.addTaskCompletionListener(_ => reader.close()))
```
Are these cleanups possible with Scala lambdas, without removing the overloads?
Generally, yes -- though they sometimes don't fit on one line until the `[Unit]` gets deleted.
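To illustrate (a sketch, not lines from this diff; `reader` is a hypothetical closeable resource):

```scala
import org.apache.spark.TaskContext

// Before: with both overloads present, a bare lambda was ambiguous, so callers
// pinned the type parameter -- which often pushed the call past one line:
Option(TaskContext.get()).foreach(_.addTaskCompletionListener[Unit](_ => reader.close()))

// After: the function overload is gone, the bare lambda SAM-converts to
// TaskCompletionListener, and the call fits comfortably on one line:
Option(TaskContext.get()).foreach(_.addTaskCompletionListener(_ => reader.close()))
```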
Can one of the admins verify this patch?
I did enable it -- and it even worked briefly -- but something seems to have gone wrong. I have an open ticket with GitHub support, but they haven't responded yet.
This is a developer API, so I'm fine with this cleanup. Can you push an empty commit to retrigger the GitHub Actions tests?
```scala
 *
 * Exceptions thrown by the listener will result in failure of the task.
 */
@DeveloperApi
```
I confirmed that the `TaskCompletionListener` and `TaskFailureListener` classes themselves have been marked as `DeveloperApi` since ~2014/2015 👍
JoshRosen
left a comment
Just to summarize my understanding of the source- and binary-compatibility aspects of this change:
- This change is binary-incompatible with older binaries that were calling the removed method. However, it's technically possible for developers to make a build which is binary-compatible with both old and new Spark versions: they just have to explicitly use the other overload.
- This change is source-incompatible in two directions:
  - Code written against old versions with the explicitly-specified `[Unit]` will no longer compile against new Spark versions.
  - Code written against new Spark versions using lambda syntax will not compile against old Spark versions because the overload would/could be ambiguous.
A fully-compatible option is available for users: if they explicitly use the non-lambda interface then they can be source- and binary-compatible across a range of versions.
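For example, here is a sketch of that fully-compatible form (`reader` is a hypothetical resource to clean up); an explicit `TaskCompletionListener` instance compiles against both old and new `TaskContext` APIs:

```scala
import org.apache.spark.TaskContext
import org.apache.spark.util.TaskCompletionListener

// Passing an explicit listener object avoids both the removed function overload
// and the previously ambiguous lambda syntax, so it is source- and
// binary-compatible across a range of Spark versions.
Option(TaskContext.get()).foreach(_.addTaskCompletionListener(new TaskCompletionListener {
  override def onTaskCompletion(context: TaskContext): Unit = reader.close()
}))
```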
For Spark's own internal development, I'm wondering whether this change will introduce source-compatibility concerns in our own patch backports: if I write a bugfix patch using the new lambda syntax and then cherry-pick it to older branches, I'll run into compile failures. Of course, the compatible option exists, but a developer might forget to use it (especially since IDEs are likely to suggest replacing it with the lambda syntax).
I'm not sure that concern is very troublesome?
(***) By "very slow moving" I mean:
List of recent changes involving
Since this is a source-level incompatibility (as per @JoshRosen's analysis), and given this has been around for a while as a
We can't "just" mark it as deprecated and also clean up our own call sites, because this is an ambiguous overload. If we mark the polymorphic function overload as deprecated, that just forces everyone either to ignore the warning or to create an actual listener object until we get around to removing the deprecated overload. Neither of those seems to provide much benefit?
Yep, deprecation doesn't help. The caller can use casts to disambiguate, but that's ugly. I wouldn't object strongly to removing this before 4.0 since it's a developer API, but by the same token, it is a developer API. Is it worth the binary-compatibility breakage versus just having devs use casts?
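For concreteness, a sketch of that cast workaround (`reader` is hypothetical): a type ascription forces SAM conversion to the listener trait, so only the non-lambda overload applies on old Spark versions.

```scala
import org.apache.spark.TaskContext
import org.apache.spark.util.TaskCompletionListener

// The ascription picks the TaskCompletionListener overload unambiguously --
// compatible with both old and new APIs, but noisy at every call site.
TaskContext.get().addTaskCompletionListener(
  ((_: TaskContext) => reader.close()): TaskCompletionListener)
```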
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
`TaskContext` currently defines two sets of functions for registering listeners: one that accepts a listener object (`TaskCompletionListener` / `TaskFailureListener`) and one that accepts a plain function. Before JDK8 and Scala 2.12, the overloads were a convenient way to register a new listener without having to instantiate a new class. However, with the introduction of functional interfaces in JDK8+, and subsequent SAM support in Scala 2.12, the two function signatures are now equivalent, because a function whose signature matches the only method of a functional interface can be used in place of that interface.
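For reference, the paired overloads look roughly like this (a paraphrased, abridged sketch of the pre-PR `TaskContext` surface, not the actual class):

```scala
import org.apache.spark.TaskContext
import org.apache.spark.util.{TaskCompletionListener, TaskFailureListener}

abstract class TaskContextSketch {
  // Listener-object flavor: survives this PR.
  def addTaskCompletionListener(listener: TaskCompletionListener): TaskContext
  def addTaskFailureListener(listener: TaskFailureListener): TaskContext

  // Function flavor: removed by this PR, since a lambda now SAM-converts
  // to the listener traits above.
  def addTaskCompletionListener[U](f: TaskContext => U): TaskContext
  def addTaskFailureListener(f: (TaskContext, Throwable) => Unit): TaskContext
}
```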
Result: cryptic ambiguous-overload errors when trying to use the function-only form, which prompted a Scala bug report (which was never addressed), as well as an attempted workaround that makes `addTaskCompletionListener` gratuitously generic, so that the compiler no longer considers e.g. `addTaskCompletionListener[Unit]` as equivalent to the overload that accepts a `TaskCompletionListener`. For some reason, that workaround was never applied to `addTaskFailureListener`.

Now that Scala 2.12 on JDK8 is the minimum supported version, we can dispense with the overloads and rely entirely on the language's SAM support to work as expected. The vast majority of call sites can now use the function form instead of the class form, which simplifies the code considerably.
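To make the equivalence concrete, a small sketch of the SAM conversion in question (assumes Scala 2.12+):

```scala
import org.apache.spark.TaskContext
import org.apache.spark.util.TaskCompletionListener

object SamDemo {
  // The same lambda body works as a plain function value...
  val asFunction: TaskContext => Unit =
    ctx => println(s"task ${ctx.taskAttemptId()} done")

  // ...and SAM-converts to the listener trait, because TaskCompletionListener
  // has a single abstract method with a matching signature.
  val asListener: TaskCompletionListener =
    ctx => println(s"task ${ctx.taskAttemptId()} done")
}
```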
While we're at it, standardize the call sites: one-liners use the functional form `addTaskCompletionListener(_ => ...)`, while multi-liners use the block form `addTaskCompletionListener { _ => ... }`.
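A slightly fuller sketch of the two shapes (hypothetical `reader` and `buffer` resources, registered from inside a running task):

```scala
import org.apache.spark.TaskContext

val ctx = TaskContext.get()

// One-liner: functional form.
ctx.addTaskCompletionListener(_ => reader.close())

// Multi-liner: block form.
ctx.addTaskCompletionListener { _ =>
  reader.close()
  buffer.release()
}
```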
Why are the changes needed?

The Scala SAM feature conflicts with the existing overloads. The task-listener interface becomes simpler and easier to use if we align it with the language.
Does this PR introduce any user-facing change?
Developers who rely on this developer API will need to remove the gratuitous `[Unit]` when registering functions as listeners.
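A migration sketch (hypothetical `ctx` and `cleanup`, not from this PR):

```scala
// Before: the explicit type parameter was needed to select the function overload.
ctx.addTaskCompletionListener[Unit](_ => cleanup())

// After: the function overload is gone, so a plain lambda SAM-converts to
// TaskCompletionListener -- just delete the [Unit].
ctx.addTaskCompletionListener(_ => cleanup())

// addTaskFailureListener never had the [U] workaround, so lambdas there were
// simply ambiguous before; after this change they work directly:
ctx.addTaskFailureListener((_, error) => println(s"task failed: $error"))
```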
How was this patch tested?

All use sites in the Spark code base have been updated to use the new mechanism. The fact that they continue to compile is the strongest evidence that the change works. The tests that exercise the changed code paths also verify correctness.