
[SPARK-40141][CORE] Remove unnecessary TaskContext addTaskXxxListener overloads #37573

Closed
ryan-johnson-databricks wants to merge 2 commits into apache:master from ryan-johnson-databricks:no-task-listener-overloads

Conversation

ryan-johnson-databricks (Contributor) commented Aug 18, 2022

What changes were proposed in this pull request?

TaskContext currently defines two sets of functions for registering listeners:

def addTaskCompletionListener(listener: TaskCompletionListener): TaskContext
def addTaskCompletionListener[U](f: (TaskContext) => U): TaskContext

def addTaskFailureListener(listener: TaskFailureListener): TaskContext
def addTaskFailureListener(f: (TaskContext, Throwable) => Unit): TaskContext

Before JDK8 and scala-2.12, the overloads were a convenient way to register a new listener without having to instantiate a new class. However, with the introduction of functional interfaces in JDK8+, and subsequent SAM support in scala-2.12, the two function signatures are now equivalent because a function whose signature matches the only method of a functional interface can be used in place of that interface.

Result: cryptic ambiguous-overload errors when trying to use the function-only overload. This prompted a Scala bug report (which was never addressed), as well as an attempted workaround that makes addTaskCompletionListener gratuitously generic, so that the compiler no longer considers e.g. addTaskCompletionListener[Unit] equivalent to the overload that accepts a TaskCompletionListener. For some reason, that workaround was never applied to addTaskFailureListener.
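To illustrate the clash and what the gratuitous type parameter bought us, here is a minimal self-contained sketch (simplified stand-ins, not the actual Spark source):

```scala
// Simplified sketch of the pre-change API shape under scala-2.12 SAM conversion.
trait TaskContext
trait TaskCompletionListener { def onTaskCompletion(context: TaskContext): Unit }

class TaskContextImpl extends TaskContext {
  def addTaskCompletionListener(listener: TaskCompletionListener): TaskContext = this
  // Workaround overload: the type parameter lets callers force this overload
  // explicitly by writing addTaskCompletionListener[Unit](...).
  def addTaskCompletionListener[U](f: TaskContext => U): TaskContext = this
}

val ctx = new TaskContextImpl
ctx.addTaskCompletionListener[Unit](_ => println("done")) // OK: [Unit] disambiguates
// ctx.addTaskCompletionListener(_ => println("done"))
//   ^ ambiguous: SAM conversion makes the lambda match BOTH overloads
```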

Now that scala-2.12 on JDK8 is the minimum supported version, we can dispense with the overloads and rely entirely on language SAM support to work as expected. The vast majority of call sites can now use the function form instead of the class form, which simplifies the code considerably.

While we're at it, standardize the call sites. One-liners use the functional form:

addTaskCompletionListener(_ => ...)

while multi-liners use the block form:

addTaskCompletionListener { _ =>
  ...
}

Why are the changes needed?

Scala SAM feature conflicts with the existing overloads. The task listener interface becomes simpler and easier to use if we align with the language.

Does this PR introduce any user-facing change?

Developers who rely on this developer API will need to remove the gratuitous [Unit] when registering functions as listeners.
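Concretely, the migration is just dropping the type argument (a sketch; `context` and `cleanup()` are placeholder names):

```scala
// Before (old API): the explicit [Unit] was needed to pick the function overload.
context.addTaskCompletionListener[Unit] { _ => cleanup() }

// After (this PR): the listener overload is gone, so plain SAM conversion applies.
context.addTaskCompletionListener { _ => cleanup() }
```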

How was this patch tested?

All use sites in the spark code base have been updated to use the new mechanism. The fact that they continue to compile is the strongest evidence that the change worked. The tests that exercise the changed code sites also verify correctness.

dongjoon-hyun (Member) left a comment:

Hi, @ryan-johnson-databricks .
Apache Spark uses the PR contributor's GitHub Action resources instead of Apache Spark GitHub Action resources. Please enable GitHub Action in your repository. Currently, it seems to be disabled in your repo.

[Screenshot: GitHub Actions appears disabled in the repository]

@HyukjinKwon HyukjinKwon changed the title [SPARK-40141] Remove unnecessary TaskContext addTaskXxxListener overloads [SPARK-40141][CORE] Remove unnecessary TaskContext addTaskXxxListener overloads Aug 19, 2022
HeartSaVioR (Contributor) commented Aug 19, 2022

This reminds me of the many paired methods we maintain to be both Scala- and Java-friendly... there are even several such methods on DataFrame. When we migrated to Scala 2.12 we discovered this issue, and our decision was to leave them alone so as not to break anything. (The spirit of that decision was more that we do not remove any existing public API.)

But I agree it would be nice to reconsider removing one method of each pair where Scala can handle it on its own. It's quite annoying to need a workaround (e.g. explicitly returning Unit, or null?) because of the ambiguity.

HeartSaVioR (Contributor) commented Aug 19, 2022

Another thing to check is whether this change forces end users to rebuild their app jars (binary compatibility). If we all agree that end users should rebuild their app jars for each Spark version, that is OK, but we probably do not want to impose this on every minor version upgrade; and if we do have to break compatibility, we should clarify the benefits.

reader.close()
}
}
Option(TaskContext.get()).foreach(_.addTaskCompletionListener(_ => reader.close()))
Review comment (Contributor):
Are these cleanups possible with Scala lambdas, without removing the overloads?

ryan-johnson-databricks (Contributor, Author) replied:

Generally, yes -- though they sometimes don't fit on one line until the [Unit] is deleted.

AmplabJenkins commented:
Can one of the admins verify this patch?

ryan-johnson-databricks (Contributor, Author) commented:

> Hi, @ryan-johnson-databricks . Apache Spark uses the PR contributor's GitHub Action resources instead of Apache Spark GitHub Action resources. Please enable GitHub Action in your repository. Currently, it seems to be disabled in your repo.

> [Screenshot: GitHub Actions appears disabled in the repository]

I did enable it -- and it even worked briefly -- but something seems to have gone wrong. I have an open ticket w/ github support but they've not yet responded.

cloud-fan (Contributor) commented:

This is developer API so I'm fine with this cleanup. Can you push an empty commit to retrigger the Github Action tests?

*
* Exceptions thrown by the listener will result in failure of the task.
*/
@DeveloperApi
Review comment (Contributor):
I confirmed that the TaskCompletionListener and TaskFailureListener classes themselves have been marked as DeveloperApi since ~2014/2015 👍

JoshRosen (Contributor) left a comment:

Just to summarize my understanding of the source- and binary-compatibility aspects of this change:

  • This change is binary-incompatible with older binaries that were calling the removed method. However, it's technically possible for developers to make a build which is binary-compatible with both old and new Spark versions: they just have to explicitly use the other overload.
  • This change is source-incompatible in two directions:
    • Code written against old versions with the explicitly-specified [Unit] will no longer compile against new Spark versions.
    • Code written against new Spark versions using lambda syntax will not compile against old Spark versions because the overload would/could be ambiguous.

A fully-compatible option is available for users: if they explicitly use the non-lambda interface then they can be source- and binary-compatible across a range of versions.
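The fully-compatible form described above, sketched (`context` and `resource` are placeholder names): instantiating the interface explicitly avoids both the ambiguity on old versions and any reliance on the removed overload.

```scala
// Compiles and links against both old and new Spark versions: it targets the
// TaskCompletionListener overload on old versions and the sole remaining
// method signature on new ones.
context.addTaskCompletionListener(new TaskCompletionListener {
  override def onTaskCompletion(ctx: TaskContext): Unit = resource.close()
})
```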

For Spark's own internal development, I'm wondering whether this change will introduce source-compatibility concerns in our own patch backports: if I write a bugfix patch using the new lambda syntax and then cherry-pick that patch to older branches, I'll run into compile failures. Of course, the compatible option exists, but a developer might forget to use it (especially since IDEs are likely to suggest replacing it with the lambda syntax).

ryan-johnson-databricks (Contributor, Author) commented Aug 22, 2022

> For Spark's own internal development, I'm wondering whether this change will introduce source-compatibility concerns in our own patch backports: If I write a bugfix patch using the new lambda syntax then cherry-pick that patch to older branches then I'll run into compile failures. Of course, the option to be compatible exists but a developer might forget to use it (esp. since IDEs are likely to suggest a replacement to the lambda syntax).

I'm not sure that concern is very troublesome?

  1. The task listener code is very slow-moving (***) -- code involving a task listener tends not to change after being added, other than a half dozen refactors during the last four years that happen to move or re-indent a listener. Full analysis below.
  2. In the last four years, 11 new listeners were added by a half dozen code authors. Each case would have required the author to figure out that [Unit] was needed. Meanwhile, there has been only ONE bugfix "backport" since [Unit] was added in 2018 -- and that one backport reverted one of the 11 new-code changes, from master back to a recent branch cut, to fix a regression found during release testing. So today the new-dev cost of keeping [Unit] is 10x higher than the backport cost of removing it.
  3. It's pretty easy to add [Unit] back in if needed?

(***) By "very slow moving" I mean:

  • There are no prod uses of addTaskFailureListener in the spark code base today -- which probably explains why nobody realized it needed to become polymorphic before now.
  • Out of 45 prod files that use addTaskCompletionListener[Unit], 28 have had no changes to that listener since the Aug 2018 refactor that added [Unit] as part of the scala-2.12 effort. Of the 17 remaining files (full list below), only one change was a bug fix that would have needed a backport. That change added a new listener (2022-03-09) and merged to master less than a week before the spark-3.3 branch cut (2022-03-15); a perf regression was found during release testing and fixed by revert (2022-04-26).

List of recent changes involving addTaskCompletionListener[Unit]:

  • d3d2292 in Jun 2022, ArrowConverters.scala (adding a null-check to the task context a listener gets added to, as part of a refactor)
  • 20ffbf7 in Apr 2022, DataSourceRDD.scala (refactor-induced indentation change)
  • 6b5a1f9 in Apr 2022, ShuffledHashJoinExec.scala (revert a Mar 2022 change to code first added in Aug 2020)
  • 8714eef in Aug 2021, Columnar.scala (refactor-induced indentation change)
  • 3257a30 in Jun 2021, RocksDB.scala (implement new RocksDB connector)
  • cc9a158 in Jun 2021, SortMergeJoinExec.scala (added a new completion listener to update a spill size metric)
  • f11950f in Mar 2021, ExternalSorter.scala (refactor that moved existing code to a new location and also reformatted it)
  • d871b54 in Jan 2021, ObjectAggregationIterator.scala (added a new completion listener for capturing metrics)
  • 21413b7 in Nov 2020, streaming/state/package.scala (new code)
  • 713124d in Aug 2020, InMemoryRelation.scala (refactor that moved an existing completion listener)
  • d7b268a in Dec 2019, CollectMetricsExec.scala (added a new completion listener that collects metrics)
  • 05988b2 in Sep 2019, BaseArrowPythonRunner.scala (refactor that moved existing code to a new file, otherwise unchanged)
  • 3663dbe in Jul 2019, AvroPartitionReaderFactory.scala (new code, avro DSv2 support)
  • 23ebd38 in Jun 2019, ParquetPartitionReaderFactory.scala (new code, parquet DSv2 support)
  • d50603a in Apr 2019, TextPartitionReaderFactory.scala (new code, text DSv2 support)
  • 8126d09 in Feb 2019, ArrowRunner.scala (new code)
  • 1280bfd in Jan 2019, PipedRDD.scala (bug fix that required adding a new completion listener)

mridulm (Contributor) commented Aug 22, 2022

Since this is a source-level incompatibility (per @JoshRosen's analysis), and given this has been around for a while as a DeveloperApi, I would mark it as deprecated and relook at removing it in the next major release.
We can of course clean up our own use.

ryan-johnson-databricks (Contributor, Author) commented:

> Since this is a source level incompatibility (as per @JoshRosen's analysis), and given this has been around for a while as a DeveloperApi, I would mark it as deprecated - and relook at removing it in the next major release. We can ofcourse cleanup our own use

We can't "just" mark it as deprecated and also clean up our own call sites, because this is an ambiguous overload. If we mark the polymorphic function overload as deprecated, that just forces everyone to either ignore the warning, or to create an actual listener object until we get around to removing the deprecated overload.

Neither of those seems to provide much benefit?

srowen (Member) commented Aug 23, 2022

Yep, deprecation doesn't help. The caller can use casts to disambiguate, but that's ugly. I wouldn't object strongly to removing this before 4.0 since it's a developer API -- but by the same token, it's a developer API. Is it worth the binary-compatibility breakage versus just having devs use casts?
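The cast-based disambiguation mentioned here would look roughly like this (a sketch; `context` and `cleanup()` are placeholder names):

```scala
// Type-ascribing the lambda to the listener interface forces SAM conversion
// up front, so only the TaskCompletionListener overload applies -- but the
// call site is noticeably uglier than a plain lambda.
context.addTaskCompletionListener(
  ((_: TaskContext) => cleanup()): TaskCompletionListener
)
```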

github-actions bot commented Dec 1, 2022

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Dec 1, 2022
@github-actions github-actions bot closed this Dec 2, 2022


8 participants