Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-8857][SPARK-8859][Core]Add an internal flag to Accumulable and send internal accumulator updates to the driver via heartbeats #7448

Closed
wants to merge 2 commits into from

Conversation

zsxwing
Copy link
Member

@zsxwing zsxwing commented Jul 16, 2015

This PR includes the following changes:

  1. Remove the thread local Accumulators.localAccums. Instead, all Accumulators in the executors will register with its TaskContext.
  2. Add an internal flag to Accumulable. For internal Accumulators, their updates will be sent to the driver via heartbeats.

@JoshRosen
Copy link
Contributor

What happens if my accumulator update is huge? Could this lead to dropped heartbeats?

@JoshRosen
Copy link
Contributor

Ah, I see now that it's only for specific accumulators and is internal-only.

@SparkQA
Copy link

SparkQA commented Jul 16, 2015

Test build #37509 has finished for PR 7448 at commit 7182813.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 16, 2015

Test build #37510 has finished for PR 7448 at commit bd7dcf1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@transient initialValue: R,
param: AccumulableParam[R, T],
val name: Option[String])
val name: Option[String],
internal: Boolean)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add internal to the scaladoc param to explain what it is

@rxin
Copy link
Contributor

rxin commented Jul 16, 2015

cc @andrewor14 for review also

@rxin
Copy link
Contributor

rxin commented Jul 16, 2015

I took a look at accumulator implementation -- one thing that is really annoying is boxing. Right now every update will box -- and it will really sucks for SQL if we do update per row.

In order for SQL to work better, I think we'd need to create a subtype of accumulator called LongAccumulator and DoubleAccumulator, that includes specialized methods for updating.

That can be done in a separate pull request.

@zsxwing
Copy link
Member Author

zsxwing commented Jul 17, 2015

Added comments.

In order for SQL to work better, I think we'd need to create a subtype of accumulator called LongAccumulator and DoubleAccumulator, that includes specialized methods for updating.

That can be done in a separate pull request.

Will do it in a follow up PR.

@rxin
Copy link
Contributor

rxin commented Jul 17, 2015

LGTM.

@SparkQA
Copy link

SparkQA commented Jul 17, 2015

Test build #37559 has finished for PR 7448 at commit c24bc5b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Jul 17, 2015

Thanks - I've merged this.

@asfgit asfgit closed this in 812b63b Jul 17, 2015
@zsxwing zsxwing deleted the accumulators branch July 17, 2015 04:22
@andrewor14
Copy link
Contributor

LGTM2, but it would be great to have some unit tests on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants