Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-12896] [WIP] Send only accumulator updates to driver, not TaskMetrics #10857

Closed
wants to merge 44 commits into from

Conversation

andrewor14
Copy link
Contributor

This patch builds on top of #10835, so the actual diff should be much smaller. This would complete SPARK-10620, which replaces much of the logic in TaskMetrics with accumulators. This was once part of #10717. It was split into its own patch so it's more reviewable.

The high level idea is that instead of having the executors send both accumulator updates and TaskMetrics, we should have them send only accumulator updates. This is because TaskMetrics can be represented as a collection of accumulators (as we have done so in #10835) and we can just reconstruct them on the driver based on the accumulator updates.

While an effort has been made to preserve as much of the public API as possible, there were a few known breaking @DeveloperAPI changes that would be very awkward to maintain. I will gather the full list shortly and post it here.

Lastly, this is still WIP because more tests are pending.

Andrew Or added 30 commits January 18, 2016 14:55
commit 269031f
Author: Andrew Or <andrew@databricks.com>
Date:   Mon Jan 18 16:33:12 2016 -0800

    Remove unused method

commit c04b5df
Author: Andrew Or <andrew@databricks.com>
Date:   Mon Jan 18 16:13:08 2016 -0800

    Review comments

commit d2e4e23
Author: Andrew Or <andrew@databricks.com>
Date:   Mon Jan 18 14:42:19 2016 -0800

    One more

commit 202d48e
Merge: e99b9af 4f11e3f
Author: Andrew Or <andrew@databricks.com>
Date:   Mon Jan 18 14:27:47 2016 -0800

    Merge branch 'master' of github.com:apache/spark into get-or-create-metrics

commit e99b9af
Merge: 34c7ce5 b8cb548
Author: Andrew Or <andrew@databricks.com>
Date:   Mon Jan 18 13:56:41 2016 -0800

    Merge branch 'master' of github.com:apache/spark into get-or-create-metrics

    Conflicts:
    	core/src/main/scala/org/apache/spark/CacheManager.scala
    	core/src/main/scala/org/apache/spark/memory/StorageMemoryPool.scala
    	core/src/test/scala/org/apache/spark/CacheManagerSuite.scala

commit 34c7ce5
Author: Andrew Or <andrew@databricks.com>
Date:   Mon Jan 18 12:46:42 2016 -0800

    Hide updatedBlocks

commit ad094f0
Author: Andrew Or <andrew@databricks.com>
Date:   Mon Jan 18 12:30:59 2016 -0800

    Clean up JsonProtocol

    This commit collapsed 10 methods into 2. The 8 that were inlined
    were only used in 1 place each, and the body of each was quite
    small. The additional level of abstraction did not add much value
    and made the code verbose.

commit 0785984
Author: Andrew Or <andrew@databricks.com>
Date:   Mon Jan 18 12:20:28 2016 -0800

    Replace set with register

    JsonProtocol remains the only place where we still call set
    on each of the *Metrics classes.

commit b9d7fbf
Author: Andrew Or <andrew@databricks.com>
Date:   Mon Jan 18 12:10:17 2016 -0800

    Clean up places where we set OutputMetrics

    Note: there's one remaining place, which is JsonProtocol.

commit 62c96e1
Author: Andrew Or <andrew@databricks.com>
Date:   Mon Jan 18 11:50:04 2016 -0800

    Add register* methods (get or create)
Previously we would always zero out an accumulator when we
deserialize it. Certainly we don't want to do that on the driver.
The changes in this commit are temporary and will be reverted
in SPARK-12896.
Tests don't pass yet, obviously...
+ fix tests, which are still failing
... by setting it to zero on the executors, always.
…e-accums

Conflicts:
	core/src/main/scala/org/apache/spark/CacheManager.scala
	core/src/main/scala/org/apache/spark/executor/Executor.scala
	core/src/main/scala/org/apache/spark/executor/ShuffleWriteMetrics.scala
	core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala
	core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala
	core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
	core/src/test/scala/org/apache/spark/ui/jobs/JobProgressListenerSuite.scala
	core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
Also add some useful error printing if things don't match.
We never updated the original ShuffleReadMetrics accumulators;
we were updating a copy.
Andrew Or added 13 commits January 20, 2016 15:20
As of this commit, we currently only send accumulator updates
from the executors to the driver. Note that executors still send
back TaskMetrics to the driver when they heartbeat. This will be
resolved in a future commit.

There are a number of API breaking changes in this commit. These
will be detailed in a higher level audit of all public APIs.

Tests do not compile yet.
These changes were introduced because SPARK-12895 makes us send
accumulators both ways between drivers and executors. This is no
longer true as of the previous commit, so we can remove these
TODOs and the code associated with them.
@SparkQA
Copy link

SparkQA commented Jan 21, 2016

Test build #49835 has finished for PR 10857 at commit abde0ed.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 21, 2016

Test build #49842 has finished for PR 10857 at commit 1a590e6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Copy link
Contributor

While I realize that this has the potential to create merge conflicts, it would be super helpful if you could squash all of the commits which came from #10835 into a single commit here in order to make it easier to spot the delta from that PR to this one.

Seq[String](
output.WRITE_METHOD,
output.BYTES_WRITTEN,
output.RECORDS_WRITTEN).map(create)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These will count on failures, not consistent than SQL ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I'll add a comment.

@andrewor14 andrewor14 closed this Jan 25, 2016
@andrewor14
Copy link
Contributor Author

I just merged this into #10835

@andrewor14 andrewor14 deleted the dont-send-task-metrics branch February 3, 2016 00:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants