[SPARK-29544][CORE] collect the runtime statistics of row count in map stage #26309

Closed
wants to merge 4 commits into from

JkSelf
Contributor

@JkSelf JkSelf commented Oct 30, 2019

What changes were proposed in this pull request?

Similar to the approach used for collecting data size, this PR collects row count information during shuffle write and wraps it in MapStatus, so that the driver can read the row count from the returned MapStatus.
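
As a rough illustration of the idea described above (not the code from this PR), the sketch below models a map task that tracks a per-reduce-partition row count next to the per-partition byte size and returns both in one status object, which the driver then sums across map tasks. The names `MapStatusSketch`, `write_shuffle`, and `total_rows_per_partition` are hypothetical, and Python is used only for brevity; Spark's actual `MapStatus` is a Scala class and its fields may differ.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass(frozen=True)
class MapStatusSketch:
    """Hypothetical stand-in for what a map task reports back to the driver."""
    map_id: int
    bytes_by_partition: List[int]  # data size per reduce partition
    rows_by_partition: List[int]   # row count per reduce partition (the proposed addition)


def write_shuffle(
    map_id: int,
    records: Iterable[Any],
    num_partitions: int,
    partitioner: Callable[[Any], int],
    size_of: Callable[[Any], int] = len,
) -> MapStatusSketch:
    """Simulate a shuffle write: route each record to a reduce partition and
    accumulate both its size and a row counter for that partition."""
    sizes = [0] * num_partitions
    rows = [0] * num_partitions
    for record in records:
        p = partitioner(record) % num_partitions
        sizes[p] += size_of(record)
        rows[p] += 1
    return MapStatusSketch(map_id, sizes, rows)


def total_rows_per_partition(statuses: Iterable[MapStatusSketch]) -> List[int]:
    """Driver side: sum the row counts reported by all map tasks."""
    statuses = list(statuses)
    if not statuses:
        return []
    totals = [0] * len(statuses[0].rows_by_partition)
    for status in statuses:
        for p, count in enumerate(status.rows_by_partition):
            totals[p] += count
    return totals
```

With totals like these, the driver could compare row counts across reduce partitions to spot a skewed one, which is the use case this PR is aiming at.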

Why are the changes needed?

To optimize skewed partitions, we need to collect row count statistics in the map stage.

Does this PR introduce any user-facing change?

No

How was this patch tested?

unit tests

@JkSelf
Contributor Author

JkSelf commented Oct 30, 2019

@cloud-fan This is the first PR for optimizing skewed partitions. Please help review it. Thanks.

@SparkQA

SparkQA commented Oct 30, 2019

Test build #112889 has finished for PR 26309 at commit 511481d.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class MapInfo

@SparkQA

SparkQA commented Oct 30, 2019

Test build #112893 has finished for PR 26309 at commit ba283d1.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 30, 2019

Test build #112904 has finished for PR 26309 at commit 0032448.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JkSelf JkSelf changed the title [SPARK-28560] [1] collect the runtime statistics of row count in map stage [SPARK-29544] [1] collect the runtime statistics of row count in map stage Oct 31, 2019
@JkSelf JkSelf changed the title [SPARK-29544] [1] collect the runtime statistics of row count in map stage [SPARK-29544] collect the runtime statistics of row count in map stage Oct 31, 2019
@SparkQA

SparkQA commented Oct 31, 2019

Test build #112980 has finished for PR 26309 at commit b43e693.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Oct 31, 2019

Test build #112999 has finished for PR 26309 at commit b43e693.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JkSelf
Contributor Author

JkSelf commented Oct 31, 2019

@cloud-fan please help retest again. The error may be unrelated. Thanks.

@JkSelf
Contributor Author

JkSelf commented Nov 1, 2019

@cloud-fan please help retest again. Thanks.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Nov 1, 2019

Test build #113086 has finished for PR 26309 at commit b43e693.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-29544] collect the runtime statistics of row count in map stage [SPARK-29544][CORE] collect the runtime statistics of row count in map stage Nov 13, 2019
@dongjoon-hyun
Member

Retest this please

@dongjoon-hyun
Member

dongjoon-hyun commented Nov 13, 2019

BTW, @JkSelf:

  • The component name depends on the code you touched. This PR touches the core module only.
  • In addition, this PR looks insufficient to resolve SPARK-29544. If you are not going to resolve SPARK-29544 completely with this PR, it would be better to create a separate JIRA issue that is narrowed down to the scope of this PR.

@SparkQA

SparkQA commented Nov 14, 2019

Test build #113736 has finished for PR 26309 at commit b43e693.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Feb 23, 2020
@github-actions github-actions bot closed this Feb 24, 2020