[SPARK-28933][ML] Reduce unnecessary shuffle in ALS when initializing factors #25639

Closed
wants to merge 1 commit into from

viirya (Member) commented Aug 30, 2019

What changes were proposed in this pull request?

When initializing factors in ALS, we should use mapPartitions instead of the current map, so that the existing partitioning of the InBlock RDD is preserved. The InBlock RDD is already partitioned by source block ID, and initializing factors does not change which partition a block belongs to.
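
To illustrate the difference, here is a minimal sketch rather than the patch itself. It assumes a local Spark session and uses a hypothetical stand-in for the InBlock RDD (a pair RDD of block ID and row count); `rank`, `inBlocks`, and the object name are made up for illustration. As far as I understand the RDD API, `map` discards the partitioner, while `mapPartitions` keeps it only when `preservesPartitioning = true` is passed.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object PreservePartitioningSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("preserve-partitioning-sketch")
    val sc = new SparkContext(conf)
    try {
      val partitioner = new HashPartitioner(4)
      val rank = 3 // hypothetical factor dimension

      // Hypothetical stand-in for the InBlock RDD: (block id, rows in block),
      // already partitioned by block id.
      val inBlocks = sc.parallelize(0 until 8)
        .map(id => (id, id + 1))
        .partitionBy(partitioner)

      // map: Spark cannot know the keys are unchanged, so the partitioner is dropped.
      val viaMap = inBlocks.map { case (id, n) => (id, Array.fill(n, rank)(0.1f)) }

      // mapPartitions with preservesPartitioning = true: the partitioner is kept.
      val viaMapPartitions = inBlocks.mapPartitions(
        iter => iter.map { case (id, n) => (id, Array.fill(n, rank)(0.1f)) },
        preservesPartitioning = true)

      assert(viaMap.partitioner.isEmpty)
      assert(viaMapPartitions.partitioner == Some(partitioner))
    } finally {
      sc.stop()
    }
  }
}
```

Because the second RDD still carries the partitioner, later operations keyed by block ID that use the same partitioner do not need to shuffle it again.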

Why are the changes needed?

This patch avoids an unnecessary shuffle after the factors are initialized.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests should be unaffected. The patch should also pass an added test that verifies the shuffle dependencies of the factor RDDs.
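
A check of this kind could look roughly like the following. This is a sketch under assumptions, not the test added in this PR: `hasShuffleDependency`, `inBlocks`, and the object name are hypothetical, and the stand-in RDD is a simple pair RDD rather than the real InBlock type. It relies on `partitionBy` returning the same RDD when the partitioner already matches, and a shuffled RDD otherwise.

```scala
import org.apache.spark.{HashPartitioner, ShuffleDependency, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ShuffleDependencyCheckSketch {
  // Hypothetical helper: true if any direct dependency of the RDD is a shuffle.
  def hasShuffleDependency(rdd: RDD[_]): Boolean =
    rdd.dependencies.exists(_.isInstanceOf[ShuffleDependency[_, _, _]])

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("shuffle-dependency-sketch")
    val sc = new SparkContext(conf)
    try {
      val partitioner = new HashPartitioner(4)

      // Hypothetical stand-in for the InBlock RDD, partitioned by block id.
      val inBlocks = sc.parallelize(0 until 8)
        .map(id => (id, id + 1))
        .partitionBy(partitioner)

      // Factors built with the partitioner preserved.
      val preserved = inBlocks.mapPartitions(
        iter => iter.map { case (id, n) => (id, Array.fill(n)(0.5f)) },
        preservesPartitioning = true)

      // Factors built with map, which drops the partitioner.
      val dropped = inBlocks.map { case (id, n) => (id, Array.fill(n)(0.5f)) }

      // Asking for the same partitioner again is a no-op only when it was preserved.
      assert(!hasShuffleDependency(preserved.partitionBy(partitioner)))
      assert(hasShuffleDependency(dropped.partitionBy(partitioner)))
    } finally {
      sc.stop()
    }
  }
}
```

The helper only inspects direct dependencies; a check over the whole lineage would need to walk `dependencies` recursively.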

viirya (Member Author) commented Aug 30, 2019

cc @felixcheung

SparkQA commented Aug 31, 2019

Test build #109976 has finished for PR 25639 at commit 60b3d5d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

viirya (Member Author) commented Aug 31, 2019

cc @srowen

viirya (Member Author) commented Sep 1, 2019

Thanks all. I will try to merge this tomorrow.

viirya closed this in 19f882c on Sep 2, 2019

viirya (Member Author) commented Sep 2, 2019

Merged to master. Thanks all for review!

viirya deleted the fix-als-partition branch on December 27, 2023 18:22