Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MINOR][CORE] Improved statistical shuffle write time #19693

Closed
wants to merge 1 commit into from

Conversation

heary-cao
Copy link
Contributor

What changes were proposed in this pull request?

Creating the file to write to and creating a disk writer both involve interacting with the disk, and can take a long time when we open or close many files, so should be included in the shuffle write time.

so we call mergeSpillsWithTransferTo, only contains the write file the time, but did not included in the shuffle write time when open and close many merges spill files .

How was this patch tested?

existed test cases.

@heary-cao heary-cao changed the title [CORE]improved statistical shuffle write time [MINOR][CORE]improved statistical shuffle write time Nov 8, 2017
@heary-cao heary-cao changed the title [MINOR][CORE]improved statistical shuffle write time [MINOR][CORE] Improved statistical shuffle write time Nov 8, 2017
@jiangxb1987
Copy link
Contributor

Personally I don't think we should take the file open/close time into account, cc @cloud-fan .

@cloud-fan
Copy link
Contributor

I don't have a strong preference, cc @srowen

@jerryshao
Copy link
Contributor

Whether shuffle write time should include the file open/close time is debatable, also we don't know whether the actual open action is lazy or not (depends on OS). But one downside of this change is that it makes this metric incomparable to the old one, because we changed the semantics now.

@jiangxb1987
Copy link
Contributor

Seems it's better to keep the previous behavior.

@heary-cao
Copy link
Contributor Author

@cloud-fan @jerryshao @jiangxb1987 thank you for review it.
Whether shuffle write time should include the file open/close time is debatable I agree with you @jerryshao . Because review spark code, see the realization of this controversy, But this PR hope to unify.

@cloud-fan
Copy link
Contributor

This can be a debate when we introduced this metric, now I think it's too late to change.

@HyukjinKwon
Copy link
Member

Shall we leave this closed for now @heary-cao?

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@heary-cao heary-cao closed this Jan 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants