use Iterator#size in RDD#count #736
Can one of the admins verify this patch?
Thanks for submitting this. Do you have a link to the Scala compiler/collection library ticket that impacted this?
Jenkins, test this please.
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
All automated tests passed.
@rxin I'm sorry, I don't have a link for that, and I didn't find any discussion about a performance issue with Iterator#size either. I just checked the source code of Iterator and desugared the Scala for loop to see what happens.
From a casual look, this is not equivalent performance-wise.
I wrote a simple benchmark to test the performance, and Iterator#size really sucks... Sorry for my mistake, I'll close this pull request :(
In RDD#count, we used a while loop to compute the size of the Iterator because Iterator#size used a for loop, which was slightly slower in that version of Scala. The current version of Scala, however, translates the for loop in Iterator#size into foreach, which uses a while loop to iterate over the Iterator. So we can use Iterator#size directly now.
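For readers comparing the two approaches discussed in this thread, here is a minimal sketch of both. The object and method names (CountSketch, countWithWhile, countWithSize) are hypothetical and used for illustration only; this is not Spark's actual RDD#count code. It shows only that the two forms return the same count and makes no claim about their relative speed.

```scala
// Illustrative sketch only: hypothetical helpers, not Spark's implementation.
object CountSketch {
  // Manual while loop, in the style the description attributes to RDD#count.
  def countWithWhile[T](iter: Iterator[T]): Long = {
    var result = 0L
    while (iter.hasNext) {
      result += 1L
      iter.next() // advance the iterator; the element itself is discarded
    }
    result
  }

  // Delegates to the standard library. Iterator#size returns an Int,
  // so the result is widened to Long to match the while-loop version.
  def countWithSize[T](iter: Iterator[T]): Long = iter.size.toLong
}
```

Both helpers consume the iterator they are given, so each call needs a fresh Iterator. One difference that is visible in the sketch: the while-loop version accumulates into a Long, whereas Iterator#size is limited to the Int range.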