Improvements to parallelization and related issues #27

Merged

Conversation

DarwinAwardWinner
Contributor

With the changes in d36ff29, the number of chunks should now always be an exact multiple of the number of workers, if that number is available.

Note that I've chosen to adjust the number of chunks upward, to the smallest multiple of the number of workers that is at least 100. So, for example, if n_workers = 60, then the number of chunks will be 120. This also means that if n_workers is greater than 100, then n_chunks will simply be set to n_workers, no matter how large that may be. It might make more sense to instead adjust the number of chunks downward, to the largest multiple of n_workers less than or equal to 100. But in that case you need a special case for n_workers > 100, where you just set n_chunks to 100, since otherwise the largest such multiple would be 0.
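A minimal sketch of that rounding rule, assuming the existing baseline of 100 chunks; the helper name here is hypothetical, not part of the actual code:

```r
## Hypothetical helper: smallest multiple of n_workers that is >= the
## 100-chunk baseline.
round_up_to_multiple <- function(target, n_workers) {
    n_workers * ceiling(target / n_workers)
}

round_up_to_multiple(100, 60)   # 120 chunks for 60 workers
round_up_to_multiple(100, 150)  # 150: n_chunks == n_workers once n_workers > 100
```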

Another note: I'm not sure if you chose 100 chunks for reasons relating to memory usage or other concerns, which is why I implemented things as described above. However, it might make sense to simply always set n_chunks = n_workers whenever n_workers is available. Obviously, the code for that would be much simpler, and I'd be happy to implement it if you think that would be better.

Lastly, I've also adjusted the logic that avoids excessive splitting of small objects: instead of just setting n_chunks to 1 if the object is small enough, the code now enforces a minimum chunk size and adjusts n_chunks accordingly. For example, with min_chunk_size = 20 and nrow = 100, n_chunks will be capped at 5, regardless of all other options. I've set min_chunk_size to 20, but that's an arbitrary and untested choice, so if you have a better intuition for this, feel free to change it.
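A sketch of that minimum-chunk-size cap, again with hypothetical names:

```r
## Hypothetical helper: cap n_chunks so that each chunk gets at least
## min_chunk_size rows, instead of collapsing small objects to a single chunk.
cap_chunks <- function(n_chunks, n_rows, min_chunk_size = 20) {
    max_chunks <- max(1L, floor(n_rows / min_chunk_size))
    min(n_chunks, max_chunks)
}

cap_chunks(n_chunks = 120, n_rows = 100)  # 5: at most 100 / 20 chunks
```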

As for the other commits:

Ryan C. Thompson added 3 commits November 23, 2020 11:35
Instead of using REDUCE=c in bpiterate, use do.call(c, ...) on the result, to perform all the reductions at once. This saves a good bit of time in the aggregation step.
Now the number of chunks should always be an exact multiple of the number of workers, if that number is available.
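For reference, a toy sketch of the aggregation change from the first commit above, using BiocParallel::bpiterate(); the iterator and worker function are stand-ins, not the actual variancePartition code:

```r
library(BiocParallel)

## Toy iterator: returns successive chunks of a vector, then NULL when done
make_iter <- function(x, chunk_size = 3L) {
    i <- 0L
    function() {
        start <- i * chunk_size + 1L
        if (start > length(x)) return(NULL)
        i <<- i + 1L
        x[start:min(start + chunk_size - 1L, length(x))]
    }
}

## Before: reduce incrementally as each chunk finishes
## res <- bpiterate(make_iter(1:10), function(chunk) chunk^2, REDUCE = c)

## After: collect per-chunk results in a list, then concatenate once at the end
res_list <- bpiterate(make_iter(1:10), function(chunk) chunk^2,
                      BPPARAM = SnowParam(workers = 2))
res <- do.call(c, res_list)
```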
@DarwinAwardWinner
Contributor Author

And for what it's worth, dream with 200 workers is actually somewhat faster on the cluster than 100 workers for our dataset (although nowhere near 2x faster thanks to the overhead of job submission).

Previously, if a model fit took 10000 seconds or longer, the runtime message would print the elapsed time in scientific notation (e.g. "Total: 1e+04 s"), a glitch caused by trying to print scientific notation with digits=0. This change sets scientific=FALSE in each call to format, so it always prints an integer number of seconds with no scientific notation.
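The underlying base R behaviour, sketched here in isolation; the exact call inside dream's progress message differs:

```r
secs <- 100000  # a long elapsed time, in seconds

format(secs)                      # "1e+05": format() can fall back to scientific notation
format(secs, scientific = FALSE)  # "100000": always a plain integer number of seconds
```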
Successfully merging this pull request may close these issues.

Broken error check in .fitVarPartModel produces uninterpretable error message