Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallelisation helper function #20

Open
hsteptoe opened this issue Apr 17, 2020 · 1 comment
Open

Parallelisation helper function #20

hsteptoe opened this issue Apr 17, 2020 · 1 comment
Labels
enhancement New feature or request

Comments

@hsteptoe
Copy link

hsteptoe commented Apr 17, 2020

With reference to George Ford's ASKS presentation (attached below), I wonder if there is something that can be incorporated here...

Need to check that his suggested method is still up-to-date with respect to Dask etc. but I think the principal still holds... could be used in #17

20181107_GFord_parallel_python.pptx

@zmaalick zmaalick added the enhancement New feature or request label May 20, 2020
@thomascrocker
Copy link
Collaborator

A dask version of the approach outlined in George's presentation is trivial to implement:
e.g. instead of:

results = []
for i in iterable:
   results.append(my_function(i))

Something like:

import dask.bag as db

# this is needed if running on SPICE to correctly determine CPUs requested
if "SLURM_NTASKS" in os.environ.keys():
    WORKERS = int(os.environ["SLURM_NTASKS"])
else:
    WORKERS = os.cpu_count()

iter_bag = db.from_sequence(iterable)
results =  iter_bag.map(my_function).compute(num_workers=WORKERS)

This is the approach that I've used for some time to do e.g. multiple MOOSE retrievals in parallel, parallel processing of multiple files on disk, and to break up some other calculation work (e.g. by year).

I have been investigating a little more with the use of dask.array for more sophisticated stuff that should be more efficient for some tasks with time / memory, but it will need to wait till I have my Antigua stuff out the way as it's a bit more involved (mainly in setting up the correct dask environment before hand, the code itself is in some cases even simpler), and my approaches using dask.bag are good enough for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants