parallelise descriptor computation. #39

Closed
zhubonan opened this issue Jun 8, 2020 · 3 comments

@zhubonan
Collaborator

zhubonan commented Jun 8, 2020

It would be nice to use all the cores of a PC.

At the moment ASAP is serial. It would be relatively easy to parallelise over the frames using multiprocessing.Pool in the following code:

ASAP/asaplib/data/xyz.py

Lines 197 to 205 in 2ccd298

    for i in sbs:
        frame = self.frames[i]
        # compute atomic descriptor
        desc_dict_now, atomic_desc_dict_now = global_desc.compute(frame)
        self.global_desc[i].update(desc_dict_now)
        if keep_atomic:
            self.atomic_desc[i].update(atomic_desc_dict_now)
    # we mark down that this descriptor has been computed
    self.computed_desc_dict['descriptors'][tag] = global_desc.desc_spec_dict

There can be some complications while combining this with having a progress bar, but solutions can be found at tqdm/tqdm#484.
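A minimal sketch of the per-frame parallelisation, using only the standard library. Everything here is an assumption for illustration: `compute_descriptor` is a hypothetical stand-in for `global_desc.compute(frame)`, frames are plain lists of numbers, and `on_result` is a hypothetical hook where a progress bar could be advanced. `Pool.imap` returns results lazily and in input order, which is why the loop that consumes it is a convenient place to report progress.

```python
from multiprocessing import Pool


def compute_descriptor(frame):
    """Hypothetical stand-in for global_desc.compute(frame)."""
    # A "frame" here is just a list of numbers, not a real structure.
    return {"n_atoms": len(frame), "sum": sum(frame)}


def compute_all(frames, n_jobs=1, on_result=None):
    """Compute one descriptor per frame and return them in input order."""

    def collect(iterator):
        out = []
        for done, res in enumerate(iterator, start=1):
            out.append(res)
            if on_result is not None:
                # Runs in the parent process; a progress bar could be
                # updated here (assumption: any callable taking done, total).
                on_result(done, len(frames))
        return out

    if n_jobs == 1:
        # Serial fallback, equivalent to the original for-loop.
        return collect(map(compute_descriptor, frames))

    with Pool(processes=n_jobs) as pool:
        # Consume imap inside the with-block, before the pool is torn down.
        return collect(pool.imap(compute_descriptor, frames))
```

When called with `n_jobs` greater than 1 from a script, the call would normally sit under an `if __name__ == "__main__":` guard so that worker start-up does not re-run it on platforms that spawn rather than fork.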

Alternatively, there is already a parallel implementation in dscribe, but one would need to modify the call stack and group the calls to SOAP.create together.

@BingqingCheng
Owner

I prefer to do this at the asap level, as not all descriptors are parallelized, and this is really a trivial parallelization.

To do this properly, I'm thinking perhaps we first sort the frames by size (as a proxy for the associated computing cost), and then start N processes for each batch of N frames with similar sizes.

What do you think?
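The sort-then-batch idea could look roughly like the sketch below. The helper name `batches_by_size` and the use of a plain list of atom counts are assumptions for illustration; the helper returns frame indices, so results can still be written back to `self.global_desc[i]` under the original index.

```python
def batches_by_size(frame_sizes, n_procs):
    """Group frame indices into batches of up to n_procs similarly sized frames."""
    # Sort indices by frame size; sorted() is stable, so ties keep input order.
    order = sorted(range(len(frame_sizes)), key=lambda i: frame_sizes[i])
    # Cut the sorted index list into consecutive batches of n_procs.
    return [order[k:k + n_procs] for k in range(0, len(order), n_procs)]


sizes = [64, 8, 32, 8, 128, 16]  # hypothetical atom counts per frame
print(batches_by_size(sizes, 2))  # [[1, 3], [5, 2], [0, 4]]
```

Each batch would then be handed to N processes at once, so that the processes in a batch finish at about the same time.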

@zhubonan
Collaborator Author

zhubonan commented Jun 9, 2020

I agree. Sorting is probably not necessary, as one can just use strides of the input frames? There might be some complications in avoiding excessive memory use. For example, if each child process simply inherits all the resources, the memory cost will increase N-fold.
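As a rough illustration of the stride idea, worker k takes frames k, k + N, k + 2N, and so on. The helper names below are hypothetical, and frame sizes are again plain atom counts used as a cost proxy. If frame sizes are not strongly correlated with position in the input, the per-worker cost should come out roughly even without sorting.

```python
def strided_chunks(n_frames, n_procs):
    """Assign frame indices to workers by stride: worker k gets k, k+N, k+2N, ..."""
    return [list(range(k, n_frames, n_procs)) for k in range(n_procs)]


def chunk_cost(chunks, frame_sizes):
    """Total size per worker, as a crude proxy for its computing cost."""
    return [sum(frame_sizes[i] for i in chunk) for chunk in chunks]


chunks = strided_chunks(7, 3)
print(chunks)  # [[0, 3, 6], [1, 4], [2, 5]]
print(chunk_cost(chunks, [10, 20, 30, 40, 50, 60, 70]))  # [120, 70, 90]
```

Passing each worker only the frames in its own chunk, rather than the whole object that holds all frames, is one way to keep the memory cost from growing with the number of processes.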

It may be better to use the joblib library, which is what dscribe uses.

https://joblib.readthedocs.io/en/latest/parallel.html#embarrassingly-parallel-for-loops

@BingqingCheng
Owner

done


2 participants