Database initialization takes too much time due to stereoisomer enumeration #38

Open
DrrDom opened this issue May 21, 2024 · 2 comments

DrrDom commented May 21, 2024

For the --init process, yes, I noticed a long time ago that compounds are initialized very slowly because some molecules take a long time to generate isomers. To speed up the process, I multiply the ncpu value by the cpu value from the config.yml for docking (I hardcoded this because I did not want to add another argument to --init_db at the time). If I remember correctly, this speeds up the process roughly linearly: it takes around 3 hours to initialize ~600k compounds, including isomers, with 150 CPUs.

Originally posted by @Feriolet in #35 (comment)
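For a rough sense of scale, the figures quoted above can be turned into a per-compound cost. This is only back-of-the-envelope arithmetic: it takes the approximate numbers from the comment at face value and assumes the linear scaling the commenter recalls. The function name is made up for illustration.

```python
# Approximate figures quoted in the comment above.
n_compounds = 600_000   # ~600k compounds, including isomers
wall_hours = 3          # ~3 hours of wall-clock time
n_cpus = 150            # CPUs used for initialization

# Total CPU time spent, and the average cost per initialized compound.
cpu_seconds_total = wall_hours * 3600 * n_cpus
cpu_seconds_per_compound = cpu_seconds_total / n_compounds


def estimated_wall_hours(ncpu: int) -> float:
    """Estimate wall-clock hours for the same workload on `ncpu` CPUs.

    Assumes perfectly linear scaling, which is an idealization.
    """
    return cpu_seconds_total / ncpu / 3600


if __name__ == "__main__":
    print(f"{cpu_seconds_per_compound:.1f} CPU-seconds per compound")
    print(f"{estimated_wall_hours(32):.1f} h on 32 CPUs (if scaling were linear)")
```

Since some molecules take much longer than others, the average hides a long tail, so real run times may deviate from this estimate.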

DrrDom commented May 21, 2024

Currently, the init_db function takes an ncpu argument, which comes from the command-line argument ncpu. The problem is that this command-line argument means different things depending on whether docking is launched on a single server or with dask on multiple servers. In dask mode, it is the number of CPUs used for all processing other than docking. When docking on a single server, it is additionally the number of molecules docked in parallel.

The obvious solution is to set ncpu in all functions to multiprocessing.cpu_count(). The user would then lose control over those parts of the program, and only control over docking would remain. I am not sure this is the best solution, but I do not see another option currently.
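A minimal sketch of that idea, using only the standard library. The helper name `resolve_ncpu` and its optional `requested` cap are assumptions made for illustration and are not part of the project's code. With no argument it follows the proposal above and uses every available core.

```python
import os
from typing import Optional


def resolve_ncpu(requested: Optional[int] = None) -> int:
    """Return the worker count for non-docking steps (hypothetical helper).

    With no request, use all available cores, as proposed above.
    A positive `requested` value acts as an optional upper bound.
    """
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    if requested is None or requested <= 0:
        return available
    return min(requested, available)


if __name__ == "__main__":
    print("workers for preprocessing:", resolve_ncpu())
```

Calling `resolve_ncpu()` inside functions such as init_db would decouple their parallelism from the command-line ncpu, at the cost of the user control mentioned above.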

DrrDom commented May 21, 2024

Another slowdown is caused by the non-parallelized post-processing of molecules after protonation (in add_protonation) when molecules are submitted as 3D structures: there is an additional, time-consuming step of assigning correct bond orders. This can also be addressed in the context of this issue. I have a draft implementation that solves this, but have not tested it yet.
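The draft implementation is not shown here, so the following is only a generic sketch of how a per-molecule post-processing step could be spread over a process pool. `assign_bond_orders` is a placeholder that stands in for the real bond-order assignment, and `postprocess_parallel` with its parameters is a hypothetical name, not the project's API.

```python
from multiprocessing import Pool
from typing import Iterable, List


def assign_bond_orders(mol_block: str) -> str:
    """Placeholder for the expensive per-molecule step.

    The real step would fix bond orders of a protonated 3D structure;
    here it only strips whitespace so the sketch stays runnable.
    """
    return mol_block.strip()


def postprocess_parallel(mol_blocks: Iterable[str], ncpu: int = 1,
                         chunksize: int = 100) -> List[str]:
    """Apply assign_bond_orders to every molecule, keeping input order."""
    if ncpu <= 1:
        # Serial fallback avoids process start-up cost for small jobs.
        return [assign_bond_orders(m) for m in mol_blocks]
    with Pool(ncpu) as pool:
        # imap yields results in input order; chunksize batches tasks
        # to reduce inter-process communication overhead.
        return list(pool.imap(assign_bond_orders, mol_blocks, chunksize=chunksize))


if __name__ == "__main__":
    print(postprocess_parallel([" mol1 ", "mol2\n"], ncpu=2, chunksize=1))
```

Because each molecule is processed independently, the step parallelizes without shared state. The worker function must be defined at module level so it can be pickled and sent to the worker processes.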
