Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decouple number of runs from active processes #20

Merged
merged 7 commits into from Apr 10, 2021
Merged

Decouple number of runs from active processes #20

merged 7 commits into from Apr 10, 2021

Conversation

m-rph
Copy link
Contributor

@m-rph m-rph commented Mar 2, 2021

Hi Simon. This is what I had in mind, basically it allows joblib or multiprocessing to use to use n_jobs processes to execute n_run "jobs". Note that it is a breaking change.

Solves:

Using n_jobs to get a large number of runs results in spawning too many processes which end up filling the memory and cause thrashing. The patch limits the number of active processes to n_jobs while still allowing hyperactive to execute n_runs times the optimization process.

@SimonBlanke
Copy link
Owner

Hello @partiallytyped,

now i understand what you wanted to do. This looks very good and i think it would make sense to include this into Hyperactive in the future. I have some questions/things to add:

  • This should be commited to the dev branch for now
  • It is an API change. A very small one but still a change, so we have to release this in v4.0.0
  • What do you mean by "it is a breaking change"?
  • The default n_jobs should be "auto" and internally set to the total number of runs. If you pass an int to n_jobs it should work like in your commit. This way it works better "out of the box".

@SimonBlanke SimonBlanke linked an issue Mar 3, 2021 that may be closed by this pull request
@m-rph
Copy link
Contributor Author

m-rph commented Mar 3, 2021

What do you mean by "it is a breaking change"?
I meant that it is an API change.
It is an API change. A very small one but still a change, so we have to release this in v4.0.0

Your suggestions sound good, I will add the changes and push it.

Copy link
Owner

@SimonBlanke SimonBlanke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to leave a formal review. The important points are:

  • Commit to the dev branch
  • The default n_jobs should be "auto" and internally set to the total number of runs. If you pass an int to n_jobs it should work like in your commit. This way it works better "out of the box".

Thanks again for your work!

@m-rph m-rph marked this pull request as draft March 14, 2021 18:54
@m-rph m-rph changed the base branch from master to dev March 14, 2021 18:58
@m-rph m-rph marked this pull request as ready for review March 28, 2021 23:06
@SimonBlanke
Copy link
Owner

Okay, very nice!

I had an idea how to make this compatible with the current API. I would like your opinion.

We could leave the n_jobs/n_runs (old/new) parameter in .add_search(...) and rename the new parameter in Hyperactive(...) to something else. Maybe one of those:

  • n_processes
  • n_parallel
  • n_threads

I guess you have more experience with multiprocessing. Which would you choose?
After that I will resolve the conflicts and merge the pull request.

@m-rph
Copy link
Contributor Author

m-rph commented Apr 10, 2021

I'd go with n_processes since joblib and multiprocess use processes instead of threads.

Sorry for taking too long to respond, exams..

@SimonBlanke
Copy link
Owner

@partiallytyped,
No problem it wasn't that urgent. Thank you very much for your contribution!

@SimonBlanke SimonBlanke merged commit 2fc57ae into SimonBlanke:dev Apr 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

n_jobs opens a limited number of processes
2 participants