pySOT is slow to generate new points with large dataset #49

Closed
sjohnson-FLL opened this issue Jan 30, 2023 · 1 comment
sjohnson-FLL commented Jan 30, 2023

I am using pySOT with EIStrategy and a GPRegressor surrogate model for a 6-dimensional optimization problem (all axes are continuous), with 10 worker threads, running Python 3.11.1 on Windows 10.
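
For reference, here is a minimal sketch of roughly how my setup looks (following the pySOT 0.3-style examples; the constructor signatures and the toy objective below are placeholders, not my real problem):

```python
import numpy as np
from poap.controller import BasicWorkerThread, ThreadController
from pySOT.experimental_design import SymmetricLatinHypercube
from pySOT.optimization_problems import OptimizationProblem
from pySOT.strategy import EIStrategy
from pySOT.surrogate import GPRegressor


class MyProblem(OptimizationProblem):
    """Placeholder 6-dimensional, fully continuous problem."""
    def __init__(self, dim=6):
        self.dim = dim
        self.lb = np.zeros(dim)
        self.ub = np.ones(dim)
        self.int_var = np.array([])
        self.cont_var = np.arange(dim)

    def eval(self, x):
        # Stand-in for the real (expensive) objective
        return float(np.sum((np.asarray(x) - 0.5) ** 2))


num_threads = 10
max_evals = 2000

prob = MyProblem()
gp = GPRegressor(dim=prob.dim, lb=prob.lb, ub=prob.ub)
slhd = SymmetricLatinHypercube(dim=prob.dim, num_pts=2 * (prob.dim + 1))

controller = ThreadController()
controller.strategy = EIStrategy(
    max_evals=max_evals, opt_prob=prob, exp_design=slhd,
    surrogate=gp, asynchronous=True)

# One BasicWorkerThread per worker, each calling the objective directly
for _ in range(num_threads):
    controller.launch_worker(BasicWorkerThread(controller, prob.eval))

result = controller.run()
print(result.value, result.params[0])
```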

In recent runs, I have observed that after a little more than 1000 data points, generating new points starts to slow WAY down. Before this occurs, most time is spent evaluating the objective function, with worker threads being assigned a new task within seconds (or ms even) of finishing the previous evaluation. However, at a little over 1000 data points, I noticed that almost all the time is spent waiting for new assignments.

CPU usage confirms this. Prior to ~1k points, the evaluations called by the worker threads take up all available CPU. After ~1k, the main Python process takes ~30% and the worker threads rarely take any at all. In fact, only one worker thread is ever actually evaluating the objective at a time, because it finishes its evaluation before the next point has been generated.

This leads me to suspect that the process of generating new points slows WAY down around 1k points.

  1. Is this expected behavior? It is totally possible I've got a bug in my code somewhere that is causing this.
  2. Any tips to work around this?
    a. I would be happy to accept generating less optimal points if it meant more could be generated faster.
    b. I have some capacity to rewrite and/or multithread functions in C, if there is a specific function that may be the bottleneck.

Edit: this SE article seems relevant. https://stats.stackexchange.com/questions/326446/gaussian-process-regression-for-large-datasets
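
For context on the GP angle: exact GP regression refits scale roughly cubically with the number of points, so the surrogate update itself can become expensive around a few thousand points. A rough timing sketch of that trend, using scikit-learn's GaussianProcessRegressor directly (which I believe is what pySOT's GPRegressor wraps; the kernel and restart settings here are assumptions, not necessarily pySOT's defaults):

```python
import time

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
dim = 6
for n in (250, 500, 1000, 2000):
    X = rng.random((n, dim))
    y = np.sum((X - 0.5) ** 2, axis=1)  # cheap stand-in objective
    gp = GaussianProcessRegressor(n_restarts_optimizer=3)
    t0 = time.perf_counter()
    gp.fit(X, y)  # fit cost grows roughly O(n^3)
    print(f"n={n}: fit took {time.perf_counter() - t0:.1f} s")
```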

sjohnson-FLL (Author) commented

Update: After further investigation, I now believe the slowdown was caused by the dtol setting in EIStrategy(). The default is dtol = 10^-3 * norm(ub - lb), the minimum distance a new candidate point must keep from all previously evaluated points. It makes sense that generating new points slows down once that minimum-distance requirement becomes overly strict: as points accumulate, it gets harder to find candidates far enough from all of them. Changing to dtol = 5*10^-5 * norm(ub - lb) solved my problem for at least 7000 points, and I imagine if it starts to slow down again I can decrease the scaling on dtol further.
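
For anyone hitting the same wall, the change is just the dtol keyword when constructing the strategy. A sketch, assuming the same setup variables as in my earlier snippet (prob, slhd, gp, etc.); 5e-5 is simply the scaling that worked for me:

```python
import numpy as np

# Loosen the minimum-distance requirement between new candidates and
# previously evaluated points (the default is 1e-3 * norm(ub - lb)).
dtol = 5e-5 * np.linalg.norm(prob.ub - prob.lb)

controller.strategy = EIStrategy(
    max_evals=max_evals, opt_prob=prob, exp_design=slhd,
    surrogate=gp, asynchronous=True, dtol=dtol)
```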

Closing the issue, but wanted to leave this comment here in case anyone else has similar troubles.
