Multiple threads does not work #26

Closed
minhhg opened this issue Feb 8, 2019 · 10 comments

Comments

@minhhg

minhhg commented Feb 8, 2019

Hi,

I ran a simple example with num_threads = 1 and it works, but with
num_threads = 4 I get the error below. I am using the same example code (example_mars()),
except that I replaced the objective function with:

import pySOT.optimization_problems as sprob
PORT=8500
url="http://localhost:" + str(PORT) + "pysot_test"

class PySOTTestProblem(sprob.OptimizationProblem):
    def __init__(self):
        self.dim=1
        self.info = "PySOT test Problem" #optional
        self.lb = 0.9 * np.array([0.0])
        self.ub = 1.1 * np.array([2*np.pi])
        self.int_var = np.array([])
        self.cont_var = np.arange(0, 1)

    def eval(self, x):
        """Evaluate the objective function at x

        :param x: Data point
        :type x: numpy.array
        :return: Value at x
        :rtype: float
        """
        self.__check_input__(x)
        print("x=", x)
        spec=dict(x=x[0])
        results = requests.post(url,json=spec, timeout=6400).json()
        objective =  float(results['objective'])
        print("x=", beta, "objective=", objective)
        return objective

==============================================
On the server side, I have a web server (gunicorn and falcon) running that returns the value of np.sin(x).

###========begin of error===========================

ValueError Traceback (most recent call last)
in
----> 1 equity_model_adaptation()

in equity_model_adaptation()
38
39 # Run the optimization strategy
---> 40 result = controller.run()
41
42 print('Best value found: {0}'.format(result.value))

~/miniconda3/envs/dp36/lib/python3.6/site-packages/poap/controller.py in run(self, merit, filter)
342 """
343 try:
--> 344 return self._run(merit=merit, filter=filter)
345 finally:
346 self.call_term_callbacks()

~/miniconda3/envs/dp36/lib/python3.6/site-packages/poap/controller.py in _run(self, merit, filter)
312 self._run_queued_messages()
313 time.sleep(0) # Yields to other threads
--> 314 proposal = self.strategy.propose_action()
315 if not proposal:
316 self._run_message()

~/miniconda3/envs/dp36/lib/python3.6/site-packages/pySOT/strategy.py in propose_action(self)
279 self.phase = 2
280 if self.asynchronous: # Always make proposal with asynchrony
--> 281 self.generate_evals(num_pts=1)
282 elif self.pending_evals == 0: # Make sure the entire batch is done
283 self.generate_evals(num_pts=self.batch_size)

~/miniconda3/envs/dp36/lib/python3.6/site-packages/pySOT/strategy.py in generate_evals(self, num_pts)
562 opt_prob=self.opt_prob, num_pts=num_pts, surrogate=self.surrogate,
563 X=self.X, fX=self.fX, Xpend=self.Xpend, weights=weights,
--> 564 sampling_radius=self.sampling_radius, num_cand=self.num_cand)
565
566 for i in range(num_pts):

~/miniconda3/envs/dp36/lib/python3.6/site-packages/pySOT/auxiliary_problems.py in candidate_srbf(num_pts, opt_prob, surrogate, X, fX, weights, Xpend, sampling_radius, subset, dtol, num_cand)
106 """
107 # Find best solution
--> 108 xbest = np.copy(X[np.argmin(fX), :]).ravel()
109
110 # Fix default values

~/miniconda3/envs/dp36/lib/python3.6/site-packages/numpy/core/fromnumeric.py in argmin(a, axis, out)
1099
1100 """
-> 1101 return _wrapfunc(a, 'argmin', axis=axis, out=out)
1102
1103

~/miniconda3/envs/dp36/lib/python3.6/site-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
49 def _wrapfunc(obj, method, *args, **kwds):
50 try:
---> 51 return getattr(obj, method)(*args, **kwds)
52
53 # An AttributeError occurs if the object does not have

ValueError: attempt to get argmin of an empty sequence
==========end of error========================================

@dme65
Owner

dme65 commented Feb 8, 2019

I think I need more information to understand why this error occurs, but I don't think it's actually caused by a bug in pySOT. Can you post the code for an MCVE (minimal, complete, verifiable example) so I can see what is going on?

@minhhg
Author

minhhg commented Feb 11, 2019

Here is the test server code (pysottest.py):

"""
file: pysottest.py
"""
import numpy as np
import falcon
import json

def pysot_test(spec):
    x = spec['x']
    obj = np.sin(x)
    return dict(objective=obj)

class PySOTTestResource(object):
    """
    Serving api for 1 pysottest specification.

    """
    def on_post(self, req, resp):
        spec=None
        data = None
        try:
            data = req.stream.read().decode('utf-8')
            log.info("data=%s", data)
            spec = json.loads(data)
            log.info("receive_request, spec= %s", spec)
            res = pysot_test(spec)
        except Exception as error:
            res = dict(systemError= str(error))
            log.error("System error %s\n, input data=%s", error, data, exc_info=True)
            resp.status = falcon.HTTP_500

        resp.body = json.dumps(res)
        log.info("end_request, res=%s",res)

app = application = falcon.API()
app.add_route('/pysot_test', PySOTTestResource())

@minhhg
Author

minhhg commented Feb 11, 2019

Then run
"gunicorn -b 0.0.0.0:8500 --reload pysottest --timeout 640 -w 28"
This should start the web server on localhost at port 8500, ready to service requests from the client.

@minhhg
Author

minhhg commented Feb 11, 2019

On another, related issue: is it possible to support an eval(list_of_points) function instead of evaluating one point at a time?

@dme65
Owner

dme65 commented Feb 12, 2019

Can you try to isolate the bug a bit more? I've very little bandwidth at the moment and will need an MCVE in order to look into this.

Regarding the eval function: No, this isn't possible, since each worker can only do one evaluation at a time.

@minhhg
Author

minhhg commented Feb 13, 2019

The example is the smallest code that shows the problem. It is just a calculation of sin(x), which I chose on purpose for the investigation. It seems to me that the bug is in the way pySOT collects results. Reproducing it is simple: copy the code above and run it. For the test case, please do not write a new one that calls np.sin(x) directly. The main purpose of the web API in this example is to simulate complex functions.

A suggestion: support eval(list_of_points) instead of multiple threads, which do not work at the moment. If you support one thread with eval(list_of_points) and let the user handle the parallel computation of multiple points, that is another way to do this. PySOT should focus on other aspects instead of the parallel evaluation of functions that are assumed to be heavy. This is just my suggestion.
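
Editor's sketch of the user-side batching proposed above, using only the Python standard library. It is not part of the pySOT API. The names eval_one and eval_batch are hypothetical, and math.sin stands in for the web API call.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def eval_one(x):
    """Stand-in for one expensive evaluation (assumption: replaces the HTTP call)."""
    return math.sin(x)


def eval_batch(points, max_workers=4):
    """Evaluate all points concurrently and return the values in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs.
        return list(pool.map(eval_one, points))


values = eval_batch([0.0, math.pi / 2, math.pi])
print(values)
```

Threads suit this case because each evaluation would mostly wait on a network response. A CPU-bound objective would probably need processes instead.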

@dme65
Owner

dme65 commented Feb 13, 2019

There is nothing here that indicates that there is a bug in the pySOT multithreading. All of the tests are passing and I'm having no issues with the threading. There are several undefined variables in your functions, e.g., beta in PySOTTestProblem, so your setup may just be broken.

@dme65 dme65 closed this as completed Feb 13, 2019
@sharma-n

sharma-n commented Feb 2, 2021

Hi @dme65,
First of all, kudos on the work! This package has been a ton of help and very interesting to use. However, I was recently using the DYCORS strategy with an RBF interpolant. I used ThreadController and tried both asynchronous=True and asynchronous=False, and I am facing the same issue as @minhhg. I did some checking, and the problem seems to be in candidate_dycors.py, line 57. Selecting new candidates involves looking up the current xbest, but in this case fX=[], so argmin(fX) raises an error.

At least for me, candidate_dycors is called from surrogate_strategy.py, lines 282-287. Can you think of any reason why candidate_dycors() might be called with an empty fX?

The issue seems to be more common with asynchronous=True.

Any help would be appreciated!
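
Editor's sketch of the failure described above, reproduced with NumPy alone. The expression in best_point is copied from the traceback earlier in the thread. The helper names and array shapes are assumptions made for illustration.

```python
import numpy as np


def best_point(X, fX):
    """Same lookup as the failing line in the traceback."""
    return np.copy(X[np.argmin(fX), :]).ravel()


def try_best_point(X, fX):
    """Return the best point, or the error message if the lookup fails."""
    try:
        return best_point(X, fX)
    except ValueError as err:
        return str(err)


# No evaluation has finished yet: zero rows in X, zero values in fX.
dim = 1
message = try_best_point(np.zeros((0, dim)), np.zeros((0, 1)))
print(message)

# With at least one recorded evaluation, the same lookup succeeds.
X_done = np.array([[0.5], [1.5]])
fX_done = np.sin(X_done)
print(try_best_point(X_done, fX_done))
```

The first call reports the same "empty sequence" error as the traceback, which suggests the lookup ran before any result had been recorded.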

@dme65
Owner

dme65 commented Feb 3, 2021

Hi @sharma-n,

This may be because of a similar issue as in #37? If you have more workers than experimental design points, then it is possible that you end up in candidate_dycors without having any finished evaluations. There should really be some better input checking or a friendlier exception here, but it's a bit tricky since the strategy currently doesn't know the number of workers in the asynchronous setting. Can you try setting the number of experimental design points according to my comment in #37 and let me know if that fixes the issue?

Cheers,
David
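
Editor's sketch of the two ideas in this reply: a friendlier check for "no finished evaluations" and a rule that keeps the number of experimental design points at or above the number of workers. Both helpers are hypothetical and not part of pySOT. The 2 * (dim + 1) baseline is an assumption, and the exact recommendation in #37 is not reproduced here.

```python
def check_has_results(fX):
    """Raise a descriptive error if no evaluation has completed yet."""
    if len(fX) == 0:
        raise RuntimeError(
            "No completed evaluations yet; the experimental design may have "
            "fewer points than there are workers."
        )


def design_size(dim, num_workers):
    """Hypothetical rule: never fewer design points than workers."""
    baseline = 2 * (dim + 1)  # assumed default size, not taken from this thread
    return max(baseline, num_workers)


print(design_size(dim=1, num_workers=8))
```

The value from design_size would be passed as the number of points when building the experimental design, so that every worker can be given a design point before the strategy has to propose candidates.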

@sharma-n

sharma-n commented Feb 4, 2021

Hey @dme65 , thanks for the quick reply! Yes! that might be the problem. I will update this post if all goes well.
