zmq.error.Again: Resource temporarily unavailable #24

Closed
huminpurin opened this issue Dec 18, 2017 · 5 comments

@huminpurin

huminpurin commented Dec 18, 2017

While running the examples, I'm getting an error like the one below:

INFO:tensorflow:Starting queue runners.
WARNING:worker_1:worker_1: started training at step: 120
Exception in thread Thread-4:
Traceback (most recent call last):
File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/home/humin/btgym/btgym/algorithms/runner.py", line 75, in run
self._run()
File "/home/humin/btgym/btgym/algorithms/runner.py", line 96, in _run
self.queue.put(next(rollout_provider), timeout=600.0)
File "/home/humin/btgym/btgym/algorithms/runner.py", line 238, in env_runner
episode_stat = env.get_stat() # get episode statistic
File "/home/humin/btgym/btgym/envs/backtrader.py", line 680, in get_stat
if self._force_control_mode():
File "/home/humin/btgym/btgym/envs/backtrader.py", line 508, in _force_control_mode
self.server_response = self.socket.recv_pyobj()
File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/site-packages/zmq/sugar/socket.py", line 491, in recv_pyobj
msg = self.recv(flags)
File "zmq/backend/cython/socket.pyx", line 693, in zmq.backend.cython.socket.Socket.recv
File "zmq/backend/cython/socket.pyx", line 727, in zmq.backend.cython.socket.Socket.recv
File "zmq/backend/cython/socket.pyx", line 150, in zmq.backend.cython.socket._recv_copy
File "zmq/backend/cython/socket.pyx", line 145, in zmq.backend.cython.socket._recv_copy
File "zmq/backend/cython/checkrc.pxd", line 19, in zmq.backend.cython.checkrc._check_rc
zmq.error.Again: Resource temporarily unavailable

I tried reducing the number of workers, but that seems to be unrelated. The error does not interrupt the training process. Is this normal, or should I be concerned about it?
My environment: Ubuntu 16.04, Python 3.5
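
For context, `zmq.error.Again` is pyzmq's exception for the EAGAIN errno: it is raised when a receive is non-blocking or has a receive timeout and no message arrives in time. The sketch below is not btgym or pyzmq code. It swaps in a plain standard-library UDP socket to show the same errno in isolation, and `recv_or_none` is a hypothetical helper name.

```python
import errno
import os
import select
import socket


def recv_or_none(sock, bufsize=1024):
    """Try to receive once; return None if nothing has arrived yet."""
    try:
        return sock.recv(bufsize)
    except BlockingIOError as exc:
        # EAGAIN / EWOULDBLOCK means "nothing there yet, try again later".
        if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return None
        raise


# A non-blocking UDP socket on an ephemeral local port; nobody has sent to it yet.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.setblocking(False)

first_try = recv_or_none(receiver)  # no peer has replied, so this hits EAGAIN

# Once a peer actually sends something, the same call succeeds.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"pong", receiver.getsockname())
ready, _, _ = select.select([receiver], [], [], 1.0)  # wait up to 1 s for data
second_try = recv_or_none(receiver) if ready else None

sender.close()
receiver.close()

print(first_try, second_try)
print(os.strerror(errno.EAGAIN))  # the platform's text for EAGAIN
```

On Linux the last line typically prints the same "Resource temporarily unavailable" text seen in the traceback. Read this way, the exception says that the peer did not answer in time rather than that the socket itself is broken.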

@Kismuz
Owner

Kismuz commented Dec 18, 2017

@huminpurin,
no, it's not normal at all :)
Which example gives you this error? Does it happen regularly or only from time to time?

@huminpurin
Author

huminpurin commented Dec 18, 2017

@Kismuz
First I tried the example "async_btgym_workers". I changed num_workers to 4 (I don't think this number is related to the error; I mention it only because it is the only thing I changed in the example) and launched the workers as in the example:
worker.daemon = False
worker.start()
workers.append(worker)
Then I got

Env.step: server unreachable with status: <receive_failed_due_to_connect_timeout>.
Env.step: server unreachable with status: <receive_failed_due_to_connect_timeout>.
Env.step: server unreachable with status: <receive_failed_due_to_connect_timeout>.
Env.step: server unreachable with status: <receive_failed_due_to_connect_timeout>.
Process BTgymServer-2:
Traceback (most recent call last):
File "/home/humin/anaconda3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "../btgym/server.py", line 334, in run
service_input = socket.recv_pyobj()
File "/home/humin/anaconda3/lib/python3.6/site-packages/zmq/sugar/socket.py", line 491, in recv_pyobj
msg = self.recv(flags)
File "zmq/backend/cython/socket.pyx", line 693, in zmq.backend.cython.socket.Socket.recv (zmq/backend/cython/socket.c:7683)
File "zmq/backend/cython/socket.pyx", line 727, in zmq.backend.cython.socket.Socket.recv (zmq/backend/cython/socket.c:7460)
File "zmq/backend/cython/socket.pyx", line 150, in zmq.backend.cython.socket._recv_copy (zmq/backend/cython/socket.c:2437)
File "zmq/backend/cython/socket.pyx", line 145, in zmq.backend.cython.socket._recv_copy (zmq/backend/cython/socket.c:2344)
File "zmq/backend/cython/checkrc.pxd", line 19, in zmq.backend.cython.checkrc._check_rc (zmq/backend/cython/socket.c:9823)
zmq.error.Again: Resource temporarily unavailable
Process BTgymServer-3:1:
Traceback (most recent call last):
File "/home/humin/anaconda3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "../btgym/server.py", line 433, in run
raise RuntimeError('Failed to assert Dataset is ready. Exiting.')
RuntimeError: Failed to assert Dataset is ready. Exiting.
Process Worker-3:
Traceback (most recent call last):
File "/home/humin/anaconda3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "", line 17, in run
obs = self.env.reset()
File "/home/humin/anaconda3/lib/python3.6/site-packages/gym/core.py", line 104, in reset
return self._reset()
File "../btgym/envs/backtrader.py", line 561, in _reset
self.env_response = self._step(0)
File "../btgym/envs/backtrader.py", line 658, in _step
raise ConnectionError(msg)
ConnectionError: Env.step: server unreachable with status: <receive_failed_due_to_connect_timeout>.

...worker_0 has joined.

BtgymServer: data_server unreachable with status: <receive_failed_due_to_connect_timeout>.
Process BTgymServer-6:1:
Traceback (most recent call last):
File "/home/humin/anaconda3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "../btgym/server.py", line 414, in run
raise ConnectionError(msg)
ConnectionError: BtgymServer: data_server unreachable with status: <receive_failed_due_to_connect_timeout>.
BtgymServer: data_server unreachable with status: <receive_failed_due_to_connect_timeout>.
Process BTgymServer-4:1:
Traceback (most recent call last):
File "/home/humin/anaconda3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "../btgym/server.py", line 414, in run
raise ConnectionError(msg)
ConnectionError: BtgymServer: data_server unreachable with status: <receive_failed_due_to_connect_timeout>.
BtgymServer: data_server unreachable with status: <receive_failed_due_to_connect_timeout>.
Process BTgymServer-5:1:
Traceback (most recent call last):
File "/home/humin/anaconda3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "../btgym/server.py", line 414, in run
raise ConnectionError(msg)
ConnectionError: BtgymServer: data_server unreachable with status: <receive_failed_due_to_connect_timeout>.

...worker_1 has joined.
...worker_2 has joined.
...worker_3 has joined.
data_master: environment closed.

In another example, "a3c_random_on_synth_or_real_data_4_6", I got a similar error after launcher.run(), as below:

WARNING:worker_1:AAC_1: learn_rate: 0.000100, entropy_beta: 0.010317
Process BTgymServer-2:2:
Traceback (most recent call last):
File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/multiprocessing/process.py", line 252, in _bootstrap
self.run()
File "/home/humin/btgym/btgym/server.py", line 334, in run
service_input = socket.recv_pyobj()
File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/site-packages/zmq/sugar/socket.py", line 491, in recv_pyobj
msg = self.recv(flags)
File "zmq/backend/cython/socket.pyx", line 693, in zmq.backend.cython.socket.Socket.recv
File "zmq/backend/cython/socket.pyx", line 727, in zmq.backend.cython.socket.Socket.recv
File "zmq/backend/cython/socket.pyx", line 150, in zmq.backend.cython.socket._recv_copy
File "zmq/backend/cython/socket.pyx", line 145, in zmq.backend.cython.socket._recv_copy
File "zmq/backend/cython/checkrc.pxd", line 19, in zmq.backend.cython.checkrc._check_rc
zmq.error.Again: Resource temporarily unavailable
Process BTgymServer-3:1:
Traceback (most recent call last):
File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/multiprocessing/process.py", line 252, in _bootstrap
self.run()
File "/home/humin/btgym/btgym/server.py", line 334, in run
service_input = socket.recv_pyobj()
File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/site-packages/zmq/sugar/socket.py", line 491, in recv_pyobj
msg = self.recv(flags)
File "zmq/backend/cython/socket.pyx", line 693, in zmq.backend.cython.socket.Socket.recv
File "zmq/backend/cython/socket.pyx", line 727, in zmq.backend.cython.socket.Socket.recv
File "zmq/backend/cython/socket.pyx", line 150, in zmq.backend.cython.socket._recv_copy
File "zmq/backend/cython/socket.pyx", line 145, in zmq.backend.cython.socket._recv_copy
File "zmq/backend/cython/checkrc.pxd", line 19, in zmq.backend.cython.checkrc._check_rc
zmq.error.Again: Resource temporarily unavailable
Process Worker-3:
Traceback (most recent call last):
File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 468, in make_tensor_proto
str_values = [compat.as_bytes(x) for x in proto_values]
File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 468, in <listcomp>
str_values = [compat.as_bytes(x) for x in proto_values]
File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/util/compat.py", line 65, in as_bytes
(bytes_or_text,))
TypeError: Expected binary or unicode string, got {'sample_num': <tf.Tensor 'local/on_policy_state_in_metadata_sample_num_pl:0' shape=(?,) dtype=float32>, 'first_row': <tf.Tensor 'local/on_policy_state_in_metadata_first_row_pl:0' shape=(?,) dtype=float32>, 'trial_num': <tf.Tensor 'local/on_policy_state_in_metadata_trial_num_pl:0' shape=(?,) dtype=float32>, 'type': <tf.Tensor 'local/on_policy_state_in_metadata_type_pl:0' shape=(?,) dtype=float32>}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/multiprocessing/process.py", line 252, in _bootstrap
self.run()
File "/home/humin/btgym/btgym/algorithms/worker.py", line 189, in run
**self.trainer_kwargs,
File "/home/humin/btgym/btgym/algorithms/aac.py", line 972, in __init__
**kwargs
File "/home/humin/btgym/btgym/algorithms/aac.py", line 423, in __init__
self.inc_step = self.global_step.assign_add(tf.shape(pi.on_state_in[list(pi.on_state_in.keys())[0]])[0])
File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 271, in shape
return shape_internal(input, name, optimize=True, out_type=out_type)
File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 295, in shape_internal
input_tensor = ops.convert_to_tensor(input)
File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 836, in convert_to_tensor
as_ref=False)
File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 926, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 229, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 208, in constant
value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/home/humin/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 472, in make_tensor_proto
"supported type." % (type(values), values))
TypeError: Failed to convert object of type <class 'dict'> to Tensor. Contents: {'sample_num': <tf.Tensor 'local/on_policy_state_in_metadata_sample_num_pl:0' shape=(?,) dtype=float32>, 'first_row': <tf.Tensor 'local/on_policy_state_in_metadata_first_row_pl:0' shape=(?,) dtype=float32>, 'trial_num': <tf.Tensor 'local/on_policy_state_in_metadata_trial_num_pl:0' shape=(?,) dtype=float32>, 'type': <tf.Tensor 'local/on_policy_state_in_metadata_type_pl:0' shape=(?,) dtype=float32>}. Consider casting elements to a supported type.
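
One possible reading of the last traceback: the expression `pi.on_state_in[list(pi.on_state_in.keys())[0]]` assumes the first key maps directly to a tensor, but here it mapped to a nested dict of tensors, which `tf.shape` cannot convert. On Python 3.5 the iteration order of a plain dict is also not guaranteed, so which key comes first can vary. The sketch below is only an illustration of that failure mode, not the fix applied in btgym. It uses strings in place of TensorFlow placeholders; `first_leaf` and the `external` key are hypothetical.

```python
def first_leaf(nested):
    """Descend through dict values until a non-dict value is found.

    Keys are sorted at every level, so the result does not depend on
    the dict's iteration order.
    """
    value = nested
    while isinstance(value, dict):
        if not value:
            raise ValueError("empty dict has no leaf value")
        value = value[sorted(value.keys())[0]]
    return value


# Stand-in for a nested state input: strings replace tensors here.
on_state_in = {
    "metadata": {"type": "type_pl", "trial_num": "trial_num_pl"},
    "external": "external_pl",  # hypothetical key, for illustration only
}

# The pattern from the traceback: take whatever the first key maps to.
naive = on_state_in[list(on_state_in.keys())[0]]
print(type(naive).__name__)  # a dict here, which is not tensor-like

# Walking down to a leaf always yields a single value.
print(first_leaf(on_state_in))
```

Whether picking a deterministic leaf is the right choice depends on which input is meant to define the batch dimension. That is a design decision for the library and is not something the traceback alone settles.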

@Kismuz
Owner

Kismuz commented Dec 18, 2017

@huminpurin,
you have spotted my mistake! Please update btgym, it's corrected now,
here:

data_master = BTgymEnv(
    dataset=MyDataset,  # It is the only environment here for which dataset is required:
    port=5050,
    data_port=data_port,
    data_master=True,
    connect_timeout=10,  # set server connection timeout to 10 seconds (default is 60).
    verbose=0,
)

o = data_master.reset() # <=== CORRECTED HERE: fake reset() tells data_master to start data_server_process

# Make and launch workers in separate processes:
for i in range(num_workers):
    # Worker environment configuration:
    env_config=dict(
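
The ordering issue behind this fix (the data server must be running before any worker tries to connect to it) can be sketched in a generic way. The example below is not btgym code: it uses standard-library threads and a raw TCP socket in place of processes and ZeroMQ, and all names in it (`data_server`, `worker`, `address_box`) are hypothetical. The `ready.wait()` call plays the role that the fake `reset()` plays above.

```python
import socket
import threading


def data_server(ready, stop, address_box):
    """Stand-in data server: accept connections and send one short reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 0))       # ephemeral port chosen by the OS
        srv.listen()
        srv.settimeout(0.1)              # lets the loop notice the stop flag
        address_box.append(srv.getsockname())
        ready.set()                      # from here on it is safe to connect
        while not stop.is_set():
            try:
                conn, _ = srv.accept()
            except socket.timeout:
                continue
            with conn:
                conn.sendall(b"ready")


def worker(address, results, index, connect_timeout=1.0):
    """Stand-in worker: connect with a timeout and read the server's reply."""
    with socket.create_connection(address, timeout=connect_timeout) as sock:
        results[index] = sock.recv(16)


ready, stop, address_box = threading.Event(), threading.Event(), []
server = threading.Thread(target=data_server, args=(ready, stop, address_box))
server.start()

# Block until the server is listening before launching any worker.
if not ready.wait(timeout=5.0):
    raise RuntimeError("data server did not start in time")

num_workers = 4
results = [None] * num_workers
workers = []
for i in range(num_workers):
    t = threading.Thread(target=worker, args=(address_box[0], results, i))
    t.start()
    workers.append(t)

for t in workers:
    t.join()
stop.set()
server.join()
print(results)
```

If the workers were started without waiting for the server, their connection attempts could fail or time out, which is the general shape of the "server unreachable" messages reported earlier in this thread.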

@huminpurin
Author

huminpurin commented Dec 18, 2017

@Kismuz
Thanks! It works great.
By the way, another error came up:

AttributeError: 'FigureCanvasGTKAgg' object has no attribute 'renderer'

As far as I could find, it's related to matplotlib, even though I have version 2.0.2 installed as recommended. I uncommented the fig.canvas.draw() line in the plotting.py file under the rendering folder. After that, everything works perfectly!

@Kismuz
Owner

Kismuz commented Dec 19, 2017

thanks, updated.
