Couldn't start socket communication because worker number 0 is still in use. #1505
Comments
I tried to reproduce your error on OSX, but the environment closes as expected on version v0.6. What version are you using? If you are using v0.6, this could be a Linux-specific error.

It's the latest version. I just downloaded and installed it yesterday. Any advice?
Hi @taylerallen6, in your line

I understand that, but opening multiple workers isn't a solution. Then I'd just have multiple workers that don't get closed out.
OK, I was able to reproduce this bug. It only happens on the Linux platform; I've tried the same on Mac and it doesn't occur. Way to reproduce: When running
I will log the bug for now.

Ok, thanks.
@xiaomaogy

@xiaomaogy Hi! I tried to run PPO2 through gym on Windows 10, but had the same problem.
Hi all, this is actually normal behavior with sockets on most platforms. When a socket is closed, it enters a TIME_WAIT state: https://stackoverflow.com/questions/337115/setting-time-wait-tcp. By default, Ubuntu sets this time to 60 seconds, so a minute later the socket is released from TIME_WAIT and you'll be able to open the environment again. We are looking for a workaround for ml-agents. There are ways to shorten TIME_WAIT in your system. Also, one workaround for Linux is to add
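For illustration, here is a minimal sketch of one way to cope with TIME_WAIT from the Python side: simply retry until the old socket is released. The import path and the broad `except` are assumptions, since the module layout and exception class have changed across ml-agents versions; this is not the official fix.

```python
# A minimal sketch (assumed names, not the official ml-agents workaround):
# retry creating the environment until the old socket leaves TIME_WAIT.
# The import path matches recent ml-agents releases; older versions expose
# UnityEnvironment from mlagents.envs instead.
import time

from mlagents_envs.environment import UnityEnvironment


def open_env_with_retry(file_name, worker_id=0, retries=5, delay=15):
    """Try to open the Unity environment, waiting out TIME_WAIT between attempts."""
    for attempt in range(1, retries + 1):
        try:
            return UnityEnvironment(file_name=file_name, worker_id=worker_id)
        except Exception:  # the exact exception class differs across versions
            print(f"Worker {worker_id} still in use, retrying in {delay}s ({attempt}/{retries})")
            time.sleep(delay)
    raise RuntimeError("Port is still in TIME_WAIT; try a different worker_id")


# env = open_env_with_retry("../envs/3dball1")
```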
@ervteng Thanks a lot! However, I still have the same problem when running the code after maybe 20 hours. It again reported: "Couldn't start socket communication because worker number 0 is still in use."

Hi @zheyangshi, what library are you using to run ML-Agents? Also, I've edited my comment to add a workaround for Ubuntu specifically. You can give it a try.
@ervteng Thank you very much, I will try it later. I just ran the PPO2 code from https://github.com/Unity-Technologies/ml-agents/blob/master/gym-unity/README.md directly and received the error on Win10. What's more, I am a little confused because the DQN code on the same page runs successfully.

Hey @zheyangshi, were you able to run the code? It seems that the issue isn't with ML-Agents if DQN does work. Does PPO2 run on, e.g., CartPole or Atari?
Hi @ervteng, I tried it this morning and it still doesn't work as expected. PS: when I run "python -m baselines.run --alg=ppo2 --env=PongNoFrameskip-v4", ppo2 does work.

Hey @zheyangshi, I'm assuming you've rebooted your machine since last week. If not, that may help. Also, how many parallel environments are you running with ppo2? Make sure the rank param is being incremented properly in your run code.
Hello @ervteng, I have actually rebooted the computer but I still get the same result. I also changed the number of environments from 1 to 4, and that didn't seem to work either. What's more, would you mind explaining what "Make sure the rank param is being incremented properly in your run code" means? Thanks a lot!

My workaround is to store the last id in a file. The problem is solved.
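A rough sketch of that idea is below. The file name and helper function are made up for illustration; the point is just that every new run picks a worker_id (and therefore a port) different from the previous one.

```python
# Sketch of the "store the last id in a file" workaround described above.
# WORKER_ID_FILE and next_worker_id are hypothetical names for illustration.
import os

WORKER_ID_FILE = "last_worker_id.txt"


def next_worker_id():
    """Read the previously used worker id, bump it, and persist the new value."""
    last = -1
    if os.path.exists(WORKER_ID_FILE):
        with open(WORKER_ID_FILE) as f:
            contents = f.read().strip()
            if contents:
                last = int(contents)
    new_id = last + 1
    with open(WORKER_ID_FILE, "w") as f:
        f.write(str(new_id))
    return new_id


# env = UnityEnvironment(file_name="../envs/3dball1", worker_id=next_worker_id())
```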
@ivankush, thank you very much! I think that could be a nice solution for me.

Hey @zheyangshi, the baselines PPO2 code uses
@ivankush, thanks for the workaround! That should work as well. Anything that ensures the worker_id is unique between environments.
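To make that concrete, here is a sketch of keeping worker_id unique across parallel environments in the make_env(rank) style used by baselines. UnityEnv is the gym wrapper from the gym-unity README linked earlier; its exact signature has changed between releases, and the environment path is hypothetical.

```python
# Sketch: give every parallel environment its own worker_id so each one binds a
# different port. The UnityEnv signature follows the older gym-unity wrapper
# and may differ in newer releases.
from gym_unity.envs import UnityEnv


def make_unity_env(env_path, rank):
    """Return a thunk that opens the Unity env on its own port (base port + rank)."""
    def _thunk():
        return UnityEnv(env_path, worker_id=rank, use_visual=False)
    return _thunk


# env_fns = [make_unity_env("../envs/3dball1", rank) for rank in range(4)]
# These thunks can then be passed to something like SubprocVecEnv(env_fns) for PPO2.
```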
@ervteng Thanks for your kind reply. I finally get your point.

Hi all. We have recently reworked the trainer code as of v0.8. Due to inactivity, I am closing this issue. Please let us know if you still run into this issue in the latest version.
I am still experiencing this issue on Ubuntu 18.04 with ml-agents 0.9.1. I have to wait some time before I can re-run mlagents-learn.

I am still experiencing this issue as well on a Linux build. It works when I change the base-port, but is there any way to manually force-close the previous env?
+1

Got the same issue, is it resolved?
I updated ml-agents and am using Windows now, and I have not been running into this issue.

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
I am using Ubuntu 16.04. While going through getting-started.ipynb I can run the script once and it works just fine, but the second time I try to run it I get this error:
It's like the env.close() isn't actually closing it.
Here is the script I am running:
../envs/3dball1 is of course the executable for my platform. It's just the 3DBall scene with a 3DBallLearning brain. No changes. Any ideas?
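For context, a minimal sketch of the kind of open/close sequence from getting-started.ipynb is below; it is not the reporter's exact script, and the import path and reset signature follow the v0.6-era API.

```python
# Not the original script: a minimal v0.6-style open/close sequence that hits
# the error when run twice in a row on Linux, because the port from the first
# run can still be in TIME_WAIT when the second run starts.
from mlagents.envs import UnityEnvironment

env = UnityEnvironment(file_name="../envs/3dball1", worker_id=0)
env.reset(train_mode=True)
env.close()  # the OS may keep the socket in TIME_WAIT for ~60 seconds after this
```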