Running the script shown in start.sh fails #16

Open
zsdfaker opened this issue Sep 22, 2020 · 3 comments

Comments

@zsdfaker

My working environment is as follows:
GTX 1650 ×1
Core i7-9700 ×1
tensorflow==1.14.0
gym[atari]
numpy
tensorboardX
opencv-python
Windows 10
I ran the script shown in start.sh, but it failed with the following output:
2020-09-22 09:24:12.969790: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:1
2020-09-22 09:24:12.972181: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:2
2020-09-22 09:24:12.975636: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:3
2020-09-22 09:24:12.977599: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:4
2020-09-22 09:24:12.979766: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:5
2020-09-22 09:24:12.981744: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:6
2020-09-22 09:24:12.983991: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:7
2020-09-22 09:24:12.986332: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:8
2020-09-22 09:24:12.988586: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:9
2020-09-22 09:24:12.992850: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:10
2020-09-22 09:24:12.995346: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:11
2020-09-22 09:24:12.997290: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:12
2020-09-22 09:24:12.999624: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:13
2020-09-22 09:24:13.002224: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:14
2020-09-22 09:24:13.005818: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:15
2020-09-22 09:24:13.007926: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:16
2020-09-22 09:24:13.013232: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:17
2020-09-22 09:24:13.015965: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:18
2020-09-22 09:24:13.018205: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:19
2020-09-22 09:24:13.020086: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:20
2020-09-22 09:24:13.022604: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:21
2020-09-22 09:24:13.026224: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:22
2020-09-22 09:24:13.028211: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:23
2020-09-22 09:24:13.030634: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:24
2020-09-22 09:24:13.032666: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:25
2020-09-22 09:24:13.035993: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:26
2020-09-22 09:24:13.037987: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:27
2020-09-22 09:24:13.040183: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:28
2020-09-22 09:24:13.042793: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:29
2020-09-22 09:24:13.044992: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:30
2020-09-22 09:24:13.046805: I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting for response from worker: /job:actor/replica:0/task:31
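Each of these lines means the master has not yet received a response from one actor task, so the task numbers in the log are the actors to check. A minimal sketch for pulling those numbers out of a saved log; waiting_tasks is a hypothetical helper, not part of this repository:

```python
import re
from typing import List

# Matches the master.cc log line above; captures the job name and the task index.
WAITING_RE = re.compile(
    r"CreateSession still waiting for response from worker: "
    r"/job:(?P<job>\w+)/replica:\d+/task:(?P<task>\d+)"
)


def waiting_tasks(log_text: str, job: str = "actor") -> List[int]:
    """Return the sorted, de-duplicated task indices of `job` the master is still waiting for."""
    tasks = set()
    for match in WAITING_RE.finditer(log_text):
        if match.group("job") == job:
            tasks.add(int(match.group("task")))
    return sorted(tasks)


if __name__ == "__main__":
    sample = (
        "I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting "
        "for response from worker: /job:actor/replica:0/task:17\n"
        "I tensorflow/core/distributed_runtime/master.cc:268] CreateSession still waiting "
        "for response from worker: /job:actor/replica:0/task:1\n"
    )
    print(waiting_tasks(sample))  # [1, 17]
```

In the log above, every actor task from 1 to 31 is listed, which suggests that none of those actors had registered with the master yet.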

@chagmgang
Collaborator

I think you may have run it with a command like

sh start.sh

but I believe you need to run it as

nohup sh start.sh

Thank you.
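nohup is a POSIX utility that keeps a command running after the terminal that started it closes and redirects its output to a file (nohup.out by default). If the launch has to be scripted from Python instead, a rough analogue is sketched below. launch_detached is a hypothetical helper; it is similar in effect to nohup rather than identical, and start_new_session has no effect on Windows:

```python
import os
import subprocess
import sys
import tempfile
from typing import List


def launch_detached(cmd: List[str], log_path: str) -> subprocess.Popen:
    """Start `cmd` in the background, appending its stdout and stderr to `log_path`."""
    log_file = open(log_path, "ab")
    try:
        return subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log_file,
            stderr=subprocess.STDOUT,
            # On POSIX this calls setsid() in the child, so it leaves the terminal's
            # session and is not sent the terminal's SIGHUP on logout. Ignored on Windows.
            start_new_session=True,
        )
    finally:
        # The child holds its own copy of the descriptor, so the parent can close it.
        log_file.close()


if __name__ == "__main__":
    # Demo with a trivial command; a real run would pass the learner/actor command lines.
    path = os.path.join(tempfile.mkdtemp(), "demo.log")
    proc = launch_detached([sys.executable, "-c", "print('started')"], path)
    proc.wait()
    with open(path) as f:
        print(f.read().strip())  # started
```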

@sunchipsster1

Hello! I also ran into this problem. I ran the code for 24 hours on a GPU server, and the message "CreateSession still waiting for response from worker: /job:actor/replica:0/task:17" (and similar lines) kept appearing the whole time. Does anyone know how to fix this? Thanks for the help!
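When the message keeps repeating for one task, a first check is whether anything is listening on the address assigned to that actor. A minimal standard-library sketch; port_is_listening is a hypothetical helper, and the address in the usage line is a placeholder, since the cluster addresses used by the repository are not shown in this thread:

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port is accepted within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


if __name__ == "__main__":
    # Hypothetical address: replace with the host:port assigned to the actor task in the cluster spec.
    print(port_is_listening("localhost", 8017))
```

A False result for a task's address is consistent with that actor process never having started or having exited.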

@chagmgang
Collaborator

I think the 17th actor task may not be running.
Rerun task 17 with the command:

python trainer_invader.py --num_actors=32 --task=17 --batch_size=32 --queue_size=128 --trajectory=20 --learning_frame=1000000000 --start_learning=0.0006 --end_learning=0.0 --discount_factor=0.99 --entropy_coef=0.05 --baseline_loss_coef=1.0 --gradient_clip_norm=40.0 --job_name=actor --reward_clipping=abs_one --lstm_size=256 &
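If more than one actor is missing, the same command can be produced for each task index. A minimal sketch that copies the flags from the command above and varies only --task; it assumes every actor uses identical flags (worth checking against start.sh), and actor_command and rerun_line are hypothetical helpers that print the commands rather than running them:

```python
import shlex
from typing import List


def actor_command(task: int) -> List[str]:
    """Argument list for one actor, using the flags from the command above with only --task varied."""
    return [
        "python", "trainer_invader.py",
        "--num_actors=32",
        "--task={}".format(task),
        "--batch_size=32",
        "--queue_size=128",
        "--trajectory=20",
        "--learning_frame=1000000000",
        "--start_learning=0.0006",
        "--end_learning=0.0",
        "--discount_factor=0.99",
        "--entropy_coef=0.05",
        "--baseline_loss_coef=1.0",
        "--gradient_clip_norm=40.0",
        "--job_name=actor",
        "--reward_clipping=abs_one",
        "--lstm_size=256",
    ]


def rerun_line(task: int) -> str:
    """Shell line that starts the actor in the background, as in the command above."""
    return " ".join(shlex.quote(arg) for arg in actor_command(task)) + " &"


if __name__ == "__main__":
    for task in [17]:  # e.g. the task numbers still listed in the CreateSession log lines
        print(rerun_line(task))
```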

3 participants