Docker repos missing files #7
Hello,
I am not 100% sure that the GPU image works (I have to fix a bug where TF is installed without GPU support); however, the CPU image works, as it is used for continuous integration. Edit: for the files, that is normal (cf. the Stable Baselines doc, where the command is explained). |
The GPU image doesn't work, error msg like: ... Resolved by in the container: Now it works! Thanks for setting up this repository and the Docker images, very helpful. Merry Christmas! :-) |
Ok, I'll try to update the image then. |
Hello again, |
Hi!
Thanks for writing.
Looking at GitHub, neither the Dockerfile nor the docker build file has been changed. Still tried…
docker@sddub:~/Downloads$ docker run -it --runtime=nvidia --rm --network host --ipc=host --name test --mount src="$(pwd)",target=/root/code/stable-baselines,type=bind araffin/stable-baselines bash -c 'cd /root/code/stable-baselines/ && pytest tests/'
================================================ test session starts =================================================
platform linux -- Python 3.5.2, pytest-3.5.1, py-1.7.0, pluggy-0.6.0
rootdir: /root/code/stable-baselines, inifile:
plugins: cov-2.6.0
============================================ no tests ran in 0.00 seconds ============================================
ERROR: file not found: tests/
When I open a terminal in the image and cd around:
root@cedb2fb4ba37:/# ls
bin dev home lib64 mnt proc run srv tmp var
boot etc lib media opt root sbin sys usr
root@cedb2fb4ba37:/# cd root
root@cedb2fb4ba37:~# ls
code venv
root@cedb2fb4ba37:~# cd code
root@cedb2fb4ba37:~/code# ls
=0.10.9
I’m not sure if I’ve understood this right. Forgive me as I’m a novice to this. Was stable baselines supposed to be on board the docker container? It isn’t there. Was it supposed to be mapped/mounted to a stable baselines implementation on the host machine?
I looked through the build file; there’s no mention of git stable baselines or similar there, only other dependencies.
Looking forward to hearing from you.
Kind regards
On 18 January 2019 at 00:55:47, Antonin RAFFIN (notifications@github.com) wrote:
Hello again,
I updated the docker image, it should be fixed now, can you confirm this?
|
Are you using this Dockerfile: https://github.com/araffin/rl-baselines-zoo/blob/master/docker/Dockerfile.gpu ? Stable-Baselines is installed here. The built image: https://hub.docker.com/r/araffin/rl-baselines-zoo EDIT: Oh, I see: since the beginning you seem to have been using the stable-baselines docker image instead of the rl zoo docker image. |
Hi! Thanks for replying so quickly.
Yes, erroneously, I was using stable-baselines. I’ll get the RL-zoo image and try it out.
Still, it means that the documentation of stable-baselines needs to be updated, or the Dockerfiles/images need to be changed.
Kind regards
On 23 January 2019 at 20:34:23, Antonin RAFFIN (notifications@github.com) wrote:
Are you using this dockerfile: https://github.com/araffin/rl-baselines-zoo/blob/master/docker/Dockerfile.gpu ?
Stable-Baselines is installed here
The built image: https://hub.docker.com/r/araffin/rl-baselines-zoo
|
The doc is already updated ... cf https://stable-baselines.readthedocs.io/en/master/guide/install.html#using-docker-images Otherwise, the following images contained all the dependencies for stable-baselines but not the stable-baselines package itself. They are made for development. |
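Since the development images ship only the dependencies, the --mount src="$(pwd)",... option in the test command expects the current directory on the host to be a checkout of stable-baselines. Below is a minimal sketch of a pre-flight check, assuming such a checkout has a tests/ directory at its top level (which is what the pytest command in this thread looks for). The helper name is_sb_checkout is hypothetical and not part of either repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the given directory (default: the
# current one) contains a tests/ directory, i.e. looks like the checkout
# that the bind mount is meant to expose inside the container.
is_sb_checkout() {
    local dir="${1:-$(pwd)}"
    [ -d "$dir/tests" ]
}

# Usage, before launching the container:
#   is_sb_checkout || echo "no tests/ here: cd into your stable-baselines clone first"
```

Run from a directory such as ~/Downloads, the check fails, which would match the "file not found: tests/" error reported earlier.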
Hi!
I have now tried twice to run this on Ubuntu 18 desktop, on two different installations: once natively and once with Docker (run_docker_gpu.sh). The image I’m using is araffin/rl-baselines-zoo. With both installations I have this issue:
Fatal server error:
(EE) Cannot establish any listening sockets - Make sure an X server isn't already running(EE)
++ seq 1 10
+ for i in '$(seq 1 10)'
+ '[' -e /tmp/.X11-unix/X1 ']'
+ echo 'Waiting for /tmp/.X11-unix/X1 to be created (try 1/10)'
Waiting for /tmp/.X11-unix/X1 to be created (try 1/10)
+ sleep 1
+ for i in '$(seq 1 10)'
+ '[' -e /tmp/.X11-unix/X1 ']'
and so forth (see below).
On the first installation, I thought I’d removed some lock files before. I’ve scoured the web for solutions to this issue, haven’t found anything. Would appreciate any ideas on how to address this.
Kind regards
REPOSITORY                 TAG        IMAGE ID       CREATED        SIZE
araffin/rl-baselines-zoo   latest     c799b5127cf3   9 days ago     3.85GB
nvidia/cuda                9.0-base   74f5aea45cf6   2 months ago   134MB
sudo bash run_docker_gpu.sh python train.py --algo ppo2 --env CartPole-v1
Executing in the docker (gpu image):
python train.py --algo ppo2 --env CartPole-v1
+ export DISPLAY=:1
+ DISPLAY=:1
+ display=1
+ file=/tmp/.X11-unix/X1
+ sleep 1
+ Xvfb :1 -screen 0 1024x768x24
_XSERVTransSocketUNIXCreateListener: ...SocketCreateListener() failed
_XSERVTransMakeAllCOTSServerListeners: server already running
(EE)
Fatal server error:
(EE) Cannot establish any listening sockets - Make sure an X server isn't already running(EE)
++ seq 1 10
+ for i in '$(seq 1 10)'
+ '[' -e /tmp/.X11-unix/X1 ']'
+ echo 'Waiting for /tmp/.X11-unix/X1 to be created (try 1/10)'
Waiting for /tmp/.X11-unix/X1 to be created (try 1/10)
+ sleep 1
+ for i in '$(seq 1 10)'
+ '[' -e /tmp/.X11-unix/X1 ']'
+ echo 'Waiting for /tmp/.X11-unix/X1 to be created (try 2/10)'
Waiting for /tmp/.X11-unix/X1 to be created (try 2/10)
+ sleep 2
+ for i in '$(seq 1 10)'
+ '[' -e /tmp/.X11-unix/X1 ']'
+ echo 'Waiting for /tmp/.X11-unix/X1 to be created (try 3/10)'
Waiting for /tmp/.X11-unix/X1 to be created (try 3/10)
+ sleep 3
+ for i in '$(seq 1 10)'
+ '[' -e /tmp/.X11-unix/X1 ']'
+ echo 'Waiting for /tmp/.X11-unix/X1 to be created (try 4/10)'
Waiting for /tmp/.X11-unix/X1 to be created (try 4/10)
+ sleep 4
+ for i in '$(seq 1 10)'
+ '[' -e /tmp/.X11-unix/X1 ']'
+ echo 'Waiting for /tmp/.X11-unix/X1 to be created (try 5/10)'
Waiting for /tmp/.X11-unix/X1 to be created (try 5/10)
+ sleep 5
+ for i in '$(seq 1 10)'
+ '[' -e /tmp/.X11-unix/X1 ']'
+ echo 'Waiting for /tmp/.X11-unix/X1 to be created (try 6/10)'
Waiting for /tmp/.X11-unix/X1 to be created (try 6/10)
+ sleep 6
+ for i in '$(seq 1 10)'
+ '[' -e /tmp/.X11-unix/X1 ']'
+ echo 'Waiting for /tmp/.X11-unix/X1 to be created (try 7/10)'
Waiting for /tmp/.X11-unix/X1 to be created (try 7/10)
+ sleep 7
+ for i in '$(seq 1 10)'
+ '[' -e /tmp/.X11-unix/X1 ']'
+ echo 'Waiting for /tmp/.X11-unix/X1 to be created (try 8/10)'
Waiting for /tmp/.X11-unix/X1 to be created (try 8/10)
+ sleep 8
+ for i in '$(seq 1 10)'
+ '[' -e /tmp/.X11-unix/X1 ']'
+ echo 'Waiting for /tmp/.X11-unix/X1 to be created (try 9/10)'
Waiting for /tmp/.X11-unix/X1 to be created (try 9/10)
+ sleep 9
+ for i in '$(seq 1 10)'
+ '[' -e /tmp/.X11-unix/X1 ']'
+ echo 'Waiting for /tmp/.X11-unix/X1 to be created (try 10/10)'
Waiting for /tmp/.X11-unix/X1 to be created (try 10/10)
+ sleep 10
+ '[' -e /tmp/.X11-unix/X1 ']'
+ echo 'Timing out: /tmp/.X11-unix/X1 was not created'
Timing out: /tmp/.X11-unix/X1 was not created
+ exit 1
m@ub-desk:~/rl-baselines-zoo$
lshw
WARNING: you should run this program as super-user.
ub-desk
description: Computer
width: 64 bits
capabilities: smp vsyscall32
*-core
description: Motherboard
physical id: 0
*-memory
description: System memory
physical id: 0
size: 47GiB
*-cpu
product: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
vendor: Intel Corp.
physical id: 1
bus info: cpu@0
size: 1199MHz
capacity: 3800MHz
width: 64 bits
On 23 January 2019 at 20:48:33, Antonin RAFFIN (notifications@github.com) wrote:
The doc is already updated ... cf https://stable-baselines.readthedocs.io/en/master/guide/install.html#using-docker-images
"
If you are looking for docker images with stable-baselines already installed in it, we recommend using images from RL Baselines Zoo.
Otherwise, the following images contained all the dependencies for stable-baselines but not the stable-baselines package itself. They are made for development.
"
|
Ok, did you try the cpu image? |
Hi!
Thanks for your speedy answer!
Tried the cpu image, same error.
Thanks for the hint about the post installation, did that.
So, it must be something with my system. Will have to figure that out.
Kind regards.
On 27 January 2019 at 18:26:21, Antonin RAFFIN (notifications@github.com) wrote:
Ok, did you try the cpu image?
If it does not work with the cpu image, I'm afraid the problem may come from your machine, because the cpu image is tested at each push on Travis CI.
What you are seeing is the entrypoint.sh trying to create a fake X server in order to be able to launch any env that requires one.
Btw, why do you have to use sudo? Did you follow the post-installation?
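For readers following the trace, the wait loop it shows can be sketched roughly as below. This is a reconstruction from the set -x output in this thread, not a copy of entrypoint.sh; the function name wait_for_file and the step argument (sleep unit, so the loop can be exercised without waiting) are hypothetical.

```shell
#!/usr/bin/env bash
# Wait for a file to appear, retrying up to max_tries times and sleeping
# a little longer after each failed try (1s, 2s, ...), as in the trace.
# step is a hypothetical knob for the sleep unit, added for illustration.
wait_for_file() {
    local file="$1" max_tries="${2:-10}" step="${3:-1}"
    local i
    for i in $(seq 1 "$max_tries"); do
        if [ -e "$file" ]; then
            return 0
        fi
        echo "Waiting for $file to be created (try $i/$max_tries)"
        sleep "$(( i * step ))"
    done
    # One last check after the final sleep, then give up.
    if [ -e "$file" ]; then
        return 0
    fi
    echo "Timing out: $file was not created"
    return 1
}

# Usage, after starting the fake X server in the background:
#   Xvfb :1 -screen 0 1024x768x24 &
#   wait_for_file /tmp/.X11-unix/X1 || exit 1
```

Note that the "server already running" line in the log is printed by Xvfb itself before the loop begins; the loop only reports that the socket file never appeared.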
|
Hi again.
I modified entrypoint.sh, rebuilt the GPU image, ran the container:
ee@ub-desk:~/Desktop/docker$ bash run_docker_gpu.sh python train.py --algo ppo2 --env CartPole-v1
Executing in the docker (gpu image):
python train.py --algo ppo2 --env CartPole-v1
Traceback (most recent call last):
File "train.py", line 11, in <module>
from stable_baselines.common import set_global_seeds
File "/root/venv/lib/python3.5/site-packages/stable_baselines/__init__.py", line 4, in <module>
from stable_baselines.a2c import A2C
File "/root/venv/lib/python3.5/site-packages/stable_baselines/a2c/__init__.py", line 1, in <module>
from stable_baselines.a2c.a2c import A2C
File "/root/venv/lib/python3.5/site-packages/stable_baselines/a2c/a2c.py", line 5, in <module>
import tensorflow as tf
File "/root/venv/lib/python3.5/site-packages/tensorflow/__init__.py", line 24, in <module>
from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
File "/root/venv/lib/python3.5/site-packages/tensorflow/python/__init__.py", line 63, in <module>
from tensorflow.python.framework.framework_lib import * # pylint: disable=redefined-builtin
File "/root/venv/lib/python3.5/site-packages/tensorflow/python/framework/framework_lib.py", line 104, in <module>
from tensorflow.python.framework.importer import import_graph_def
File "/root/venv/lib/python3.5/site-packages/tensorflow/python/framework/importer.py", line 32, in <module>
from tensorflow.python.framework import function
File "/root/venv/lib/python3.5/site-packages/tensorflow/python/framework/function.py", line 36, in <module>
from tensorflow.python.ops import resource_variable_ops
File "/root/venv/lib/python3.5/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 35, in <module>
from tensorflow.python.ops import variables
File "/root/venv/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 40, in <module>
class Variable(checkpointable.CheckpointableBase):
AttributeError: module 'tensorflow.python.training.checkpointable' has no attribute 'CheckpointableBase'
Pretty sure this is an error in the code, unrelated to the fake X server issue.
Do you have any suggestions?
Kind regards
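An AttributeError raised from inside TensorFlow's own modules at import time can point to a broken or mixed installation rather than to the calling code. That is an assumption here, not something confirmed in this thread. A small sketch to list which TensorFlow distributions pip reports and whether their versions disagree follows; the function name tf_versions_consistent is hypothetical, and it reads "pip freeze"-style lines on stdin.

```shell
#!/usr/bin/env bash
# Read "name==version" lines on stdin, print those for tensorflow and
# tensorflow-gpu, and fail if they are pinned at different versions.
tf_versions_consistent() {
    local lines versions
    # Keep only the two TensorFlow distributions; tolerate no match.
    lines="$(grep -iE '^tensorflow(-gpu)?==' || true)"
    printf '%s\n' "$lines"
    # Count the distinct version strings among the matching lines.
    versions="$(printf '%s\n' "$lines" | sed -n 's/^[^=]*==//p' | sort -u | wc -l | tr -d ' ')"
    [ "$versions" -le 1 ]
}

# Usage, inside the container's virtualenv:
#   pip freeze | tf_versions_consistent || echo "mixed TensorFlow versions"
```

A passing check does not prove the installation is healthy; it only rules out one possible cause.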
|
ADDITIONAL INFO: EXTRACTS FROM BUILD LOG, GPU IMAGE
I edited entrypoint.sh so that it no longer tries to start a fake X server. Then I can build and run. I don
So docker build gave some warnings but, for some reason, built the image anyway. I'm not sure whether that explains the issues in the previous entry. Now, every time I try to build a new Docker image, it just uses local files. I'm not sure how to force it to start again from the download, or whether that has any merit at all. |
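On forcing a rebuild: docker build reuses cached layers by default, and the --no-cache and --pull options tell it to ignore that cache and to re-fetch the base image. A minimal sketch that only prints such a command is given below. The function name fresh_build_command and the tag my-zoo-gpu are hypothetical; docker/Dockerfile.gpu is the path from the link earlier in this thread.

```shell
#!/usr/bin/env bash
# Print a docker build command that ignores the local layer cache
# (--no-cache) and re-pulls the base image (--pull).
# The tag, Dockerfile path and build context are supplied by the caller.
fresh_build_command() {
    local tag="$1" dockerfile="$2" context="${3:-.}"
    printf 'docker build --no-cache --pull -t %s -f %s %s\n' \
        "$tag" "$dockerfile" "$context"
}

# Usage, from the root of the rl-baselines-zoo checkout:
#   fresh_build_command my-zoo-gpu docker/Dockerfile.gpu
```

Printing the command first makes it easy to inspect before running it. Whether a clean rebuild would change the behaviour reported above is not established in this thread.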
I have pulled (not built) the GPU edition of rl-baselines-zoo.
Trying to run:
docker run -it --runtime=nvidia --rm --network host --ipc=host --name test --mount src="$(pwd)",target=/root/code/stable-baselines,type=bind araffin/stable-baselines bash -c 'cd /root/code/stable-baselines/ && pytest tests/'
Am running:
sudo docker run --runtime=nvidia -it araffin/stable-baselines bash
Traversing into /root/code/, the directory is empty. It seems there is something wrong with the repository. Similar issues with the rl-zoo image.
I have little experience with docker, so I might well have missed something.
Kind regards