Multiple issues with docker, python versions and build script #29

Open
Holt59 opened this issue Mar 14, 2019 · 3 comments
@Holt59

Holt59 commented Mar 14, 2019

I am trying to test my environment with Docker on a Google Cloud (GCloud) VM.

I noticed multiple issues while trying to build and run the Docker containers:

1. The only Python version that works with the README tutorial is 3.6

This is kind of annoying:

  • 3.5 does not work because aicrowd-repo2docker uses f-strings, which require Python 3.6 or later.
  • 3.7 does not work because ml-agents cannot be installed on 3.7.

Python 3.7 can be used, but then aicrowd-repo2docker must be installed on its own rather than through requirements.txt.

A note should be added to the README. My system has Python 3.5 by default, so I compiled Python 3.7 from source thinking it would work, only to find that it does not work with ml-agents. Had I known, I would have built Python 3.6.
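A minimal sketch of the version constraint described above, which a setup script could run before installing requirements.txt. It assumes, as reported in this issue, that only Python 3.6 satisfies both tools at the time; the helper name check_python_version is hypothetical and not part of either project.

```python
import sys

# Assumption (from this issue): only Python 3.6 satisfies both tools.
#   - aicrowd-repo2docker uses f-strings, which require Python >= 3.6
#   - ml-agents could not be installed on Python 3.7 at the time
REQUIRED = (3, 6)


def check_python_version(version_info=sys.version_info):
    """Return None if the interpreter is supported, else an explanatory message.

    Hypothetical helper. It deliberately uses str.format() instead of
    f-strings so that the check itself still parses and runs on Python 3.5.
    """
    major, minor = version_info[0], version_info[1]
    if (major, minor) < REQUIRED:
        return ("Python {}.{} is too old: aicrowd-repo2docker uses f-strings "
                "(Python 3.6+).".format(major, minor))
    if (major, minor) > REQUIRED:
        return ("Python {}.{} is too new: ml-agents could not be installed "
                "on it; use Python 3.6.".format(major, minor))
    return None
```

A setup script could call check_python_version() first and pass any returned message to sys.exit(), so the user learns which interpreter to build before compiling anything from source.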

2. Small issue with build.sh

This:

./build.sh

...does not work if the shell is not Bash-compatible (e.g. fish). A shebang line (e.g. #!/usr/bin/env bash) should be added to the script, or the instruction should be changed to bash build.sh.

3. Cannot run the Docker containers while other agents are running on the host

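The Docker containers cannot be launched while other agents are running on the same host, because of --network=host: the containers share the host's network stack, so their ports collide with those of the agents already running. In addition, the worker ID cannot be changed without modifying the source code of run.py.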
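A minimal sketch of how run.py could pick its worker ID without source edits. It assumes the ML-Agents convention that an environment listens on port 5005 + worker_id, so two environments sharing the host network with the same ID collide. The WORKER_ID environment variable and both helper names are hypothetical, not part of run.py, and the port probe is racy: another process can grab the port between the check and the launch.

```python
import os
import socket

# Assumption: ML-Agents environments listen on BASE_PORT + worker_id.
# With --network=host, a container shares the host's ports, so an agent
# on the host and one in a container must use different worker IDs.
BASE_PORT = 5005


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def choose_worker_id(max_workers=32, env_var="WORKER_ID"):
    """Pick a worker ID without editing run.py (hypothetical helper).

    If the (hypothetical) WORKER_ID variable is set, use it as-is;
    otherwise return the first worker ID whose port is not in use.
    """
    if env_var in os.environ:
        return int(os.environ[env_var])
    for worker_id in range(max_workers):
        if port_is_free(BASE_PORT + worker_id):
            return worker_id
    raise RuntimeError("no free port in range {}-{}".format(
        BASE_PORT, BASE_PORT + max_workers - 1))
```

With something like this in place, passing -e WORKER_ID=1 to docker run would let a container coexist with an agent already using worker 0 on the host, instead of patching run.py each time.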

4. Cannot run the Docker containers

Even after modifying the worker ID, or putting the two containers on a user-defined Docker network (--network=ot-network), the agent fails to launch with a Unity timeout exception:

mlagents_envs.exception.UnityTimeOutException: The Unity environment took too long to respond. Make sure that :
         The environment does not need user interaction to launch
         The Academy and the External Brain(s) are attached to objects in the Scene
         The environment and the Python interface have compatible versions.

I am using a GCloud VM created by following the tutorial. I tried running sudo /usr/bin/X :0 and adding --env DISPLAY=:0 to the docker command line, but it did not work.

@awjuliani
Contributor

Adding @harperj, who may be able to provide some context. These are good recommendations that we will take into account in the next round of the contest.

@harperj
Contributor

harperj commented Mar 26, 2019

  1. You could run aicrowd-repo2docker and your ml-agents script using different versions of Python. I believe we should be able to relax the ml-agents environment requirement to allow Python 3.7; I'll bring that up with the team. The reason ml-agents hasn't supported Python 3.7 is that, until recently, TensorFlow didn't support it.

  2. Agreed.

  3. You're intended to change the run.py script. If you're running agents on the host as well as in a Docker container, you're doing something outside the scope of the guide (which explains how to test out evaluation), and I'd expect anyone in that situation to customize the run script.

  4. This could be caused by a number of things, but one thing to check is the Player.log file created by Unity. You can find it under ~/.config/. Could you share that?
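A small sketch for locating the Player.log file mentioned in point 4. Unity on Linux nests it under company and product folders (~/.config/unity3d/CompanyName/ProductName/Player.log), so the search below is recursive. The helper names are hypothetical; if the environment ran inside a container, the log is in that container's home directory rather than the host's.

```python
from pathlib import Path


def find_player_logs(root=None):
    """Return Player.log files under root (default ~/.config), newest first."""
    base = Path(root) if root is not None else Path.home() / ".config"
    if not base.is_dir():
        return []
    logs = [p for p in base.rglob("Player.log") if p.is_file()]
    # Most recently modified first: usually the run that just timed out.
    return sorted(logs, key=lambda p: p.stat().st_mtime, reverse=True)


def tail(path, lines=50):
    """Return the last `lines` lines of a text file, e.g. to paste into an issue."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.readlines()[-lines:]
```

For instance, printing "".join(tail(find_player_logs()[0])) shows the end of the newest log, which is typically where a startup failure is reported.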

@Holt59
Author

Holt59 commented Mar 27, 2019

@harperj

I've solved the problem, but I no longer have the log file. I think the issue was related to worker_id in evaluation mode.

I'm not saying that anything should be fixed regarding this in the Docker examples, but it would be great if some information could be added to the README. It took me some time to realize that when OTC_EVALUATION_ENABLED is true, the environment behaves differently.

In particular, when is_grading() returns True, environment_filename is automatically set to None, but worker_id is not reset to 0. This caused me some headaches until I checked the actual code (I was trying to set the worker ID, and it was throwing a strange exception).
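A hypothetical sketch of the argument handling described above, with the suggested fix of also resetting worker_id in grading mode. It is not the actual obstacle-tower-env code: is_grading here is a stand-in that assumes grading is signalled by OTC_EVALUATION_ENABLED holding a truthy string, and resolve_env_args is an illustrative name. It also assumes the underlying ML-Agents version rejects a non-zero worker_id when no environment file is given, which would explain the exception.

```python
import os
import warnings


def is_grading(environ=None):
    """Hypothetical stand-in: True if OTC_EVALUATION_ENABLED looks truthy."""
    env = os.environ if environ is None else environ
    value = env.get("OTC_EVALUATION_ENABLED", "").strip().lower()
    return value in ("1", "true", "yes")


def resolve_env_args(environment_filename, worker_id, environ=None):
    """Return the (environment_filename, worker_id) pair actually used.

    In grading mode the environment is already running, so no file is
    launched (filename becomes None). Unlike the behaviour reported in
    this issue, the worker ID is also reset to 0, with a warning.
    """
    if is_grading(environ):
        if worker_id != 0:
            warnings.warn("grading mode: ignoring worker_id={} and using 0"
                          .format(worker_id))
        return None, 0
    return environment_filename, worker_id
```

Warning instead of raising keeps a locally tuned run.py usable under evaluation while still surfacing that the requested worker ID was ignored.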
