Useful docker images for deep learning and machine learning in general.
Current version: PyTorch, tensorflow-gpu, and the usual useful Python libraries (numpy, pandas, dask, sklearn, matplotlib, seaborn, ray, optuna, hydra, lightning...)
To build images (don't forget the final dot!):
docker build -f ./Dockerfile -t mayhem-pegasus:<TAG> .
To run a Jupyter Lab session and mount a shared folder (./mnt on the host is paired with /io in the container):
docker run -u $(id -u):$(id -g) -v $PWD/mnt:/io -p 0.0.0.0:<local jupyter port>:8888 -p 0.0.0.0:<local tensorboard port>:6006 -it --gpus <device> --rm mayhem-pegasus:<TAG>
where <device> is the device number. If you want to use multiple devices, see the usage examples below.
If you are lazy, just put the startup script start-mayhem.sh somewhere in your PATH and chmod +x it. Read the script and edit it before using it to match the ports you are using!
--gpus 0 to use only device 0
--gpus '"device=1,2"' to use devices 1 and 2
--gpus all to use all devices
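If the container is launched from a Python script rather than typed in a shell, the three --gpus forms above could be generated as in the sketch below. The helper names gpus_value and docker_run_argv are hypothetical, and the tag and ports are placeholders. Note that when the command is passed as an argument list (no shell involved), the double quotes around device=1,2 have to be part of the value itself.

```python
import os
import shlex


def gpus_value(devices):
    """Return the value for docker's --gpus flag.

    devices: "all", a single int, or a list of ints (hypothetical convention).
    """
    if devices == "all":
        return "all"
    if isinstance(devices, int):
        return str(devices)
    # Several devices: docker expects literal double quotes around the
    # device list, which is why the shell form is written '"device=1,2"'.
    return '"device=' + ",".join(str(d) for d in devices) + '"'


def docker_run_argv(tag, devices, jupyter_port=8888, tensorboard_port=6006):
    """Build the argument list for the docker run command shown earlier."""
    return [
        "docker", "run",
        "-u", f"{os.getuid()}:{os.getgid()}",
        "-v", f"{os.getcwd()}/mnt:/io",
        "-p", f"0.0.0.0:{jupyter_port}:8888",
        "-p", f"0.0.0.0:{tensorboard_port}:6006",
        "-it", "--gpus", gpus_value(devices),
        "--rm", f"mayhem-pegasus:{tag}",
    ]


if __name__ == "__main__":
    # Print a copy-pastable shell command; "latest" is a placeholder tag.
    print(shlex.join(docker_run_argv("latest", [1, 2])))
```

The list returned by docker_run_argv can be handed to subprocess.run as is; shlex.join is only used to print a shell-safe version of it.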
The provided Dockerfile / container assumes NVIDIA GPU usage. You need to install nvidia-docker2 to be able to use --runtime=nvidia; see the nvidia-docker2 installation instructions.
Autocompletion in Jupyter might be very slow or inactive. This might be caused by issues with jedi (see the many issues in the Jupyter repo on this matter). To deactivate jedi and get autocompletion back in Jupyter Lab, you can run the following magic:
%config Completer.use_jedi = False
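To avoid retyping the magic in every notebook, the same setting could be persisted in an IPython profile. The sketch below is only one way to do it: disable_jedi is a hypothetical helper, and it assumes IPython's default profile location (~/.ipython/profile_default/ipython_config.py) and a home directory writable by the container user, neither of which is guaranteed by this image.

```python
from pathlib import Path

SETTING = "c.Completer.use_jedi = False"


def disable_jedi(config_path):
    """Append the jedi setting to an IPython config file, at most once.

    Returns True if the file was modified, False if it was already set.
    """
    config_path = Path(config_path)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    existing = config_path.read_text() if config_path.exists() else ""
    if SETTING in existing:
        return False  # already configured, nothing to do
    with config_path.open("a") as handle:
        if existing and not existing.endswith("\n"):
            handle.write("\n")
        # get_config() is provided by IPython when it loads the file.
        handle.write("c = get_config()\n" + SETTING + "\n")
    return True


# Usage (assumed default profile location):
# disable_jedi(Path.home() / ".ipython" / "profile_default" / "ipython_config.py")
```

The kernel has to be restarted for the profile change to take effect.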
JupyterLab extensions providing alternative autocompletion do not play well with docker for now. I'm still investigating this issue, suggestions are welcome.
Ray's default IP and working directory do not play well with docker. To avoid any issue, you should override the temp_dir and webui_host parameters of ray.init at the beginning of your Ray jobs:
import ray
from ray import tune
ray.init(temp_dir='/io/ray_log_dir', webui_host='0.0.0.0')
analysis = tune.run(..., local_dir='/io/ray_results_dir', resources_per_trial={'gpu': 1})
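Since both Ray directories have to live under the shared /io mount, it can help to create and check them in one place before calling ray.init. A minimal sketch, reusing the directory names from the snippet above; ray_paths is a hypothetical helper, and the writability check is an added precaution (the container runs with the host user's uid), not something Ray requires.

```python
import os
from pathlib import Path


def ray_paths(shared_root="/io"):
    """Create the Ray directories under the shared mount and return them."""
    root = Path(shared_root)
    if not os.access(root, os.W_OK):
        raise PermissionError(f"{root} is missing or not writable by uid {os.getuid()}")
    temp_dir = root / "ray_log_dir"
    results_dir = root / "ray_results_dir"
    for directory in (temp_dir, results_dir):
        directory.mkdir(parents=True, exist_ok=True)
    return {"temp_dir": str(temp_dir), "local_dir": str(results_dir)}


# Usage inside the container, with the same calls as above:
# paths = ray_paths()
# ray.init(temp_dir=paths["temp_dir"], webui_host="0.0.0.0")
# tune.run(..., local_dir=paths["local_dir"], resources_per_trial={"gpu": 1})
```

Failing early with a PermissionError is meant to make a badly mounted ./mnt folder easier to spot than a Ray startup error would.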
The default port of the Ray UI (8265) is exposed by docker. Note that this UI can be an easy way to manage tensorboard and monitor the cluster or server resources (CPU / RAM / IO; no GPU monitoring yet, sadly).
To monitor experiments, go to the Tune tab, enter the tune local_dir location in "Tune Log Directory Here:", and submit. The embedded tensorboard in this interface is not functional at the moment; see the other options below to run tensorboard.
If you are not using Ray, you can launch tensorboard manually from a Jupyter Lab terminal:
tensorboard --host 0.0.0.0 --logdir /io/path_to_logdir
and access it through the default tensorboard port (6006) exposed by this docker image. Another solution consists in using notebook-embedded tensorboard through magic commands:
%load_ext tensorboard
%tensorboard --logdir /io/results --host=0.0.0.0
The resulting cell will load an iframe displaying tensorboard. If you delete the cell result, tensorboard will still be running in the background. To display it again, you can run
from tensorboard import notebook
notebook.list() # View open TensorBoard instances
# Control TensorBoard display. If no port is provided,
# the most recently launched TensorBoard is used
notebook.display(port=6006, height=1000)
To kill tensorboard, use the kill command (in a shell, or in a notebook cell with the ! prefix):
kill [tensorboard_pid]
The pid can be found in the output of notebook.list(). For more information, please refer to the TensorFlow documentation.
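The lookup-and-kill step can also be done from Python in a notebook cell. The sketch below assumes that tensorboard.manager.get_all() (the function that notebook.list() relies on) returns records exposing pid and port fields; pids_on_port and stop_tensorboard are hypothetical helper names.

```python
import os
import signal


def pids_on_port(infos, port):
    """Return the pids of the TensorBoard records bound to `port`."""
    return [info.pid for info in infos if info.port == port]


def stop_tensorboard(port=6006):
    """Send SIGTERM to every TensorBoard instance registered on `port`."""
    # Imported lazily so pids_on_port can be used without tensorboard installed.
    from tensorboard import manager

    pids = pids_on_port(manager.get_all(), port)
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
    return pids
```

Calling stop_tensorboard() with no argument targets the default port 6006 used throughout this README; it returns the list of pids it signalled, which is empty when nothing was running.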