The nvidia runtime is only supported on Linux, not Mac or Windows. Starting with Docker 19.03, NVIDIA GPUs are supported by default via `docker run --gpus all`. However, Docker 19.03 doesn't support the older plugin syntax `docker run --runtime=nvidia`, and docker-compose doesn't support the `--gpus` arg yet. Here are the steps to make it work:

- Install the runtime: `sudo apt-get install nvidia-container-runtime`
- Add a daemon config file:

      sudo tee /etc/docker/daemon.json <<EOF
      {
          "runtimes": {
              "nvidia": {
                  "path": "/usr/bin/nvidia-container-runtime",
                  "runtimeArgs": []
              }
          }
      }
      EOF
      sudo pkill -SIGHUP dockerd

- Defaults to using all GPUs, but the env var `NVIDIA_VISIBLE_DEVICES` can change it.

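A quick way to confirm the daemon config step took effect is to look for the runtime entry in the config file. The sketch below assumes the file layout shown above; `has_nvidia_runtime` is a hypothetical helper name, not part of Docker, and it does a plain-text match rather than a real JSON parse.

```shell
# Sketch: succeed if a Docker daemon config registers the "nvidia" runtime.
# has_nvidia_runtime is a hypothetical helper, not a Docker command.
has_nvidia_runtime() {
    # Default to the path used in the steps above.
    local config="${1:-/etc/docker/daemon.json}"
    # A missing file means the runtime was never registered.
    [ -f "$config" ] || return 1
    # Text match only: look for the runtime name and the runtime binary.
    grep -q '"nvidia"' "$config" && grep -q 'nvidia-container-runtime' "$config"
}
```

For example, `has_nvidia_runtime && echo "nvidia runtime configured"`. The `Runtimes:` line in the output of `docker info` should also list `nvidia` once the daemon has reloaded its config.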
Clone the repo and start the containers as in the [readme][Readme.md].
- If you've made changes to the Dockerfiles, rebuild the containers: `docker-compose up -d --build`
- Check the logs
  - `docker logs hpccloud-services_ansible_1` and `docker-compose logs ansible` are roughly equivalent.
  - `docker logs -f hpccloud-services_ansible_1` will follow the log, showing output from an active container.

Start a simulation, and visualize
- Login as demo/letmein
- Start a project using the `+`
  - any name you like
  - Type: PyFR
  - Mesh: couette_flow_2d.msh (from `sample_data` folder)
  - Ini: couette_flow_2d.ini (from `sample_data` folder)
  - Click 'Create Project'
- Start a simulation using the `+`
  - any name you like
  - Ini: leave blank, it will use the one from the project
- Set simulation params
  - Click on the simulation summary line in the list
  - click `Input`, `Solver`
    - Now default: change `Shock capturing` to `none`
    - Now default: change `Final time` to `1.0` or similar, for shorter runtime.
  - click `Simulation` on the left
    - Leave all defaults (unless you intend to run on AWS, then `Server Type` becomes `EC2`)
  - click `run simulation`
- Monitor simulation
  - `Jobs` shows the running sim. Gear icon expands to show details and logs
  - expect `complete(6) running (2)` while the main sim is running, for a few minutes. Then output is uploaded.
  - Final will show `complete (12)` and `Output Files files(29)`
- click `Visualize`
- ParaViewWeb Visualizer
  - leave default params, click `Start visualization`
  - Should show `Jobs complete(4) running(2)`, click `Visualize`
  - Visualizer shows a single slice with simulation data over time.
    - click `couette_flow_2d` and change from `solid color` to `(p2) Velocity`
    - on colormap, click gear then double arrow on left, and clock at the bottom, to get data range over all time steps.
    - in titlebar, click play icon to see animation of flow over time.
    - nvidia runtime should animate noticeably faster.

- `docker container ls` shows running containers
- `docker image ls` shows the images containers are launched from, including intermediates
- `docker volume ls` shows shared volumes

build images:
- `docker-compose up -d --build`
  - builds all images in the `docker-compose.yml`, using caching.
- `docker build --rm --no-cache --file docker/celery/Dockerfile -t kitware/hpccloud:celery .`
  - builds a single image, tags it with the same tag as the above command; `--no-cache` forces a complete rebuild
  - celery is not in the top-level `docker-compose.yml`, so when its Dockerfile is changed, it needs to be built manually.

access containers:
- `docker run --rm --entrypoint bash -ti 84d3be2cfd70`
  - start the container, overriding the entrypoint so you get a command prompt. On Windows Git Bash, add `winpty` in front if needed.
- `docker exec -ti hpccloud-services_command_1 bash`
  - get a command prompt for an already-running container.

clean start:
- `docker-compose down`
- `docker volume prune`, say `y`es
  - removes the persistent volumes not in use by containers, so if all containers are stopped, removes everything.
  - this resets the girder storage, so you have to re-create the project and simulation from scratch, as described above
  - I've needed this a few times when the ansible container says `FAILED The error was: 'dict object' has no attribute '_id'` on the `Wait for compute cluster` step.
- Useful tool, lazydocker

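The clean start sequence can be wrapped in one function so the steps always run in the same order. This is a minimal sketch, not part of the repo: `run`, `clean_start`, and the `DRY_RUN` variable are hypothetical names, and the final `docker-compose up -d` is an assumption about how you restart the stack. Note that `docker volume prune -f` skips the confirmation prompt and removes every unused volume on the machine, not only this project's.

```shell
# Sketch: clean start as a single function.
# run, clean_start and DRY_RUN are hypothetical names, not part of the repo.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop the containers, drop unused volumes, then bring the stack back up.
# Each step runs only if the previous one succeeded.
clean_start() {
    run docker-compose down &&
        run docker volume prune -f &&
        run docker-compose up -d
}
```

Running `DRY_RUN=1 clean_start` prints the three commands without touching Docker, which is a cheap way to check the sequence before resetting the girder storage.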
Ansible
- after `docker-compose up`, checking `docker-compose logs ansible` is critical, to make sure ansible has completed the configuration of all containers.
  - use `docker volume prune` if the logs say `FAILED The error was: 'dict object' has no attribute '_id'` on the `Wait for compute cluster` step.
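The log check above can be scripted so the known failure is easy to spot. The sketch below reads log text on stdin and prints one status word. `check_ansible_log` and the status words are hypothetical names. Treating a `PLAY RECAP` line as "finished" is an assumption based on Ansible's default output, not something this repo documents.

```shell
# Sketch: classify Ansible log text read from stdin.
# check_ansible_log and its status words are hypothetical, not part of the repo.
check_ansible_log() {
    local log
    log=$(cat)
    if printf '%s\n' "$log" | grep -q "has no attribute '_id'"; then
        # The known failure from above: try a clean start.
        echo "stale-volumes"
        return 2
    elif printf '%s\n' "$log" | grep -q 'FAILED'; then
        echo "failed"
        return 1
    elif printf '%s\n' "$log" | grep -q 'PLAY RECAP'; then
        # Assumption: the playbook's closing summary means it ran to the end.
        echo "finished"
    else
        echo "running"
    fi
}
```

A typical call would be `docker-compose logs ansible | check_ansible_log`. A result of `stale-volumes` points at the `docker volume prune` fix.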