I'm trying to have the NVIDIA driver available during the build, which works with the default build command but not when using BuildKit.
I have this minimal Dockerfile:
FROM nvidia/cuda:11.1-base
RUN ls /dev/nvidia*
RUN nvidia-smi
which I can build as follows:
❯ docker build -f Dockerfile . --no-cache
Sending build context to Docker daemon 5.167kB
Step 1/3 : FROM nvidia/cuda:11.1-base
---> 287475453634
Step 2/3 : RUN ls /dev/nvidia*
---> Running in e8c12b8a398b
/dev/nvidia-uvm
/dev/nvidia-uvm-tools
/dev/nvidia0
/dev/nvidiactl
Removing intermediate container e8c12b8a398b
---> 17884a1f0b6a
Step 3/3 : RUN nvidia-smi
---> Running in 4a5bbd2337c0
Thu Sep 23 14:08:59 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.119.03 Driver Version: 450.119.03 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 On | 00000000:00:1E.0 Off | 0 |
| N/A 35C P8 29W / 149W | 0MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Removing intermediate container 4a5bbd2337c0
---> 308dfa443901
Successfully built 308dfa443901
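As far as I understand, the classic builder's intermediate containers are created by the daemon itself, so they pick up the default-runtime from my daemon.json (shown below), which is why the driver is visible during this build. A plain docker run with the same image should behave the same way on this host:
❯ docker run --rm nvidia/cuda:11.1-base nvidia-smi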
But when using BuildKit I get:
❯ DOCKER_BUILDKIT=1 docker build -f Dockerfile .
[+] Building 0.4s (5/6)
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 116B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/nvidia/cuda:11.1-base 0.0s
=> CACHED [1/3] FROM docker.io/nvidia/cuda:11.1-base 0.0s
=> ERROR [2/3] RUN ls /dev/nvidia* 0.3s
------
> [2/3] RUN ls /dev/nvidia*:
#5 0.299 ls: cannot access '/dev/nvidia*': No such file or directory
------
executor failed running [/bin/sh -c ls /dev/nvidia*]: exit code: 2
Then I figured out I have to use RUN --security=insecure and use docker buildx as follows:
# Dockerfile.buildkit
# syntax = docker/dockerfile:experimental
FROM nvidia/cuda:11.1-base
RUN --security=insecure nvidia-smi
I create the builder:
❯ docker buildx create --driver docker-container --name local --buildkitd-flags '--allow-insecure-entitlement security.insecure' --use
local
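To double-check that the builder container actually started with the flags above, it can be bootstrapped and inspected first (output omitted):
❯ docker buildx inspect local --bootstrap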
Then I build the image as follows:
❯ docker buildx build -f Dockerfile.buildkit . --allow security.insecure
WARN[0000] No output specified for docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load
[+] Building 0.7s (9/9) FINISHED
=> [internal] load build definition from Dockerfile.buildkit 0.0s
=> => transferring dockerfile: 150B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> resolve image config for docker.io/docker/dockerfile:experimental 0.1s
=> CACHED docker-image://docker.io/docker/dockerfile:experimental@sha256:600e5c62eedff338b3f7a0850beb7c05866e0ef27b2d2e8c02aa468e78496ff5 0.0s
=> => resolve docker.io/docker/dockerfile:experimental@sha256:600e5c62eedff338b3f7a0850beb7c05866e0ef27b2d2e8c02aa468e78496ff5 0.0s
=> [internal] load .dockerignore 0.0s
=> [internal] load build definition from Dockerfile.buildkit 0.0s
=> [internal] load metadata for docker.io/nvidia/cuda:11.1-base 0.1s
=> CACHED [1/2] FROM docker.io/nvidia/cuda:11.1-base@sha256:c6bb47a62ad020638aeaf66443de9c53c6dc8a0376e97b2d053ac774560bd0fa 0.0s
=> => resolve docker.io/nvidia/cuda:11.1-base@sha256:c6bb47a62ad020638aeaf66443de9c53c6dc8a0376e97b2d053ac774560bd0fa 0.0s
=> ERROR [2/2] RUN --security=insecure nvidia-smi 0.1s
------
> [2/2] RUN --security=insecure nvidia-smi:
#8 0.074 /bin/sh: 1: nvidia-smi: not found
------
Dockerfile.buildkit:3
--------------------
1 | # syntax = docker/dockerfile:experimental
2 | FROM nvidia/cuda:11.1-base
3 | >>> RUN --security=insecure nvidia-smi
4 |
--------------------
error: failed to solve: process "/bin/sh -c nvidia-smi" did not complete successfully: exit code: 127
I know the insecure flag works when I use another command that requires privilege (e.g. mount --bind /dev /tmp).
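For reference, that privilege check looked roughly like this (Dockerfile.insecure-check is just a name I'm using here), built with the same buildx command and --allow security.insecure; the mount step only succeeds when the entitlement is actually granted:
# Dockerfile.insecure-check
# syntax = docker/dockerfile:experimental
FROM nvidia/cuda:11.1-base
RUN --security=insecure mount --bind /dev /tmp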
This is my daemon.json:
{
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
},
"default-runtime": "nvidia"
}
The output of docker info:
Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Build with BuildKit (Docker Inc., v0.6.1-docker)
scan: Docker Scan (Docker Inc., v0.8.0)
Server:
Containers: 10
Running: 1
Paused: 0
Stopped: 9
Images: 191
Server Version: 20.10.8
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux nvidia
Default Runtime: nvidia
Init Binary: docker-init
containerd version: e25210fe30a0a703442421b0f60afac609f950a3
runc version: v1.0.1-0-g4144b63
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.4.0-1056-aws
Operating System: Ubuntu 18.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 59.86GiB
Name: ip-172-31-13-189
ID: JSBF:DURT:RVBM:P7XL:YIWL:IKJU:3WIS:25N6:UH72:ALJA:XDRO:R35Q
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
The output of nvidia-container-cli -k -d /dev/tty info:
-- WARNING, the following logs are for debugging purposes only --
I0923 14:17:11.705485 3265 nvc.c:372] initializing library context (version=1.4.0, build=704a698b7a0ceec07a48e56c37365c741718c2df)
I0923 14:17:11.705537 3265 nvc.c:346] using root /
I0923 14:17:11.705556 3265 nvc.c:347] using ldcache /etc/ld.so.cache
I0923 14:17:11.705571 3265 nvc.c:348] using unprivileged user 1000:1000
I0923 14:17:11.705600 3265 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0923 14:17:11.705791 3265 nvc.c:391] dxcore initialization failed, continuing assuming a non-WSL environment
W0923 14:17:11.711421 3266 nvc.c:269] failed to set inheritable capabilities
W0923 14:17:11.711475 3266 nvc.c:270] skipping kernel modules load due to failure
I0923 14:17:11.711714 3267 driver.c:101] starting driver service
I0923 14:17:11.715296 3265 nvc_info.c:676] requesting driver information with ''
I0923 14:17:11.717376 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/vdpau/libvdpau_nvidia.so.450.119.03
I0923 14:17:11.717652 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvoptix.so.450.119.03
I0923 14:17:11.717748 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.450.119.03
I0923 14:17:11.717819 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.450.119.03
I0923 14:17:11.717897 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.450.119.03
I0923 14:17:11.718008 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.450.119.03
I0923 14:17:11.718111 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.450.119.03
I0923 14:17:11.718180 3265 nvc_info.c:171] skipping /usr/lib/x86_64-linux-gnu/libnvidia-nscq-dcgm.so.450.51.06
I0923 14:17:11.718256 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.450.119.03
I0923 14:17:11.718322 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.450.119.03
I0923 14:17:11.718430 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.450.119.03
I0923 14:17:11.718548 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.450.119.03
I0923 14:17:11.718625 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.450.119.03
I0923 14:17:11.718713 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.450.119.03
I0923 14:17:11.718794 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.450.119.03
I0923 14:17:11.718902 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.450.119.03
I0923 14:17:11.719021 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.450.119.03
I0923 14:17:11.719108 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.450.119.03
I0923 14:17:11.719184 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.450.119.03
I0923 14:17:11.719293 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.450.119.03
I0923 14:17:11.719381 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.450.119.03
I0923 14:17:11.719472 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.450.119.03
I0923 14:17:11.719897 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.450.119.03
I0923 14:17:11.720133 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.450.119.03
I0923 14:17:11.720217 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.450.119.03
I0923 14:17:11.720300 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.450.119.03
I0923 14:17:11.720380 3265 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.450.119.03
W0923 14:17:11.720453 3265 nvc_info.c:350] missing library libnvidia-nscq.so
W0923 14:17:11.720469 3265 nvc_info.c:350] missing library libnvidia-fatbinaryloader.so
W0923 14:17:11.720491 3265 nvc_info.c:354] missing compat32 library libnvidia-ml.so
W0923 14:17:11.720501 3265 nvc_info.c:354] missing compat32 library libnvidia-cfg.so
W0923 14:17:11.720516 3265 nvc_info.c:354] missing compat32 library libnvidia-nscq.so
W0923 14:17:11.720534 3265 nvc_info.c:354] missing compat32 library libcuda.so
W0923 14:17:11.720540 3265 nvc_info.c:354] missing compat32 library libnvidia-opencl.so
W0923 14:17:11.720547 3265 nvc_info.c:354] missing compat32 library libnvidia-ptxjitcompiler.so
W0923 14:17:11.720561 3265 nvc_info.c:354] missing compat32 library libnvidia-fatbinaryloader.so
W0923 14:17:11.720567 3265 nvc_info.c:354] missing compat32 library libnvidia-allocator.so
W0923 14:17:11.720582 3265 nvc_info.c:354] missing compat32 library libnvidia-compiler.so
W0923 14:17:11.720600 3265 nvc_info.c:354] missing compat32 library libnvidia-ngx.so
W0923 14:17:11.720612 3265 nvc_info.c:354] missing compat32 library libvdpau_nvidia.so
W0923 14:17:11.720628 3265 nvc_info.c:354] missing compat32 library libnvidia-encode.so
W0923 14:17:11.720642 3265 nvc_info.c:354] missing compat32 library libnvidia-opticalflow.so
W0923 14:17:11.720658 3265 nvc_info.c:354] missing compat32 library libnvcuvid.so
W0923 14:17:11.720667 3265 nvc_info.c:354] missing compat32 library libnvidia-eglcore.so
W0923 14:17:11.720674 3265 nvc_info.c:354] missing compat32 library libnvidia-glcore.so
W0923 14:17:11.720686 3265 nvc_info.c:354] missing compat32 library libnvidia-tls.so
W0923 14:17:11.720693 3265 nvc_info.c:354] missing compat32 library libnvidia-glsi.so
W0923 14:17:11.720710 3265 nvc_info.c:354] missing compat32 library libnvidia-fbc.so
W0923 14:17:11.720725 3265 nvc_info.c:354] missing compat32 library libnvidia-ifr.so
W0923 14:17:11.720740 3265 nvc_info.c:354] missing compat32 library libnvidia-rtcore.so
W0923 14:17:11.720755 3265 nvc_info.c:354] missing compat32 library libnvoptix.so
W0923 14:17:11.720774 3265 nvc_info.c:354] missing compat32 library libGLX_nvidia.so
W0923 14:17:11.720792 3265 nvc_info.c:354] missing compat32 library libEGL_nvidia.so
W0923 14:17:11.720799 3265 nvc_info.c:354] missing compat32 library libGLESv2_nvidia.so
W0923 14:17:11.720807 3265 nvc_info.c:354] missing compat32 library libGLESv1_CM_nvidia.so
W0923 14:17:11.720818 3265 nvc_info.c:354] missing compat32 library libnvidia-glvkspirv.so
W0923 14:17:11.720823 3265 nvc_info.c:354] missing compat32 library libnvidia-cbl.so
I0923 14:17:11.721761 3265 nvc_info.c:276] selecting /usr/bin/nvidia-smi
I0923 14:17:11.721799 3265 nvc_info.c:276] selecting /usr/bin/nvidia-debugdump
I0923 14:17:11.721837 3265 nvc_info.c:276] selecting /usr/bin/nvidia-persistenced
I0923 14:17:11.721887 3265 nvc_info.c:276] selecting /usr/bin/nv-fabricmanager
I0923 14:17:11.721930 3265 nvc_info.c:276] selecting /usr/bin/nvidia-cuda-mps-control
I0923 14:17:11.721972 3265 nvc_info.c:276] selecting /usr/bin/nvidia-cuda-mps-server
I0923 14:17:11.722021 3265 nvc_info.c:438] listing device /dev/nvidiactl
I0923 14:17:11.722043 3265 nvc_info.c:438] listing device /dev/nvidia-uvm
I0923 14:17:11.722049 3265 nvc_info.c:438] listing device /dev/nvidia-uvm-tools
I0923 14:17:11.722056 3265 nvc_info.c:438] listing device /dev/nvidia-modeset
W0923 14:17:11.722103 3265 nvc_info.c:321] missing ipc /var/run/nvidia-persistenced/socket
W0923 14:17:11.722155 3265 nvc_info.c:321] missing ipc /var/run/nvidia-fabricmanager/socket
W0923 14:17:11.722193 3265 nvc_info.c:321] missing ipc /tmp/nvidia-mps
I0923 14:17:11.722213 3265 nvc_info.c:733] requesting device information with ''
I0923 14:17:11.729066 3265 nvc_info.c:623] listing device /dev/nvidia0 (GPU-f82fe76f-d403-d34b-8b80-9a9316b19b18 at 00000000:00:1e.0)
NVRM version: 450.119.03
CUDA version: 11.0
Device Index: 0
Device Minor: 0
Model: Tesla K80
Brand: Tesla
GPU UUID: GPU-f82fe76f-d403-d34b-8b80-9a9316b19b18
Bus Location: 00000000:00:1e.0
Architecture: 3.7
I0923 14:17:11.729156 3265 nvc.c:423] shutting down library context
I0923 14:17:11.729661 3267 driver.c:163] terminating driver service
I0923 14:17:11.730042 3265 driver.c:203] driver service terminated successfully
Looks like the builder is not using the nvidia runtime. What am I missing?
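One more data point: nvidia-smi isn't part of the nvidia/cuda:11.1-base image itself; as far as I know it gets mounted in by nvidia-container-runtime when a container starts. So the BuildKit failure looks consistent with its RUN steps not going through that runtime at all. I'd expect the same "not found" error outside of a build when forcing plain runc:
❯ docker run --rm --runtime=runc nvidia/cuda:11.1-base nvidia-smi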