---
title: Apache Mesos - Nvidia GPU Support
layout: documentation
---
Mesos 1.0.0 added first-class support for Nvidia GPUs. The minimum required Nvidia driver version is 340.29.
Getting up and running with GPU support in Mesos is fairly straightforward once you know the steps necessary to make it work as expected. On one side, this includes setting the necessary agent flags to enumerate GPUs and advertise them to the Mesos master. On the other side, this includes setting the proper framework capabilities so that the Mesos master will actually include GPUs in the resource offers it sends to a framework. So long as all of these constraints are met, accepting offers that contain GPUs and launching tasks that consume them should be just as straightforward as launching a traditional task that only consumes CPUs, memory, and disk.
Mesos exposes GPUs as a simple SCALAR resource in the same way it always has for CPUs, memory, and disk. That is, a resource offer such as the following is now possible:

cpus:8; mem:1024; disk:65536; gpus:4;

However, unlike CPUs, memory, and disk, only whole numbers of GPUs can be selected. If a fractional amount is selected, launching the task will result in a TASK_ERROR.
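For illustration, a request like the following (using the mesos-execute command introduced later in this document, and assuming a local master at 127.0.0.1:5050) would fail with TASK_ERROR because it asks for half a GPU:

$ mesos-execute \
    --master=127.0.0.1:5050 \
    --name=fractional-gpu-test \
    --command="nvidia-smi" \
    --framework_capabilities="GPU_RESOURCES" \
    --resources="gpus:0.5"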
At the time of this writing, Nvidia GPU support is only available for tasks launched through the Mesos containerizer (i.e., no support exists for launching GPU capable tasks through the Docker containerizer). That said, the Mesos containerizer now supports running docker images natively, so this limitation should not affect most users.
Moreover, we mimic the support provided by nvidia-docker to automatically mount the proper Nvidia drivers and tools directly into your docker container. This means you can easily test your GPU-enabled docker containers locally and deploy them to Mesos with the assurance that they will work without modification.
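For instance, an image that you have already verified locally with nvidia-docker (assuming nvidia-docker is installed on your workstation) should behave the same way when launched through Mesos:

$ nvidia-docker run --rm nvidia/cuda nvidia-smi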
In the following sections we walk through all of the flags and framework capabilities necessary to enable Nvidia GPU support in Mesos. We then show an example of setting up and running an example test cluster that launches tasks both with and without docker containers. Finally, we conclude with a step-by-step guide of how to install any necessary Nvidia GPU drivers on your machine.
The following isolation flags are required to enable Nvidia GPU support on an agent.
--isolation="filesystem/linux,cgroups/devices,gpu/nvidia"
The filesystem/linux flag tells the agent to use Linux-specific commands to prepare the root filesystem and volumes (e.g., persistent volumes) for containers that require them. Specifically, it relies on Linux mount namespaces to prevent the mounts of a container from being propagated to the host mount table. In the case of GPUs, we require this flag to properly mount certain Nvidia binaries (e.g., nvidia-smi) and libraries (e.g., libnvidia-ml.so) into a container when necessary.
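These binaries and libraries come from the driver installation on the host itself (see the section on external dependencies below). A quick host-side check that they are present might look like the following; the exact paths in the output will vary by distribution and driver version:

$ which nvidia-smi
/usr/bin/nvidia-smi

$ ldconfig -p | grep libnvidia-ml
        libnvidia-ml.so.1 (libc6,x86-64) => /usr/lib64/libnvidia-ml.so.1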
The cgroups/devices flag tells the agent to restrict access to a specific set of devices for each task that it launches (i.e., a subset of all devices listed in /dev). When used in conjunction with the gpu/nvidia flag, the cgroups/devices flag allows us to grant / revoke access to specific GPUs on a per-task basis.
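As a rough illustration of what this looks like (hypothetical paths and entries; the actual cgroup layout depends on your system configuration), a container that was allocated a single GPU would carry a devices whitelist that includes the Nvidia control device and just that one GPU:

# Hypothetical example: major number 195 is the Nvidia character device range.
$ cat /sys/fs/cgroup/devices/mesos/<container-id>/devices.list
c 1:3 rwm
c 5:1 rwm
c 195:255 rwm
c 195:0 rwm

Here c 195:0 corresponds to /dev/nvidia0 (the granted GPU) and c 195:255 to /dev/nvidiactl; the other GPUs on the host are absent from the whitelist, so the task cannot open them.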
By default, all GPUs on an agent are automatically discovered and sent to the Mesos master as part of its resource offer. However, it may sometimes be necessary to restrict access to only a subset of the GPUs available on an agent. This is useful, for example, if you want to exclude a specific GPU device because an unwanted Nvidia graphics card is listed alongside a more powerful set of GPUs. When this is required, the following additional agent flags can be used to accomplish this:
--nvidia_gpu_devices="<list_of_gpu_ids>"
--resources="gpus:<num_gpus>"
For the --nvidia_gpu_devices flag, you need to provide a comma-separated list of GPU ids, as determined by running nvidia-smi on the host where the agent is to be launched (see below for instructions on what external dependencies must be installed on these hosts to run this command). Example output from running nvidia-smi on a machine with four GPUs can be seen below:
+------------------------------------------------------+
| NVIDIA-SMI 352.79 Driver Version: 352.79 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M60 Off | 0000:04:00.0 Off | 0 |
| N/A 34C P0 39W / 150W | 34MiB / 7679MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla M60 Off | 0000:05:00.0 Off | 0 |
| N/A 35C P0 39W / 150W | 34MiB / 7679MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla M60 Off | 0000:83:00.0 Off | 0 |
| N/A 38C P0 40W / 150W | 34MiB / 7679MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla M60 Off | 0000:84:00.0 Off | 0 |
| N/A 34C P0 39W / 150W | 34MiB / 7679MiB | 97% Default |
+-------------------------------+----------------------+----------------------+
The GPU id to choose can be seen in the far left of each row. Any subset of these ids can be listed in the --nvidia_gpu_devices flag (i.e., all of the following values of this flag are valid):
--nvidia_gpu_devices="0"
--nvidia_gpu_devices="0,1"
--nvidia_gpu_devices="0,1,2"
--nvidia_gpu_devices="0,1,2,3"
--nvidia_gpu_devices="0,2,3"
--nvidia_gpu_devices="3,1"
etc...
For the --resources=gpus:<num_gpus> flag, the value passed to <num_gpus> must equal the number of GPUs listed in --nvidia_gpu_devices. If these numbers do not match, launching the agent will fail. This can sometimes be a source of confusion, so it is important to emphasize it here for clarity.
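For example, to expose only the first and third GPUs from the nvidia-smi listing above, an agent could be launched with both flags together (illustrative master address and work directory; adjust them for your setup):

$ mesos-agent \
    --master=127.0.0.1:5050 \
    --work_dir=/var/lib/mesos \
    --isolation="filesystem/linux,cgroups/devices,gpu/nvidia" \
    --nvidia_gpu_devices="0,2" \
    --resources="gpus:2"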
Once you launch an agent with the flags above, GPU resources will be advertised to the Mesos master alongside all of the traditional resources such as CPUs, memory, and disk. However, the master will only forward offers that contain GPUs to frameworks that have explicitly enabled the GPU_RESOURCES framework capability.
The choice to make frameworks explicitly opt-in to this GPU_RESOURCES
capability was to keep legacy frameworks from accidentally consuming
non-GPU resources on GPU-capable machines (and thus preventing your GPU
jobs from running). It's not that big a deal if all of your nodes have
GPUs, but in a mixed-node environment, it can be a big problem.
An example of setting this capability in a C++-based framework can be seen below:
FrameworkInfo framework;
framework.add_capabilities()->set_type(
    FrameworkInfo::Capability::GPU_RESOURCES);

GpuScheduler scheduler;

MesosSchedulerDriver* driver = new MesosSchedulerDriver(
    &scheduler,
    framework,
    "127.0.0.1:5050");

driver->run();
In this section we walk through two examples of configuring GPU-capable clusters and running tasks on them. The first example demonstrates the minimal setup required to run a command that consumes GPUs on a GPU-capable agent. The second example demonstrates the setup necessary to launch a docker container that does the same.
Note: Both of these examples assume you have installed the external dependencies required for Nvidia GPU support on Mesos. Please see below for more information.
The commands below show a minimal example of bringing up a GPU-capable Mesos cluster on localhost and executing a task on it. The required agent flags are set as described above, and the mesos-execute command has been told to enable the GPU_RESOURCES framework capability so it can receive offers containing GPU resources.
$ mesos-master \
--ip=127.0.0.1 \
--work_dir=/var/lib/mesos
$ mesos-agent \
--master=127.0.0.1:5050 \
--work_dir=/var/lib/mesos \
--isolation="cgroups/devices,gpu/nvidia"
$ mesos-execute \
--master=127.0.0.1:5050 \
--name=gpu-test \
--command="nvidia-smi" \
--framework_capabilities="GPU_RESOURCES" \
--resources="gpus:1"
If all goes well, you should see something like the following in the stdout of your task:
+------------------------------------------------------+
| NVIDIA-SMI 352.79 Driver Version: 352.79 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M60 Off | 0000:04:00.0 Off | 0 |
| N/A 34C P0 39W / 150W | 34MiB / 7679MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
The commands below show a minimal example of bringing up a GPU-capable Mesos cluster on localhost and running a docker container on it. The required agent flags are set as described above, and the mesos-execute command has been told to enable the GPU_RESOURCES framework capability so it can receive offers containing GPU resources. Additionally, the required flags to enable support for docker containers (as described here) have been set up as well.
$ mesos-master \
--ip=127.0.0.1 \
--work_dir=/var/lib/mesos
$ mesos-agent \
--master=127.0.0.1:5050 \
--work_dir=/var/lib/mesos \
--image_providers=docker \
--executor_environment_variables="{}" \
--isolation="docker/runtime,filesystem/linux,cgroups/devices,gpu/nvidia"
$ mesos-execute \
--master=127.0.0.1:5050 \
--name=gpu-test \
--docker_image=nvidia/cuda \
--command="nvidia-smi" \
--framework_capabilities="GPU_RESOURCES" \
--resources="gpus:1"
If all goes well, you should see something like the following in the stdout of your task:
+------------------------------------------------------+
| NVIDIA-SMI 352.79 Driver Version: 352.79 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M60 Off | 0000:04:00.0 Off | 0 |
| N/A 34C P0 39W / 150W | 34MiB / 7679MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
Any host running a Mesos agent with Nvidia GPU support MUST have a valid Nvidia kernel driver installed. It is also highly recommended to install the corresponding user-level libraries and tools available as part of the Nvidia CUDA toolkit. Many jobs that use Nvidia GPUs rely on CUDA and not including it will severely limit the type of GPU-aware jobs you can run on Mesos.
Note: The minimum supported version of CUDA is 6.5.
The Nvidia kernel driver can be downloaded at the link below. Make sure to choose the proper model of GPU, operating system, and CUDA toolkit you plan to install on your host:
http://www.nvidia.com/Download/index.aspx
Unfortunately, most Linux distributions come preinstalled with an open source video driver called Nouveau. This driver conflicts with the Nvidia driver we are trying to install. The following guides may prove useful as you work through the process of uninstalling Nouveau before installing the Nvidia driver on CentOS or Ubuntu:
http://www.dedoimedo.com/computers/centos-7-nvidia.html
http://www.allaboutlinux.eu/remove-nouveau-and-install-nvidia-driver-in-ubuntu-15-04/
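A common step covered in those guides (the details differ by distribution, so treat this as a sketch rather than a definitive recipe) is to blacklist the nouveau kernel module, rebuild the initramfs, and reboot before running the Nvidia installer:

sudo bash -c "cat > /etc/modprobe.d/blacklist-nouveau.conf << EOF
blacklist nouveau
options nouveau modeset=0
EOF"
sudo update-initramfs -u   # on Ubuntu; use 'dracut --force' on CentOS
sudo reboot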
After installing the Nvidia kernel driver, you can follow the instructions in the link below to install the Nvidia CUDA toolkit:
http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/
In addition to the steps listed in the link above, it is highly recommended to add CUDA's lib directory into your ldcache so that tasks launched by Mesos will know where these libraries exist and link with them properly.
sudo bash -c "cat > /etc/ld.so.conf.d/cuda-lib64.conf << EOF
/usr/local/cuda/lib64
EOF"
sudo ldconfig
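You can verify that the directory was picked up by querying the cache; the exact libraries listed will depend on the CUDA version you installed:

ldconfig -p | grep /usr/local/cuda/lib64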
If you choose not to add CUDA's lib directory to your ldcache, you MUST add it to the LD_LIBRARY_PATH of every task that requires it.
Note: This is not the recommended method. You have been warned.
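If you do take this route, the variable can be set as part of the task's command itself, for example (my-cuda-app is a hypothetical application name):

$ mesos-execute \
    --master=127.0.0.1:5050 \
    --name=cuda-test \
    --command="LD_LIBRARY_PATH=/usr/local/cuda/lib64 ./my-cuda-app" \
    --framework_capabilities="GPU_RESOURCES" \
    --resources="gpus:1"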
Once the kernel driver has been installed, you can make sure everything is working by trying to run the bundled nvidia-smi tool.
nvidia-smi
You should see output similar to the following:
Thu Apr 14 11:58:17 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.79 Driver Version: 352.79 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M60 Off | 0000:04:00.0 Off | 0 |
| N/A 34C P0 39W / 150W | 34MiB / 7679MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla M60 Off | 0000:05:00.0 Off | 0 |
| N/A 35C P0 39W / 150W | 34MiB / 7679MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla M60 Off | 0000:83:00.0 Off | 0 |
| N/A 38C P0 38W / 150W | 34MiB / 7679MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla M60 Off | 0000:84:00.0 Off | 0 |
| N/A 34C P0 38W / 150W | 34MiB / 7679MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
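If nvidia-smi is missing or fails to communicate with the driver, a quick sanity check (standard Linux tooling, nothing Mesos-specific) is to confirm that the Nvidia kernel module is actually loaded:

lsmod | grep nvidia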
To verify your CUDA installation, it is recommended to go through the instructions at the link below:
http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/#install-samples
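In particular, building and running the bundled deviceQuery sample is a quick end-to-end check. The samples directory name depends on the CUDA version and install location; the path below assumes the samples were installed into your home directory by the CUDA 7.5 installer:

cd ~/NVIDIA_CUDA-7.5_Samples/1_Utilities/deviceQuery
make
./deviceQuery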
Finally, you should get a developer to run Mesos's Nvidia GPU-related unit tests on your machine to ensure that everything passes (as described below).
At the time of this writing, the following Nvidia GPU specific unit tests exist on Mesos:
DockerTest.ROOT_DOCKER_NVIDIA_GPU_DeviceAllow
DockerTest.ROOT_DOCKER_NVIDIA_GPU_InspectDevices
NvidiaGpuTest.ROOT_CGROUPS_NVIDIA_GPU_VerifyDeviceAccess
NvidiaGpuTest.ROOT_INTERNET_CURL_CGROUPS_NVIDIA_GPU_NvidiaDockerImage
NvidiaGpuTest.ROOT_CGROUPS_NVIDIA_GPU_FractionalResources
NvidiaGpuTest.NVIDIA_GPU_Discovery
NvidiaGpuTest.ROOT_CGROUPS_NVIDIA_GPU_FlagValidation
NvidiaGpuTest.NVIDIA_GPU_Allocator
NvidiaGpuTest.ROOT_NVIDIA_GPU_VolumeCreation
NvidiaGpuTest.ROOT_NVIDIA_GPU_VolumeShouldInject
The capitalized words following the '.' specify test filters to apply when running the unit tests. In our case the filters that apply are ROOT, CGROUPS, and NVIDIA_GPU. This means that these tests must be run as root on Linux machines with cgroups support that have Nvidia GPUs installed on them. The check to verify that Nvidia GPUs exist is to look for the existence of the Nvidia System Management Interface (nvidia-smi) on the machine where the tests are being run. This binary should already be installed if the instructions above have been followed correctly.
So long as these filters are satisfied, you can run the following to execute these unit tests:
[mesos]$ GTEST_FILTER="" make -j check
[mesos]$ sudo bin/mesos-tests.sh --gtest_filter="*NVIDIA_GPU*"