| 1 | +--- |
| 2 | +title: Apache Mesos - Cgroups v2 Support |
| 3 | +layout: documentation |
| 4 | +--- |
| 5 | + |
| 6 | +# Using Mesos on systems with Cgroups2 enabled |
| 7 | + |
| 8 | +As part of the move towards Cgroups2, the Cgroups isolator has been updated to
| 9 | +support the updated interface. Changes are outlined below, and it is recommended
| 10 | +to read the [Cgroups v2](https://docs.kernel.org/admin-guide/cgroup-v2.html)
| 11 | +documentation for a deeper understanding.
| 12 | + |
| 13 | +### Requirements |
| 14 | + |
| 15 | +The `cgroup2` filesystem must be mounted at `/sys/fs/cgroup`. This allows Mesos
| 16 | +to pick the Cgroups2 Isolator when creating the Mesos Containerizer. |
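
One way to check this requirement by hand is to look at `/proc/mounts` for a `cgroup2` entry at `/sys/fs/cgroup`. The sketch below is only an illustration of that check: the helper name `is_cgroup2_mounted` is hypothetical and is not part of Mesos, which performs its own detection internally.

```python
def is_cgroup2_mounted(mounts_text, mount_point="/sys/fs/cgroup"):
    """Return True if the filesystem mounted at mount_point is cgroup2.

    mounts_text is the content of /proc/mounts, where each line reads:
    <device> <mount point> <fs type> <options> <dump> <pass>
    """
    fs_type = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Later entries for the same mount point shadow earlier ones.
        if len(fields) >= 3 and fields[1] == mount_point:
            fs_type = fields[2]
    return fs_type == "cgroup2"


# Example usage on a Linux host (hypothetical, not run here):
#   with open("/proc/mounts") as f:
#       print(is_cgroup2_mounted(f.read()))
```

On a host where `/sys/fs/cgroup` is a `tmpfs` holding per-controller cgroups v1 mounts, this check returns `False`.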
| 17 | + |
| 18 | +### Cgroup Names |
| 19 | + |
| 20 | +A cgroup called `CGROUP_NAME` has the path `/sys/fs/cgroup/$CGROUP_NAME`. This
| 21 | +applies to all cgroups. A cgroup's name is the cgroup's path relative to
| 22 | +`/sys/fs/cgroup`, where the cgroup2 filesystem is mounted. |
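
The mapping between a cgroup's name and its path can be sketched as follows. The function names `cgroup_path` and `cgroup_name` are hypothetical helpers chosen for illustration, and the example name `mesos/abc` is made up.

```python
import posixpath

# Mount point of the cgroup2 filesystem, as required above.
CGROUP2_MOUNT = "/sys/fs/cgroup"


def cgroup_path(name):
    """Absolute path of the cgroup with the given name."""
    return posixpath.join(CGROUP2_MOUNT, name)


def cgroup_name(path):
    """Name of the cgroup at the given absolute path (path relative to the mount)."""
    return posixpath.relpath(path, CGROUP2_MOUNT)
```

For example, `cgroup_path("mesos/abc")` gives `/sys/fs/cgroup/mesos/abc`, and `cgroup_name` inverts it.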
| 23 | + |
| 24 | +`flags.cgroups_root` (default: "mesos"): Root cgroup name. |
| 25 | + |
| 26 | +The client has control over the name of the root cgroup subtree under |
| 27 | +`/sys/fs/cgroup` that Mesos manages. The default name is “mesos”. |
| 28 | + |
| 29 | +### Process Cgroup |
| 30 | + |
| 31 | +Every process Mesos manages has its own cgroup, with a leaf cgroup under it that
| 32 | +contains the pids. This is done to adhere to the [No Internal Process Constraint](https://docs.kernel.org/admin-guide/cgroup-v2.html#no-internal-process-constraint)
| 33 | +imposed by Cgroups v2. |
| 34 | + |
| 35 | +### Container |
| 36 | + |
| 37 | +When the cgroups v2 isolator is `prepare`d for a new container, cgroups are |
| 38 | +created for the new container. When the cgroups v2 isolator `isolate`s, the new |
| 39 | +container is moved into its leaf cgroup.
| 40 | + |
| 41 | +Container Non-leaf Cgroup: `<flags.cgroups_root>/<containerId>` |
| 42 | + |
| 43 | +Container Leaf Cgroup: `<flags.cgroups_root>/<containerId>/leaf` |
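
As a small illustration of the two names above, the sketch below builds them from a root cgroup name and a container ID. The helper `container_cgroups` and the container ID `c1` are hypothetical; only the `<flags.cgroups_root>/<containerId>` and `/leaf` layout comes from this document.

```python
def container_cgroups(cgroups_root, container_id):
    """Return (non_leaf, leaf) cgroup names for a container.

    Names are relative to the cgroup2 mount point, /sys/fs/cgroup.
    """
    non_leaf = f"{cgroups_root}/{container_id}"
    leaf = f"{non_leaf}/leaf"
    return non_leaf, leaf
```

With the default root, `container_cgroups("mesos", "c1")` yields `mesos/c1` and `mesos/c1/leaf`; the container's processes live in the second one.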
| 44 | + |
| 45 | +### Nested Containers |
| 46 | + |
| 47 | +The Cgroups v2 isolator supports nested containers. |
| 48 | + |
| 49 | +Unlike Cgroups v1, we now create cgroups for all containers, even if they |
| 50 | +indicated they do not want their own resource isolation. This is to make it |
| 51 | +easier to keep track of a container’s processes. |
| 52 | + |
| 53 | +If a container does not wish to have its own resource isolation, it can pass in |
| 54 | +a flag `share_cgroups` and the isolator will not update any controllers for it. |
| 55 | + |
| 56 | +### Systemd Integration |
| 57 | + |
| 58 | +We currently do not have systemd integration. This section should be updated |
| 59 | +with our approach if systemd support is implemented. |
| 60 | + |
| 61 | +### Linux Launcher & Cgroups v2 Isolator |
| 62 | + |
| 63 | +On Linux systems that support cgroups v2, the Mesos Containerizer will use the [Linux Launcher](https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/linux_launcher.cpp) and the [Cgroups v2 Isolator](https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/isolators/cgroups2/cgroups2.cpp). |
| 64 | + |
| 65 | +It’s recommended to review the code to gain a complete understanding of these steps.
| 66 | + |
| 67 | +Operations on startup: |
| 68 | + |
| 69 | +- Linux Launcher `recover`: Parses the cgroups subtree rooted at
| 70 | +`flags.cgroups_root` to obtain container ids. Compares the persisted state to
| 71 | +the recovered containers to determine which containers are orphans.
| 72 | +- Cgroups v2 Isolator `recover`: Creates internal state to track recovered
| 73 | +containers. Calls `recover` on all of the controllers that are used by each of |
| 74 | +the recovered containers. |
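
The orphan comparison in the Linux Launcher's `recover` step can be pictured as a set difference. This is a simplified sketch, not the launcher's actual code: it assumes that an orphan is a container whose cgroup exists under `flags.cgroups_root` but which is absent from the persisted state, and the function name `find_orphans` is hypothetical.

```python
def find_orphans(cgroup_container_ids, persisted_container_ids):
    """Return container ids found in the cgroup subtree but not in persisted state.

    Assumption: a container with a cgroup but no persisted record is an orphan.
    """
    return set(cgroup_container_ids) - set(persisted_container_ids)
```

For instance, if the cgroup subtree contains containers `a`, `b`, and `c` while the persisted state only knows `a` and `c`, then `b` is reported as an orphan.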
| 75 | + |
| 76 | +Operations when a new container is started: |
| 77 | + |
| 78 | +- Cgroups v2 Isolator `prepare`: Creates cgroups for the new container and adds |
| 79 | +the container to isolator's internal state. Configures namespace creation flags |
| 80 | +and mount setups; does not create mounts or namespaces. Calls `prepare` on all |
| 81 | +of the controllers that are used by the new container. |
| 82 | +- Linux Launcher `fork`: Forks the Mesos Agent process to create the new |
| 83 | +container's process. Also moves the child processes into the container's leaf |
| 84 | +cgroup. Creates mounts and namespaces. |
| 85 | +- Cgroups v2 Isolator `watch`: Calls `watch` on each of the controllers that |
| 86 | +are used by the container. When a resource-watch promise is resolved, a handler
| 87 | +is invoked. |
| 88 | +- Cgroups v2 Isolator `isolate`: Calls `isolate` on each of the controllers that |
| 89 | +are used by the container. Then moves the container process into the container's |
| 90 | +leaf cgroup; at this point the container is isolated. |