enable compatibility with Nomad in cgroups.v2 mode #133

Merged May 9, 2022 (1 commit)

@shoenig (Contributor) commented Apr 23, 2022

This PR updates the containerd driver to support changes in how Nomad
manages cgroups when running on a machine using cgroups v2. The new behavior
only activates on nodes where cgroups v2 is mounted at /sys/fs/cgroup
(the same condition Nomad 1.3 uses).

  • The namespace is now set to nomad.slice, which containerd uses as
    the cgroup parent.

  • The container name now follows the new naming convention,
    i.e. <allocID>.<taskName>.scope. This is necessary for Nomad to
    manage the cpuset resource.
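As a rough illustration of the naming convention, the shell sketch below builds a task's scope name from an allocation ID and task name and derives the directory where that scope would be expected. It assumes nomad.slice sits directly under /sys/fs/cgroup, which matches the layout in the example output in this PR. The helper names scope_name and scope_path are hypothetical and do not come from the driver.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: build the scope name Nomad expects for a task,
# following the <allocID>.<taskName>.scope convention.
scope_name() {
    local alloc_id=$1 task_name=$2
    printf '%s.%s.scope\n' "$alloc_id" "$task_name"
}

# Hypothetical helper: full cgroup directory of the scope, assuming
# nomad.slice is the cgroup parent directly under /sys/fs/cgroup.
scope_path() {
    printf '/sys/fs/cgroup/nomad.slice/%s\n' "$(scope_name "$1" "$2")"
}

scope_path a20c8758-7c54-8724-a721-cc59162c4d2b sleep3
```

Keeping the alloc ID first means a simple glob over one allocation (e.g. `<allocID>.*.scope`) finds all of its tasks.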

Example

The OS / cgroup configuration

➜ cat /etc/os-release | grep VERSION=
VERSION="22.04 (Jammy Jellyfish)"

➜ mount -l | grep cgroup 
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)
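The mount output above shows the condition this PR keys on: /sys/fs/cgroup is a cgroup2 mount. The shell sketch below performs a similar check by looking up the filesystem type of /sys/fs/cgroup in a mounts table. It assumes that reading /proc/mounts is sufficient. This is not necessarily how Nomad or the driver detect cgroups v2, and cgroup_mode is a hypothetical name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print "v2" if /sys/fs/cgroup is mounted as cgroup2,
# otherwise "not-v2". Reads a mounts table (default /proc/mounts), whose
# fields are: device, mount point, fs type, options, ...
cgroup_mode() {
    local mounts=${1:-/proc/mounts}
    local fstype
    # No readable mounts table: we cannot claim cgroups v2.
    [ -r "$mounts" ] || { echo "not-v2"; return 0; }
    fstype=$(awk '$2 == "/sys/fs/cgroup" { print $3; exit }' "$mounts")
    if [ "$fstype" = "cgroup2" ]; then
        echo "v2"
    else
        # cgroups v1 or hybrid (tmpfs at /sys/fs/cgroup), or not mounted at all
        echo "not-v2"
    fi
}

cgroup_mode
```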

The nomad.hcl file

log_level  = "info"
data_dir   = "/tmp/nomad1"
plugin_dir = "/opt/nomad/plugins"

server {
  enabled          = true
  bootstrap_expect = 1
}

client {
  enabled = true
}

plugin "containerd-driver" {
  config {
    enabled            = true
    containerd_runtime = "io.containerd.runc.v2"
    allow_privileged   = false
  }
}

An example.nomad job file

job "example" {
  datacenters = ["dc1"]
  type        = "service"

  # 1 task using 3 cores and 33 MB
  group "sleep-3" {
    count = 1
    task "sleep3" {
      driver = "containerd-driver"
      config {
        image = "bash:5"
        args  = ["sleep", "1000"]
      }
      resources {
        cores  = 3
        memory = 33
      }
    }
  }

  # 2 tasks, each using 2 cores and 22 MB
  group "sleep-2" {
    count = 2
    task "sleep2" {
      driver = "containerd-driver"
      config {
        image = "bash:5"
        args  = ["sleep", "1000"]
      }
      resources {
        cores  = 2
        memory = 22
      }
    }
  }

  # 3 tasks, each using 1 core and 11 MB
  group "sleep-1" {
    count = 3
    task "sleep1" {
      driver = "containerd-driver"
      config {
        image = "bash:5"
        args  = ["sleep", "1000"]
      }
      resources {
        cores  = 1
        memory = 11
      }
    }
  }
}
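Adding up the reservations in this job shows why ten cores end up exclusively reserved in the output that follows, with cores 10-11 left shared. That split assumes a 12-CPU host (CPUs 0-11), which is inferred from the cpuset output rather than stated in the PR. Here is a small shell sketch of the arithmetic, with the per-group values copied from the job file:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Resources requested by example.nomad, one entry per group:
# "<count> <cores per task> <memory MB per task>"
groups=(
    "1 3 33"   # sleep-3
    "2 2 22"   # sleep-2
    "3 1 11"   # sleep-1
)

total_cores=0
total_mem=0
for g in "${groups[@]}"; do
    read -r count cores mem <<< "$g"
    total_cores=$(( total_cores + count * cores ))
    total_mem=$(( total_mem + count * mem ))
done

echo "reserved cores: $total_cores"     # 3 + 4 + 3
echo "reserved memory: ${total_mem} MB" # 33 + 44 + 33
```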

A show.sh script for printing the cgroup interface files

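#!/usr/bin/env bash
# Print the cgroup v2 interface files of every task scope under nomad.slice.
set -euo pipefail

find /sys/fs/cgroup/nomad.slice -type d -name '*.scope' -print0 |
while IFS= read -r -d '' f; do
    scope=$(basename "$f")                      # <allocID>.<taskName>.scope
    pids=$(paste -sd ' ' "$f/cgroup.procs")     # a scope may hold several PIDs
    cpuset=$(cat "$f/cpuset.cpus")
    mem=$(cat "$f/memory.max")                  # memory limit in bytes
    echo "$scope:"
    echo "  pids: $pids"
    echo "  cpus: $cpuset"
    echo "  mems: $mem"
done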

After running nomad job run example.nomad, run the script. The output looks something like this:

➜ ./show.sh 
8f979fe5-dec9-a6f5-ab2b-50175adf60e3.sleep2.scope:
  pids: 77841
  cpus: 5-6,10-11
  mems: 23068672
5b918b66-0ae3-ca86-31eb-1c065ba9abe7.sleep1.scope:
  pids: 77849
  cpus: 8,10-11
  mems: 11534336
f97d44ae-17d9-a063-5403-d0da420d0b91.sleep2.scope:
  pids: 77673
  cpus: 3-4,10-11
  mems: 23068672
9b123379-aaff-de4b-cb25-154e4a3592fe.sleep1.scope:
  pids: 77638
  cpus: 7,10-11
  mems: 11534336
a20c8758-7c54-8724-a721-cc59162c4d2b.sleep3.scope:
  pids: 77629
  cpus: 0-2,10-11
  mems: 34603008
906e2bb5-05a7-fd8e-3b95-db6ec5e7533a.sleep1.scope:
  pids: 77939
  cpus: 9-11
  mems: 11534336
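The mems values are the memory.max limits in bytes. Each one equals the task's memory value from the job file multiplied by 1024 × 1024. For example, 33 × 1,048,576 = 34,603,008, so the MB figures are effectively MiB. A quick shell check, where mb_to_bytes is a hypothetical helper:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: convert a Nomad "memory" value (MB, treated as MiB)
# to the byte count that appears in the scope's memory.max file.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

for mb in 33 22 11; do
    echo "memory = $mb -> memory.max = $(mb_to_bytes "$mb")"
done
```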

What the cpuset assignments show:

8f979fe5 - cores 5 and 6 reserved, 10-11 shared
5b918b66 - core 8 reserved, 10-11 shared
f97d44ae - cores 3 and 4 reserved, 10-11 shared
9b123379 - core 7 reserved, 10-11 shared
a20c8758 - cores 0, 1, 2 reserved, 10-11 shared
906e2bb5 - core 9 reserved, 10-11 shared
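Each cpuset above is the task's own reserved cores plus the shared cores 10-11. The shell sketch below checks that pattern mechanically: it expands a cpuset list and removes the shared cores, leaving the reserved ones. The helper names are hypothetical, the shared set 10-11 is taken from the output above, and bash 4 or later is assumed for associative arrays.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: expand a cpuset list such as "5-6,10-11" into
# individual CPU numbers, one per line.
expand_cpuset() {
    local part lo hi
    local IFS=','
    for part in $1; do
        if [[ $part == *-* ]]; then
            lo=${part%-*}
            hi=${part#*-}
            seq "$lo" "$hi"
        else
            echo "$part"
        fi
    done
}

# Hypothetical helper: print the cores of a task's cpuset that are not
# in the shared set, i.e. the cores reserved for that task.
split_cpuset() {
    local cpus=$1 shared=$2 c
    local -a reserved=()
    local -A is_shared=()
    for c in $(expand_cpuset "$shared"); do is_shared[$c]=1; done
    for c in $(expand_cpuset "$cpus"); do
        [[ -n ${is_shared[$c]:-} ]] || reserved+=("$c")
    done
    echo "reserved: ${reserved[*]}"
}

split_cpuset "5-6,10-11" "10-11"   # reserved: 5 6
split_cpuset "9-11" "10-11"        # reserved: 9
```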

@shishir-a412ed (Collaborator) left a comment:


LGTM! @shoenig This is awesome!

@shishir-a412ed shishir-a412ed merged commit d8c2c2f into Roblox:master May 9, 2022
@afbjorklund commented Jun 17, 2022

@shishir-a412ed Would it be possible to include this in a driver release?

Also note that the current release reports "0.9.2", because the version was bumped too late...

@github-actions

CLA Signature Action: All authors have signed the CLA. You may need to manually re-run the blocking PR check if it doesn't pass in a few minutes.

@afbjorklund mentioned this pull request Mar 27, 2023