# Linux Control Groups Demo

This notebook explores control groups (cgroups) on Linux. As you work through it, you will examine how cgroups are exposed in the Linux filesystem, create a new cgroup and assign a PID to it, and finally see how Docker translates many of its resource limits into cgroup-based configuration.

This notebook has been tested on an EC2 instance running Amazon Linux 2. 

During the public demonstration of this notebook at the AWS Sydney Summit 2019, the notebook was executed on a c4.xlarge instance. 

## Initial Exploration of cgroups

Each process running on Linux is a member of exactly one cgroup in each hierarchy. Cgroups are exposed through the /proc and /sys filesystems.
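
The paths used throughout this notebook assume the cgroup v1 layout, where each controller hierarchy is mounted in its own directory under /sys/fs/cgroup. On a host that uses the unified cgroup v2 hierarchy those paths will not exist. The following is a minimal sketch for checking which layout a host uses. It assumes GNU stat is available, and the helper name classify_cgroup_fs is made up for illustration:

```shell
# Map the filesystem type mounted at /sys/fs/cgroup to a cgroup layout.
# classify_cgroup_fs is a hypothetical helper, not part of any tool.
classify_cgroup_fs() {
    case "$1" in
        cgroup2fs) echo "v2" ;;      # unified hierarchy
        tmpfs)     echo "v1" ;;      # per-controller hierarchies (or hybrid mode)
        *)         echo "unknown" ;;
    esac
}

# stat -f reports on the filesystem; %T prints its type in readable form.
fs_type=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || true)
classify_cgroup_fs "$fs_type"
```

If this prints v2 or unknown, expect the directory listings on your host to differ from the ones captured in this notebook.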

First, you can find the cgroup hierarchies listed under the /sys filesystem:

In [4]:
ls /sys/fs/cgroup

[0m[38;5;27mblkio[0m    [38;5;27mcpu,cpuacct[0m  [38;5;27mfreezer[0m  [38;5;51mnet_cls[0m           [38;5;27mperf_event[0m
[38;5;51mcpu[0m      [38;5;27mcpuset[0m       [38;5;27mhugetlb[0m  [38;5;27mnet_cls,net_prio[0m  [38;5;27mpids[0m
[38;5;51mcpuacct[0m  [38;5;27mdevices[0m      [38;5;27mmemory[0m   [38;5;51mnet_prio[0m          [38;5;27msystemd[0m


Now, let's look at the cgroups our shell is currently in. First, get the PID:

In [5]:
echo $$

29924


Next, let's find which cgroups this PID is currently mapped to:

In [5]:
cat /proc/$$/cgroup

11:pids:/user.slice
10:freezer:/
9:blkio:/user.slice
8:perf_event:/
7:hugetlb:/
6:cpu,cpuacct:/user.slice
5:cpuset:/
4:net_cls,net_prio:/
3:devices:/user.slice
2:memory:/user.slice
1:name=systemd:/user.slice/user-1000.slice/session-1413.scope


From the above, you can see that in some hierarchies, for example freezer and cpuset, the process is assigned to the root of the hierarchy. In others, for example blkio, the process is assigned to a subfolder within the hierarchy: /user.slice.
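
Each line of that output has three colon-separated fields: a hierarchy ID, the controller (or comma-separated controllers) attached to the hierarchy, and the path of the cgroup relative to the hierarchy root. As a rough sketch of how to pull out one controller's path, assuming the cgroup v1 format shown above (cgroup_path_for is a hypothetical helper name):

```shell
# Print the cgroup path for one controller from /proc/<pid>/cgroup-style input.
# Usage: cgroup_path_for <controller> [file]  (file defaults to /proc/self/cgroup)
# The function body runs in a subshell, so its variables stay local.
cgroup_path_for() (
    want=$1
    file=${2:-/proc/self/cgroup}
    while IFS=: read -r id controllers path; do
        # Controllers mounted together share one line, e.g. "cpu,cpuacct".
        case ",$controllers," in
            *",$want,"*) printf '%s\n' "$path"; exit 0 ;;
        esac
    done < "$file"
    exit 1
)

cgroup_path_for cpuset "/proc/$$/cgroup" 2>/dev/null || echo "no cpuset line found"
```

Run against the output captured above, this would print / for cpuset and /user.slice for blkio.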

Each cgroup contains virtual files that the resource controller uses to manage the processes allocated to the group. There are also reserved files, such as 'tasks', which list the PIDs in this part of the cgroup hierarchy and are used to assign PIDs to it:

In [6]:
ls /sys/fs/cgroup/pids

cgroup.clone_children  docker             system.slice
cgroup.procs           notify_on_release  tasks
cgroup.sane_behavior   release_agent      user.slice


From here in the root of the pids cgroup, you can see both the virtual files used by the resource controller and the folder containing the user.slice section of the hierarchy we noted earlier.

The user.slice folder contains its own version of the policy files, as well as its own tasks file:

In [7]:
ls /sys/fs/cgroup/pids/user.slice/

cgroup.clone_children  notify_on_release  pids.events  tasks
cgroup.procs           pids.current       pids.max
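
Of the files listed above, pids.current holds the number of tasks currently in the group and pids.max holds the limit, where the literal value max means no limit. A small sketch for reading both, with pids_usage as a hypothetical helper and the v1 path from this notebook assumed:

```shell
# Print "<current>/<max>" for a pids cgroup directory.
# pids_usage is a hypothetical helper; the directory is a parameter so the
# same function works for any cgroup under /sys/fs/cgroup/pids.
pids_usage() (
    dir=$1
    current=$(cat "$dir/pids.current") || exit 1
    max=$(cat "$dir/pids.max") || exit 1
    echo "$current/$max"
)

pids_usage /sys/fs/cgroup/pids/user.slice 2>/dev/null \
    || echo "pids cgroup not found at the v1 path"
```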


Let's examine the tasks file in the user.slice folder. This file lists all of the PIDs in this part of the cgroup hierarchy, including our shell:

In [7]:
grep -x "$$" /sys/fs/cgroup/pids/user.slice/tasks

[01;31m[K29924[m[K


### Creating a new cgroup hierarchy: Controlling CPU Mapping

In this section, we will create a new cgroup within the cpuset hierarchy. We will then show how to restrict a process to specific CPUs on the host by configuring limits on the cgroup and placing the process in it.

First, let's look at the controls available in the cpuset cgroup:

In [8]:
ls /sys/fs/cgroup/cpuset

cgroup.clone_children   cpuset.memory_pressure_enabled
cgroup.procs            cpuset.memory_spread_page
cgroup.sane_behavior    cpuset.memory_spread_slab
cpuset.cpu_exclusive    cpuset.mems
cpuset.cpus             cpuset.sched_load_balance
cpuset.effective_cpus   cpuset.sched_relax_domain_level
cpuset.effective_mems   docker
cpuset.mem_exclusive    notify_on_release
cpuset.mem_hardwall     release_agent
cpuset.memory_migrate   tasks
cpuset.memory_pressure


There are a number of controls available to manage the allocation of CPUs and memory to processes controlled by the cgroup.

To create a new section of the hierarchy, we simply create a folder within the base cgroup folder:

In [9]:
sudo mkdir /sys/fs/cgroup/cpuset/containers-demo

In [10]:
ls /sys/fs/cgroup/cpuset/containers-demo

cgroup.clone_children  cpuset.memory_pressure
cgroup.procs           cpuset.memory_spread_page
cpuset.cpu_exclusive   cpuset.memory_spread_slab
cpuset.cpus            cpuset.mems
cpuset.effective_cpus  cpuset.sched_load_balance
cpuset.effective_mems  cpuset.sched_relax_domain_level
cpuset.mem_exclusive   notify_on_release
cpuset.mem_hardwall    tasks
cpuset.memory_migrate


Before we can use the new cgroup, we need to define its baseline memory nodes and CPU cores in the cpuset.mems and cpuset.cpus virtual files. The kernel will not allow us to add a process to this cgroup until both are set.

This demo is running on a 4-vCPU EC2 instance (c4.xlarge), so we will initially make all 4 vCPUs accessible:

In [11]:
echo 0 | sudo tee /sys/fs/cgroup/cpuset/containers-demo/cpuset.mems
echo 0-3 | sudo tee /sys/fs/cgroup/cpuset/containers-demo/cpuset.cpus

0
0-3
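
If either file is left empty, writing a PID to the tasks file fails (typically reported as "No space left on device"). As a hedged sketch, a check like the one below can confirm both values are set before moving a process; cpuset_ready is a hypothetical helper name:

```shell
# Succeed only if both cpuset.cpus and cpuset.mems are non-empty in a cgroup.
# cpuset_ready is a hypothetical helper for illustration.
cpuset_ready() (
    dir=$1
    for f in cpuset.cpus cpuset.mems; do
        # cgroup virtual files report a size of 0, so read the content
        # rather than testing the file size.
        value=$(cat "$dir/$f" 2>/dev/null || true)
        if [ -z "$value" ]; then
            echo "$f is not set in $dir" >&2
            exit 1
        fi
    done
    exit 0
)

if cpuset_ready /sys/fs/cgroup/cpuset/containers-demo 2>/dev/null; then
    echo "ready to accept tasks"
else
    echo "set cpuset.cpus and cpuset.mems first"
fi
```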


Next, let's move our shell into the new cgroup we've just created by writing its PID to the tasks file:

In [12]:
echo $$ | sudo tee /sys/fs/cgroup/cpuset/containers-demo/tasks

29924


Now let's check which PIDs are mapped to this section of the cgroup hierarchy:

In [13]:
cat /sys/fs/cgroup/cpuset/containers-demo/tasks

29924
30045


Two processes! One is our shell; the other is the cat command. The shell forks the cat process, and the child inherits its cgroup membership from its parent.
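
The inheritance can be checked directly by comparing the cgroup membership of the shell with that of a child process. This is a minimal sketch; check_cgroup_inheritance is a hypothetical helper name:

```shell
# Compare this shell's cgroup membership with that of a freshly started child.
check_cgroup_inheritance() {
    if [ ! -r "/proc/$$/cgroup" ]; then
        echo "/proc/$$/cgroup is not readable on this system"
        return 0
    fi
    parent_cgroups=$(cat "/proc/$$/cgroup")
    # Inside the single-quoted script, $$ expands in the child process, so the
    # child reads its own /proc entry rather than the parent's.
    child_cgroups=$(sh -c 'cat "/proc/$$/cgroup"')
    if [ "$parent_cgroups" = "$child_cgroups" ]; then
        echo "child inherited the parent's cgroups"
    else
        echo "child is in different cgroups"
    fi
}

check_cgroup_inheritance
```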

Let's explore using cgroups to limit access to CPU cores. Our host is a c4.xlarge, which has 4 vCPUs.

First, let's run sysbench with 4 threads while all 4 vCPUs are available and see what result we get:

In [14]:
/usr/bin/sysbench cpu --threads=4 run

sysbench 1.0.9 (using system LuaJIT 2.0.4)

Running the test with following options:
Number of threads: 4
Initializing random number generator from current time


Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:  3327.13

General statistics:
    total time:                          10.0007s
    total number of events:              33279

Latency (ms):
         min:                                  1.15
         avg:                                  1.20
         max:                                 11.73
         95th percentile:                      1.21
         sum:                              39963.37

Threads fairness:
    events (avg/stddev):           8319.7500/12.46
    execution time (avg/stddev):   9.9908/0.00



Now, let's modify the cpuset.cpus setting to limit the process to 2 of the 4 vCPUs:

In [15]:
echo 2-3 | sudo tee /sys/fs/cgroup/cpuset/containers-demo/cpuset.cpus

2-3
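
The value written to cpuset.cpus is a comma-separated list of CPU numbers and ranges. As an illustrative sketch, the hypothetical helper count_cpus below counts the CPUs in such a list, and the final line shows the CPU list the kernel reports for the current process:

```shell
# Count the CPUs in a cpuset-style list such as "0-3" or "0,2-3".
# count_cpus is a hypothetical helper; its body runs in a subshell so the
# IFS change does not leak into the calling shell.
count_cpus() (
    total=0
    IFS=,
    for part in $1; do
        case $part in
            *-*) first=${part%-*}
                 last=${part#*-}
                 total=$((total + last - first + 1)) ;;
            *)   total=$((total + 1)) ;;
        esac
    done
    echo "$total"
)

count_cpus 0-3   # prints 4
count_cpus 2-3   # prints 2

# CPUs the current process is allowed to run on, as reported by the kernel.
grep Cpus_allowed_list /proc/self/status 2>/dev/null || true
```

Run from the shell inside containers-demo, the last line should report 2-3. The nproc command should likewise report 2, since it counts only the processing units available to the calling process.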


Now if we re-run the same test, we should see approximately half the CPU performance: 

In [16]:
/usr/bin/sysbench cpu --threads=4 run

sysbench 1.0.9 (using system LuaJIT 2.0.4)

Running the test with following options:
Number of threads: 4
Initializing random number generator from current time


Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:  1850.46

General statistics:
    total time:                          10.0015s
    total number of events:              18510

Latency (ms):
         min:                                  1.08
         avg:                                  2.16
         max:                                 37.09
         95th percentile:                     12.98
         sum:                              39911.11

Threads fairness:
    events (avg/stddev):           4627.5000/196.33
    execution time (avg/stddev):   9.9778/0.02
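
To put a number on "approximately half": the two runs above report 3327.13 and 1850.46 events per second. The sketch below extracts that figure from sysbench output and computes the ratio; both function names are hypothetical helpers:

```shell
# Extract the "events per second" figure from sysbench cpu output on stdin.
events_per_second() {
    awk -F: '/events per second/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Ratio of restricted to unrestricted throughput, to two decimal places.
# Usage: throughput_ratio <before> <after>
throughput_ratio() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", after / before }'
}

throughput_ratio 3327.13 1850.46   # prints 0.56
```

A ratio of 0.56 means the restricted run kept about 56% of the original throughput.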



## Docker and cgroups

Now let's look at the same thing with Docker.

Before proceeding, restart the Jupyter kernel (Kernel->Restart) to spawn a new shell that is not mapped to the cgroup we just created.

Run the following commands to confirm that the kernel has a new PID. After a successful restart, the cpuset line should show / rather than /containers-demo:

In [17]:
echo $$

29924


In [18]:
cat /proc/$$/cgroup

11:pids:/user.slice
10:freezer:/
9:blkio:/user.slice
8:perf_event:/
7:hugetlb:/
6:cpu,cpuacct:/user.slice
5:cpuset:/containers-demo
4:net_cls,net_prio:/
3:devices:/user.slice
2:memory:/user.slice
1:name=systemd:/user.slice/user-1000.slice/session-2251.scope


For this demo, we will use the Amazon Linux 2 Docker image to explore how Docker leverages cgroups. First, pull the image on to the host:

In [19]:
docker pull amazonlinux:2

2: Pulling from library/amazonlinux
Digest: sha256:d4a4328d679534af47c7a765d62a9195eb27f9a95c03213fca0a18f95aa112cd
Status: Image is up to date for amazonlinux:2


Now let's run a container that stays alive indefinitely (by running tail -f /dev/null) and tell Docker to set cpuset-cpus to 2-3, just like we did in the previous section:

In [20]:
docker run --cpuset-cpus 2-3 --rm -d --cidfile /tmp/docker_amazonlinux.cid amazonlinux:2 /bin/tail -f /dev/null
CONTAINER_CID=$(cat /tmp/docker_amazonlinux.cid)

24c69cb086e0df84c2019cfcc8879d2fd783c222c3741007b4a9ce066b48e68f


In [21]:
docker ps

CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
24c69cb086e0        amazonlinux:2       "/bin/tail -f /dev/n…"   6 seconds ago       Up 5 seconds                            blissful_gates


Docker creates a cgroup hierarchy for itself, and within that a sub-hierarchy for each container, named after the container ID. In the Docker command above, we wrote the container ID to a file and loaded it into the shell variable $CONTAINER_CID, which we reference here for brevity:

In [22]:
ls /sys/fs/cgroup/cpuset/docker/$CONTAINER_CID/

cgroup.clone_children  cpuset.memory_pressure
cgroup.procs           cpuset.memory_spread_page
cpuset.cpu_exclusive   cpuset.memory_spread_slab
cpuset.cpus            cpuset.mems
cpuset.effective_cpus  cpuset.sched_load_balance
cpuset.effective_mems  cpuset.sched_relax_domain_level
cpuset.mem_exclusive   notify_on_release
cpuset.mem_hardwall    tasks
cpuset.memory_migrate


If we inspect the cpuset.cpus value, we will see it matches the value passed to the Docker command when we launched the container:

In [23]:
cat /sys/fs/cgroup/cpuset/docker/$CONTAINER_CID/cpuset.cpus

2-3
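
The same comparison can be scripted by reading the requested value back from Docker and comparing it with the cgroup file. This is a sketch under assumptions: check_container_cpuset is a hypothetical helper, the path layout is the cgroup v1 one shown above, and the plain string comparison assumes Docker and the kernel print the CPU list in the same form:

```shell
# Compare the cpuset Docker was asked for with the one applied in the cgroup.
# Usage: check_container_cpuset <container-id> [cgroup-root]
check_container_cpuset() (
    cid=$1
    cgroup_root=${2:-/sys/fs/cgroup}
    # HostConfig.CpusetCpus holds the value given via --cpuset-cpus.
    requested=$(docker inspect --format '{{.HostConfig.CpusetCpus}}' "$cid")
    applied=$(cat "$cgroup_root/cpuset/docker/$cid/cpuset.cpus")
    if [ "$requested" = "$applied" ]; then
        echo "match: $applied"
    else
        echo "mismatch: requested=$requested applied=$applied"
        exit 1
    fi
)

# Only run the check when the variable from the earlier cell is set.
if [ -n "${CONTAINER_CID:-}" ]; then
    check_container_cpuset "$CONTAINER_CID"
fi
```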


## Lab Clean Up

The following commands remove the cgroups created in this notebook. 

Before running the commands below, click Kernel->Restart in Jupyter to start a new underlying shell. You cannot remove a cgroup while a process (i.e. the original Jupyter kernel) is still mapped to it.

After restarting the kernel, execute the next line to remove the containers-demo cgroup:

In [2]:
sudo rmdir /sys/fs/cgroup/cpuset/containers-demo
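
If the container started in the Docker section is still running, it can be stopped before the container ID file is removed. Because it was launched with --rm, stopping it also removes it. The sketch below assumes the ID file path used earlier in this notebook; stop_demo_container is a hypothetical helper name:

```shell
# Stop the demo container whose ID was saved by --cidfile.
# Usage: stop_demo_container [cid-file]
stop_demo_container() (
    cid_file=${1:-/tmp/docker_amazonlinux.cid}
    if [ -s "$cid_file" ]; then
        docker stop "$(cat "$cid_file")"
    else
        echo "no container ID file at $cid_file; nothing to stop"
    fi
)

stop_demo_container
```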

Execute the following command to remove the temporary container id file created by the Docker example in this lab:

In [3]:
rm /tmp/docker_amazonlinux.cid