Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

podman container created via libpod RESTful api must set swap explicitly, otherwise it will not killed by OOM killer #13145

Closed
ttys3 opened this issue Feb 4, 2022 · 9 comments · Fixed by #13839
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@ttys3
Copy link
Contributor

ttys3 commented Feb 4, 2022

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

podman container created via /libpod/containers/create api which only specific memory.limit, will not set MemorySwap (memory.swap ) to double of Memory limit like the cli or compat api (/containers/create).

so the cgroup v2 swap limit is always max, so the container will not killed by OOM even it reached the hard memory limit.

after explicit set the memory.swap value, the created the container works as the same one created by cli.

Steps to reproduce the issue:

  1. first run via command:
sudo podman run --cgroup-manager=systemd --memory="512M" --name oom --rm -ti docker.io/80x86/eatmem

the container will been killed soon since it need 1G memory but we only give it 512M memory (for podman, actually it is 512M memory + 512M swap limit, but since the eatmem will try to allocate exactly 1024M, because the container tini init program in this container also need small bit memory, so eatmem program will got killed)

  1. then we create a container via the api:
#!/bin/sh
# create a podman container with fifo named pipe as log file
# for demonstrate issue https://github.com/containers/podman/issues/13081

CTR_NAME="eatmem-demo"

function print_title() {
    title=$1
    echo ""
    echo "|=================== $title ===================>>>"
    echo ""
}

print_title "remove container first, name=$CTR_NAME"
sudo podman rm -f $CTR_NAME

print_title "begin POST /v1.0.0/libpod/containers/create"

PODMAN_API_CALL='sudo curl --max-time 60 -v --unix-socket /run/podman/podman.sock'

sudo podman pull docker.io/80x86/eatmem

$($PODMAN_API_CALL -X POST http://d/v1.0.0/libpod/containers/create -H 'Content-Type: application/json' -d '{ "name": "eatmem-demo", "image": "docker.io/80x86/eatmem", "resource_limits": { "memory": { "limit": 536870912 }, "cpu": { "shares": 200 } } }' 

sudo podman inspect $CTR_NAME

and then start it:

sudo podman run eatmem-demo

Describe the results you received:

eatmem-demo not killed by OOM killer.

the container created via command:

cat /sys/fs/cgroup/machine.slice/libpod-c83f626da2c872b0e2789b9331191497b2f12cea6ab0f971d8ee56b352ec82f1.scope/container/memory.max
536870912

cat /sys/fs/cgroup/machine.slice/libpod-c83f626da2c872b0e2789b9331191497b2f12cea6ab0f971d8ee56b352ec82f1.scope/container/memory.swap.max
536870912

the container created via api:

cat /sys/fs/cgroup/machine.slice/libpod-5af22c4658cbfa7fd48e6832342a637719088877378fb6ea9e00a9b9e3173e83.scope/container/memory.max
536870912

cat /sys/fs/cgroup/machine.slice/libpod-5af22c4658cbfa7fd48e6832342a637719088877378fb6ea9e00a9b9e3173e83.scope/container/memory.swap.max
max

Describe the results you expected:

eatmem-demo should killed by OOM killer.

cgroup v2 memory.swap.max of the container created via api should be 536870912, not max

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: /usr/bin/conmon is owned by conmon 1:2.1.0-1
    path: /usr/bin/conmon
    version: 'conmon version 2.1.0, commit: bdb4f6e56cd193d40b75ffc9725d4b74a18cb33c'
  cpus: 16
  distribution:
    distribution: arch
    version: unknown
  eventLogger: journald
  hostname: wudeng
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.16.5-arch1-1
  linkmode: dynamic
  logDriver: journald
  memFree: 1982287872
  memTotal: 33594998784
  ociRuntime:
    name: crun
    package: /usr/bin/crun is owned by crun 1.4.2-1
    path: /usr/bin/crun
    version: |-
      crun version 1.4.2
      commit: f6fbc8f840df1a414f31a60953ae514fa497c748
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: /usr/bin/slirp4netns is owned by slirp4netns 1.1.12-1
    version: |-
      slirp4netns version 1.1.12
      commit: 7a104a101aa3278a2152351a082a6df71f57c9a3
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 15819862016
  swapTotal: 17179865088
  uptime: 12h 38m 30.18s (Approximately 0.50 days)
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  hub.k8s.lan:
    Blocked: false
    Insecure: true
    Location: hub.k8s.lan
    MirrorByDigestOnly: false
    Mirrors: null
    Prefix: hub.k8s.lan
  search:
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 14
    paused: 0
    running: 14
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 152
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 3.4.4
  Built: 1639074640
  BuiltTime: Fri Dec 10 02:30:40 2021
  GitCommit: f6526ada1025c2e3f88745ba83b8b461ca659933
  GoVersion: go1.17.4
  OsArch: linux/amd64
  Version: 3.4.4

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: /usr/bin/conmon is owned by conmon 1:2.1.0-1
    path: /usr/bin/conmon
    version: 'conmon version 2.1.0, commit: bdb4f6e56cd193d40b75ffc9725d4b74a18cb33c'
  cpus: 16
  distribution:
    distribution: arch
    version: unknown
  eventLogger: journald
  hostname: wudeng
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.16.5-arch1-1
  linkmode: dynamic
  logDriver: journald
  memFree: 1839431680
  memTotal: 33594998784
  ociRuntime:
    name: crun
    package: /usr/bin/crun is owned by crun 1.4.2-1
    path: /usr/bin/crun
    version: |-
      crun version 1.4.2
      commit: f6fbc8f840df1a414f31a60953ae514fa497c748
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: /usr/bin/slirp4netns is owned by slirp4netns 1.1.12-1
    version: |-
      slirp4netns version 1.1.12
      commit: 7a104a101aa3278a2152351a082a6df71f57c9a3
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 15819862016
  swapTotal: 17179865088
  uptime: 12h 38m 48.27s (Approximately 0.50 days)
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  hub.k8s.lan:
    Blocked: false
    Insecure: true
    Location: hub.k8s.lan
    MirrorByDigestOnly: false
    Mirrors: null
    Prefix: hub.k8s.lan
  search:
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 14
    paused: 0
    running: 14
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 152
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 3.4.4
  Built: 1639074640
  BuiltTime: Fri Dec 10 02:30:40 2021
  GitCommit: f6526ada1025c2e3f88745ba83b8b461ca659933
  GoVersion: go1.17.4
  OsArch: linux/amd64
  Version: 3.4.4

Package info (e.g. output of rpm -q podman or apt list podman):

Name            : podman
Version         : 3.4.4-1
Description     : Tool and library for running OCI-based containers in pods
Architecture    : x86_64
URL             : https://github.com/containers/podman
Licenses        : Apache
Groups          : None
Provides        : None
Depends On      : cni-plugins  conmon  containers-common  crun  fuse-overlayfs  iptables  libdevmapper.so=1.02-64  libgpgme.so=11-64  libseccomp.so=2-64  slirp4netns
Optional Deps   : apparmor: for AppArmor support
                  btrfs-progs: support btrfs backend devices [installed]
                  catatonit: --init flag support
                  podman-docker: for Docker-compatible CLI
Required By     : cockpit-podman  podman-compose
Optional For    : None
Conflicts With  : None
Replaces        : None
Installed Size  : 72.79 MiB
Packager        : David Runge <dvzrv@archlinux.org>
Build Date      : Fri 10 Dec 2021 02:30:40 AM CST
Install Date    : Thu 03 Feb 2022 02:45:48 AM CST
Install Reason  : Explicitly installed
Install Script  : No
Validated By    : Signature

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

@openshift-ci openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label Feb 4, 2022
@ttys3
Copy link
Contributor Author

ttys3 commented Feb 4, 2022

I think I got the reason

// pkg/specgenutil/specgen.go
// func FillOutSpecGen(s *specgen.SpecGenerator, c *entities.ContainerCreateOptions, args []string) error

s.ResourceLimits.Memory, err = getMemoryLimits(s, c)

The logic is: if client only specific memory.Limit limit, it auto set memory.Swap = memory.Limit * 2

/sys/fs/cgroup/machine.slice/libpod-xxxxxxx.scope/container/memory.swap.max will not be max only if memory.Swap is set.

but specgenutil.FillOutSpecGen() is only called in func create(cmd *cobra.Command, args []string) error or func run(cmd *cobra.Command, args []string) error and /containers/create (handler is compat.CreateContainer)

so this logic does not exists for api call via /libpod/containers/create, which handler is libpod.CreateContainer

func getMemoryLimits(s *specgen.SpecGenerator, c *entities.ContainerCreateOptions) (*specs.LinuxMemory, error) {
	var err error
	memory := &specs.LinuxMemory{}
	hasLimits := false
	if m := c.Memory; len(m) > 0 {
		ml, err := units.RAMInBytes(m)
		if err != nil {
			return nil, errors.Wrapf(err, "invalid value for memory")
		}
		if ml > 0 {
			memory.Limit = &ml
			if c.MemorySwap == "" {
				limit := 2 * ml
				memory.Swap = &(limit)
			}
			hasLimits = true
		}
	}
	if m := c.MemoryReservation; len(m) > 0 {
		mr, err := units.RAMInBytes(m)
		if err != nil {
			return nil, errors.Wrapf(err, "invalid value for memory")
		}
		memory.Reservation = &mr
		hasLimits = true
	}
	if m := c.MemorySwap; len(m) > 0 {
		var ms int64
		// only set memory swap if it was set
		// -1 indicates unlimited
		if m != "-1" {
			ms, err = units.RAMInBytes(m)
			memory.Swap = &ms
			if err != nil {
				return nil, errors.Wrapf(err, "invalid value for memory")
			}
			hasLimits = true
		}
	}
	if c.MemorySwappiness >= 0 {
		swappiness := uint64(c.MemorySwappiness)
		memory.Swappiness = &swappiness
		hasLimits = true
	}
	if c.OOMKillDisable {
		memory.DisableOOMKiller = &c.OOMKillDisable
		hasLimits = true
	}
	if !hasLimits {
		return nil, nil
	}
	return memory, nil
}

@ttys3
Copy link
Contributor Author

ttys3 commented Feb 4, 2022

so, currently, the solution for the client api call is, the swap limit must be set:

{
  "name": "eatmem-demo",
  "image": "docker.io/80x86/eatmem",
  "resource_limits": {
    "memory": {
      "limit": 536870912,
      "swap": 536870912
    },
    "cpu": {
      "shares": 200
    }
  }
}

after this, the container killed by OOM killer correctly.

or create the contianer via the compat api: http://d/v1.0.0/containers/create

@ttys3
Copy link
Contributor Author

ttys3 commented Feb 4, 2022

there is also another question, why the inspect get the result is MemorySwap is double of Memory, but the /sys/fs/cgroup/machine.slice/libpod-xxxxxxx.scope/container/memory.swap.max only set to Memory ?

❯ sudo podman inspect oom | rg memory
                "--memory=512M",
            "Memory": 536870912,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 1073741824,
            "MemorySwappiness": -1,

and podman will set both memory.swap.max and memory.max to the same value.

sudo podman run --cgroup-manager=systemd --memory="512M" --name oom --rm -ti docker.io/80x86/eatmem will killed by OOM

sudo podman run --cgroup-manager=systemd --memory="514M" --name oom --rm -ti docker.io/80x86/eatmem will not.
(so it is 514M memory + 514M swap, which > 1024M)

update:

I got the related code, but do not know why crun do in this way (swap -= memory->limit):

crun has this:

  swap = memory->swap;
  if (cgroup2 && memory->swap != -1)
    {
      if (! memory->limit_present)
        return crun_make_error (err, 0, "cannot set swap limit without the memory limit");
      if (memory->swap < memory->limit)
        return crun_make_error (err, 0, "cannot set memory+swap limit less than the memory limit");

      swap -= memory->limit;
    }

oh no, https://github.com/opencontainers/runtime-spec/blob/8958f93039ab90be53d803cd7e231a775f644451/specs-go/config.go#L304

I got the finall anwser:

Swap: Total memory limit (memory + swap)

this is really bad name, MemorySwap is better.

the inspect result uses MemorySwap, seems try to fix the bad name -_-

@ttys3 ttys3 changed the title podman container run via api will not killed by OOM killer podman container created via libpod RESTful api will not killed by OOM killer Feb 4, 2022
@ttys3 ttys3 changed the title podman container created via libpod RESTful api will not killed by OOM killer podman container created via libpod RESTful api must set swap explicitly, otherwise it will not killed by OOM killer Feb 4, 2022
@mheon mheon added the 4.0 label Feb 7, 2022
@github-actions
Copy link

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Copy link
Member

rhatdan commented Mar 10, 2022

@mheon @ttys3 I can not tell if this issue is solved or there are subissues?

@ttys3
Copy link
Contributor Author

ttys3 commented Mar 10, 2022

@rhatdan it is just a little API compability problem. I think it would be better if the behavior of the various APIs (the cli, the comat api, the libpod api) were unified.

The detail is as I described in the issue "Description" section above

@rhatdan
Copy link
Member

rhatdan commented Mar 10, 2022

Interested in opening a PR to unify them?

@github-actions
Copy link

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Copy link
Member

rhatdan commented Apr 11, 2022

@cdoern PTAL

cdoern pushed a commit to cdoern/podman that referenced this issue Apr 12, 2022
in specgen, CLI path uses the given memory limit to define the swap value (if not already specified)
add a route to this piece of code from within the api handlers

resolves containers#13145

Signed-off-by: cdoern <cbdoer23@g.holycross.edu>
cdoern pushed a commit to cdoern/podman that referenced this issue Apr 12, 2022
in specgen, CLI path uses the given memory limit to define the swap value (if not already specified)
add a route to this piece of code from within the api handlers

resolves containers#13145

Signed-off-by: cdoern <cbdoer23@g.holycross.edu>
cdoern pushed a commit to cdoern/podman that referenced this issue Apr 14, 2022
in specgen, CLI path uses the given memory limit to define the swap value (if not already specified)
add a route to this piece of code from within the api handlers

resolves containers#13145

Signed-off-by: cdoern <cbdoer23@g.holycross.edu>
Signed-off-by: cdoern <cdoern@redhat.com>
cdoern pushed a commit to cdoern/podman that referenced this issue Apr 14, 2022
in specgen, CLI path uses the given memory limit to define the swap value (if not already specified)
add a route to this piece of code from within the api handlers

resolves containers#13145

Signed-off-by: cdoern <cbdoer23@g.holycross.edu>
Signed-off-by: cdoern <cdoern@redhat.com>
Signed-off-by: cdoern <cbdoer23@g.holycross.edu>
cdoern pushed a commit to cdoern/podman that referenced this issue Apr 18, 2022
in specgen, CLI path uses the given memory limit to define the swap value (if not already specified)
add a route to this piece of code from within the api handlers

resolves containers#13145

Signed-off-by: cdoern <cbdoer23@g.holycross.edu>
cdoern pushed a commit to cdoern/podman that referenced this issue Apr 18, 2022
in specgen, CLI path uses the given memory limit to define the swap value (if not already specified)
add a route to this piece of code from within the api handlers

resolves containers#13145

Signed-off-by: cdoern <cdoern@redhat.com>
gbraad pushed a commit to gbraad-redhat/podman that referenced this issue Jul 13, 2022
in specgen, CLI path uses the given memory limit to define the swap value (if not already specified)
add a route to this piece of code from within the api handlers

resolves containers#13145

Signed-off-by: cdoern <cdoern@redhat.com>
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 20, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants