Podman socket performance issues #14941

Closed
jdoss opened this issue Jul 14, 2022 · 10 comments
Labels: kind/performance, locked - please file new issue/PR, stale-issue

Comments

jdoss (Contributor) commented Jul 14, 2022

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

This is kind of a cross-post issue to see if there is anything that can be done to improve the performance of the Podman socket under high concurrency.

I opened hashicorp/nomad-driver-podman#175 on the Nomad Podman driver project to see if we can track down why Podman on my Nomad client nodes becomes overwhelmed and unresponsive under high concurrency. This seems to be a common issue for other users of the Nomad Podman driver.

Is there anything that can be done to help improve the performance of the Podman socket? Are there any tips from the Podman team on how to better debug this issue to get more information?
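For extra detail, one option (a sketch, assuming the systemd-managed podman.socket/podman.service units behind /run/podman/podman.sock on Fedora CoreOS) is to follow the API service journal while the load is applied, or to stop the socket-activated units and run the service in the foreground at debug log level:

# Follow the API service journal while containers are being launched
journalctl -fu podman.service

# Or run the service in the foreground at debug level so every request is logged
systemctl stop podman.service podman.socket
podman --log-level=debug system service --time=0 unix:///run/podman/podman.sock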

Steps to reproduce the issue:

  1. Launch hundreds of containers per client node with Nomad

  2. Watch the Podman socket become unavailable and my Nomad job allocations start failing

Additional information you deem important (e.g. issue happens only occasionally):

Podman is being run as root on these client nodes on Fedora CoreOS 36.20220618.3.1 on Google Compute VMs.

Output of podman version:

# podman version
Client:       Podman Engine
Version:      4.1.0
API Version:  4.1.0
Go Version:   go1.18.2
Built:        Mon May 30 16:03:28 2022
OS/Arch:      linux/amd64

Output of podman info --debug:

# podman info --debug
host:
  arch: amd64
  buildahVersion: 1.26.1
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.0-2.fc36.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.0, commit: '
  cpuUtilization:
    idlePercent: 99.77
    systemPercent: 0.1
    userPercent: 0.13
  cpus: 4
  distribution:
    distribution: fedora
    variant: coreos
    version: "36"
  eventLogger: journald
  hostname: nomad-ephemeral-production-0.internal.step.plumbing
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.18.5-200.fc36.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 32708116480
  memTotal: 33506603008
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun-1.4.5-1.fc36.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.4.5
      commit: c381048530aa750495cf502ddb7181f2ded5b400
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-0.2.beta.0.fc36.x86_64
    version: |-
      slirp4netns version 1.2.0-beta.0
      commit: 477db14a24ff1a3de3a705e51ca2c4c1fe3dda64
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 4294963200
  swapTotal: 4294963200
  uptime: 53m 28.18s
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /usr/share/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 106825756672
  graphRootUsed: 4567363584
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 0
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.1.0
  Built: 1653926608
  BuiltTime: Mon May 30 16:03:28 2022
  GitCommit: ""
  GoVersion: go1.18.2
  Os: linux
  OsArch: linux/amd64
  Version: 4.1.0

Package info (e.g. output of rpm -q podman or apt list podman):

# rpm -q podman
podman-4.1.0-8.fc36.x86_64
openshift-ci bot added the kind/bug label on Jul 14, 2022
baude (Member) commented Jul 14, 2022

Have you tried removing the Nomad client and flooding the socket without it?

baude (Member) commented Jul 14, 2022

@jwhonce any thoughts?

baude added the kind/performance label and removed the kind/bug label on Jul 14, 2022
jdoss (Contributor, Author) commented Jul 14, 2022

Hey @baude! Thanks for taking a look.

Have you tried removing the Nomad client and flooding the socket without it?

No, I haven't tried that. I am trying to think about how I would go about getting the same conditions without the Nomad client running. The driver uses the socket to stream logs for each container, so I think there are a lot of things going on that build up to the socket getting overloaded.
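One rough way to approximate it without Nomad (just a sketch, assuming root, the default /run/podman/podman.sock socket, a pre-pulled alpine image, and made-up container names) would be to drive the libpod REST API directly with parallel create/start/log-follow requests:

# Pre-pull the image so the create calls don't also trigger pulls
podman pull docker.io/library/alpine:latest

SOCK=/run/podman/podman.sock
for i in $(seq 1 200); do
  (
    # Create and start a container through the REST API
    curl -s --unix-socket "$SOCK" -X POST \
      -H "Content-Type: application/json" \
      -d "{\"image\":\"docker.io/library/alpine:latest\",\"name\":\"stress-$i\",\"command\":[\"sleep\",\"300\"]}" \
      http://d/v4.1.0/libpod/containers/create
    curl -s --unix-socket "$SOCK" -X POST \
      "http://d/v4.1.0/libpod/containers/stress-$i/start"
    # Hold a streaming logs connection open, like the driver does per container
    curl -s --unix-socket "$SOCK" \
      "http://d/v4.1.0/libpod/containers/stress-$i/logs?follow=true&stdout=true&stderr=true" >/dev/null
  ) &
done
wait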

baude (Member) commented Jul 14, 2022

Is it possible to exactly reproduce what you are doing? Otherwise, this is a lot to ask.

Luap99 (Member) commented Jul 14, 2022

If it's just the log endpoint, it is tracked here: #14879

jdoss (Contributor, Author) commented Jul 14, 2022

Is it possible to exactly reproduce what you are doing? Otherwise, this is a lot to ask.

@baude Not without launching your own Nomad cluster and loading each client node with 200+ containers. I understand it's a lot to ask, and I am willing to do whatever I can on my end to provide more information.

If it's just the log endpoint, it is tracked here: #14879

@Luap99 Yeah, the driver does use the log endpoint. Here is where I believe it is doing that:

https://github.com/hashicorp/nomad-driver-podman/blob/main/api/container_logs.go#L16
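For reference, each of those streams amounts to roughly one follow-mode request against the logs endpoint, held open for the life of the container (a sketch against the libpod REST API; "mytask" is a hypothetical container name):

curl --unix-socket /run/podman/podman.sock \
  "http://d/v4.1.0/libpod/containers/mytask/logs?follow=true&stdout=true&stderr=true"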

jdoss (Contributor, Author) commented Jul 14, 2022

It looks like I can disable log collection in the Nomad Podman driver.

plugin "nomad-driver-podman" {
          config {
            socket_path = "unix://var/run/podman/podman.sock"
            disable_log_collection = false
            volumes {
              enabled      = true
              selinuxlabel = "z"
            }
          }
        }

I am going to test that out on my client nodes and see if I have better performance when deploying a lot of containers at once.
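A simple way to check whether that change actually takes pressure off the socket (a sketch, assuming iproute2's ss is available on the node) is to compare the number of open connections to podman.sock before and after:

# Each followed log stream holds one connection to the socket open,
# so this count should drop sharply with log collection disabled
ss -xp | grep -c podman.sock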

github-actions (bot) commented

A friendly reminder that this issue had no activity for 30 days.

rhatdan (Member) commented Aug 15, 2022

Since we have heard nothing back in a month, I am guessing that the issue is resolved. Reopen if I am mistaken.

rhatdan closed this as completed Aug 15, 2022
jdoss (Contributor, Author) commented Aug 15, 2022

I am still seeing issues but I haven't been able to dig into it more. I will respond back once I have more info.

github-actions bot added the locked - please file new issue/PR label on Sep 19, 2023
github-actions bot locked as resolved and limited conversation to collaborators on Sep 19, 2023