
Increased latency when using bridge network with v5 #22786

Closed
gaeljw opened this issue May 23, 2024 · 5 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. network Networking related issue or feature pasta pasta(1) bugs or features


@gaeljw

gaeljw commented May 23, 2024

Issue Description

I noticed extra latency after upgrading podman from 4.9.4 to 5.0.2.

I have a situation where a cURL request to a server inside my network usually takes a few milliseconds, but with podman 5.0.2 it now takes a few seconds.

This happens only when the container is attached to a bridge network.

For the record, this happens in a CI context where the CI runs jobs in containers, and I don't necessarily have full control over the containers/networks being created.

Steps to reproduce the issue

1/ Having a podman network:

(in my case this network is created by the CI; I have no control over it)

$ podman network inspect podman
[
     {
          "name": "podman",
          "id": "2f259bab93aaaaa2542ba43ef33eb990d0999ee1b9924b557b7be53c0b7a1bb9",
          "driver": "bridge",
          "network_interface": "podman0",
          "created": "2024-05-23T06:48:03.773290961Z",
          "subnets": [
               {
                    "subnet": "10.88.0.0/16",
                    "gateway": "10.88.0.1"
               }
          ],
          "ipv6_enabled": false,
          "internal": false,
          "dns_enabled": false,
          "ipam_options": {
               "driver": "host-local"
          }
     }
]

2/ Run a container without linking it to the network and run a cURL:

  • The image is likely irrelevant but I'm using the one with which we found the issue.
  • The server is likely irrelevant as well, I was able to reproduce using https://www.google.com or an internal server
podman run --rm -it renovate/renovate:37.279.0 bash
# Then in the container shell
time curl -v 'https://some-server/whatever'

Output:

real	0m0.032s
user	0m0.016s
sys	0m0.012s

3/ Run the same, but with the container attached to the network:

podman run --rm -it --network podman renovate/renovate:37.279.0 bash
# Then in the container shell
time curl -v 'https://some-server/whatever'

Output:

real	0m2.056s
user	0m0.015s
sys	0m0.017s

💥 Notice how the time is now 2 seconds instead of a few milliseconds.

Comparison with podman v4.9.4:

With container NOT attached to the network:

real	0m0.045s
user	0m0.021s
sys	0m0.013s

With container attached to the network:

real	0m0.044s
user	0m0.018s
sys	0m0.019s

That is, similar response times whether or not the container is attached to the podman network.
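The extra time can be localized with curl's timing variables. This is a diagnostic sketch using the placeholder URL from the steps above: if `time_namelookup` accounts for nearly all of the ~2 seconds, the delay is in DNS resolution rather than in the TCP/TLS connection itself.

```shell
# Break the request into phases inside the affected container.
# A namelookup of ~2s with a small remainder points at DNS.
curl -o /dev/null -s \
  -w 'namelookup: %{time_namelookup}s\nconnect:    %{time_connect}s\ntotal:      %{time_total}s\n' \
  'https://some-server/whatever'
```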

Describe the results you received

See above

Describe the results you expected

See above

podman info output

host:
  arch: amd64
  buildahVersion: 1.35.3
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.10-2.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: d807bb8c1de3dc05fb66c77d2979a7f6903804bf'
  cpuUtilization:
    idlePercent: 97.97
    systemPercent: 0.59
    userPercent: 1.44
  cpus: 8
  databaseBackend: boltdb
  distribution:
    distribution: centos
    version: "9"
  eventLogger: file
  freeLocks: 1746
  hostname: ci-gitlab-runner.mycompany.net
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.14.0-446.el9.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 1952759808
  memTotal: 16500396032
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.9.0-1.el9.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.9.0
    package: netavark-1.10.3-1.el9.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.10.3
  ociRuntime:
    name: crun
    package: crun-1.14.4-1.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.14.4
      commit: a220ca661ce078f2c37b38c92e66cf66c012d9c1
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20231204.gb86afe3-1.el9.x86_64
    version: |
      pasta 0^20231204.gb86afe3-1.el9.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.3-1.el9.x86_64
    version: |-
      slirp4netns version 1.2.3
      commit: c22fde291bb35b354e6ca44d13be181c76a0a432
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 6424236032
  swapTotal: 6442446848
  uptime: 71h 34m 2.00s (Approximately 2.96 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - artifactory.mycompany.net
store:
  configFile: /opt/gitlab-runner/.config/containers/storage.conf
  containerStore:
    number: 4
    paused: 0
    running: 0
    stopped: 4
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /opt/gitlab-runner/.local/share/containers/storage
  graphRootAllocated: 85805408256
  graphRootUsed: 42503954432
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /opt/containers-tmp
  imageStore:
    number: 8
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /opt/gitlab-runner/.local/share/containers/storage/volumes
version:
  APIVersion: 5.0.2
  Built: 1715074917
  BuiltTime: Tue May  7 09:41:57 2024
  GitCommit: ""
  GoVersion: go1.22.2 (Red Hat 1.22.2-1.el9)
  Os: linux
  OsArch: linux/amd64
  Version: 5.0.2

Podman in a container

No

Privileged Or Rootless

None

Upstream Latest Release

No

Additional environment details

VM with CentOS Stream release 9

Additional information

No response

@gaeljw gaeljw added the kind/bug Categorizes issue or PR as related to a bug. label May 23, 2024
@Luap99 Luap99 added the network Networking related issue or feature label May 23, 2024
@Luap99
Member

Luap99 commented May 23, 2024

I assume you run rootless. I would guess the delay is the DNS resolution? Can you check by connecting directly to an IP?

If it is DNS, please check /etc/resolv.conf on your host and in the container and provide the output.

My best guess is that the issue is with pasta:

pasta 0^20231204.gb86afe3-1.el9.x86_64

This version is old and there have been several DNS fixes in the meantime, so I suggest you try this with a newer pasta (passt) version installed, e.g. from https://copr.fedorainfracloud.org/coprs/sbrivio/passt/
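The "connect directly to an IP" check can be sketched like this; the address 10.0.0.10 is a placeholder (the server's real IP is not given in this thread), and `--resolve` pins the hostname so curl skips DNS entirely:

```shell
# Normal request: includes a DNS lookup through the container's resolver.
time curl -so /dev/null 'https://some-server/whatever'

# Same request with the name pre-resolved, bypassing DNS.
# If only the first command is slow, the latency lives in name resolution.
time curl -so /dev/null --resolve 'some-server:443:10.0.0.10' 'https://some-server/whatever'
```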

@gaeljw
Author

gaeljw commented May 23, 2024

(Warning: you may assume I know pretty much nothing in network "stuff" 😅 )

I would guess the delay is the DNS resolution? Can you check by connecting directly to an IP?

Right! Using an IP, there's no delay.

If it is DNS, please check /etc/resolv.conf on your host and in the container and provide the output.

In the container:

search subdomain.mycompany.net
nameserver 10.76.96.26
nameserver 10.76.128.15
options timeout:2

On the host:

domain subdomain.mycompany.net
search subdomain.mycompany.net
nameserver 10.76.96.26
nameserver 10.76.128.15
options timeout:2

(Obviously, the IPs of the nameservers are internal to our network.)

My best guess is the issue is with pasta

Currently, we have passt-0^20231204.gb86afe3-1.el9.x86_64 installed. We do not have pasta.

Using passt-0^20240510.g7288448-1.el9.x86_64 seems to resolve the issue. It's not available in the CentOS repos yet, but I guess it's a matter of time.

Thanks! 👏

One thing that's still unclear to me, though, is why the previous version of podman (v4.9.4) works fine with passt-0^20231204.gb86afe3-1.el9.x86_64. Is there some kind of "compatibility matrix" between podman and pasta? (Again, assume I don't know much about this stuff.)

Out of curiosity, do you have references to the pasta issues you're referring to?

@Luap99 Luap99 added the pasta pasta(1) bugs or features label May 23, 2024
@Luap99
Member

Luap99 commented May 23, 2024

One thing that's still unclear to me, though, is why the previous version of podman (v4.9.4) works fine with passt-0^20231204.gb86afe3-1.el9.x86_64. Is there some kind of "compatibility matrix" between podman and pasta? (Again, assume I don't know much about this stuff.)

Because podman 5 switched the default rootless network backend from slirp4netns to pasta: https://blog.podman.io/2024/03/podman-5-0-breaking-changes-in-detail/

Out of curiosity, do you have references to the pasta issues you're referring to?

No, there are so many different ones that I honestly cannot tell without searching for a long time.
You can search in https://passt.top/passt/bugs, and here in the issues with the pasta label:
https://github.com/containers/podman/issues?q=is%3Aissue+label%3Apasta
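For anyone hitting this before an updated passt package lands, one possible workaround (a sketch, assuming slirp4netns is still installed, as the `podman info` output above shows) is to select the pre-5.0 backend explicitly for a container:

```shell
# Force the old rootless backend for a single run; this sidesteps
# pasta and its DNS handling entirely.
podman run --rm -it --network slirp4netns renovate/renovate:37.279.0 bash
```

Rootless users can also change the default backend via the `default_rootless_network_cmd` option in the `[network]` section of containers.conf, if changing it per-container is impractical in CI.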

@Luap99 Luap99 closed this as not planned May 23, 2024
@gaeljw
Author

gaeljw commented May 23, 2024

Great, thanks a lot @Luap99 for the explanations and solution!

@sbrivio-rh
Collaborator

Using passt-0^20240510.g7288448-1.el9.x86_64 seems to resolve the issue. It's not available in the CentOS repos yet, but I guess it's a matter of time.

Right, yes, I just happened to rebase the CentOS Stream 9 package from Fedora yesterday:
https://gitlab.com/redhat/centos-stream/rpms/passt/-/commit/a6dcc340e56451a98089b23eddc6efb32ae236a6

but it might take a bit before the mirrors pick up the new build.

Out of curiosity, do you have references to the pasta issues you're referring to?

It might be this one, by the way: https://passt.top/passt/commit/?id=d989eae308c2ea2032fc91cc04fb02dffe4a4b63
