podman fails setting forward rules #5335

Closed
ikke-t opened this issue Feb 26, 2020 · 17 comments
Labels
kind/bug, locked - please file new issue/PR, stale-issue

Comments

@ikke-t

ikke-t commented Feb 26, 2020

/kind bug
Description

I lost all network connectivity from a pod after switching to firewalld on RHEL 8.1.

Steps to reproduce the issue:

  1. install podman on rhel 8.1
  2. install firewalld
  3. remove firewall section from /etc/cni/net.d/87-podman-bridge.conflist
  4. create pod as root with any image that has curl
  5. curl any web address from within the pod

Describe the results you received:

The pod cannot fetch anything from the internet. Not even DNS works. tcpdump shows that outgoing packets are not rewritten to carry the host IP as their source address: the host sends traffic to the internet using the internal 10.88/16 source address.

The non-root pods do work. They also attach to the bridge differently than the root pods.

Describe the results you expected:

get the internet page

Additional information you deem important (e.g. issue happens only occasionally):

It works if iptables is in use. If changed to firewalld, the CNI-FORWARD rules don't get created. This happens every time on RHEL 8.1; it never works on RHEL.

While you debug this, monitor the firewall rules for CNI-FORWARD:

iptables-save
nft list ruleset
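
To make the monitoring step above easier to repeat, here is a minimal Python sketch that scans saved `iptables-save` text for CNI chain declarations. The function name `find_cni_chains` and the embedded sample are illustrative assumptions, not part of podman or the CNI plugins; the sample lines mirror the chain declarations from the filter table dump posted further down in this issue.

```python
import re

def find_cni_chains(ruleset_text):
    """Return the names of chains starting with 'CNI-' declared in iptables-save output."""
    chains = []
    for line in ruleset_text.splitlines():
        # Chain declarations look like ':CNI-FORWARD - [0:0]'
        match = re.match(r"^:(CNI-\S+)\s", line)
        if match:
            chains.append(match.group(1))
    return chains

# Hypothetical sample, shaped like the filter table dump in this issue
sample = """*filter
:INPUT ACCEPT [3310:302097]
:FORWARD ACCEPT [0:0]
:CNI-FORWARD - [0:0]
:CNI-ADMIN - [0:0]
COMMIT
"""

chains = find_cni_chains(sample)
print("CNI chains found:", chains)
print("CNI-FORWARD present:", "CNI-FORWARD" in chains)
```

On a real host, the text would come from running `iptables-save` as root and passing its output to the function. With the firewalld backend the rules may only be visible through `nft list ruleset`, which this sketch does not parse.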

@mccv1r0 has already put some effort into debugging this; please ask him for more details.

Output of podman version:

Version:            1.6.4
RemoteAPI Version:  1
Go Version:         go1.12.12
OS/Arch:            linux/amd64

Output of podman info --debug:

debug:                                                                                                                            
  compiler: gc
  git commit: ""
  go version: go1.12.12
  podman version: 1.6.4
host:
  BuildahVersion: 1.12.0-dev
  CgroupVersion: v1
  Conmon:
    package: conmon-2.0.6-1.module+el8.1.1+5259+bcdd613a.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.6, commit: 6ffbb2ec70dbe5ba56e4bfde946fb04f19dd8bbf'
  Distribution:
    distribution: '"rhel"'
    version: "8.1"
  MemFree: 3133972480
  MemTotal: 3945963520
  OCIRuntime:
    name: runc
    package: runc-1.0.0-64.rc9.module+el8.1.1+5259+bcdd613a.x86_64
    path: /usr/bin/runc
    version: 'runc version spec: 1.0.1-dev'
  SwapFree: 1073737728
  SwapTotal: 1073737728
  arch: amd64
  cpus: 4
  eventlogger: journald
  hostname: apu.konttikoulu.fi
  kernel: 4.18.0-147.5.1.el8_1.x86_64
  os: linux
  rootless: false
  uptime: 1h 2m 15.31s (Approximately 0.04 days)
registries:
  blocked: null
  insecure: null
  search:
  - registry.access.redhat.com
  - registry.fedoraproject.org
  - registry.centos.org
  - docker.io
store:
  ConfigFile: /etc/containers/storage.conf
  ContainerStore:
    number: 1
  GraphDriverName: overlay
  GraphOptions: {}
  GraphRoot: /var/lib/containers/storage
  GraphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 9
  RunRoot: /var/run/containers/storage
  VolumePath: /var/lib/containers/storage/volumes

Package info (e.g. output of rpm -q podman or apt list podman):

podman-1.6.4-2.module+el8.1.1+5363+bf8ff1af.x86_64

Additional environment details (AWS, VirtualBox, physical, etc.):
physical hw

@openshift-ci-robot added the kind/bug label Feb 26, 2020
@mccv1r0
Collaborator

mccv1r0 commented Feb 26, 2020

The same setup works using fedora 30

@ikke-t
Author

ikke-t commented Feb 26, 2020

The box was installed with this kickstart: https://github.com/RedHatNordicsSA/iot-hack/blob/master/rhel-device/ks.cfg

and further configured by this playbook:

https://github.com/RedHatNordicsSA/iot-hack/blob/master/setup-rhel-host.yml

There was not much hand tweaking beyond removing the firewall part from the podman network config.

@ikke-t
Author

ikke-t commented Feb 26, 2020

22:00:16 | <@baude> | ikkeT, on your install grab https://github.com/containernetworking/plugins/releases/download/v0.8.4/cni-plugins-linux-amd64-v0.8.4.tgz
22:00:29 | <@baude> | untar is
22:00:33 | <@baude> | it
22:01:08 | <@baude> | you could backup /usr/libexec/cni/ but then untar it in there <---
22:01:26 | <@baude> | remove the firewall section in your cni config
22:01:37 | <@baude> | reboot? or do whatever to make sure your stack is clean
22:01:40 | <@baude> | then run container
22:01:56 | <ikkeT> | ok

22:14:55 | <ikkeT> | @baude, no luck. I moved away the cni dir, recreated it's contents from tar, selinux relabeled it, made sure there are no pods left, and no firewall section in net config, rebooted, created new pod, and no forward rules appeared, and network didn't work from pod
22:15:03 | <ikkeT> | did i miss something?

@ikke-t
Author

ikke-t commented Feb 26, 2020

Important note I forgot to write here! So this happens only with root containers. The non-root containers do work. So it's related to the bridge.

@ikke-t
Author

ikke-t commented Feb 27, 2020

I got some extra info out of this by accident. I had some containers running that were set up with the firewalld firewall setting. I changed the network config in /etc/cni back to the iptables firewall backend, but forgot to remove the containers first. I cleaned them away (podman rm -f) only after changing the podman config back to iptables, and it spat out an error listing all the missing rules that it would have created with iptables :) So here is what the firewalld setup misses; hopefully it helps to find where the rule creation is missing:

ERRO[0000] Error deleting network: running [/sbin/iptables -t nat -D POSTROUTING -s 10.88.0.145 -j CNI-6e99e89ea38dea2f8ce2dada -m comment --comment name: "podman" id: "ae900e94aea294f14bcf113ca71f05be682941e9dcfbb5e276a23eec4a58fcd4" --wait]: exit status 2: iptables v1.8.2 (nf_tables): Chain 'CNI-6e99e89ea38dea2f8ce2dada' does not exist
Try `iptables -h' or 'iptables --help' for more information. 
ERRO[0000] Error while removing pod from CNI network "podman": running [/sbin/iptables -t nat -D POSTROUTING -s 10.88.0.145 -j CNI-6e99e89ea38dea2f8ce2dada -m comment --comment name: "podman" id: "ae900e94aea294f14bcf113ca71f05be682941e9dcfbb5e276a23eec4a58fcd4" --wait]: exit status 2: iptables v1.8.2 (nf_tables): Chain 'CNI-6e99e89ea38dea2f8ce2dada' does not exist
Try `iptables -h' or 'iptables --help' for more information. 
ERRO[0000] unable to cleanup network for container ae900e94aea294f14bcf113ca71f05be682941e9dcfbb5e276a23eec4a58fcd4: "error tearing down CNI namespace configuration for container ae900e94aea294f14bcf113ca71f05be682941e9dcfbb5e276a23eec4a58fcd4: running [/sbin/iptables -t nat -D POSTROUTING -s 10.88.0.145 -j CNI-6e99e89ea38dea2f8ce2dada -m comment --comment name: \"podman\" id: \"ae900e94aea294f14bcf113ca71f05be682941e9dcfbb5e276a23eec4a58fcd4\" --wait]: exit status 2: iptables v1.8.2 (nf_tables): Chain 'CNI-6e99e89ea38dea2f8ce2dada' does not exist\nTry `iptables -h' or 'iptables --help' for more information.\n" 
ae900e94aea294f14bcf113ca71f05be682941e9dcfbb5e276a23eec4a58fcd4

@ikke-t
Author

ikke-t commented Feb 27, 2020

It seems ping works, so it's TCP traffic that bypasses the NAT mangling.

@ikke-t
Author

ikke-t commented Feb 27, 2020

It's the same with iptables, by the way; I just didn't notice earlier.

In more detail: non-TCP traffic seems to work. Here's the tcpdump -i any output for a ping:

09:00:44.586367 IP 10.88.0.150 > 192.168.1.1: ICMP echo request, id 11, seq 1, length 64                                               
09:00:44.586506 IP 10.88.0.150 > 192.168.1.1: ICMP echo request, id 11, seq 1, length 64                                               
09:00:44.586635 IP 192.168.1.5 > 192.168.1.1: ICMP echo request, id 11, seq 1, length 64                                               
09:00:44.586970 IP 192.168.1.1 > 192.168.1.5: ICMP echo reply, id 11, seq 1, length 64                                                 
09:00:44.587048 IP 192.168.1.1 > 10.88.0.150: ICMP echo reply, id 11, seq 1, length 64                                                 
09:00:44.587062 IP 192.168.1.1 > 10.88.0.150: ICMP echo reply, id 11, seq 1, length 64                                                 

and here's the same with TCP curl 192.168.1.1:

09:03:33.177370 IP 10.88.0.150.47302 > 192.168.1.1.80: Flags [S], seq 1991485373, win 29200, options [mss 1460,sackOK,TS val 1856535650 ecr 0,nop,wscale 7], length 0
09:03:33.177503 IP 10.88.0.150.47302 > 192.168.1.1.80: Flags [S], seq 1991485373, win 29200, options [mss 1460,sackOK,TS val 1856535650 ecr 0,nop,wscale 7], length 0
09:03:33.177619 IP 10.88.0.1 > 10.88.0.150: ICMP host 192.168.1.1 unreachable - admin prohibited filter, length 68
09:03:33.177631 IP 10.88.0.1 > 10.88.0.150: ICMP host 192.168.1.1 unreachable - admin prohibited filter, length 68
09:03:34.209718 IP 10.88.0.150.47302 > 192.168.1.1.80: Flags [S], seq 1991485373, win 29200, options [mss 1460,sackOK,TS val 1856536682 ecr 0,nop,wscale 7], length 0
09:03:34.209848 IP 10.88.0.150.47302 > 192.168.1.1.80: Flags [S], seq 1991485373, win 29200, options [mss 1460,sackOK,TS val 1856536682 ecr 0,nop,wscale 7], length 0
09:03:34.209983 IP 10.88.0.1 > 10.88.0.150: ICMP host 192.168.1.1 unreachable - admin prohibited filter, length 68
09:03:34.209995 IP 10.88.0.1 > 10.88.0.150: ICMP host 192.168.1.1 unreachable - admin prohibited filter, length 68

The NAT doesn't happen on TCP.
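
The difference between the two captures can be checked mechanically. The following is a minimal Python sketch, under the assumption that a successfully masqueraded flow shows at least one packet towards the target whose source lies outside the pod subnet 10.88.0.0/16 (as the third line of the ping capture does). The function name `saw_translated_packet` and the regular expression are illustrative, and the sample lines are shortened copies of the captures above.

```python
import ipaddress
import re

POD_SUBNET = ipaddress.ip_network("10.88.0.0/16")

# Matches 'IP <src>[.port] > <dst>[.port]:' as printed by tcpdump for IPv4
FLOW_RE = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)(?:\.\d+)? > (\d+\.\d+\.\d+\.\d+)(?:\.\d+)?:"
)

def saw_translated_packet(lines, dst):
    """Return True if any packet to dst has a source outside the pod subnet."""
    target = ipaddress.ip_address(dst)
    for line in lines:
        match = FLOW_RE.search(line)
        if not match:
            continue
        src_ip = ipaddress.ip_address(match.group(1))
        dst_ip = ipaddress.ip_address(match.group(2))
        if dst_ip == target and src_ip not in POD_SUBNET:
            return True
    return False

ping_capture = [
    "09:00:44.586367 IP 10.88.0.150 > 192.168.1.1: ICMP echo request, id 11, seq 1, length 64",
    "09:00:44.586635 IP 192.168.1.5 > 192.168.1.1: ICMP echo request, id 11, seq 1, length 64",
]
tcp_capture = [
    "09:03:33.177370 IP 10.88.0.150.47302 > 192.168.1.1.80: Flags [S], seq 1991485373, length 0",
    "09:03:33.177619 IP 10.88.0.1 > 10.88.0.150: ICMP host 192.168.1.1 unreachable - admin prohibited filter, length 68",
]

print("ping masqueraded:", saw_translated_packet(ping_capture, "192.168.1.1"))
print("tcp masqueraded:", saw_translated_packet(tcp_capture, "192.168.1.1"))
```

Because the capture was taken with `-i any`, the pre-NAT packet on the bridge shows up as well, so the sketch looks for the presence of a translated packet rather than the absence of untranslated ones.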

@ikke-t changed the title from "podman fails setting forward rules when firewalld used" to "podman fails setting forward rules" Feb 27, 2020
@ikke-t
Author

ikke-t commented Feb 27, 2020

Now trying it out with the iptables firewall setting, the rules look normal. However, TCP traffic bypasses the masquerade rule:

# Generated by xtables-save v1.8.2 on Thu Feb 27 09:07:18 2020
*filter
:INPUT ACCEPT [3310:302097]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [4764:1322591]
:CNI-FORWARD - [0:0]
:CNI-ADMIN - [0:0]
-A FORWARD -m comment --comment "CNI firewall plugin rules" -j CNI-FORWARD
-A CNI-FORWARD -m comment --comment "CNI firewall plugin rules" -j CNI-ADMIN
-A CNI-FORWARD -d 10.88.0.150/32 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A CNI-FORWARD -s 10.88.0.150/32 -j ACCEPT
COMMIT
# Completed on Thu Feb 27 09:07:18 2020
# Generated by xtables-save v1.8.2 on Thu Feb 27 09:07:18 2020
*security
:INPUT ACCEPT [3139:239867]
:FORWARD ACCEPT [40:3360]
:OUTPUT ACCEPT [4764:1322591]
COMMIT
# Completed on Thu Feb 27 09:07:18 2020
# Generated by xtables-save v1.8.2 on Thu Feb 27 09:07:18 2020
*raw
:PREROUTING ACCEPT [3447:319824]
:OUTPUT ACCEPT [4764:1322591]
COMMIT
# Completed on Thu Feb 27 09:07:18 2020
# Generated by xtables-save v1.8.2 on Thu Feb 27 09:07:18 2020
*mangle
:PREROUTING ACCEPT [3447:319824]
:INPUT ACCEPT [3310:302097]
:FORWARD ACCEPT [114:7398]
:OUTPUT ACCEPT [4764:1322591]
:POSTROUTING ACCEPT [4819:1326686]
COMMIT
# Completed on Thu Feb 27 09:07:18 2020
# Generated by xtables-save v1.8.2 on Thu Feb 27 09:07:18 2020
*nat
:PREROUTING ACCEPT [211:20496]
:INPUT ACCEPT [44:2896]
:POSTROUTING ACCEPT [107:7726]
:OUTPUT ACCEPT [107:7726]
:CNI-bcea01873550e10fff574d4b - [0:0]
-A POSTROUTING -s 10.88.0.150/32 -m comment --comment "name: \"podman\" id: \"3798cb46ebde04ecb5d2ed4b4d87e7c6908cb47344bb80569df7a5cf2d6c3101\"" -j CNI-bcea01873550e10fff574d4b
-A CNI-bcea01873550e10fff574d4b -d 10.88.0.0/16 -m comment --comment "name: \"podman\" id: \"3798cb46ebde04ecb5d2ed4b4d87e7c6908cb47344bb80569df7a5cf2d6c3101\"" -j ACCEPT
-A CNI-bcea01873550e10fff574d4b ! -d 224.0.0.0/4 -m comment --comment "name: \"podman\" id: \"3798cb46ebde04ecb5d2ed4b4d87e7c6908cb47344bb80569df7a5cf2d6c3101\"" -j MASQUERADE
COMMIT
# Completed on Thu Feb 27 09:07:18 2020
# Table `firewalld' is incompatible, use 'nft' tool.

I believe the NAT should happen on the very last rule, the MASQUERADE rule in the CNI-bcea01873550e10fff574d4b chain.
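
To illustrate why the last nat rule is the one expected to fire, here is a small Python sketch that walks the two rules of the CNI-bcea01873550e10fff574d4b chain from the dump above for a given destination. It is a simplified model that only evaluates the destination matches; the names `CHAIN` and `first_match` are made up for this example, and it says nothing about whether the packet ever reaches the nat POSTROUTING chain on the affected host.

```python
import ipaddress

# (negated, destination network, target), in the order of the nat chain dump:
#   -d 10.88.0.0/16 -j ACCEPT
#   ! -d 224.0.0.0/4 -j MASQUERADE
CHAIN = [
    (False, ipaddress.ip_network("10.88.0.0/16"), "ACCEPT"),
    (True, ipaddress.ip_network("224.0.0.0/4"), "MASQUERADE"),
]

def first_match(dst):
    """Return the target of the first rule whose destination match applies, or None."""
    addr = ipaddress.ip_address(dst)
    for negated, network, target in CHAIN:
        in_network = addr in network
        # A plain rule matches when dst is inside the network,
        # a negated ('!') rule matches when it is outside.
        if in_network != negated:
            return target
    return None

for dst in ("192.168.1.1", "10.88.0.7", "224.0.0.251"):
    print(dst, "->", first_match(dst))
```

For the curl target 192.168.1.1 the first rule does not apply (the address is outside 10.88.0.0/16) and the second does (it is outside 224.0.0.0/4), so under this model the packet should be masqueraded.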

@mheon
Member

mheon commented Feb 27, 2020

Is this the iptables backend with firewalld still running? I don't think we want to support that - we should focus on iptables with firewalld off, and firewalld with firewalld on.

@ikke-t
Copy link
Author

ikke-t commented Feb 27, 2020

System had some firewalld commands given to open ports. The only thing on RHEL using iptables was the podman network config, until I removed the piece from the config.

I also put it back, as I thought podman worked with iptables, but it didn't work. So it was the update of the system that broke external networking for podman root containers altogether.

If I now again remove the iptables stanza from podman config, there is no-one commanding iptables in the system, it's purely on firewalld.

@ikke-t
Author

ikke-t commented Feb 27, 2020

Tadaa, success! If I set the firewall plugin's backend to firewalld, it all starts to work!

{
    "cniVersion": "0.4.0",
    "name": "podman",
    "plugins": [
        {
            "type": "bridge",
            "bridge": "cni-podman0",
            "isGateway": true,
            "ipMasq": true,
            "ipam": {
                "type": "host-local",
                "routes": [
                    {
                        "dst": "0.0.0.0/0"
                    }
                ],
                "ranges": [
                    [
                        {
                            "subnet": "10.88.0.0/16",
                            "gateway": "10.88.0.1"
                        }
                    ]
                ]
            }
        },
        {
            "type": "portmap",
            "capabilities": {
                "portMappings": true
            }
        },
        {
            "type": "firewall",
            "backend": "firewalld"
        }
    ]
}

If I leave the firewall definition out, root containers can't reach the internet. Myth busted. This should probably be tested again before removing that part from RHEL 8.2?

Or perhaps some other changes fix it for 8.2, but on RHEL 8.1 it needs to be this way. Thanks for help.
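
For anyone checking their own host, here is a minimal Python sketch that looks for the firewall plugin entry in a conflist like the one above. The function name `find_firewall_plugin` is made up for this example, and the embedded JSON is a shortened copy of the config from this comment; on a real system you would read /etc/cni/net.d/87-podman-bridge.conflist instead.

```python
import json

def find_firewall_plugin(conflist_text):
    """Return the first plugin dict with type 'firewall', or None if there is none."""
    config = json.loads(conflist_text)
    for plugin in config.get("plugins", []):
        if plugin.get("type") == "firewall":
            return plugin
    return None

# Shortened copy of the working config from this issue
working = """
{
    "cniVersion": "0.4.0",
    "name": "podman",
    "plugins": [
        {"type": "bridge", "bridge": "cni-podman0", "isGateway": true, "ipMasq": true},
        {"type": "portmap", "capabilities": {"portMappings": true}},
        {"type": "firewall", "backend": "firewalld"}
    ]
}
"""

plugin = find_firewall_plugin(working)
if plugin is None:
    print("no firewall plugin configured")
else:
    print("firewall plugin backend:", plugin.get("backend", "(not set)"))
```

A `None` result corresponds to the setup from the reproduction steps, where the firewall section was removed and root containers lost external connectivity.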

@mheon
Member

mheon commented Feb 27, 2020

Uh-oh.

@baude Eeeeeeek.

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@Klaas-

Klaas- commented Mar 29, 2020

this also happens on a current rhel8

also tested a centos8 stream, same result

@Klaas-

Klaas- commented Mar 29, 2020

There is a kbase article already but no bz or link to this issue: https://access.redhat.com/solutions/4859291

@Klaas-

Klaas- commented Mar 30, 2020

Is #5348 the fix for this issue?

@mheon
Member

mheon commented Mar 30, 2020

Yes. This should be closed, as such.

@mheon mheon closed this as completed Mar 30, 2020
@github-actions bot added the locked - please file new issue/PR label Sep 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023

5 participants