
Proposal: imageless rootless-cni-infra #8709

Closed
AkihiroSuda opened this issue Dec 14, 2020 · 14 comments · Fixed by #9423
Assignees: Luap99
Labels: CNI, kind/feature, locked - please file new issue/PR, rootless

Comments

AkihiroSuda (Collaborator) commented Dec 14, 2020

EDIT: this is now tracked in #9423


Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind feature

Description

The current dependency on quay.io/libpod/rootless-cni-infra has several issues:

  • Doesn't work on non-AMD64, unless we can provide a multi-arch image
  • Doesn't work without Internet connectivity
  • The origin of the image is unverifiable

We can solve all of these issues by making rootless-cni-infra imageless.

This can be implemented by executing the equivalent of the following commands in the user namespace:

# mount --rbind / ./rootfs
# mount -t tmpfs none ./rootfs/run
# podman run -d --net=slirp4netns  --rootfs ./rootfs

We should also consider reimplementing the rootless-cni-infra shell script in Go with pkg/reexec.
This could be done separately to reduce the complexity of the PR, but maybe we should do both at once to avoid extra migration steps.

Challenge

Supporting live upgrade from a previous version might be difficult.
I think we can choose not to support live upgrade.


Related:

AkihiroSuda added the rootless, kind/feature, and CNI labels on Dec 14, 2020
Luap99 (Member) commented Dec 14, 2020

I think rewriting the script in Go is a requirement, since cnitool is not available on the host.

cevich (Member) commented Dec 14, 2020

Shouldn't the bind mount be read-only?

rhatdan (Member) commented Dec 15, 2020

I really like this idea: use the host OS as the container image. I would like to see if we could make this a first-class object.

Luap99 (Member) commented Dec 17, 2020

I will tackle it

Luap99 self-assigned this on Dec 17, 2020
rhatdan (Member) commented Dec 18, 2020

@Luap99 Thanks. I would like to see an open discussion on what this should be. For example, should this just run a container on /? Or should it build up the OS directories from scratch and then volume-mount in /usr?

Can I run this securely, where we somehow figure out what from the host OS needs to be shared and what should be private?

Years ago, before Docker containers, I was working on virt-sandbox, which divided the host OS up into multiple containers. Imagine running hundreds of Apache containers where /usr and most of /var/www are shared.

Luap99 (Member) commented Dec 18, 2020

I tried to start small and only mount what I need, but I quickly discovered that the number of dependencies is huge. I would need to bind-mount podman, ip, iptables, ip6tables, dnsmasq and possibly more, plus all of their dynamically linked libraries. This is not sustainable.

Mounting /usr is easy. In theory this should include all dependencies, but on my Fedora 33 system /usr/sbin/iptables is symlinked to /etc/alternatives/iptables, so iptables was missing from the container. Given the large number of Linux distributions, I do not think it is safe to mount only /usr.
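
For example, a quick check (output as described above for that Fedora 33 system) shows the symlink pointing outside /usr:

$ readlink /usr/sbin/iptables
/etc/alternatives/iptables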

So I think the only sustainable approach is to mount the full rootfs in the namespace. The only directories that must be writable by the container are /run and /var/lib/cni. These directories do not need to be persistent, so they can be mounted as tmpfs. We can also mount the rootfs read-only for extra security.
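
A minimal sketch of that mount layout, extending the commands from the proposal above (the tmpfs on /var/lib/cni and the read-only remount are additions here; the remount only affects the top-level bind mount, not the submounts, and inside a user namespace it may also need to carry over the original mount's locked flags such as nosuid/nodev):

# mount --rbind / ./rootfs
# mount -t tmpfs none ./rootfs/run
# mount -t tmpfs none ./rootfs/var/lib/cni
# mount -o remount,ro,bind ./rootfs
# podman run -d --net=slirp4netns --rootfs ./rootfs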

rhatdan (Member) commented Dec 21, 2020

I also believe we need to mask over certain files like /etc/passwd and /etc/shadow (which should be blocked from the container). There may or may not be other directories that need to be masked.

cevich (Member) commented Jan 4, 2021

> The only directories that must be writeable by the container are /run and /var/lib/cni. These directories do not need to be persistent

I'm assuming that /var/lib/cni stores details such as allocated IPs? If so, this seems dangerous to discard. If the CNI service container is ever restarted, you don't want a subsequent IP allocation to clash with an existing one.

> I also believe we need to mask over certain files

Ooof, I worry the list of files needing masking may be much more extensive. Consider also subscription-manager and the private keys under /etc/pki/tls/private. Then there's the danger of including third-party secrets we could never have any foreknowledge of.

Thinking back to the original "dynamic build" idea...would it be possible to have a service startup script which re-creates the / directory tree somewhere, then selectively hard-links in the required binaries (from a predetermined list)? This directory then gets volume-mounted as the container's '/'.

This would be easy to test, quick to perform, and guarantee only the "current" and necessary binaries are present. I'm guessing this may be easier/simpler than arranging all the special-file masking...but my understanding could be flawed.
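
A rough sketch of this idea (the binary list and paths are hypothetical, and dynamically linked libraries would still have to be pulled in separately, e.g. via ldd):

rootfs=$(mktemp -d)
mkdir -p "$rootfs"/{usr/bin,etc,run,var/lib/cni}
for bin in podman ip iptables ip6tables dnsmasq; do
    src=$(command -v "$bin") || continue
    # hard-link where possible, fall back to a copy across filesystems
    ln "$src" "$rootfs/usr/bin/" 2>/dev/null || cp "$src" "$rootfs/usr/bin/"
done
podman run -d --net=slirp4netns --rootfs "$rootfs"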

Luap99 (Member) commented Jan 5, 2021

> I'm assuming that /var/lib/cni stores details such as allocated IPs? If so, this seems dangerous to discard. If the CNI service container is ever restarted, you don't want a subsequent IP allocation to clash with an existing one.

You should never restart the container. If you do, all attached containers lose network connectivity, since the network namespace gets destroyed. And, as @rhatdan and @mheon can confirm, CNI is notorious for leaving the IP allocation files around even after a reboot. Using a tmpfs so they are cleaned whenever the container restarts is the easiest and best solution in my eyes.
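
For reference, the host-local IPAM plugin keeps one file per allocated address under /var/lib/cni/networks/<network>/; the network name and addresses below are placeholders:

$ ls /var/lib/cni/networks/mynet/
10.88.1.2  10.88.1.3  last_reserved_ip.0  lock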

What is the threat model here?

We are in a user namespace. We never have more permissions than the original user, so we cannot read /etc/shadow. We can read /etc/passwd, but I don't see how that is a security risk. More problematic would be user secrets such as private SSH keys.

Next, we don't run some untrusted third-party app in the container; we only run the rootless-cni-infra script, which we control.
How would an attacker gain access to the container? Either the attacker already has the user's privileges on the host, or they can escape the mount namespace from another container into this one, which would be a far more serious security problem in itself.

Also, the image-based version did not respect the --cni-config-dir option properly. It mounted the CNI config dir only at container-create time, but this option can also be used on podman run commands, which did not work if the rootless-cni-infra container was already running. Supporting that is only possible with the full-rootfs version.
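
For example, something along these lines should be honored even while the infra container is already up (the network name and config directory are placeholders):

$ podman --cni-config-dir ~/.config/cni/custom run --rm --net mynet alpine ip addr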

cevich (Member) commented Jan 5, 2021

> all attached containers lose network connectivity

Oh, right, okay, so throwing away the data isn't a concern then, since related containers would otherwise break when the network namespace goes away.

> We are in a user namespace.

So if I understand correctly: the only possible exploits inside the CNI infra container (running with privileges) imply a compromised host in some way (requiring superuser access or worse). The only vector from the rootless-container side would be through the shared network namespace, which (presumably) has a minuscule surface area. Okay, I think I get it.

It seems the most obvious danger is some kind of DoS attack: one rootless container crashes/disables/breaks the CNI infra container, which disables all other rootless containers' networking. But this scenario is simple, and if known, the exploit is easily covered by testing.

Iolaum commented Jan 22, 2021

I've hit the problem described in #8411 and discussed here.

I'm using Fedora IoT on a Raspberry Pi 4 (4 GB) and tried to spin up a container in a network. A reproducible example:

$ podman --version
podman version 2.2.1
$ cat /etc/os-release 
NAME=Fedora
VERSION="33.20210113.0 (IoT Edition)"
ID=fedora
VERSION_ID=33
VERSION_CODENAME=""
PLATFORM_ID="platform:f33"
PRETTY_NAME="Fedora 33.20210113.0 (IoT Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:33"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f33/system-administrators-guide/"
SUPPORT_URL="https://fedoraproject.org/wiki/Communicating_and_getting_help"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=33
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=33
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="IoT Edition"
VARIANT_ID=iot
OSTREE_VERSION='33.20210113.0'
$ podman network create localnet
/var/home/$USER/.config/cni/net.d/localnet.conflist
$ podman network ls
NAME      VERSION  PLUGINS
localnet  0.4.0    bridge,portmap,firewall,tuning,dnsname
$ mkdir /var/home/$USER/transmission/config
$ mkdir /var/home/$USER/transmission/downloads
$ mkdir /var/home/$USER/transmission/watch
$ podman run -d \
  --name=transmission1 \
  --hostname transmission1\
  --network localnet\
  -e PUID=1000 \
  -e PGID=1000 \
  -e TZ=Europe/London \
  -e TRANSMISSION_WEB_HOME=/combustion-release/ \
  -e USER=USER \
  -e PASS=PASS \
  -p 9091:9091 \
  -p 51413:51413 \
  -p 51413:51413/udp \
  -v /var/home/$USER/transmission/config:/config:z \
  -v /var/home/$USER/transmission/downloads:/downloads:z \
  -v /var/home/$USER/transmission/watch:/watch:z \
  --restart unless-stopped \
  ghcr.io/linuxserver/transmission
Error: cannot find rootless-podman-network-sandbox image for arm64

github-actions (bot) commented

A friendly reminder that this issue had no activity for 30 days.

Iolaum commented Mar 15, 2021

As of today, with the new podman 3.0 release,

$ podman --version
podman version 3.0.1
[nikos@fedora-iot1 swag]$ cat /etc/os-release 
NAME=Fedora
VERSION="33.20210315.0 (IoT Edition)"
ID=fedora
VERSION_ID=33
VERSION_CODENAME=""
PLATFORM_ID="platform:f33"
PRETTY_NAME="Fedora 33.20210315.0 (IoT Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:33"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f33/system-administrators-guide/"
SUPPORT_URL="https://fedoraproject.org/wiki/Communicating_and_getting_help"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=33
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=33
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"
VARIANT="IoT Edition"
VARIANT_ID=iot
OSTREE_VERSION='33.20210315.0'

the issue mentioned in my previous post

Error: cannot find rootless-podman-network-sandbox image for arm64

still persists.

AkihiroSuda (Collaborator, Author) commented

This is currently tracked in #9423, but it might be good to push an arm64 image as a temporary workaround.

github-actions bot added the locked - please file new issue/PR label on Sep 22, 2023
github-actions bot locked as resolved and limited conversation to collaborators on Sep 22, 2023