This repository has been archived by the owner on Oct 16, 2020. It is now read-only.

coreos-metadata fails as non-root user on Azure #2468

Closed
csssuf opened this issue Jun 25, 2018 · 8 comments

@csssuf

csssuf commented Jun 25, 2018

Issue Report

Bug

Container Linux Version

$ cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1814.0.0
VERSION_ID=1814.0.0
BUILD_ID=2018-06-20-0209
PRETTY_NAME="Container Linux by CoreOS 1814.0.0 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"

Environment

Azure

Expected Behavior

coreos-metadata is able to retrieve metadata when run as a non-root user.

Actual Behavior

[bound] core@kola-070f54d2-3e4c211e75 ~ $ coreos-metadata --cmdline --attributes /dev/stdout
Jun 25 22:45:22.569 INFO Fetching http://168.63.129.16/?comp=versions: Attempt #1
Jun 25 22:47:32.216 INFO Failed to fetch: http://168.63.129.16/?comp=versions: Connection timed out (os error 110)
Jun 25 22:47:33.216 INFO Fetching http://168.63.129.16/?comp=versions: Attempt #2
Jun 25 22:49:43.288 INFO Failed to fetch: http://168.63.129.16/?comp=versions: Connection timed out (os error 110)
Jun 25 22:49:45.288 INFO Fetching http://168.63.129.16/?comp=versions: Attempt #3
...

Reproduction Steps

  1. Boot Container Linux on Azure.
  2. Wait a few minutes for waagent to add its firewall rules.
  3. Attempt to use coreos-metadata as a non-root user to retrieve some metadata.

Other Information

waagent was bumped from 2.2.4 to 2.2.25 (coreos/coreos-overlay#3205), which introduced and enabled the OS.EnableFirewall option. This option causes waagent to add iptables rules to prevent non-root users from reaching the Azure fabric endpoint, which coreos-metadata uses to retrieve its metadata.
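One way to check that the firewall rules, rather than coreos-metadata itself, cause the timeout is to probe the fabric endpoint directly, once as root and once as a non-root user. The sketch below is illustrative only: `probe_endpoint` and its `connect` parameter are hypothetical names and not part of coreos-metadata or waagent, and the 5-second timeout is an arbitrary choice. The address and port are taken from the log output above.

```python
import socket

FABRIC_ADDR = "168.63.129.16"  # Azure fabric endpoint seen in the log above
FABRIC_PORT = 80               # plain HTTP, as in the logged URL


def probe_endpoint(host=FABRIC_ADDR, port=FABRIC_PORT, timeout=5.0,
                   connect=socket.create_connection):
    """Try a TCP connect and classify the outcome.

    `connect` is injectable (a hypothetical hook for this sketch) so the
    function can be exercised without network access.
    """
    try:
        conn = connect((host, port), timeout)
    except (socket.timeout, TimeoutError):
        # A DROP rule discards packets silently, so the connect times out.
        return "timed out"
    except OSError as err:
        # Any other failure (refused, unreachable, ...) is reported as-is.
        return "error: {}".format(err)
    conn.close()
    return "reachable"
```

Calling `print(probe_endpoint())` as root and again as the `core` user should show whether the two users see different results; if both time out, the cause is probably something other than the per-user firewall rules.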

The simplest fix for this would be to set OS.EnableFirewall = n in waagent.conf, although this would require an OEM partition update. It's also not clear that this is a good idea from an Azure perspective; blocking non-root access to the fabric endpoint is apparently "the right thing", although it's not clear precisely what that means.

It also might be a better solution to explicitly require coreos-metadata to run as root on Azure. coreos-metadata already needs to be run as root when using a Config Drive. If this is the chosen solution, the issue is really that coreos-metadata should explicitly bail out and inform the user when run as a non-root user on these platforms.

@bgilbert
Contributor

We shouldn't hardcode a bailout, since those firewall rules may or may not be enabled.

What use case does this break? coreos-metadata normally does run as root.

@csssuf
Author

csssuf commented Jun 26, 2018

I ran into this while testing coreos/afterburn#88. At the moment this doesn't break any real-world use case I'm aware of, but it seems that coreos-metadata should either be able to operate as an unprivileged process, or we should explicitly require root for some (or even all) providers (even if that's just adding some documentation). Failing with a seemingly unrelated error certainly seems like the wrong thing to do.

@bgilbert
Contributor

I guess I don't see it. If someone (maybe the user?) installs a DROP rule, a) it's not an application's job to work around (or document) the consequences, and b) Connection timed out is exactly the error I'd expect.

@csssuf
Author

csssuf commented Jun 26, 2018

We certainly shouldn't try to anticipate any possible user configurations that might prevent coreos-metadata from working, but since OS.EnableFirewall is enabled by default on most distros (except Debian) it seems odd to me to leave this entirely undocumented.

Do you think the config drive case is any different?

@bgilbert
Contributor

The problem is also not user-facing, so such documentation would be more noise than signal. The config-drive case is not a system configuration issue, so it's somewhat more fundamental, but we emit a reasonably helpful error message there.

What might make sense is to detect the connection timeout, notice that we're non-root on Azure, and append a helpful hint to the error message. We could do the same for the config-drive case (though technically that wants CAP_SYS_ADMIN, not root).
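The hint described above could look roughly like the following. This is a minimal sketch, not coreos-metadata's implementation: `is_timeout`, `describe_fetch_error`, the `provider` and `euid` parameters, and the wording of `AZURE_HINT` are all hypothetical. It appends the hint only when all three conditions hold: the error is a connection timeout, the provider is Azure, and the process is not root.

```python
import errno
import os

# Hypothetical hint text; the real wording would be up to the maintainers.
AZURE_HINT = (
    "hint: running as non-root on Azure; waagent's firewall "
    "(OS.EnableFirewall) may be blocking access to the fabric endpoint"
)


def is_timeout(err):
    """True if `err` is a connection timeout (ETIMEDOUT, os error 110 on Linux)."""
    return isinstance(err, TimeoutError) or (
        isinstance(err, OSError) and err.errno == errno.ETIMEDOUT
    )


def describe_fetch_error(err, provider, euid=None):
    """Format a fetch error, appending a hint in the Azure non-root case.

    `provider` and `euid` are hypothetical parameters for this sketch;
    `euid` defaults to the effective UID of the current process.
    """
    if euid is None:
        euid = os.geteuid()
    message = "Failed to fetch: {}".format(err)
    if provider == "azure" and euid != 0 and is_timeout(err):
        message += "\n" + AZURE_HINT
    return message
```

Because the check runs only after a timeout has already occurred, it never blocks a non-root run on a system where the firewall rules are disabled, which addresses the earlier objection to a hardcoded bailout.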

@csssuf
Author

csssuf commented Jun 26, 2018

That seems reasonable to me.

@lucab

lucab commented Jul 2, 2018

Given that this is not a bug affecting Container Linux, but there are some distribution-independent enhancements we'd like to have in coreos-metadata, I'd like to close this ticket here and move/split into three separate tickets directly in the coreos-metadata tracker (to be reopened):

  1. document cloud-specific quirks in repo doc, so we can reference why/where it needs root
  2. log a warning on network failure on Azure if non-root
  3. check and set CAP_SYS_ADMIN whenever using a config-drive

If there are no objections (especially on re-opening the coreos-metadata tracker for things not specific to CL), I'll proceed with that tomorrow.
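For the "check" half of item 3 above, one possible approach on Linux is to read the effective capability mask from /proc/self/status and test the CAP_SYS_ADMIN bit. The sketch below is an assumption about how such a check might be written, not code from coreos-metadata; the function names are hypothetical, and it does not attempt the "set" half.

```python
CAP_SYS_ADMIN = 21  # capability bit index from <linux/capability.h>


def parse_cap_eff(status_text):
    """Return the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The value is a hexadecimal bitmask, e.g. 0000003fffffffff.
            return int(line.split(":", 1)[1].strip(), 16)
    raise ValueError("no CapEff line found")


def has_capability(mask, cap):
    """True if capability bit `cap` is set in `mask`."""
    return bool(mask & (1 << cap))


def has_cap_sys_admin(status_path="/proc/self/status"):
    """Check whether the current process holds CAP_SYS_ADMIN."""
    with open(status_path) as f:
        return has_capability(parse_cap_eff(f.read()), CAP_SYS_ADMIN)
```

Testing the capability rather than the UID matches the earlier remark that the config-drive case wants CAP_SYS_ADMIN, not root as such.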

@lucab

lucab commented Jul 3, 2018

I moved those items to coreos/afterburn#94, coreos/afterburn#95, and coreos/afterburn#96.
Closing this.
