Networking - userland-proxy could better clarify impact #17312

Open
polarathene opened this issue May 14, 2023 · 5 comments
Assignees
Labels
area/networking Relates to anything around networking kind/enhancement Improves the usability of docs lifecycle/frozen

Comments

@polarathene
Contributor

polarathene commented May 14, 2023

Problem description

The daemon setting userland-proxy no longer appears to be documented. Disabling it still changes the network configuration, but the behavioural difference is not clearly explained.

Prior documentation since removed

The option was originally introduced in May 2015 with docs at docs/sources/articles/networking.md

There is not much history on those changes to derive information on if this was intentional. Considering the size and intent of the two changes, it was likely accidental and difficult for any review to catch without investing significant time to be thorough.

All that seems to exist now is this: https://docs.docker.com/engine/reference/commandline/dockerd/

Use userland proxy for loopback traffic (default true)

That does not communicate much to the user, or the actual differences in network behaviour when the setting is enabled / disabled.

Problem location

I couldn't find the information I wanted. I expected to find it near the following URL https://docs.docker.com/network/ or https://docs.docker.com/config/daemon/

Either section may be relevant. The previous docs location was regarding information on network port binding, and provided the information as a note admonition. Perhaps config (container networking) and network (iptables) are good candidates.

Project version(s) affected

Docs since Feb 2018.

Suggestions for a fix

I think it is planned to eventually drop the userland-proxy feature once it has been disabled by default and reaches a level of compatibility that makes enabling redundant.

Most users may not need this information documented, thus this issue alone may be a sufficient resource. Below is relevant information for users to be aware of regarding the feature toggles.


Previous docs content

  • The first part of the referenced docs (which the original review said would not be needed) was not dropped during the ~3 years it was present in the docs.
  • The remainder seemed to be relevant technical information that better explained what disabling the setting did:

    The --userland-proxy parameter, true by default, provides a userland implementation for inter-container and outside-to-container communication.
    When disabled, Docker uses both an additional MASQUERADE iptable rule and the net.ipv4.route_localnet kernel parameter which allow the host machine to connect to a local container exposed port through the commonly used loopback address: this alternative is preferred for performance reasons.
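
For anyone wanting to try the setting the old docs describe, here is a minimal sketch of a helper that writes a daemon config with the proxy disabled. The function name `write_daemon_json` is made up for illustration, and the sketch assumes no other daemon settings need to be preserved (it overwrites the target file).

```shell
#!/bin/sh
# Sketch only: write a daemon.json that disables the userland proxy.
# write_daemon_json is a hypothetical helper name, not a Docker tool.
# The target path is an argument so the result can be inspected on a
# scratch file first; /etc/docker/daemon.json is the usual path on Linux.
write_daemon_json() {
  target="$1"
  # NOTE: this overwrites the target. Merge by hand if the file already
  # contains other settings.
  cat > "$target" <<'EOF'
{
  "userland-proxy": false
}
EOF
}

# Example on a scratch file:
#   write_daemon_json /tmp/daemon.json && cat /tmp/daemon.json
# After updating the real config the daemon must be restarted
# (for example `systemctl restart docker` on systemd hosts).
```

The same toggle is available as the `--userland-proxy=false` flag on dockerd, which is the form the old docs referred to.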

Resolved

EDIT: This feedback was actually addressed, my bad 😅

An earlier PR attempt for the --userland-proxy feature has a relevant comment about the documented loopback reference:

In regards to where it says the option allows routing "through the loopback interface". It's not technically accurate as the option allows routing localnet traffic (127.0.0.0/8).
If another address is on the loopback interface, or if the 127.0.0.1 is on an interface other than loopback, it makes no difference.

AFAIK this is relevant information, along with the route_localnet kernel change. You can query 127.0.0.1, but not the IPv6 equivalent [::1] (IPv6 has no equivalent of route_localnet). As the default binding 0.0.0.0 also includes IPv6 interfaces, this behaviour with userland-proxy: false is not obvious to track down. It works fine with userland-proxy: true.

@thaJeztah thaJeztah added area/networking Relates to anything around networking kind/enhancement Improves the usability of docs labels May 15, 2023
@thaJeztah
Member

/cc @akerouanton @dvdksn

@polarathene
Contributor Author

polarathene commented May 15, 2023

Presently userland-proxy: false differs from userland-proxy: true in networking behaviour for a container's published port when connecting to that container indirectly through the host IP (either from the docker host itself, or from one of its other containers).

userland-proxy: false differences

userland-proxy: false differs by:

  • Kernel network settings enabled (iptables rules adjusted for compatibility):
    • route_localnet=1 (IPv4 only) set on docker bridge networks => Support publishing ports to localhost / 127.0.0.1.
    • hairpin_mode=1 (on the bridge (veth) port / brport) set on each container interface => Support connections originating from a container that resolves back into itself.
  • iptables rules no longer send local traffic through user-space via a docker-proxy process:
    • Improves local network performance (approx 2-3x iperf3 max throughput, on par with host network).
    • Better preservation of remote address (client IP). Although the current iptables rules have a regression vs userland-proxy: true.

No equivalent? (reliant on delegating to docker-proxy):

  • IPv6 localhost / [::1] is not a valid address to publish ports (but an explicit -p [::1]:80:80 does output a warning).
    • IPv4 localhost / 127.0.0.1 still provides connectivity (by setting sysctl net.ipv4.conf.${INTERFACE}.route_localnet=1 on docker networks, but there is no equivalent setting for IPv6).
  • IPv6 host connection fails to route to a container with the default ip6tables: false (an improvement security wise vs userland-proxy: true masquerading the source address as the docker network gateway IP).

Improvements to iptables rules to almost reach parity with userland-proxy: true behaviour:


Feature development history

Disabling userland-proxy enables hairpin NAT (internally Docker refers to it as HairpinMode).

Configuration differences

Hairpin NAT / HairpinMode

  • userland-proxy: false: Each container has hairpin_mode enabled on its virtual interface:

    # Container interface veth21d71e3:
    cat /sys/class/net/veth21d71e3/brport/hairpin_mode
    # or:
    cat /sys/devices/virtual/net/veth21d71e3/brport/hairpin_mode
  • Enabling this mode is only required for containers with ports published, but seems to be applied regardless. This is to support a container connection that resolves back to the same container (Discussed as Scenario A).

  • With userland-proxy: false, hairpin NAT relies on this kernel setting instead of a docker-proxy process (userland-proxy: true), which notably improves network performance for containers handling internal network traffic (within the same docker host).
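
Rather than checking one veth at a time, the hairpin_mode flag of every bridge port can be listed in one go. This is a rough sketch: `list_hairpin_modes` is a made-up helper name, and the optional argument (an alternative sysfs root) only exists so the function can be exercised against a fake directory tree.

```shell
#!/bin/sh
# Sketch only: print "<interface> <hairpin_mode>" for every bridge port.
# list_hairpin_modes is a hypothetical helper name.
#   $1 = sysfs root, defaults to /sys (override only for dry runs)
list_hairpin_modes() {
  sys_root="${1:-/sys}"
  for f in "$sys_root"/class/net/*/brport/hairpin_mode; do
    # Skip the unexpanded glob when no bridge ports exist.
    [ -r "$f" ] || continue
    iface="${f#"$sys_root"/class/net/}"   # strip the leading path
    iface="${iface%%/*}"                  # keep only the interface name
    printf '%s %s\n' "$iface" "$(cat "$f")"
  done
}

# Example: list_hairpin_modes
```

Per the notes above, every container veth would be expected to show 1 with userland-proxy: false.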

IPTables (NAT => DOCKER)

This hairpin NAT PR appears to have adjusted a rule to support hairpin NAT in kernel space. It talks about avoiding userspace through dockerd (docker-proxy?) by removing the ! -i <bridge name> condition from the DNAT rule in the DOCKER nat chain:

# Container on `docker0` network with IP `172.17.0.2` and `-p 80:8080`
# Required rules below for a container to be able to reach itself indirectly via host IP.

# `userland-proxy: true`
# Within the same network/subnet, DNAT is skipped:
iptables -t nat -A DOCKER ! -i docker0 -p tcp --dport 80 -j DNAT --to-destination '172.17.0.2:8080'
# Each network also has an early return rule inserted (added later in 2016 for cross-network support):
# Avoids DNAT (via docker-proxy on 127.0.0.1:80), replacing original RemoteAddr with bridge gateway IP.
iptables -t nat -I DOCKER -i docker0 -j RETURN

# `userland-proxy: false`
# Always apply DNAT (RemoteAddr IP preserved)
iptables -t nat -A DOCKER -p tcp --dport 80 -j DNAT --to-destination '172.17.0.2:8080'

Notes on rule change

userland-proxy true => DOCKER nat chain RETURN + DNAT rules

The RETURN rule was introduced later in Feb 2016, which seems to lack a PR reference, but an equivalent change exists in this PR.

There was a comment about ! -i docker0 change referenced in the later --userland-proxy PR discussion (but no valid permalink was used, and content has since changed).

  • Supposedly it was regarding the PR change removing the line (for ! -i docker0 or similar) in the DOCKER nat chain DNAT rule shown above?
  • That DNAT rule change does restore allowing containers to connect to a container (within the same network via the host IP on the published port).
  • Aug 2013: The DNAT rule now includes -i !docker0 to allow connections to userland-proxy via other interfaces than 127.0.0.1 / lo, which supported containers connecting to each other via published ports.
  • Jan 2014: -i !docker0 rule is discussed, but unclear if related to DNAT rules (seems to be regarding support for icc: false with FORWARD chain adding a DROP at the end, and adding rules for docker run --link ... to DOCKER chain):

    Publicly-exposed port rules should be scoped on -i !docker0, because links rules will always be created after the public port rule, and it will be complicated to place them before the publicly-exposed port rules.

    We'll need a FORWARD rule for public ports with -i !docker0 so it doesn't conflict with the icc config.

For userland-proxy: true:

  • RETURN rule seems required to support containers connecting across separate docker networks. Presumably because these bridge return rules allow falling back to the userland-proxy to route? (which skips NAT on 127.0.0.0/8 due to OUTPUT nat chain rule) It matches the interface belonging to the client container.
  • DNAT rule only differs by excluding a bridged network that has a published port.
    • A connection between containers within the same network loses the container client IP as the RemoteAddr due to this.
    • If input interface was any / * instead, the container client IP would be retained, like it is with userland-proxy: false. Except when the RETURN rule is hit earlier, as this seems to route through the userland-proxy?
  • This comment seems to confirm these rules for userland-proxy: true .
    • docker-proxy handles local traffic due to the RETURN rule (traffic coming from the container), leaving the DNAT rule for external traffic.
    • The public host IP is local traffic if the connection is made from within a container (traffic is via docker network interface which hits the RETURN), but external when from the docker host itself (non-docker network, no RETURN rules apply, thus uses DNAT).
    • Container connections across separate docker networks rely on the RETURN rule to go through the docker-proxy process, otherwise the traffic would get caught in FORWARD rules to DOCKER-ISOLATION-STAGE-1 (traffic leaving bridge subnet) to DOCKER-ISOLATION-STAGE-2 (traffic to another docker bridge subnet, other than itself) which matches and DROPs the traffic.
    • Container traffic within the same docker bridge subnet / interface would not hit that DROP rule - as the DNAT has been applied (if RETURN rule is removed), which would avoid docker-proxy and preserve the remote client IP.
      • Except, userland-proxy: true has the DNAT rule exclude traffic from that bridge (eg: -i !docker0), so it does go through the docker-proxy.
      • With userland-proxy: true, removing ! -i docker0 from the DNAT rule would break a container connecting to itself indirectly via the published port on the host IP. This is fixed by enabling hairpin_mode on the container's veth interface. The additional POSTROUTING rule to MASQUERADE a connection whose source and destination are both the container IP is also required.
      • Adjusting the rule to direct traffic to a different container IP (or port) better illustrates that behaviour with a request on the host at 127.0.0.1 (or within the container) via the host IP would still route to the original IP:port (that the docker-proxy instance was configured for) - while a request from the host (or external system) to the host IP would connect to the adjusted DNAT rule address instead. By removing -i !docker0 from this DNAT rule (like userland-proxy: false does), a connection to host IP from a container also connects to the adjusted DNAT address (while 127.0.0.1 / localhost on the host still connects to the original address). This seems to be what this PR was focused on (NOTE: back then RETURN rule was not present).

IPTables (NAT => OUTPUT)

April 2013: PREROUTING + OUTPUT nat chains added -m addrtype --dst-type LOCAL.

  • Without that a curl request to another server would get redirected to a container if the published port was the same (eg: -p 42:80 and curl example.com:42, would send request to container port 80 instead of connecting to the external example.com host).
    • Although, I have not observed behaviour difference when the OUTPUT rule is removed? (Can affect querying the host IP / external interface from the host, but doesn't appear to affect request from container or remote host to the host IP).
    • Same behaviour with ip6tables for accessing a container via curl [::1]:port / curl -6 localhost:port, if equivalent rules have been added by docker.
# userland-proxy: true
iptables -t nat -A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER

# userland-proxy: false
iptables -t nat -A OUTPUT -m addrtype --dst-type LOCAL -j DOCKER
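
Since the two OUTPUT rules only differ by the ! -d 127.0.0.0/8 match, the rule dump can hint at which mode a host is running in. The sketch below is a heuristic based purely on the two rules shown above, not on anything Docker guarantees, and `guess_userland_proxy` is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: guess the userland-proxy mode from the nat OUTPUT chain.
# guess_userland_proxy is a hypothetical helper name. It reads rule lines
# in `iptables -t nat -S OUTPUT` format on stdin.
guess_userland_proxy() {
  while IFS= read -r line; do
    case "$line" in
      *"-j DOCKER")
        case "$line" in
          # Loopback excluded from NAT: docker-proxy handles 127.0.0.0/8.
          *"! -d 127.0.0.0/8"*) echo "userland-proxy: true" ;;
          # No exclusion: loopback traffic is handled by iptables as well.
          *) echo "userland-proxy: false" ;;
        esac
        return 0
        ;;
    esac
  done
  echo "no DOCKER jump found"
  return 1
}

# Example (needs root): iptables -t nat -S OUTPUT | guess_userland_proxy
```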

Notes - Investigating history for rule difference

In this Hairpin NAT PR (July 2014), the author mentions that (what would later become the userland-proxy: false setting) will need to remove ! -d 127.0.0.0/8 from the OUTPUT nat rule, referencing a concern about its purpose.

Using git blame, the only context for its additional specificity was in an April 2013 commit message:

Exclude loopback-to-loopback connections from DNAT rules, to allow userland proxying.

  • userland-proxy: true seems to require that to opt-out of docker-proxy instances listening on 127.0.0.1 from applying the NAT rules?
  • The RemoteAddr thus is replaced by the docker network gateway IP address for the target container IP, which places it within the same subnet, avoiding traffic being stopped by DROP rule (via the FORWARD chain) network isolation flow?

Main observations

userland-proxy is primarily focused on supporting connections to published ports via localhost / 127.0.0.1.

  • This will always result in the docker gateway IP as the remote client IP address (aka source address).
  • userland-proxy: false doesn't support IPv6 via loopback (localhost / [::1]).
  • userland-proxy: true can indirectly connect containers across separate networks via published ports.

Security risk with Docker defaults userland-proxy: true + ip6tables: false:
userland-proxy: true: The proxy will replace the remote client IP address with the docker network gateway IP:

  • IPv4 ✔️ - iptables: true prevents external connections appearing to be from the gateway.
  • IPv6 ❌ - ip6tables: true is not enabled by default. Docker hosts reachable via IPv6 thus can unexpectedly treat remote connections as trusted (due to service configs that trust their docker network subnet as private, but are misled when a remote client IP is masqueraded as the gateway IP of that subnet).

This concern appears to have been raised when discussing hairpin mode PR:

I think # 2 is dangerous. The proxy causes all traffic to appear to come from a certain source address.
If an app has some sort of automatic blacklist capability, failing back to this method might cause the app to blacklist all traffic.

As a maintainer of docker-mailserver, I have seen several issues related to this:

  • Postfix becoming an open relay for spammers to abuse when the user configured networks in Postfix main.cf:mynetworks to trust the internal docker subnet. IPv6 connections to a published port on the host could abuse this, despite the container itself being IPv4 only.
  • Delivery issues when legitimate clients connected over IPv6, but failed a verification step because the HELO hostname resolved IP wasn't matching the remote client IP Postfix was given (docker gateway IP) as the source address.
  • Fail2Ban banning logins from legitimate users because the same docker gateway IP was representing them and bots that triggered the failure.

Behaviour overview (Preserving RemoteAddr / Real IP)

Local connections:

  • Regardless of userland-proxy setting:
    • ✔️ Container to Container (direct IP) preserves RemoteAddr (including container back to self)
    • ❌ Request from host (to direct container IP), or from container back into itself (indirectly via Host IP) has RemoteAddr replaced by the Docker Gateway IP.
    • localhost / 127.0.0.1 as RemoteAddr would be ambiguous between host and container (the Docker Gateway IP helps disambiguate that?)
  • userland-proxy: true:
    • ✔️ Host to Container (indirectly via published port on host IP) preserves RemoteAddr (does not affect connectivity to container via localhost / 127.0.0.1)
      • If Host IP is IPv6, when ip6tables: false (default) the RemoteAddr is the Docker Gateway IP (IPv4 if no IPv6 address assigned to container)
    • Container to Container (indirectly via published port on host IP):
      • RemoteAddr replaced by the Docker Gateway IP associated to the target container (resolvable)
      • ✔️ Can connect across separate networks (without the linked fix, RemoteAddr is the Docker Gateway IP)
  • userland-proxy: false:
    • ❌ Host to Container (indirectly via published port on host IP) is replaced by the Docker Gateway IP (resolvable)
      • If Host IP is IPv6, when ip6tables: false (default) the connection will hang (even if the container has an IPv6 address assigned)
    • Container to Container (indirectly via published port on host IP):
      • ✔️ Preserves RemoteAddr
      • ❌ Cannot connect across separate networks (resolvable)

Remote connections:

  • ✔️ IPv4 only host should not have any surprises.
  • ✔️ Firewalld or UFW when active prevents the below RemoteAddr risk, the remote connection should fail or hang instead (NOTE: Local connections as described earlier are still at risk, which a remote connection may have initiated between services).
  • ⚠️ userland-proxy: true (with hosts that are reachable via IPv6):
    • Risk: Defaults presently replace a remote IPv6 address with the IPv4/IPv6 gateway address of the container's network, which can cause various monitoring and security issues (due to services configured to trust the subnet, which includes the Docker Gateway IP).
    • Fix: Docker network without IPv6 subnet assigned:
      • Use userland-proxy: false, and external IPv6 connections will fail if container network isn't configured for IPv6.
    • Fix: Docker network has configured a private ULA IPv6 subnet:
      • Use ip6tables: true (and presently experimental: true). Original client IPv6 address is now preserved.
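
For the second fix, the relevant daemon settings could look like the sketch below. It only prints the two keys named above and assumes the IPv6 (ULA) subnet for the network is already configured elsewhere. `print_ip6tables_daemon_json` is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: print a daemon.json fragment enabling ip6tables.
# print_ip6tables_daemon_json is a hypothetical helper name.
# Assumption: the docker network already has an IPv6 subnet assigned,
# that part is not covered here.
print_ip6tables_daemon_json() {
  cat <<'EOF'
{
  "experimental": true,
  "ip6tables": true
}
EOF
}

# Example: print_ip6tables_daemon_json
# Merge these keys into the existing /etc/docker/daemon.json instead of
# replacing it, then restart the daemon.
```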

@docker-robot

docker-robot bot commented Sep 26, 2023

There hasn't been any activity on this issue for a long time.
If the problem is still relevant, mark the issue as fresh with a /remove-lifecycle stale comment.
If not, this issue will be closed in 14 days. This helps our maintainers focus on the active issues.

Prevent issues from auto-closing with a /lifecycle frozen comment.

/lifecycle stale

@polarathene
Contributor Author

polarathene commented Sep 26, 2023

Still relevant.

Sorry my system died while I was working on this in June/July. Unfortunately, I also have no remaining bandwidth in my volunteer time to continue revising this further.

I understand the above information is a verbose mess, but I hope in that current form it is still helpful. Many of the issues in the tracking issue for disabling the feature were investigated and many have been resolved.


The last item I recall working on while revising this was a table that better communicated some of the information above.

I had shared an iteration of that earlier, and while the link provides some other information, I'll include that table here too (hope it helps @dvdksn , I recall having iptables rules that corrected some of the ❌ into ✔️ , but have lost that information):


  • userland-proxy: true mostly only matters for local connections, primarily those with localhost.
  • However, it really depends on how you make network requests between containers (and to/from host) as both false and true settings presently have the gateway IP issue in different scenarios (most can be resolved in a way that brings parity, albeit with some caveats).

userland-proxy table for preserving client IP vs replacing with gateway IP

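| userland-proxy | Remote (H) | Host (L) [2] | Host (H) | Host (C) | Container (H) | Container (C) | Self (H) | Self (C) |
|---|---|---|---|---|---|---|---|---|
| true | ✔️ [1a] | ❌ | ✔️ [1a] | ❌ | ❌ [4] | ✔️ | ❌ | ✔️ |
| false | ✔️ [1b] | ❌ | ❌ [1b, 3] | ❌ | ✔️ [5] | ✔️ | ❌ | ✔️ |
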
  1. If the Host IP is IPv6 and ip6tables: false (default):
    a. Source Address becomes the Gateway IP
    b. Connection fails (hangs if the container was assigned an IPv6 address)
  2. localhost / 127.0.0.1 as the Source Address would be ambiguous between host and container (the Docker Gateway IP helps disambiguate that?)
  3. Gateway IP (Resolvable)
  4. Gateway IP (Resolvable)
  5. Connection hangs trying to connect across separate docker networks (Resolvable)

Legend

curl request (with source IP originating from the Remote, Host, a separate Container, or the same container Self) to the target container (eg: traefik/whoami) via destination IP:

  • L => localhost (127.0.0.1 / [::1], indirect)
  • H => Host IP (indirect)
  • C => Container IP (direct)

RemoteAddr (Connection source address) is:

  • ✔️ => Preserved.
  • ❌ => Replaced with the Docker Gateway IP associated to the target container.
    • If the docker gateway lacks an IPv6 address, then IPv6 connections receive an IPv4 gateway address instead.
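
To reproduce a cell of the table, the RemoteAddr line can be pulled out of the traefik/whoami response. This is a small sketch under assumptions: `remote_addr_ip` is a made-up helper name, and the published port 8080 in the usage comment is arbitrary.

```shell
#!/bin/sh
# Sketch only: print the IP part of the RemoteAddr line from a whoami
# response read on stdin. remote_addr_ip is a hypothetical helper name.
remote_addr_ip() {
  # "RemoteAddr: 172.17.0.1:54321" -> "172.17.0.1"
  # "RemoteAddr: [fd00::1]:40000"  -> "fd00::1"
  sed -n 's/^RemoteAddr: \(.*\):[0-9][0-9]*$/\1/p' | tr -d '[]'
}

# Example (port 8080 is an arbitrary choice):
#   docker run -d --rm -p 8080:80 traefik/whoami
#   curl -s http://127.0.0.1:8080 | remote_addr_ip     # Host (L)
#   curl -s http://<host-ip>:8080 | remote_addr_ip     # Host (H)
```

Comparing the printed address against the client's own IP and the docker network gateway IP tells you whether that cell is ✔️ or ❌.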

@dvdksn
Contributor

dvdksn commented Sep 27, 2023

@polarathene thanks a lot for taking the time to write down this information. It's very useful. I haven't got the capacity to start working on this just yet, but I'll get back to this as soon as I can.


3 participants