
krun: tweak dns settings #2099

Open
dustymabe wants to merge 2 commits into containers:main from dustymabe:dusty-krun-net-and-dns

@dustymabe
Contributor

The commits in this PR try to get some default networking into the krun VM started with podman run --runtime=krun --annotation krun.use_passt=1. First, we pass NET_FLAG_DHCP_CLIENT to krun_add_net_unixstream. Then we pass --dns 169.254.1.1 to passt to work around a problem: if the DNS server is on the local network, traffic to it won't hit NAT and won't be forwarded, so it is unreachable from inside the krun VM.

Hardcoding 169.254.1.1 here isn't great. It's the default used by rootless Podman, so that target is available inside the rootless container namespace but not inside the krun VM. I'm interested to hear the experts' thoughts, and whether there is a better alternative.

@gemini-code-assist (bot) left a comment

Code Review

This pull request updates the libkrun VM configuration to use the NET_FLAG_DHCP_CLIENT flag and overrides the DNS server advertised by passt to use Podman's pasta DNS forwarder address. To ensure compatibility with older build environments and runtime versions of libkrun, it is recommended to provide a fallback definition for NET_FLAG_DHCP_CLIENT and handle fallback scenarios (such as catching -EINVAL) to retry the call without the flag.
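The fallback suggested above is a common pattern for optional flags. The sketch below is minimal and rests on stated assumptions. `EXAMPLE_NET_FLAG_DHCP_CLIENT` and its value are placeholders; the real `NET_FLAG_DHCP_CLIENT` comes from libkrun.h. The two stub functions only simulate an older and a newer runtime and are not libkrun's API.

```c
#include <errno.h>
#include <stdint.h>

/* Placeholder only: the real NET_FLAG_DHCP_CLIENT value must come from
 * libkrun.h; this name and value are made up for the sketch. */
#define EXAMPLE_NET_FLAG_DHCP_CLIENT 1u

/* Hypothetical signature of an "add network device" call taking flags. */
typedef int (*add_net_fn) (uint32_t flags);

/* Stub simulating an older runtime that rejects any flag with -EINVAL. */
int old_runtime_add_net (uint32_t flags)
{
  return flags != 0 ? -EINVAL : 0;
}

/* Stub simulating a runtime that understands the DHCP client flag. */
int new_runtime_add_net (uint32_t flags)
{
  return (flags & ~EXAMPLE_NET_FLAG_DHCP_CLIENT) != 0 ? -EINVAL : 0;
}

/* Try with the DHCP client flag; on -EINVAL, retry once without it.
 * On success, *used_flags records which flags were accepted. */
int add_net_with_fallback (add_net_fn add_net, uint32_t *used_flags)
{
  uint32_t flags = EXAMPLE_NET_FLAG_DHCP_CLIENT;
  int ret = add_net (flags);

  if (ret == -EINVAL)
    {
      flags = 0;               /* older runtime: retry without the flag */
      ret = add_net (flags);
    }
  if (ret == 0)
    *used_flags = flags;
  return ret;
}
```

Retrying only on -EINVAL keeps other errors from being silently masked by the fallback.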

Comment thread src/libcrun/handlers/krun.c
@packit-as-a-service

Ephemeral COPR build failed. @containers/packit-build please check.

dustymabe added 2 commits May 26, 2026 15:45
Remove the --no-dhcp-dns flag from the passt arguments. This flag
was added when passt support was first introduced because libkrun's
init did not yet have a DHCP client. Now that libkrun's init has a
DHCP client (containers/libkrun@1d8429c) that parses DNS servers
from DHCP option 6 and writes them to /etc/resolv.conf, the flag
should be removed so the guest gets working DNS resolution.

Assisted-by: <anthropic/claude-opus-4.6>
Signed-off-by: Dusty Mabe <dusty@dustymabe.com>

Pass --dns 169.254.1.1 to passt so it advertises Podman's pasta
DNS forwarder address via DHCP Option 6. Passt's default behavior
reads /etc/resolv.conf from its mount namespace to discover DNS
servers, but the container rootfs is passed to the VM via virtiofs
and is not available to passt. Without the override, passt falls
back to the host's DNS configuration, which may contain servers
that are not reachable from within the guest VM (e.g., a gateway
on the local subnet that cannot be reached through passt's NAT).

The 169.254.1.1 address is Podman's pasta DNS forwarder, which is
reachable from the guest VM through passt's NAT into the container
network namespace. This is the same address used by Podman's pasta
integration in containers/common (see podman-container-tools/podman@079bfb085a).

Assisted-by: <anthropic/claude-opus-4.6>
Signed-off-by: Dusty Mabe <dusty@dustymabe.com>
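The second commit relies on passt advertising 169.254.1.1 in DHCP Option 6. For reference, here is a minimal sketch of that option's wire format per RFC 2132: one code byte, one length byte, then 4 bytes per IPv4 address in network byte order. `encode_dhcp_dns_option` is a hypothetical helper, not passt's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode DHCP option 6 (Domain Name Server, RFC 2132) carrying the given
 * IPv4 addresses (host-order 32-bit values) into buf. Returns the number
 * of bytes written, or 0 if the list is empty, too long for the one-byte
 * length field (max 63 addresses), or does not fit in buf. */
size_t encode_dhcp_dns_option (const uint32_t *servers, size_t n,
                               uint8_t *buf, size_t buflen)
{
  if (n == 0 || n > 63 || buflen < 2 + 4 * n)
    return 0;

  buf[0] = 6;                    /* option code: DNS servers */
  buf[1] = (uint8_t) (4 * n);    /* payload length in bytes */
  for (size_t k = 0; k < n; k++)
    {
      uint8_t *p = buf + 2 + 4 * k;
      p[0] = (uint8_t) (servers[k] >> 24);   /* network byte order */
      p[1] = (uint8_t) (servers[k] >> 16);
      p[2] = (uint8_t) (servers[k] >> 8);
      p[3] = (uint8_t) servers[k];
    }
  return 2 + 4 * n;
}
```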
@dustymabe force-pushed the dusty-krun-net-and-dns branch from 112d741 to d0a5fea May 26, 2026 19:45
@dustymabe changed the title from "krun: enable DHCP client and tweak dns settings" to "krun: tweak dns settings" May 26, 2026
@dustymabe
Contributor Author

OK, it looks like #2093 was happening in parallel, so I've dropped the first commit from this PR.

@giuseppe
Member

@slp PTAL

@slp
Contributor

slp commented May 28, 2026

This is a step in the right direction, thanks @dustymabe. Ideally, we should also use the aardvark-dns instance as the DNS server, instead of the host's DNS, to ensure that different containers using the same network can find each other's hostnames (this works with TSI).

@sbrivio-rh any suggestions?

@sbrivio-rh

pass in --dns 169.254.1.1 to work around a problem where if the DNS server is on the local network it won't hit NAT and won't be forwarded (thus unreachable from inside the krun VM).

I'm not quite sure what the problem would be here. If the DNS server is on the local network, and it's not a loopback address, it will reach pasta anyway and be forwarded (with translation, if necessary) to the resolver pasta derives from the host configuration. What is the actual issue with that?

Ideally, we should also use the aardvark-dns instance as the DNS server, instead of the host's DNS, to ensure that different containers using the same network can find each other's hostnames (this works with TSI).

@sbrivio-rh any suggestions?

I didn't quite understand what problem exactly we're fixing with this. Dropping --no-dhcp-dns sounds like a good idea in any case, and if the DHCP client is copied from the one I originally wrote for muvm it should be enough for everything to work.

Regardless of that, it might be a good idea to do exactly what Podman does, for simplicity. It took us a long time to get that right, and if I understand correctly it should fit the krun situation as it is.

@slp
Contributor

slp commented May 28, 2026

@sbrivio-rh any suggestions?

I didn't quite understand what problem exactly we're fixing with this. Dropping --no-dhcp-dns sounds like a good idea in any case, and if the DHCP client is copied from the one I originally wrote for muvm it should be enough for everything to work.

The problem is, in this scenario, passt is using the host's DNS server instead of the aardvark-dns instance for the netns, meaning local container name resolution doesn't work. That is, if you have two containers with name test1 and test2 configured to use the same network (with --network demo), they can't resolve each other's hostnames.

@sbrivio-rh

The problem is, in this scenario, passt is using the host's DNS server instead of the aardvark-dns instance for the netns

Right, and you're suggesting that we should switch to the aardvark-dns resolver. But the current pull request doesn't do that, correct?

I'm still trying to understand what problem specifically a particular --dns option in the current pull request is solving.

@dustymabe
Contributor Author

I'm still trying to understand what problem specifically a particular --dns option in the current pull request is solving.

I can try to answer that. I am certainly no expert here and sometimes "networking" makes my head hurt, so there's that disclaimer.

The problem that I was having is that before the 2nd commit in this PR I couldn't reach the DNS server (which also happened to be my gateway for the local network, the router), but slotting in a public DNS server like 8.8.8.8 worked fine.

I was chatting through this problem with opencode/anthropic/opus4.6 and this is what it came up with:

Root cause

Passt operates in rootless mode by creating its own L2/L3 network stack. It does not have raw access to your local network — it can only make outbound connections from the host's network namespace using regular (unprivileged) sockets. For remote hosts like 8.8.8.8, passt creates a socket on the host and connects — this works because the host's kernel handles routing.

But 192.168.86.1 is your local gateway/router. The issue is how passt handles traffic to addresses on the local network. In rootless mode, passt translates guest traffic by creating host-side sockets. For DNS (UDP port 53), passt needs to create a UDP socket on the host and send to 192.168.86.1:53. This should work in principle — but there's a known interaction:

The container runs inside Podman's network namespace, not the host namespace. Passt is started by crun inside this namespace. When passt creates sockets to forward guest traffic, those sockets exist in the container's network namespace, which has its own network stack (provided by pasta/slirp4netns). From that namespace, 192.168.86.1 may not be directly routable — pasta provides internet access through NAT, but the container namespace doesn't have a direct L2 path to your LAN gateway.

My understanding of the 2nd commit (and believe me, there's a lot of hand-waving on my side) is that Podman makes the 169.254.1.1 address available inside the container namespace (i.e. not inside the VM, but in the shell outside of it), just as it does for rootless containers by default. That path is reachable for us because the traffic doesn't need to leave the container namespace; Podman has already set up the right mapping.

I can try to reproduce the full conversation I had with AI somewhere if that would be useful.

Additionally, I will note that the 2nd commit in this PR might not be the correct solution (and I appreciate the conversation trying to find the correct solution), but it has been working nicely for me for the past few days, and I've finally gotten to the workflow I want, described in containers/libkrunfw#128 (comment)

@sbrivio-rh

I can try to reproduce the full conversation I had with AI somewhere if that would be useful.

Thanks for the offer, but that part is what's confusing, because the very "assumptions" from that excerpt look somewhat plausible but they range from misleading ("For remote hosts like 8.8.8.8, passt creates a socket on the host and connects — this works because the host's kernel handles routing. But 192.168.86.1 is your local gateway/router [...]") to plain wrong ("This should work in principle — but there's a known interaction", "pasta provides internet access through NAT").

Anyway, I think I figured out what the problem is: by default (unless --no-map-gw is given), passt will map the address of the default gateway in the guest to the host itself, host-side. That would work fine and we'd forward DNS queries to the namespace where the guest is running, but, from there, we would have a further forwarding step (I think that's happening via pasta(1) here, like with regular Podman containers?) that can't work, because pasta (or whatever forwards traffic from the namespace) isn't configured to forward that traffic.

So, yes, something like what you did is needed. I'm not quite sure if that's sufficient to reach aardvark-dns, Podman networking people might help with that. Let me ask around.

@dustymabe
Contributor Author

Thanks @sbrivio-rh for helping me understand!

@slp
Contributor

slp commented May 29, 2026

I've been digging a bit more into the issue. Let's say we have a VM with this config:

/ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop qlen 1000
    link/ether fa:d6:d5:c1:68:28 brd ff:ff:ff:ff:ff:ff
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc pfifo_fast qlen 1000
    link/ether 5a:94:ef:e4:0c:ee brd ff:ff:ff:ff:ff:ff
    inet 10.89.2.167/24 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5894:efff:fee4:cee/64 scope link tentative
       valid_lft forever preferred_lft forever
/ # ip route
default via 10.89.2.1 dev eth0
10.89.2.0/24 dev eth0 scope link  src 10.89.2.167
/ # cat /etc/resolv.conf
search dns.podman .
nameserver 10.89.2.1

When the VM attempts to communicate with the DNS server at 10.89.2.1, passt translates the destination address to 127.0.0.1:

epoll_wait(3, [{events=EPOLLIN, data=0xc0b}], 8, 1000) = 1
recvfrom(12, "\0\0\0P\232U\232U\232UZ\224\357\344\f\356\10\0E\0\0B7\334@\0@\21\351u\nY"..., 8388608, MSG_DONTWAIT, NULL, NULL) = 168
socket(AF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_UDP) = 36355
setsockopt(36355, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(36355, SOL_IP, IP_RECVERR, [1], 4) = 0
setsockopt(36355, SOL_IP, IP_PKTINFO, [1], 4) = 0
bind(36355, {sa_family=AF_INET, sin_port=htons(36118), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
epoll_ctl(3, EPOLL_CTL_ADD, 36355, {events=EPOLLIN, data=0x7008e0306}) = 0
connect(36355, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
getsockname(36355, {sa_family=AF_INET, sin_port=htons(36118), sin_addr=inet_addr("127.0.0.1")}, [28 => 16]) = 0
setsockopt(36355, SOL_IP, IP_TTL, "@", 1) = 0
sendmmsg(36355, [{msg_hdr={msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=16, msg_iov=[{iov_base="Wf\1\0\0\1\0\0\0\0\0\0\6google\2es\3dns\6podma"..., iov_len=38}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, msg_len=38}, {msg_hdr={msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=16, msg_iov=[{iov_base="Y\236\1\0\0\1\0\0\0\0\0\0\6google\2es\3dns\6podma"..., iov_len=38}], msg_iovlen=1, msg_controllen=0, msg_flags=0}}], 2, MSG_NOSIGNAL) = 1
sendmmsg(36355, [{msg_hdr={msg_name={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, msg_namelen=16, msg_iov=[{iov_base="Y\236\1\0\0\1\0\0\0\0\0\0\6google\2es\3dns\6podma"..., iov_len=38}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, msg_len=38}], 1, MSG_NOSIGNAL) = 1
recvmsg(36355, {msg_namelen=28}, MSG_PEEK|MSG_DONTWAIT) = -1 ECONNREFUSED (Connection refused)

Changing /etc/resolv.conf to set nameserver to a random IP other than 10.89.2.1 makes passt actually attempt to connect to that IP instead of 127.0.0.1, so the translation only happens for that particular address (in this context).

I've tried starting passt with -D none, but it makes no difference.

This is the network configuration in the netns where passt is running:

[root@minis crun]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host proto kernel_lo
       valid_lft forever preferred_lft forever
2: eth0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 qdisc noqueue state UP group default qlen 1000
    link/ether 76:f2:08:45:cd:b4 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.89.2.167/24 brd 10.89.2.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::74f2:8ff:fe45:cdb4/64 scope link proto kernel_ll
       valid_lft forever preferred_lft forever
[root@minis crun]# ip route
default via 10.89.2.1 dev eth0 proto static metric 100
10.89.2.0/24 dev eth0 proto kernel scope link src 10.89.2.167

@sbrivio-rh any idea why passt is translating 10.89.2.1 to 127.0.0.1 for DNS traffic?

@sbrivio-rh

sbrivio-rh commented May 29, 2026

@sbrivio-rh any idea why passt is translating 10.89.2.1 to 127.0.0.1 for DNS traffic?

That's not specific to DNS traffic, it's the default behaviour (unless you pass --no-map-gw) for traffic directed to the address of the default gateway in the guest, which we translate to the host:

That was picked as an address to refer to "the host" mostly for KubeVirt's convenience, as they needed a valid, known address without explicit configuration, and usually the default gateway is not something you would try to reach directly as a connection destination from KubeVirt pods.

This is just the default, and it can be changed in two ways:

  • use --no-map-gw (Podman does this too, to avoid users inadvertently exposing host ports from containers). At that point, traffic to 10.89.2.1 will not be translated; it will be sent to 10.89.2.1, outside
  • use --dns-forward 10.89.2.1, so that passt knows that guest traffic directed to 10.89.2.1 might be DNS traffic which needs to be forwarded to the resolver (not necessarily the host) and will skip the default remapping I described above
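The rules described above can be summarized as a small model for DNS (UDP port 53) traffic. This is a simplified sketch, not passt's implementation. `struct dns_map_cfg`, its fields, and `model_dns_destination` are hypothetical. IPv6, non-DNS traffic and --map-guest-addr are ignored. With the defaults, the model reproduces the strace earlier in the thread, where a query to 10.89.2.1 ends up at 127.0.0.1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of how a guest DNS query's destination
 * is handled, per the options discussed above. Not passt's code.
 * All addresses are host-order IPv4 values. */
struct dns_map_cfg
{
  uint32_t gateway;      /* default gateway as seen by the guest */
  bool map_gw;           /* true unless --no-map-gw is given */
  uint32_t dns_forward;  /* --dns-forward address, 0 if unset */
  uint32_t resolver;     /* resolver passt would forward DNS queries to */
};

#define LOOPBACK_V4 0x7F000001u /* 127.0.0.1: "the host" in passt's netns */

/* Return the address a guest DNS query sent to dst would be delivered to. */
uint32_t model_dns_destination (const struct dns_map_cfg *cfg, uint32_t dst)
{
  if (cfg->dns_forward != 0 && dst == cfg->dns_forward)
    return cfg->resolver;      /* --dns-forward: hand off to the resolver */
  if (cfg->map_gw && dst == cfg->gateway)
    return LOOPBACK_V4;        /* default: gateway address means the host */
  return dst;                  /* otherwise sent to dst unchanged */
}
```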

I haven't completely thought this through but if you don't need / want an option for the guest to connect to the host, --no-map-gw is probably a good idea regardless of DNS matters.

If you want to use another address to refer to the host, by the way, you can use --map-guest-addr. Podman started using that in 5.3 (https://blog.podman.io/2024/10/podman-5-3-changes-for-improved-networking-experience-with-pasta/). If /etc/hosts is similar to the one from Podman, I guess krun might find it convenient as well.

@slp
Contributor

slp commented May 29, 2026

Adding --no-map-gw did the trick, indeed, thanks @sbrivio-rh !

So I think we need to:

  1. Drop the changes in this PR and add --no-map-gw instead, while keeping --no-dhcp-dns.
  2. Change libkrun's init/init.c to avoid overwriting /etc/resolv.conf if the DHCP server doesn't provide DNS information. I want to make this one part of 1.18.2.

This would allow us to preserve the /etc/resolv.conf generated by Podman, which is the only source of truth for DNS resolution.
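Point 2 of the plan could look roughly like this. It is a minimal sketch that assumes the DHCP client has already decoded option 6 into host-order IPv4 addresses. `maybe_write_resolv_conf` is a hypothetical name, and this is not libkrun's init/init.c code.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch of the proposed init behaviour: only rewrite
 * resolv.conf when the DHCP reply carried DNS servers. With n == 0 the
 * existing file (e.g. the one Podman generated) is left untouched.
 * Returns 0 on success (including "nothing to do"), -1 on I/O error. */
int maybe_write_resolv_conf (const char *path, const uint32_t *servers, int n)
{
  if (n <= 0)
    return 0;                  /* no DNS info from DHCP: keep current file */

  FILE *f = fopen (path, "w"); /* truncates the old contents */
  if (f == NULL)
    return -1;
  for (int k = 0; k < n; k++)
    fprintf (f, "nameserver %u.%u.%u.%u\n",
             (unsigned) ((servers[k] >> 24) & 0xFFu),
             (unsigned) ((servers[k] >> 16) & 0xFFu),
             (unsigned) ((servers[k] >> 8) & 0xFFu),
             (unsigned) (servers[k] & 0xFFu));
  return fclose (f) == 0 ? 0 : -1;
}
```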

@sbrivio-rh

Drop the changes in this PR and add --no-map-gw instead, while keeping --no-dhcp-dns.

I wonder if keeping --no-dhcp-dns is a good idea in terms of isolation: exposing /etc/resolv.conf directly to the guest might reveal more than what's strictly needed for the functionality (possibly making it easier to fingerprint the version of Podman or other tools).

If you isolate that over DHCP / NDP / DHCPv6, passt validates the information and tells the guest only what it needs over the network and in a simpler (binary) format, with no separate channel involved.
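On the guest side, the "simpler (binary) format" is a DHCP option. The sketch below is minimal: it extracts DNS servers from option 6 in the options area of a DHCP reply, following RFC 2132's code/length layout. `parse_dhcp_dns` is hypothetical and is not the DHCP client in libkrun or muvm. It also ignores option overloading and RFC 3396 long-option splitting.

```c
#include <stddef.h>
#include <stdint.h>

/* DHCP option codes (RFC 2132). */
enum { DHCP_OPT_PAD = 0, DHCP_OPT_DNS = 6, DHCP_OPT_END = 255 };

/* Walk a DHCP options area (the bytes after the magic cookie) and store
 * up to max IPv4 DNS servers from option 6 in out[], as host-order values
 * (169.254.1.1 -> 0xA9FE0101). Returns the number stored, or -1 if the
 * options are truncated or option 6 is not a multiple of 4 bytes. */
int parse_dhcp_dns (const uint8_t *opts, size_t len, uint32_t *out, int max)
{
  size_t i = 0;
  int n = 0;

  while (i < len)
    {
      uint8_t code = opts[i++];
      if (code == DHCP_OPT_PAD)
        continue;              /* pad: single byte, no length field */
      if (code == DHCP_OPT_END)
        break;                 /* end of options */
      if (i >= len)
        return -1;             /* length byte missing */
      size_t olen = opts[i++];
      if (olen > len - i)
        return -1;             /* option runs past the buffer */
      if (code == DHCP_OPT_DNS)
        {
          if (olen % 4 != 0)
            return -1;         /* option 6 is a list of 4-byte addresses */
          for (size_t j = 0; j < olen && n < max; j += 4)
            {
              const uint8_t *a = opts + i + j;
              out[n++] = ((uint32_t) a[0] << 24) | ((uint32_t) a[1] << 16)
                         | ((uint32_t) a[2] << 8) | a[3];
            }
        }
      i += olen;               /* skip to the next option */
    }
  return n;
}
```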

@slp
Contributor

slp commented May 29, 2026

Drop the changes in this PR and add --no-map-gw instead, while keeping --no-dhcp-dns.

I wonder if keeping --no-dhcp-dns is a good idea in terms of isolation: exposing /etc/resolv.conf directly to the guest might reveal more than what's strictly needed for the functionality (possibly making it easier to fingerprint the version of Podman or other tools).

If you isolate that over DHCP / NDP / DHCPv6, passt validates the information and tells the guest only what it needs over the network and in a simpler (binary) format, with no separate channel involved.

Both passt and libkrun are running within the container context, so the /etc/resolv.conf that's exposed is the one from the container, not the host's. That's also why --no-map-gw fits so well in this context since, at the same time, we also have a pasta instance managing the container's netns.

This is the same thing we do with TSI. We don't replace the container's boundaries with ours, we add our own on top of the container's.

@sbrivio-rh

Drop the changes in this PR and add --no-map-gw instead, while keeping --no-dhcp-dns.

I wonder if keeping --no-dhcp-dns is a good idea in terms of isolation: exposing /etc/resolv.conf directly to the guest might reveal more than what's strictly needed for the functionality (possibly making it easier to fingerprint the version of Podman or other tools).
If you isolate that over DHCP / NDP / DHCPv6, passt validates the information and tells the guest only what it needs over the network and in a simpler (binary) format, with no separate channel involved.

Both passt and libkrun are running within the container context, so the /etc/resolv.conf that's exposed is the one from the container, not the host's.

Yes, I get it, that's why I was mentioning possible fingerprinting of Podman's version (not of NetworkManager or whatever is configuring things on the host). I still feel like handling this over network protocols is a marginally more "isolated" way of doing things.

That's also why --no-map-gw fits so well in this context since, at the same time, we also have a pasta instance managing the container's netns.

Oh dear, another case of passt-in-pasta. 😄 I thought we only had that in our upstream tests.

This is the same thing we do with TSI. We don't replace the container's boundaries with ours, we add our own on top of the container's.

I see, it's just that with passt you could add a bit more boundaries than that. I'm not sure if it's worth it, perhaps in this case it's really not substantial.

@dustymabe
Contributor Author

Trying to track the conversation here (that is way over my head)... are you saying we want something like:

diff --git a/src/libcrun/handlers/krun.c b/src/libcrun/handlers/krun.c
index 5e1f3e54..34d309b1 100644
--- a/src/libcrun/handlers/krun.c
+++ b/src/libcrun/handlers/krun.c
@@ -567,7 +567,7 @@ libkrun_start_passt (void *cookie, libcrun_container_t *container)
 {
   struct krun_config *kconf = (struct krun_config *) cookie;
   pid_t pid;
-  char *passt_argv[9];
+  char *passt_argv[10];
   char fd_as_str[16];
   int use_passt;
   int argv_idx;
@@ -595,9 +595,13 @@ libkrun_start_passt (void *cookie, libcrun_container_t *container)
     {
       passt_argv[argv_idx++] = (char *) "-u";
       passt_argv[argv_idx++] = (char *) "all";
-      passt_argv[argv_idx++] = (char *) "--no-dhcp-dns";
     }
 
+  /* Please help me write a good comment here!!!
+   */
+  passt_argv[argv_idx++] = (char *) "--no-dhcp-dns";
+  passt_argv[argv_idx++] = (char *) "--no-map-gw";
+
   passt_argv[argv_idx++] = (char *) "--fd";
   passt_argv[argv_idx++] = fd_as_str;
   passt_argv[argv_idx] = NULL;

@sbrivio-rh

Trying to track the conversation here (that is way over my head)... are you saying we want something like:
[...]

@slp is suggesting something like that, yes.

I'm suggesting that we could drop --no-dhcp-dns and just add --no-map-gw but I haven't tested it or thought about possible integration issues or other consequences.
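The two proposals differ only in whether --no-dhcp-dns stays. The sketch below is minimal and is not crun's code. It builds the relevant part of passt's argument vector for either variant and checks capacity, including the NULL terminator, which the diff handles by hand when it grows passt_argv from 9 to 10 entries. `build_passt_argv` is hypothetical and "passt" as argv[0] is an assumption. The other arguments crun passes, such as the conditional `-u all`, are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: build the passt argument tail for either proposal.
 *   keep_no_dhcp_dns = true  -> --no-dhcp-dns + --no-map-gw (slp)
 *   keep_no_dhcp_dns = false -> --no-map-gw only (sbrivio-rh)
 * Returns argc (excluding the NULL terminator), or -1 if argv is too
 * small. On success argv[argc] is NULL. */
int build_passt_argv (char **argv, size_t cap, bool keep_no_dhcp_dns,
                      char *fd_as_str)
{
  const char *args[6];
  size_t n = 0;

  args[n++] = "passt";             /* assumed program name */
  if (keep_no_dhcp_dns)
    args[n++] = "--no-dhcp-dns";   /* passt sends no DNS servers via DHCP */
  args[n++] = "--no-map-gw";       /* don't remap the guest gateway to the host */
  args[n++] = "--fd";
  args[n++] = fd_as_str;

  if (n + 1 > cap)                 /* +1 for the NULL terminator */
    return -1;
  for (size_t i = 0; i < n; i++)
    argv[i] = (char *) args[i];
  argv[n] = NULL;
  return (int) n;
}
```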
