Skip to content

Configurable network name and address range#4711

Merged
vito merged 12 commits into
dagger:mainfrom
vito:fix-nested-networks
Mar 22, 2023
Merged

Configurable network name and address range#4711
vito merged 12 commits into
dagger:mainfrom
vito:fix-nested-networks

Conversation

@vito
Copy link
Copy Markdown
Contributor

@vito vito commented Mar 6, 2023

Follow-up to #4697

This might be a bit overkill, but it felt like a useful exercise just to get us away from global assumptions around the network stack.

  • The engine has two new flags --network-name (default dagger) and --network-cidr (default 10.87.0.0/16).
  • We now start dnsmasq on boot, rather than managing its state in the dnsname CNI plugin and leaving a window of time where /etc/resolv.conf points to an IP without dnsmasq listening yet.
  • We now configure /etc/resolv.conf by mounting over it, which should allow for interesting things like running a Dagger engine (or multiple?) on the same Linux host but wrapped in unshare -m so it doesn't mess with everything else's DNS. (Might try this on my Linux machine for testing purposes, just for fun.)
  • Container hostnames returned from the API are no longer affixed with the Dagger DNS domain (dagger.local). I'm 95% sure we don't actually need it, especially considering some environments (k8s) configure ndots:5, and it introduces a very tight coupling between the schema and the engine.

@vito vito requested a review from sipsma March 6, 2023 18:37
@vito vito force-pushed the fix-nested-networks branch from 0aa18b9 to 3667aed Compare March 6, 2023 19:20
@vito
Copy link
Copy Markdown
Contributor Author

vito commented Mar 6, 2023

Dang, looks like this didn't fix the network issue:

Shelving this for now, will get back to it sometime.

@vito vito force-pushed the fix-nested-networks branch 4 times, most recently from 5fbbd2d to 138f7fa Compare March 9, 2023 23:19
@vito
Copy link
Copy Markdown
Contributor Author

vito commented Mar 13, 2023

@sipsma (Low priority:) this seems ready to go whenever you have a moment to review. Ran the engine tests 5 times with services re-enabled for the remote cache test, no failures. Previously it would fail somewhat reliably (~> 50%).

@vito vito requested a review from aluzzardi March 13, 2023 17:14
@vito vito force-pushed the fix-nested-networks branch from 13d119e to 5550090 Compare March 15, 2023 15:49
vito added 7 commits March 21, 2023 16:37
Signed-off-by: Alex Suraci <alex@dagger.io>
note: leaving services network disabled since the actual issue still
isn't fixed, but this is likely needed regardless

Signed-off-by: Alex Suraci <alex@dagger.io>
Signed-off-by: Alex Suraci <alex@dagger.io>
Signed-off-by: Alex Suraci <alex@dagger.io>
Signed-off-by: Alex Suraci <alex@dagger.io>
should prevent "iptables: Resource temporarily unavailable"

all of these iptables calls used --wait, but it wasn't working because
they all had their own lock file.

Signed-off-by: Alex Suraci <alex@dagger.io>
still stuck on Docker 20.10.23 with Nixpkgs/NixOS. :( tried upgrading
locally but spooky things happened instead.

Signed-off-by: Alex Suraci <alex@dagger.io>
@vito vito force-pushed the fix-nested-networks branch from 5550090 to 7a7d48c Compare March 21, 2023 20:37
Comment thread network/resolvconf.go Outdated
Comment on lines 38 to 41
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: In most circumstances I don't believe you need to unmount in order to bind mount over something, you can just bind mount right over it and it will "stack". If we can avoid a umount call here I think that's preferable all else being equal because there's some just generally confusing+convoluted stuff around whether umount can block (and various flags like DETACH that may help but also add more complication).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - I had it this way because my original approach involved unmounting and writing to the file directly, but since we're mounting over it instead we can just let it stack. 👍

Comment thread network/resolvconf.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: for defense in depth may as well set this to have 0600 perms and/or mount the final file with MS_RDONLY

Comment thread network/dnsmasq.go Outdated
if err != nil {
return err
}
args := []string{"-i", iface, "-p", "udp", "-m", "udp", "--dport", "53", "-j", "ACCEPT"}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious why this is needed, is it just enforcing that dns packets have to be allowed on this interface? I would have assumed that was already the case pretty much all the time since blocking them would break a lot, but I'm probably missing something.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly not even sure myself. It was there in the original dnsname plugin code, confusingly tied to the dnsmasq.conf setup, so I kept it: https://github.com/dagger/dagger/pull/4711/files#diff-75e7590bb9ac9d43bb60200ec283d75ac9d28e5ade434a38ea8a6342b8f6956cL60

I can try removing it and see if anything breaks. 🤷‍♂️ Maybe the dnsname plugin had to run in certain places that needed this? It's been there since the original commit so there's no insightful 'blame' for it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also I totally had ChatGPT explain what this command did a while back, which it did pretty well!)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try removing it and see if anything breaks. 🤷‍♂️ Maybe the dnsname plugin had to run in certain places that needed this? It's been there since the original commit so there's no insightful 'blame' for it.

Cool yeah may as well cut the cargo culting cord at this point I suppose. The less manual fiddling with iptables the better probably

Comment thread network/cni.go Outdated
Comment on lines +59 to +67
pidFile, err := xdg.RuntimeFile("dagger/net/" + name + "/dnsmasq.pid")
if err != nil {
return nil, err
}

hostsFile, err := xdg.RuntimeFile("dagger/net/" + name + "/hosts")
if err != nil {
return nil, err
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can dedupe the definitions of these paths InstallDnsmasq

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done - refactored all of these paths into a separate file with docs for each.

Comment thread core/container.go
}
hostname := hostHash(digest)
payload.Hostname = fmt.Sprintf("%s.%s", hostname, network.DNSDomain)
payload.Hostname = hostname
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double checking my own understanding, the hostname no longer has .dagger.local, but when a process tries to resolve the hostname it automatically appends .dagger.local (or whatever setting the user has overridden) to the dns request because .dagger.local is marked in the resolv.conf as being the local domain? Or something like that?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually just don't need to use the domain at all now; it's not even configured in the /etc/resolv.conf we set up. Now that everything is routed directly to dnsmasq with no other servers listed in resolv.conf there should never be any ambiguity.

There might be a small benefit in including it and installing our domain into the search list, because it would allow us to skip the domain searching routine if the user's original /etc/resolv.conf has them configured (which we preserve), but even that benefit only applies as long as they don't also have options ndots:N set (k8s defaults to ndots:5). So it's a bit of a wash.

The benefit of not including it, on the other hand, is that we can keep core/ decoupled from the engine configuration, which definitely feels a lot nicer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh okay interesting, I got confused when I saw we still set something in the dnsmasq template related to the domain, but that all makes sense, thanks for the explanation!

Comment thread core/integration/remotecache_test.go
Signed-off-by: Erik Sipsma <erik@dagger.io>
Copy link
Copy Markdown
Contributor

@sipsma sipsma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the issue that was causing tests to fail: #4711 (comment)

All my other comments are nits and/or questions, so LGTM!

Comment thread network/resolvconf.go Outdated
return fmt.Errorf("replace nameservers: %w", err)
}

upstreamResolv, err := touchXDGFile("dagger/net/" + name + "/resolv.conf.upstream")
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized AppArmor isn't going to like this since now we'll have dynamic resolv.conf paths.

I guess I could move these to /etc/dnsmasq.d/? https://gitlab.com/apparmor/apparmor/-/blob/master/profiles/apparmor.d/usr.sbin.dnsmasq#L43

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went with paths specifically designated for the dnsname plugin that this PR vendors and cannibalizes. Hopefully they don't remove them, since dnsname is technically unmaintained now. All the more reason to have some sort of smoke test for this. 😬

vito added 4 commits March 21, 2023 18:14
don't actually need to umount, so might as well skip it

Signed-off-by: Alex Suraci <alex@dagger.io>
is this load-bearing? only one way to find out!

Signed-off-by: Alex Suraci <alex@dagger.io>
Signed-off-by: Alex Suraci <alex@dagger.io>
Signed-off-by: Alex Suraci <alex@dagger.io>
@vito vito merged commit 0fa00a1 into dagger:main Mar 22, 2023
@vito vito deleted the fix-nested-networks branch March 22, 2023 01:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants