DNS Resolution Issues #25

Closed
benalexau opened this issue Feb 13, 2021 · 8 comments

@benalexau

Thanks for adding support for unlocking using a Tang server specified using a DNS-resolvable hostname address (issue #19).

I have now tested this with a newly-built server as follows:

  • Arch Linux x86_64
  • Booster package 0.2-1 from official Arch repo
  • DHCP address reservation of 192.168.110.104/24
  • DHCP provides gateway address 192.168.110.1
  • DHCP provides DNS address 192.168.250.1
  • Ethernet port is eno1
  • Unused ethernet port enp1s0
  • DNS server resolves http://the.dns.name to a Tang server on same subnet
  • Clevis keyslot 1 configured with a http://the.dns.name address (ie hostname, not numeric IP)

The /etc/booster.yaml contains two lines:

network:
  dhcp: on

A forced rebuild was performed using booster -force -output /boot/booster-linux.img.

The boot failed with the following:

Post "http://the.dns.name/rec/ABCDetc": dial tcp: lookup the.dns.name on [::1]:53: connect: cannot assign requested address
Post "http://the.dns.name/rec/ABCDetc": dial tcp: lookup the.dns.name on [::1]:53: connect: cannot assign requested address
Post "http://the.dns.name/rec/ABCDetc": dial tcp: lookup the.dns.name on [::1]:53: connect: cannot assign requested address
Post "http://the.dns.name/rec/ABCDetc": dial tcp: lookup the.dns.name on [::1]:53: connect: network is unreachable
Post "http://the.dns.name/rec/ABCDetc": dial tcp: lookup the.dns.name on [::1]:53: connect: network is unreachable
Post "http://the.dns.name/rec/ABCDetc": dial tcp: lookup the.dns.name on 192.168.250.1:53: dial udp 192.168.250.1:53: connect: network is unreachable
Post "http://the.dns.name/rec/ABCDetc": dial tcp: lookup the.dns.name on 192.168.250.1:53: dial udp 192.168.250.1:53: connect: network is unreachable
.... more messages as above.....
Enter passphrase for cryptroot: unable to initialize network interface eth0: DHCP: no ACK received

I attempted to provide a static network configuration as follows (and of course rebuilt the image):

network:
  dhcp: off
  ip: 192.168.110.104/24
  gateway: 192.168.110.1
  dns_servers: 192.168.250.1

On this occasion I receive:

unable to initialize network interface eth1: file exists
Post "http://the.dns.name/rec/ABCDetc": dial tcp: lookup the.dns.name on 192.168.250.1:53: read udp 192.168.110.104:53967->192.168.250.1:53: i/o timeout
Post "http://the.dns.name/rec/ABCDetc": dial tcp: lookup the.dns.name on 192.168.250.1:53: read udp 192.168.110.104:37372->192.168.250.1:53: i/o timeout
.... more messages as above.....

The above blocked the boot for some minutes while waiting for the I/O timeouts to pass. It might be desirable to use a different timeout approach (eg abandon after 30 seconds).

Thinking it is perhaps an issue that the DNS server is on a different subnet than the server's IP address, I enabled DNS resolution on 192.168.110.1 and set DHCP to return that. After booting I confirmed:

$ resolvectl status
Global
           Protocols: +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported                                
    resolv.conf mode: foreign                                                                       
Fallback DNS Servers: 1.1.1.1 9.9.9.10 8.8.8.8 2606:4700:4700::1111 2620:fe::10 2001:4860:4860::8888

Link 2 (enp1s0)
Current Scopes: none                                                        
     Protocols: -DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported

Link 3 (eno1)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6                                   
         Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 192.168.110.1                                               
       DNS Servers: 192.168.110.1

I then verified the internal DNS address of the Tang server resolves correctly via a ping. This was done to rule out any firewall, routing or DNS server issues.

I then edited the dns_servers: to 192.168.110.1 (ie maintaining a static IP configuration), rebuilt and rebooted:

unable to initialize network interface eth1: file exists
Post "http://the.dns.name/rec/ABCDetc": dial tcp: lookup the.dns.name on 192.168.110.1:53: read udp 192.168.110.104:57918->192.168.110.1:53: i/o timeout
Post "http://the.dns.name/rec/ABCDetc": dial tcp: lookup the.dns.name on 192.168.110.1:53: read udp 192.168.110.104:45474->192.168.110.1:53: i/o timeout
.... more messages as above.....

As shown it still didn't work despite a completely static network configuration in booster.yaml and the DNS server being on the same subnet.

I then switched Booster back to the minimal /etc/booster.yaml:

network:
  dhcp: on

The server then booted without a problem (ie DHCP assignment of a DNS server on the same subnet).

I then modified the DHCP server to return DNS server 192.168.250.1 (like we started with) and rebooted. This failed with the same messages as seen originally. When I changed the DHCP server to again return DNS server 192.168.110.1 and rebooted, the server booted fine once again.

In conclusion, DNS resolution currently appears to require two conditions:

  1. The DNS server is on the same subnet as the booting node; and
  2. The booting node acquires its address information over DHCP (not from /etc/booster.yaml)

I'm happy to help with testing an updated package if you wish.

@anatol
Owner

anatol commented Feb 14, 2021

Hi @benalexau thank you for this great high-quality issue report.

I actually never tried a configuration with 2 ethernet interfaces.

> the DNS server is on a different subnet than the server's IP address

I do not think it should be a problem. Requests to DNS should be handled the same way as any other IP endpoint. Any non-subnet IP packets should be routed to the default gateway.
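To illustrate the routing point with the addresses from the report, here is a small sketch using only the Go standard library. needsGateway is a hypothetical helper, not part of booster: it only checks whether a destination falls outside the interface's subnet, which is the case where the kernel needs a default route.

```go
package main

import (
	"fmt"
	"net"
)

// needsGateway reports whether dst lies outside the subnet described by
// localCIDR (the interface address in CIDR notation). Such traffic can only
// be delivered through a gateway.
func needsGateway(localCIDR, dst string) (bool, error) {
	_, subnet, err := net.ParseCIDR(localCIDR)
	if err != nil {
		return false, err
	}
	ip := net.ParseIP(dst)
	if ip == nil {
		return false, fmt.Errorf("invalid IP address %q", dst)
	}
	return !subnet.Contains(ip), nil
}

func main() {
	for _, dns := range []string{"192.168.250.1", "192.168.110.1"} {
		viaGw, err := needsGateway("192.168.110.104/24", dns)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s needs gateway: %v\n", dns, viaGw)
	}
	// Prints:
	// 192.168.250.1 needs gateway: true
	// 192.168.110.1 needs gateway: false
}
```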

> Unused ethernet port enp1s0

Having inactive ethernet ports is definitely an issue for the booster network init code. Currently booster listens for udev events, and once it receives an ADD event for a net device it starts initializing it. If it is unable to initialize the interface, booster returns an error, which is what we see in "eth0: DHCP: no ACK received".

Booster should have a way to specify active interface(s). The other interfaces should be ignored.

Note that in your case the system sees the so-called predictable network names, while booster sees the raw interface names (before the systemd rename).

A quick test/fix for this part would be to modify booster's udevListener() loop so that it filters out the inactive interface (eth0):

// Skip the inactive interface instead of trying to initialize it.
if iface == "eth0" {
	continue
}
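The hard-coded check above could be generalized to a list of interfaces to enable. The following is only a sketch: shouldInit and the allowlist are hypothetical names, and the entries would have to be the raw kernel names booster sees (eth0, eth1), not the predictable names used after boot.

```go
package main

import "fmt"

// shouldInit reports whether the interface called name should be initialized.
// An empty allowlist keeps the current behaviour: every interface is tried.
func shouldInit(name string, allowed []string) bool {
	if len(allowed) == 0 {
		return true
	}
	for _, a := range allowed {
		if a == name {
			return true
		}
	}
	return false
}

func main() {
	allowed := []string{"eth1"} // assumed to come from the config file
	for _, iface := range []string{"eth0", "eth1"} {
		if !shouldInit(iface, allowed) {
			fmt.Println("skipping", iface)
			continue
		}
		fmt.Println("initializing", iface)
	}
}
```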

I am travelling next week and my response will be slow. Feel free to modify the sources as mentioned above. Otherwise I will try it once I am back to my workstation.

@anatol
Owner

anatol commented Feb 14, 2021

One way to debug the issue and make sure that it will not come back is to have an integration test to cover this use-case.

Is there a way to reproduce the setup you have (2 ethernet ports + DNS outside of subnet) with QEMU?

@anatol
Owner

anatol commented Feb 14, 2021

Actually, reading booster's DHCP handling code, I realized that this codepath does not set the gateway. So yes, having the DNS server outside of the subnet is an issue, and it should be fixed.

Here is how the default route is initialized with a static configuration. runDhcp() should do something similar.

defaultRoute := netlink.Route{Gw: gw}
if err := netlink.RouteAdd(&defaultRoute); err != nil {
   return err
}
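For the DHCP path, the gateway would come from the Router option (option 3 in RFC 2132), whose value is a list of 4-byte IPv4 addresses in order of preference. The sketch below assumes the raw option bytes are available from whatever DHCP client code is in use. routersFromOption is a hypothetical helper, not booster's actual code; its first result is what would be passed as Gw in the snippet above.

```go
package main

import (
	"fmt"
	"net"
)

// routersFromOption decodes the value of DHCP option 3 (Router): one or more
// IPv4 addresses, 4 bytes each, listed in order of preference.
func routersFromOption(data []byte) ([]net.IP, error) {
	if len(data) == 0 || len(data)%net.IPv4len != 0 {
		return nil, fmt.Errorf("router option: invalid length %d", len(data))
	}
	routers := make([]net.IP, 0, len(data)/net.IPv4len)
	for i := 0; i < len(data); i += net.IPv4len {
		routers = append(routers, net.IPv4(data[i], data[i+1], data[i+2], data[i+3]))
	}
	return routers, nil
}

func main() {
	// Example payload carrying the gateway from the report above.
	routers, err := routersFromOption([]byte{192, 168, 110, 1})
	if err != nil {
		panic(err)
	}
	fmt.Println("default gateway:", routers[0]) // default gateway: 192.168.110.1
}
```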

anatol added a commit that referenced this issue Feb 27, 2021
If the DNS/Tang server is in a subnet that differs from the interface IP
then a default gateway is required. Static network configuration was able
to configure the gateway but the DHCP codepath missed it.

Issue #25
@anatol
Owner

anatol commented Feb 27, 2021

@benalexau the first issue (DNS outside of subnet) should be fixed now. PTAL. I do not have an integration test for this use-case unfortunately. I need qemu to create 2 network interfaces with DHCP and DNS outside of subnet. If you or someone else knows how to do it please share your knowledge.

As for the second issue (handling multiple network interfaces), I need to think about how to implement it correctly. booster needs a config option that specifies which interfaces should be enabled. One caveat is that Linux can rename interfaces, so we need a reliable way to identify them. Note that the network MAC address might be changed as well.

@anatol anatol closed this as completed in 6bb6b6b Mar 6, 2021
@benalexau
Author

I just ran a test as follows:

  • Server @ 192.168.110.101/24 running AUR:booster-git 0.2.r13.gd78ff6a-1
  • DNS @ 192.168.1.1/24 assigned by DHCP
  • Tang @ 192.168.50.105/24 resolved by DNS

So we have Booster receiving a DHCP address on one subnet, talking to a DNS server on a different subnet, and then communicating with a Tang server on a third subnet. This all worked fine!

Thanks @anatol!

anatol added a commit that referenced this issue Mar 6, 2021
If a machine contains multiple network interfaces it is desirable to specify
which one can be used at boot (e.g. for network binding).
And disable the rest of the networks to avoid messing with network initialization.

Add property 'interfaces:' that tells booster which network interface to enable.

Closes #25
@anatol
Owner

anatol commented Mar 6, 2021

It sounds great @benalexau

The second part of the issue (handling multiple conflicting interfaces) should be resolved now as well. Please take a look and let me know if you see any issues.

@benalexau
Author

@anatol I tested AUR:booster-git 0.3.r20.g764b6ab-1 on the above server and it did not give any errors or unusual messages during boot. I think it's working well.

@anatol
Owner

anatol commented Mar 7, 2021

Great to hear it. Thank you for testing.
