nixos/networking: use one line per IP in /etc/hosts #119236

Draft
wants to merge 1 commit into master

alyssais
Member

@alyssais alyssais commented Apr 12, 2021

Motivation for this change

From hosts(5) (emphasis mine):

> For each host a *single* line should be present with the following
> information:

Prior to this change, my hosts file looked like this:

  127.0.0.1 localhost
  ::1 localhost
  127.0.0.2 atuin.qyliss.net atuin
  ::1 atuin.qyliss.net atuin

After this change, it looks like this:

  127.0.0.1 localhost
  ::1 localhost atuin.qyliss.net atuin
  127.0.0.2 atuin.qyliss.net atuin

Having multiple lines for the same IP breaks glibc's gethostbyaddr.
The easiest way to demonstrate this is with Python, but a simplified C
program is provided at the end of this message too.

$ python3 -c 'import socket; print(socket.gethostbyaddr("::1"))'
('localhost', [], ['::1'])

With this fix applied:

$ python3 -c 'import socket; print(socket.gethostbyaddr("::1"))'
('localhost', ['atuin.qyliss.net', 'atuin'], ['::1'])
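
The two outputs above are consistent with a reverse lookup that stops at the first line whose address matches. The following is a minimal sketch of that behaviour, not glibc's actual code; `reverse_lookup`, `BEFORE` and `AFTER` are illustrative names, and the first-match rule is an assumption inferred from the observed output.

```python
import ipaddress

# The two versions of /etc/hosts shown in the description.
BEFORE = """\
127.0.0.1 localhost
::1 localhost
127.0.0.2 atuin.qyliss.net atuin
::1 atuin.qyliss.net atuin
"""

AFTER = """\
127.0.0.1 localhost
::1 localhost atuin.qyliss.net atuin
127.0.0.2 atuin.qyliss.net atuin
"""


def reverse_lookup(hosts_text, address):
    """Return (canonical_name, aliases) from the first line matching address.

    Assumption: the lookup stops at the first matching line, so names on
    later lines for the same address are never seen.
    """
    wanted = ipaddress.ip_address(address)
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if len(fields) < 2:
            continue  # blank line, comment, or address without names
        if ipaddress.ip_address(fields[0]) == wanted:
            return fields[1], fields[2:]
    return None


print(reverse_lookup(BEFORE, "::1"))
print(reverse_lookup(AFTER, "::1"))
```

With the old file the second `::1` line is never reached, so the alias list is empty; with the new file both aliases come back alongside `localhost`.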

As a higher level example, socket.getfqdn() will return 'localhost'
without this change, and 'atuin.qyliss.net' with it. This was
responsible for my Mailman instance sending mail with @localhost in
the Message-Id.
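
The reason the alias list matters for socket.getfqdn() appears to be that CPython takes the result of gethostbyaddr() and returns the first name that contains a dot, falling back to the canonical name. The sketch below reproduces only that selection step; `pick_fqdn` is a hypothetical helper, not part of the socket module.

```python
def pick_fqdn(hostname, aliases):
    """Return the first dotted name among hostname and aliases, else hostname."""
    for name in [hostname, *aliases]:
        if "." in name:
            return name
    return hostname


# Inputs taken from the two gethostbyaddr("::1") results above.
print(pick_fqdn("localhost", []))
print(pick_fqdn("localhost", ["atuin.qyliss.net", "atuin"]))
```

With no aliases there is no dotted name to pick, which is how `localhost` ends up in the Message-Id.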

But! This exposes a problem. After this change, hostname -f will
return localhost. Worse, this won't be caught by the "hostname" NixOS
test, because that installs inetutils, which comes with its own hostname
implementation that will override the default one.

So I'm not sure what to do here. It's important we make this change so
that /etc/hosts is actually valid, but we need to make hostname -f work
as well.

Our options, as I see them, are:

  • Try to debug and fix hostname (the default one is from net-tools).
  • Switch to using inetutils for hostname.

Either way, we should make sure the hostname test actually uses the
hostname implementation that NixOS uses by default.
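
One quick way to see which implementation a given environment actually picks up is to resolve the command on PATH. This is only a sketch of such a check, assuming the store path name is enough to tell the packages apart; `resolve_command` is a hypothetical helper and is not part of the NixOS test driver.

```python
import os
import shutil


def resolve_command(name):
    """Return the fully resolved path of the first `name` on PATH, or None."""
    found = shutil.which(name)
    if found is None:
        return None
    # On NixOS the resolved path points into /nix/store, and the store
    # directory name usually reveals which package provides the binary.
    return os.path.realpath(found)


print(resolve_command("hostname"))
```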

Maybe there's something else? cc @primeos @flokli, who fixed hostname -f
before.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS Linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

Also CCing the people who helped me debug this: @puckipedia
@leahneukirchen.

@primeos
Member

primeos commented Apr 12, 2021

> Having multiple lines for the same IP breaks glibc's gethostbyaddr.
>
> $ python3 -c 'import socket; print(socket.gethostbyaddr("::1"))'
> ('localhost', [], ['::1'])

That is actually expected, see #76542:

> nixos/tests/hostname: Also check that 127.0.0.1 and ::1 still resolve back to localhost as this is apparently required by some applications (see c578924)

IIRC I didn't like the current behaviour either and I'm open to changing it but that obviously requires careful consideration as it could cause many regressions (#76542 already caused more regressions than expected :o).

> After this change, hostname -f will
> return localhost.

That should be fixable, e.g. if /etc/hosts looks like this (the canonical hostname / FQDN has to come first):

127.0.0.1 atuin.qyliss.net atuin localhost
::1 localhost atuin.qyliss.net atuin localhost

> Worse, this won't be caught by the "hostname" NixOS
> test, because that installs inetutils, which comes with its own hostname
> implementation that will override the default one.

Oh, that's bad :o

> So I'm not sure what to do here.

Yeah, unfortunately these things are pretty difficult, as the code and documentation around this are pretty outdated and it's easy to miss some consequences/regressions.

> It's important we make this change so
> that /etc/hosts is actually valid, but we need to make hostname -f work
> as well.

AFAIK /etc/hosts is already valid (just a bit strange), though IIRC I later discovered one hostname function that returned multiple matches due to it (maybe localhost and the FQDN for 127.0.0.1/::1? - not sure anymore).

C program:

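#include <err.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sysexits.h>

int main(void)
{
        // Build the IPv6 loopback address ::1 (all zero, last byte 1).
        struct in6_addr addr = { 0 };
        addr.s6_addr[sizeof addr.s6_addr - 1] = 1;

        // gethostbyaddr reports failure via h_errno rather than errno, so
        // use errx, which does not append strerror(errno) to the message.
        struct hostent *host = gethostbyaddr(&addr, sizeof addr, AF_INET6);
        if (!host)
                errx(EX_OSERR, "gethostbyaddr: %s", hstrerror(h_errno));

        printf("name: %s\n", host->h_name);

        // h_aliases is a NULL-terminated array; count its entries.
        size_t n;
        for (n = 0; host->h_aliases[n]; n++)
                ;

        printf("aliases (%zu):", n);

        for (size_t i = 0; i < n; i++)
                printf(" %s", host->h_aliases[i]);

        printf("\n");
}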
@alyssais
Member Author

alyssais commented Apr 12, 2021 via email

@primeos
Member

primeos commented Apr 12, 2021

> Maybe I'm missing something in #76542, but it seems to me that it's
> about making sure that ::1's canonical hostname is always localhost.

Yes, that's correct.

> I do not propose changing this. I just want the FQDN and hostname to show
> up as aliases.

Oh, sorry, I didn't read the second example carefully enough (('localhost', ['atuin.qyliss.net', 'atuin'], ['::1']) but I somehow only read atuin.qyliss.net) - my bad.

> Are you saying it's intentional that they don't?

Unfortunately yes. IIRC it's not possible to add the aliases in the second case without either breaking hostname -f or making ::1 stop resolving back to localhost. The only solution that I can see potentially working is ::1 localhost atuin.qyliss.net, i.e. with only the FQDN and not the hostname (atuin), because IIRC hostname -f uses gethostname() to get atuin and then determines the FQDN using gethostbyname("atuin"), i.e. gethostbyname(gethostname()).
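
The reasoning above can be sketched as a forward lookup over the hosts file. This is a deliberately simplified model: it ignores address families and assumes the first line listing the name wins, which is not necessarily how glibc's NSS lookup behaves in every case. `canonical_name` and the two sample files are illustrative names.

```python
def canonical_name(hosts_text, name):
    """Return the first name on the first line that lists `name`, or None.

    Assumption: address families are ignored and the first matching line
    wins; real NSS lookups are more involved.
    """
    for line in hosts_text.splitlines():
        names = line.split("#", 1)[0].split()[1:]  # skip the address field
        if name in names:
            return names[0]
    return None


# Hosts file as generated by this PR: ::1 carries both FQDN and hostname.
WITH_HOSTNAME_ALIAS = """\
127.0.0.1 localhost
::1 localhost atuin.qyliss.net atuin
127.0.0.2 atuin.qyliss.net atuin
"""

# Variant suggested above: ::1 carries only the FQDN as an alias.
FQDN_ALIAS_ONLY = """\
127.0.0.1 localhost
::1 localhost atuin.qyliss.net
127.0.0.2 atuin.qyliss.net atuin
"""

print(canonical_name(WITH_HOSTNAME_ALIAS, "atuin"))
print(canonical_name(FQDN_ALIAS_ONLY, "atuin"))
```

Under these assumptions the first file resolves `atuin` to `localhost`, matching the broken hostname -f, while the FQDN-only variant falls through to the 127.0.0.2 line and yields `atuin.qyliss.net`.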

> I read the man page I quoted earlier as saying that it's invalid, but
> there's definitely some ambiguity.

I'm reading "should" as a recommendation (whereas invalid = parsing fails, etc.), or the way RFCs use it (RFC 2119):

> 3. SHOULD   This word, or the adjective "RECOMMENDED", mean that there
>    may exist valid reasons in particular circumstances to ignore a
>    particular item, but the full implications must be understood and
>    carefully weighed before choosing a different course.

But I get what you mean, I'm just using/weighting (in)valid differently.

@alyssais
Member Author

alyssais commented Apr 14, 2021 via email

@primeos
Member

primeos commented Apr 16, 2021

On my system hostname is already from the GNU inetutils [0] but I've installed it manually so that could explain it. I haven't looked into other implementations and I'm not sure if it would be viable to switch.

[0]: I just realized that we still haven't updated to the new 2.0 release. The previous one (1.9.4) is from the end of 2011. (cc @matthewbauer)

@deviant
Member

deviant commented Jun 1, 2021

The current state of things is very headache-inducing. I have a couple of VPSs, both under the same domain:

v@host1 ~> uname -n
host1
v@host1 ~> cat /etc/hostname
host1
v@host1 ~> cat /etc/hosts
127.0.0.1 localhost
::1 localhost
127.0.0.2 host1.example.com host1
::1 host1.example.com host1
v@host1 ~> hostname
host1
v@host1 ~> hostname -f
host1.example.com
v@host1 ~> nixos-option networking.hostName networking.domain | grep -A1 Value
Value:
"host1"
--
Value:
"example.com"

Here's what Python thinks is going on:

>>> import socket
>>> socket.gethostname()
'host1'
>>> socket.getfqdn()
'localhost'
>>> socket.getfqdn('host1')
'localhost'
>>> socket.getfqdn('host1.example.com')
'localhost'
>>> socket.getfqdn('example.com')
'host1'
>>> socket.getfqdn('host2.example.com')
'host2.example.com'
>>> socket.getfqdn('nixos.org')
'nixos.org'

This violates several assumptions, namely:

  • that getfqdn() returns the FQDN of the current machine
  • that the FQDN of another machine in the same domain shares a suffix with the FQDN of the current machine
  • that getfqdn(fqdn) returns its identity
  • and I'm honestly not sure what's going on with getfqdn on the apex. that's just plain weird.
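
Two of these expectations are easy to turn into a check. The sketch below is hypothetical: `fqdn_violations` is an illustrative helper, and instead of a live lookup it replays the answers from the host1 session above through a stand-in resolver.

```python
import socket


def fqdn_violations(hostname, domain, getfqdn=socket.getfqdn):
    """List which of the checked expectations fail for the given resolver."""
    expected = f"{hostname}.{domain}"
    problems = []
    own = getfqdn()
    if own != expected:
        problems.append(f"getfqdn() returned {own!r}, not {expected!r}")
    if getfqdn(expected) != expected:
        problems.append(f"getfqdn({expected!r}) is not the identity")
    return problems


# Replay of the host1 session above instead of a live lookup.
HOST1_ANSWERS = {"": "localhost", "host1.example.com": "localhost"}


def host1_getfqdn(name=""):
    """Stand-in resolver returning the recorded host1 answers."""
    return HOST1_ANSWERS.get(name, name)


for problem in fqdn_violations("host1", "example.com", host1_getfqdn):
    print(problem)
```

Calling `fqdn_violations("host1", "example.com")` without the third argument would run the same checks against the real resolver on the machine.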

Here are the results when run on the second machine:

>>> import socket
>>> socket.gethostname()
'host2'
>>> socket.getfqdn('host1')
'host1.example.com'
>>> socket.getfqdn('host1.example.com')
'host1.example.com'
>>> socket.getfqdn('example.com')
'host1.example.com'
>>> socket.getfqdn('host2.example.com')
'localhost'

And here's my laptop's perspective:

>>> import socket
>>> socket.gethostname()
'laptop'
>>> socket.getfqdn('host1.example.com')
'host1.example.com'
>>> socket.getfqdn('example.com')
'host1.example.com'
>>> socket.getfqdn('host2.example.com')
'host2.example.com'

All three of these should be giving the same results.

@flokli
Contributor

flokli commented Jun 1, 2021

I'm not quite sure what the state of this PR is, considering its draft status and #119236 (comment).

@alyssais is there a need to switch to another hostname implementation?

@deviant
Member

deviant commented Jun 2, 2021

FWIW, I was able to get this working correctly by removing /etc/hosts entirely:

{ lib, ... }:
{
  # mkForce lives in lib; the empty list overrides the default hosts files.
  networking.hostFiles = lib.mkForce [];
}

Is there a good reason why we're even filling /etc/hosts, given that there's nss-myhostname, which Does The Right Thing?

A quirk I ran into with this setup is that PTR queries for the machine's public IP address return only the hostname, but I think this might be a bug with networkd, and it's trivially fixable by adding the FQDN to hostFiles.

@alyssais
Member Author

alyssais commented Jun 3, 2021 via email

@flokli
Contributor

flokli commented Jun 3, 2021

> Is there a good reason why we're even filling /etc/hosts, given that there's nss-myhostname, which Does The Right Thing?

nss-myhostname only applies on systems with nscd enabled, and there are valid reasons not to have it enabled. In that case, /etc/hosts provides a fallback.

Also, there are some Go binaries that don't use NSS at all but parse /etc/hosts directly. I'd think having an empty /etc/hosts is way more invasive than changing our hostname implementation.

@flokli
Contributor

flokli commented Aug 4, 2021

Seems with resolved enabled, hostname --fqdn also broke: #132646

@stale

stale bot commented Apr 19, 2022

I marked this as stale due to inactivity.

stale bot added the "2.status: stale" label Apr 19, 2022