default hostname now is fedora, used to be localhost #649

Closed
vrutkovs opened this issue Oct 20, 2020 · 30 comments · Fixed by coreos/fedora-coreos-config#868

Comments

@vrutkovs
Member

Booted 33.20201006.1.0 from next stream on GCP and default hostname is fedora:

[    2.041577] systemd[1]: No hostname configured.
[    2.042356] systemd[1]: Set hostname to <fedora>.

OKD-on-FCOS expects default hostname to be localhost (as on AWS).

See openshift/machine-config-operator#2160 (comment)

@bdurrow

bdurrow commented Oct 20, 2020

perhaps related? https://github.com/peckato1/fedora-systemd-package/commit/6eb8bcde288dda39b163e87ee0926f6f30fcad73

@bdurrow

bdurrow commented Oct 20, 2020

This is what I see in my boot logs:

Welcome to Fedora CoreOS 33.20201006.10.2 dracut-050-63.git20200529.fc33 (Initramfs)!

[    2.456683] systemd[1]: No hostname configured.
[    2.457970] systemd[1]: Set hostname to <fedora>.

According to the systemd code this indicates that systemd was compiled with FALLBACK_HOSTNAME set to fedora perhaps with -Dfallback-hostname=fedora
https://github.com/systemd-rhel/rhel-8/blob/45d093a37b6f8c2ceae9bfd090c5265f35413b46/src/core/hostname-setup.c#L42
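The log lines above are consistent with a simple fallback: when no hostname is configured, systemd uses whatever fallback hostname it was built with. The following is a minimal sketch of that decision, not systemd's actual code; `pick_boot_hostname` and its parameters are hypothetical names used only for illustration.

```python
from typing import Optional


def pick_boot_hostname(configured: Optional[str], fallback: str) -> str:
    """Return the configured hostname, or the build-time fallback if none is set."""
    name = (configured or "").strip()
    return name if name else fallback


# Same unconfigured machine, two builds that differ only in the fallback value.
print(pick_boot_hostname(None, "localhost"))  # build with fallback "localhost"
print(pick_boot_hostname(None, "fedora"))     # build with fallback "fedora"
```

The only input that differs between the two boots is the build-time fallback, which is why the change shows up on systems that never set a hostname.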

Prior to pivot the console log shows:


Welcome to Fedora CoreOS 32.20200923.3.0 dracut-050-61.git20200529.fc32 (Initramfs)!

[    2.417343] systemd[1]: No hostname configured.
[    2.418432] systemd[1]: Set hostname to <localhost>.

So it appears that this is baked into the image that pivot “pivots” to in this case:

[    0.000000] Command line: BOOT_IMAGE=(hd0,gpt1)/ostree/fedora-coreos-a370d0067f2cb442383e4725688a2bfb92a84f59ceb6cd9ead44826dcc53634a/vmlinuz-5.8.13-300.fc33.x86_64 systemd.unified_cgroup_hierarchy=0 console=tty0 console=ttyS0,115200n8 ostree=/ostree/boot.0/fedora-coreos/a370d0067f2cb442383e4725688a2bfb92a84f59ceb6cd9ead44826dcc53634a/0 ignition.platform.id=gcp root=UUID=b583abc4-1973-439f-8aa2-169f7f218dc2 rw rootflags=prjquota

@vrutkovs
Member Author

vrutkovs commented Oct 22, 2020

Ways to fix it:

  • Update systemd-resolved to use localhost. This might take time to propagate to the next stream, as it would be a Fedora-wide update. It would probably be pushed back, as it's important for GCP installs of OKD
  • Update it in FCOS. Again, not sure it's worth it
  • Update afterburn to replace fedora hostname with localhost

I'm not entirely sure which platforms are affected. So far AWS is known to work and it certainly breaks on GCP. Could we give it a try on all FCOS-supported platforms?

The afterburn fix seems the easiest to implement, but I'm not sure it's a good idea to clean up after systemd-resolved

@jlebon
Member

jlebon commented Oct 23, 2020

Maybe let's keep discussing in openshift/machine-config-operator#2160 how to proceed before doing anything here?

@lucab
Contributor

lucab commented Oct 26, 2020

I was a bit puzzled by this bug report, so I checked the behavior of a plain FCOS 33.x on GCP and I can see the hostname properly set from DHCP in initramfs:

NetworkManager[490]: <info>  [1603701253.4075] policy: set 'Wired Connection' (ens4) as default for IPv4 routing and DNS
NetworkManager[490]: <info>  [1603701253.4075] policy: set-hostname: set hostname to 'lbruno-temp.c.example.internal' (from DHCPv4)

This seems right to me, and it also matches the behavior I'd expect on other platforms.
How does OKD end up having fedora instead? Can we see the full initramfs logs to check what's going on?

@bdurrow

bdurrow commented Oct 26, 2020

Right, OKD disables DHCP on GCP:
https://github.com/openshift/machine-config-operator/blob/f2489433faacc32a7ead1caafcf076b943ffc9d8/templates/common/gcp/files/etc-networkmanager-conf.d-hostname.yaml#L12

If it weren't for that, I personally wouldn't have a problem with my deployments, but I have modified my installer to allow me to set my clusterdomain to the basedomain instead of having to have a four-segment domain name.

@lucab lucab changed the title from "next: default hostname is fedora on GCP, unlike other platforms" to "next: default hostname now is fedora, used to be localhost" Oct 28, 2020
@lucab
Contributor

lucab commented Oct 28, 2020

I've opened https://bugzilla.redhat.com/show_bug.cgi?id=1892235 for Fedora package owners to explore whether the branding sugar can be applied in a different place or in some other way which would be easier to control at provisioning time.

@dustymabe
Member

We discussed this in the community meeting today.

13:20:14  dustymabe | #info we'll try to work with systemd/NM to see if "better fixes" for this
                    | issue can be placed upstream in those projects. If agreeable we'll push
                    | for a short term hack in MCO, and try to get builds into Fedora soonish.
                    | If not agreeable then we'll probably go with supporting this in afterburn.

@lucab has started the discussion with systemd already in BZ1892235. The discussion with NM is upcoming.

@darkmuggle
Contributor

coreos/afterburn#512 addresses the setting of the default hostname in GCP

@darkmuggle
Contributor

Comment #649 (comment) does not address the immediate pain in GCP.

On GCP, the default hostname "fedora" is a symptom of a problem: the DHCP-received hostname is too long and NM refuses to set it. Afterburn can write the FQDN to /etc/hostname, and the MCO templates and the kubelet can all truncate the hostname accordingly. What we really need on FCOS on GCP is to make sure that the hostname is set.

@dustymabe
Member

> Comment #649 (comment) does not address the immediate pain in GCP.
>
> On GCP, the default hostname "fedora" is a symptom of a problem: the DHCP-received hostname is too long and NM refuses to set it.

Would having NM handle this case and truncate rather than refuse (as suggested in #649 (comment)) help?

> Afterburn can write the FQDN to /etc/hostname, and the MCO templates and the kubelet can all truncate the hostname accordingly. What we really need on FCOS on GCP is to make sure that the hostname is set.

Would a combination of coreos/afterburn#512 and coreos/afterburn#509 also alleviate the issue, then?

@bdurrow

bdurrow commented Nov 4, 2020

In my environment the hostname returned by GCP fits within the hostname length limit. It is still a problem there for OKD because NM DHCP is explicitly disabled by the machineconfig-provided Ignition configuration. It is relatively straightforward to modify NM's DHCP behavior to shorten the DHCP hostname to the desired short name. Based on prior discussion I believe we are not willing to go that route.

@dustymabe
Member

> Based on prior discussion I believe we are not willing to go that route.

Any more context there or a link to where the discussion happened?

@darkmuggle
Contributor

Ah right @bdurrow

> It is still a problem there for OKD because NM dhcp is explicitly disabled by the MCO provided ignition configuration.

This is true for the MCO. For FCOS, it is not the default. And there's a really good reason why this is disabled -- NM gets cantankerous when the hostname is set outside of NetworkManager and it will reset the hostname to the long hostname -- and that breaks OKD/OCP.

@jlebon
Member

jlebon commented Nov 4, 2020

We discussed this again in today's meeting:

#agreed we will ask NM whether they're willing to truncate the DHCP hostname in-tree,
        and if so we can work on carrying short-term script (or resort to afterburn
        logic for this) until that lands

@lucab will approach the NM developers about truncating the hostname in-tree. @cgwalters brought up that truncating might not be the right approach universally, but there's agreement that at least on GCP we do want it.

@lucab
Contributor

lucab commented Nov 5, 2020

> @cgwalters brought up that truncating might not be the right approach universally, but there's agreement that at least on GCP we do want it.

This was related to the fact that infrastructure orchestrators may generally need to maintain routable identities for nodes based on their FQDN. That is, the local hostname should not be maimed in a way that it cannot be matched anymore at infra-level name resolution.

However, if NM follows the path of systemd/systemd#7616, I think we are not going to break that property. This is based on the fact that each label in a DNS-valid FQDN can have at most 63 chars, which is smaller than HOST_NAME_MAX in Linux (64 chars). Thus, "truncate overlong names to the first dot or to HOST_NAME_MAX, whichever comes earlier" will not surprisingly truncate a label midway, even as part of long FQDNs, as a single label always fits into the kernel's maximum hostname length.
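The truncation rule quoted above can be sketched in a few lines. This is an illustration of the rule as stated in the comment, not NetworkManager's or systemd's implementation; `truncate_hostname` is a hypothetical helper, and 64 is the Linux `HOST_NAME_MAX` value.

```python
HOST_NAME_MAX = 64  # maximum hostname length accepted by the Linux kernel


def truncate_hostname(name: str, max_len: int = HOST_NAME_MAX) -> str:
    """Shorten an overlong name to its first label, or to max_len if that is shorter."""
    if len(name) <= max_len:
        return name  # already fits, keep the full name
    first_label = name.split(".", 1)[0]
    # A DNS-valid label is at most 63 chars, so this slice only matters
    # for names that are not valid DNS names in the first place.
    return first_label[:max_len]


long_fqdn = "node-0." + "a" * 63 + ".c.example.internal"
print(truncate_hostname("short.example.internal"))
print(truncate_hostname(long_fqdn))
```

Because a valid label never exceeds 63 characters, the cut always lands on a dot for valid FQDNs, which is the property the comment relies on.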

@lucab
Contributor

lucab commented Nov 5, 2020

> @lucab will approach the NM developers about truncating the hostname in-tree

Upstream ticket at https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/572. I did reproduce the "overlong DHCP hostname" case with a customized scenario on top of dnsmasq and qemu; full NM service logs are attached there.

@LorbusChris
Contributor

Just to note: This is a release blocker for OKD 4.6

@dustymabe
Member

New curve ball. The revert (rpm build) didn't solve the problem completely, so rather than cause churn, the revert was itself reverted.

This problem is extremely complex. There were two changes introduced in Fedora 33.

When we revert the fallback hostname change we now break through to attempt reverse DNS because the previous things failed:

  • static hostname
  • from dhcp
  • transient hostname (now localhost instead of fedora)

So that's good, now we're trying reverse DNS again like we were in Fedora 32.
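The fall-through described above can be sketched as a priority walk over the hostname sources. This is a simplified model of the behavior described in this comment, not NetworkManager's code; `choose_hostname` is a hypothetical helper, and treating `localhost` as "not set" is an assumption drawn from the description.

```python
from typing import Optional

# Names treated as "no real hostname". Assumption based on the comment above.
PLACEHOLDERS = {"", "localhost"}


def choose_hostname(static: Optional[str], dhcp: Optional[str],
                    transient: Optional[str], reverse_dns: Optional[str]) -> Optional[str]:
    """Walk the sources in priority order and return the first usable name."""
    for candidate in (static, dhcp, transient, reverse_dns):
        if candidate and candidate not in PLACEHOLDERS:
            return candidate
    return None


# With "fedora" as the transient name, reverse DNS is never consulted.
print(choose_hostname(None, None, "fedora", "node1.example.internal"))
# With "localhost" as the transient name, the walk reaches reverse DNS.
print(choose_hostname(None, None, "localhost", "node1.example.internal"))
```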

Now's where it gets tricky and new problems arise because of the switch to systemd-resolved being enabled.

The /etc/nsswitch.conf in Fedora 33 is now

hosts: files resolve [!UNAVAIL=return] myhostname dns

where in Fedora 32 it was:

hosts:      files dns myhostname

This means things will now go through nss-resolve, then nss-myhostname, and finally nss-dns. The good news is that if you have systemd-resolved enabled and reverse DNS succeeds to get an answer from an actual DNS server then the hostname will be set to the right thing. THIS IS PROGRESS over what we had before where reverse DNS was never tried.

However the new problems arise when either:

  • there is no answer to the reverse DNS query

In this case systemd-resolved's behavior is to return the fallback hostname for a reverse DNS query for the current system's IP. Now that we've changed the fallback the answer should be localhost... but wait.. systemd-resolved has a fallback for the fallback. It basically will never return localhost and chooses to return linux. So systemd-resolved's answer to the reverse DNS lookup is linux and our hostname will get set to linux.

  • the user is not using systemd-resolved

In this case the nsswitch.conf file falls through to the next entry after resolve, which is now myhostname, which answers localhost. So our hostname will be set to localhost even if using nss-dns would have given us a different answer.
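The two cases above follow from how the `hosts:` line is walked. Below is a toy model of that walk, assuming the answers each module gives are the ones described in this comment; it is not glibc's implementation, it handles only single-item actions such as `[!UNAVAIL=return]`, and `parse_hosts_line` and `lookup` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

Entry = Tuple[str, Dict[str, str]]   # (module name, {status spec: action})
Result = Tuple[str, Optional[str]]   # (NSS status, answer)


def parse_hosts_line(line: str) -> List[Entry]:
    """Parse an nsswitch.conf 'hosts:' line into (module, actions) pairs."""
    entries: List[Entry] = []
    for tok in line.split(":", 1)[1].split():
        if tok.startswith("[") and tok.endswith("]"):
            status, action = tok[1:-1].split("=")
            entries[-1][1][status.upper()] = action.lower()  # attach to previous module
        else:
            entries.append((tok, {}))
    return entries


def lookup(entries: List[Entry], modules: Dict[str, Result]) -> Optional[str]:
    """Return the answer of the first module whose status triggers 'return'."""
    for name, actions in entries:
        status, answer = modules.get(name, ("UNAVAIL", None))
        # Default NSS behavior: stop on SUCCESS, otherwise try the next module.
        action = "return" if status == "SUCCESS" else "continue"
        for spec, value in actions.items():
            negated = spec.startswith("!")
            if (negated and status != spec[1:]) or (not negated and status == spec):
                action = value
        if action == "return":
            return answer
    return None


F33 = parse_hosts_line("hosts: files resolve [!UNAVAIL=return] myhostname dns")
F32 = parse_hosts_line("hosts:      files dns myhostname")

# Case 1: resolved is running, no PTR record exists, resolved answers "linux".
print(lookup(F33, {"files": ("NOTFOUND", None), "resolve": ("SUCCESS", "linux")}))
# Case 2: resolved is not running, so myhostname answers before dns is reached.
no_resolved = {"files": ("NOTFOUND", None), "resolve": ("UNAVAIL", None),
               "myhostname": ("SUCCESS", "localhost"),
               "dns": ("SUCCESS", "node1.example.internal")}
print(lookup(F33, no_resolved))
print(lookup(F32, no_resolved))  # Fedora 32 ordering reaches dns first
```

In this model, moving `myhostname` back to the end of the line is what lets `dns` answer in the second case, which matches the reordering proposed in the comment.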

I think the most ideal thing to do in Fedora 33 is to:

  1. see if we can have systemd-resolved stop returning a fallback to the fallback
  2. reorder nsswitch.conf so that myhostname is lowest priority again

For after Fedora 33 we need to be smarter in NM about the glibc resolver and possible configurations there and also collaborate between systemd and NetworkManager on what this should look like in the future.

@dustymabe
Member

dustymabe commented Dec 10, 2020

I'll throw out the other options I see in addition to the one mentioned above.

The revert was reverted because:

    Revert the fallback hostname revert
    
    Sadly, this does not work.
    
    It seems NM queries resolved for the local IP address and gets "linux"
    and sets that as the transient hostname. Resolved has a "fallback hostname"
    (that will now again be "fedora"), but it also has a fallback fallback hostname
    that is "linux" that it used in reverse dns queries and such. NM gets
    the "linux" name and tells hostnamed to use that as the transient hostname.
    I don't think this is an improvement, since "linux" is as problematic
    as "fedora". So let's revert this for now to avoid pointless churn,
    until we figure out a real solution.

So theoretically completely unconfigured systems now have linux as their hostname. Not localhost, as intended by the revert of the fallback hostname change, nor fedora as was the case before the revert. We decided that because we got linux instead of localhost as intended, we'd not make the change for now.

The way I see it our options are to:

[1. IDEAL OPTION]

  • see if we can have systemd-resolved stop returning a fallback to the fallback
  • reorder nsswitch.conf so that myhostname is lowest priority again

[2. OK OPTION]

  • patch the systemd-resolved in fedora to use fedora instead of linux as its fallback to the fallback
  • reorder nsswitch.conf so that myhostname is lowest priority again

[3. WORST OPTION]

  • pick up systemd-246.7-1.fc33 (has the fallback hostname change fixed)
  • disable systemd-resolved
  • change back nsswitch.conf to the f32 settings

This option basically reverts everything to how it was in Fedora 32

@dustymabe
Member

dustymabe commented Dec 11, 2020

Here is another option:

[4. SLIGHTLY LESS WORSE OPTION]

  • pick up systemd-246.7-1.fc33 (has the fallback hostname change fixed)
  • neuter systemd-resolved but leave the systemd-resolved systemd unit enabled
    • drop in an override that sets DNSStubListener=no
      • this means the real DNS servers will get used
    • change back nsswitch.conf to the f32 settings

This option is not great either, but it is attractive because of what we've already done to testing and next. In those branches we've already run a migration script to update /etc/resolv.conf to point to /run/systemd/resolve/stub-resolv.conf. Rather than having to figure out how to reverse that, and given that we know we do want to go with systemd-resolved once this mess is figured out, we'll just continue to use systemd-resolved, but effectively have it non-functional. The only job systemd-resolved will do is create /run/systemd/resolve/resolv.conf and then symlink /run/systemd/resolve/stub-resolv.conf (which is pointed to by /etc/resolv.conf) to it.
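For illustration, the override mentioned in option 4 could look like the drop-in written by the sketch below. The `[Resolve]` section, the `DNSStubListener=` setting, and the `/etc/systemd/resolved.conf.d/` directory are standard systemd-resolved configuration; the file name and the `write_stub_listener_dropin` helper are hypothetical and not taken from the actual fedora-coreos-config change.

```python
import tempfile
from pathlib import Path

DROPIN_TEXT = "[Resolve]\nDNSStubListener=no\n"


def write_stub_listener_dropin(root: Path, name: str = "stub-listener.conf") -> Path:
    """Write a resolved.conf drop-in under <root>/etc/systemd/resolved.conf.d/."""
    dropin_dir = root / "etc" / "systemd" / "resolved.conf.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    path = dropin_dir / name  # hypothetical file name
    path.write_text(DROPIN_TEXT)
    return path


# Write into a scratch directory instead of the real /etc.
with tempfile.TemporaryDirectory() as scratch:
    dropin = write_stub_listener_dropin(Path(scratch))
    print(dropin.relative_to(scratch))
    print(dropin.read_text(), end="")
```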

dustymabe added a commit to dustymabe/fedora-coreos-config that referenced this issue Dec 15, 2020
This one is complicated, we need to revert the systemd change
in f33 that makes it fallback to `fedora` rather than `localhost`
for the hostname and we also need to essentially render systemd-resolved
ineffective because its internal fallbacks for reverse DNS make
it really hard to set the hostname via reverse DNS, which is
something we don't want to break and needs upstream work to
get more appropriate fixes in.

More context in coreos/fedora-coreos-tracker#649 (comment)
@dustymabe
Member

OK, an attempt at option 4 has been started over in coreos/fedora-coreos-config#780. Still pending some things that need to land first, but we're almost there.

dustymabe added a commit to coreos/fedora-coreos-config that referenced this issue Dec 15, 2020
This one is complicated, we need to revert the systemd change
in f33 that makes it fallback to `fedora` rather than `localhost`
for the hostname and we also need to essentially render systemd-resolved
ineffective because it's internal fallbacks for reverse DNS make
it really hard to set the hostname via reverse DNS, which is
something we don't want to break and needs upstream work to
get more appropriate fixes in.

More context in coreos/fedora-coreos-tracker#649 (comment)
dustymabe added a commit to dustymabe/fedora-coreos-config that referenced this issue Dec 15, 2020
This one is complicated, we need to revert the systemd change
in f33 that makes it fallback to `fedora` rather than `localhost`
for the hostname and we also need to essentially render systemd-resolved
ineffective because it's internal fallbacks for reverse DNS make
it really hard to set the hostname via reverse DNS, which is
something we don't want to break and needs upstream work to
get more appropriate fixes in.

More context in coreos/fedora-coreos-tracker#649 (comment)

(cherry picked from commit 969380b)
dustymabe added a commit to coreos/fedora-coreos-config that referenced this issue Dec 15, 2020
This one is complicated, we need to revert the systemd change
in f33 that makes it fallback to `fedora` rather than `localhost`
for the hostname and we also need to essentially render systemd-resolved
ineffective because it's internal fallbacks for reverse DNS make
it really hard to set the hostname via reverse DNS, which is
something we don't want to break and needs upstream work to
get more appropriate fixes in.

More context in coreos/fedora-coreos-tracker#649 (comment)

(cherry picked from commit 969380b)
@jlebon jlebon removed the meeting topics for meetings label Dec 16, 2020
vrutkovs added a commit to vrutkovs/okd-machine-os that referenced this issue Dec 21, 2020
After pivot systemd-resolved cannot reset /etc/resolv.conf symlink due to a SELinux issue (coreos/fedora-coreos-tracker#649). Make a link ourselves
vrutkovs added a commit to vrutkovs/okd-machine-os that referenced this issue Dec 21, 2020
After pivot systemd-resolved cannot reset /etc/resolv.conf symlink due to a SELinux issue (coreos/fedora-coreos-tracker#649). Make a link ourselves

Bring back DNSStubListener setting for systemd-resolved
@jlebon
Member

jlebon commented Feb 24, 2021

The latest on this is:

See also https://bugzilla.redhat.com/show_bug.cgi?id=1892235#c25 for more details.

But anyway, at least as far as this specific issue is concerned, let's consider it fixed by coreos/fedora-coreos-config#868.

@dustymabe
Member

followup in #834
