Using zram in live PXE causes services with PrivateDevices=yes to fail #1296

Closed
Nemric opened this issue Sep 14, 2022 · 22 comments

@Nemric

Nemric commented Sep 14, 2022

Describe the bug
With version 36.20220820.3.0, some services fail to start:

Fedora CoreOS 36.20220820.3.0
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/tag/coreos

[systemd]
Failed Units: 2
  systemd-userdbd.service
  systemd-userdbd.socket

Once booted and logged in, `systemctl start systemd-userdbd.service` works fine.

Reproduction steps
Boot a diskless Fedora CoreOS system via PXE.

Expected behavior
No failed units at startup

System details

  • Bare metal, diskless PXE
  • Fedora CoreOS latest stable 36.20220820.3.0

Ignition config
Untitled-1.zip

Additional information

2022-09-09 22:46:56 Mounted sys-fs-fuse-connections.mount - FUSE Control File System.
2022-09-09 22:46:56 Mounted sys-kernel-config.mount - Kernel Configuration File System.
2022-09-09 22:46:56 Finished systemd-sysctl.service - Apply Kernel Variables.
2022-09-09 22:46:56 Starting systemd-userdbd.service - User Database Manager...
2022-09-09 22:46:56 systemd-userdbd.service: Failed to set up mount namespacing: /run/systemd/unit-root/dev: Read-only file system
2022-09-09 22:46:56 systemd-userdbd.service: Failed at step NAMESPACE spawning /usr/lib/systemd/systemd-userdbd: Read-only file system
2022-09-09 22:46:56 systemd-userdbd.service: Main process exited, code=exited, status=226/NAMESPACE
2022-09-09 22:46:56 systemd-userdbd.service: Failed with result 'exit-code'.
2022-09-09 22:46:56 Failed to start systemd-userdbd.service - User Database Manager.
2022-09-09 22:46:56 Starting systemd-userdbd.service - User Database Manager...
2022-09-09 22:46:56 systemd-userdbd.service: Failed to set up mount namespacing: /run/systemd/unit-root/dev: Read-only file system
2022-09-09 22:46:56 systemd-userdbd.service: Failed at step NAMESPACE spawning /usr/lib/systemd/systemd-userdbd: Read-only file system
2022-09-09 22:46:56 systemd-userdbd.service: Main process exited, code=exited, status=226/NAMESPACE
2022-09-09 22:46:56 systemd-userdbd.service: Failed with result 'exit-code'.
2022-09-09 22:46:56 Failed to start systemd-userdbd.service - User Database Manager.

@cheese

cheese commented Nov 8, 2022

I ran into the same issue with a homemade custom live image.
Every service with PrivateDevices=yes fails to start with this error.

@cheese

cheese commented Nov 8, 2022

I found out that systemd requires a writable /tmp. I am building a CoreOS-like system from CentOS Stream 9, which does not enable tmp.mount by default. The issue was resolved after I explicitly enabled tmp.mount.

@dustymabe
Member

Thanks @cheese - @Nemric, sorry for not looking at this sooner. We're working through some backlog and trying to get back around to issues like this soon!

@lump

lump commented Nov 18, 2022

I worked around this for now by dropping in an /etc/systemd/system/systemd-userdbd.service.d/override.conf:

[Service]
PrivateTmp=no
PrivateDevices=no

@Nemric
Author

Nemric commented Nov 19, 2022

Yeah! Works great!

variant: fcos
version: 1.4.0

systemd:
  units:
    - name: systemd-userdbd.service
      dropins:
        - name: tempfix.conf
          contents: |
            #Temp fix for https://github.com/coreos/fedora-coreos-tracker/issues/1296
            [Service]
            PrivateTmp=no
            PrivateDevices=no

@dustymabe
Member

@jlebon thinks we should be able to catch something like this by adding a pxe-live-login testiso test.

@jlebon
Member

jlebon commented Nov 8, 2023

I can't reproduce this on a recent f38 dev build I have.

Could be something specific to your Ignition config, though it seems like there are quite a few things that get merged in. If you can still reproduce this, can you try to shrink it down to a minimal reproducer?

@dustymabe
Member

I wonder if this is somehow specific to "real hardware"?

@Nemric
Author

Nemric commented Nov 9, 2023

> I can't reproduce this on a recent f38 dev build I have.
>
> Could be something specific to your Ignition config, though it seems like there are quite a few things that get merged in. If you can still reproduce this, can you try to shrink it down to a minimal reproducer?

My bad, you're right: it works with a minimal config:

variant: fcos
version: 1.5.0

ignition:
  security:
    tls:
      certificate_authorities:
        - source: http://someserver:8080/ign/cert.crt
          verification:
            hash: sha256-xxx

  config:
    merge:
      - source: https://someserver:8443/ign/passwd_users_core_sshkeys.ign

storage:
  filesystems:
    - path: /var
      device: /dev/sda
      format: xfs
      label: Var
      wipe_filesystem: false
      with_mount_unit: true

  files:
    - path: /etc/hostname
      mode: 0644
      contents:
        inline: Curie

and this is the merged file at https://someserver:8443/ign/passwd_users_core_sshkeys.ign:

variant: fcos
version: 1.5.0

passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - ssh-ed25519 AAAACxxxx Emeric

So I'll try to find the "bad" merge!

@Nemric
Author

Nemric commented Nov 9, 2023

Got it! This merged file is the root cause:

variant: fcos
version: 1.5.0

storage:
  files:
    - path: /etc/systemd/zram-generator.conf
      mode: 0644
      contents:
        inline: |
          # This config file enables a /dev/zram0 device with the default settings
          [zram0]

@cheese @lump, are you using zram-generator? Anything else?
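Putting the trigger and the earlier workaround together, a minimal Butane sketch might look like the following. It combines the zram-generator configuration above with @lump's drop-in, in the same form Nemric used earlier. It is untested and assumes that relaxing `PrivateTmp`/`PrivateDevices` for `systemd-userdbd.service` is still sufficient when swap on zram is enabled. Other units with `PrivateDevices=yes` would need their own drop-ins.

```yaml
variant: fcos
version: 1.5.0

storage:
  files:
    # zram-generator config identified in this thread as the trigger:
    # creates /dev/zram0 with default settings
    - path: /etc/systemd/zram-generator.conf
      mode: 0644
      contents:
        inline: |
          # This config file enables a /dev/zram0 device with the default settings
          [zram0]

systemd:
  units:
    # Workaround drop-in from earlier in the thread; it only relaxes
    # sandboxing for systemd-userdbd.service, not for other units
    - name: systemd-userdbd.service
      dropins:
        - name: tempfix.conf
          contents: |
            [Service]
            PrivateTmp=no
            PrivateDevices=no
```

Removing the `systemd` section should, under the same assumptions, turn this into a minimal reproducer for a diskless PXE boot.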

@jlebon changed the title from "systemd-userdbd.service & socket failed to start on boot on diskless FCOS" to "Using zram in live PXE causes services with PrivateDevices=yes to fail" Nov 9, 2023
@jlebon
Member

jlebon commented Nov 9, 2023

Yes thanks, I can reproduce it now. I've updated the title.

@Nemric
Author

Nemric commented Feb 13, 2024

@jlebon @dustymabe @cheese @lump
On the next stream, systemd is at 254.9 and this issue is gone... 😃
Thanks all, and thanks to "pacho2" for the link.

@Nemric closed this as completed Feb 13, 2024
@dustymabe
Member

Was there an upstream fix in systemd?

@Nemric
Author

Nemric commented Feb 14, 2024

Yes, it could be this one: systemd/systemd#29343
Here is the issue I was following: systemd/systemd#30535

@dustymabe
Copy link
Member

ok I'll trust you guys :)

@dustymabe added the status/pending-testing-release and status/pending-next-release labels Feb 14, 2024
@dustymabe

This comment was marked as outdated.

@dustymabe

This comment was marked as off-topic.

@dustymabe added the status/pending-stable-release label and removed the status/pending-testing-release and status/pending-next-release labels Feb 14, 2024
@Nemric reopened this Feb 26, 2024
@Nemric
Author

Nemric commented Feb 26, 2024

@jlebon @dustymabe @cheese @lump
My bad, I was wrong: I forgot that I had disabled swap (and therefore zram) on this "next" stream server.
Sorry guys :(

I was waiting for systemd v254.9 and tested too quickly.

@dustymabe removed the status/pending-stable-release label Feb 26, 2024
@travier added the status/pending-upstream-release label Mar 14, 2024
@dustymabe
Member

@travier you marked this as pending-upstream-release. Do you know (or does anyone know) if there is a confirmed fix upstream?

@Nemric
Copy link
Author

Nemric commented Mar 27, 2024

I tried with "next" FCOS 40.20240322.1.0 (systemd 255.3, kernel 6.8), and now I'm sure 😉 that:

  • swap on zram is enabled
  • no workaround is used
  • systemd-userdbd started as expected

I don't know which change is the confirmed fix (F40, kernel 6.8, or systemd 255), but my first clues are:

> Yes, it could be this one: systemd/systemd#29343
> Here is the issue I was following: systemd/systemd#30535

Perhaps I closed this a bit too soon ^^

I'll refer to this issue during upcoming test days ;)

@dustymabe
Member

No worries. Thanks @Nemric. So that means we can probably just close this out then?

@Nemric
Author

Nemric commented Mar 28, 2024

Yes, of course. I'll do that after the test days and will follow the zram test results to definitively validate the "fix" 🤞
