Default "fast boot" setting is confusing and causes services to be unreliable #24

Closed
DanTup opened this issue Mar 27, 2016 · 14 comments

Comments

@DanTup

DanTup commented Mar 27, 2016

I don't know if this is the right place for this, and I don't expect it to be changed, but I wanted to provide feedback in any case. I spent many hours yesterday evening and most of today trying to figure out why services were randomly failing at startup. They appeared to be starting during the DHCP process, when there was no DNS yet (postfix copies /etc/resolv.conf into its chroot while DHCP is still in progress, which results in a broken postfix that can't resolve any DNS).
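One way to check for the postfix symptom described above is to compare the system resolv.conf with the copy inside the chroot. This is a minimal sketch, not something from the original report: the function name `resolv_conf_stale` is hypothetical, and the default chroot path assumes the usual Debian queue directory of /var/spool/postfix.

```shell
#!/bin/sh
# Report whether a chroot copy of resolv.conf differs from the system one.
# Exit status 0 means "stale" (missing or different), 1 means identical.
# Both default paths are assumptions; check `postconf queue_directory`
# on your own system before relying on them.
resolv_conf_stale() {
    system_conf="${1:-/etc/resolv.conf}"
    chroot_conf="${2:-/var/spool/postfix/etc/resolv.conf}"

    # A missing chroot copy counts as stale.
    [ -f "$chroot_conf" ] || return 0

    # cmp -s exits 0 when the files are identical, so invert it.
    ! cmp -s "$system_conf" "$chroot_conf"
}
```

Usage, for example from a post-boot check: `if resolv_conf_stale; then echo "postfix chroot resolv.conf is out of date"; fi`.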

I eventually stumbled upon this magic setting:

[Image: screenshot 2016-03-27 at 21 57 11]

And suddenly everything was explained!

Since I can't script raspi-config, I've taken the code from here and added it to my setup script. This has fixed both postfix and all of the other DNS-related errors that fill syslog on a default Raspbian install.

I understand that people want faster boots, but to me it seems a little crazy to have a faster boot at the expense of working services and filling syslog with errors at boot. Maybe I'm the first person to spend > 10 hours debugging before finding this setting; maybe I'm not. It's frustrating in any case. I think the default should be the safe option, and if people want faster boots, they're probably already making tweaks and they can change this (and understand the implications).

@XECDesign
Member

Internally, there was quite a bit of back and forth about this and this is what was decided in the end. I haven't seen any complaints before this.

When booting with fast boot disabled and without a network connection, the system waits for DHCP to time out before continuing. This significantly increases boot time and there is no good workaround. It's really not ideal either way.

We could configure things such that dhcpcd tells systemd when the system is online and services which are required to be online don't start until that happens. However, many of the services which require being online don't specify that in their systemd units. Worse still, many services are still provided by sysvinit scripts, which results in circular dependencies. There's going to have to be some compromises until Debian fully migrates to properly configured systemd units.
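For a unit that does not declare its network dependency, the ordering described above can be added locally with a systemd drop-in. The following is a sketch under assumptions: the function name `write_online_dropin`, the file name `wait-online.conf`, and `example.service` are all hypothetical, and the drop-in only helps if the network management software actually delays network-online.target.

```shell
#!/bin/sh
# Write a systemd drop-in that orders a unit after network-online.target.
# Usage: write_online_dropin UNIT [BASE_DIR]
# BASE_DIR defaults to /etc/systemd/system. The file name wait-online.conf
# is an arbitrary (hypothetical) choice; any *.conf name in the .d
# directory is picked up by systemd.
write_online_dropin() {
    unit="$1"
    base_dir="${2:-/etc/systemd/system}"
    dropin_dir="$base_dir/$unit.d"

    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/wait-online.conf" <<'EOF'
[Unit]
Wants=network-online.target
After=network-online.target
EOF
}
```

Run as root, `write_online_dropin example.service` followed by `systemctl daemon-reload` would apply it. It does nothing for sysvinit-provided services, which is the circular-dependency case mentioned above.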

I don't think it really helps you, but jessie lite defaults to having fast boot disabled. If you don't need a desktop, maybe that's a better starting point for you.

@DanTup
Author

DanTup commented Mar 28, 2016

Thanks for the info!

> When booting with fast boot disabled and without a network connection, the system waits for DHCP to time out before continuing. This significantly increases boot time and there is no good workaround. It's really not ideal either way.

I can see how this would be annoying too, though (maybe because it's bitten me) I'd still value working over potential random failure. I don't know what's visible on the screen at this time, but if the last thing is "Waiting for DHCP..." or something similar, it would be really obvious why the boot is slow and easy to resolve? The problem with the current mechanism is that it's hard to understand what's going on (at least, for a noob like me; I spent so many hours Googling without finding an explanation... it was also hard to know whether it was a Pi-specific problem, a postfix-specific problem or more general issue).

> There's going to have to be some compromises until Debian fully migrates to properly configured systemd units.

I'm too much of a noob to really understand this; but if this means that it'll be better in future as a result of something that's already happening, that's definitely positive :)

> I don't think it really helps you, but jessie lite defaults to having fast boot disabled. If you don't need a desktop, maybe that's a better starting point for you.

I didn't realise there was such a thing, so I might take a look. The only part of the desktop I really want is actually just a (decent) fullscreen kiosk-mode browser, but sadly the only option for that (ChromiumOS) seems to come with other compromises (a really cut-back version of Linux).

The fix I have for now works and is in my setup script (which I'll always run, because it removes the pi user, changes SSH ports, etc.) so I might just stick with this.

I don't expect anything to change as a result of this post (especially now knowing about the timeout delay when no network connected) but maybe it'll show up in Google results for others hit by the same issue and save them some time :-)

@lurch

lurch commented Apr 1, 2016

@DanTup You can script raspi-config :-) See e.g. https://github.com/raspberrypi/rc_gui/blob/master/src/rc_gui.c#L64

@XECDesign I guess it would be nice if all the new raspi-config features got documented which would help prevent confusion like this, or at least give somewhere to point people to for further info.

@DanTup
Author

DanTup commented Apr 1, 2016

@lurch Hmmm, I'm confused - on my system, raspi-config seems to be a shell script. It has a bunch of script implemented at the end (for example, I've scripted expanding the filesystem) but not everything. Yet, I can see that's a C file and it does support more... What's the deal? :/

@lurch

lurch commented Apr 1, 2016

Apologies for the confusion. raspi-config is indeed a shell script, and provides a text-mode (console / terminal / ssh / uart) method of configuring your Raspbian system. It can also be used non-interactively.

rc_gui is a GUI wrapper around raspi-config, available via the Menu->Preferences->Raspberry Pi Configuration menu option on the Raspbian desktop. rc_gui does its 'actual' work by making use of the non-interactive mode of raspi-config. The link in my last message was just to show you how you'd "script" raspi-config to do the "Slow" network boot, i.e.

sudo raspi-config nonint do_wait_for_network Slow
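In a setup script, the command above could be wrapped so that a typo in the mode is caught before raspi-config is invoked. This is only a sketch: `set_network_wait` and the `RASPI_CONFIG` override are hypothetical names, "Slow" comes from the command above, and "Fast" is an assumed counterpart that may not match every raspi-config version.

```shell
#!/bin/sh
# Select the boot-time network wait mode via raspi-config's nonint mode.
# RASPI_CONFIG is a hypothetical override so the function can be dry-run
# (e.g. RASPI_CONFIG=echo) on a machine without raspi-config.
set_network_wait() {
    mode="$1"
    case "$mode" in
    Slow|Fast) ;;   # "Slow" is from the thread; "Fast" is an assumption
    *)
        echo "usage: set_network_wait Slow|Fast" >&2
        return 2
        ;;
    esac
    "${RASPI_CONFIG:-raspi-config}" nonint do_wait_for_network "$mode"
}
```

Called as `set_network_wait Slow` from a script that already runs as root, it has the same effect as the one-liner above.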

@DanTup
Author

DanTup commented Apr 1, 2016

Aha! So I went back and looked in the script to see why I thought it wasn't complete...

#
# Command line options for non-interactive use
#
for i in $*
do
  case $i in
  --memory-split)
    OPT_MEMORY_SPLIT=GET
    printf "Not currently supported\n"
    exit 1
    ;;
  --memory-split=*)
    OPT_MEMORY_SPLIT=`echo $i | sed 's/[-a-zA-Z0-9]*=//'`
    printf "Not currently supported\n"
    exit 1
    ;;
  --expand-rootfs)
    INTERACTIVE=False
    do_expand_rootfs
    printf "Please reboot\n"
    exit 0
    ;;
  --apply-os-config)
    INTERACTIVE=False
    do_apply_os_config
    exit $?
    ;;
  nonint)
    INTERACTIVE=False
    $@
    ;;
  *)
    # unknown option
    ;;
  esac
done

I guess I overlooked the nonint part. Not really sure why the first few have their own switches when you can just pass whatever you want! Thanks for the info, I can simplify my script! :)
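The `nonint)` branch works because `$@` expands to the whole command line, so the first word is run as a command with the rest as its arguments. A toy sketch of that dispatch pattern follows. The `nonint` helper and `do_greet` are assumptions for illustration (the excerpt above does not show how raspi-config defines `nonint`), and unlike the excerpt this version quotes `"$@"` and returns straight after dispatching.

```shell
#!/bin/sh
# Toy stand-in for one of raspi-config's do_* functions (hypothetical).
do_greet() {
    printf 'hello %s\n' "$1"
}

# Assumed helper: run the remaining arguments as a command.
nonint() {
    "$@"
}

# Scan the arguments; on seeing "nonint", run the full argument list.
# "$@" is then "nonint do_greet world", which calls nonint, which in
# turn calls do_greet with the argument "world".
dispatch() {
    for i in "$@"; do
        case "$i" in
        nonint)
            INTERACTIVE=False
            "$@"
            return $?
            ;;
        *)
            # unknown option, ignored as in the excerpt
            ;;
        esac
    done
}
```

For example, `dispatch nonint do_greet world` prints `hello world`, while `dispatch do_greet world` prints nothing because no argument matches `nonint`.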

@XECDesign
Member

Yes, the documentation needs to be updated and I've added that to the todo list.

@lurch

lurch commented Apr 1, 2016

> Not really sure why the first few have their own switches when you can just pass whatever you want!

Probably for backwards-compatibility reasons...

@DanTup
Author

DanTup commented Dec 3, 2017

FWIW, it seems like slow boot doesn't solve all issues. Last time my Pi rebooted, I ended up with the same issue - postfix unable to resolve DNS:

Dec 3 13:23:51 raspberrypi postfix/smtp[4787]: A08DA200FE: to=, orig_to=root@XXXXXXX, relay=none, delay=19803, delays=19803/0.06/0/0, dsn=4.4.3, status=deferred (Host or domain name not found. Name service error for name=aspmx.l.google.com type=MX: Host not found, try again)

I don't know whether this means slow boot isn't working correctly though :(

@XECDesign
Member

I feel like if we haven't done it since 2016, it's probably not happening.

@nickolay

@XECDesign

> (2016) There's going to have to be some compromises until Debian fully migrates to properly configured systemd units.
> [...]
> (2022) I feel like if we haven't done it since 2016, it's probably not happening.

This is really weird. I don't know about 2016, but current systemd has network-online.target, which is documented as "delay[ing] boot until the network management software says the network is “up” ... Usually means that all configured network devices are up and have an IP address assigned, but details may vary."

For systems using NetworkManager or systemd-networkd network-online.target is said to work out of the box. (At least it does on my Ubuntu laptop.) Could it be that the underlying issues have largely been solved and this can be reconsidered?

dhcpcd.service used by Raspbian is not even mentioned in systemd docs, so RPI users have to invent creative workarounds or switch away from dhcpcd altogether.

(raspi-config's 'slow wait for network connection' option changes how dhcpcd.service starts by adding -w ("Wait for an address to be assigned before forking to the background"), which somehow delays network-online.target as well. Before I found this issue, I used dhcpcd-online (sudo apt install dhcpcd-gtk + https://github.com/NetworkConfiguration/dhcpcd-ui/blob/master/src/dhcpcd-online/dhcpcd-wait-online.service.in) to achieve the same effect.)

@lurch

lurch commented May 14, 2024

> dhcpcd.service used by Raspbian is not even mentioned in systemd docs, so RPI users have to invent creative workarounds or switch away from dhcpcd altogether.

dhcpcd was already replaced by NetworkManager when we moved to Raspberry Pi OS Bookworm: https://www.raspberrypi.com/news/bookworm-the-new-version-of-raspberry-pi-os/

@XECDesign
Member

@nickolay Since the bookworm release, the landscape has changed significantly. While I haven't done any extensive testing to verify that this is the case, I would expect things to work out of the box as documented by systemd.

What issues do you believe need to be reconsidered?

Since we no longer use dhcpcd, this issue is just not relevant anymore, so if there are other reproducible issues related to network-online.target you'd like to flag up, open a separate issue and I'd be happy to take a look.

@nickolay

Sorry, I did my research on a bullseye system, and got confused by the fact that raspi-config's master branch still had the "slow boot" configuration of dhcpcd. I should have noticed the bookworm branch is way ahead and/or actually tested it first! Thanks for the quick correction.

4 participants