Don't cause system to hang on halt/reboot when net-booting #1776
Many thanks for this - yes, this has been a known issue for a while. For instance, see this forum thread. The first two commits look OK, but I'll have to take your word on the lazy umount. I can't easily test this myself at the moment, but can include this in RPi/RPi2/Generic test builds (from Tuesday night, #711), if for no other reason than to ensure there are no unwanted side effects. Would be nice to sort out resume from standby too! :)
As it's kind of related, if you haven't already you might want to review the hack I added in #546 which prevents
Yeah, the lazy-unmount is ugly: it relies on the assumption that all other processes using
Thanks for the contribution but I don't really like this. A proper solution should be found. It most likely needs some systemd reworking.
@chrisnovakovic have you done any further investigation? systemd is likely the problem and the solution here. Thanks.
b9f12fc to 77ca9ed
I've found some time to look into doing this in a way that doesn't rely on lazy-unmounting The good news is that shutting down and rebooting can be made to work by eliminating the file descriptors that ConnMan keeps open for its interface stats and history files, which end up getting written to The bad news is that suspending still doesn't work, even with ConnMan patched, because other processes that are stopped after ConnMan also have open file descriptors on
Kodi's not a problem here (it's stopped before ConnMan when the host is suspended) and I don't think X is either, but the others might be. The fact that systemd itself writes its journal to
@chrisnovakovic could you submit a description of the problem and your patch to the connman mailing list? Several folks there are somewhat familiar with LE and should be receptive to the addition; worst case, they might refuse your patch but come up with their own alternative. Our preference is not to add patches unless we have visibility on being able to drop them in a future package update, i.e. a connman version bump. Nice investigation work btw :)
@chewitt Actually I originally found that patch on the ConnMan mailing list, but there didn't seem to be any appetite for merging it. I'll post it again and emphasise the benefits of being able to choose not to repeatedly write a file that's not particularly useful to flash-backed storage :)
@rudy1981 Can you confirm that this patch worked for you? It worked for me in a VMware VM and on an RPi3, but wider testing would be a good thing, especially from those who have more services running in LibreELEC (and therefore a higher probability of having processes running that have open file descriptors on
Patch reposted upstream: https://lists.01.org/pipermail/connman/2018-February/022467.html
Patch merged upstream. Slightly different to the one here, but the semantics are the same:
@chrisnovakovic thanks for the update, I'll go nag wagi for a connman release :)
77ca9ed to 1442e27
While we're waiting for ConnMan 1.36, I've updated the second commit (my cherry-picked upstream patch doesn't apply cleanly against ConnMan 1.35). Tested on RPi3. It should be safe to merge this into master now, if you want to: whenever you update to 1.36, just delete
@chrisnovakovic it's not necessary to comment the package.mk - can you drop that and we're happy to merge with the patch - we'll spot the connection when testing the next connman release. NB: I've tried to encourage connman devs to roll something out recently as there are some NTP fixes we'd like to see in addition to this, but nothing has been forthcoming.
@chrisnovakovic ping! ^^
Sorry, Christian - I'll get this done today if I can.
The init script currently touches a file at /dev/.storage_netboot if /storage is a remote filesystem, so that scripts that run after the root filesystem has been switched can behave differently depending on whether /storage is mounted locally or remotely. Add similar functionality for /flash by touching /dev/.flash_netboot if it is a remote filesystem.
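A script that runs after the root filesystem switch could test for these markers roughly as follows. This is a minimal sketch, not code from this PR: the `is_netboot` helper and the `MARKER_DIR` override are hypothetical names introduced for illustration, and only the marker paths `/dev/.storage_netboot` and `/dev/.flash_netboot` come from the commit message above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the init script left a net-boot marker
# for the given mount point ("storage" or "flash").
# MARKER_DIR is an assumed override for testing; the markers live in /dev.
is_netboot() {
  [ -e "${MARKER_DIR:-/dev}/.${1}_netboot" ]
}

# Report how each mount point was mounted, based on the markers alone.
for mnt in storage flash; do
  if is_netboot "$mnt"; then
    echo "/$mnt is a remote filesystem"
  else
    echo "/$mnt is mounted locally"
  fi
done
```

Keeping the check in one small function means shutdown-time scripts can branch on it without repeating the marker paths.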
1442e27 to 3cfdf9b
ConnMan writes stats and history files for each configured interface to /storage/.cache/connman/*/{data,history}. These files remain open while ConnMan is running, and prevent the system from halting or rebooting when /storage is an NFS mount (because ConnMan brings down the interface through which the NFS mount is accessed and then tries to update the stats and/or history file for that interface, but the file descriptors are no longer valid, so the system hangs). The stats and history files are superfluous, especially since the means of viewing them isn't included in LibreELEC (the stats tool is missing because ConnMan is compiled with --disable-tools), so there's no harm in not generating them on systems that don't mount /storage over NFS either; in fact, it benefits LibreELEC installations where /storage is mounted on a flash device by reducing unnecessary flash writes.
3cfdf9b to 0235a22
@chrisnovakovic Looks good. All done?
Yep, this is good to go on
Actually, it seems that with the latest modifications, this applies cleanly against the
There are no current plans for more 8.2 releases; we're overdue on 9.0 already. Thanks!
I ran into another problem while net-booting LibreELEC: with `/storage` mounted via NFS, halting or rebooting causes the system to hang. This is due to ConnMan exiting (and therefore bringing down the network) while processes still have open handles to files under `/storage` (including ConnMan itself: this can be verified with `lsof | grep /storage`, which shows that `connmand` has an open state file at `/storage/.cache/connman/*/data`).

This can be solved by lazy-unmounting `/storage` when LibreELEC is being net-booted (along with `/flash`, if that's also mounted remotely, since there's no reason not to) before ConnMan is stopped. This means that ConnMan doesn't have a chance to update its state file before it exits, but from a brief look at the ConnMan source code, it appears not to do that anyway.

This PR doesn't address the follow-up problem of `/flash` and `/storage` having disappeared after the system resumes from standby. I think one way to fix that would be to create two new systemd targets, one for `/flash` being mounted and one for `/storage` being mounted, and do the net-boot mounting operations currently performed in the initramfs init script in systemd unit files after switching to the new root filesystem. That would require substantial work from someone who understands both LibreELEC and systemd better than I do, I'm afraid, but at least this PR fixes halting and rebooting.

Tested and working fine on a RPi3 using the `RPi2` build.
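
The open-handle check described above can also be approximated without `lsof` by walking `/proc`. The following is a minimal sketch under stated assumptions: it expects a Linux-style `/proc/<pid>/fd` layout and a `readlink` utility, and `list_open_under` with its optional second argument is a hypothetical helper, not part of LibreELEC or this PR. It needs to run as root to see other users' processes.

```shell
#!/bin/sh
# Hypothetical helper: print "PID TARGET" for every open file descriptor
# whose target lies under PREFIX.
# Usage: list_open_under PREFIX [PROC_ROOT]   (PROC_ROOT defaults to /proc)
list_open_under() {
  prefix=$1
  proc_root=${2:-/proc}
  for fd in "$proc_root"/[0-9]*/fd/*; do
    # Descriptors can vanish or be unreadable; skip those quietly.
    target=$(readlink "$fd" 2>/dev/null) || continue
    case "$target" in
      "$prefix"/*)
        pid=${fd#"$proc_root"/}   # strip the leading proc root
        pid=${pid%%/*}            # keep only the PID component
        echo "$pid $target"
        ;;
    esac
  done
}

# Show which processes still hold files open under /storage.
list_open_under /storage
```

Any PID this prints for a service that is stopped after ConnMan is a candidate for the hang described in this PR.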