kodi: safe mode [RFC] #2228

Merged
merged 3 commits into from
Dec 14, 2017

@MilhouseVH
Contributor

MilhouseVH commented Nov 17, 2017

As nobody wanted to discuss this on Slack...

With the move to automatic updates enabled by default, I have a sneaking suspicion this may lead to an increase in the number of systems that enter a Kodi crash loop due to duff addons after the upgrade, particularly after a major upgrade.

To mitigate this, I present to you... safe mode.

By default after 5 crashes within 15 minutes (both configurable in options: KODI_MAX_RESTARTS and KODI_MAX_SECONDS) "safe mode" will restart LibreELEC with a temporary clean .kodi folder.
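
A minimal sketch of how the restart counting could work, not the code from this PR. It assumes each crash is recorded as an epoch timestamp, one per line, in a log file; the function names and that file layout are hypothetical, and only KODI_MAX_RESTARTS and KODI_MAX_SECONDS come from the description above.

```shell
# Illustrative only: everything except the two option names is an assumption.
KODI_MAX_RESTARTS="${KODI_MAX_RESTARTS:-5}"
KODI_MAX_SECONDS="${KODI_MAX_SECONDS:-900}"   # 15 minutes

# count_recent_crashes LOGFILE NOW
# Prints how many timestamps (epoch seconds, one per line) in LOGFILE
# are no older than KODI_MAX_SECONDS relative to NOW.
count_recent_crashes() {
  logfile="$1"
  now="$2"
  count=0
  [ -f "$logfile" ] || { echo 0; return 0; }
  while read -r stamp || [ -n "$stamp" ]; do
    [ -n "$stamp" ] || continue
    if [ $(($now - $stamp)) -le "$KODI_MAX_SECONDS" ]; then
      count=$(($count + 1))
    fi
  done < "$logfile"
  echo "$count"
}

# should_enter_safe_mode LOGFILE NOW
# Succeeds once the recent crash count reaches KODI_MAX_RESTARTS.
should_enter_safe_mode() {
  [ "$(count_recent_crashes "$1" "$2")" -ge "$KODI_MAX_RESTARTS" ]
}
```

A caller would append "$(date +%s)" to the log after each crash and then test should_enter_safe_mode with the current time before restarting Kodi.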

The original (and crash-causing) .kodi folder will be renamed .kodi.FAILED. In the Logfiles Samba folder an extra archive with -FAILED suffix will be created with the logs from the failed Kodi folder.

"Safe mode" (identifiable as it uses the red background for the default Estuary skin) should now give the user time to enable Samba and/or ssh, and perform any type of investigation required to determine the cause of the crash loop, including potentially downgrading to the previous version of LibreELEC.

To return to their original .kodi folder they simply need to reboot. Obviously if the cause of the crash has not been resolved then they'll quickly find themselves back in "safe mode".

Users can disable "safe mode" entirely by creating /storage/.config/safemode.disable.
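
As a rough illustration of how a startup script might honour this flag (the function name is hypothetical and this is not the code from the PR):

```shell
# Hypothetical helper, not taken from the PR.
# safe_mode_disabled [CONFIG_DIR]
# Succeeds if the opt-out flag file exists. CONFIG_DIR defaults to the
# directory named above; it is a parameter only so the check can be
# tried against a scratch directory.
safe_mode_disabled() {
  [ -e "${1:-/storage/.config}/safemode.disable" ]
}
```

Opting out would then be a matter of running touch /storage/.config/safemode.disable; presumably removing the file re-enables safe mode.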

There's probably a cleaner/better/neater solution for this that involves additional systemd services and dependencies, in which case consider this nothing more than a proof of concept for anyone that wishes to go that route. And if this isn't wanted, then we can close it.

These commits have been in my test builds for the past week.


A simple way to test this PR is as follows:

  1. systemctl stop kodi
  2. cd /storage/.kodi
  3. mv userdata userdata.bak
  4. touch userdata
  5. reboot

Kodi will now crash on start, entering a "restart loop".

After 5 restarts Kodi should enter "safe mode" with a red background, allowing the cause of the crash to be investigated and fixed.

To resolve the issue, connect and...

  1. cd .kodi.FAILED
  2. rm userdata
  3. mv userdata.bak userdata
  4. reboot
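
The break and restore steps above can also be wrapped in two small helpers, which makes it harder to lose the backup by accident. This is only a sketch: the function names are hypothetical, and stopping Kodi and rebooting are deliberately left to the caller.

```shell
# Sketch of the test and recovery steps; function names are hypothetical.

# break_userdata KODI_DIR
# Replaces the userdata directory with an empty file so Kodi crashes on start.
break_userdata() {
  kodi_dir="$1"
  [ -d "$kodi_dir/userdata" ] || { echo "no userdata directory in $kodi_dir" >&2; return 1; }
  [ ! -e "$kodi_dir/userdata.bak" ] || { echo "userdata.bak already exists" >&2; return 1; }
  mv "$kodi_dir/userdata" "$kodi_dir/userdata.bak" && touch "$kodi_dir/userdata"
}

# restore_userdata KODI_DIR
# Undoes break_userdata: removes the placeholder file and restores the backup.
restore_userdata() {
  kodi_dir="$1"
  [ -f "$kodi_dir/userdata" ] && [ -d "$kodi_dir/userdata.bak" ] || {
    echo "nothing to restore in $kodi_dir" >&2; return 1; }
  rm "$kodi_dir/userdata" && mv "$kodi_dir/userdata.bak" "$kodi_dir/userdata"
}
```

On a test system this would be used as: systemctl stop kodi, break_userdata /storage/.kodi, reboot; then, once in safe mode, restore_userdata /storage/.kodi.FAILED followed by another reboot.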

@Ray-future
Contributor

I must have missed the discussion on slack :). I like it 👍

@nomandera
Contributor

Similarly I must have missed this discussion. Elegant to the point where it seems like it should always have been there. +1 from me as well

@vpeter4
Contributor

vpeter4 commented Nov 17, 2017

I think 15 minutes is too much. Maybe 5? How long does it take to boot RPi1 to Kodi?

Also maybe it would be good to enable local console when in safe mode?

@CvH
Member

CvH commented Nov 17, 2017

Wouldn't it make sense to disable this by default on a dev image?

@MilhouseVH
Contributor Author

> I must have missed the discussion on slack

I made a couple of posts on 9 Nov in #General floating the idea, seeking feedback for a harebrained idea. Got none. :)

> I think 15 minutes is too much. Maybe 5? How long does it take to boot RPi1 to Kodi?

It may need to vary from system to system, hence the configurable options. It can take a few minutes to process the core dump on an RPi1, much less on x86, hence the current 15 minute default - maybe this should be longer for RPi1, but 15 minutes for x86 (and possibly other systems) should be OK as having Kodi crash 5 times within 15 minutes should be a good indication of an unstable system.

Typically if Kodi is going to get stuck in a restart loop it will crash almost immediately at application startup, resulting in a fairly small core dump which even an RPi1 should be able to churn through 5 times in 15 minutes - whether the user waits long enough to enter "safe mode" though, is another matter. I'll have to run some more tests and time my RPi1.

> Also maybe it would be good to enable local console when in safe mode?

Possibly, but for the average end-user a text console (and no keyboard connected to the system) might be of less benefit than booting into a "safe mode" Kodi.

> Wouldn't it make sense to disable this by default on a dev image?

It could also be of use to developers, as they're just as likely to have a crashing Kodi system from time to time (although admittedly less likely to need "safe mode" to get out of trouble). It can be permanently disabled by creating /storage/.config/safemode.disable, which is something a developer might choose to do.

@MilhouseVH
Contributor Author

MilhouseVH commented Nov 17, 2017

So the time taken to process an instant crash (see userdata test in first post) on RPi1 is not as bad as I feared - these are the five timings:

31
50
67
83
100

so 5 crashes logged in less than 2 minutes.
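
For reference, the gaps between consecutive crashes can be worked out with a few lines of shell. This assumes the five values are cumulative seconds at which each crash was logged; the helper name is made up for the example.

```shell
# Assumption: the timings are cumulative seconds, as implied by
# "5 crashes logged in less than 2 minutes".
timings="31 50 67 83 100"

# crash_gaps TIMINGS...
# Prints the gap in seconds between each consecutive pair of timings.
crash_gaps() {
  prev=""
  for t in "$@"; do
    [ -z "$prev" ] || echo $(($t - $prev))
    prev="$t"
  done
}

crash_gaps $timings   # prints 19, 17, 16 and 17, one per line
```

So each crash-and-restart cycle takes roughly 16 to 19 seconds on this RPi1, and the fifth crash at 100 seconds falls well inside the 15 minute window.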

However the core dumps are only about 40MB, which is likely to be typical when Kodi crashes immediately on start. Larger core dumps will take longer to a) write (particularly to a slow SD card) and b) process, so if Kodi is crashing later in the startup process - maybe after loading a binary addon - then more time will be required.

I'd stick with 5 in 15 for now, for all platforms, and tune based on real-life feedback.

@vpeter4
Contributor

vpeter4 commented Nov 17, 2017

My concern about the time was this. If a user has a buggy add-on then Kodi can crash often, and in that case it will boot once into safe mode, which will get a user reaction like "wtf?". The same situation will then arise as with the lirc changes, where no one reads the release notes.

Probably on first boot after upgrade some release notes should be shown. Very condensed. But this is another issue.

@MilhouseVH
Contributor Author

MilhouseVH commented Nov 17, 2017

> If a user has a buggy add-on then Kodi can crash often, and in that case it will boot once into safe mode, which will get a user reaction like "wtf?".

Yes, that's a legitimate concern. Reducing the 15 minute time period would make it less likely to trigger safe mode when the user is crashing Kodi by using a faulty add-on, but if they're managing to crash Kodi 5 times within 15 minutes then they might appreciate the break and use the time out from crashing Kodi to consider if they really should be using that add-on.

At least they're just one reboot away from yet more crashes!

> Probably on first boot after upgrade some release notes should be shown. Very condensed. But this is another issue.

That would be nice. Ideally we'd also have some on-screen indication that the user is now in the crash recovery "safe mode" but the only thing I could think of was to change the background, and I'm not sure how we could make it more obvious. Maybe display additional "YOU ARE IN SAFE MODE" text in the Settings Wizard (it could detect the value SAFE which has been written to /storage/.config/boot.status), as the wizard is the first thing the users will see in a "clean" Kodi.
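
A sketch of the detection mentioned above, written as a shell function for illustration (the Settings Wizard itself would need its own equivalent). The function name is hypothetical; the file path and the SAFE value are the ones described in this comment.

```shell
# Hypothetical helper: succeeds if the boot status file holds the value SAFE.
# in_safe_mode [STATUS_FILE]
in_safe_mode() {
  status_file="${1:-/storage/.config/boot.status}"
  [ -r "$status_file" ] && [ "$(cat "$status_file")" = "SAFE" ]
}
```

A script could then show the extra text only when in_safe_mode succeeds, and behave as normal otherwise, including when the file does not exist.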

@MilhouseVH
Contributor Author

This is the first thing a user will see when entering "safe mode":

[screenshot: initial screen shown on entering safe mode]

Maybe we can add some extra text there, something along the lines of:

*** SAFE MODE! ****

LibreELEC has temporarily started in safe mode due to repeated Kodi crashes.

Reboot to return to your original Kodi installation (which may continue to crash if the cause is not resolved).

@MilhouseVH
Contributor Author

MilhouseVH commented Nov 17, 2017

I've created a branch for service.libreelec.settings: LibreELEC/service.libreelec.settings@master...MilhouseVH:add_safe_mode.

When "safe mode" is activated the initial page of the Setup Wizard will now have the following appearance:

[screenshot: initial Setup Wizard page in safe mode]

This change is in my nightly test build, starting with #1117.

@vpeter4
Contributor

vpeter4 commented Nov 18, 2017

Pure perfection!

@MilhouseVH
Contributor Author

It's not ideal, as disabling or removing a bad add-on isn't exactly straightforward - it's a shame Kodi can't be started without add-ons allowing a bad add-on to be disabled or uninstalled. With this PR users will need to rm -fr the relevant add-on folder in /storage/.kodi.FAILED/addons, but at least they can provide logs etc. and connect with ssh, and it's better than nothing.
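
Since rm -fr is unforgiving, a guarded wrapper might be worth using when removing an add-on by hand. This is a sketch only: the function name is hypothetical, and plugin.example.bad in the usage note below is a made-up add-on id.

```shell
# Hypothetical helper for removing one add-on folder from the failed Kodi home.
# remove_failed_addon ADDON_ID [ADDONS_DIR]
remove_failed_addon() {
  addon_id="$1"
  addons_dir="${2:-/storage/.kodi.FAILED/addons}"
  # Refuse empty ids and anything that could escape the addons directory.
  case "$addon_id" in
    ""|*/*|.|..) echo "invalid add-on id: '$addon_id'" >&2; return 1 ;;
  esac
  [ -d "$addons_dir/$addon_id" ] || { echo "not found: $addons_dir/$addon_id" >&2; return 1; }
  rm -fr "$addons_dir/$addon_id"
}
```

For example, remove_failed_addon plugin.example.bad followed by a reboot would return to the original .kodi folder without that add-on.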

@MilhouseVH
Contributor Author

I've reorganised the code which makes it feel a bit less of a hack.

I've also updated my branch for the LibreELEC Settings addon with the following changes:

  1. Added a second "Submit log" option which uploads the latest kodi_crash.log rather than the regular kodi.log
  2. When in "safe mode", the uploaded logs will be read from the .kodi.FAILED folder
  3. I've updated the help text string associated with the existing "Submit log" item, as it was not accurate (will need translating)
  4. I've added x86 boot file support to the uploaded logs (syslinux.cfg etc.)
  5. The "Boot mode" (EFI or BIOS) will be recorded in the log (second line) for x86_64 systems
  6. The addon will no longer cat files that don't exist (meaning it doesn't include RPi config files in x86_64 logs)
  7. Improved cleanup - don't leave temporary files hanging around
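
To illustrate item 6, skipping files that don't exist can be as simple as the following. This is a shell sketch of the idea rather than the addon's actual code, and the function name and header format are made up for the example.

```shell
# Hypothetical helper: print each file that exists, preceded by a header,
# and silently skip any that are missing.
# cat_existing FILE...
cat_existing() {
  for f in "$@"; do
    [ -f "$f" ] || continue
    printf '========== %s ==========\n' "$f"
    cat "$f"
  done
}
```

Called with a mixed list of RPi and x86 boot files, only the ones present on the current system would end up in the output.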

Currently the log upload process is heavily biased towards RPi and Generic systems.

Are there any log files that other projects might benefit from having uploaded?

@MilhouseVH
Contributor Author

I've replaced dmesg with journalctl -a as the latter should include all of the former, plus useful systemd information.

Example log from Generic system: http://sprunge.us/CAZG

@MilhouseVH
Contributor Author

Added to the pastebin logs:

  1. Samba configs:
     • /storage/.kodi/.smb/smb.conf (client)
     • /storage/.kodi/.smb/user.conf (client)
     • /run/samba/smb.conf (server)
  2. u-boot config:
     • /flash/extlinux/extlinux.conf

@MilhouseVH
Contributor Author

MilhouseVH commented Nov 21, 2017

  1. The "failed" .kodi.FAILED folder is now auto-shared as "Kodi-Failed" while safe mode is active
  2. Safe mode wizard landing page text updated:
    [screenshot: updated safe mode wizard landing page]

@MilhouseVH
Contributor Author

Depends on LibreELEC/service.libreelec.settings#87

@CvH CvH merged commit 9155531 into LibreELEC:master Dec 14, 2017