Safe Mode: mechanism for user code to recover without manual intervention #5956

Closed
anecdata opened this issue Jan 31, 2022 · 10 comments · Fixed by #7577

@anecdata
Member

Feature request based on some discussion in this issue:
#2694 (comment)

It would be very useful to have a mechanism for user code to intercept or detect Safe Mode and trigger its own reload or reset. Alternatively, there could be a user-settable flag to skip Safe Mode for at least some causes. Ideally we'd identify and fix every last intermittent software cause of Safe Mode, but some are rare and very difficult to debug, and may be triggered by code included from a port-specific development environment.

For now, I use a second microcontroller as a watchdog: it listens for a heartbeat signal from the main microcontroller and resets it if the heartbeat is lost. A software solution would simplify deployments, particularly for devices placed in less-accessible locations.
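A minimal sketch of the external heartbeat watchdog logic described above, written in plain Python so it can be checked off-board. The class name, the `reset_fn` callback (which on real hardware would pulse the main microcontroller's reset pin), and the injectable `clock` are all hypothetical illustrations, not code from this issue.

```python
import time


class HeartbeatMonitor:
    """Hypothetical watchdog-side logic: reset the main MCU if heartbeats stop."""

    def __init__(self, timeout_s, reset_fn, clock=time.monotonic):
        self.timeout_s = timeout_s
        # Hypothetical callback; on hardware it would toggle the main MCU's reset line.
        self.reset_fn = reset_fn
        self.clock = clock
        self.last_beat = clock()

    def beat(self):
        """Record that a heartbeat just arrived from the main MCU."""
        self.last_beat = self.clock()

    def poll(self):
        """Call periodically; returns True if a reset was triggered."""
        if self.clock() - self.last_beat > self.timeout_s:
            self.reset_fn()
            # Give the main MCU a fresh timeout window while it reboots.
            self.last_beat = self.clock()
            return True
        return False


# Usage with a fake clock so the timing is deterministic.
now = [0.0]
resets = []
monitor = HeartbeatMonitor(5.0, lambda: resets.append(now[0]), clock=lambda: now[0])
now[0] = 3.0
monitor.beat()        # heartbeat at t=3.0
now[0] = 7.0
print(monitor.poll())  # 4.0 s since last beat: no reset
now[0] = 8.5
print(monitor.poll())  # 5.5 s since last beat: reset triggered
```

Injecting the clock keeps the timeout logic testable without hardware; on a board, the default `time.monotonic` would be used and `poll()` called from the main loop.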

@tannewt tannewt added this to the Long term milestone Feb 1, 2022
@tannewt
Member

tannewt commented Feb 1, 2022

Would a safemode.py work in your case? How does this interact with the internal watchdog?

Safe mode is really meant as a "we don't know what went wrong and don't expect to recover". Anything recoverable should be done through exceptions instead.

@anecdata
Member Author

anecdata commented Feb 1, 2022

safemode.py would work fine. Anything that the user can control.

The `watchdog` module doesn't catch these, and neither do exceptions. I'm not sure whether having both would cause any interactions or complications; nothing comes to mind.

Most (all?) of my Safe Mode occurrences are intermittent but fixed with a restart. 95% sure it's not power or hardware. The idea is that the device can self-recover with a software restart.

@mrdalgaard

mrdalgaard commented Feb 2, 2022

Maybe a safemode.py could also be used to account for brownout safemode occurrences in battery/solar powered projects.

[edit] Sorry, I didn't see that this was actually what the original issue was about.

@maholli

maholli commented Feb 14, 2022

+1

Still eagerly awaiting safemode.py! :)

@adamwolf

Where would we want to put the safemode.py? Next to code.py and boot.py et al?

I looked into implementing this, or at least starting to, and quickly ran into this comment in main.c:

     // Create a new filesystem only if we're not in a safe mode.
     // A power brownout here could make it appear as if there's
     // no SPI flash filesystem, and we might erase the existing one.

@tannewt
Member

tannewt commented Mar 14, 2022

@adamwolf Yes, I think I'd do it like code.py. That way, serial output could potentially go out over USB.

@protosam

What if the user locks themselves out by disabling both the filesystem and serial? Perhaps safe mode should be an alternate binary?

@dhalbert
Collaborator

@protosam in the worst case, the user can load a .uf2 that erases the CIRCUITPY flash, e.g.: https://learn.adafruit.com/welcome-to-circuitpython/troubleshooting#erase-circuitpy-without-access-to-the-repl-3105309

I'm not sure what you mean by an "alternate binary", but in general there's not enough space for an alternate copy of the CircuitPython firmware.

@protosam

protosam commented Jul 26, 2022

I was thinking that a separate copy of the CircuitPython binary, one that doesn't load boot.py, could be kept to replace what's loaded. That's problematic if the controller is in a product, though, I guess.

After looking at the code, I just learned that "run" is the equivalent of "reset" on the Feather RP2040. Edit: on the Pico.

@dansteingart

Just re-upping this. safemode.py would be a welcome and IMHO crucial addition.

@tannewt tannewt modified the milestones: Long term, 8.1.0 Jan 23, 2023
@dhalbert dhalbert self-assigned this Feb 10, 2023