Battery drains because of infinite retries in case of failures #1

Closed
ThomasFarstrike opened this issue Sep 2, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@ThomasFarstrike

The piggy code retries indefinitely in several places, which can drain the battery when the underlying operation keeps failing.

Examples:

  • wifi signal issue: if the access point has a very poor signal, the ESP32 keeps trying to connect forever
  • wifi misconfiguration: if the user configured a wrong wifi SSID or password, it also keeps trying to connect
  • wifi issue on ESP32: the ESP32's wifi is good but not great, and from time to time it fails to connect no matter how long it tries, often with Reason: 4 - ASSOC_EXPIRE or Reason: 2 - AUTH_EXPIRE, sometimes with other errors like 4WAY_HANDSHAKE_TIMEOUT. After a restart, it usually works fine.
  • lnbits protocol: if the server is down or sends an invalid response, our HTTPS handling code can go into an infinite loop
  • lnbits replies: sometimes the lnbits server accepts the TCP connection without replying, resulting in an infinite wait
  • update checker: if the server doesn't reply, it hangs
  • (possibly more)
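
As a rough illustration of how the open-ended loops listed above could be bounded, here is a minimal sketch. `retry_bounded`, its parameters and the backoff values are all hypothetical (nothing here is taken from the piggy code), and the clock and sleep functions are injected so it runs off-device; on an ESP32 they could presumably wrap `millis()` and `delay()`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Outcome of a bounded retry loop.
enum class RetryResult { Succeeded, GaveUp };

// Hypothetical helper: calls `attempt` until it returns true, but never more
// than `max_attempts` times and never starts a new attempt once `deadline_ms`
// of elapsed time has passed. The wait between attempts doubles each time.
RetryResult retry_bounded(const std::function<bool()>& attempt,
                          int max_attempts,
                          uint32_t deadline_ms,
                          const std::function<uint32_t()>& now_ms,
                          const std::function<void(uint32_t)>& sleep_ms,
                          uint32_t first_backoff_ms = 500) {
    assert(max_attempts > 0);  // an unbounded loop is exactly what we avoid
    const uint32_t start = now_ms();
    uint32_t backoff = first_backoff_ms;
    for (int i = 0; i < max_attempts; ++i) {
        if (attempt()) return RetryResult::Succeeded;
        if (now_ms() - start >= deadline_ms) break;  // time budget used up
        if (i + 1 < max_attempts) {
            sleep_ms(backoff);
            backoff *= 2;  // exponential backoff between attempts
        }
    }
    return RetryResult::GaveUp;
}
```

When the result is `GaveUp`, the caller can show an error on the display or go to sleep instead of looping forever.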

To fix these issues, better error handling of these specific cases would be good, where possible. For example, if the wifi credentials are wrong, the user should be notified on the screen.

Additionally, to prevent anything from causing the ESP32 to get stuck forever, the ESP32 watchdog should be activated and programmed. This will ensure the board reboots in case of an exceptionally long action.

To do this properly and prevent infinite watchdog reboots from draining the battery as well, a watchdog reboot counter should be kept somewhere. The counter should be incremented whenever the reset cause is "watchdog", and reset to 0 on any regular (non-watchdog) reboot.

If the watchdog reboot counter exceeds some configured value (example: 3) then the device should immediately go into a long sleep/hibernate (example: 6 hours) so that it wakes up at a time when whatever is causing the problem might hopefully be resolved.
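
The counter policy described above can be sketched as a pure function, which keeps it testable off-device. The type names, `decide_on_boot` and the constants are hypothetical; the threshold of 3 and the 6-hour sleep are the example values from this issue. Mapping the real reset reason to `ResetCause` and actually entering deep sleep are left out.

```cpp
#include <cassert>
#include <cstdint>

// Simplified reset causes (an assumption; the real chip reports more).
enum class ResetCause { PowerOn, DeepSleepWake, Watchdog, Other };
enum class BootAction { RunNormally, LongSleep };

struct BootDecision {
    uint32_t counter;   // value to persist for the next boot
    BootAction action;  // what the firmware should do now
};

constexpr uint32_t kMaxWatchdogReboots = 3;             // example from the issue
constexpr uint64_t kLongSleepSeconds = 6ULL * 60 * 60;  // 6 hours, likewise

// Increment on a watchdog reset, reset to 0 on any other cause, and ask for
// a long sleep once the counter exceeds the configured maximum.
BootDecision decide_on_boot(ResetCause cause, uint32_t previous_counter) {
    if (cause != ResetCause::Watchdog) {
        return {0, BootAction::RunNormally};
    }
    assert(previous_counter < UINT32_MAX);  // guard against wrap-around
    const uint32_t counter = previous_counter + 1;
    if (counter > kMaxWatchdogReboots) {
        return {counter, BootAction::LongSleep};
    }
    return {counter, BootAction::RunNormally};
}
```

Because waking from the long sleep is not a watchdog reset, the counter drops back to 0 on that boot and the device gets a fresh set of attempts.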

ThomasFarstrike added the bug label Sep 2, 2023
@ThomasFarstrike

The Feb 25, 2024 deadline is only 6 days away, and I'd like this ready in time for Bitcoin Atlantis, so I'm thinking of starting work on this issue. Is anyone else already working on it? Please speak up!

ThomasFarstrike added a commit that referenced this issue Feb 24, 2024
This brings it in line with the issue description
at #1

"If the watchdog reboot counter exceeds some configured value (example: 3)
then the device should immediately go into a long sleep/hibernate
(example: 6 hours) so that it wakes up at a time when
whatever is causing the problem might hopefully be resolved."
@ThomasFarstrike

I implemented the above.

Initially, I used the typical ESP32 "task" watchdog, but a restart triggered by that one can't be identified from rtc_get_reset_reason(). So I switched to the more unusual and convoluted "RTC watchdog", which is normally used by the lower-level ESP32 boot functions to detect hung boots but can be repurposed.

More info: https://docs.espressif.com/projects/esp-idf/en/stable/esp32/api-reference/system/wdts.html

I also spent a lot of effort getting it to work without writing any state (boot counters etc.) to the flash memory, because flash has a limited number of write cycles (as low as 10k?). NVM and EEPROM were also out of the question, because on the ESP32 these are implemented in dedicated flash regions too.

I found "noinit DRAM" in the docs, an area of RAM that is preserved across watchdog restarts but not across deep sleeps. Then I found RTC_DATA_ATTR memory, which is preserved across deep sleeps but not across watchdog restarts. In the end, I used both in tandem, moving state from one variable to the other at the right times, to get persistence across both kinds of event.

More info: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-guides/memory-types.html
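
A minimal off-device sketch of the two-copy idea, not the actual implementation: `PersistentState`, `store` and `restore_on_boot` are hypothetical names, and plain struct fields stand in for the two kinds of memory. On a real ESP32 they would presumably be globals marked `__NOINIT_ATTR` and `RTC_DATA_ATTR`. The survival behaviour in the comments is as described in this thread.

```cpp
#include <cassert>
#include <cstdint>

enum class ResetCause { PowerOn, DeepSleepWake, Watchdog };

// Stand-ins for the two storage classes.
struct PersistentState {
    uint32_t noinit_copy;  // survives watchdog restarts, lost in deep sleep
    uint32_t rtc_copy;     // survives deep sleep, lost on watchdog restarts
};

// Writes go to both copies so that either event leaves one intact.
void store(PersistentState& s, uint32_t value) {
    s.noinit_copy = value;
    s.rtc_copy = value;
}

// At boot, trust whichever copy survived the event that just happened and
// refresh the other one from it. Returns the recovered value.
uint32_t restore_on_boot(PersistentState& s, ResetCause cause) {
    switch (cause) {
        case ResetCause::Watchdog:
            s.rtc_copy = s.noinit_copy;
            break;
        case ResetCause::DeepSleepWake:
            s.noinit_copy = s.rtc_copy;
            break;
        case ResetCause::PowerOn:
            // Neither copy holds meaningful data after a cold start.
            s.noinit_copy = 0;
            s.rtc_copy = 0;
            break;
    }
    assert(s.noinit_copy == s.rtc_copy);  // both copies agree after restore
    return s.noinit_copy;
}
```

Writing to both copies on every update is one possible design choice; copying only just before deep sleep would work too, but then every path into sleep has to remember to do the copy.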

I also did a lot of wireless network testing with wrong access point names, wrong passwords and wrong encryption types. After a lot of wireless event callback parsing, I was able to convert those protocol-level issues into usable feedback for the user on the display. This is a bit out of scope for this issue, but it should help users debug the most common wifi issues more easily.
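
To give an idea of what turning disconnect reasons into display feedback might look like, here is a small sketch. `describe_wifi_disconnect` and the message texts are hypothetical and not taken from the firmware. Codes 2 and 4 are the ones quoted earlier in this issue; 15, 201 and 202 are the values of 4WAY_HANDSHAKE_TIMEOUT, NO_AP_FOUND and AUTH_FAIL as I understand ESP-IDF's wifi reason codes, so double-check them against the headers of the SDK version in use.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Maps a wifi disconnect reason code to a short text for the display.
// The wording is illustrative only.
std::string describe_wifi_disconnect(uint8_t reason) {
    std::string msg;
    switch (reason) {
        case 2:   // AUTH_EXPIRE
        case 4:   // ASSOC_EXPIRE
            msg = "WiFi timed out, will retry";
            break;
        case 15:  // 4WAY_HANDSHAKE_TIMEOUT
            msg = "WiFi handshake timed out, check password";
            break;
        case 201: // NO_AP_FOUND
            msg = "Access point not found, check SSID";
            break;
        case 202: // AUTH_FAIL
            msg = "WiFi authentication failed, check password";
            break;
        default:
            msg = "WiFi error, reason " + std::to_string(reason);
            break;
    }
    assert(!msg.empty());  // every code yields something to show
    return msg;
}
```

Codes without a specific message still show their number, which at least gives the user something to search for.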

@ThomasFarstrike

This is ready and deployed in v2.0.0 in the webinstaller.
