Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESP32: Fix watchdog timeout handling #1249

Closed
wants to merge 2 commits into from

Conversation

tve
Copy link
Contributor

@tve tve commented Nov 18, 2023

The support in the esp32 xsHost has gotten a bit wonky due to changes in the CONFIG_ESP_TASK constants. Specifically, CONFIG_ESP_TASK_WDT is now CONFIG_ESP_TASK_WDT_EN and there's CONFIG_ESP_TASK_WDT_INIT.

In addition, the task WDT can be enabled at run-time and the timeout can be set/changed at run-time too.

This PR fixes the various #ifdef around the config constants. I wrote a little watchdog module that exercises all this. It can be found at https://github.com/tve/modtve/tree/main/modules/drivers/wdt with an example at https://github.com/tve/modtve/tree/main/modules/examples/wdt. If this module looks good I'm happy to add it to this PR or to modify it to fit. I was hoping it could fit into an ecma-419 Host scheme of sorts...

In any case, there's an issue with the WDT timeout since it can be changed dynamically. The calculation of maxDelayMS in xsHost.c doesn't really work necessarily. To add insult to injury, one can't query the current timeout from esp-idf. The best I can come up with is to at least call some function in xsHost from the WDT module if the user changes the timeout.

I tested on esp32 and esp32-c3

Feedback welcome.

@tve
Copy link
Contributor Author

tve commented Nov 19, 2023

Oops, I missed one instance of CONFIG_ESP_TASK_WDT, will push that later.

@phoddie
Copy link
Collaborator

phoddie commented Nov 20, 2023

This looks very reasonable. I'm not sure how we overlooked the #define name change. I remember testing with WDT recently and was working.

The driver and example would be welcome additions.

The dynamic configuration of the WDT timeout is obscure but I suppose worth trying to support. Your approach is reasonable as a first order attempt. A trickier solution would be to patch into esp_task_wdt_reconfigure to get the value. You could use --wrap to have gcc reroute the call to you.

@tve
Copy link
Contributor Author

tve commented Nov 20, 2023

The dynamic configuration of the WDT timeout is obscure

I ended up there more through trying to figure out the whole thing than by hard requirement. When I discovered that the timeout ends up being global my enthusiasm dropped significantly. I think that a 2-level WDT may be more appropriate with the "application level WDT" implemented in JS. That would be more portable too.

@phoddie
Copy link
Collaborator

phoddie commented Nov 20, 2023

Agreed. The one place I can think of where extending the timeout could be useful is your goal of disabling it when stopped at a breakpoint. But, that could be done transparently to the scripts.

I think that means that nothing needs to change in this PR, so it is good to go as-is?

@tve
Copy link
Contributor Author

tve commented Nov 20, 2023

I need to add the commit for the one instance I missed. I don't think I'll get to it today (lots of git wrangling).
I'm fine if you want to add it:

diff --git a/modules/base/worker/modWorker.c b/modules/base/worker/modWorker.c
index 904880f93..54cbc04c2 100644
--- a/modules/base/worker/modWorker.c
+++ b/modules/base/worker/modWorker.c
@@ -512,7 +512,7 @@ void workerLoop(void *pvParameter)
 {
        modWorker worker = (modWorker)pvParameter;
 
-#if CONFIG_ESP_TASK_WDT
+#if CONFIG_ESP_TASK_WDT_EN
        esp_task_wdt_add(NULL);
 #endif

@phoddie
Copy link
Collaborator

phoddie commented Nov 20, 2023

It'll be cleanest if you can update the PR. Thanks.

@tve
Copy link
Contributor Author

tve commented Nov 28, 2023

I added the missing commit. Ready from my point of view.

@phoddie
Copy link
Collaborator

phoddie commented Nov 29, 2023

Thanks for that. We'll try to get this one merged.

@phoddie
Copy link
Collaborator

phoddie commented Dec 1, 2023

Just FYI – I did some testing with this today. It all seems to work nicely. Setting up the watchdog in sdkconfig seems to get increasingly complex with each ESP-IDF. Nothing we can do about that. FWIW, this worked for me:

CONFIG_ESP_TASK_WDT_EN=y
CONFIG_ESP_TASK_WDT_INIT=y
CONFIG_ESP_TASK_WDT_PANIC=y

Not setting CONFIG_ESP_TASK_WDT_PANIC gave a really strange behavior. The watchdog fires and outputs a bunch of state to the console. Then it keeps running.

@phoddie
Copy link
Collaborator

phoddie commented Dec 18, 2023

Merged.

@phoddie phoddie closed this Dec 18, 2023
@tve
Copy link
Contributor Author

tve commented Dec 21, 2023

Sooo... it turns out this patch is not entirely OK. The problem is that CONFIG_ESP_TASK_WDT_EN defaults to 'y' and CONFIG_ESP_TASK_WDT_INIT is set to 'n' in the default sdkconfig in XS. The result is that esp-idf doesn't like the calls to esp_task_wdt_reset that are happening because of the former (i.e. "enabled bu not initialized"). This is hidden because logging is disabled by default. If one just enables logging everything hangs on the mxDebugMutex because of the avalanche of messages about esp_task_wdt_reset:

I (136) main_task: Calling app_main()                                                  
I (163) uart: queue free spaces: 8                                                     
E (163) task_wdt: esp_task_wdt_add(747): TWDT was never initialized                    
App started                                                                            
This is fx_vprintf                                                                     
I (164) main_task: Returned from app_main()                                            
<?xs.3FFBF7E8?>E (415) task_wdt: esp_task_wdt_reset(774): TWDT was never initialized   
E (415) task_wdt: esp_task_wdt_reset(774): TWDT was never initialized                  
E (418) task_wdt: esp_task_wdt_reset(774): TWDT was never initialized
E (424) task_wdt: esp_task_wdt_reset(774): TWDT was never initialized
E (430) task_wdt: esp_task_wdt_reset(774): TWDT was never initialized
E (436) task_wdt: esp_task_wdt_reset(774): TWDT was never initialized
E (442) task_wdt: esp_task_wdt_reset(774): TWDT was never initialized
E (449) task_wdt: esp_task_wdt_reset(774): TWDT was never initialized
E (455) task_wdt: esp_task_wdt_reset(774): TWDT was never initialized
E (461) task_wdt: esp_task_wdt_reset(774): TWDT was never initialized
E (467) task_wdt: esp_task_wdt_reset(774): TWDT was never initialized
E (473) task_wdt: esp_task_wdt_reset(774): TWDT was never initialized
E (479) task_wdt: esp_task_wdt_reset(774): TWDT was never initialized
E (486) task_wdt: esp_task_wdt_reset(774): TWDT was never initialized
E (492) task_wdt: esp_task_wdt_reset(774): TWDT was never initialized
...

I don't know what your thoughts are about the TWDT. I would enable and init it by default. But disabling and not init'ing it by default ought to work as well. But it looks like both CONFIG variables need to be set in concert. I would also set the ESP_LOG to ERROR level and not NONE, but that's probably just me.

@tve
Copy link
Contributor Author

tve commented Dec 21, 2023

For reference, this is what I ended up doing. First changes to sdkconfig.defaults to enable some logging, get a backtrace on crash, and enable the task WDT:

--- a/build/devices/esp32/xsProj-esp32/sdkconfig.defaults
+++ b/build/devices/esp32/xsProj-esp32/sdkconfig.defaults
@@ -205,9 +205,9 @@ CONFIG_ESP32_MEMMAP_TRACEMEM_TWOBANKS=n
 CONFIG_ESP32_TRAX=n
 CONFIG_ESP32_TRACEMEM_RESERVE_DRAM=0x0
 CONFIG_ESP_COREDUMP_ENABLE_TO_FLASH=n
-CONFIG_ESP_COREDUMP_ENABLE_TO_UART=y
+CONFIG_ESP_COREDUMP_ENABLE_TO_UART=n
 CONFIG_ESP_COREDUMP_ENABLE_TO_NONE=n
-CONFIG_ESP_COREDUMP_ENABLE=y
+CONFIG_ESP_COREDUMP_ENABLE=n
 CONFIG_ESP_COREDUMP_UART_DELAY=0
 CONFIG_ESP32_CORE_DUMP_LOG_LEVEL=1
 CONFIG_ESP32_UNIVERSAL_MAC_ADDRESSES_TWO=n
@@ -225,20 +225,21 @@ CONFIG_NEWLIB_STDIN_LINE_ENDING_CRLF=n
 CONFIG_NEWLIB_STDIN_LINE_ENDING_LF=n
 CONFIG_NEWLIB_STDIN_LINE_ENDING_CR=y
 CONFIG_NEWLIB_NANO_FORMAT=n
-CONFIG_ESP_CONSOLE_UART_DEFAULT=n
+CONFIG_ESP_CONSOLE_UART_DEFAULT=y
 CONFIG_ESP_CONSOLE_UART_CUSTOM=n
-CONFIG_ESP_CONSOLE_NONE=y
+CONFIG_ESP_CONSOLE_NONE=n
 CONFIG_ESP_CONSOLE_UART_NUM=0
 CONFIG_ULP_COPROC_ENABLED=n
 CONFIG_ULP_COPROC_RESERVE_MEM=0
 CONFIG_ESP_SYSTEM_PANIC_PRINT_HALT=n
-CONFIG_ESP_SYSTEM_PANIC_PRINT_REBOOT=n
+CONFIG_ESP_SYSTEM_PANIC_PRINT_REBOOT=y
 CONFIG_ESP_SYSTEM_PANIC_SILENT_REBOOT=n
 CONFIG_ESP_SYSTEM_PANIC_GDBSTUB=n
 CONFIG_ESP_DEBUG_OCDAWARE=y
 CONFIG_ESP_DEBUG_STUBS_ENABLE=n
 CONFIG_ESP_INT_WDT=n
-CONFIG_ESP_TASK_WDT_INIT=n
+CONFIG_ESP_TASK_WDT_EN=y
+CONFIG_ESP_TASK_WDT_INIT=y
 CONFIG_ESP_BROWNOUT_DET=y
 CONFIG_ESP_BROWNOUT_DET_LVL_SEL_0=y
 CONFIG_ESP_BROWNOUT_DET_LVL_SEL_1=n
@@ -404,13 +405,13 @@ CONFIG_HEAP_TRACING=n
 #
 # Log output
 #
-CONFIG_LOG_DEFAULT_LEVEL_NONE=y
+CONFIG_LOG_DEFAULT_LEVEL_NONE=n
 CONFIG_LOG_DEFAULT_LEVEL_ERROR=n
-CONFIG_LOG_DEFAULT_LEVEL_WARN=n
+CONFIG_LOG_DEFAULT_LEVEL_WARN=y
 CONFIG_LOG_DEFAULT_LEVEL_INFO=n
 CONFIG_LOG_DEFAULT_LEVEL_DEBUG=n
 CONFIG_LOG_DEFAULT_LEVEL_VERBOSE=n
-CONFIG_LOG_DEFAULT_LEVEL=0
+CONFIG_LOG_DEFAULT_LEVEL=2
 CONFIG_LOG_COLORS=n
 
 #

And in main.c I added a TX buffer for the uart (IIRC the HW fifo is 128 bytes):

--- a/build/devices/esp32/xsProj-esp32/main/main.c
+++ b/build/devices/esp32/xsProj-esp32/main/main.c
@@ -247,10 +256,10 @@ void app_main() {
 
 #ifdef mxDebug
        QueueHandle_t uartQueue;
-       uart_driver_install(USE_UART, UART_FIFO_LEN * 2, 0, 8, &uartQueue, 0);
+       uart_driver_install(USE_UART, UART_FIFO_LEN * 2, UART_FIFO_LEN * 2, 8, &uartQueue, 0);
        xTaskCreate(debug_task, "debug", (768 + XT_STACK_EXTRA) / sizeof(StackType_t), uartQueue, 8, NULL);
 #else
-       uart_driver_install(USE_UART, UART_FIFO_LEN * 2, 0, 0, NULL, 0);
+       uart_driver_install(USE_UART, UART_FIFO_LEN * 2, UART_FIFO_LEN * 2, 0, NULL, 0);
 #endif
 
        xTaskCreate(loop_task, "main", kStack, NULL, 4, NULL);

And in xsPlatform.c I added a timeout to taking the mxDebug mutex:

--- a/xs/platforms/esp/xsPlatform.c
+++ b/xs/platforms/esp/xsPlatform.c
@@ -74,12 +74,14 @@ static void doRemoteCommand(txMachine *the, uint8_t *cmd, uint32_t cmdLen);
        SemaphoreHandle_t gDebugMutex;
 
        #define mxDebugMutexTake() xSemaphoreTake(gDebugMutex, portMAX_DELAY)
+       #define mxDebugMutexTakeTicks(ticks) xSemaphoreTake(gDebugMutex, ticks)
        #define mxDebugMutexGive() xSemaphoreGive(gDebugMutex)
        #define mxDebugMutexAllocated() (NULL != gDebugMutex)
 
        static int fx_vprintf(const char *str, va_list list);
 #else
        #define mxDebugMutexTake()
+       #define mxDebugMutexTakeTicks()
        #define mxDebugMutexGive()
        #define mxDebugMutexAllocated() (true)
 #endif
@@ -1291,9 +1295,9 @@ int fx_vprintf(const char *str, va_list list)
 {
        int result;
 
-       mxDebugMutexTake();
-               result = vprintf(str, list);
-       mxDebugMutexGive();
+       int ok = mxDebugMutexTakeTicks(10);
+       result = vprintf(str, list);
+       if (ok) mxDebugMutexGive();
 
        return result;
 }

There is an issue with the mxDebug mutex, which I suspect is the fact that fxReceiveLoop holds the mutex while calling modMessagePostToMachine, c_malloc, and modWatchDogReset, all of which can trigger an esp-idf log message, which results in deadlock, which I observed. I worked around this by adding more fifo and a timeout in fx_vprintf but the correct solution would be to rework fxReceiveLoop IMHO. (Moving the modWatchDogReset out of the critical section seems trivial.)

@phoddie
Copy link
Collaborator

phoddie commented Dec 21, 2023

These changes (mostly?) seem related to the watchdog when debugging is enabled. As a rule, the watchdog shouldn't be enabled in debug builds because it conflicts with debugging.

I understand that CONFIG_ESP_TASK_WDT_EN and CONFIG_ESP_TASK_WDT_INIT need to be set together. Maybe the right change is to set those both to off (N) in the defaults to reinforce that?

@tve
Copy link
Contributor Author

tve commented Dec 21, 2023

I you want to disable the watchdog in debug mode, then yes, they should both be set to 'n'. I did a quick test and don't see any error/warning messages.

Can you remind me how it interferes with debugging other than the fact that Timer.delay(n) with n>5_000 will trigger the WDT? I just tried stopping at a break point for several minutes and nothing untoward happened. (You're probably correct that it's best to disable it for debug builds, though.)

@phoddie
Copy link
Collaborator

phoddie commented Dec 21, 2023

I'm not confident that we can always guarantee the debugger behavior will be good with the watchdog enabled. I tried to get that right more than once and it resisted.

...then yes, they should both be set to 'n'. I did a quick test and don't see any error/warning messages.

Since you are in that code now and can easily test, would you do a PR for that? It will be a much bigger project for me.

@tve
Copy link
Contributor Author

tve commented Dec 21, 2023

Haha, not easy here either, but you did fix a bunch of things for me... Do you agree with having error logging enabled and backtrace on crash? If you'd like different LMK so I prepare the right PR.

@phoddie
Copy link
Collaborator

phoddie commented Dec 21, 2023

Do you agree with having error logging enabled and backtrace on crash?

On a debug build? I'm OK with backtrace on crash. Not wild about logging. That can generate a fair amount of noise. Generally that Is useful when tracking down native code issues, not script issues, so leaving logging to instrumentation builds seems appropriate for the majority of developers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants