Skip to content

Crash with message "Mutex released without being held" due to race condition in xSemaphoreTake #172

@dariusarnold

Description

@dariusarnold

I observed a crash of InfiniSim after running some amount of time with the message "Mutex released without being held". It would occur relatively frequently though, usually within 5-15 minutes of just letting the program idle.

This is the callstack of the crashing thread:

libc.so.6!__pthread_kill_implementation(int no_tid, int signo, pthread_t threadid) (pthread_kill.c:44)
libc.so.6!__pthread_kill_internal(int signo, pthread_t threadid) (pthread_kill.c:78)
libc.so.6!__GI___pthread_kill(pthread_t threadid, int signo) (pthread_kill.c:89)
libc.so.6!__GI_raise(int sig) (raise.c:26)
libc.so.6!__GI_abort() (abort.c:79)
libstdc++.so.6![Unknown/Just-In-Time compiled code] (Unknown Source:0)
libstdc++.so.6!std::terminate() (Unknown Source:0)
libstdc++.so.6!__cxa_throw (Unknown Source:0)
xSemaphoreGive(SemaphoreHandle_t xSemaphore) (/workspaces/InfiniSim/sim/semphr.cpp:33)
Pinetime::Controllers::DateTime::CurrentDateTime(Pinetime::Controllers::DateTime * const this) (/workspaces/InfiniSim/InfiniTime/src/components/datetime/DateTimeController.cpp:75)
Pinetime::System::SystemTask::Work(Pinetime::System::SystemTask * const this) (/workspaces/InfiniSim/InfiniTime/src/systemtask/SystemTask.cpp:378)
Pinetime::System::SystemTask::Process(void * instance) (/workspaces/InfiniSim/InfiniTime/src/systemtask/SystemTask.cpp:96)
sdl_function_wrapper(void * instance) (/workspaces/InfiniSim/sim/task.cpp:61)
libSDL2-2.0.so.0![Unknown/Just-In-Time compiled code] (Unknown Source:0)
libc.so.6!start_thread(void * arg) (pthread_create.c:442)
libc.so.6!clone() (clone.S:100)

The crash is caused by a race condition in xSemaphoreTake and insufficient error handling in DateTimeController::CurrentDateTime(). The function xSemaphoreTake waits for the semaphore to be available to take by looping + sleeping until there is an element in the pxQueue->queue. Then it locks the pxQueue->mutex and checks the queuesize again. Since the first check is not done under the mutex, some other thread could remove the element from the queue before the mutex is locked. Then the second check in xSemaphoreTake will return false.
The method DateTimeController::CurrentDateTime() uses a semaphore as a mutex to guard calls to UpdateTime(). But it (and the other methods from DateTimeController) do not check if xSemaphoreTake actually gave them the semaphore. They will try to give it back even if they failed to take it, which leads to the crash in xSemaphoreGive.
The other thread in this crash was calling the watchface refresh method, which also calls CurrentDateTime().

I have opened a PR which fixes the crash by fixing the race condition.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions