Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition in event-handling thread example #1291

Open
Erlkoenig90 opened this issue Jun 15, 2023 · 5 comments
Open

Race condition in event-handling thread example #1291

Erlkoenig90 opened this issue Jun 15, 2023 · 5 comments
Labels
core Related to common codes Documentation enhancement New features

Comments

@Erlkoenig90
Copy link

In the documentation for implementing event handling threads, the example code about quitting such a thread without using hotplug has a race condition that can lead to the thread not quitting immediately, but running into the internal timeout (60sec), possibly freezing an application upon quitting.

The proposed event handling thread:

void *event_thread_func(void *ctx)
{
    while (event_thread_run)
        libusb_handle_events (ctx);
 
    return NULL;
}

When this thread's execution is right after the check to event_thread_run and before the call to libusb_handle_events, and then another thread calls the proposed my_close_handle function, the event handling thread will still enter libusb_handle_events which doesn't see anything to do (as no devices are open anymore) and will then run until the internal timeout of 60sec expires.

Does this mean there is currently no safe race-free way to quit an event-handling thread? Perhaps we need a way to send/dispatch custom callbacks to the currently executing event loop. This could be used to send a "quit" message to the event handling thread, which could set event_thread_run and therefore quit the event handling thread immediately and safely.

@mcuee mcuee added core Related to common codes Documentation labels Jun 15, 2023
@Youw
Copy link
Member

Youw commented Jun 15, 2023

From libusb_handle_events:

This function is kept primarily for backwards compatibility. All new code should call libusb_handle_events_completed() or libusb_handle_events_timeout_completed() to avoid race conditions.

Maybe the documentation should indeed be updated.

@Youw
Copy link
Member

Youw commented Jun 15, 2023

BTW: I've never seen any hangs when closing the device in HIDAPI and it uses libusb_handle_events.

@Erlkoenig90
Copy link
Author

How would I use libusb_handle_events_completed() to implement an event handling thread? Particularly, how would I set the "completed" flag, especially from outside an event handler? Should I manually acquire the events lock to synchronize access to the flag?

I would like to implement an event thread that forcibly processes all events, i.e. even when I use synchronous I/O functions on other threads, those should never do the event processing and call the callbacks; those should always come from my "main" event handling thread. Therefore, I thought I could lock the event lock permanently in this thread to stop other threads from processing and run libusb_handle_events_locked in a loop. However, this would mean I couldn't use the event lock to protect accesses to the completed flag, as the lock would be always locked. Is there a good way to achieve this?

@Erlkoenig90
Copy link
Author

Erlkoenig90 commented Jun 16, 2023

PS: Isn't there also a race condition in libusb_handle_events_timeout_completed? If right after checking completed but before calling handle_events (i.e. during the usbi_debug statement), completed is set to 1 by another thread, we won't notice this but enter handle_events anyways. If no events are pending, we won't return even though another thread requested that we do.

The other thread could try to acquire the event lock before setting completed. However, the thread in libusb_handle_events_timeout_completed might acquire it for a long time while no events are pending an, so the other thread would also hang. It appears completed can only be safely set from within an event handler.

Perhaps we need to refactor the "completed" interface, as this also caused #1263. Perhaps there should be a dedicated mutex protecting access to the completed integer. Maybe something like this:

Define an opaque data structure for signaling completion:

struct libusb_completion_signal {
  int completed;
  usbi_mutex_t mutex;
};

Give API users a way to set it to complete (which can then be called from all contexts, including other threads):

void libusb_signal_completion (struct libusb_context *ctx, struct libusb_completion_signal* completion_signal) {
  usbi_mutex_lock (&completion_signal->mutex);
  completion_signal->completed = 1;
  usbi_mutex_unlock (&completion_signal->mutex);
  usbi_signal_event(&ctx->event);
}

And then, change libusb_handle_events_timeout_completed to accept the signaling structure:

int API_EXPORTED libusb_handle_events_timeout_completed(libusb_context *ctx,
	struct timeval *tv, struct libusb_completion_signal *completion_signal)
{
	int completed = 0;
	...
  retry:
	if (completion_signal != NULL) {
        	usbi_mutex_lock (&sign->mutex);
		completed = completion_signal->completed;
		usbi_mutex_unlock (&sign->mutex);
	}
	if (libusb_try_lock_events(ctx) == 0) {
		if (!completed) {
			/* we obtained the event lock: do our own event handling */
			usbi_dbg(ctx, "doing our own event handling");
			r = handle_events(ctx, &poll_timeout);
		}
		libusb_unlock_events(ctx);
		return r;
	}
        ...

The mutex in libusb_completion_signal would protect accesses to the completed integer, making sure it is always read/written consistently. This would also mean that the integer need not be accessed under the events lock. If another thread calls libusb_signal_completion while the event-handling thread is at the aforementioned usbi_dbg line, the race condition still happens, but because usbi_signal_event is called, handle_events will return immediately, preventing the thread from hanging.

We could then even revert #1267 but instead call libusb_signal_completion at the very end(!) of sync_transfer_cb. Since libusb_signal_completion uses synchronization primitives this should be save, and even more efficient, as the waiters lock isn't acquired for a (potentially) long time.

This would also make it easy to quit an event-handling thread at any point in time, even if no devices are open at all (thereby obviating the need for the libusb_close-hack of quitting the thread when the last device was closed).

Is that a remotely good idea?

@mcuee mcuee added the enhancement New features label Jun 16, 2023
@mcuee
Copy link
Member

mcuee commented Oct 9, 2023

@tormodvolden

How do you like the enhancement idea from @Erlkoenig90? Please help to comment when you have some time. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Related to common codes Documentation enhancement New features
Projects
None yet
Development

No branches or pull requests

3 participants