Skip to content

chan_console: Fix deadlock caused by unclean thread exit. #309

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

InterLinked1
Copy link
Contributor

To terminate a console channel, stop_stream causes pthread_cancel to make stream_monitor exit. However, commit 5b8fea9 added locking to this function which results in deadlock due to the stream_monitor thread being killed while it's holding the pvt lock.

To resolve this, a flag is now set and read to indicate abort, so the use of pthread_cancel and pthread_kill can be avoided altogether.

Resolves: #308

@InterLinked1
Copy link
Contributor Author

cherry-pick-to: 18
cherry-pick-to: 20
cherry-pick-to: 21

To terminate a console channel, stop_stream causes pthread_cancel
to make stream_monitor exit. However, commit 5b8fea9
added locking to this function which results in deadlock due to
the stream_monitor thread being killed while it's holding the pvt lock.

To resolve this, a flag is now set and read to indicate abort, so
the use of pthread_cancel and pthread_kill can be avoided altogether.

Resolves: asterisk#308
@seanbright
Copy link
Contributor

Isn’t this what pthread_cleanup_push() is for? That would be cleaner IMO.

@InterLinked1
Copy link
Contributor Author

Isn’t this what pthread_cleanup_push() is for? That would be cleaner IMO.

Maybe cleaner than setting the cancellability, but it's a little messy and using pthread_cancel is never very "clean", in my opinion, so if it can be avoided that's probably more ideal.

I restored the pthread_kill so if the read is blocking for some reason, it should still wake up, and exit after it releases the mutex, and then the thread can join.

@seanbright
Copy link
Contributor

using pthread_cancel is never very "clean", in my opinion

And why is that?

@InterLinked1
Copy link
Contributor Author

using pthread_cancel is never very "clean", in my opinion

And why is that?

Because the thread that's cancelled doesn't get to control how it exits. You can add a cleanup handler but now you'd probably need to keep track of whether it's holding a mutex, for example, and then release it only then.

This guy here has a good explanation that says it better than I can: https://stackoverflow.com/a/3438576

I subscribe to the theory that each thread should totally control its own resources, including its execution lifetime. I've always found that to be the best way to avoid deadlock.

pthread_cancel was fine before the commit that added locking, but now that there are resources fine, it's cleaner to not use pthread_cancel and have the thread control where it exits.

@seanbright
Copy link
Contributor

Is Pa_ReadStream() a cancellation point? I don’t see how calling pthread_cancel() here could leave pvt locked.

@InterLinked1
Copy link
Contributor Author

Is Pa_ReadStream() a cancellation point? I don’t see how calling pthread_cancel() here could leave pvt locked.

Internally, probably (internally, I imagine it would call read somewhere), because it was exiting while holding the pvt lock. There's an example in the linked issue.

@seanbright
Copy link
Contributor

Because the thread that's cancelled doesn't get to control how it exits.

Of course it does. pthread_testcancel() and pthread_cancel() are cooperative.

@InterLinked1
Copy link
Contributor Author

Because the thread that's cancelled doesn't get to control how it exits.

Of course it does. pthread_testcancel() and pthread_cancel() are cooperative.

Yes, but if PA_readstream internally calls pthread_testcancel, which I'm almost positive it does (perhaps via read or poll), then you're screwed because it will exit at the wrong cancellation point. That's why cancellability would need to be disabled while it's holding the mutex, to ensure it never exits in the middle.

I believe by default all threads are cancellable so without this there isn't a guarantee it won't be cancelled in the middle. I originally tested it this way before changing it to a flag. This should also work, if you strongly prefer this approach to setting a flag and not using pthread_cancel, I could change it back, they should both do the same thing.

@seanbright
Copy link
Contributor

seanbright commented Sep 13, 2023

My preference would be to use cancellation but I am not writing the code. If you've confirmed this fixes the issue then it's fine.

@gtjoseph gtjoseph added the cherry-pick-test Trigger dry run of cherry-picks label Sep 20, 2023
@github-actions github-actions bot added cherry-pick-testing-in-progress Cherry-Pick tests in progress cherry-pick-checks-passed Cherry-Pick checks passed and removed cherry-pick-test Trigger dry run of cherry-picks cherry-pick-testing-in-progress Cherry-Pick tests in progress labels Sep 20, 2023
@github-actions
Copy link

Successfully merged to branch master and cherry-picked to ["18","20","21"]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[bug]: chan_console: Deadlock when hanging up console channels
4 participants