Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread sanitizer reveals missing locks #1374

Closed
jefftrull opened this issue Aug 8, 2019 · 9 comments

Comments

@jefftrull
Copy link

commented Aug 8, 2019

I recently ran a client's software with gcc's thread sanitizer enabled. It revealed some issues in mosquitto that I've worked around in a fork but I felt I should document them here. The problems are:

  • Inconsistent protection of mosq->in_callback and mosq->state by their respective mutexes
  • mosq->sock may need its own mutex

I've attached a very rough patch showing where the issues are.

@ralight

This comment has been minimized.

Copy link
Contributor

commented Aug 8, 2019

Thanks for this. Out of interest, is the code using loop_start(), or threads external to the library?

@jefftrull

This comment has been minimized.

Copy link
Author

commented Aug 8, 2019

External threads in this case. Do you suspect a user error :)

(we are calling mosquitto_threaded_set)

@ralight

This comment has been minimized.

Copy link
Contributor

commented Aug 8, 2019

Not at all, it's just good to know the situation.

I've not look at it properly yet, but I'm curious to know your reasoning as to why you've removed the mutex from the callback calls.

@jefftrull

This comment has been minimized.

Copy link
Author

commented Aug 8, 2019

I found that mosquitto_reinitialise was getting called from them and thus the mutexes were reset, causing the following unlock to silently fail... Could this be something we're doing wrong?

@jefftrull

This comment has been minimized.

Copy link
Author

commented Aug 8, 2019

mosq->in_callback still gets protected

@jefftrull

This comment has been minimized.

Copy link
Author

commented Aug 8, 2019

I'm sorry, mosquitto_reinitialise is not getting called again. Something else is happening in the callback that is messing up the locks somehow. Investigating...

@jefftrull

This comment has been minimized.

Copy link
Author

commented Aug 8, 2019

OK, thanks for pointing this out. The changes in locking of callback_mutex were a rabbit hole triggered by the fact that we call mosquitto_disconnect from our main thread, taking a path where callback_mutex was not already held. It looks like the (much) easier fix is to change a single test in packet_mosq.c - the value of in_callback is only interesting in non-threaded mode there, so it's safe.
New diff attached

@jefftrull

This comment has been minimized.

Copy link
Author

commented Aug 16, 2019

Further investigation reveals that the first change in that last diff (removing the access to mosq->sock) is also unnecessary. Every remaining change protects mosq->state.

@ralight ralight added this to the 1.6.5 milestone Sep 3, 2019
@ralight

This comment has been minimized.

Copy link
Contributor

commented Sep 8, 2019

Thanks, I've now added a commit that implements this.

@ralight ralight closed this Sep 8, 2019
ralight added a commit that referenced this issue Sep 8, 2019
Closes #1374. Thanks to Jeff Trull.
ralight added a commit that referenced this issue Sep 18, 2019
Closes #1374. Thanks to Jeff Trull.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.