Race condition when locking Linux processes #30

acj · 2021-07-31T16:31:16Z

We've had a couple of reports that locking processes on busy production servers can fail, returning an ESRCH error (no such process). My best guess is that a thread exits between the call to self.threads() and the call to thread.lock, so the lock attempt targets a thread that no longer exists. The locking code is tolerant of new threads spinning up during the loop but immediately returns an error if a thread fails to lock.

Would it be acceptable for remoteprocess to log (e.g. debug/warn) failed locks but not return an error? Or to count the number of failed thread locks and only return an error if all of them fail? I'd be happy to make a PR if we can agree on a path.

Downstream issue with repro steps: rbspy/rbspy#334


benfred · 2021-07-31T18:23:05Z

We should handle the race condition. We're tolerant of new threads being created during this lock call, and we should also allow threads to exit.

How about we detect the ESRCH error in thread.lock and just ignore it? I'd still like to fail on permission denied errors, or if we didn't manage to lock any threads.
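A rough sketch of that behavior, not the actual remoteprocess code: `lock_threads` and the `lock_one` callback are hypothetical names used only for illustration, with `lock_one` standing in for whatever per-thread lock call is used. ESRCH is skipped, any other error is returned immediately, and locking nothing at all is still an error.

```rust
use std::io;

// Linux errno value for "No such process".
const ESRCH: i32 = 3;

/// Hypothetical helper: try to lock every thread id in `tids`.
/// `lock_one` is an assumed stand-in for the real per-thread lock call.
fn lock_threads<L, F>(tids: &[i32], mut lock_one: F) -> io::Result<Vec<L>>
where
    F: FnMut(i32) -> io::Result<L>,
{
    let mut locks = Vec::new();
    for &tid in tids {
        match lock_one(tid) {
            Ok(lock) => locks.push(lock),
            // The thread exited between listing and locking: skip it.
            Err(e) if e.raw_os_error() == Some(ESRCH) => continue,
            // Anything else (e.g. permission denied) is still fatal.
            Err(e) => return Err(e),
        }
    }
    // If every thread vanished, there is nothing locked, so report failure.
    if locks.is_empty() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            "no threads could be locked",
        ));
    }
    Ok(locks)
}
```

Returning on the first non-ESRCH error keeps permission problems visible instead of hiding them behind a partial lock.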

I'd be happy to accept a PR for this.

acj mentioned this issue Jul 31, 2021

Ignore ESRCH errors when locking Linux threads #31

Merged

benfred closed this as completed in #31 Aug 3, 2021
