-
Notifications
You must be signed in to change notification settings - Fork 131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UDS: make writer able to deal with concurrent writes #62
Conversation
This commit will make the CI fail, as this demonstrates utilizing the UDS writer in a concurrent program is currently unsafe. This is triggered because the current UDS class changes the underlying `conn` pointer on errors unsafely. This would show up as: ```runtime error: invalid memory address or nil pointer dereference``` Running the `TestClientUDSConcurrent` test with the race detector shows this as well. ``` ================== WARNING: DATA RACE Read at 0x00c420126000 by goroutine 16: [...] github.com/DataDog/datadog-go/statsd.(*udsWriter).Write() /Users/leo/datadog/go/src/github.com/DataDog/datadog-go/statsd/uds.go:52 +0x14e [...] Previous write at 0x00c420126000 by goroutine 13: [...] github.com/DataDog/datadog-go/statsd.(*udsWriter).Write() /Users/leo/datadog/go/src/github.com/DataDog/datadog-go/statsd/uds.go:46 +0x328 [...] ```
There is a failing test for this usecase ebb1a50 Adding this mutex allows for making sure when a write happens and can decide to change the `conn` attribute (to create a new one or to replace a failing one) it locks the resource.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the fix
@truthbk could you please review?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me 🔒
We tried this change and it killed performance. I would revert since it makes this library usable. |
Question for @nabiber and @LeoCavaille - AFAIK concurrent |
PR DataDog#62 added in a `Mutex` around the Unix socket writer's `Write` method, to protect the logic that conditionally connected the socket if there was not one already. However, because of the lock placement, it serialises all statsd writes globally, which is likely to be a performance bottleneck for highly parallel systems. This commit turns that Mutex into a `RWMutex`. In the case where there is already a socket set, only a `RLock` is needed. In the case where the socket needs to be connected, the a full `Lock` is performed to protect the swapping of `w.conn`. Note that this change, I believe, will make `SetWriteTimeout` meaningless; multiple goroutines will call `conn.SetWriteDeadline` in parallel, and the behaviour of that is to extend the deadline for future _and all in-flight_ requests. If new calls to `SetWriteDeadline` keep getting made, the deadline on an existing blocked call to `Write` will be continuously extended. I'm not really sure how to fix this or whether it's worth fixing though.
…84) * uds: chage Mutex to RWMutex for fast already-connected path PR #62 added in a `Mutex` around the Unix socket writer's `Write` method, to protect the logic that conditionally connected the socket if there was not one already. However, because of the lock placement, it serialises all statsd writes globally, which is likely to be a performance bottleneck for highly parallel systems. This commit turns that Mutex into a `RWMutex`. In the case where there is already a socket set, only a `RLock` is needed. In the case where the socket needs to be connected, the a full `Lock` is performed to protect the swapping of `w.conn`. Note that this change, I believe, will make `SetWriteTimeout` meaningless; multiple goroutines will call `conn.SetWriteDeadline` in parallel, and the behaviour of that is to extend the deadline for future _and all in-flight_ requests. If new calls to `SetWriteDeadline` keep getting made, the deadline on an existing blocked call to `Write` will be continuously extended. I'm not really sure how to fix this or whether it's worth fixing though. * [rebase] moving uds to uds_blocking for rebase * [uds][blocking] fixing bad struct name after rebase
Each commit message has more info.
I added a failing test that highlighted the issue first before adding the code to fix it.
The CI already runs with the race detector enabled so it should be able to test the change properly.