Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UDS: make writer able to deal with concurrent writes #62

Merged
merged 2 commits into from
Aug 22, 2018

Conversation

LeoCavaille
Copy link
Member

@LeoCavaille LeoCavaille commented Aug 16, 2018

Each commit message has more info.
I added a failing test that highlighted the issue first before adding the code to fix it.

The CI already runs with the race detector enabled so it should be able to test the change properly.

This commit will make the CI fail, as this demonstrates utilizing the
UDS writer in a concurrent program is currently unsafe.

This is triggered because the current UDS class changes the underlying
`conn` pointer on errors unsafely.

This would show up as:
```runtime error: invalid memory address or nil pointer dereference```

Running the `TestClientUDSConcurrent` test with the race detector shows
this as well.
```
==================
WARNING: DATA RACE
Read at 0x00c420126000 by goroutine 16:
  [...]
  github.com/DataDog/datadog-go/statsd.(*udsWriter).Write()
      /Users/leo/datadog/go/src/github.com/DataDog/datadog-go/statsd/uds.go:52 +0x14e
  [...]

Previous write at 0x00c420126000 by goroutine 13:
  [...]
  github.com/DataDog/datadog-go/statsd.(*udsWriter).Write()
      /Users/leo/datadog/go/src/github.com/DataDog/datadog-go/statsd/uds.go:46 +0x328
  [...]
```
There is a failing test for this usecase ebb1a50
Adding this mutex allows for making sure when a write happens and can
decide to change the `conn` attribute (to create a new one or to replace
a failing one) it locks the resource.
@LeoCavaille LeoCavaille changed the title UDS: make writer aconcurrent UDS: make writer able to deal with concurrent writes Aug 16, 2018
Copy link
Contributor

@xvello xvello left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the fix
@truthbk could you please review?

Copy link
Member

@truthbk truthbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me 🔒

@LeoCavaille LeoCavaille merged commit 281ae9f into master Aug 22, 2018
@LeoCavaille LeoCavaille deleted the leo/fix-concurrency branch August 22, 2018 15:14
@nabiber
Copy link

nabiber commented Dec 4, 2018

We tried this change and it killed performance. I would revert since it makes this library usable.

josephmi added a commit to josephmi/datadog-go that referenced this pull request Dec 4, 2018
nabiber added a commit to UnityTech/datadog-go that referenced this pull request Dec 4, 2018
@KJTsanaktsidis
Copy link

Question for @nabiber and @LeoCavaille - AFAIK concurrent Write calls to a unix datagram socket should be OK, because they'll just be separate datagrams. The locking is really only needed around the connect-if-not-connected logic, no? Could we use a sync.RWMutex instead around those code paths so that the locking actually only happens when a connection is needed?

KJTsanaktsidis added a commit to zendesk/datadog-go that referenced this pull request Dec 11, 2018
PR DataDog#62 added in a `Mutex` around the Unix socket writer's `Write` method,
to protect the logic that conditionally connected the socket if there
was not one already. However, because of the lock placement, it
serialises all statsd writes globally, which is likely to be a
performance bottleneck for highly parallel systems.

This commit turns that Mutex into a `RWMutex`. In the case where there is
already a socket set, only a `RLock` is needed. In the case where the
socket needs to be connected, the a full `Lock` is performed to protect
the swapping of `w.conn`.

Note that this change, I believe, will make `SetWriteTimeout`
meaningless; multiple goroutines will call `conn.SetWriteDeadline` in
parallel, and the behaviour of that is to extend the deadline for future
_and all in-flight_ requests. If new calls to `SetWriteDeadline` keep
getting made, the deadline on an existing blocked call to `Write` will
be continuously extended. I'm not really sure how to fix this or whether
it's worth fixing though.
truthbk added a commit that referenced this pull request Apr 8, 2019
…84)

* uds: chage Mutex to RWMutex for fast already-connected path

PR #62 added in a `Mutex` around the Unix socket writer's `Write` method,
to protect the logic that conditionally connected the socket if there
was not one already. However, because of the lock placement, it
serialises all statsd writes globally, which is likely to be a
performance bottleneck for highly parallel systems.

This commit turns that Mutex into a `RWMutex`. In the case where there is
already a socket set, only a `RLock` is needed. In the case where the
socket needs to be connected, the a full `Lock` is performed to protect
the swapping of `w.conn`.

Note that this change, I believe, will make `SetWriteTimeout`
meaningless; multiple goroutines will call `conn.SetWriteDeadline` in
parallel, and the behaviour of that is to extend the deadline for future
_and all in-flight_ requests. If new calls to `SetWriteDeadline` keep
getting made, the deadline on an existing blocked call to `Write` will
be continuously extended. I'm not really sure how to fix this or whether
it's worth fixing though.

* [rebase] moving uds to uds_blocking for rebase

* [uds][blocking] fixing bad struct name after rebase
@truthbk truthbk added this to the 2.2 milestone Apr 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants