fix race in SubscriberImplTest::receiveMessage #1586

Merged
merged 1 commit into from
Feb 3, 2017

Conversation

pongad
Contributor

@pongad pongad commented Feb 2, 2017

cc @davidtorres

This bug manifests as testBundleAcks deadlocking.

The test

  1. publishes a series of messages to a mock pubsub server,
  2. sets up a Subscriber client, waits to receive the messages back,
  3. and acknowledges them.

For performance reasons, Subscriber does not immediately
send the acknowledgement request to the server.
Instead, it sends acks in batches every 100ms.
Using a fake clock, the test advances the time by 100ms,
which sends the ack batch, then verifies that the mock server
receives the acks.
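
To make the batching behaviour concrete, here is a minimal sketch of acks being buffered and flushed once a manually advanced clock passes the 100ms interval. `AckBatcher`, `advanceTime`, and `sentBatches` are hypothetical names chosen for illustration; this is not the actual Subscriber or fake clock code from the repository.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the subscriber's ack batching; not the real Subscriber code. */
class AckBatcher {
  // Assumed flush interval, taken from the 100ms mentioned in the description.
  private static final long FLUSH_INTERVAL_MS = 100;

  private final List<String> pending = new ArrayList<>();
  private final List<List<String>> sentBatches = new ArrayList<>();
  private long nowMs = 0;
  private long lastFlushMs = 0;

  /** Buffers an ack instead of sending it right away. */
  synchronized void ack(String ackId) {
    pending.add(ackId);
  }

  /** Plays the role of the fake clock: advancing time may trigger a flush. */
  synchronized void advanceTime(long ms) {
    nowMs += ms;
    if (nowMs - lastFlushMs >= FLUSH_INTERVAL_MS) {
      lastFlushMs = nowMs;
      if (!pending.isEmpty()) {
        // "Send" the batch: only acks buffered before this point are included.
        sentBatches.add(new ArrayList<>(pending));
        pending.clear();
      }
    }
  }

  /** Returns a copy of every batch "sent" so far, standing in for the mock server's view. */
  synchronized List<List<String>> sentBatches() {
    return new ArrayList<>(sentBatches);
  }
}
```

An ack that reaches `ack()` after `advanceTime()` has already flushed is not part of that batch, which is the window the race described below falls into.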

The bug is in step 2. The test thread blocks on a
CountDownLatch, which is counted down by the gRPC thread calling
receiveMessage().
However, the method decrements the latch before acking the message.

On a bad day, the test thread can wake up,
advance the clock,
and send the ack batch
before the gRPC thread has added the message to the batch.
The test then waits
for the server to receive an ack that was never sent, deadlocking the test.

The fix is for receiveMessage() to ack the message before
decrementing the counter.
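
The corrected ordering can be sketched as follows. `LatchedReceiver` and `awaitAndFlush` are hypothetical names, and the queue stands in for the pending ack batch; the real test's receiver and batching code may look different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;

/** Hypothetical receiver illustrating the ordering; not the actual test code. */
class LatchedReceiver {
  private final ConcurrentLinkedQueue<String> pendingAcks = new ConcurrentLinkedQueue<>();
  private final CountDownLatch latch;

  LatchedReceiver(int expectedMessages) {
    latch = new CountDownLatch(expectedMessages);
  }

  /** Called on the (simulated) gRPC thread for each delivered message. */
  void receiveMessage(String ackId) {
    pendingAcks.add(ackId); // 1. add the ack to the pending batch first
    latch.countDown();      // 2. only then let the test thread proceed
    // Swapping these two lines reintroduces the race: the test thread could
    // flush the batch between countDown() and add().
  }

  /** Called on the test thread: waits for all messages, then drains the batch. */
  List<String> awaitAndFlush() {
    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for messages", e);
    }
    List<String> batch = new ArrayList<>();
    String id;
    while ((id = pendingAcks.poll()) != null) {
      batch.add(id);
    }
    return batch;
  }
}
```

Because `countDown()` happens after the ack is queued, every ack is already in the batch by the time `await()` returns on the test thread.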

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Feb 2, 2017
@coveralls

coveralls commented Feb 2, 2017

Coverage Status

Changes Unknown when pulling 7ce5405 on pongad:fix-race into ** on GoogleCloudPlatform:pubsub-hp**.

@garrettjonesgoogle
Member

Your description is hard to follow. Could you add some qualifiers? If the following are incorrect, it indicates that I didn't understand correctly:

  • (mock) server -> (mock) pubsub server
  • client -> subscriber client

Add explicit references to where the following are performed in the code:

  • "the test thread can [...] send the ack-batch"
  • "before the GRPC thread could add the message to the batch"
  • "The test then waits for the server to receive an ack"

It's hard to verify the flow if I have to do the work to correlate these descriptions with the code.

@pongad
Contributor Author

pongad commented Feb 2, 2017

@garrettjonesgoogle PTAL

@garrettjonesgoogle
Member

LGTM

@pongad pongad merged commit 36e609e into googleapis:pubsub-hp Feb 3, 2017
@pongad pongad deleted the fix-race branch February 3, 2017 00:52