Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✅ Fix flaky tests #590

Merged
merged 1 commit into from Feb 16, 2023
Merged

✅ Fix flaky tests #590

merged 1 commit into from Feb 16, 2023

Conversation

alecgibson
Copy link
Collaborator

Fixes share/sharedb-mongo#131

This change attempts to fix some tests that are flaky in sharedb-mongo.

The flakiness can be reproduced locally by wrapping the Agent._querySubscribe() call to _fetchBulkOps() in a long setTimeout():

  if (options.fetchOps) {
    wait++;
    setTimeout(function() {
      console.log('fetch bulk ops');
      agent._fetchBulkOps(collection, options.fetchOps, finish);
    }, 1000);
  }

This forces us into an edge case where the subscribe query triggers and returns the diff from a queryPoll(), which triggers the tests' 'insert' handlers, finishes the test, and closes the backend, all before the _fetchBulkOps() call is issued (which subsequently fails because the database has been closed).

Handling this query subscribe actually triggers a variety of responses to be sent to the client at different times:

  1. The actual _querySubscribe() callback (which ultimately triggers agent._reply() in response to the original request)
  2. Ops sent independently by _fetchBulkOps()
  3. The diff resulting from queryPoll()

In order to reduce flakiness, this change adds checks that the query's 'ready' event has been called, which will happen once the resubscribe request has been replied to by the Agent.

This ensures we've waited for events 1 & 3 of the above list, although we notably aren't waiting for event 2 (which is where the error is actually coming from).

Since no ops will actually be sent to the client, I'm not sure how best to wait for this. Hopefully waiting for the subscribe ack should be sufficient.

Fixes share/sharedb-mongo#131

This change attempts to fix some tests that are flaky in `sharedb-mongo`.

The flakiness can be reproduced locally by wrapping the
`Agent._querySubscribe()` [call to `_fetchBulkOps()`][1] in a long
`setTimeout()`:

```js
  if (options.fetchOps) {
    wait++;
    setTimeout(function() {
      console.log('fetch bulk ops');
      agent._fetchBulkOps(collection, options.fetchOps, finish);
    }, 1000);
  }
```

This forces us into an edge case where the subscribe query triggers and
returns the diff from a [`queryPoll()`][2], which triggers the tests'
`'insert'` handlers, finishes the test, and closes the backend, all
before the `_fetchBulkOps()` call is issued (which subsequently fails
because the database has been closed).

Handling this query subscribe actually triggers a variety of responses
to be sent to the client at different times:

 1. The actual `_querySubscribe()` [callback][3] (which ultimately
    triggers [`agent._reply()`][4] in response to the original request)
 2. Ops sent [independently][5] by [`_fetchBulkOps()`][6]
 3. The diff resulting from [`queryPoll()`][2]

In order to reduce flakiness, this change adds checks that the query's
[`'ready'` event][7] has been called, which will happen once the
resubscribe request has been replied to by the `Agent`.

This ensures we've waited for events 1 & 3 of the above list, although
we notably aren't waiting for event 2 (which is where the error is
actually coming from).

Since no ops will actually be sent to the client, I'm not sure how best
to wait for this. Hopefully waiting for the subscribe ack should be
sufficient.

[1]: https://github.com/share/sharedb/blob/b84290f8a66ee42b3d2c75d6de780e125b99c514/lib/agent.js#L521-L524
[2]: https://github.com/share/sharedb/blob/b84290f8a66ee42b3d2c75d6de780e125b99c514/lib/agent.js#L534
[3]: https://github.com/share/sharedb/blob/b84290f8a66ee42b3d2c75d6de780e125b99c514/lib/agent.js#L511
[4]: https://github.com/share/sharedb/blob/b84290f8a66ee42b3d2c75d6de780e125b99c514/lib/agent.js#L344
[5]: https://github.com/share/sharedb/blob/b84290f8a66ee42b3d2c75d6de780e125b99c514/lib/agent.js#L265
[6]: https://github.com/share/sharedb/blob/b84290f8a66ee42b3d2c75d6de780e125b99c514/lib/agent.js#L523
[7]: https://github.com/share/sharedb/blob/b84290f8a66ee42b3d2c75d6de780e125b99c514/lib/client/query.js#L148
@coveralls
Copy link

Coverage Status

Coverage: 97.471%. Remained the same when pulling 7681077 on flaky-query-test-fix into b84290f on master.

Copy link
Contributor

@ericyhwang ericyhwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting this out! I'd just finished the debugging yesterday and didn't have time to work on a solution.

This should work fine. Agent#_fetchBulkOps invokes _sendBulkOps prior to calling the callback that sends an ack for the qs action.

In my local testing yesterday, the error was sent down as the ack for qs:

RECV {"a":"qs","id":1,"c":"dogs","q":{},"r":[["fido",1],["spot",1]],"error":{"code":5101,"message":"Already closed"}}

For another time - I was thinking whether there's a more general way to handle the tests suddenly closing the backend. Maybe by also destroying clients created directly with backend.connect()? We can have that discussion elsewhere, though.

@alecgibson alecgibson merged commit bd7ec3f into master Feb 16, 2023
@alecgibson alecgibson deleted the flaky-query-test-fix branch February 16, 2023 08:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky test - "subscribed query gets simultaneous insert and remove after reconnecting"
3 participants