Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: sync bugs with new v8 connected and started properties #94

Merged
merged 15 commits into from
Jun 30, 2020

Conversation

okdistribute
Copy link
Collaborator

@okdistribute okdistribute commented Jun 27, 2020

Found these bugs while updating Mapeo Desktop to the latest v8.

This will be published as @mapeo/core v8.2.2

Sync peers should always emit the close event

Added a failing (now passing!) test where peer.started was never getting set, which lead to the replication-complete state never firing.

For example, this state should not exist, where replication-progress is the state and started is false.

Screenshot from 2020-06-27 00-25-01

The test also captures this by making sure peer.started is set on the first progress event.

The culprit?

self._activeSyncs gets into an incorrect state when a peer would disconnect without syncing, and then reconnect.

Peer A joins
Peer B joins
Peer A closes
-1 activeSyncs
Peer A joins
Peer B syncs with Peer A
-1 activeSyncs + 1 = 0
0 activeSyncs means that sync-start is never emitted

Sending replicateNetwork too quickly would sometimes not do anything

If peer.connected the sync start doesn't go through and nothing happens.

Now, it sends an error if the peer is not connected when replicate is called.

See https://github.com/digidem/mapeo-core/pull/94/files#diff-39367c259f0252fc4bc764d86564411bR234-R239

An alternative way to do this would be to create an internal queue of syncronizations that have been requested, and once a peer reconnects, see if a replication has already been requested and drain that queue. But this seemed to be unnecessary as consumer applications should not allow syncronization if the peer is disconnected. (although it would be a nice to have convenience for developer ux)

Syncronizing and then disconnecting would not properly set peer.connected = false

Peer A joins
Peer B joins
Peer B and A syncronize
Peer A quits
peer.connected is true during the down event. It should be false as the peer is now disconnected.

This is fixed by setting peer.connected = false in the onClose function before the down event is emitted with the peer -- otherwise a consumer application will think that the peer is still connected when it is not.

See https://github.com/digidem/mapeo-core/pull/94/files#diff-39367c259f0252fc4bc764d86564411bR422

@hackergrrl
Copy link
Contributor

Great catch @okdistribute 🎉

@okdistribute okdistribute force-pushed the v8-close-bug branch 2 times, most recently from 0f78a43 to 71c543d Compare June 27, 2020 22:01
sync.js Outdated
process.nextTick(function () {
peer.sync.emit('error', new Error('trying to sync before handshake has occurred'))
process.nextTick(() => {
this._syncQueue.set(peer.id, peer)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@noffle please review!
I added this because if you hit replicateNetwork twice in a row, it would throw an error immediately. This gives the consumer a bit more of what they expect

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still applicable? I see you removed the sync queue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still applicable? I see you removed the sync queue.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I removed it!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were some edge cases and it became clear that it wasn't necessary now although a nice to have, better to push it out the door

@@ -1,7 +1,6 @@
language: node_js
node_js:
- '8'
- '10'
- '12'
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gmaclennan is the mobile app also using node 12 to build?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@okdistribute the mobile app is running on Node 10 (I assume you mean the Node version it is running mapeo-core on, vs. the Node version used to build the app?). The latest nodejs-mobile is using Node v12.6.3 and we should be able to update to that without issues, I had only been holding off until we are sure that the stack runs ok on Node v12 (I remember that we had issues with level etc. not supporting Node v10 before). If we know the mapeo-core stack works with Node v12 then I can update MM.

@okdistribute okdistribute changed the title fix: Sync peers should always emit the close event fix: Syncornization fixes for v8 Jun 28, 2020
@okdistribute okdistribute changed the title fix: Syncornization fixes for v8 fix: sync fixes for v8 Jun 28, 2020
@okdistribute okdistribute changed the title fix: sync fixes for v8 fix: sync bugs with new v8 connected and started properties Jun 28, 2020
Copy link
Member

@gmaclennan gmaclennan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't fully parsed the logic here, but on a quick read through this all looks great. In a future PR, would it be possible to queue a sync request until a handshake completes instead of throwing an error? I feel like that is a slightly confusing reason for an error for the user. Ideal behaviour would be:

  1. Click sync before handshake
  2. Handshake completes
  3. Continue sync if handshake was successful, throw error if handshake failed.

But glad this fixes the other errors

return this.peers().filter(this._isactive)
}

_isactive (peer) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and _isclosed don't need to be methods (they don't reference this, and aren't public methods) so they could be standalone functions. But that's just a stylistic thing.

Copy link
Collaborator Author

@okdistribute okdistribute Jun 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the peer object is encapsulated by the SyncState class, so these methods should be in the SyncState (where the peer object is created and managed) to maintain that. Ideally, the peer object would be it's own class and these methods would live there.

Currently PeerState is operating more like a struct or complex data type rater than a class, so it doesn't feel appropriate to add methods there either.

@okdistribute okdistribute merged commit 1ad1ac1 into v8 Jun 30, 2020
@okdistribute okdistribute deleted the v8-close-bug branch June 30, 2020 17:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants