Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ZMQ] append a message sequence number to every ZMQ notification #7762

Merged
merged 2 commits into from
Apr 19, 2016

Conversation

jonasschnelli
Copy link
Contributor

@jonasschnelli jonasschnelli commented Mar 29, 2016

Currently, ZMQ listeners cannot detect if they have missed a message.
This PR adds a sequence number to each message (each message type has its own counter).

The sequence number is the last part of the multipart zmq message to not break the API, though, we could consider breaking the API in favor of moving the sequence number to the very beginning.

Todo:

  • update release notes

@laanwj
Copy link
Member

laanwj commented Mar 29, 2016

Concept ACK, I like adding the sequence nr as an extra part. Also ok with using 32 bit sequence numbers.

@promag
Copy link
Member

promag commented Mar 30, 2016

I thought a subscriber can't miss a message.

Edit:

But it can: http://stackoverflow.com/a/15821036. So what can a subscriber do if it detects a missing message?

@jonasschnelli
Copy link
Contributor Author

@promag:
I'm not sure if I would design an listening application that fully rely on getting all notifications. With this PR, you could at least detect if you got all (also check if you got the first) notifications and if not, you could poll/sync your data over RPC.

@laanwj
Copy link
Member

laanwj commented Mar 30, 2016

I thought a subscriber can't miss a message.

The way we are using ZMQ, the sender will never block. This is sensible as we don't want to hang the application due to a slow client. But this means at some point one of the send queues will fill up and new messages will be discarded.

But it can: http://stackoverflow.com/a/15821036. So what can a subscriber do if it detects a missing message?

That's up to the application. Either stop with a fatal error, or re-start/re-sync. An event application usually does some synchronization at start-up, then starts processing incremental changes. After getting out of sync it can repeat the process. The important thing is being able to detect it and thus act on it.

@laanwj
Copy link
Member

laanwj commented Apr 14, 2016

This needs mention in doc/release-notes.md as well as doc/zmq.md (there, don't forget to specify the size and endianness of the sequence number), I think it is ready for merge otherwise.

body = msg[1]
nseq = msg[2]
msgSequence = struct.unpack('<I', str(msg[-1]))[-1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for using str() here?

@jonasschnelli jonasschnelli force-pushed the 2016/03/zmq_seq branch 2 times, most recently from c0cdcce to fefecf8 Compare April 15, 2016 13:07
@jonasschnelli
Copy link
Contributor Author

Fixed @MarcoFalke nit.
Mentioned the change in doc/release-notes.md as well as in doc/zmq.md

@@ -47,11 +48,17 @@ def run_test(self):
print "listen..."
msg = self.zmqSubSocket.recv_multipart()
topic = msg[0]
assert_equal(topic, "hashtx")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Add the b prefix, which is a noop in py2?

b"hashtx"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing!

@jonasschnelli
Copy link
Contributor Author

@MarcoFalke: fixed nits.

// Internal function to send multipart message
static int zmq_send_multipart(void *sock, const void* data, size_t size, ...)
static int zmq_send_multipart_keepalive(void *sock, const void* data, size_t size, ...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the rename to _keepalive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed it to _keepalive because it does not "close" the multipart message. All messaged are sent with ZMQ_SNDMORE and I wanted to increase the awareness of that by renaming the function.

@laanwj
Copy link
Member

laanwj commented Apr 19, 2016

Here's a slight simplification of the code - gets rid of some duplication, as well as the _keepalive change:
laanwj@fb843df

@jonasschnelli
Copy link
Contributor Author

@laanwj: Thanks. Much better. Stolen fb843df, squashed and force pushed.

@laanwj laanwj merged commit 0b25a9f into bitcoin:master Apr 19, 2016
laanwj added a commit that referenced this pull request Apr 19, 2016
…fication

0b25a9f [ZMQ] append a message sequence number to every ZMQ notification (Jonas Schnelli)
de821d5 [ZMQ] refactor message string (Jonas Schnelli)
@dcousens
Copy link
Contributor

dcousens commented Apr 21, 2016

The way we are using ZMQ, the sender will never block. This is sensible as we don't want to hang the application due to a slow client.

Could we make that parameterizable?
In certain cases I'd rather have message reliability than having the node to be guaranteed to be running in real time.

@sipa
Copy link
Member

sipa commented Apr 21, 2016 via email

@laanwj
Copy link
Member

laanwj commented Apr 21, 2016

In certain cases I'd rather have message reliability than having the node to be guaranteed to be running in real time.

Well notifications can be lost, through zmq or otherwise, for example if bitcoind needs to be restarted, or a myriad of other circumstances not under your control. At least you can detect it now:

  • sequence number goes to 0 (and wasn't at 0xffffffff) -> bitcoind restarted
  • sequence number skips a beat -> send buffer was full

Your application needs an (application dependent) way to resync anyhow. This is better than pretending to guarantee something.

zkbot added a commit to zcash/zcash that referenced this pull request Feb 9, 2017
zkbot added a commit to zcash/zcash that referenced this pull request Feb 9, 2017
codablock added a commit to codablock/dash that referenced this pull request Dec 20, 2017
CryptoCentric added a commit to absolute-community/absolute that referenced this pull request Feb 15, 2019
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants