Redis Cluster: Publish propagation optimization #122

Open
antirez opened this Issue Oct 7, 2011 · 6 comments

Comments

Projects
None yet
3 participants
@antirez
Owner

antirez commented Oct 7, 2011

Redis Cluster is able to propagate all the Publish operations to all the connected nodes.

This way you can PUBLISH to every node, and SUBSCRIBE to any other node. All the messages are duplicated across nodes.
Since in Redis the resources needed for a PUBLISH are mostly proportional to the number of subscribers to a given channel, AND to the number of subscribers to any pattern, this is a good way to start.

However if you have a very bigger cluster, with for instance 1000 nodes, it's not good I/O wise to copy the message to all the nodes. So the idea is to implement a Bloom filter based approach to deliver messages only to interested nodes.

Nodes already use the "Cluster Bus" to propagate PUBLISH operations (our cluster bus is just a node-to-node TCP socket using a binary protocol). We can use that bus to also update the bloom filter when there is a new subscriber. From time to time the node will reset the bloom filter (to avoid pollution on unsubscribes) and will broadcast it again.

This works great for channels. For patterns it is possible to use a prefix approach, that is, we can have two bloom filters, one for channels, and one for patterns. The prefix of every pattern up to a given length (for instance 8 bytes) is used to set the bloom filter. For instance if node A has a client subscribed to "foo*" it sets the bloom filter to prefix "f", "fo", and "foo". This way it is possible to also handle the patterns case.

A simpler alternative is to set the filter to all ones if there is at least one subscriber to a pattern channel (PSUBSCRIBE).

This is just an initial idea, we'll have time to improve this in release two of Redis Cluster. Likely the first release will just have the current "brute force" support.

Design proposal

(as sketched during a private email exchange)

Every node takes a bloom filter of its subscribers, from time to time
the filter is recomputed when there are enough unsubscribed nodes,
since the node is otherwise too much "polluted".
Every node also takes in memory the bloom filter of every other node:
it is used to check if a message should be sent to a given node.
A node initializes the bloom filter of other nodes as a bitmap of "1",
so if we don't have information about a node, we send messages by
default.

Now the first technical problem is that this filters must be updated,
so nodes perform the following operations:

  1. In PING/PONG packets there is always the filter fingerprint of the
    node sending the packet. It is simply the CRC64 of the filter.
  2. If a receiving node detects that there is a mismatch between the
    CRC64 of the sending node, and its bloom filter about it, it sends a
    PUBSUB_BF_REQUEST packet (just an example, naming is hard ehm..). The
    node getting the BF_REQUEST will reply with a message describing its
    bloom filters.
  3. Also, when a node gets a new subscriber, it broadcasts its new
    filter only if the new subscriber actually made at least one bit
    changing in the filter. Otherwise the filter is the same, no need to
    broadcast it.
  4. As a side effect, when the filter is too polluted, and the node
    recomputes it, the CRC64 checksums will mismatch and all the nodes
    will end with the new filter eventually.

Cool so far, apart from the problem, that will be documented with red
characters unside a tag, that there is a delay between the
moment your client subscribes, and the moment it starts receiving all
the messages.
Also, and this already happens, if there is a connection issue between
nodes, you may lose messages without knowing it, even if this latter
issue is fixable in some way by using message IDs internally so that
we disconnect all the subscribers when we detect they lost a message,
or alternatively, we can inform them with a special message.

Now what about PSUBSCRIBE? The easy way is to say, if there is at
least a client subscribed to a pattern, our bitmap is a solid wall of
"1" :-) We get everything and so far. This would put us back to the
current implementation, de facto, as soon as there is a single
subscriber. There is a better way using the pattern prefixes.

For example a given node may have two psubscribes to the following two patters:

user1:*
alerts:*

What about if we have another bloom filter (and two CRC64s in the
header and so forth)? In this second bloom filter what we do is to set
all the substrings of the pattern up to the first jolly character
(whatever it is: [, _, ?).
So we set "user1:" and "alerts:". In the special case of the jolly
char starting at position 0, we fill the filter all with 1s, like in
the case of the pattern "_foo*".

Now another node gets two PUBLISH messages:

PUBLISH user1:foo
PUBLISH user2:bar

The target node subscribers filter does not match, but it may have
patterns. In the special case the pattern is all blank, we skip any
processing at all, there are no subscribers to patterns in that target
node.
Otherwise we start to see if it is possible for the node to be a
receiver of our message. We start checking if any substring of our
channel matches with the pattern substring.

So we check "u", "us", "use", "user", "user1", ... up to "user1:foo".
If we find a match we send the message, otherwise we don't.

It is computationally not exactly a joke (but we are not going to use
SHA1 I guess to implement this kind of bloom filter...) but the
computation can be skipped for all the nodes that don't have
subscribers to patterns.

@melo

This comment has been minimized.

Show comment Hide comment
@melo

melo Oct 7, 2011

Contributor

A possible optimization when cleaning the bloom filter: when a node receives a PUBLISH for whom he found no subscribers, he could broadcast back that to update the bloom filters.

Contributor

melo commented Oct 7, 2011

A possible optimization when cleaning the bloom filter: when a node receives a PUBLISH for whom he found no subscribers, he could broadcast back that to update the bloom filters.

@antirez

This comment has been minimized.

Show comment Hide comment
@antirez

antirez Oct 7, 2011

Owner

I think the bloom filters are just one way, you can't remove items from a bloom filter. But I think your idea can be modified to work this way: a node takes an internal counter about mismatches. Once N publish packets are received about non existing channels, then the bloom filter is cleared and rebuild.

Owner

antirez commented Oct 7, 2011

I think the bloom filters are just one way, you can't remove items from a bloom filter. But I think your idea can be modified to work this way: a node takes an internal counter about mismatches. Once N publish packets are received about non existing channels, then the bloom filter is cleared and rebuild.

@melo

This comment has been minimized.

Show comment Hide comment
@melo

melo Oct 7, 2011

Contributor

Bloom filters only give you an assurance when an key is guaranteed not to be in there. That is to say, if the bloom filter has a hit, the key might or might now be in "database". If it misses, the key is for sure not existing.

If we want to use a bloom filter to control PUBLISH propagation, then can only stop sending to a specific node+topic when the filter misses. That's the only safe combination.

So we have to start with a full bloom filter where all node+topic combinations hit, and then remove the ones that we discover not to be subscribed.

Maybe I don't remember how bloom filters work, or I missing something obvious. I'll sleep on it.

Contributor

melo commented Oct 7, 2011

Bloom filters only give you an assurance when an key is guaranteed not to be in there. That is to say, if the bloom filter has a hit, the key might or might now be in "database". If it misses, the key is for sure not existing.

If we want to use a bloom filter to control PUBLISH propagation, then can only stop sending to a specific node+topic when the filter misses. That's the only safe combination.

So we have to start with a full bloom filter where all node+topic combinations hit, and then remove the ones that we discover not to be subscribed.

Maybe I don't remember how bloom filters work, or I missing something obvious. I'll sleep on it.

@antirez

This comment has been minimized.

Show comment Hide comment
@antirez

antirez Oct 7, 2011

Owner

Pedro: yes that what I did not understood your previous comment. I mean: you can't remove an item from a bloom filter. So the filter will gradually get more and more polluted. Finally when a given amount of pollution is reached, that a node can detect counting messages received for which it has no receivers, it needs to reset the filter again and rebuild it form the list of channels that are actually existing.

If instead it would be possible to remove items from bloom filters a mis-received message would just cause the bloom filter update and broadcast. But this requires a full list of channels scan unfortunately since in bloom filters we can only add channels but never remove them.

Owner

antirez commented Oct 7, 2011

Pedro: yes that what I did not understood your previous comment. I mean: you can't remove an item from a bloom filter. So the filter will gradually get more and more polluted. Finally when a given amount of pollution is reached, that a node can detect counting messages received for which it has no receivers, it needs to reset the filter again and rebuild it form the list of channels that are actually existing.

If instead it would be possible to remove items from bloom filters a mis-received message would just cause the bloom filter update and broadcast. But this requires a full list of channels scan unfortunately since in bloom filters we can only add channels but never remove them.

@melo

This comment has been minimized.

Show comment Hide comment
@melo

melo Oct 7, 2011

Contributor

Then I don't see how a bloom filter would work. Think when you start, the bloom filter is empty. When you have a message that you need to forward to the other N-1 nodes, what query do you make to the bloom filter?

  • Should I send message on topic T to Node-n? or
  • Shouldn't I send message on topic T to Node-n?

Basically what I'm not seeing is what is inside the bloom filter? All combinations on Node+Topic's each Node has? If so, then a new Node needs to ask of all other nodes the current subscribed topics, all of them, to init the filter.

Contributor

melo commented Oct 7, 2011

Then I don't see how a bloom filter would work. Think when you start, the bloom filter is empty. When you have a message that you need to forward to the other N-1 nodes, what query do you make to the bloom filter?

  • Should I send message on topic T to Node-n? or
  • Shouldn't I send message on topic T to Node-n?

Basically what I'm not seeing is what is inside the bloom filter? All combinations on Node+Topic's each Node has? If so, then a new Node needs to ask of all other nodes the current subscribed topics, all of them, to init the filter.

@nebulawars

This comment has been minimized.

Show comment Hide comment
@nebulawars

nebulawars Feb 24, 2012

If you want to use a bit more memory, wouldn't it be possible to replace the 'bits' in bloom filter with 2 byte counters. Adding to bloom filter increases the counter bucket, removing from bloom filter decreases the counter bucket. Not technical a bloom, but still almost the same.

If you want to use a bit more memory, wouldn't it be possible to replace the 'bits' in bloom filter with 2 byte counters. Adding to bloom filter increases the counter bucket, removing from bloom filter decreases the counter bucket. Not technical a bloom, but still almost the same.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment