
Merge branch 'chapter4-wip' of github.com:imatix/zguide

2 parents 1048b0c + 40b7263 commit 5fb492bda4bdd8f9b098db5ab5e6796c8840b9bc @hintjens committed Mar 1, 2011
Showing with 685 additions and 180 deletions.
  1. +1 −0 .gitignore
  2. +185 −40 chapter4.txt
  3. +178 −0 examples/C/mdcliapi.c
  4. +18 −0 examples/C/mdclient.c
  5. +10 −0 examples/C/mdworker.c
  6. +213 −0 examples/C/mdwrkapi.c
  7. +26 −2 examples/C/zmsg.c
  8. +22 −6 notes.txt
  9. +32 −132 wip.txt
1 .gitignore
@@ -1,3 +1,4 @@
wdtemp.txt
tmp_*
+*.wd
225 chapter4.txt
@@ -6,11 +6,22 @@ In Chapter Three we looked at advanced use of 0MQ's request-reply pattern with w
We'll cover:
* How we define 'reliability'.
-* The most common types of failures we will experience in 0MQ applications.
-* How to handle slow clients.
-* How to build a reliable request-reply framework.
-* How to build a reliable publish-subscribe framework.
-* How to build a reliable push-pull pipeline framework.
+* The types of failures we will experience in 0MQ applications.
+* How to implement reliability on top of the 0MQ core patterns.
+* How to implement heartbeating between 0MQ peers.
+* How to write a reusable protocol specification.
+* How to design a service-oriented framework API.
+
+In this chapter we focus heavily on user-space 'patterns', which are reusable models that help you design your 0MQ architecture:
+
+* The //Suicidal Snail// pattern: how to handle slow clients.
+* The //Lazy Pirate// pattern: reliable request-reply from the client side.
+* The //Simple Pirate// pattern: reliable request-reply using an LRU queue.
+* The //Paranoid Pirate// pattern: reliable request-reply with heartbeating.
+* The //Majordomo// pattern: service-oriented reliable queuing.
+* The //Titanic// pattern: disk-based reliable queuing.
+* The //Freelance// pattern: brokerless reliable request-reply.
+* The //Clone// pattern: reliable publish-subscribe using distributed key-value tables.
+++ What is "Reliability"?
@@ -24,7 +35,7 @@ To understand what 'reliability' means, we have to look at its opposite, namely
* Networks can fail in exotic ways, e.g. some ports on a switch may die and those parts of the network become inaccessible.
* Entire data centers can be struck by lightning, earthquakes, fire, or more mundane power or cooling failures.
-To make a software system fully reliable against *all* of these possible failures is an enormously difficult and expensive job and goes beyond the scope of this modest guide.
+To make a software system fully reliable against //all// of these possible failures is an enormously difficult and expensive job and goes beyond the scope of this modest guide.
Since the first five cases cover 99.9% of real world requirements outside large companies (according to a highly scientific study I just ran), that's what we'll look at. If you're a large company with money to spend on the last two cases, contact me immediately, there's a large hole behind my beach house waiting to be converted into a pool.
@@ -331,9 +342,13 @@ You should see the workers die, one by one, as they simulate a crash, and the cl
++++ Heartbeating
-When writing the Paranoid Pirate examples, it took about five hours to get the heartbeating working properly. The rest of the request-reply chain took perhaps ten minutes. Heartbeating is one of those reliability layers that often causes more trouble than it saves. It is especially easy to create 'false failures', i.e. peers decide that they are disconnected because the heartbeats aren't sent properly.
+When writing the Paranoid Pirate examples, it took about five hours to get the queue-to-worker heartbeating working properly. The rest of the request-reply chain took perhaps ten minutes. Heartbeating is one of those reliability layers that often causes more trouble than it saves. It is especially easy to create 'false failures', i.e. peers decide that they are disconnected because the heartbeats aren't sent properly.
-Here are some tips to getting this right:
+Some points to consider when understanding and implementing heartbeating:
+
+* Note that heartbeats are not request-reply. They flow asynchronously in both directions. Either peer can decide the other is 'dead' and stop talking to it.
+
+* If one of the peers uses durable sockets, this means it may get heartbeats queued up that it will receive if it reconnects. For this reason, workers should //not// reuse durable sockets. The example code uses durable sockets for debugging purposes, but the identities are randomized so that (in theory) no existing socket is ever reused.
* First, get the heartbeating working, and only //then// add in the rest of the message flow. You should be able to prove the heartbeating works by starting peers in any order, stopping and restarting them, simulating freezes, and so on.
@@ -361,6 +376,12 @@ Here are some tips to getting this right:
* In a real application, heartbeating must be configurable and usually negotiated with the peer. Some peers will want aggressive heartbeating, as low as 10 msecs. Other peers will be far away and want heartbeating as high as 30 seconds.
+* If you have different heartbeat intervals for different peers, your poll timeout should be the lowest (shortest) of these (see the sketch after this list).
+
+* You might be tempted to open a separate socket dialog for heartbeats. This is superficially nice because you can separate different dialogs, e.g. the synchronous request-reply from the asynchronous heartbeating. However it's a bad idea for several reasons. First, if you're sending data you don't need to send heartbeats. Second, sockets may, due to network vagaries, become jammed. You need to know when your main data socket is silent because it's dead, rather than just not busy, so you need heartbeats on that socket. Lastly, two sockets is more complex than one.
+
+* We're not doing heartbeating from client to queue. We could, but it would add //significant// complexity for no real benefit.
+
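+To make these tips concrete, here is a minimal sketch of the kind of heartbeat loop the Paranoid Pirate worker uses. It is only a sketch: 'peer' stands for whichever socket you are heartbeating over, the constants and helpers (HEARTBEAT_LIVENESS, HEARTBEAT_INTERVAL, RECONNECT_INTERVAL, s_clock, s_sleep) follow the example code, and the actual message handling and reconnect logic are elided:
+
+[[code]]
+size_t liveness = HEARTBEAT_LIVENESS;
+uint64_t heartbeat_at = s_clock () + HEARTBEAT_INTERVAL;
+
+while (1) {
+    zmq_pollitem_t items [] = { { peer, 0, ZMQ_POLLIN, 0 } };
+    //  Poll timeout is the (shortest) heartbeat interval, in usecs for 0MQ/2.x
+    zmq_poll (items, 1, HEARTBEAT_INTERVAL * 1000);
+
+    if (items [0].revents & ZMQ_POLLIN) {
+        liveness = HEARTBEAT_LIVENESS;  //  Any traffic counts as a sign of life
+        //  ... process request, reply, or heartbeat ...
+    }
+    else
+    if (--liveness == 0) {
+        s_sleep (RECONNECT_INTERVAL);   //  Peer looks dead: back off, reconnect
+        //  ... reconnect here ...
+        liveness = HEARTBEAT_LIVENESS;
+    }
+    //  Send our own heartbeat when it's due, whether or not we got traffic
+    if (s_clock () > heartbeat_at) {
+        heartbeat_at = s_clock () + HEARTBEAT_INTERVAL;
+        //  ... send a heartbeat command to the peer ...
+    }
+}
+[[/code]]
+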
++++ Contracts and Protocols
If you're paying attention you'll realize that Paranoid Pirate is not compatible with Simple Pirate, because of the heartbeats.
@@ -378,59 +399,183 @@ Turning PPP into a real protocol would take more work:
* There should be a protocol version number in the READY command so that it's possible to create new versions of PPP safely.
* Right now, READY and HEARTBEAT are not entirely distinct from requests and replies. To make them distinct, we would want a message structure that includes a "message type" part.
-* Client-assigned sequence numbers in messages would probably be useful for debugging.
-++++ Reliable Publish-Subscribe (Clone Pattern)
+++++ Service-Oriented Reliable Queuing (Majordomo Pattern)
-Pubsub is like a radio broadcast, you miss everything before you join, and then how much information you get depends on the quality of your reception. It's so easy to lose messages with this pattern that you might wonder why 0MQ bothers to implement it at all.[[footnote]]If you're German or Norwegian, that is what we call 'humor'. There are many cases where simplicity and speed are more important than pedantic delivery. In fact the radio broadcast covers perhaps the majority of information distribution in the real world. Think of Facebook and Twitter. No, I'm not still joking.[[/footnote]]
+The nice thing about progress is how fast it happens. Just a few sentences ago we were dreaming of a better protocol that would fix the world. And here we have it:
-However, reliable pubsub is also a useful tool. Let's do as before and define what that 'reliability' means in terms of what can go wrong.
+* http://rfc.zeromq.org/spec:7
-Happens all the time:
+This one-page specification takes PPP and turns it into something more solid. This is how we should design complex architectures: start by writing down the contracts, and only //then// write software to implement them.
-* Subscribers join late, so miss messages the server already sent.
-* Subscriber connections take a non-zero time, and can lose messages during that time.
+The Majordomo Protocol (MDP) extends and improves PPP in one interesting way apart from the two points above. It adds a "service name" to requests that the client sends, and asks workers to register for specific services. The nice thing about MDP is that it came from working code, a simpler protocol, and a precise set of improvements. It took literally only a couple of hours to draft, review, and publish.
-Happens exceptionally:
+Adding service names is a small but significant change that turns our Paranoid Pirate queue into a service-oriented broker:
-* Subscribers can crash, and restart, and lose whatever data they already received.
-* Subscribers can fetch messages too slowly, so queues build up and then overflow.
-* Networks can become overloaded and drop data (specifically, for PGM).
-* Networks can become too slow, so publisher-side queues overflow.
+[[code type="textdiagram"]]
-A lot more can go wrong but these are the typical failures we see in a realistic system. The difficulty in defining 'reliability' now is that we have no idea, at the messaging level, what the application actually does with its data. So we need a generic model that we can implement once, and then use for a wide range of applications.
+ +-----------+ +-----------+ +-----------+
+ | | | | | |
+ | Client | | Client | | Client |
+ | | | | | |
+ \-----------/ \-----------/ \-----------/
+ ^ ^ ^
+ | | |
+ \---------------+---------------/
+ "Give me coffee" | "Give me tea"
+ v
+ /-----------\
+ | |
+ | Broker |
+ | |
+ \-----------/
+ ^
+ |
+ /---------------+---------------\
+ | | |
+ v v v
+ /-----------\ /-----------\ /-----------\
+ | "Water" | | "Tea" | | "Coffee" |
+ +-----------+ +-----------+ +-----------+
+ | | | | | |
+ | Worker | | Worker | | Worker |
+ | | | | | |
+ +-----------+ +-----------+ +-----------+
-What we'll design is a simple *shared key-value cache* that stores a set of blobs indexed by unique keys. Don't confuse this with *distributed hash tables*, which solve the wider problem of connecting peers in a distributed network, or with *distributed key-value tables*, which act like non-SQL databases. All we will build is a system that reliably clones some in-memory state from a server to a set of clients. We want to:
-* Let a client join the network at any time, and reliably get the current server state.
-* Let any client update the key-value cache (inserting new key-value pairs, updating existing ones, or deleting them).
-* Reliably propagates changes to all clients, and does this with minimum latency overhead.
-* Handle very large numbers of clients, e.g. tens of thousands or more.
+ Figure # - Majordomo Pattern
+[[/code]]
-The key aspect of the Clone pattern is that clients talk back to servers, which is more than we do in a simple pub-sub dialog. This is why I use the terms 'server' and 'client' instead of 'publisher' and 'subscriber'. We'll use pubsub as part of the Clone pattern but it is more than that.
+To implement Majordomo we need to write a framework for clients and workers. It's really not sane to ask every application developer to read the spec and make it work, when they could be using a simpler API built and tested just once.
-When a client joins the network, it subscribes a SUB socket, as we'd expect, to the data stream coming from the server (the publisher). This goes across some pub-sub topology (a multicast bus, perhaps, or a tree of forwarder devices, or direct client-to-server connections).
+So, while our first contract (MDP itself) defines how the pieces of our distributed architecture talk to each other, our second contract defines how user applications talk to the technical framework we're going to design.
-At some undetermined point, it will start getting messages from the server. Note that we can't predict what the client will receive as its first message. If a zmq_connect[3] call takes 10msec, and in that time the server has sent 100 messages, the client might get messages starting from the 100th message.
+Majordomo has two halves, a client side and a worker side. Since we'll write both client and worker applications, we will need two APIs. Here is a sketch for the client API, using a simple object-oriented approach. We write this in C, using the style of the [http://zfl.zeromq.org/page:read-the-manual ZFL library]:
-Let's define a message as a key-value pair. The semantics are simple: if the value is provided, it's an insert or update operation. If there is no value, it's a delete operation. The key provides the subscription filter, so clients can treat the cache as a tree, and select whatever branches of the tree they want to hold.
+[[code]]
+mdcli_t *mdcli_new (char *broker);
+void mdcli_destroy (mdcli_t **self_p);
+zmsg_t *mdcli_send (mdcli_t *self, char *service, zmsg_t *request);
+[[/code]]
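+
+For illustration, a client session using this API might look like the following sketch (the "echo" service name and the request body are placeholders; the endpoint matches the example programs):
+
+[[code]]
+mdcli_t *session = mdcli_new ("tcp://127.0.0.1:5055");
+
+zmsg_t *request = zmsg_new ();
+zmsg_body_fmt (request, "Hello world");
+zmsg_t *reply = mdcli_send (session, "echo", request);
+
+if (reply)
+    zmsg_destroy (&reply);
+zmsg_destroy (&request);    //  mdcli_send () works on a copy of the request
+mdcli_destroy (&session);
+[[/code]]
+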
-The client now connects to the server using a different socket (a REQ socket) and asks for a snapshot of the cache. It tells the server two things: which message it received (which means the server has to number messages), and which branch or branches of the cache it wants. To keep things simple we'll assume that any client has exactly one server that it talks to, and gets its cache from. The server *must* be running; we do not try to solve the question of what happens if the server crashes (that's left as an exercise for you to hurt your brain over).
+That's it. We open a connection to the broker, we send a request message and get a reply message back, and we eventually close the connection. Here's a sketch for the worker API:
-The server builds a snapshot and sends that to the client's REQ socket. This can take some time, especially if the cache is large. The client continues to receive updates from the server on its SUB socket, which it queues but does not process. We'll assume these updates fit into memory. At some point it gets the snapshot on its REQ socket. It then applies the updates to that snapshot, which gives it a working cache.
+[[code]]
+mdwrk_t *mdwrk_new (char *broker, char *service);
+void mdwrk_destroy (mdwrk_t **self_p);
+zmsg_t *mdwrk_recv (mdwrk_t *self, zmsg_t *reply);
+[[/code]]
-You'll perhaps see one difficulty here. If the client asks for a snapshot based on message 100, how does the server provide this? After all, it may have sent out lots of updates in the meantime. We solve this by cheating gracefully. The server just sends its current snapshot, but tells the client what its latest message number is. Say that's 200. The client gets the snapshot, and in its queue, it has messages 100 to 300. It throws out 100 to 200, and starts applying 201 to 300 to the snapshot.
+It's more or less symmetrical, but the worker dialog is a little different. The first time a worker calls recv (), it passes a null reply; after that it passes its current reply, and gets a new request back.
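+
+Built on that API, a hypothetical echo worker would look something like this sketch (the "echo" service name and endpoint are placeholders; note that mdwrk_recv () sends a copy of the reply, so the caller frees its own):
+
+[[code]]
+mdwrk_t *session = mdwrk_new ("tcp://127.0.0.1:5055", "echo");
+
+zmsg_t *reply = NULL;
+while (1) {
+    zmsg_t *request = mdwrk_recv (session, reply);
+    if (reply)
+        zmsg_destroy (&reply);  //  A copy was sent; free our own
+    if (request == NULL)
+        break;                  //  Worker was disconnected
+    reply = request;            //  Echo the request back as the next reply
+}
+mdwrk_destroy (&session);
+[[/code]]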
+
+The client and worker APIs are fairly simple to construct.
+
+
+.end
+
+- reconnect interval fixed, no backoff
+- timeout for worker recv?
+
+- single threaded worker API
+- will not send heartbeats while busy
+- synchronous, simple
+
+broker
+ - design first
+ - containers, zlist, zhash
+
+
+
+++++ Rust-based Reliability (Titanic Pattern)
+
+- Majordomo + rust
+
+Once you realize that the Paranoid Pirate queue is basically a message broker, you might be tempted to add rust-based reliability to it. After all, this works for all the enterprise messaging systems. It's such a tempting idea that it's a little sad to have to be negative. But that's one of my specialties. So, reasons you don't want rust-based brokers sitting in the center of your architecture:
+
+* As you've seen, the Lazy Pirate client performs surprisingly well. It works across a whole range of architectures, from direct client-to-server to distributed queue devices. It does assume that workers are stateless and idempotent (see below). But we can work around that limitation without resorting to rust.
+
+* Rust brings a whole set of problems, from slow performance to additional pieces that you have to manage and repair, and that create 6am panics when they inevitably break at the start of trading. The beauty of the Pirate queues we saw is their simplicity. They won't crash. And if you're still worried about the hardware, you can move to a peer-to-peer pattern that has no broker at all. I'll explain later in this chapter.
+
+There is one sane use case for rust-based reliability, which is asynchronous fire-and-forget. This pattern, which I'll just sketch, works as follows:
+
+* Clients use durable sockets to talk to a broker.
+* They use a request-reply dialog to send requests to the broker, which accepts them and stores them on disk.
+* When the broker has confirmed receipt of a request, this means it's stored on disk, safely.
+* The broker then looks for workers to process the request, and it does this over time as fast as it can.
+* Replies from workers are saved to disk in the same way, with an acknowledgement from the broker to the worker that the reply was received.
+* When clients reconnect (or immediately if they stay connected), they receive these replies, and they confirm to the broker that they've received them, again with a request-reply handshake.
+* The broker erases requests and replies only when they've been processed.
+
+This model makes sense when clients and workers come and go. It solves the major problem with the Pirate pattern, namely that a client waits for an answer in realtime. When clients and workers are randomly connected like this, raw performance is not a big concern: it's far more important to just never lose messages. Some people will argue that "just never lose messages" is a use case by itself, but if clients and workers are connected, you don't need rust to do that: Pirate can work, as we've demonstrated.
+
+++++ Idempotency
+
+Idempotency is not something to take a pill for. What it means is that it's safe to repeat an operation. While many client-to-server use cases are idempotent, some are not. Examples of idempotent use cases are:
+
+* Stateless task distribution, i.e. a collapsed pipeline where the client is both ventilator and sink, and the servers are stateless workers that compute a reply based purely on the state provided by a request. In such a case it's safe (though inefficient) to execute the same request many times.
+* A name service that translates logical addresses into endpoints to bind or connect to. In such a case it's safe to make the same lookup request many times.
+
+And here are examples of non-idempotent use cases:
+
+* A logging service. One does not want the same log information recorded more than once.
+* Any service that has impact on downstream nodes, e.g. sends on information to other nodes. If that service gets the same request more than once, downstream nodes will get duplicate information.
+* Any service that modifies shared data in some non-idempotent way. E.g. a service that debits a bank account is definitely not idempotent.
+
+When our server is not idempotent, we have to think more carefully about when exactly a server might crash. If it dies when it's idle, or while it's processing a request, that's usually fine. We can use database transactions to make sure a debit and a credit are always done together, if at all. If the server dies while sending its reply, that's a problem, because as far as it's concerned, it's done its work.
+
+If the network dies just as the reply is making its way back to the client, the same problem arises. The client will think the server died, will resend the request, and the server will do the same work twice. Which is not what we want.
+
+We use the common solution of detecting and rejecting duplicate requests. This means (see the sketch after this list):
+
+* The client must stamp every request with a unique client identifier and a unique message number.
+* The server, before sending back a reply, stores it using the client id + message number as a key.
+* The server, when getting a request from a given client, first checks if it has a reply for that client id + message number. If so, it does not process the request but just resends the reply.
+
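+As a rough sketch of what the server side of this looks like in code (the client_id, msg_number, request, and server socket names and the process_request () call are hypothetical, and the reply cache is assumed to be a zhash-style hash table as in the broker notes):
+
+[[code]]
+//  Build a key from the client identifier and message number
+char key [256];
+snprintf (key, sizeof (key), "%s:%s", client_id, msg_number);
+
+zmsg_t *stored = (zmsg_t *) zhash_lookup (replies, key);
+if (stored) {
+    //  Duplicate request: resend the stored reply, don't redo the work
+    zmsg_t *resend = zmsg_dup (stored);
+    zmsg_send (&resend, server);
+}
+else {
+    zmsg_t *reply = process_request (request);      //  Application work
+    zhash_insert (replies, key, zmsg_dup (reply));  //  Store before replying
+    zmsg_send (&reply, server);
+}
+[[/code]]
+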
+++++ Distributed Reliable Client-Server (Freelance Pattern)
+
+- heartbeating from clients outwards
+- ZMQ_ROOT service
+
+We've seen how to make a reliable client-server architecture using a queue device (essentially a broker) in the middle. We've discussed the advantages of a central broker a few times. The biggest pro is that workers can come and go silently; there is just one 'stable' node on the network.
+
+But for many cases this isn't worth the hassle of an extra device. If you are not managing a pool of anonymous workers, but want to make the intelligence explicitly addressable, then a broker is an extra step for little gain.
+
+Again, this is a matter of taste. Some architects will swear by a broker. Others hate them. Let's take a real example and see how this plays out. Say we want a name service (we do, we do!) that translates logical names (like "authentication") into physical network addresses (like "tcp://192.168.55.121:5051").
+
+Without a broker, every application needs to know the address of the name service. It can then talk to the name service (using a Pirate pattern) to translate logical names into endpoints as needed. Fair enough.
+
+With a broker, every application needs to know the address of the broker. The broker hopefully supports some kind of service-based routing. So clients can then send a request to the "name lookup service" and this will be routed to
+
+
+
+Lastly...
+
+* Multiple clients talking to multiple servers with no intermediary devices. Use case: distributed services such as name resolution. Types of failure we aim to handle: service crashes and restarts, service busy looping, service overload, network disconnects.
+
+- N clients to N servers
+- move queue logic into client-side class
+- ditto for server, make framework
+- talk to it via inproc...
+
+
+Handshaking at Startup
+
+We must use XREP-to-XREP sockets because we want to connect N clients to N servers without (necessarily) an intermediary queue device.
+
+In an XREP-to-XREP socket connection, one side of the connection must know the identity of the other. You cannot do XREP-to-XREP flows between two anonymous sockets since an XREP socket requires an explicit identity. In practice this means we will need a name service to share the identities of the servers. The client will connect to the server, then send it a message using the server's known identity as address, and then the server can respond to the client.
+
+In this prototype we'll use fixed, hardcoded identities for the servers. We'll develop the name service in a later prototype.
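+
+For example, the client side of such a flow might look roughly like this (assuming a 0MQ context has already been created; the "server-1" identity, the endpoint, and the request body are placeholders, and the server is assumed to have set "server-1" as its ZMQ_IDENTITY before binding):
+
+[[code]]
+void *client = zmq_socket (context, ZMQ_XREP);
+zmq_connect (client, "tcp://localhost:5555");
+
+zmsg_t *request = zmsg_new ();
+zmsg_body_fmt (request, "lookup: authentication");
+zmsg_push (request, "");            //  Empty delimiter frame
+zmsg_push (request, "server-1");    //  First frame is the destination identity
+zmsg_send (&request, client);
+[[/code]]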
+
+
+Pool management
+
+* If there is just one server in the pool, then we wait with a timeout for the server to reply. If the server does not reply within the timeout, we retry a number of times before abandoning.
+* If there are multiple servers in the pool, we try each server in succession, but do not retry the same server twice.
+* If a server appears to be really dead (i.e. has not responded for some time), we remove it from the pool.
-Once the client has happily gotten its cache, it disconnects from the server (destroys that REQ socket), which is not used for anything more.
-How does Clone handle updates from clients? There are several options but the simplest seems to be that each client acts as a publisher back to the server, which subscribes. In a TCP network this will mean persistent connections between clients and servers. In a PGM network this will mean using a shared multicast bus that clients write to, and the server listens to.
-So the client, at startup, opens a PUB socket and part of its initial request to the server includes the address of that socket, so the server can open a SUB socket and connect back to it.
-Why don't we allow clients to publish updates directly to other clients? While this would reduce latency, it makes it impossible to sequence messages. Updates *must* pass through the server to make sense to other clients. There's a more subtle second reason. In many applications it's important that updates have a single order, across many clients. Forcing all updates through the server ensures that they have the same order when they finally get to clients.
-With unique sequencing, clients can detect the nastier failures - network congestion and queue overflow. If a client discovers that its incoming message stream has a hole, it can take action. It seems sensible that the client contact the server and ask for the missing messages, but in practice that isn't useful. If there are holes, adding more stress to the network will make things worse. All the client can really do is warn its users "Unable to continue", and stop, and not restart until someone has manually checked the cause of the problem.
-Clone is complex enough in practice that you don't want to implement it directly in your applications. Instead, it makes a good basis for an application server framework, which talks to applications via the key-value table.
-.end
178 examples/C/mdcliapi.c
@@ -0,0 +1,178 @@
+/* =========================================================================
+ mdcliapi.c
+
+ Majordomo Protocol Client API
+ Implements the MDP/Client spec at http://rfc.zeromq.org/spec:7.
+
+ Follows the ZFL class conventions and is further developed as the ZFL
+ mdcli class. See http://zfl.zeromq.org for more details.
+
+ -------------------------------------------------------------------------
+ Copyright (c) 1991-2011 iMatix Corporation <www.imatix.com>
+ Copyright other contributors as noted in the AUTHORS file.
+
+ This file is part of the ZeroMQ Guide: http://zguide.zeromq.org
+
+ This is free software; you can redistribute it and/or modify it under the
+ terms of the GNU Lesser General Public License as published by the Free
+ Software Foundation; either version 3 of the License, or (at your option)
+ any later version.
+
+ This software is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABIL-
+ ITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General
+ Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+ =========================================================================
+*/
+
+#ifndef __MDCLIAPI_H_INCLUDED__
+#define __MDCLIAPI_H_INCLUDED__
+
+#include "zhelpers.h"
+#include "zmsg.c"
+
+// This is the version of MDP/Client we implement
+#define MDPC_HEADER "MDPC01"
+
+// Reliability parameters
+#define REQUEST_TIMEOUT 2500 // msecs, (> 1000!)
+#define REQUEST_RETRIES 3 // Before we abandon
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+// Opaque class structure
+typedef struct _mdcli_t mdcli_t;
+
+mdcli_t *mdcli_new (char *broker);
+void mdcli_destroy (mdcli_t **self_p);
+zmsg_t *mdcli_send (mdcli_t *self, char *service, zmsg_t *request);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
+
+// Structure of our class
+// We access these properties only via class methods
+
+struct _mdcli_t {
+ char *broker;
+ void *context;
+ void *client; // Socket to broker
+};
+
+
+// --------------------------------------------------------------------------
+// Connect or reconnect to broker
+
+void s_connect_to_broker (mdcli_t *self)
+{
+ if (self->client)
+ zmq_close (self->client);
+ self->client = zmq_socket (self->context, ZMQ_REQ);
+ int linger = 0;
+ zmq_setsockopt (self->client, ZMQ_LINGER, &linger, sizeof (linger));
+ zmq_connect (self->client, self->broker);
+}
+
+
+// --------------------------------------------------------------------------
+// Constructor
+
+mdcli_t *
+mdcli_new (char *broker)
+{
+ mdcli_t
+ *self;
+
+ assert (broker);
+ s_version_assert (2, 1);
+ self = malloc (sizeof (mdcli_t));
+ memset (self, 0, sizeof (mdcli_t));
+
+ self->broker = strdup (broker);
+ self->context = zmq_init (1);
+ s_connect_to_broker (self);
+ return (self);
+}
+
+
+// --------------------------------------------------------------------------
+// Destructor
+
+void
+mdcli_destroy (mdcli_t **self_p)
+{
+ assert (self_p);
+ if (*self_p) {
+ mdcli_t *self = *self_p;
+ zmq_close (self->client);
+ zmq_term (self->context);
+ free (self->broker);
+ free (self);
+ *self_p = NULL;
+ }
+}
+
+
+// --------------------------------------------------------------------------
+// Send request to broker and get reply by hook or crook
+// Returns the reply message or NULL if there was no reply.
+
+zmsg_t *
+mdcli_send (mdcli_t *self, char *service, zmsg_t *request)
+{
+ int retries_left = REQUEST_RETRIES;
+ while (retries_left) {
+ // Prefix request with protocol frames
+ // Frame 1: "MDPCxy" (six bytes, MDP/Client x.y)
+ // Frame 2: Service name (printable string)
+ zmsg_t *msg = zmsg_dup (request);
+ zmsg_push (msg, service);
+ zmsg_push (msg, MDPC_HEADER);
+ zmsg_send (&msg, self->client);
+
+ while (1) {
+ // Poll socket for a reply, with timeout
+ zmq_pollitem_t items [] = { { self->client, 0, ZMQ_POLLIN, 0 } };
+ zmq_poll (items, 1, REQUEST_TIMEOUT * 1000);
+
+ // If we got a reply, process it
+ if (items [0].revents & ZMQ_POLLIN) {
+ zmsg_t *msg = zmsg_recv (self->client);
+
+ // Don't try to handle errors, just assert noisily
+ assert (zmsg_parts (msg) >= 3);
+
+ char *header = zmsg_pop (msg);
+ assert (strcmp (header, MDPC_HEADER) == 0);
+ free (header);
+
+ char *reply_service = zmsg_pop (msg);
+ assert (strcmp (reply_service, service) == 0); // Must match the requested service
+ free (reply_service);
+
+ return msg; // Success
+ }
+ else
+ if (--retries_left) {
+ // Reconnect, and resend message
+ s_connect_to_broker (self);
+ zmsg_t *msg = zmsg_dup (request);
+ zmsg_push (msg, service);
+ zmsg_push (msg, MDPC_HEADER);
+ zmsg_send (&msg, self->client);
+ }
+ else
+ break; // Give up
+ }
+ }
+ return NULL;
+}
+
18 examples/C/mdclient.c
@@ -0,0 +1,18 @@
+//
+// Majordomo client
+//
+#include "mdcliapi.c"
+
+int main (void)
+{
+ mdcli_t *mdcli;
+
+ printf (" * mdcli: ");
+ mdcli = mdcli_new ("tcp://127.0.0.1:5055");
+
+ mdcli_destroy (&mdcli);
+ return 0;
+}
10 examples/C/mdworker.c
@@ -0,0 +1,10 @@
+//
+// Majordomo worker
+//
+#include "mdwrkapi.c"
+
+
+
+
+int main (void)
+{
+ mdwrk_t *mdwrk;
+
+ printf (" * mdwrk: ");
+ // Broker endpoint and service name here are placeholder values for this stub
+ mdwrk = mdwrk_new ("tcp://127.0.0.1:5055", "echo");
+
+ mdwrk_destroy (&mdwrk);
+ return 0;
+}
213 examples/C/mdwrkapi.c
@@ -0,0 +1,213 @@
+/* =========================================================================
+ mdwrkapi.c
+
+ Majordomo Protocol Worker API
+ Implements the MDP/Server spec at http://rfc.zeromq.org/spec:7.
+
+ Follows the ZFL class conventions and is further developed as the ZFL
+ mdwrk class. See http://zfl.zeromq.org for more details.
+
+ -------------------------------------------------------------------------
+ Copyright (c) 1991-2011 iMatix Corporation <www.imatix.com>
+ Copyright other contributors as noted in the AUTHORS file.
+
+ This file is part of the ZeroMQ Guide: http://zguide.zeromq.org
+
+ This is free software; you can redistribute it and/or modify it under the
+ terms of the GNU Lesser General Public License as published by the Free
+ Software Foundation; either version 3 of the License, or (at your option)
+ any later version.
+
+ This software is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABIL-
+ ITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General
+ Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+ =========================================================================
+*/
+
+#ifndef __MDWRKAPI_H_INCLUDED__
+#define __MDWRKAPI_H_INCLUDED__
+
+#include "zhelpers.h"
+#include "zmsg.c"
+
+// This is the version of MDP/Server we implement
+#define MDPS_HEADER "MDPS01"
+
+// Reliability parameters
+#define HEARTBEAT_LIVENESS 3 // 3-5 is reasonable
+#define HEARTBEAT_INTERVAL 1000 // msecs
+#define RECONNECT_INTERVAL 1000 // Delay between attempts
+
+// Protocol commands
+#define MDPS_READY "\001"
+#define MDPS_REQUEST "\002"
+#define MDPS_REPLY "\003"
+#define MDPS_HEARTBEAT "\004"
+#define MDPS_DISCONNECT "\005"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+// Opaque class structure
+typedef struct _mdwrk_t mdwrk_t;
+
+mdwrk_t *mdwrk_new (char *broker, char *service);
+void mdwrk_destroy (mdwrk_t **self_p);
+zmsg_t *mdwrk_recv (mdwrk_t *self, zmsg_t *reply);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
+
+// Structure of our class
+// We access these properties only via class methods
+
+struct _mdwrk_t {
+ char *broker;
+ char *service;
+ void *context;
+ void *worker; // Socket to broker
+
+ // Heartbeat management
+ uint64_t heartbeat_at; // When to send HEARTBEAT
+ size_t liveness; // How many attempts left
+
+ // Internal state
+ int expect_reply; // Zero only at start
+};
+
+
+// --------------------------------------------------------------------------
+// Connect or reconnect to broker
+
+void s_connect_to_broker (mdwrk_t *self)
+{
+ if (self->worker)
+ zmq_close (self->worker);
+ self->worker = zmq_socket (self->context, ZMQ_XREQ);
+ int linger = 0;
+ zmq_setsockopt (self->worker, ZMQ_LINGER, &linger, sizeof (linger));
+ zmq_connect (self->worker, self->broker);
+
+ // Register service with broker
+ zmsg_t *msg = zmsg_new ();
+ zmsg_append (msg, MDPS_HEADER);
+ zmsg_append (msg, MDPS_READY);
+ zmsg_append (msg, self->service);
+ zmsg_send (&msg, self->worker);
+
+ // If liveness hits zero, queue is considered disconnected
+ self->liveness = HEARTBEAT_LIVENESS;
+ self->heartbeat_at = s_clock () + HEARTBEAT_INTERVAL;
+}
+
+
+// --------------------------------------------------------------------------
+// Constructor
+
+mdwrk_t *
+mdwrk_new (char *broker, char *service)
+{
+ mdwrk_t
+ *self;
+
+ assert (broker);
+ assert (service);
+ s_version_assert (2, 1);
+ self = malloc (sizeof (mdwrk_t));
+ memset (self, 0, sizeof (mdwrk_t));
+
+ self->broker = strdup (broker);
+ self->service = strdup (service);
+ self->context = zmq_init (1);
+ s_connect_to_broker (self);
+ return (self);
+}
+
+
+// --------------------------------------------------------------------------
+// Destructor
+
+void
+mdwrk_destroy (mdwrk_t **self_p)
+{
+ assert (self_p);
+ if (*self_p) {
+ mdwrk_t *self = *self_p;
+ zmq_close (self->worker);
+ zmq_term (self->context);
+ free (self->broker);
+ free (self->service);
+ free (self);
+ *self_p = NULL;
+ }
+}
+
+
+// --------------------------------------------------------------------------
+// Send reply, if any, to broker and wait for next request.
+
+zmsg_t *
+mdwrk_recv (mdwrk_t *self, zmsg_t *reply)
+{
+ // Format and send the reply if we were provided one
+ assert (reply || !self->expect_reply);
+ if (reply) {
+ zmsg_t *msg = zmsg_dup (reply);
+ zmsg_push (msg, MDPS_REPLY);
+ zmsg_push (msg, MDPS_HEADER);
+ zmsg_send (&msg, self->worker);
+ }
+ self->expect_reply = 1;
+
+ while (1) {
+ zmq_pollitem_t items [] = { { self->worker, 0, ZMQ_POLLIN, 0 } };
+ zmq_poll (items, 1, HEARTBEAT_INTERVAL * 1000);
+
+ if (items [0].revents & ZMQ_POLLIN) {
+ zmsg_t *msg = zmsg_recv (self->worker);
+ self->liveness = HEARTBEAT_LIVENESS;
+
+ // Don't try to handle errors, just assert noisily
+ assert (zmsg_parts (msg) >= 3);
+
+ char *header = zmsg_pop (msg);
+ assert (strcmp (header, MDPS_HEADER) == 0);
+ free (header);
+
+ char *command = zmsg_pop (msg);
+ if (strcmp (command, MDPS_REQUEST) == 0) {
+ free (command);
+ return msg; // We have a request to process
+ }
+ else
+ if (strcmp (command, MDPS_HEARTBEAT) == 0)
+ ; // Do nothing for heartbeats
+ else
+ if (strcmp (command, MDPS_DISCONNECT) == 0) {
+ free (command);
+ zmsg_destroy (&msg);
+ break; // Return empty handed
+ }
+ else {
+ printf ("E: invalid input message (%d)\n", (int) command [0]);
+ zmsg_dump (msg);
+ }
+ free (command);
+ zmsg_destroy (&msg); // Discard heartbeats and invalid messages
+ }
+ else
+ if (--self->liveness == 0) {
+ s_sleep (RECONNECT_INTERVAL);
+ s_connect_to_broker (self);
+ }
+ // Send HEARTBEAT if it's time
+ if (s_clock () > self->heartbeat_at) {
+ self->heartbeat_at = s_clock () + HEARTBEAT_INTERVAL;
+ zmsg_t *heartbeat = zmsg_new ();
+ zmsg_append (heartbeat, MDPS_HEADER);
+ zmsg_append (heartbeat, MDPS_HEARTBEAT);
+ zmsg_send (&heartbeat, self->worker);
+ }
+ }
+ // We exit if we've been disconnected
+ return NULL;
+}
28 examples/C/zmsg.c
@@ -1,5 +1,5 @@
/* =========================================================================
- zmsg.h
+ zmsg.c
Multipart message class for example applications.
@@ -59,6 +59,9 @@ void zmsg_body_fmt (zmsg_t *self, char *format, ...);
void zmsg_push (zmsg_t *self, char *part);
char *zmsg_pop (zmsg_t *self);
+// Generic append message part to end
+void zmsg_append (zmsg_t *self, char *part);
+
// Read and set message envelopes
char *zmsg_address (zmsg_t *self);
void zmsg_wrap (zmsg_t *self, char *address, char *delim);
@@ -381,7 +384,7 @@ zmsg_push (zmsg_t *self, char *part)
memmove (&self->_part_size [1], &self->_part_size [0],
(ZMSG_MAX_PARTS - 1) * sizeof (size_t));
s_set_part (self, 0, (void *) part, strlen (part));
- self->_part_count += 1;
+ self->_part_count++;
}
@@ -407,6 +410,21 @@ zmsg_pop (zmsg_t *self)
// --------------------------------------------------------------------------
+// Append message part to end of message parts
+
+void
+zmsg_append (zmsg_t *self, char *part)
+{
+ assert (self);
+ assert (part);
+ assert (self->_part_count < ZMSG_MAX_PARTS - 1);
+
+ s_set_part (self, self->_part_count, (void *) part, strlen (part));
+ self->_part_count++;
+}
+
+
+// --------------------------------------------------------------------------
// Return pointer to outer message address, if any
// Caller should not modify the provided data
@@ -562,6 +580,12 @@ zmsg_test (int verbose)
assert (strcmp (part, "World") == 0);
assert (zmsg_parts (zmsg) == 0);
+ // Check append method
+ zmsg_append (zmsg, "Hello");
+ zmsg_append (zmsg, "World!");
+ assert (zmsg_parts (zmsg) == 2);
+ assert (strcmp (zmsg_body (zmsg), "World!") == 0);
+
zmsg_destroy (&zmsg);
assert (zmsg == NULL);
28 notes.txt
@@ -1,5 +1,12 @@
Notes for The Guide
++++ A Service-Oriented Queue Device
+
+- how to route to workers based on service names
+- kind of like custom pubsub but with answers going back to clients
+
+
+
- collection of patterns
Least Recently Used
Asynchronous Client-Server
@@ -55,12 +62,6 @@ Then we can use this to start building some reusable pieces:
- robots come and go...
-+++ A Service-Oriented Queue Device
-
-- how to route to workers based on service names
-- kind of like custom pubsub but with answers going back to clients
-
-
+++ A ZeroMQ Name Service
- name service
@@ -136,6 +137,21 @@ We'll look at how to secure pubsub data against snooping. The actual technique
We'll now look at how the pgm: and epgm: protocols work. With pgm:, the network switch basically acts as a hardware FORWARDER device.
+++++ Customized Publish-Subscribe
+
+- use identity to route message explicitly to A or B
+- not using PUBSUB at all but XREP/????
+ - limitations: no multicast, only TCP
+ - how to scale with devices...
+
+When a client activates, it chooses a random port that is not in use and creates a SUB socket listening for all traffic on it. The client then sends a message via REQ to the publisher containing the port that it is listening on. The publisher receives this message, acknowledges it, and creates a new pub socket specific to that client. All published events specific to this client go out that socket.
+
+When the client deactivates, it sends a message to the publisher with the port to deactivate and close.
+
+You end up creating a lot more PUB sockets on your server end and doing all of the filtering at the server. This sounds acceptable to you.
+
+I didn't need to do this to avoid network bandwidth bottlenecks; I created this to enforce some security and entitlements.
+
+++ A Clock Device
We'll look at various ways of building a timer into a network. A clock device sends out a signal (a message) at more or less precise intervals so that other nodes can use these signals for internal timing.
164 wip.txt
@@ -1,122 +1,57 @@
-discuss Disk-based Reliability
-
-You can, and people do, use spinning rust to store messages. It rather makes a mess of the idea of "performance" but we're usually more comfortable knowing a really important message (such as that transfer of $400M to my Cyprus account) is stored on disk rather than only in memory. Spinning rust only makes sense for some patterns, mainly request-reply. If we get bored in this chapter we'll play with that, but otherwise, just shove really critical messages into a database that all parties can access, and skip 0MQ for those parts of your dialog.
-
-
-Doing this right is non-trivial, as we'll see.
-This is a fair amount of work to do and get right in every client application that needs reliability. It's simpler to place a queue device in the middle, to which all clients and servers connect. Now we can put the server management logic in that queue device.
-
-Here is the architecture. We take the client-side pirate and add the LRU Pattern queue from Chapter 3, with some extra sauce:
-
-There are two levels of reliability at play here. First, from client to queue. Second, from queue to servers.
-
-++++ Reliable Queuing, Advanced
-
-### Handshaking at Startup
-
-We must use XREP-to-XREP sockets because we want to connect N clients to N servers without (necessarily) an intermediary queue device.
-
-In an XREP-to-XREP socket connection, one side of the connection must know the identity of the other. You cannot do xrep-to-xrep flows between two anonymous sockets since an XREP socket requires an explicit identity. In practice this means we will need a name service share the identities of the servers. The client will connect to the server, then send it a message using the server's known identity as address, and then the server can respond to the client.
-
-In this prototype we'll use fixed, hardcoded identities for the servers. We'll develop the name service in a later prototype.
-
-
-
-We assume that the servers - which contain application code - will crash far more often than the queue. So the client uses the client-side pirate pattern to create a reliable link to the queue. The real work of managing servers sits in the queue:
-
-
-This example runs the clients, servers, and queue in a single process using multiple threads. In reality each of these layers would be a stand-alone process. The code is largely ready for that: each task already creates its own context.
-
-Servers are just the same as the LRU worker tasks from the lruqueue example in Chapter 3. The server connects its REQ socket to the queue and signals that it's ready for a new task by sending a request. The queue then sends the task as a reply. The server does its work, and sends its results back as a new "I'm ready (oh, and BTW here is the stuff I worked on)" request message.
-
-The queue binds to XREP frontend and backend sockets, and handles requests and replies asynchronously on these using the LRU queue logic. It works with these data structures:
-
-* A pool (a hash map) of all known servers, which identify themselves using unique IDs.
-* A list of servers that are ready for work.
-* A list of servers that are busy doing work.
-* A list of requests sent by clients but not yet successfully processed.
-
-The queue polls all sockets for input and then processes all incoming messages. It queues tasks and distributes them to servers that are alive. Any replies from servers are sent back to their original clients, unless the server is disabled, in which case the reply is dropped.
-
-Idle servers must signal that they are alive with a ready message or a heartbeat, or they will be marked as disabled until they send a message again. This is to detect blocked and disconnected servers (since 0MQ does not report disconnections).
-
-The queue detects a disabled server with a timeout on request processing. If a reply does not come back within (e.g.) 10ms, the queue marks the server as disabled, and retries with the next server.
-
-
-
-++++ Server Idle Failure Detection
-
-
-
-The server connects its XREQ socket to the queue and uses the LRU approach, i.e. it signals when it's ready for a new task by sending a request, and the queue then sends the task as a reply. It does its work, and sends its results back as a new "I'm ready (oh, and BTW here is the stuff I worked on)" request message. When waiting for work, the server sends a heartbeat message (which is an empty message) to the queue each second. This is why the server uses an XREQ socket instead of a REQ socket (which does not allow multiple requests to be sent before a response arrives).
- +-----------+ +-----------+ +-----------+
- | Poll | | Poll | | Poll |
- | Heartbeat | | Heartbeat | | Heartbeat |
-
-* If there is just one server in the pool, the we wait with a timeout for the server to reply. If the server does not reply within the timeout, we retry a number of times before abandoning.
-* If there are multiple servers in the pool, we try each server in succession, but do not retry the same server twice.
-* If a server appears to be really dead (i.e. has not responded for some time), we remove it from the pool.
-
-The server randomly simulates two problems when it receives a task:
-1. A crash and restart while processing a request, i.e. close its socket, block for 5 seconds, reopen its socket and restart.
-2. A temporary busy wait, i.e. sleep 1 second then continue as normal.
-
-When waiting for work, the server sends a heartbeat message (which is an empty message) to the queue each second. This is why the server uses an XREQ socket instead of a REQ socket (which does not allow multiple requests to be sent before a response arrives).
-* Detect a server that dies while idle. We do this with heartbeating: if the idle server does not send a heartbeat within a certain time, we treat it as dead.
-
-
-
-
-
-++++ Peer-to-Peer Pirate
-
-
-
-While many Pirate use cases are *idempotent* (i.e. executing the same request more than once is safe), some are not. Examples of an idempotent Pirate include:
-
-* Stateless task distribution, i.e. a collapsed pipeline where the client is both ventilator and sink, and the servers are stateless workers that compute a reply based purely on the state provided by a request. In such a case it's safe (though inefficient) to execute the same request many times.
-* A name service that translates logical addresses into endpoints to bind or connect to. In such a case it's safe to make the same lookup request many times.
-
-And here are examples of a non-idempotent Pirate pattern:
-
-* A logging service. One does not want the same log information recorded more than once.
-* Any service that has impact on downstream nodes, e.g. sends on information to other nodes. If that service gets the same request more than once, downstream nodes will get duplicate information.
-* Any service that modifies shared data in some non-idempotent way. E.g. a service that debits a bank account is definitely not idempotent.
-
-When our server is not idempotent, we have to think more carefully about when exactly a server might crash. If it dies when it's idle, or while it's processing a request, that's usually fine. We can use database transactions to make sure a debit and a credit are always done together, if at all. If the server dies while sending its reply, that's a problem, because as far as its concerned, it's done its work.
-
-if the network dies just as the reply is making its way back to the client, the same problem arises. The client will think the server died, will resend the request, and the server will do the same work twice. Which is not what we want.
-
-We use the standard solution of detecting and rejecting duplicate requests. This means:
-
-* The client must stamp every request with a unique client identifier and a unique message number.
-* The server, before sending back a reply, stores it using the client id + message number as a key.
-* The server, when getting a request from a given client, first checks if it has a reply for that client id + message number. If so, it does not process the request but just resends the reply.
-
-The final touch to a robust Pirate pattern is server heartbeating. This means getting the server to say "hello" every so often even when it's not doing any work. The smoothest design is where the client pings the server, which pongs back. We don't need to send heartbeats to a working server, only one that's idle. Knowing when an idle server has died means we don't uselessly send requests to dead servers, which improves response time in those cases.
+Chapter 4
++++ Reliable Publish-Subscribe (Clone Pattern)
+Pubsub is like a radio broadcast, you miss everything before you join, and then how much information you get depends on the quality of your reception. It's so easy to lose messages with this pattern that you might wonder why 0MQ bothers to implement it at all.[[footnote]]If you're German or Norwegian, that is what we call 'humor'. There are many cases where simplicity and speed are more important than pedantic delivery. In fact the radio broadcast covers perhaps the majority of information distribution in the real world. Think of Facebook and Twitter. No, I'm not still joking.[[/footnote]]
+However, reliable pubsub is also a useful tool. Let's do as before and define what that 'reliability' means in terms of what can go wrong.
+Happens all the time:
+* Subscribers join late, so miss messages the server already sent.
+* Subscriber connections take a non-zero time, and can lose messages during that time.
+Happens exceptionally:
+* Subscribers can crash, and restart, and lose whatever data they already received.
+* Subscribers can fetch messages too slowly, so queues build up and then overflow.
+* Networks can become overloaded and drop data (specifically, for PGM).
+* Networks can become too slow, so publisher-side queues overflow.
+A lot more can go wrong but these are the typical failures we see in a realistic system. The difficulty in defining 'reliability' now is that we have no idea, at the messaging level, what the application actually does with its data. So we need a generic model that we can implement once, and then use for a wide range of applications.
+What we'll design is a simple *shared key-value cache* that stores a set of blobs indexed by unique keys. Don't confuse this with *distributed hash tables*, which solve the wider problem of connecting peers in a distributed network, or with *distributed key-value tables*, which act like non-SQL databases. All we will build is a system that reliably clones some in-memory state from a server to a set of clients. We want to:
+* Let a client join the network at any time, and reliably get the current server state.
+* Let any client update the key-value cache (inserting new key-value pairs, updating existing ones, or deleting them).
+* Reliably propagate changes to all clients, and do this with minimum latency overhead.
+* Handle very large numbers of clients, e.g. tens of thousands or more.
+The key aspect of the Clone pattern is that clients talk back to servers, which is more than we do in a simple pub-sub dialog. This is why I use the terms 'server' and 'client' instead of 'publisher' and 'subscriber'. We'll use pubsub as part of the Clone pattern but it is more than that.
+When a client joins the network, it subscribes a SUB socket, as we'd expect, to the data stream coming from the server (the publisher). This goes across some pub-sub topology (a multicast bus, perhaps, or a tree of forwarder devices, or direct client-to-server connections).
+At some undetermined point, it will start getting messages from the server. Note that we can't predict what the client will receive as its first message. If a zmq_connect[3] call takes 10msec, and in that time the server has sent 100 messages, the client might get messages starting from the 100th message.
+Let's define a message as a key-value pair. The semantics are simple: if the value is provided, it's an insert or update operation. If there is no value, it's a delete operation. The key provides the subscription filter, so clients can treat the cache as a tree, and select whatever branches of the tree they want to hold.
+The client now connects to the server using a different socket (a REQ socket) and asks for a snapshot of the cache. It tells the server two things: which message it received (which means the server has to number messages), and which branch or branches of the cache it wants. To keep things simple we'll assume that any client has exactly one server that it talks to, and gets its cache from. The server *must* be running; we do not try to solve the question of what happens if the server crashes (that's left as an exercise for you to hurt your brain over).
+The server builds a snapshot and sends that to the client's REQ socket. This can take some time, especially if the cache is large. The client continues to receive updates from the server on its SUB socket, which it queues but does not process. We'll assume these updates fit into memory. At some point it gets the snapshot on its REQ socket. It then applies the updates to that snapshot, which gives it a working cache.
+You'll perhaps see one difficulty here. If the client asks for a snapshot based on message 100, how does the server provide this? After all, it may have sent out lots of updates in the meantime. We solve this by cheating gracefully. The server just sends its current snapshot, but tells the client what its latest message number is. Say that's 200. The client gets the snapshot, and in its queue, it has messages 100 to 300. It throws out 100 to 200, and starts applying 201 to 300 to the snapshot.
+Once the client has happily gotten its cache, it disconnects from the server (destroys that REQ socket), which is not used for anything more.
+How does Clone handle updates from clients? There are several options but the simplest seems to be that each client acts as a publisher back to the server, which subscribes. In a TCP network this will mean persistent connections between clients and servers. In a PGM network this will mean using a shared multicast bus that clients write to, and the server listens to.
+So the client, at startup, opens a PUB socket and part of its initial request to the server includes the address of that socket, so the server can open a SUB socket and connect back to it.
+Why don't we allow clients to publish updates directly to other clients? While this would reduce latency, it makes it impossible to sequence messages. Updates *must* pass through the server to make sense to other clients. There's a more subtle second reason. In many applications it's important that updates have a single order, across many clients. Forcing all updates through the server ensures that they have the same order when they finally get to clients.
+With unique sequencing, clients can detect the nastier failures - network congestion and queue overflow. If a client discovers that its incoming message stream has a hole, it can take action. It seems sensible that the client contact the server and ask for the missing messages, but in practice that isn't useful. If there are holes, adding more stress to the network will make things worse. All the client can really do is warn its users "Unable to continue", and stop, and not restart until someone has manually checked the cause of the problem.
+Clone is complex enough in practice that you don't want to implement it directly in your applications. Instead, it makes a good basis for an application server framework, which talks to applications via the key-value table.
++++ Reliable Pipeline (Harmony Pattern)
@@ -140,40 +75,5 @@ The Harmony pattern takes pipeline and makes it robust against the only failure
- if task missing, resend
- if end of batch missing, resend from last response
+++ How to deliver jobs one by one using push/pull? i.e. ensure jobs don't get lost...
-+++ Pirates
-
-
-+++ Clones
-
-
-+++ Harmony
-
-
-
-
-
-
-
-
-
-++++ Customized Publish-Subscribe
-
-- use identity to route message explicitly to A or B
-- not using PUBSUB at all but XREP/????
- - limitations: no multicast, only TCP
- - how to scale with devices...
-
-When a client activates, it chooses a random port that is not in use and creates a SUB socket listening for all traffic on it. The client then sends a message via REQ to the publisher containing the port that it is listening on. The publisher receives this message, acknowledges it, and creates a new pub socket specific to that client. All published events specific to this client go out that socket.
-
-When the client deactivates, it sends a message to the publisher with the port to deactivate and close.
-
-You end up creating a lot more PUB sockets on your server end and doing all of the filtering at the server. This sounds acceptable to you.
-
-I didn't need to do this to avoid network bandwidth bottlenecks; I created this to enforce some security and entitlements.
-
-
-- chapter 4
- - heartbeating & presence detection
- - reliable request-reply
- -
