Working on pirate patterns

commit d4af6b07056a49a6e675e3a00320b01cb79cd151 1 parent 4f41571
@hintjens authored
76 chapter3.txt
@@ -234,10 +234,10 @@ We'll make an example where the dealers don't talk back, they're pure sinks. Our
[[code type="textdiagram"]]
+-------------+
| |
- | Client | Send to "A" or "B"
+ | Client | Send to "A" or "B"
| |
+-------------+
- | ROUTER |
+ | XREP | (ROUTER)
\------+------/
|
|
@@ -246,7 +246,7 @@ We'll make an example where the dealers don't talk back, they're pure sinks. Our
| |
v v
/-----------\ /-----------\
- | DEALER | | DEALER |
+ | XREQ | | XREQ | (DEALER)
| "A" | | "B" |
+-----------+ +-----------+
| | | |
@@ -307,7 +307,7 @@ Like dealers, mamas can only talk to one router and since mamas always start by
| Client | Send to "A" or "B"
| |
+-------------+
- | ROUTER | OK, it's really XREP
+ | XREP | (ROUTER)
\-------------/
^
| (1) Mama says Hi
@@ -317,7 +317,7 @@ Like dealers, mamas can only talk to one router and since mamas always start by
| | (2) Router gives laundry
v v
/-----------\ /-----------\
- | MAMA | | MAMA | Aka. REQ
+ | REQ | | REQ | (MAMA)
| "A" | | "B" |
+-----------+ +-----------+
| | | |
@@ -399,7 +399,7 @@ A core philosophy of 0MQ is that the edges are smart and many, and the middle is
| Client | Send to "A" or "B"
| |
+-------------+
- | ROUTER | Yes, it's still XREP
+ | XREP | (ROUTER)
\-------------/
^
|
@@ -409,7 +409,7 @@ A core philosophy of 0MQ is that the edges are smart and many, and the middle is
| |
v v
/-----------\ /-----------\
- | PAPA | | PAPA | REP, naturally
+ | REP | | REP | (PAPA)
| "A" | | "B" |
+-----------+ +-----------+
| | | |
@@ -486,15 +486,15 @@ To start with, let's look back at the classic request-reply pattern and then see
+--------+
| Client |
+--------+
- | Mama | REQ
- +---+----+ ^
- | |
- | |
- +-----------+-----------+ |
- | | | |
- | | | |
-+---+----+ +---+----+ +---+----+ v
-| Papa | | Papa | | Papa | REP
+ | REQ | (MAMA)
+ +---+----+
+ |
+ |
+ +-----------+-----------+
+ | | |
+ | | |
++---+----+ +---+----+ +---+----+
+| REP | | REP | | REP | (PAPA)
+--------+ +--------+ +--------+
| Worker | | Worker | | Worker |
+--------+ +--------+ +--------+
@@ -509,23 +509,23 @@ This extends to multiple papas, but if we want to handle multiple mamas as well
+--------+ +--------+ +--------+
| Client | | Client | | Client |
+--------+ +--------+ +--------+
-| Mama | | Mama | | Mama | REQ
+| REQ | | REQ | | REQ | MAMA
+---+----+ +---+----+ +---+----+ ^
| | | |
+-----------+-----------+ |
| |
+---+----+ v
- | Router | XREP
+ | XREP | ROUTER
+--------+ :
| Device | :
+--------+ :
- | Dealer | XREQ
+ | XREQ | DEALER
+---+----+ ^
| |
+-----------+-----------+ |
| | | |
+---+----+ +---+----+ +---+----+ v
-| Papa | | Papa | | Papa | REP
+| REP | | REP | | REP | PAPA
+--------+ +--------+ +--------+
| Worker | | Worker | | Worker |
+--------+ +--------+ +--------+
@@ -542,23 +542,23 @@ In the above design, we're using the built-in load balancing routing that the de
+--------+ +--------+ +--------+
| Client | | Client | | Client |
+--------+ +--------+ +--------+
- | Mama | | Mama | | Mama | REQ
+ | REQ | | REQ | | REQ | MAMA
+---+----+ +---+----+ +---+----+ ^
| | | |
+-----------+-----------+ |
| |
+---+----+ v
- | Router | Frontend XREP
+ | XREP | Frontend ROUTER
+--------+ :
| Device | LRU queue :
+--------+ :
- | Router | Backend XREP
+ | XREP | Backend ROUTER
+---+----+ ^
| |
+-----------+-----------+ |
| | | |
+---+----+ +---+----+ +---+----+ v
- | Mama | | Mama | | Mama | REQ
+ | REQ | | REQ | | REQ | MAMA
+--------+ +--------+ +--------+
| Worker | | Worker | | Worker |
+--------+ +--------+ +--------+
@@ -774,7 +774,7 @@ In the router-to-dealer example we saw a 1-to-N use case where one client talks
| Client | | Client |
| | | |
+-----------+ +-----------+
- | DEALER | | DEALER |
+ | XREQ | | XREQ | (DEALER)
\-----------/ \-----------/
^ ^
| |
@@ -784,7 +784,7 @@ In the router-to-dealer example we saw a 1-to-N use case where one client talks
|
v
/------+------\
- | ROUTER |
+ | XREP | (ROUTER)
+-------------+
| |
| Server |
@@ -846,7 +846,7 @@ The socket logic in the server is fairly wicked. This is the detailed architectu
: | :
: +-------------+-------------+ :
: | | | :
- : v v v :
+ : v v v :
: connect connect connect :
: /---------\ /---------\ /---------\ :
: | XREQ | | XREQ | | XREQ | :
@@ -857,7 +857,7 @@ The socket logic in the server is fairly wicked. This is the detailed architectu
: +---------+ +---------+ +---------+ :
: :
\---------------------------------------------/
-
+
Figure # - Detail of async server
[[/code]]
@@ -879,11 +879,11 @@ The second design is much simpler, so that's what we use:
When you build servers that maintain stateful conversations with clients, you will run into a classic problem. If the server keeps some state per client, and clients keep coming and going, eventually it will run out of resources. Even if the same clients keep connecting, if you're using transient sockets (no explicit identity), each connection will look like a new one.
-We cheat in the above example by keeping state only for a very short time (the time it takes a worker to process a request) and then throwing away the state. But that's not practical for many cases.
+We cheat in the above example by keeping state only for a very short time (the time it takes a worker to process a request) and then throwing away the state. But that's not practical for many cases.
To properly manage client state in a stateful asynchronous server you must:
-* Do heartbeating from client to server. In our example we send a request once per second, which can reliably be used as a heartbeat.
+* Do heartbeating from client to server. In our example we send a request once per second, which can reliably be used as a heartbeat.
* Store state using the client identity as key. This works for both durable and transient sockets.
* Detect a stopped heartbeat. If there's no request from a client within, say, two seconds, the server can detect this and destroy any state it's holding for that client.
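
As a rough sketch of the last two points, the server can stamp each client's record with an expiry time on every request and periodically sweep out anything stale. The client_t layout, the fixed-size table, and the two-second TTL below are illustrative assumptions, not code from the guide's examples:

[[code]]
//  Sketch of per-client state with heartbeat expiry (illustrative only)
#include <stdio.h>
#include <string.h>
#include <time.h>

#define MAX_CLIENTS   100
#define HEARTBEAT_TTL 2         //  Seconds of silence before we drop a client

typedef struct {
    char   identity [32];       //  Client socket identity, used as key
    time_t expires;             //  When we consider the client gone
} client_t;

static client_t clients [MAX_CLIENTS];
static int client_count = 0;

//  Call on every request: refresh or create the client's record
static void client_touch (const char *identity) {
    int i;
    for (i = 0; i < client_count; i++)
        if (strcmp (clients [i].identity, identity) == 0) {
            clients [i].expires = time (NULL) + HEARTBEAT_TTL;
            return;
        }
    if (client_count < MAX_CLIENTS) {
        memset (&clients [client_count], 0, sizeof (client_t));
        strncpy (clients [client_count].identity, identity,
                 sizeof (clients [client_count].identity) - 1);
        clients [client_count].expires = time (NULL) + HEARTBEAT_TTL;
        client_count++;
    }
}

//  Call periodically: destroy state for clients whose heartbeat stopped
static void clients_purge (void) {
    int i = 0;
    while (i < client_count) {
        if (time (NULL) > clients [i].expires) {
            printf ("I: dropping state for %s\n", clients [i].identity);
            clients [i] = clients [--client_count];
        }
        else
            i++;
    }
}

int main (void) {
    client_touch ("client-A");      //  Pretend we just got a request
    clients_purge ();               //  Nothing has expired yet
    return 0;
}
[[/code]]

In a real server you would key the table off the identity frame that arrives with each request, and call clients_purge () from the main poll loop.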
@@ -898,7 +898,7 @@ We've seen XREP/router sockets talking to dealers, mamas, and papas. The last ca
| Front-end | | Front-end |
| | | |
+-----------+ +-----------+
- | Router | | Router | XREP
+ | XREP | | XREP | ROUTER
\-----------/ \-----------/ ^
connect connect |
^ ^ |
@@ -911,7 +911,7 @@ We've seen XREP/router sockets talking to dealers, mamas, and papas. The last ca
v v |
bind bind |
/-----------\ /-----------\ v
- | Router | | Router | XREP
+ | XREP | | XREP | ROUTER
+-----------+ +-----------+
| | | |
| Worker | | Worker |
@@ -996,7 +996,7 @@ For reasons we already looked at, clients and workers won't speak to each other
+--------+ +--------+ +--------+
| Client | | Client | | Client |
+--------+ +--------+ +--------+
- | Mama | | Mama | | Mama | REQ
+ | REQ | | REQ | | REQ | (MAMA)
+---+----+ +---+----+ +---+----+
| | |
+-----------+-----------+
@@ -1004,11 +1004,11 @@ For reasons we already looked at, clients and workers won't speak to each other
+--------------------------------+
| | |
| +-----+------+ |
- | | Router | | XREP
+ | | XREP | | (ROUTER)
| +------------+ |
| | LRU Queue | |
| +------------+ |
- | | Router | | XREP
+ | | XREP | | (ROUTER)
| +-----+------+ |
| | Broker :
+--------------------------------+
@@ -1017,7 +1017,7 @@ For reasons we already looked at, clients and workers won't speak to each other
+-----------+-----------+
| | |
+---+----+ +---+----+ +---+----+
- | Mama | | Mama | | Mama | REQ
+ | REQ | | REQ | | REQ | (MAMA)
+--------+ +--------+ +--------+
| Worker | | Worker | | Worker |
+--------+ +--------+ +--------+
@@ -1070,7 +1070,7 @@ Let's explore Idea #1. Workers connecting to both brokers and accepting jobs fro
:
| | | |
+------------+ +------------+
- | Router | | Router |
+ | XREP | | XREP |
+-----+------+ +-----+------+
| |
+---------|-+--=--------+--------------+
@@ -1079,7 +1079,7 @@ Let's explore Idea #1. Workers connecting to both brokers and accepting jobs fro
| : | : | :
| : | : | :
+---+-+--+ +---+-+--+ +---+-+--+
- | Router | | Router | | Router |
+ | XREP | | XREP | | XREP |
+--------+ +--------+ +--------+
| Worker | | Worker | | Worker |
+--------+ +--------+ +--------+
126 chapter4.txt
@@ -104,22 +104,50 @@ There are, roughly, three ways to connect clients to servers, each needing a spe
Each of these has its trade-offs, and often you'll mix them. We'll look at all three of these in detail.
-++++ Pirate Client
+++++ Client-side Reliability (Lazy Pirate Pattern)
-The simplest Pirate pattern only requires changes in the client. Rather than doing a blocking receive, we:
+We can get very simple reliable request-reply with only some changes in the client. We call this the Lazy Pirate pattern. Rather than doing a blocking receive, we:
* Poll the REQ socket and only receive from it when we're sure a reply has arrived.
* Resend a request several times if no reply arrives within a timeout period.
* Abandon the transaction if, after several requests, there is still no reply.
+[[code type="textdiagram"]]
+ +-----------+ +-----------+ +-----------+
+ | | | | | |
+ | Client | | Client | | Client |
+ | | | | | |
+ +-----------+ +-----------+ +-----------+
+ | Poll | | Poll | | Poll |
+ | Retry | | Retry | | Retry |
+ +-----------+ +-----------+ +-----------+
+ | REQ | | REQ | | REQ |
+ \-----------/ \-----------/ \-----------/
+ ^ ^ ^
+ | | |
+ \---------------+---------------/
+ |
+ v
+ /-------------\
+ | REP |
+ +-------------+
+ | |
+ | Server |
+ | |
+ +-------------+
+
+
+ Figure # - Lazy Pirate pattern
+[[/code]]
+
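Before the full example, here is a rough sketch of what that poll-and-retry loop amounts to. It assumes the s_send/s_recv helpers from the guide's zhelpers.h, a placeholder endpoint and request, and the 0MQ/2.x convention that zmq_poll takes its timeout in microseconds:

[[code]]
//  Sketch of the Lazy Pirate retry loop -- not the lapicli example below
#include "zhelpers.h"

#define REQUEST_TIMEOUT 2500        //  msecs
#define REQUEST_RETRIES 3           //  Before we abandon

int main (void) {
    void *context = zmq_init (1);
    int linger = 0;                 //  Don't block at close on undelivered requests
    void *client = zmq_socket (context, ZMQ_REQ);
    zmq_setsockopt (client, ZMQ_LINGER, &linger, sizeof (linger));
    zmq_connect (client, "tcp://localhost:5555");

    int retries_left = REQUEST_RETRIES;
    while (retries_left) {
        s_send (client, "HELLO");               //  (Re)send our request

        zmq_pollitem_t items [] = { { client, 0, ZMQ_POLLIN, 0 } };
        zmq_poll (items, 1, REQUEST_TIMEOUT * 1000);

        if (items [0].revents & ZMQ_POLLIN) {
            char *reply = s_recv (client);
            printf ("I: server replied (%s)\n", reply);
            free (reply);
            break;                              //  Success
        }
        if (--retries_left == 0) {
            printf ("E: server seems to be offline, abandoning\n");
            break;
        }
        printf ("W: no response from server, retrying...\n");
        //  REQ won't let us send again without a reply, so we close the
        //  confused socket and open a fresh one (explained just below)
        zmq_close (client);
        client = zmq_socket (context, ZMQ_REQ);
        zmq_setsockopt (client, ZMQ_LINGER, &linger, sizeof (linger));
        zmq_connect (client, "tcp://localhost:5555");
    }
    zmq_close (client);
    zmq_term (context);
    return 0;
}
[[/code]]
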
If you try to use a REQ socket in anything other than a strict send-recv fashion, you'll get an EFSM error. This is slightly annoying when we want to use REQ in a pirate pattern, because we may send several requests before getting a reply. The pretty good brute-force solution is to close and reopen the REQ socket after an error:
-[[code type="example" title="Pirate Client" name="piclient"]]
+[[code type="example" title="Lazy Pirate client" name="lapicli"]]
[[/code]]
Run this together with the matching server:
-[[code type="example" title="Pirate Client server" name="server"]]
+[[code type="example" title="Lazy Pirate server" name="lapisrv"]]
[[/code]]
To run this testcase, start the client and the server in two console windows. The server will randomly misbehave after a few messages. You can check the client's response. Here is a typical output from the server:
@@ -147,63 +175,25 @@ I: connecting to server...
E: server seems to be offline, abandoning
[[/code]]
-[[code type="textdiagram"]]
- +-----------+ +-----------+ +-----------+
- | | | | | |
- | Client | | Client | | Client |
- | | | | | |
- +-----------+ +-----------+ +-----------+
- | Poll | | Poll | | Poll |
- | Retry | | Retry | | Retry |
- +-----------+ +-----------+ +-----------+
- | REQ | | REQ | | REQ |
- \-----------/ \-----------/ \-----------/
- ^ ^ ^
- | | |
- \---------------+---------------/
- |
- v
- /-------------\
- | REP |
- +-------------+
- | |
- | Server |
- | |
- +-------------+
-
-
- Figure # - Pirate Client
-[[/code]]
-
The client sequences each message, and checks that replies come back exactly in order: that no requests or replies are lost, and no replies come back more than once, or out of order. Run the test a few times until you're convinced this mechanism actually works.
The client uses a REQ socket, and does the brute-force close/reopen because REQ sockets impose a strict send/receive cycle. You might be tempted to use an XREQ instead, but it would not be a good decision. First, it would mean emulating the secret sauce that REQ does with envelopes (if you've forgotten what that is, it's a good sign you don't want to have to do it). Second, it would mean potentially getting back replies that you didn't expect.
Handling failures only at the client works when we have a set of clients talking to a single server. It can handle a server crash, but only if recovery means restarting that same server. If there's a permanent error - e.g. a dead power supply on the server hardware - this approach won't work. Since the application code in servers is usually the biggest source of failures in any architecture, depending on a single server is not a great idea.
-++++ Pirate Work Queues
-
-Our second approach takes the pirate client pattern and extends it with a queue device that lets us talk, transparently, to multiple servers, which we can more accurately call 'workers'. Workers are stateless, or have some shared state we don't know about, e.g. a shared database. Having a queue device means workers can come and go without clients knowing anything about it. If one worker dies, another takes over. This is a nice simple topology with only one real weakness, namely the central queue itself, which can become a problem to manage, and a single point of failure.
-
-We'll develop the pirate queue concept in stages, starting with a minimal working model. The basis for the pirate work queue pattern is the least-recently-used (LRU) routing queue from Chapter 3. What is the very //minimum// we need to do to handle dead or blocked workers? Turns out, its surprisingly little. We already have a retry mechanism in the client. So using the standard LRU queue will work pretty well. This fits with 0MQ's philosophy that we can extend a peer-to-peer pattern like request-reply by plugging naive devices in the middle.
+++++ Basic Reliable Queuing (Simple Pirate Pattern)
-Here is the queue, which is exactly a LRU queue, no more or less:
-
-[[code type="example" title="Pirate Work Queue" name="piqueue"]]
-[[/code]]
+Our second approach takes the Lazy Pirate pattern and extends it with a queue device that lets us talk, transparently, to multiple servers, which we can more accurately call 'workers'. We'll develop this in stages, starting with a minimal working model, the Simple Pirate pattern.
-Here is the worker, which takes the piserver code and makes it work with the LRU queue (using the REQ 'ready' signaling):
-
-[[code type="example" title="Pirate Work Queue worker" name="piworker"]]
-[[/code]]
+In all these Pirate patterns, workers are stateless, or have some shared state we don't know about, e.g. a shared database. Having a queue device means workers can come and go without clients knowing anything about it. If one worker dies, another takes over. This is a nice simple topology with only one real weakness, namely the central queue itself, which can become a problem to manage, and a single point of failure.
-To test this, start a handlful of workers, a client, and the queue, in any order. You'll see that the workers eventually all crash and burn, and the client retries and then gives up. The queue never stops, and you can restart workers and clients ad-nauseam. This model works with any number of clients and workers.
+The basis for the queue device is the least-recently-used (LRU) routing queue from Chapter 3. What is the very //minimum// we need to do to handle dead or blocked workers? Turns out, it's surprisingly little. We already have a retry mechanism in the client. So using the standard LRU queue will work pretty well. This fits with 0MQ's philosophy that we can extend a peer-to-peer pattern like request-reply by plugging naive devices in the middle.
[[code type="textdiagram"]]
+-----------+ +-----------+ +-----------+
| | | | | |
- | Client | | Client | | Client | piclient
+ | Client | | Client | | Client |
| | | | | |
+-----------+ +-----------+ +-----------+
| Poll | | Poll | | Poll |
@@ -220,7 +210,8 @@ To test this, start a handlful of workers, a client, and the queue, in any order
| XREP |
+-----------+
| |
- | Queue | piqueue
+ | LRU |
+ | Queue |
| |
+-----------+
| XREP |
@@ -234,21 +225,47 @@ To test this, start a handlful of workers, a client, and the queue, in any order
| REQ | | REQ | | REQ |
+-----------+ +-----------+ +-----------+
| | | | | |
- | Worker | | Worker | | Worker | piworker
+ | LRU | | LRU | | LRU |
+ | Worker | | Worker | | Worker |
| | | | | |
+-----------+ +-----------+ +-----------+
- Figure # - Pirate Work Queue
+ Figure # - Simple Pirate Pattern
+[[/code]]
+
+Here is the queue, which is exactly a LRU queue, no more or less:
+
+[[code type="example" title="Simple Pirate queue" name="spqueue"]]
[[/code]]
+Here is the worker, which takes the Lazy Pirate server and makes it work with the LRU queue (using the REQ 'ready' signaling):
+
+[[code type="example" title="Simple Pirate worker" name="spworker"]]
+[[/code]]
+
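As a rough reminder of what that 'ready' signaling looks like from the worker's side (this is a sketch, not the spworker example referenced above; it assumes the guide's zhelpers.h helpers and a placeholder endpoint):

[[code]]
//  Sketch of a worker doing REQ 'ready' signaling with the LRU queue
#include "zhelpers.h"

int main (void) {
    void *context = zmq_init (1);
    void *worker = zmq_socket (context, ZMQ_REQ);
    s_set_id (worker);                      //  Random printable identity
    zmq_connect (worker, "tcp://localhost:5556");

    s_send (worker, "READY");               //  Tell the queue we're available
    while (1) {
        //  The queue passes along the client envelope; we must return it
        char *address = s_recv (worker);
        char *empty   = s_recv (worker);
        char *request = s_recv (worker);
        free (empty);

        //  ... do the actual work here ...

        s_sendmore (worker, address);
        s_sendmore (worker, "");
        s_send     (worker, "OK");
        free (address);
        free (request);
    }
    zmq_close (worker);
    zmq_term (context);
    return 0;
}
[[/code]]
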
+To test this, start a handful of workers, a client, and the queue, in any order. You'll see that the workers eventually all crash and burn, and the client retries and then gives up. The queue never stops, and you can restart workers and clients ad nauseam. This model works with any number of clients and workers.
+
+The Simple Pirate Queue pattern works pretty well but it has some weaknesses:
+
+* It's not robust against a queue crash and restart. The client will recover, but the workers won't. While 0MQ will reconnect workers' sockets automatically, as far as the newly started queue is concerned, the workers haven't signalled "READY", so they don't exist.
+* The queue does not detect worker failure, so if a worker dies while idle, the queue can only remove it from its worker queue by first sending it a request. The client waits and retries for nothing. It's not a critical problem but it's not nice.
+
+To fix these issues we have to make the worker and queue somewhat smarter, which brings us to the Advanced Pirate Queue pattern.
+
+++++ Reliable Queuing, Advanced
+
+The simple worker uses a REQ socket. For the Advanced Pirate Queue pattern we'll switch to an XREQ socket. This has the advantage of letting us send and receive messages at any time, rather than the lock-step send/receive that REQ imposes. The downside of XREQ is that we have to do our own envelope management. Whereas REQ carefully opens the envelope and gives us just the content, XREQ gives us the whole package.
+
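To make that concrete, here is a sketch of how a 0MQ/2.x program might read every frame of one message off an XREQ socket, using ZMQ_RCVMORE to walk the envelope. The endpoint is a placeholder and none of this is the pattern's actual code:

[[code]]
//  Sketch: reading a whole envelope from an XREQ socket (0MQ/2.x API)
#include <zmq.h>
#include <stdint.h>

//  Receive every frame of one message, however many there are
static void recv_all_frames (void *socket) {
    int64_t more = 1;
    size_t more_size = sizeof (more);
    while (more) {
        zmq_msg_t frame;
        zmq_msg_init (&frame);
        zmq_recv (socket, &frame, 0);       //  2.x signature: message pointer
        //  ... inspect zmq_msg_data (&frame) and zmq_msg_size (&frame) ...
        zmq_getsockopt (socket, ZMQ_RCVMORE, &more, &more_size);
        zmq_msg_close (&frame);
    }
}

int main (void) {
    void *context = zmq_init (1);
    void *worker = zmq_socket (context, ZMQ_XREQ);
    zmq_connect (worker, "tcp://localhost:5556");   //  Placeholder endpoint
    recv_all_frames (worker);                       //  Blocks until a message arrives
    zmq_close (worker);
    zmq_term (context);
    return 0;
}
[[/code]]

With REQ, the library silently adds the empty delimiter frame on send and strips the envelope on receive; with XREQ, those frames are ours to manage.
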
+So we'll bake the following functionality into the Advanced Pirate Queue pattern:
+
+
+
+
-- piqueue, piworker
-- sleep in client, not worker
- worker handle queue failure/recovery
- worker heartbeating so queue can detect idle failures
-- others from proto2/proto3?
- put protocol version into header
- piqueue2, piworker2
- server presence protocol SPP
@@ -272,8 +289,7 @@ Here is the architecture. We take the client-side pirate and add the LRU (least-
There are two levels of reliability at play here. First, from client to queue. Second, from queue to servers.
-- if queue stops/restarts, workers need to detect, recreate socket or use XREP
--
+++++ Reliable Queuing, Advanced
### Handshaking at Startup
2  examples/C/piclient.c → examples/C/lpclient.c
@@ -1,5 +1,5 @@
//
-// Client-side pirate
+// Lazy Pirate client
// Use zmq_poll to do a safe request-reply
// To run, start piserver and then randomly kill/restart it
//
2  examples/C/piserver.c → examples/C/lpserver.c
@@ -1,5 +1,5 @@
//
-// Client-side pirate server
+// Lazy Pirate server
// Binds REQ socket to tcp://*:5555
// Like hwserver except:
// - echoes request as-is
0  examples/C/piqueue1.c → examples/C/spqueue.c
File renamed without changes
0  examples/C/piworker1.c → examples/C/spworker.c
File renamed without changes