diff --git a/chapter1.txt b/chapter1.txt
index 52545c28b..51ed0552f 100644
--- a/chapter1.txt
+++ b/chapter1.txt
@@ -89,7 +89,7 @@ Now this looks too simple to be realistic, but ZeroMQ sockets have, as we alread

 Let us explain briefly what these two programs are actually doing. They create a ZeroMQ context to work with, and a socket. Don't worry what the words mean. You'll pick it up. The server binds its REP (reply) socket to port 5555. The server waits for a request in a loop, and responds each time with a reply. The client sends a request and reads the reply back from the server.

-If you kill the server (Ctrl-C) and restart it, the client won't recover properly. Recovering from crashing processes isn't quite that easy. Making a reliable request-reply flow is complex enough that we won't cover it until [php:chapter4#reliable-request-reply|Chapter 4 - Reliable Request-Reply Patterns].
+If you kill the server (Ctrl-C) and restart it, the client won't recover properly. Recovering from crashing processes isn't quite that easy. Making a reliable request-reply flow is complex enough that we won't cover it until [#reliable-request-reply].

 There is a lot happening behind the scenes but what matters to us programmers is how short and sweet the code is, and how often it doesn't crash, even under a heavy load. This is the request-reply pattern, probably the simplest way to use ZeroMQ. It maps to RPC and the classic client/server model.

@@ -212,7 +212,7 @@ Then the subscriber will most likely not receive anything. You'll blink, check t

 Making a TCP connection involves to and from handshaking that takes several milliseconds depending on your network and the number of hops between peers. In that time, ZeroMQ can send many messages. For sake of argument assume it takes 5 msecs to establish a connection, and that same link can handle 1M messages per second. During the 5 msecs that the subscriber is connecting to the publisher, it takes the publisher only 1 msec to send out those 1K messages.

-In [php:chapter2#sockets-and-patterns|Chapter 2 - Sockets and Patterns] we'll explain how to synchronize a publisher and subscribers so that you don't start to publish data until the subscribers really are connected and ready. There is a simple and stupid way to delay the publisher, which is to sleep. Don't do this in a real application, though, because it is extremely fragile as well as inelegant and slow. Use sleeps to prove to yourself what's happening, and then wait for [php:chapter2#sockets-and-patterns|Chapter 2 - Sockets and Patterns] to see how to do this right.
+In [#sockets-and-patterns] we'll explain how to synchronize a publisher and subscribers so that you don't start to publish data until the subscribers really are connected and ready. There is a simple and stupid way to delay the publisher, which is to sleep. Don't do this in a real application, though, because it is extremely fragile as well as inelegant and slow. Use sleeps to prove to yourself what's happening, and then wait for [#sockets-and-patterns] to see how to do this right.

 The alternative to synchronization is to simply assume that the published data stream is infinite and has no start and no end. One also assumes that the subscriber doesn't care what transpired before it started up. This is how we built our weather client example.

@@ -329,7 +329,7 @@ Let's look at some aspects of this code in more detail:
 #-------------#
 [[/code]]

-The pipeline pattern also exhibits the "slow joiner" syndrome, leading to accusations that PUSH sockets don't load balance properly. If you are using PUSH and PULL, and one of your workers gets way more messages than the others, it's because that PULL socket has joined faster than the others, and grabs a lot of messages before the others manage to connect. If you want proper load balancing, you probably want to look at the load balancing pattern in [php:chapter3#advanced-request-reply|Chapter 3 - Advanced Request-Reply Patters].
+The pipeline pattern also exhibits the "slow joiner" syndrome, leading to accusations that PUSH sockets don't load balance properly. If you are using PUSH and PULL, and one of your workers gets way more messages than the others, it's because that PULL socket has joined faster than the others, and grabs a lot of messages before the others manage to connect. If you want proper load balancing, you probably want to look at the load balancing pattern in [#advanced-request-reply].

 ++ Programming with ZeroMQ

diff --git a/chapter2.txt b/chapter2.txt
index 2c21d80b3..f955b1d8b 100644
--- a/chapter2.txt
+++ b/chapter2.txt
@@ -2,7 +2,7 @@
 .bookmark sockets-and-patterns
 + Sockets and Patterns

-In [php:chapter1#basics|Chapter 1 - The Basics] we took ZeroMQ for a drive, with some basic examples of the main ZeroMQ patterns: request-reply, pub-sub, and pipeline. In this chapter, we're going to get our hands dirty and start to learn how to use these tools in real programs.
+In [#basics] we took ZeroMQ for a drive, with some basic examples of the main ZeroMQ patterns: request-reply, pub-sub, and pipeline. In this chapter, we're going to get our hands dirty and start to learn how to use these tools in real programs.

 We'll cover:

@@ -178,7 +178,7 @@ The built-in core ZeroMQ patterns are:

 * **Exclusive pair**, which connects two sockets exclusively. This is a pattern for connecting two threads in a process, not to be confused with "normal" pairs of sockets.

-We looked at the first three of these in [php:chapter1#basics|Chapter 1 - The Basics], and we'll see the exclusive pair pattern later in this chapter. The {{zmq_socket[3]}} man page is fairly clear about the patterns -- it's worth reading several times until it starts to make sense. These are the socket combinations that are valid for a connect-bind pair (either side can bind):
+We looked at the first three of these in [#basics], and we'll see the exclusive pair pattern later in this chapter. The {{zmq_socket[3]}} man page is fairly clear about the patterns -- it's worth reading several times until it starts to make sense. These are the socket combinations that are valid for a connect-bind pair (either side can bind):

 * PUB and SUB
 * REQ and REP
@@ -196,7 +196,7 @@ You'll also see references to XPUB and XSUB sockets, which we'll come to later (

 These four core patterns are cooked into ZeroMQ. They are part of the ZeroMQ API, implemented in the core C++ library, and are guaranteed to be available in all fine retail stores.

-On top of those, we add //high-level messaging patterns//. We build these high-level patterns on top of ZeroMQ and implement them in whatever language we're using for our application. They are not part of the core library, do not come with the ZeroMQ package, and exist in their own space as part of the ZeroMQ community. For example the Majordomo pattern, which we explore in [php:chapter4#reliable-request-reply|Chapter 4 - Reliable Request-Reply Patterns], sits in the GitHub Majordomo project in the ZeroMQ organization.
+On top of those, we add //high-level messaging patterns//. We build these high-level patterns on top of ZeroMQ and implement them in whatever language we're using for our application. They are not part of the core library, do not come with the ZeroMQ package, and exist in their own space as part of the ZeroMQ community. For example the Majordomo pattern, which we explore in [#reliable-request-reply], sits in the GitHub Majordomo project in the ZeroMQ organization.

 One of the things we aim to provide you with in this book are a set of such high-level patterns, both small (how to handle messages sanely) and large (how to make a reliable pub-sub architecture).

@@ -233,7 +233,7 @@ In memory, ZeroMQ messages are {{zmq_msg_t}} structures (or classes depending on

 If you want to send the same message more than once, and it's sizable, create a second message, initialize it using {{zmq_msg_init[3]}}, and then use {{zmq_msg_copy[3]}} to create a copy of the first message. This does not copy the data but copies a reference. You can then send the message twice (or more, if you create more copies) and the message will only be finally destroyed when the last copy is sent or closed.

-ZeroMQ also supports //multipart// messages, which let you send or receive a list of frames as a single on-the-wire message. This is widely used in real applications and we'll look at that later in this chapter and in [php:chapter3#advanced-request-reply|Chapter 3 - Advanced Request-Reply Patters].
+ZeroMQ also supports //multipart// messages, which let you send or receive a list of frames as a single on-the-wire message. This is widely used in real applications and we'll look at that later in this chapter and in [#advanced-request-reply].

 Frames (also called "message parts" in the ZeroMQ reference manual pages) are the basic wire format for ZeroMQ messages. A frame is a length-specified block of data. The length can be zero upwards. If you've done any TCP programming you'll appreciate why frames are a useful answer to the question "how much data am I supposed to read of this network socket now?"

@@ -501,7 +501,7 @@ But our broker has to be nonblocking. Obviously, we can use {{zmq_poll[3]}} to w
 #---------# #---------# #---------#
 [[/code]]

-Luckily, there are two sockets called DEALER and ROUTER that let you do nonblocking request-response. You'll see in [php:chapter3#advanced-request-reply|Chapter 3 - Advanced Request-Reply Patters] how DEALER and ROUTER sockets let you build all kinds of asynchronous request-reply flows. For now, we're just going to see how DEALER and ROUTER let us extend REQ-REP across an intermediary, that is, our little broker.
+Luckily, there are two sockets called DEALER and ROUTER that let you do nonblocking request-response. You'll see in [#advanced-request-reply] how DEALER and ROUTER sockets let you build all kinds of asynchronous request-reply flows. For now, we're just going to see how DEALER and ROUTER let us extend REQ-REP across an intermediary, that is, our little broker.

 In this simple extended request-reply pattern, REQ talks to ROUTER and DEALER talks to REP. In between the DEALER and ROUTER, we have to have code (like our broker) that pulls messages off the one socket and shoves them onto the other[figure].

@@ -881,7 +881,7 @@ All the code should be recognizable to you by now. How it works:

 * The server starts a proxy that connects the two sockets. The proxy pulls incoming requests fairly from all clients, and distributes those out to workers. It also routes replies back to their origin.

-Note that creating threads is not portable in most programming languages. The POSIX library is pthreads, but on Windows you have to use a different API. In our example, the {{pthread_create}} call starts up a new thread running the {{worker_routine}} function we defined. We'll see in [php:chapter3#advanced-request-reply|Chapter 3 - Advanced Request-Reply Patters] how to wrap this in a portable API.
+Note that creating threads is not portable in most programming languages. The POSIX library is pthreads, but on Windows you have to use a different API. In our example, the {{pthread_create}} call starts up a new thread running the {{worker_routine}} function we defined. We'll see in [#advanced-request-reply] how to wrap this in a portable API.

 Here the "work" is just a one-second pause. We could do anything in the workers, including talking to other nodes. This is what the MT server looks like in terms of ØMQ sockets and nodes. Note how the request-reply chain is {{REQ-ROUTER-queue-DEALER-REP}}[figure].

diff --git a/chapter3.txt b/chapter3.txt
index 6a42f9c0b..afbb9bf32 100644
--- a/chapter3.txt
+++ b/chapter3.txt
@@ -2,7 +2,7 @@
 .bookmark advanced-request-reply
 + Advanced Request-Reply Patterns

-In z[php:chapter2#sockets-and-patterns|Chapter 2 - Sockets and Patterns] we worked through the basics of using ZeroMQ by developing a series of small applications, each time exploring new aspects of ZeroMQ. We'll continue this approach in this chapter as we explore advanced patterns built on top of ZeroMQ's core request-reply pattern.
+In [#sockets-and-patterns] we worked through the basics of using ZeroMQ by developing a series of small applications, each time exploring new aspects of ZeroMQ. We'll continue this approach in this chapter as we explore advanced patterns built on top of ZeroMQ's core request-reply pattern.

 We'll cover:

@@ -45,7 +45,7 @@ If you spy on the network data flowing between {{hwclient}} and {{hwserver}}, th

 +++ The Extended Reply Envelope

-Now let's extend the REQ-REP pair with a ROUTER-DEALER proxy in the middle and see how this affects the reply envelope. This is the //extended request-reply pattern// we already saw in [php:chapter2#sockets-and-patterns|Chapter 2 - Sockets and Patterns]. We can, in fact, insert any number of proxy steps[figure]. The mechanics are the same.
+Now let's extend the REQ-REP pair with a ROUTER-DEALER proxy in the middle and see how this affects the reply envelope. This is the //extended request-reply pattern// we already saw in [#sockets-and-patterns]. We can, in fact, insert any number of proxy steps[figure]. The mechanics are the same.

 [[code type="textdiagram" title="Extended Request-Reply Pattern"]]
 #-------# #-------#
@@ -131,7 +131,7 @@ The REQ socket picks this message up, and checks that the first frame is the emp

 +++ What's This Good For?

-To be honest, the use cases for strict request-reply or extended request-reply are somewhat limited. For one thing, there's no easy way to recover from common failures like the server crashing due to buggy application code. We'll see more about this in [php:chapter4#reliable-request-reply|Chapter 4 - Reliable Request-Reply Patterns]. However once you grasp the way these four sockets deal with envelopes, and how they talk to each other, you can do very useful things. We saw how ROUTER uses the reply envelope to decide which client REQ socket to route a reply back to. Now let's express this another way:
+To be honest, the use cases for strict request-reply or extended request-reply are somewhat limited. For one thing, there's no easy way to recover from common failures like the server crashing due to buggy application code. We'll see more about this in [#reliable-request-reply]. However once you grasp the way these four sockets deal with envelopes, and how they talk to each other, you can do very useful things. We saw how ROUTER uses the reply envelope to decide which client REQ socket to route a reply back to. Now let's express this another way:

 * Each time ROUTER gives you a message, it tells you what peer that came from, as an identity.
 * You can use this with a hash table (with the identity as key) to track new peers as they arrive.
@@ -195,7 +195,7 @@ And when we receive a message, we:

 +++ The REQ to ROUTER Combination

-In the same way that we can replace REQ with DEALER, we can replace REP with ROUTER. This gives us an asynchronous server that can talk to multiple REQ clients at the same time. If we rewrote the "Hello World" server using ROUTER, we'd be able to process any number of "Hello" requests in parallel. We saw this in the [php:chapter2#sockets-and-patterns|Chapter 2 - Sockets and Patterns] {{mtserver}} example.
+In the same way that we can replace REQ with DEALER, we can replace REP with ROUTER. This gives us an asynchronous server that can talk to multiple REQ clients at the same time. If we rewrote the "Hello World" server using ROUTER, we'd be able to process any number of "Hello" requests in parallel. We saw this in the [#sockets-and-patterns] {{mtserver}} example.

 We can use ROUTER in two distinct ways:

@@ -218,7 +218,7 @@ When you replace a REP with a DEALER, your worker can suddenly go full asynchron

 +++ The ROUTER to ROUTER Combination

-This sounds perfect for N-to-N connections, but it's the most difficult combination to use. You should avoid it until you are well advanced with ZeroMQ. We'll see one example it in the Freelance pattern in [php:chapter4#reliable-request-reply|Chapter 4 - Reliable Request-Reply Patterns], and an alternative DEALER to ROUTER design for peer-to-peer work in [php:chapter8#moving-pieces]|Chapter 8 - A Framework for Distributed Computing].
+This sounds perfect for N-to-N connections, but it's the most difficult combination to use. You should avoid it until you are well advanced with ZeroMQ. We'll see one example it in the Freelance pattern in [#reliable-request-reply], and an alternative DEALER to ROUTER design for peer-to-peer work in [#moving-pieces].

 +++ Invalid Combinations

@@ -338,7 +338,7 @@ Anywhere you can use REQ, you can use DEALER. There are two specific differences
 * The REQ socket always sends an empty delimiter frame before any data frames; the DEALER does not.
 * The REQ socket will send only one message before it receives a reply; the DEALER is fully asynchronous.

-The synchronous versus asynchronous behavior has no effect on our example because we're doing strict request-reply. It is more relevant when we address recovering from failures, which we'll come to in [php:chapter4#reliable-request-reply|Chapter 4 - Reliable Request-Reply Patterns].
Now let's look at exactly the same example but with the REQ socket replaced by a DEALER socket: diff --git a/chapter4.txt b/chapter4.txt index 779f39fe9..6eae57760 100644 --- a/chapter4.txt +++ b/chapter4.txt @@ -2,7 +2,7 @@ .bookmark reliable-request-reply + Reliable Request-Reply Patterns -[php:chapter3#advanced-request-reply|Chapter 3 - Advanced Request-Reply Patters] covered advanced uses of ZeroMQ's request-reply pattern with working examples. This chapter looks at the general question of reliability and builds a set of reliable messaging patterns on top of ZeroMQ's core request-reply pattern. +[#advanced-request-reply] covered advanced uses of ZeroMQ's request-reply pattern with working examples. This chapter looks at the general question of reliability and builds a set of reliable messaging patterns on top of ZeroMQ's core request-reply pattern. In this chapter, we focus heavily on user-space request-reply //patterns//, reusable models that help you design your own ZeroMQ architectures: @@ -183,7 +183,7 @@ In all these Pirate patterns, workers are stateless. If the application requires #-----------# #-----------# #-----------# [[/code]] -The basis for the queue proxy is the load balancing broker from [php:chapter3#advanced-request-reply|Chapter 3 - Advanced Request-Reply Patters]. What is the very //minimum// we need to do to handle dead or blocked workers? Turns out, it's surprisingly little. We already have a retry mechanism in the client. So using the load balancing pattern will work pretty well. This fits with ZeroMQ's philosophy that we can extend a peer-to-peer pattern like request-reply by plugging naive proxies in the middle[figure]. +The basis for the queue proxy is the load balancing broker from [#advanced-request-reply]. What is the very //minimum// we need to do to handle dead or blocked workers? Turns out, it's surprisingly little. We already have a retry mechanism in the client. So using the load balancing pattern will work pretty well. 
This fits with ZeroMQ's philosophy that we can extend a peer-to-peer pattern like request-reply by plugging naive proxies in the middle[figure]. We don't need a special client; we're still using the Lazy Pirate client. Here is the queue, which is identical to the main task of the load balancing broker: @@ -241,7 +241,7 @@ The Simple Pirate Queue pattern works pretty well, especially because it's just We'll fix these in a properly pedantic Paranoid Pirate Pattern. -We previously used a REQ socket for the worker. For the Paranoid Pirate worker, we'll switch to a DEALER socket[figure]. This has the advantage of letting us send and receive messages at any time, rather than the lock-step send/receive that REQ imposes. The downside of DEALER is that we have to do our own envelope management (re-read [php:chapter3#advanced-request-reply|Chapter 3 - Advanced Request-Reply Patters] for background on this concept). +We previously used a REQ socket for the worker. For the Paranoid Pirate worker, we'll switch to a DEALER socket[figure]. This has the advantage of letting us send and receive messages at any time, rather than the lock-step send/receive that REQ imposes. The downside of DEALER is that we have to do our own envelope management (re-read [#advanced-request-reply] for background on this concept). We're still using the Lazy Pirate client. Here is the Paranoid Pirate queue proxy: diff --git a/chapter5.txt b/chapter5.txt index 04e2fe8b9..80a3ec8e0 100644 --- a/chapter5.txt +++ b/chapter5.txt @@ -2,7 +2,7 @@ .bookmark advanced-pub-sub + Advanced Pub-Sub Patterns -In [php:chapter3#advanced-request-reply|Chapter 3 - Advanced Request-Reply Patters] and [php:chapter4#reliable-request-reply|Chapter 4 - Reliable Request-Reply Patterns] we looked at advanced use of ZeroMQ's request-reply pattern. If you managed to digest all that, congratulations. 
In this chapter we'll focus on publish-subscribe and extend ZeroMQ's core pub-sub pattern with higher-level patterns for performance, reliability, state distribution, and monitoring. +In [#advanced-request-reply] and [#reliable-request-reply] we looked at advanced use of ZeroMQ's request-reply pattern. If you managed to digest all that, congratulations. In this chapter we'll focus on publish-subscribe and extend ZeroMQ's core pub-sub pattern with higher-level patterns for performance, reliability, state distribution, and monitoring. We'll cover: @@ -55,7 +55,7 @@ All of these failure cases have answers, though not always simple ones. Reliabil ++ Pub-Sub Tracing (Espresso Pattern) -Let's start this chapter by looking at a way to trace pub-sub networks. In [php:chapter2#sockets-and-patterns|Chapter 2 - Sockets and Patterns] we saw a simple proxy that used these to do transport bridging. The {{zmq_proxy[3]}} method has three arguments: a //frontend// and //backend// socket that it bridges together, and a //capture// socket to which it will send all messages. +Let's start this chapter by looking at a way to trace pub-sub networks. In [#sockets-and-patterns] we saw a simple proxy that used these to do transport bridging. The {{zmq_proxy[3]}} method has three arguments: a //frontend// and //backend// socket that it bridges together, and a //capture// socket to which it will send all messages. The code is deceptively simple: @@ -131,7 +131,7 @@ And now run as many instances of the subscriber as you want to try, each time co Each subscriber happily reports "Save Roger", and Gregor the Escaped Convict slinks back to his seat for dinner and a nice cup of hot milk, which is all he really wanted in the first place. -One note: by default, the XPUB socket does not report duplicate subscriptions, which is what you want when you're naively connecting an XPUB to an XSUB. 
Our example sneakily gets around this by using random topics so the chance of it not working is one in a million. In a real LVC proxy, you'll want to use the {{ZMQ_XPUB_VERBOSE}} option that we implement in [php:chapter6#the-community]|Chapter 6 - The ZeroMQ Community] as an exercise. +One note: by default, the XPUB socket does not report duplicate subscriptions, which is what you want when you're naively connecting an XPUB to an XSUB. Our example sneakily gets around this by using random topics so the chance of it not working is one in a million. In a real LVC proxy, you'll want to use the {{ZMQ_XPUB_VERBOSE}} option that we implement in [#the-community] as an exercise. ++ Slow Subscriber Detection (Suicidal Snail Pattern) @@ -283,7 +283,7 @@ A first decision we have to make is whether we work with a central server or not * Conceptually, a central server is simpler to understand because networks are not naturally symmetrical. With a central server, we avoid all questions of discovery, bind versus connect, and so on. -* Generally, a fully-distributed architecture is technically more challenging but ends up with simpler protocols. That is, each node must act as server and client in the right way, which is delicate. When done right, the results are simpler than using a central server. We saw this in the Freelance pattern in [php:chapter4#reliable-request-reply|Chapter 4 - Reliable Request-Reply Patterns]. +* Generally, a fully-distributed architecture is technically more challenging but ends up with simpler protocols. That is, each node must act as server and client in the right way, which is delicate. When done right, the results are simpler than using a central server. We saw this in the Freelance pattern in [#reliable-request-reply]. * A central server will become a bottleneck in high-volume use cases. If handling scale in the order of millions of messages a second is required, we should aim for decentralization right away. 
@@ -295,7 +295,7 @@ So, for the Clone pattern we'll work with a //server// that publishes state upda

 We'll develop Clone in stages, solving one problem at a time. First, let's look at how to update a shared state across a set of clients. We need to decide how to represent our state, as well as the updates. The simplest plausible format is a key-value store, where one key-value pair represents an atomic unit of change in the shared state.

-We have a simple pub-sub example in [php:chapter1#basics|Chapter 1 - The Basics], the weather server and client. Let's change the server to send key-value pairs, and the client to store these in a hash table. This lets us send updates from one server to a set of clients using the classic pub-sub model[figure].
+We have a simple pub-sub example in [#basics], the weather server and client. Let's change the server to send key-value pairs, and the client to store these in a hash table. This lets us send updates from one server to a set of clients using the classic pub-sub model[figure].

 An update is either a new key-value pair, a modified value for an existing key, or a deleted key. We can assume for now that the whole store fits in memory and that applications access it by key, such as by using a hash table or dictionary. For larger stores and some kind of persistence we'd probably store the state in a database, but that's not relevant here.

@@ -540,7 +540,7 @@ What we want is a way for the server to recover from being killed, or crashing.

 * The server process or machine gets disconnected from the network, e.g., a switch dies or a datacenter gets knocked out. It may come back at some point, but in the meantime clients need an alternate server.

-Our first step is to add a second server. We can use the Binary Star pattern from [php:chapter4#reliable-request-reply|Chapter 4 - Reliable Request-Reply Patterns] to organize these into primary and backup. Binary Star is a reactor, so it's useful that we already refactored the last server model into a reactor style.
+Our first step is to add a second server. We can use the Binary Star pattern from [#reliable-request-reply] to organize these into primary and backup. Binary Star is a reactor, so it's useful that we already refactored the last server model into a reactor style.

 We need to ensure that updates are not lost if the primary server crashes. The simplest technique is to send them to both servers. The backup server can then act as a client, and keep its state synchronized by receiving updates as all clients do. It'll also get new updates from clients. It can't yet store these in its hash table, but it can hold onto them for a while.

@@ -642,7 +642,7 @@ Here is the sixth and last model of the Clone server:

 This model is only a few hundred lines of code, but it took quite a while to get working. To be accurate, building Model Six took about a full week of "Sweet god, this is just too complex for an example" hacking. We've assembled pretty much everything and the kitchen sink into this small application. We have failover, ephemeral values, subtrees, and so on. What surprised me was that the up-front design was pretty accurate. Still the details of writing and debugging so many socket flows is quite challenging.

-The reactor-based design removes a lot of the grunt work from the code, and what remains is simpler and easier to understand. We reuse the bstar reactor from [php:chapter4#reliable-request-reply|Chapter 4 - Reliable Request-Reply Patterns]. The whole server runs as one thread, so there's no inter-thread weirdness going on--just a structure pointer ({{self}}) passed around to all handlers, which can do their thing happily. One nice side effect of using reactors is that the code, being less tightly integrated into a poll loop, is much easier to reuse. Large chunks of Model Six are taken from Model Five.
+The reactor-based design removes a lot of the grunt work from the code, and what remains is simpler and easier to understand. We reuse the bstar reactor from [#reliable-request-reply]. The whole server runs as one thread, so there's no inter-thread weirdness going on--just a structure pointer ({{self}}) passed around to all handlers, which can do their thing happily. One nice side effect of using reactors is that the code, being less tightly integrated into a poll loop, is much easier to reuse. Large chunks of Model Six are taken from Model Five.

 I built it piece by piece, and got each piece working //properly// before going onto the next one. Because there are four or five main socket flows, that meant quite a lot of debugging and testing. I debugged just by dumping messages to the console. Don't use classic debuggers to step through ZeroMQ applications; you need to see the message flows to make any sense of what is going on.

@@ -654,7 +654,7 @@ While the server is pretty much a mashup of the previous model plus the Binary S

 Roughly, there are two ways to design a complex protocol such as this one. One way is to separate each flow into its own set of sockets. This is the approach we used here. The advantage is that each flow is simple and clean. The disadvantage is that managing multiple socket flows at once can be quite complex. Using a reactor makes it simpler, but still, it makes a lot of moving pieces that have to fit together correctly.

-The second way to make such a protocol is to use a single socket pair for everything. In this case, I'd have used ROUTER for the server and DEALER for the clients, and then done everything over that connection. It makes for a more complex protocol but at least the complexity is all in one place. In [php:chapter7#advanced-architecture]|Chapter 7 - Advanced Architecture using ZeroMQ] we'll look at an example of a protocol done over a ROUTER-DEALER combination.
+The second way to make such a protocol is to use a single socket pair for everything. In this case, I'd have used ROUTER for the server and DEALER for the clients, and then done everything over that connection. It makes for a more complex protocol but at least the complexity is all in one place. In [#advanced-architecture] we'll look at an example of a protocol done over a ROUTER-DEALER combination.

 Let's take a look at the CHP specification. Note that "SHOULD", "MUST" and "MAY" are key words we use in protocol specifications to indicate requirement levels.

@@ -804,7 +804,7 @@ CHP does not implement any authentication, access control, or encryption mechani

 +++ Building a Multithreaded Stack and API

-The client stack we've used so far isn't smart enough to handle this protocol properly. As soon as we start doing heartbeats, we need a client stack that can run in a background thread. In the Freelance pattern at the end of [php:chapter4#reliable-request-reply|Chapter 4 - Reliable Request-Reply Patterns] we used a multithreaded API but didn't explain it in detail. It turns out that multithreaded APIs are quite useful when you start to make more complex ZeroMQ protocols like CHP.
+The client stack we've used so far isn't smart enough to handle this protocol properly. As soon as we start doing heartbeats, we need a client stack that can run in a background thread. In the Freelance pattern at the end of [#reliable-request-reply] we used a multithreaded API but didn't explain it in detail. It turns out that multithreaded APIs are quite useful when you start to make more complex ZeroMQ protocols like CHP.

 [[code type="textdiagram" title="Multithreaded API"]]
 #--------------#
@@ -863,7 +863,7 @@ clone_connect (clone_t *self, char *address, char *service)

 * We may want to expose the frontend pipe socket handle to allow the class to be integrated into further poll loops. Otherwise any {{recv}} method would block the application.

-The clone class has the same structure as the {{flcliapi}} class from [php:chapter4#reliable-request-reply|Chapter 4 - Reliable Request-Reply Patterns] and adds the logic from the last model of the Clone client. Without ZeroMQ, this kind of multithreaded API design would be weeks of really hard work. With ZeroMQ, it was a day or two of work.
+The clone class has the same structure as the {{flcliapi}} class from [#reliable-request-reply] and adds the logic from the last model of the Clone client. Without ZeroMQ, this kind of multithreaded API design would be weeks of really hard work. With ZeroMQ, it was a day or two of work.
 
 The actual API methods for the clone class are quite simple:
 
diff --git a/chapter6.txt b/chapter6.txt
index 70c7fb842..d07eff85a 100644
--- a/chapter6.txt
+++ b/chapter6.txt
@@ -773,7 +773,7 @@ So, when we trust the solitary experts, they make classic mistakes. They focus o
 
 Can we turn the above theory into a reusable process? In late 2011, I started documenting C4 and similar contracts, and using them both in ZeroMQ and in closed source projects. The underlying process is something I call "Simplicity Oriented Design", or SOD. This is a reproducible way of developing simple and elegant products. It organizes people into flexible supply chains that are able to navigate a problem landscape rapidly and cheaply. They do this by building, testing, and keeping or discarding minimal plausible solutions, called "patches". Living products consist of long series of patches, applied one atop the other.
 
-SOD is relevant first because it's how we evolve ZeroMQ. It's also the basis for the design process we will use in [php:chapter7#advanced-architecture]|Chapter 7 - Advanced Architecture using ZeroMQ] to develop larger-scale ZeroMQ applications. Of course, you can use any software architecture methodology with ZeroMQ.
+SOD is relevant first because it's how we evolve ZeroMQ. It's also the basis for the design process we will use in [#advanced-architecture] to develop larger-scale ZeroMQ applications. Of course, you can use any software architecture methodology with ZeroMQ.
 
 To best understand how we ended up with SOD, let's look at the alternatives.
 
diff --git a/chapter7.txt b/chapter7.txt
index 4aa938bdd..06da31bcf 100644
--- a/chapter7.txt
+++ b/chapter7.txt
@@ -6,7 +6,7 @@ One of the effects of using ZeroMQ at large scale is that because we can build d
 
 My experience when teaching ZeroMQ to groups of engineers is that it's rarely sufficient to just explain how ZeroMQ works and then just expect them to start building successful products. Like any technology that removes friction, ZeroMQ opens the door to big blunders. If ZeroMQ is the ACME rocket-propelled shoe of distributed software development, a lot of us are like Wile E. Coyote, slamming full speed into the proverbial desert cliff.
 
-We saw in [php:chapter6#the-community]|Chapter 6 - The ZeroMQ Community] that ZeroMQ itself uses a formal process for changes. One reason we built this process, over some years, was to stop the repeated cliff-slamming that happened in the library itself.
+We saw in [#the-community] that ZeroMQ itself uses a formal process for changes. One reason we built this process, over some years, was to stop the repeated cliff-slamming that happened in the library itself.
 
 Partly, it's about slowing down and partially, it's about ensuring that when you move fast, you go--and this is essential Dear Reader--in the //right direction//. It's my standard interview riddle: what's the rarest property of any software system, the absolute hardest thing to get right, the lack of which causes the slow or fast death of the vast majority of projects? The answer is not code quality, funding, performance, or even (though it's a close answer), popularity. The answer is //accuracy//.
 
@@ -31,7 +31,7 @@ We'll cover the following juicy topics:
 
 I'll introduce Message-Oriented Pattern for Elastic Design (MOPED), a software engineering pattern for ZeroMQ architectures. It was either "MOPED" or "BIKE", the Backronym-Induced Kinetic Effect. That's short for "BICICLE", the Backronym-Inflated See if I Care Less Effect. In life, one learns to go with the least embarrassing choice.
 
-If you've read this book carefully, you'll have seen MOPED in action already. The development of Majordomo in [php:chapter4#reliable-request-reply|Chapter 4 - Reliable Request-Reply Patterns] is a near-perfect case. But cute names are worth a thousand words.
+If you've read this book carefully, you'll have seen MOPED in action already. The development of Majordomo in [#reliable-request-reply] is a near-perfect case. But cute names are worth a thousand words.
 
 The goal of MOPED is to define a process by which we can take a rough use case for a new distributed application, and go from "Hello World" to fully-working prototype in any language in under a week.
 
@@ -107,7 +107,7 @@ Now, I've nothing personal against committees. The useless folk need a place to
 
 It used to be, decades ago, when the Internet was a young modest thing, that protocols were short and sweet. They weren't even "standards", but "requests for comments", which is as modest as you can get. It's been one of my goals since we started iMatix in 1995 to find a way for ordinary people like me to write small, accurate protocols without the overhead of the committees.
 
-Now, ZeroMQ does appear to provide a living, successful protocol abstraction layer with its "we'll carry multipart messages over random transports" way of working. Because ZeroMQ deals silently with framing, connections, and routing, it's surprisingly easy to write full protocol specs on top of ZeroMQ, and in [php:chapter4#reliable-request-reply|Chapter 4 - Reliable Request-Reply Patterns] and [php:chapter5#advanced-pub-sub|Chapter 5 - Advanced Pub-Sub Patterns] I showed how to do this.
+Now, ZeroMQ does appear to provide a living, successful protocol abstraction layer with its "we'll carry multipart messages over random transports" way of working. Because ZeroMQ deals silently with framing, connections, and routing, it's surprisingly easy to write full protocol specs on top of ZeroMQ, and in [#reliable-request-reply] and [#advanced-pub-sub] I showed how to do this.
 
 Somewhere around mid-2007, I kicked off the Digital Standards Organization to define new simpler ways of producing little standards, protocols, and specifications. In my defense, it was a quiet summer. At the time, I wrote that a new specification should take [http://www.digistan.org/spec:1 "minutes to explain, hours to design, days to write, weeks to prove, months to become mature, and years to replace."]
 
@@ -418,11 +418,11 @@ What if I told you of a way to build custom IDL generators cheaply and quickly?
 
 At iMatix, until a few years ago, we used code generation to build ever larger and more ambitious systems until we decided the technology (GSL) was too dangerous for common use, and we sealed the archive and locked it with heavy chains in a deep dungeon. We actually posted it on GitHub. If you want to try the examples that are coming up, grab [https://github.com/imatix/gsl the repository] and build yourself a {{gsl}} command. Typing "make" in the src subdirectory should do it (and if you're that guy who loves Windows, I'm sure you'll send a patch with project files).
 
-This section isn't really about GSL at all, but about a useful and little-known trick that's useful for ambitious architects who want to scale themselves, as well as their work. Once you learn the trick, you can whip up your own code generators in a short time. The code generators most software engineers know about come with a single hard-coded model. For instance, Ragel "compiles executable finite state machines from regular languages", i.e., Ragel's model is a regular language. This certainly works for a good set of problems, but it's far from universal. How do you describe an API in Ragel? Or a project makefile? Or even a finite-state machine like the one we used to design the Binary Star pattern in [php:chapter4#reliable-request-reply|Chapter 4 - Reliable Request-Reply Patterns]?
+This section isn't really about GSL at all, but about a useful and little-known trick that's useful for ambitious architects who want to scale themselves, as well as their work. Once you learn the trick, you can whip up your own code generators in a short time. The code generators most software engineers know about come with a single hard-coded model. For instance, Ragel "compiles executable finite state machines from regular languages", i.e., Ragel's model is a regular language. This certainly works for a good set of problems, but it's far from universal. How do you describe an API in Ragel? Or a project makefile? Or even a finite-state machine like the one we used to design the Binary Star pattern in [#reliable-request-reply]?
 
 All these would benefit from code generation, but there's no universal model. So the trick is to design your own models as you need them, and then make code generators as cheap compilers for that model. You need some experience in how to make good models, and you need a technology that makes it cheap to build custom code generators. A scripting language, like Perl and Python, is a good option. However, we actually built GSL specifically for this, and that's what I prefer.
 
-Let's take a simple example that ties into what we already know. We'll see more extensive examples later, because I really do believe that code generation is crucial knowledge for large-scale work. In [php:chapter4#reliable-request-reply|Chapter 4 - Reliable Request-Reply Patterns], we developed the [http://rfc.zeromq.org/spec:7 Majordomo Protocol (MDP)], and wrote clients, brokers, and workers for that. Now could we generate those pieces mechanically, by building our own interface description language and code generators?
+Let's take a simple example that ties into what we already know. We'll see more extensive examples later, because I really do believe that code generation is crucial knowledge for large-scale work. In [#reliable-request-reply], we developed the [http://rfc.zeromq.org/spec:7 Majordomo Protocol (MDP)], and wrote clients, brokers, and workers for that. Now could we generate those pieces mechanically, by building our own interface description language and code generators?
 
 When we write a GSL model, we can use //any// semantics we like, in other words we can invent domain-specific languages on the spot. I'll invent a couple--see if you can guess what they represent:
 
@@ -1303,7 +1303,7 @@ I use track for things like updating my MP3 player mounted as a USB drive. As I
 
 +++ Internal Architecture
 
-To build FileMQ I used a lot of code generation, possibly too much for a tutorial. However the code generators are all reusable in other stacks and will be important for our final project in [php:chapter8#moving-pieces]|Chapter 8 - A Framework for Distributed Computing]. They are an evolution of the set we saw earlier:
+To build FileMQ I used a lot of code generation, possibly too much for a tutorial. However the code generators are all reusable in other stacks and will be important for our final project in [#moving-pieces]. They are an evolution of the set we saw earlier:
 
 * {{codec_c.gsl}}: generates a message codec for a given protocol.
 * {{server_c.gsl}}: generates a server class for a protocol and state machine.
@@ -1483,7 +1483,7 @@ Because we've collected all operations on files in a single class ({{fmq_file}})
 
 +++ Recovery and Late Joiners
 
-As it stands now, FileMQ has one major remaining problem: it provides no way for clients to recover from failures. The scenario is that a client, connected to a server, starts to receive files and then disconnects for some reason. The network may be too slow, or breaks. The client may be on a laptop which is shut down, then resumed. The WiFi may be disconnected. As we move to a more mobile world (see [php:chapter8#moving-pieces]|Chapter 8 - A Framework for Distributed Computing]) this use case becomes more and more frequent. In some ways it's becoming a dominant use case.
+As it stands now, FileMQ has one major remaining problem: it provides no way for clients to recover from failures. The scenario is that a client, connected to a server, starts to receive files and then disconnects for some reason. The network may be too slow, or breaks. The client may be on a laptop which is shut down, then resumed. The WiFi may be disconnected. As we move to a more mobile world (see [#moving-pieces]) this use case becomes more and more frequent. In some ways it's becoming a dominant use case.
 
 In the classic ZeroMQ pub-sub pattern, there are two strong underlying assumptions, both of which are usually wrong in FileMQ's real world. First, that data expires very rapidly so that there's no interest in asking from old data. Second, that networks are stable and rarely break (so it's better to invest more in improving the infrastructure and less in addressing recovery).
 
diff --git a/chapter8.txt b/chapter8.txt
index 5ab66a315..d4fcb44b9 100644
--- a/chapter8.txt
+++ b/chapter8.txt
@@ -54,7 +54,7 @@ If we can solve these problems reasonably well, and the further problems that wi
 
 You should have guessed from my rhetorical questions that there are two broad directions in which we can go. One is to centralize everything. The other is to distribute everything. I'm going to bet on decentralization. If you want centralization, you don't really need ZeroMQ; there are other options you can use.
 
-So very roughly, here's the story. One, the number of moving pieces increases exponentially over time (doubles every 24 months). Two, these pieces stop using wires because dragging cables everywhere gets //really// boring. Three, future applications run across clusters of these pieces using the Benevolent Tyrant pattern from [php:chapter6#the-community]|Chapter 6 - The ZeroMQ Community]. Four, today it's really difficult, nay still rather impossible, to build such applications. Five, let's make it cheap and easy using all the techniques and tools we've built up. Six, partay!
+So very roughly, here's the story. One, the number of moving pieces increases exponentially over time (doubles every 24 months). Two, these pieces stop using wires because dragging cables everywhere gets //really// boring. Three, future applications run across clusters of these pieces using the Benevolent Tyrant pattern from [#the-community]. Four, today it's really difficult, nay still rather impossible, to build such applications. Five, let's make it cheap and easy using all the techniques and tools we've built up. Six, partay!
 
 ++ The Secret Life of WiFi
 
@@ -494,7 +494,7 @@ If you do sit down and sketch out a UDP multicast protocol, realize that you nee
 
 ++ Spinning Off a Library Project
 
-At this stage, however, the code is growing larger than an example should be, so it's time to create a proper GitHub project. It's a rule: build your projects in public view, and tell people about them as you go so your marketing and community building starts on Day 1. I'll walk through what this involves. I explained in [php:chapter6#the-community]|Chapter 6 - The ZeroMQ Community] about growing communities around projects. We need a few things:
+At this stage, however, the code is growing larger than an example should be, so it's time to create a proper GitHub project. It's a rule: build your projects in public view, and tell people about them as you go so your marketing and community building starts on Day 1. I'll walk through what this involves. I explained in [#the-community] about growing communities around projects. We need a few things:
 
 * A name
 * A slogan
@@ -511,7 +511,7 @@ I'm somewhat shy about pushing new projects into the ZeroMQ community too aggres
 
 Start with the basics. The protocol (UDP and ZeroMQ/TCP) will be ZRE (ZeroMQ Realtime Exchange protocol) and the project will be Zyre. I need a second maintainer, so I invite my friend Dong Min (the Korean hacker behind JeroMQ, a pure-Java ZeroMQ stack) to join. He's been working on very similar ideas so is enthusiastic. We discuss this and we get the idea of building Zyre on top of JeroMQ, as well as on top of CZMQ and {{libzmq}}. This would make it a lot easier to run Zyre on Android. It would also give us two fully separate implementations from the start, which is always a good thing for a protocol.
 
-So we take the FileMQ project I built in [php:chapter7#advanced-architecture]|Chapter 7 - Advanced Architecture using ZeroMQ] as a template for a new GitHub project. The GNU autoconf tools are quite decent, but have a painful syntax. It's easiest to copy existing project files and modify them. The FileMQ project builds a library, has test tools, license files, man pages, and so on. It's not too large so it's a good starting point.
+So we take the FileMQ project I built in [#advanced-architecture] as a template for a new GitHub project. The GNU autoconf tools are quite decent, but have a painful syntax. It's easiest to copy existing project files and modify them. The FileMQ project builds a library, has test tools, license files, man pages, and so on. It's not too large so it's a good starting point.
 
 I put together a README to summarize the goals of the project and point to C4. The issue tracker is enabled by default on new GitHub projects, so once we've pushed the UDP ping code as a first version, we're ready to go. However, it's always good to recruit more maintainers, so I create an issue "Call for maintainers" that says:
 
@@ -710,7 +710,7 @@ I'm not going to work through the implementation of group messaging in detail be
 * A hash of groups for other peers, which we update with information from {{HELLO}}, {{JOIN}}, and {{LEAVE}} commands;
 * A hash of peers for each group, which we update with the same three commands.
 
-At this stage, I'm starting to get pretty happy with the binary serialization (our codec generator from [php:chapter7#advanced-architecture]|Chapter 7 - Advanced Architecture using ZeroMQ]), which handles lists and dictionaries as well as strings and integers.
+At this stage, I'm starting to get pretty happy with the binary serialization (our codec generator from [#advanced-architecture]), which handles lists and dictionaries as well as strings and integers.
 
 This version is tagged in the repository as v0.2.0 and you can [https://github.com/zeromq/zyre/tags download the tarball] if you want to check what the code looked like at this stage.
 
@@ -1196,7 +1196,7 @@ As usual, we'll aim for the very simplest plausible solution and then improve th
 * Zyre will distribute that file to all peers, both those that are on the network at that time, and those that arrive later.
 * Each time an interface receives a file it tells its application, "Here is this file".
 
-We might eventually want more discrimination, e.g., publishing to specific groups. We can add that later if it's needed. In [php:chapter7#advanced-architecture]|Chapter 7 - Advanced Architecture using ZeroMQ] we developed a file distribution system (FileMQ) designed to be plugged into ZeroMQ applications. So let's use that.
+We might eventually want more discrimination, e.g., publishing to specific groups. We can add that later if it's needed. In [#advanced-architecture] we developed a file distribution system (FileMQ) designed to be plugged into ZeroMQ applications. So let's use that.
 
 Each node is going to be a file publisher and a file subscriber. We bind the publisher to an ephemeral port (if we use the standard FileMQ port 5670, we can't run multiple interfaces on one box), and we broadcast the publisher's endpoint in the {{HELLO}} message, as we did for the log collector. This lets us interconnect all nodes so that all subscribers talk to all publishers.
 