Find file
Fetching contributors…
Cannot retrieve contributors at this time
1591 lines (1074 sloc) 116 KB
.output chapter6.wd
++ Chapter Six - The Human Scale
If you've survived the first five chapters, congratulations. It was hard for for me too. Happily the jokes and the code mostly write themselves, so we'll continue with our journey of exploring 0MQ. In this chapter I'm going to step back from the nuts and bolts of 0MQ's technical machinery, and look more at how to use 0MQ successfully in larger projects.
We'll cover:
* What "software architecture" is really about.
* The Simplicity-Oriented Design process and its ugly cousins Cod and Tod.
* How to use 0MQ to go from idea to working prototype safely.
* Different ways to serialize your data as 0MQ messages.
* How to code-generate binary serialization codecs.
* How to build custom code generators.
* How to write and license an protocol specification.
* How to do fast restartable file transfer over 0MQ.
* How to do credit-based flow control.
* How to do heartbeating for different 0MQ patterns.
* How to build protocol servers and clients as state machines.
* How to make a secure protocol over 0MQ (yay!).
* A large-scale file publishing system (FileMQ).
+++ The Tale of Two Bridges
Two old engineers were talking of their lives and boasting of their greatest projects. One of the engineers explained how he had designed one of the greatest bridges ever made.
"We built it across a river gorge," he told his friend. "It was wide and deep. We spent two years studying the land, and choosing designs and materials. We hired the best engineers and designed the bridge, which took another five years. We contracted the largest engineering firms to build the structures, the towers, the tollbooths, and the roads that would connect the bridge to the main highways. Dozens died during the construction. Under the road level we had trains, and a special path for cyclists. That bridge represented years of my life."
The second man reflected for a while, then spoke. "One evening me and a friend got drunk on vodka, and we threw a rope across a gorge," he said. "Just a rope, tied to two trees. There were two villages, one at each side. At first, people pulled packages across that rope with a pulley and string. Then someone threw a second rope, and built a foot walk. It was dangerous, but the kids loved it. A group of men then rebuilt that, made it solid, and women started to cross, everyday, with their produce. A market grew up on one side of the bridge, and slowly that became a large town, since there was a lot of space for houses. The rope bridge got replaced with a wooden bridge, to allow horses and carts to cross. Then the town built a real stone bridge, with metal beams. Later, they replaced the stone part with steel, and today there's a suspension bridge standing in that same spot."
The first engineer was silent. "Funny thing," he said, "my bridge was demolished about ten years after we built it. Turns out it was built in the wrong place and no-one wanted to use it. Some guys had thrown a rope across the gorge, a few miles further downstream, and that's where everyone went."
+++ Code on the Human Scale
To write a poem that captures the heart, first learn the language. To use 0MQ successfully at scale you have to learn two languages. The first is 0MQ itself. This takes even the best of us time. It's a truism that if you try to port an old architecture onto 0MQ, the results are going to be weird. 0MQ's language is subtle and profound and when you master it you will find yourself removing old complexity, not converting it.
However the real challenge of using 0MQ is that old barriers fall away, and the size of the projects you can do increases hugely. Non-distributed code is often a single-person project. You can work in your corner, perhaps for years, like an author on a book. It's all about concentration. But distributed code is different. To quote my favorite author, it "has to talk to code, has to be chatty, sociable, well-connected".
Writing distributed code is like playing live music: it's all about other people. Concentration is worthless if you can't listen. No-one enjoys listening to an amazingly proficient musician who's out of time with the rest of the group and can't read the mood of the audience. A live jam is entrancing not because of the technical quality but because of the real-time creative energy.
And so it goes with distributed code. Real-time creative energy is what wins, not pure technical quality, and certainly not technical quality combined with inability to work with others.
All this is fine in theory. Here comes the catch: working with other people is //plain hard//. We can expect a musician to be naturally social. But software developers? We're the very caricature of anti-social tunnel-visioned hermits. Other people are hard work. They're slow, they make mistakes, they ask too many questions, they don't respect our code, they make wrong assumptions, they argue.
My response isn't very sympathetic. To succeed in the software industry as it turns into something more like a never-ending live jam, we have to learn to put away our egos, work successfully with others, worry less about our own skills and look more at others, put away our natural insolence and attitude, and to learn to like and trust other people.
So this is what this chapter is really about: writing code at scale by understanding ourselves much better. Of course these lessons apply to all large-scale applications. Using 0MQ we just hit the problem sooner than we'd expect.
+++ Psychology of Software Development
Dirkjan Ochtman pointed me to [ Wikipedia's definition of Software Architecture] as "//the set of structures needed to reason about the system, which comprise software elements, relations among them, and properties of both//". For me this vapid and circular jargon is a good example of how miserably little we understand about what actually makes a successful large scale software architecture.
Architecture is the art and science of making large artificial structures for human use. If there is one thing I've learned and applied successfully in 30 years of making larger and larger software systems it is this: software is about people. Large structures in themselves are meaningless. It's how they function for //human use// that matters. And in software, human use starts with the programmers who make the software itself.
The core problems in software architecture are driven by human psychology, not technology. There are many ways our psychology affects our work. I could point to the way teams seem to get stupider as they get larger, or have to work across larger distances. Does that mean the smaller the team, the more effective? How then does a large global community like 0MQ manage to work successfully?
The 0MQ community wasn't accidental, it was a deliberate design, my contribution to the early days when the code came out of a cellar in Bratislava. The design was based on my pet science of "Social Architecture", which [ Wikipedia defines] (what a coincidence!) as "//the process, and the product, of planning, designing, and growing an on-line community.//"
One of the tenets of Social Architecture is that //how we organize// is more significant than //who we are//. The same group, organized differently, can produce entirely opposite results. We are like peers in a 0MQ network, and our communication patterns have dramatic impact on our performance. Ordinary people, well connected, can far outperform a team of experts working in the wrong patterns. If you're the architect of a larger 0MQ application, you're going to have to help others find the right patterns for working together. Do this right, and your project can succeed. Do it wrong, and your project will fail.
The two most important psychological elements are IMO that we're really bad at understanding complexity, and that we are so good at working together to divide and conquer large problems. We're highly social apes, and kind of smart, but only in the right kind of crowd.
So here is my short list of the Psychological Elements of Software Architecture:
* **Stupidity**: our mental bandwidth is limited, so we're all stupid at some point. The architecture has to be simple to understand. This is the number one rule: simplicity beats functionality, every single time. If you can't understand an architecture on a cold gray Monday morning before coffee, it is too complex.
* **Selfishness**: we act only out of self-interest, so the architecture must create space and opportunity for selfish acts that benefit the whole. Selfishness is often indirect and subtle. For example I'll spend hours helping someone else understand something because that could be worth days to me later.
* **Laziness**: we make lots of assumptions, many of which are wrong. We are happiest when we can spend the least effort to get a result, to test an assumption quickly, so the architecture has to make this possible. Specifically, that means it must be simple.
* **Jealousy**: we're jealous of others, which means we'll overcome our stupidity and laziness to prove others wrong, and beat them in competition. The architecture thus has to create space for public competition based on fair rules that anyone can understand.
* **Reciprocity**: we'll pay extra in terms of hard work, even money, to punish cheats and enforce fair rules. The architecture should be heavily rule-based, telling people how to work together, but not what to work on.
* **Pride**: we're intensely aware of our social status, and we'll work hard to avoid looking stupid or incompetent in public. The architecture has to make sure every piece we make has our name on it, so we'll have sleepless nights stressing about what others will say about our work.
* **Greed**: we're ultimately economic animals (see selfishness), so the architecture has to give us economic incentive to invest in making it happen. Maybe it's polishing our reputation as experts, maybe it's literally making money from some skill or component. It doesn't matter what it is, but there must be economic incentive. Think of architecture as a market place, not an engineering design.
* **Conformity**: we're happiest to conform, out of fear and laziness, so the architecture should be strongly rule-based, and rules should be clear, accurate, well-documented, and enforced.
* **Fear**: we're unwilling to take risks, especially if it makes us look stupid. Fear of failure is a major reason people conform and follow the group in mass stupidity. The architecture should make silent experimentation easy and cheap, giving people opportunity for success without punishing failure.
These strategies work on large scale but also on small scale, within an organization or team.
+++ The Bad, the Ugly, and the Delicious
Complexity is easy, it's simplicity that is hard. Whether our software is bad, ugly, or so delicious that it feels wrong to consume alone, doesn't depend so much on our individual skills as how we work together. That is, our processes.
There are many aspects to getting product-building teams and organizations to think wisely. You need diversity, freedom, challenge, resources, and so on. I discuss these in detail in [ Software and Silicon]. However, even if you have all the right ingredients, the default processes that skilled engineers and designers develop will result in complex, hard-to-use products.
The classic errors are: to focus on ideas, not problems; to focus on the wrong problems; to misjudge the value of solving problems; to not use ones' own work; and in many other ways to misjudge the real market.
I'll propose a process called "Simplicity Oriented Design", or SOD, which is as far as I can tell a reliable, repeatable way of developing simple and elegant products. This process organizes people into flexible supply chains that are able to navigate a problem landscape rapidly and cheaply. They do this by building, testing, and keeping or discarding minimal plausible solutions, called "patches". Living products consist of long series of patches, applied one atop the other. Yes, you may recognize the process by which we develop 0MQ.
Let's first look at the more common and less joyful processes, TOD and COD.
++++ Trash-Oriented Design
The most popular design process in large businesses seems to be "Trash Oriented Design", or TOD. TOD feeds off the belief that all we need to make money are great ideas. It's tenacious nonsense but a powerful crutch for people who lack imagination. The theory goes that ideas are rare, so the trick is to capture them. It's like non-musicians being awed by a guitar player, not realizing that great talent is so cheap it literally plays on the streets for coins.
The main output of TODs are expensive "ideations": concepts, design documents, and products that go straight into the trash can. It works as follows:
* The Creative People come up with long lists of "we could do X and Y". I've seen endlessly detailed lists of everything amazing a product could do. Once the creative work of idea generation has happened, it's just a matter of execution, of course.
* So the managers and their consultants pass their brilliant, world-shattering ideas to designers who acres of detailed, preciously refined design documents. The designers take the tens of ideas the managers came up with, and turn them into hundreds of amazing, world-changing designs.
* These designs get given to engineers who scratch their heads and wonder who the heck came up with such stupid nonsense. They start to argue back but the designs come from up high, and really, it's not up to engineers to argue with creative people and expensive consultants.
* So the engineers creep back to their cubicles, humiliated and threatened into building the gigantic but oh-so-elegant pile of junk. It is bone-breakingly hard work since the designs take no account of practical costs. Minor whims might take weeks of work to build. As the project gets delayed, the managers bully the engineers into giving up their evenings and weekends.
* Eventually, something resembling a working product makes it out of the door. It's creaky and fragile, complex and ugly. The designers curse the engineers for their incompetence and pay more consultants to put lipstick onto the pig, and slowly the product starts to look a little nicer.
* By this time, the managers have started to try to sell the product and they find, shockingly, that no-one wants it. Undaunted and courageously they build million-dollar web sites and ad campaigns to explain to the public why they absolutely need this product. They do deals with other businesses to force the product on the lazy, stupid and ungrateful market.
* After twelve months of intense marketing, the product still isn't making profits. Worse, it suffers dramatic failures and gets branded in the press as a disaster. The company quietly shelves it, fires the consultants, buys a competing product from a small start-up and re-brands that as its own Version 2. Hundreds of millions of dollars end-up in the trash.
* Meanwhile, another visionary manager, somewhere in the Organization, drinks a little too much tequila with some marketing people and has a Brilliant Idea.
Trash-Oriented Design would be a caricature if it wasn't so common. Something like 19 out of 20 market-ready products built by large firms are failures (yes, 87% of statistics are made up on the spot). The remaining one in 20 probably only succeeds because the competitors are so bad and the marketing is so aggressive.
The main lessons of TOD are quite straight-forward but hard to swallow. They are:
* Ideas are cheap. No exceptions. There are no brilliant ideas. Anyone who tries to start a discussion with "oooh, we can do this too!" should be beaten down with all the passion one reserves for traveling evangelists. It is like sitting in a cafe at the foot of a mountain, drinking a hot chocolate and telling others, "//hey, I have a great idea, we can climb that mountain! And build a chalet on top! With two saunas! And a garden! Hey, and we can make it solar powered! Dude, that's awesome! What color should we paint it? Green! No, blue! OK, go and make it, I'll stay here and make spreadsheets and graphics!//"
* The starting point for a good design process is to collect real problems that confront real people. The second step is to evaluate these problems with the basic question, "how much is it worth to solve this problem?" Having done that, we can collect that set of problems that are worth solving.
* Good solutions to real problems will succeed as products. Their success will depend on how good and cheap the solution is, and how important the problem is (and sadly, how big the marketing budgets are). But their success will also depend on how much they demand in effort to use, in other words how simple they are.
Hence after slaying the dragon of utter irrelevance, we attack the demon of complexity.
++++ Complexity-Oriented Design
Really good engineering teams and small firms can usually build decent products. But the vast majority of products still end up being too complex and less successful than they might be. This is because specialist teams, even the best, often stubbornly apply a process I call "Complexity-Oriented Design", or COD, which works as follows:
* Management correctly identifies some interesting and difficult problem with economic value. In doing so they already leapfrog over any TOD team.
* The team with enthusiasm start to build prototypes and core layers. These work as designed and thus encouraged, the team go off into intense design and architecture discussions, coming up with elegant schemas that look beautiful and solid.
* Management comes back and challenges team with yet more difficult problems. We tend to equate value with cost, so the harder the problem, and more expensive to solve, the more the solution should be worth, in their minds.
* The team, being engineers and thus loving to build stuff, build stuff. They build and build and build and end-up with massive, perfectly-designed complexity.
* The products go to market, and the market scratches its head and asks, "seriously, is this the best you can do?" People do use the products, especially if they aren't spending their own money in climbing the learning curve.
* Management gets positive feedback from its larger customers, who share the same idea that high cost (in training and use) means high value. and so continues to push the process.
* Meanwhile somewhere across the world, a small team is solving the same problem using a better process, and a year later smashes the market to little pieces.
COD is characterized by a team obsessively solving the wrong problems to the point of collective insanity. COD products tend to be large, ambitious, complex, and unpopular. Much open source software is the output of COD processes. It is insanely hard for engineers to **stop** extending a design to cover more potential problems. They argue, "what if someone wants to do X?" but never ask themselves, "what is the real value of solving X?"
A good example of COD in practice is Bluetooth, a complex, over-designed set of protocols that users hate. It continues to exist only because in a massively-patented industry there are no real alternatives. Bluetooth is perfectly secure, which is close to pointless for a proximity protocol. At the same time it lacks a standard API for developers, meaning it's really costly to use Bluetooth in applications.
On the #zeromq IRC channel, Wintre once wrote of how enraged he was many years ago when he "//found that XMMS 2 had a working plugin system but could not actually play music.//"
COD is a form of large-scale "rabbit holing", in which designers and engineers cannot distance themselves from the technical details of their work. They add more and more features, utterly misreading the economics of their work.
The main lessons of COD are also simple but hard for experts to swallow. They are:
* Making stuff that you don't immediately have a need for is pointless. Doesn't matter how talented or brilliant you are, if you just sit down and make stuff people are not actually asking for, you are most likely wasting your time.
* Problems are not equal. Some are simple, and some are complex. Ironically, solving the simpler problems often has more value to more people than solving the really hard ones. So if you allow engineers to just work on random things, they'll most focus on the most interesting but least worthwhile things.
* Engineers and designers love to make stuff and decoration, and this inevitably leads to complexity. It is crucial to have a "stop mechanism", a way to set short, hard deadlines that force people to make smaller, simpler answers to just the most crucial problems.
++++ Simplicity-Oriented Design
Finally, we come to the rare but precious Simplicity-Oriented Design. This process starts with a realization: we do not know what we have to make until after we start making it. Coming up with ideas, or large-scale designs isn't just wasteful, it's a direct hindrance to designing the truly accurate solutions. The really juicy problems are hidden like far valleys, and any activity except active scouting creates a fog that hides those distant valleys. You need to keep mobile, pack light, and move fast.
SOD works as follows:
* We collect a set of interesting problems (by looking at how people use technology or other products) and we line these up from simple to complex, looking for and identifying patterns of use.
* We take the simplest, most dramatic problem and we solve this with a minimal plausible solution, or "patch". Each patch solves exactly a genuine and agreed problem in a brutally minimal fashion.
* We apply one measure of quality to patches, namely "can this be done any simpler while still solving the stated problem?" We can measure complexity in terms of concepts and models that the user has to learn or guess in order to use the patch. The fewer, the better. A perfect patch solves a problem with zero learning required by the user.
* Our product development consists of a patch that solves the problem "we need a proof of concept" and then evolves in an unbroken line to a mature series of products, through hundreds or thousands of patches piled on top of each other.
* We do not do //anything// that is not a patch. We enforce this rule with formal processes that demand that every activity or task is tied to a genuine and agreed problem, explicitly enunciated and documented.
* We build our projects into a supply chain where each project can provide problems to its "suppliers" and receive patches in return. The supply chain creates the "stop mechanism" since when people are impatiently waiting for an answer, we necessarily cut our work short.
* Individuals are free to work on any projects, and provide patches at any place they feel it's worthwhile. No individuals "own" any project, except to enforce the formal processes. A single project can have many variations, each a collection of different, competing patches.
* Projects export formal and documented interfaces so that upstream (client) projects are unaware of change happening in supplier projects. Thus multiple supplier projects can compete for client projects, in effect creating a free and competitive market.
* We tie our supply chain to real users and external clients and we drive the whole process by rapid cycles so that a problem received from outside users can be analyzed, evaluated, and solved with a patch in a few hours.
* At every moment from the very first patch, our product is shippable. This is essential, because a large proportion of patches will be wrong (10-30%) and only by giving the product to users can we know which patches have become problems and themselves need solving.
SOD is a form of "hill climbing algorithm", a reliable way of finding optimal solutions to the most significant problems in an unknown landscape. You don't need to be a genius to use SOD successfully, you just need to be able to see the difference between the fog of activity and the progress towards new real problems.
A really good designer with a good team can use SOD to build world-class products, rapidly and accurately. To get the most out of SOD, the designer has to use the product continuously, from day 1, and develop his or her ability to smell out problems such as inconsistency, surprising behavior, and other forms of friction. We naturally overlook many annoyances but a good designer picks these up, and thinks about how to patch them. Design is about removing friction in the use of a product.
In an open source setting, we do this work in public. There's no "let's open the code" moment. Projects that do this are in my view missing the point of open source, which is to engage your users in your exploration, and to build community around the seed of the architecture.
+++ Message Oriented Pattern for Elastic Design
Now I'll introduce MOPED, which is a SOD pattern custom-designed for 0MQ architectures. It was either MOPED or BIKE, the Backronym-Induced Kinetic Effect. That's short for BICICLE, the Backronym-Inflated See if I Care Less Effect. In life, one learns to go with the least embarrassing choices.
Speaking of embarrassments, just as 0MQ lets us aim for really massive architectures, it also, like any technology that removes friction, opens the door to truly massive blunders. If 0MQ is the ACME rocket-propelled shoe of distributed software development, a lot of us are like Wile E. Coyote, slamming full speed into the proverbial desert cliff.
So MOPED is meant to save us from such mistakes. Partly it's about slowing down, partly it's about ensuring that when you move fast, you go - and this is essential, dear reader - in the //right direction//. It's my standard interview riddle: what's the rarest property of any software system, the absolute hardest thing to get right, the lack of which causes the slow or fast death of the vast majority of projects? The answer is not code quality, funding, performance, or even (though it's a close answer), popularity. The answer is "accuracy".
If you've read the Guide observantly you'll have seen MOPED in action already. The development of Majordomo in Chapter 4 is a near-perfect case. But cute names are worth a thousand words.
The goal of MOPED is to define a process, a pattern by which we can take a rough use case for a new distributed application, and go from "hello world" to fully-working prototype in any language in under a week.
Using MOPED, you grow, more than build, a working 0MQ architecture from the ground-up, with minimal risk of failure. By focusing on the contracts, rather than the implementations, you avoid the risk of premature optimization. By driving the design process through ultra-short test-based cycles, you can be more certain what you have works, before you add more.
We can turn this into five real steps:
* Step 1: internalize the 0MQ semantics.
* Step 2: draw a rough architecture.
* Step 3: decide on the contracts.
* Step 4: make a minimal end-to-end solution.
* Step 5: solve one problem and repeat.
++++ Step 1: Internalize the Semantics
To repeat myself: you must learn 0MQ's language. The only way to learn a language is to use it. There's no way to avoid this investment, no tapes you can play while you sleep, no chips you can plug in to magically become smarter. Read the Guide, work through the code examples, understand what's going on, and (most importantly) write some examples yourself, and then //throw them away//.
At a certain point you'll feel a clicking noise in your brain. Maybe you'll have a weird chili-induced dream where little 0MQ tasks run around trying to eat you alive. Maybe you'll just think "aaahh, so //that's// what it means!" If we did our work right, it should take 2-3 days. However long it takes, until you start thinking in terms of 0MQ sockets and patterns, you're not ready for step 2.
++++ Step 2: Draw a Rough Architecture
Whiteboard time. Get a couple of colleagues and try to draw your architecture on a whiteboard. you want to draw boxes connected with arrows, showing the flow of work, data, results, etc. Since we live in a gravity well, it's best to draw the main arrows going down. Almost all architectures have a //direction//, and a certain symmetry, and what you want to do is capture that as simply and cleanly as you can.
Ignore anything that's not central to the core problem. Ignore logging, error handling, recovery from failures, etc. What you leave out is as important as what you capture: you can always add, but it's very hard to remove. When you have a simple, clean drawing, you're ready for step 3.
++++ Step 3: Decide on the Contracts
Human scale depends on contracts, and the more explicit they are, the better things scale. You don't care //how// things happen, only the results. If I send an email, I don't care how it arrives at its destination, so long as the contract (it arrives within a few minutes, it's not modified, it doesn't get lost) is respected.
And to build a large system that works well, you must focus on the contracts, before the implementations. It may sound obvious but all too often, people forget and ignore this, or are just too shy to impose themselves. I wish I could say 0MQ had done this properly but for years our public contracts were second-rate afterthoughts instead of primary in-your-face pieces of work.
So what is a contract in a distributed system? There are, in my experience, two types of contract:
* The APIs to client applications. Remember the Psychological Elements. The APIs need to be as absolutely //simple//, //consistent//, and //familiar// as possible. Yes, you can generate API documentation from code, but you must first design it, and designing an API is often hard.
* The protocols that connect the pieces. It sounds like rocket science, but it's really just a simple trick, and one that 0MQ makes particularly easy. In fact they're so simple to write, and need so little bureaucracy that I call them "unprotocols".
You write minimal contracts that are mostly just place markers. Most messages and most API methods will be missing, or empty. You also want to write down any known technical requirements in terms of throughput, latency, reliability, etc. These are the criteria on which you will accept, or reject, any particular piece of work.
++++ Step 4: Write a Minimal End-to-End Solution
The goal is to test out the overall architecture as rapidly as possible. Make skeleton applications that call the APIs, and skeleton stacks that implement both sides of every protocol. You want to get a working end-to-end "hello world" as soon as you can. You want to be able to test code, as you write it, to weed-out the broken assumptions and inevitable errors you make. Do not go off and spend six months writing a test suite! Instead, make a minimal bare-bones application that uses our still-hypothetical API.
If you design an API wearing the hat of the person who implements it, you'll start to think of performance, features, options, and so on. You'll make it more complex, more irregular, and more surprising than it should be. But, and here's the trick (it's a cheap one, was big in Japan), if you design an API while wearing the hat of the poor sucker who has to actually write apps that use it, you use all that laziness and fear to our advantage.
Write down the protocols, on a wiki or shared document, in such a way that you can explain every command clearly without too much detail. Strip off any real functionality, because it'll create inertia that just makes it harder to move stuff around. You can always add weight. Don't spend effort defining formal message structures: pass the minimum around, in the simplest possible fashion, using 0MQ's multi-part framing.
Our goal is to get the simplest test case working, without any avoidable functionality. Everything you can chop off the list of things to do, you chop. Ignore the groans from colleagues and bosses. I'll repeat this once again: you can //always// add functionality, that's relatively easy. But aim to keep the overall weight to a minimum.
++++ Step 5: Solve One Problem and Repeat
You're now in the Happy Loop of issue-driven development where you can start to solve tangible problems instead of adding features. Write issues that state a clear problem, and propose a solution. Keep in mind, as you design the API, your standards for names, consistency, and behavior. Writing these down in prose often helps keep them sane.
From here, every single change you make to the architecture and code is now proven by running the test case, watching it not work, making the change, and then watching it work.
Now you go through the whole cycle (extending the test case, fixing the API, updating the protocol, extending the code, as needed), taking problems one at a time and testing the solutions individually. It should take about 10-30 minutes for each cycle, with the occasional spike due to random confusion.
+++ Unprotocols
++++ Why Unprotocols?
When this man thinks of protocols, this man thinks of massive documents written by committees, over years. This man thinks of the IETF, W3C, ISO, Oasis, regulatory capture, FRAND patent license disputes, and soon after, this man thinks of retirement to a nice little farm in northern Bolivia up in the mountains where the only other needlessly stubborn beings are the goats chewing up the coffee plants.
Now, I've nothing personal against committees. The useless folk need a place to sit out their lives with minimal risk of reproducing, after all, that only seems fair. But most committee protocols tend towards complexity (the ones that work), or trash (the ones we don't talk about). There's a few reasons for this. One is the amount of money at stake. More money means more people who want their particular prejudices and assumptions expressed in prose. But two is the lack of good abstractions on which to build. People have tried to build reusable protocol abstractions, like BEEP. Most did not stick, and those that did, like SOAP and XMPP, are on the complex side of things.
It used to be, decades ago, when the Internet was a young modest thing, that protocols were short and sweet. They weren't even "standards", but "requests for comments", which is as modest as you can get. It's been one of my goals since we started iMatix in 1995 to find a way for ordinary people like me to write small, accurate protocols without the overhead of the committees.
Now, 0MQ does appear to provide a living, successful protocol abstraction layer with its "we'll carry multi-part messages over random transports" way of working. Since 0MQ deals silently with framing, connections, and routing, it's surprisingly easy to write full protocol specs on top of 0MQ, and in Chapters four and five I showed how to do this.
Somewhere around mid-2007, I kicked-off the Digital Standards Organization to define new simpler ways of producing little standards, protocols, specifications. In my defense, it was a quiet summer. At the time [ I wrote that] a new specification should take "//minutes to explain, hours to design, days to write, weeks to prove, months to become mature, and years to replace.//"
In 2010 we started calling such little specifications "unprotocols", which some people might mistake for a dastardly plan for world domination by a shadowy international organization, but which really just means, "protocols without the goats".
++++ How to Write Unprotocols
Here's an unprotocol called NOM that we'll come back to later in this chapter:
nom-protocol = open-peering *use-peering
open-peering = C:OHAI ( S:OHAI-OK / S:WTF )
use-peering = C:ICANHAZ
I've actually used these keywords (OHAI, WTF) in commercial projects. They make developers giggly and happy. They confuse management. They're good in first drafts that you want to throw away later.
When you start to write unprotocols, stick to a consistent structure so that your readers know what to expect. Here is the structure I use:
* Cover section: with a 1-line summary, URL to the spec, formal name, version, who to blame.
* License for the text: absolutely needed for public specifications.
* The change process: i.e. how I as a reader fix problems in the specification?
* Use of language: MUST, MAY, SHOULD, etc. with a reference to RFC 2119.
* Maturity indicator: is this a experimental, draft, stable, legacy, retired?
* Goals of the protocol: what problems is it trying to solve?
* Formal grammar: prevents arguments due to different interpretation of the text.
* Technical explanation: semantics of each message, error handling, etc.
* Security discussion: explicitly, how secure the protocol is.
* References: to other documents, protocols, etc.
Writing clear, expressive text is hard. Do avoid trying to describe implementations of the protocol. Remember that you're writing a contract. You describe in clear language the obligations and expectations of each party, the level of obligation, and the penalties for breaking the rules. You do not try to define //how// each party honors its part of the deal.
If you need reference material to start with, read the site, which has a bunch of unprotocols that you can copy/paste from.
Here are some key points about unprotocols:
* As long as your process is open then you don't need a committee: just make clean minimal designs and make sure anyone is free to improve them.
* If use an existing license then you don't have legal worries afterwards. I use GPLv3 for my public specifications and advise you to do the same. For in-house work, standard copyright is perfect.
* The formality is valuable. That is, learn to write [ ABNF] and use this to fully document your messages.
* Use a market-driven life-cycle process like [ Digistan's COSS] so that people place the right weight on your specs as they mature (or don't).
++++ Why use the GPLv3 for Public Specifications?
The license you choose is particularly crucial for public specifications. Traditionally, protocols are published under custom licenses, where the authors own the text and derived works are forbidden. This sounds great (after all, who wants to see a protocol forked?) but it's in fact highly risky. A protocol committee is vulnerable to capture, and if the protocol is important and valuable, the incentive for capture grows.
Once captured, like some wild animals, an important protocol will often die. The real problem is there's no way to //free// a captive protocol published under a conventional license. The word "free" isn't just an adjective to describe speech or air, it's also a verb, and the right to fork a work, //against the wishes of the owner//, is essential to avoiding capture.
Let me explain this in shorter words. Imagine iMatix writes a protocol today, that's really amazing and popular. We publish the spec and many people implement it. Those implementations are fast and awesome, and free as in beer. And they start to threaten an existing business. Their expensive commercial product is slower and can't compete. So one day they come to our iMatix office in Maetang-Dong, South Korea, and offer to buy our firm. Since we're spending vast amounts on sushi and beer and GFEs, we accept gratefully. With evil laughter the new owners of the protocol stop improving the public version, and close the specification and add patented extensions. Their new products support this, and they take over the whole market.
When you contribute to an open source project, you really want to know your hard work won't used against you by a closed-source competitor. Which is why the GPL beats the "more permissive" BSD/MIT/X11 licenses. These license give permission to cheat. This applies just as much to protocols as to source code.
When you implement a GPLv3 specification, your applications are of course yours, and licensed any way you like. But you can be sure and certain of two things. One, that specification will //ever// be embraced and extended into proprietary forms. Any derived forms of the specification must also be GPLv3. Two, no-one who ever implements or uses the protocol will ever launch a patent attack on anything it covers.
+++ Serializing your Data
When we start to design a protocol, one of the first questions we face is how we encode data on the wire. There is, sadly, no universal answer. There are a half-dozen different ways to serialize data, each with pros and cons. We'll explore these.
However, there is a general lesson I've learned over a couple of decades of writing protocols small and large. I call this the "Cheap and Nasty" pattern: you can often split your work into two layers, and solve these separately, one using a "cheap" approach, the other using a "nasty" approach.
++++ Cheap and Nasty
The key insight to making Cheap and Nasty work is to realize that many protocols mix a low-volume chatty part for control, and a high-volume asynchronous part for data. For instance, HTTP has a chatty dialog to authenticate and get pages, and an asynchronous dialog to stream data. FTP actually splits this over two ports; one port for control and one port for data.
Protocol designers who don't separate control from data tend to make awful protocols, because the trade-offs in the two cases are almost totally opposite. What is perfect for control is terrible for data, and what's ideal for data just doesn't work for control. It's especially true when we want high-performance at the same time as extensibility and good error checking.
Let's break this down using a classic client-server use-case. The client connects to the server, and authenticates. It then asks for some resource. The server chats back, then starts to send data back to the client. Eventually the client disconnects or the server finishes, and the conversation is over.
Now, before starting to design these messages, stop and think, and let's compare the control dialog, and the data flow:
* The control dialog lasts a short time and involve very few messages. The data flow could last for hours or days, and involve billions of messages.
* The control dialog is where all the "normal" errors happen, e.g. not authenticated, not found, payment required, censored, etc. Any errors that happen during the data flow are exceptional (disk full, server crashed).
* The control dialog is where things will change over time, as we add more options, parameters, and so on. The data flow should barely change over time since the semantics of a resource are fairly constant over time.
* The control dialog is essentially a synchronous request/reply dialog. The data flow is essentially a 1-way asynchronous flow.
These differences are critical. When we talk about performance, it applies //only// to data flows. It's pathological to design a one-time control dialog to be fast. When we talk about the cost of serialization, thus, this only applies to the data flow. The cost of encoding/decoding the control flow could be huge, and for many cases it would not change a thing. So, we encode control using "Cheap", and we encode data flows using "Nasty".
Cheap is essentially synchronous, verbose, descriptive, and flexible. A Cheap message is full of rich information that can change for each application. Your goal as designer is to make this information easy to encode and to parse, trivial to extend for experimentation or growth, and highly robust against change both forwards and backwards. The Cheap part of a protocol looks like this:
* It uses a simple self-describing structured encoding for data, be it XML, JSON, HTTP-style headers, or some other. Any encoding is fine so long as there are standard simple parsers for it in your target languages.
* It uses a straight request-reply model where each request has a success/failure reply. This makes it trivial to write correct clients and servers for a Cheap dialog.
* It doesn't try, even marginally, to be fast. Performance doesn't matter when you do something once or a few times per session.
A Cheap parser is something you take off the shelf, and throw data at. It shouldn't crash, shouldn't leak memory, should be highly tolerant, and should be relatively simple to work with. That's it.
Nasty however is essentially asynchronous, terse, silent, and inflexible. A Nasty message carries minimal information that practically never changes. Your goal as designer is to make this information ultrafast to parse, and possibly even impossible to extend and experiment with. The ideal Nasty pattern looks like this:
* It uses a hand-optimized binary layout for data, where every bit is precisely crafted.
* It uses a pure asynchronous model where one or both peers send data without acknowledgments (or if they do, they use sneaky asynchronous techniques like credit-based flow control).
* It doesn't try, even marginally, to be friendly. Performance is all that matters when you are doing something several million times per second.
A Nasty parser is something you write by hand, which writes or reads bits, bytes, words, and integers individually and precisely. It rejects anything it doesn't like, does no memory allocations at all, and never crashes.
Cheap and Nasty isn't a universal pattern; not all protocols have this dichotomy. Also, how you use Cheap and Nasty will depend. In some cases, it can be two parts of a single protocol. In other cases it can be two protocols, one layered on top of the other.
++++ 0MQ Framing
The simplest and most widely used serialization format for 0MQ applications is 0MQ's own multi-part framing. For example, here is how the [ Majordomo Protocol] defines a request:
Frame 0: Empty frame
Frame 1: "MDPW01" (six bytes, representing MDP/Worker v0.1)
Frame 2: 0x02 (one byte, representing REQUEST)
Frame 3: Client address (envelope stack)
Frame 4: Empty (zero bytes, envelope delimiter)
Frames 5+: Request body (opaque binary)
To read and write this in code is easy. But this is a classic example of a control flow (the whole of MDP is, really, since it's a chatty request-reply protocol). When we came to improve MDP for the second version, we had to change this framing. Excellent, we broke all existing implementations!
Backwards compatibility is hard, but using 0MQ framing for control flows //does not help//. Here's how I should have designed this protocol if I'd followed by own advice (and I'll fix this in the next version). It's split into a Cheap part and a Nasty part, and uses the 0MQ framing to separate these:
Frame 0: "MDP/2.0" for protocol name and version
Frame 1: command header
Frame 2: command body
Where we'd expect the parse the command header in the various intermediaries (client API, broker, and worker API), and pass the command body untouched from application to application.
++++ Serialization Languages
Serialization languages have their fashions. XML used to be big as in popular, then it got big as in over-engineered, and then it fell into the hands of "Enterprise Information Architects" and it's not been seen alive since. Today's XML is the epitome of "somewhere in that mess is small, elegant language trying to escape".
Still XML, was way, way better than its predecessors which included such monsters as the Standard Generalized Markup Language (SGML), which in turn were a cool breeze compared to mind-torturing beasts like EDIFACT. So the history of serialization languages seems to be of gradually emerging sanity, hidden by waves of revolting EIAs doing their best to hold onto their jobs.
JSON popped out of the JavaScript world as a quick-and-dirty "I'd rather resign than use XML here" way to throw data onto the wire and get it back again. JSON is just minimal XML expressed, sneakily, as JavaScript source code.
Here's a simple example of using JSON in a Cheap protocol:
"protocol": {
"name": "MTL",
"version": 1
"virtual-host": "test-env"
The same in XML would be (XML forces us to invent a single top-level entity):
<protocol name = "MTL" version = "1" />
And using plain-old HTTP-style headers:
Protocol: MTL/1.0
Virtual-host: test-env
These are all pretty equivalent so long as you don't go overboard with validating parsers, schemas and such "trust us, this is all for your own good" nonsense. A Cheap serialization language gives you space for experimentation for free ("ignore any elements/attributes/headers that you don't recognize"), and it's simple to write generic parsers that e.g. thunk a command into a hash table, or vice-versa.
However it's not all roses. While modern scripting languages support JSON and XML easily enough, older languages do not. If you use XML or JSON, you create non-trivial dependencies. It's also somewhat of a pain to work with tree-structured data in a language like C.
So you can drive your choice according to the languages you're aiming for. If your universe is a scripting language then go for JSON. If you are aiming to build protocols for wider system use, keep things simple for C developers and stick to HTTP-style headers.
++++ Serialization Libraries
The site says, "//It's like JSON. but fast and small. MessagePack is an efficient binary serialization format. It lets you exchange data among multiple languages like JSON but it's faster and smaller. For example, small integers (like flags or error code) are encoded into a single byte, and typical short strings only require an extra byte in addition to the strings themselves.//"
I'm going to make the perhaps unpopular claim that "fast and small" are features that solve non-problems. The only real problem that serialization libraries solve is, as far as I can tell, the need to document the message contracts and actually serialize data to and from the wire.
Let's start with "fast and small". It's based on a two-part argument. First, that making your messages smaller, and that reducing CPU cost for encoding and decoding will make a significant different to your application's performance. Second, that this equally valid across-the-board to all messages.
But most real applications tend to fall into one of two categories. Either the speed of serialization and size of encoding is marginal compared to other costs, such as database access or application code performance. Or, network performance really is critical, and then all significant costs occur in a few specific message types.
Thus, aiming for "fast and small" across the board is a false optimization. You neither get the easy flexibility of Cheap for your infrequent control flows, nor do you get the brutal efficiency of Nasty for your high-volume data flows. Worse, the assumption that all messages are equal in some way can corrupt your protocol design. Cheap and Nasty isn't only about serialization strategies, it's also about synchronous vs. asynchronous, error handling, and the cost of change.
My experience is that most performance problems in message-based applications can be solved by (a) improving the application itself and (b) hand-optimizing the high-volume data flows. And to hand-optimize your most critical data flows, you need to cheat, know and exploit facts about your data, which is something general-purpose serializers cannot do.
Now to documentation: the need to write our contracts explicitly and formally, not in code. This is a valid problem to solve, indeed one of the main ones if we're to build a long-lasting large-scale message-based architecture.
Here is how we describe a typical message using the MessagePack IDL:
message Person {
1: string surname
2: string firstname
3: optional string email
Now, the same message using the protobufs IDL:
message Person {
required string surname = 1;
required string firstname = 2;
optional string email = 3;
It works but in most practical cases, wins you little over a serialization language backed by decent specifications written by hand or produced mechanically (we'll come to this). The price you'll pay is an extra dependency, and quite probably, worse overall performance than if you used Cheap and Nasty.
++++ Hand-written Binary Serialization
As you'll gather from this book, my preferred language for systems programming is C (upgraded to C99, with a constructor/destructor API model and generic containers). There are two reasons I like this modernized C language: firstly, I'm too weak-minded to learn a big language like C++. Life just seems filled with more interesting things to understand. Secondly, I find that this specific level of manual control lets me produce better results, and faster.
The point here isn't C vs. C++ but the value of manual control for high-end professional users. It's no accident that the best cars and cameras and espresso machines in the world have manual controls. That level of on-the-spot fine-tuning often makes the difference between world-class success, and second-best.
When you are really, truly, concerned about the speed of serialization and/or the size of the result (often these contradict each other), you need hand-written binary serialization, in other words, let's hear it for Mr. Nasty!
Your basic process for writing an efficient Nasty encoder/decoder (codec) is:
* Build representative data sets and test applications that can stress-test your codec.
* Write a first dumb version of the codec.
* Test, measure, improve, and repeat until you run out of time and/or money.
Here are some of the techniques we use to make our codecs better:
* //Use a profiler.// There's simply no way to know what your code is doing until you've profiled it, for function counts and for CPU cost per function. Once you find your hot-spots, fix them.
* //Eliminate memory allocations.// On a modern Linux kernel the heap is very fast, but it's still the bottleneck in most naive codecs. On older kernels the heap can be tragically slow. Use local variables (the stack) instead of the heap where you can.
* //Test on different platforms and with different compilers and compiler options.// Apart from the heap, there are many other differences. You need to learn the main ones, and allow for these.
* //Use state to compress better.// If you are concerned about codec performance, you are almost definitely sending the same kinds of data many times. There will be redundancy between instances of data. You can detect these, and use that to compress (e.g. a short value that means "same as last time").
* //Know your data.// The best compression techniques (in terms of CPU cost for compactness) require knowing about the data. For example the techniques to compress a word list, a video, and a stream of stock market data are all different.
* //Be ready to break the rules.// Do you really need to encode integers in big-endian network byte order? x86 and ARM account for almost all modern CPUs, yet use little-endian (ARM is actually bi-endian but Android, like Windows and iOS, is little-endian).
++++ Code Generation
Reading the previous two sections, you might have wondered, "could I write my own IDL generator that was better than a general-purpose one?" If this thought wandered into your mind, it probably left pretty soon after, chased by dark calculations about how much work that actually involved.
What if I told you of a way to build custom IDL generators cheaply and quickly? A way to get perfectly documented contracts, code that is as evil and domain-specific as you need, and all you need to do is sign away your soul (//who ever really used that, amirite?//) right here...
At iMatix, until a few years ago, we used code generation to build ever larger and more ambitious systems until we decided the technology (GSL) was too dangerous for common use, and we sealed the archive and locked it, with heavy chains, in a deep dungeon. Well, we actually posted it on github. If you want to try the examples that are coming up, grab [ the repository] and build yourself a {{gsl}} command. Typing "make" in the src subdirectory should do it (and if you're that guy who loves Windows, I'm sure you'll send a patch with project files).
This section isn't really about GSL at all, but about a useful and little-known trick that's useful for ambitious architects who want to scale themselves, as well as their work. Once you learn the trick is, you can whip up your own code generators in a short time. The code generators most software engineers know about come with a single hard-coded model. For instance, Ragel //"compiles executable finite state machines from regular languages"//, i.e. Ragel's model is a regular language. This certainly works for a good set of problems but it's far from universal. How do you describe an API in Ragel? Or a project makefile? Or even a finite-state machine like the one we used to design the Binary Star pattern in Chapter 4?
All these would benefit from code generation, but there's no universal model. So the trick is to design your own models as you need them, then make code generators as cheap compilers for that model. You need some experience in how to make good models, and you need a technology that makes it cheap to build custom code generators. Scripting languages like Perl and Python are a good option. However we actually built GSL specifically for this, and that's what I prefer.
Let's take a simple example that ties into what we already know. We'll see more extensive examples later, because I really do believe that code generation is crucial knowledge for large-scale work. In Chapter 4, we developed the [ Majordomo Protocol (MDP)], and wrote clients, brokers, and workers for that. Now could we generate those pieces mechanically, by building our own interface description language and code generators?
When we write a GSL model, we can use //any// semantics we like, in other words we can invent domain-specific languages on the spot. I'll invent a couple - see if you can guess what they represent:
name = Cookery level 3
title = French Cuisine
item = Overview
item = The historical cuisine
item = The nouvelle cuisine
item = Why the French live longer
title = Overview
item = Soups and salads
item = Le plat principal
item = Béchamel and other sauces
item = Pastries, cakes, and quiches
item = Soufflé - cheese to strawberry
How about this one:
name = person
name = firstname
type = string
name = lastname
type = string
name = rating
type = integer
The first we could compile into a presentation. The second, into SQL to create and work with a database table. So for this exercise our domain language, our model, consists of "classes" that contain "messages" that contain "fields" of various types. It's deliberately familiar. Here is the MDP client protocol:
<class name = "mdp_client">
<field name = "empty" type = "string" value = ""
>Empty frame</field>
<field name = "protocol" type = "string" value = "MDPC01"
>Protocol identifier</field>
<message name = "request">
Client request to broker
<field name = "service" type = "string">Service name</field>
<field name = "body" type = "frame">Request body</field>
<message name = "reply">
Response back to client
<field name = "service" type = "string">Service name</field>
<field name = "body" type = "frame">Response body</field>
And here is the MDP worker protocol:
<class name = "mdp_worker">
<field name = "empty" type = "string" value = ""
>Empty frame</field>
<field name = "protocol" type = "string" value = "MDPW01"
>Protocol identifier</field>
<field name = "id" type = "octet">Message identifier</field>
<message name = "ready" id = "1">
Worker tells broker it is ready
<field name = "service" type = "string">Service name</field>
<message name = "request" id = "2">
Client request to broker
<field name = "client" type = "frame">Client address</field>
<field name = "body" type = "frame">Request body</field>
<message name = "reply" id = "3">
Worker returns reply to broker
<field name = "client" type = "frame">Client address</field>
<field name = "body" type = "frame">Request body</field>
<message name = "hearbeat" id = "4">
Either peer tells the other it's still alive
<message name = "disconnect" id = "5">
Either peer tells other the party is over
GSL uses XML as its modeling language. XML has a poor reputation, having been dragged through too many enterprise sewers to smell sweet, but it has some strong positives, as long as you keep it simple. Any way to write a self-describing hierarchy of items and attributes would work.
Now here is a short IDL generator written in GSL that turns our protocol models into documentation:
.# Trivial IDL generator (specs.gsl)
.output "$("
## The $(string.trim (class.?''):left) Protocol
.for message
. frames = count (class->header.field) + count (field)
A $(message.NAME) command consists of a multi-part message of $(frames)
. for class->header.field
. if name = "id"
* Frame $(item ()): 0x$( (1 byte, $(message.NAME))
. else
* Frame $(item ()): "$(value:)" ($(string.length ("$(value)")) \
bytes, $(field.:))
. endif
. endfor
. index = count (class->header.field) + 1
. for field
* Frame $(index): $(field.?'') \
. if type = "string"
(printable string)
. elsif type = "frame"
(opaque binary)
. index += 1
. else
. echo "E: unknown field type: $(type)"
. endif
. index += 1
. endfor
The XML models and this script are in the subdirectory examples/Chapter6. To do the code generation I give this command:
gsl -script:specs mdp_client.xml mdp_worker.xml
Here is the Markdown text we get for the worker protocol:
## The MDP/Worker Protocol
A READY command consists of a multi-part message of 4
* Frame 1: "" (0 bytes, Empty frame)
* Frame 2: "MDPW01" (6 bytes, Protocol identifier)
* Frame 3: 0x01 (1 byte, READY)
* Frame 4: Service name (printable string)
A REQUEST command consists of a multi-part message of 5
* Frame 1: "" (0 bytes, Empty frame)
* Frame 2: "MDPW01" (6 bytes, Protocol identifier)
* Frame 3: 0x02 (1 byte, REQUEST)
* Frame 4: Client address (opaque binary)
* Frame 6: Request body (opaque binary)
A REPLY command consists of a multi-part message of 5
* Frame 1: "" (0 bytes, Empty frame)
* Frame 2: "MDPW01" (6 bytes, Protocol identifier)
* Frame 3: 0x03 (1 byte, REPLY)
* Frame 4: Client address (opaque binary)
* Frame 6: Request body (opaque binary)
A HEARBEAT command consists of a multi-part message of 3
* Frame 1: "" (0 bytes, Empty frame)
* Frame 2: "MDPW01" (6 bytes, Protocol identifier)
* Frame 3: 0x04 (1 byte, HEARBEAT)
A DISCONNECT command consists of a multi-part message of 3
* Frame 1: "" (0 bytes, Empty frame)
* Frame 2: "MDPW01" (6 bytes, Protocol identifier)
* Frame 3: 0x05 (1 byte, DISCONNECT)
Which as you can see is close to what I wrote by hand in the original spec. Now, if you have cloned the Guide repository and you are looking at the code in examples/Chapter6, you can generate the MDP client and worker codecs. We pass the same two models to a different code generator:
gsl -script:codec_c mdp_client.xml mdp_worker.xml
Which gives us mdp_client and mdp_worker classes. Actually MDP is so simple that it's barely worth the effort of writing the code generator. The profit comes when we want to change the protocol (which we did for the standalone Majordomo project). You modify the protocol, run the command, and out pops more perfect code.
The {{codec_c.gsl}} code generator is not short, but the resulting codecs are much better than the hand-written code I originally put together for Majordomo. For instance the hand-written code had no error checking, and would die if you passed it bogus messages.
I'm now going to explain the pros and cons of GSL-powered model-oriented code generation. Power does not come for free and one of the greatest traps in our business is the ability to invent concepts out of thin air. GSL makes this particularly easy, so can be a particularly dangerous tool.
//Do not invent concepts//. The job of a designer is to remove problems, not to add features.
So, first, the advantages of model-oriented code generation:
* You can create 'perfect' abstractions that map to your real world. So, our protocol model maps 100% to the 'real world' of Majordomo. This would be impossible without the freedom to tune and change the model in any way.
* You can develop these perfect models quickly and cheaply.
* You can generate //any// text output. From a single model you can create documentation, code in any language, test tools, literally any output you can think of.
* You can generate (and I mean this literally) //perfect// output since it's cheap to improve your code generators to any level you want.
* You get a single source that combines specifications and semantics.
* You can leverage a small team to a massive size. At iMatix we produced the million-line OpenAMQ messaging product out of perhaps 85K lines of input models, including the code generation scripts themselves.
Now the disadvantages:
* You add tool dependencies to your project.
* You may get carried away and create models for the pure joy of creating them.
* You may alienate newcomers to your work, who will see "strange stuff".
* You may give people a strong excuse to not invest in your project.
Cynically, model-oriented abuse works great in environments where you want to produce huge amounts of perfect code that you can maintain with little effort, and which //no-one can ever take away from you.// Personally, I like to cross my rivers and move on. But if long-term job security is your thing, this is almost perfect.
So if you do use GSL and want to create open communities around your work, here is my advice:
* Use only where you would otherwise be writing tiresome code by hand.
* Design natural models that are what people would expect to see.
* Write the code by hand first so you know what to generate.
* Do not overuse. Keep it simple! //Do not get too meta!!//
* Introduce gradually into a project.
* Put the generated code into your repositories.
We're already using GSL in some projects around 0MQ, for example the high-level C binding, CZMQ, uses GSL to generate the socket options class (zsockopt). A 300-line code generator turns 78 lines of XML model into 1,500 lines of perfect but really boring code. That's a good win.
+++ Transferring Files
Let's take a break from the lecturing and get back to our first love and the reason for doing all of this: code.
"How do I send a file?" is a common question on the 0MQ mailing lists. Not surprising, because file transfer is perhaps the oldest and most obvious type of messaging. Sending files around networks has lots of use-cases apart from annoying the copyright cartels. 0MQ is very good, out of the box, at sending events and tasks but less good at sending files.
I've promised, for a year or two, to write a proper explanation. Here's a gratuitous piece of information to brighten your morning: the word "proper" comes from the archaic French "propre" which means "clean". The dark age English common folk, not being familiar with hot water and soap, changed the word to mean "foreign" or "upper-class", as in "that's proper food!" but later the word meant just "real", as in "that's a proper mess you've gotten us into!"
So, file transfer. There are several reasons you can't just pick up a random file, blindfold it, and shove it whole into a message. The most obvious being that despite decades of determined growth in RAM sizes (and who among us old-timers doesn't fondly remember saving up for that 1,014-byte memory extension card?!), disk sizes obstinately remain much larger. Even if we could send a file with one instruction (say, using a system call like sendfile), we'd hit the reality that networks are not infinitely fast, nor perfectly reliable. After trying to upload a large file several times on a slow flaky network (WiFi, anyone?), you'll realize that a proper file transfer protocol needs a way to recover from failures. That is, a way to send only the part of a file that wasn't yet received.
Finally, after all this, if you build a proper file server, you'll notice that simply sending massive amounts of data to lots of clients creates that situation we like to call, in the technical parlance, "//server went belly-up due to all available heap memory being eaten by a poorly-designed application//". A proper file transfer protocol needs to pay attention to memory use.
We'll solve these problems properly, one by one, which should hopefully get us to a good and proper file transfer protocol running over 0MQ. First, let's generate a 1GB test file with random data (real power-of-two-giga-like-Von-Neumman-intended, not the fake silicon ones the memory industry likes to sell):
dd if=/dev/urandom of=testdata bs=1M count=1024
This is large enough to be troublesome when we have lots of clients asking for the same file at once, and on many machines, 1GB is going to be too large to allocate in memory anyhow. As a base reference, let's measure how long it takes to copy this file from disk back to disk. This will tell us how much our file transfer protocol adds on top (including 'network' costs):
$ time cp testdata testdata2
real 0m7.143s
user 0m0.012s
sys 0m1.188s
The 4-figure precision is misleading; expect variations of 25% either way. This is just an "order of magnitude" measurement.
Here's our first cut at the code, where the client asks for the test data and the server just sends it, without stopping for breath, as a series of messages, where each message holds one 'chunk':
[[code type="example" title="File transfer test, model 1" name="fileio1"]]
It's pretty simple but we already run into a problem: if we send too much data to the ROUTER socket, we can easily overflow it. The simple but stupid solution is to put an infinite high-water mark on the socket. It's stupid because we now have no protection against exhausting the server's memory. Yet without an infinite HWM we risk losing chunks of large files.
Try this: set the HWM to 1,000 (in 0MQ/3.x this is the default) and then reduce the chunk size to 100K so we send 10K chunks in one go. Run the test, and you'll see it never finishes. As the zmq_socket[3] man page says with cheerful brutality, for the ROUTER socket: "//ZMQ_HWM option action: Drop//".
We have to control the amount of data the server sends up-front. There's no point in it sending more than the network can handle. Let's try sending one chunk at a time. In this version of the protocol, the client will explicitly say,"give me chunk N", and the server will fetch that specific chunk from disk and send it.
Here's the improved second model, where the client asks for one chunk at a time, and the server only sends one chunk for each request it gets from the client:
[[code type="example" title="File transfer test, model 2" name="fileio2"]]
It is much slower now, because of the to-and-fro chatting between client and server. We pay about 300 microseconds for each request-reply round-trips, on a local loop connection (client and server on the same box). It doesn't sound like much but it adds up quickly:
$ time ./fileio1
4296 chunks received, 1073741824 bytes
real 0m0.669s
user 0m0.056s
sys 0m1.048s
$ time ./fileio2
4295 chunks received, 1073741824 bytes
real 0m2.389s
user 0m0.312s
sys 0m2.136s
There are two valuable lessons here. First, while request-reply is easy, it's also too slow for high-volume data flows. Paying that 300 microseconds once would be fine. Paying it for every single chunk isn't acceptable, particularly on real networks with latencies of perhaps 1,000 times higher.
The second point is something I've said before but will repeat: it's incredibly easy to experiment, measure, and improve our protocols over 0MQ. And when the cost of something comes way down, you can afford a lot more of it. Do learn to develop and prove your protocols in isolation: I've seen teams waste time trying to improve poorly-designed protocols that are too deeply embedded in applications to be easily testable or fixable.
Our model 2 file transfer protocol isn't so bad, apart from performance:
* It completely eliminates any risk of memory exhaustion. To prove that we set the high-water mark to 1 in both sender and receiver.
* It lets the client choose the chunk size, which is useful because if there's any tuning of the chunk size to be done, for network conditions, for file types, or to reduce memory consumption further, it's the client that should be doing this.
* It gives us fully restartable file transfers.
* It allows the client to cancel the file transfer at any point in time.
If we just didn't have to do a request for each chunk, it'd be a usable protocol. What we need is a way for the server to send multiple chunks, without waiting for the client to request or acknowledge each one. What are the options?
* The server could send 10 chunks at once, then wait for a single acknowledgment. That's exactly like multiplying the chunk size by 10, so pointless. And yes, it's just as pointless for all values of 10.
* The server could send chunks without any chatter from the client but with a slight delay between each send, so that it would send chunks only as fast as the network could handle them. This would require the server to know what's happening at the network layer, which sounds like hard work. It also breaks layering horribly. And what happens if the network is really fast but the client itself is slow? Where are chunks queued then?
* The server could try to spy on the sending queue, i.e. see how full it is, and send only when the queue isn't full. Well, 0MQ doesn't allow that because it doesn't work, for the same reason as throttling doesn't work. The server and network may be more than fast enough, but the client may be a slow little device.
* We could modify libzmq to take some other action on reaching HWM. Perhaps it could block? That would mean that a single slow client would block the whole server, so no thank you. Maybe it could return an error to the caller? Then the server could do something smart like... well, there isn't really anything it could do that's any better than dropping the message.
Apart from being complex and variously unpleasant, none of these options would even work. What we need is a way for the client to tell the server, asynchronously and in the background, that it's ready for more. Some kind of asynchronous flow control. If we do this right, data should flow without interruption from the server to the client, but only as long as the client is reading it. Let's review our three protocols. This was the first one:
C: fetch
S: chunk 1
S: chunk 2
S: chunk 3
And the second introduced a request for each chunk:
C: fetch chunk 1
S: send chunk 1
C: fetch chunk 2
S: send chunk 2
C: fetch chunk 3
S: send chunk 3
C: fetch chunk 4
Now - waves hands mysteriously - here's a changed protocol that fixes the performance problem:
C: fetch chunk 1
C: fetch chunk 2
C: fetch chunk 3
S: send chunk 1
C: fetch chunk 4
S: send chunk 2
S: send chunk 3
It looks suspiciously similar. In fact it's identical except that we send multiple requests without waiting for a reply for each one. This is a technique called "pipelining" and it works because our DEALER and ROUTER sockets are fully asynchronous.
Here's the third model of our file transfer test-bench, with pipelining. The client sends a number of requests ahead (the "credit") and then each time it processes an incoming chunk, it sends one more credit. The server will never send more chunks than the client has asked for:
[[code type="example" title="File transfer test, model 3" name="fileio3"]]
What we've achieved here, with a little magic, is to take control of the end-to-end pipeline including all network buffers and 0MQ queues at sender and receiver, and then ensure that pipeline is always filled with data while never growing beyond a predefined limit. More than that, the client decides exactly when to send "credit" to the sender. It could be when it receives a chunk, or when it has fully processed a chunk. And this happens asynchronously, with no significant performance cost.
In the third model I chose a pipeline size of 10 messages (each message is a chunk). This will cost a maximum of 2.5MB memory per client. So with 1GB of memory we can handle at least 400 clients. We can try to calculate the ideal pipeline size. It takes about 0.7 seconds to send the 1GB file, which is about 160 microseconds for a chunk. A round trip is 300 microseconds, so the pipeline needs to be at least 3-5 to keep the server busy. In practice, I still got performance spikes with a pipeline of 5, probably because the credit messages sometimes get delayed by outgoing data. So at 10, it works consistently.
$ time ./fileio3
4291 chunks received, 1072741824 bytes
real 0m0.777s
user 0m0.096s
sys 0m1.120s
Do measure rigorously. Your calculations may be good but the real world tends to have its own opinions.
What we've made is clearly not yet a real file transfer protocol, but it proves the pattern and I think it is the simplest plausible design. For a real working protocol we'd want to add some or all of:
* Authentication and access controls, even without encryption: the point isn't to protect sensitive data but to catch errors like sending test data to production servers.
* A Cheap-style request including file path, optional compression, and other stuff we've learned is useful from HTTP (such as If-Modified-Since).
* A Cheap-style response, at least for the first chunk, that provides meta data such as file size (so the client can pre-allocate and avoid unpleasant disk-full situations).
* The ability to fetch a set of files in one go, otherwise the protocol becomes inefficient for large sets of small files.
* Confirmation from the client when it's fully received a file, to recover from chunks that might be lost of the client disconnects unexpectedly.
So far, our semantic has been "fetch"; that is, the recipient knows (somehow), that they need a specific file, so they ask for it. The knowledge of which files exist, and where they are is then passed out-of-band (e.g. in HTTP, by links in the HTML page).
How about a "push" semantic? There are two plausible use-cases for this. First, if we adopt a centralized architecture with files on a main "server" (not something I'm advocating, but people do sometimes like this), then it's very useful to allow clients to upload files to the server. Second, it lets do a kind of pub-sub for files, where the client asks for all new files of some type; as the server gets these, it forwards them to the client.
A fetch semantic is synchronous, while a push semantic is asynchronous. Asynchronous is less chatty, so faster. Also, you can do cute things like "//subscribe to this path//" so creating a publish-subscribe file transfer architecture. That is so obviously awesome that I shouldn't need to explain what problem it solves.
Still, here is the problem with the fetch semantic: that out-of-band route to tell clients what files exist. No matter how you do this, it ends up complex. Either clients have to poll, or you need a separate pub-sub channel to keep clients up to date, or you need user interaction.
Thinking this through a little more, though, we can see that fetch is just a special case of publish-subscribe. So we can get the best of both worlds. Here is the general design:
* Fetch this path
* Here is credit (repeat)
To make this work (and we will, my dear readers), we need to be a little more explicit about how we send credit to the server. The cute trick of treating a pipelined "fetch chunk" request as credit won't fly since the client doesn't know any longer what files actually exist, how large they are, anything. If the client says, "I'm good for 250,000 bytes of data", this should work equally for one file of 250K bytes, or 100 files of 2,500 bytes.
And this gives us "credit-based flow control", which effectively removes the need for HWMs, and any risk of memory overflow.
+++ Heartbeating
Just as a real protocol needs to solve the problem of flow control, it also needs to solve the problem of knowing whether a peer is alive or dead. This is not an issue specific to 0MQ. TCP has a long timeout (30 minutes or so), that means that it can be impossible to know whether a peer has died, been disconnected, or gone on a weekend to Prague with a case of vodka, a redhead, and a large expense account.
Heartbeating is not easy to get right, and as with flow control it can make the difference between a working, and failing architecture. So using our standard approach, let's start with the simplest possible heartbeat design, and develop better and better designs until we have one with no visible faults.
++++ Shrugging It Off
A decent first iteration is to do no heartbeating at all and see what actually happens. Many if not most 0MQ applications do this. 0MQ encourages this by hiding peers in many cases. What problems does this approach cause?
* When we use a ROUTER socket in an application that tracks peers, as peers disconnect and reconnect, the application will leak memory and get slower and slower.
* When we use SUB or DEALER-based data recipients, we can't tell the difference between good silence (there's no data) and bad silence (the other end died). When a recipient knows the other side died, it can for example switch over to a backup route.
* If we use a TCP connection that stays silent for a long while, it will, in some networks, just die. Sending something (technically, a "keep-alive" more than a heartbeat), will keep the network alive.
++++ One-Way Heartbeats
So, our first solution is to sending a "heartbeat" message from each node to its peers, every second or so. When one node hears nothing from another, within some timeout (several seconds, typically), it will treat that peer as dead. Sounds good, right? Sadly no. This works in some cases but has nasty edge cases in other cases.
For PUB-SUB, this does work, and it's the only model you can use. SUB sockets cannot talk back to PUB sockets, but PUB sockets can happily send "I'm alive" messages to their subscribers.
As an optimization, you can send heartbeats only when there is no real data to send. Furthermore, you can send heartbeats progressively slower and slower, if network activity is an issue (e.g. on mobile networks where activity drains the battery). As long as the recipient can detect a failure (sharp stop in activity), that's fine.
Now the typical problems with this design:
* It can be inaccurate when we send large amounts of data, since heartbeats will be delayed behind that data. If heartbeats are delayed, you can get false timeouts and disconnections due to network congestion. Thus, always treat //any// incoming data as a heartbeat, whether or not the sender optimizes out heartbeats.
* While the PUB-SUB pattern will drop messages for disappeared recipients, PUSH and DEALER sockets will queue them. So, if you send heartbeats to a dead peer, and it comes back, it'll get all the heartbeats you sent. Which can be thousands. Whoa, whoa!
* This design assumes that heartbeat timeouts are the same across the whole network. But that won't be accurate. Some peers will want very aggressive heart-beating, to detect faults rapidly. And some will want very relaxed heart-beating, to let sleeping networks lie, and save power.
++++ Ping-Pong Heartbeats
Our third design uses a ping-pong dialog. One peer sends a ping command to the other, which replies with a pong command. Neither command has any payload. Pings and pongs are not correlated. Since the roles of "client" and "server" are often arbitrary, we specify that either peer can in fact send a ping and expect a pong in response. However, since the timeouts depend on network topologies known best to dynamic clients, it is usually the client which pings the server.
This works for all ROUTER-based brokers. The same optimizations we used in the second model make this work even better: treat any incoming data as a pong, and only send a ping when not otherwise sending data.
+++ State Machines
Software engineers tend to treat (finite) state machines as a kind of intermediary interpreter. That is, you take a regular language and compile that into a state machine, then execute the state machine. The state machine itself is rarely visible to the developer: it's an internal representation, optimized, compressed, and bizarre.
However it turns out that state machines are also valuable as a first-class modeling languages for protocol handlers, i.e. 0MQ clients and servers. 0MQ makes it rather easy to design protocols, but we've never defined a good pattern for writing those clients and servers properly.
A protocol has at least two levels:
* How we represent individual messages on the wire.
* How messages flow between peers, and the significance of each message.
We've seen in this chapter how to produce codecs that handle serialization. That's a good start. But if we leave the second job to developers, that gives them a lot of room to interpret. As we make more ambitious protocols (file transfer + heart-beating + credit + authentication), it becomes less and less sane to try to implement clients and servers by hand.
Yes, people do this almost systematically. But the costs are high, and they're avoidable. I'll explain how to model protocols using state machines, and how to generate neat and solid code from those models.
My experience with using state machines as a software construction tool dates to 1985 and my first real job making tools for application developers. In 1991 I turned that knowledge into a free software tool called Libero, which spat out executable state machines from a simple text model.
The thing about Libero's model was that it was readable. That is, you described your program logic as named states, each accepting a set of events, each doing some real work. The resulting state machine hooked into your application code, driving it like a boss.
Libero was charmingly good at its job, fluent in many languages, and modestly popular given the enigmatic nature of state machines. We used Libero in anger in dozens of large distributed applications, one of which was finally switched off in 2011. State-machine driven code construction worked so well that it's somewhat impressive this approach never hit the mainstream of software engineering.
So in this section I'm going to explain Libero's model, and show how to use it to generate 0MQ clients and servers. We'll use GSL again but like I said, the principles are general and you can put together code generators using any scripting language.
As a worked example let's see how to carry-on a stateful dialog with a peer on a ROUTER socket. We'll develop the server using a state machine (and the client by hand). We have a simple protocol that I'll call "NOM". I'm using the oh-so-very-serious [ keywords for unprotocols] proposal:
nom-protocol = open-peering *use-peering
open-peering = C:OHAI ( S:OHAI-OK / S:WTF )
use-peering = C:ICANHAZ
I've not found a quick way to explain the true nature of state machine programming. In my experience, it invariably takes a few days of practice. After three or four days' exposure to the idea there is a near-audible 'click!' as something in the brain connects all the pieces together. We'll make it concrete by looking at the state machine for our NOM server.
A useful thing about state machines is that you can read them state by state. Each state has a unique descriptive name, and one or more //events//, which we list in any order. For each event we perform zero or more //actions//, and we then move to a //next state// (or stay in the same state).
In a 0MQ protocol server, we have a state machine instance //per client//. That sounds complex but it isn't, as we'll see. We describe our first state (Start) as having one valid event, "OHAI". We check the user's credentials and then arrive in the Authenticated state!figref().
[[code type="textdiagram" title="The 'Start' State"]]
| Start |
+-------------+ /-------------------\
| OHAI |------------------->| Authenticated |
+-------------+ \-------------------/
Check Credentials
The Check Credentials action produces either an 'ok' or an 'error' event. It's in the Authenticated state that we handle these two possible events, by sending an appropriate reply back to the client!figref(). If authentication failed, we return to the Start state where the client can try again.
[[code type="textdiagram" title="The 'Authenticated' State"]]
| Authenticated |
+-------------+ /-------------------\
| ok |------------------->| Ready |
+-------------+ \-------------------/
| Send OHAI-OK
+-------------+ /-------------------\
| error |------------------->| Start |
+-------------+ \-------------------/
Send WTF
When authentication has succeeded, we arrive in the Ready state. Here we have three possible events: an ICANHAZ or HUGZ message from the client, or a heartbeat timer event!figref().
[[code type="textdiagram" title="The 'Ready' State"]]
| Ready |
+-------------+ /-------------------\
| ICANHAZ |------------------->| Ready |
+-------------+ \-------------------/
+-------------+ /-------------------\
| HUGZ |------------------->| Ready |
+-------------+ \-------------------/
| Send HUGZ-OK
+-------------+ /-------------------\
| heartbeat |------------------->| Ready |
+-------------+ \-------------------/
There are a few more things about this state machine model that are worth knowing:
* Events in upper case (like "HUGZ") are 'external events' that come from the client as messages.
* Events in lower case (like "heartbeat") are 'internal events', produced by code in the server.
* The "Send SOMETHING" actions are shorthand for sending a specific reply back to the client.
* Events that aren't defined in a particular state are silently ignored.
Now, the original source for these pretty pictures is an XML model:
<class name = "nom_server" script = "server_c">
<state name = "start">
<event name = "OHAI" next = "authenticated">
<action name = "check credentials" />
<state name = "authenticated">
<event name = "ok" next = "ready">
<action name = "send" message ="OHAI-OK" />
<event name = "error" next = "start">
<action name = "send" message = "WTF" />
<state name = "ready">
<event name = "ICANHAZ">
<action name = "send" message = "CHEEZBURGER" />
<event name = "HUGZ">
<action name = "send" message = "HUGZ-OK" />
<event name = "heartbeat">
<action name = "send" message = "HUGZ" />
The code generator is in examples/Chapter6/server_c.gsl. It is a fairly complete tool that I'll use and expand for more serious work later. It generates:
* A server class in C (nom_server.c, nom_server.h) that implements the whole protocol flow.
* A selftest method that runs the selftest steps listed in the XML file.
* Documentation in the form of graphics (the pretty pictures).
Here's a simple C main program that starts the generated NOM server:
[[code language="C"]]
#include "czmq.h"
#include "nom_server.h"
int main (int argc, char *argv [])
printf ("Starting NOM protocol server on port 6000...\n");
nom_server_t *server = nom_server_new ();
nom_server_bind (server, "tcp://*:6000");
nom_server_wait (server);
nom_server_destroy (&server);
return 0;
The generated nom_server class is a fairly classic model. It accepts client messages on a ROUTER socket. The first frame on every request is the client's identity. The server manages a set of clients, each with state. As messages arrive, it feeds these as 'events' to the state machine. Here's the core of the state machine, as a mix of GSL commands and the C code we intend to generate:
[[code language="C"]]
client_execute (client_t *self, int event)
self->next_event = event;
while (self->next_event) {
self->event = self->next_event;
self->next_event = 0;
switch (self->state) {
.for class.state
case $(name:c)_state:
. for event
. if index () > 1
. endif
if (self->event == $(name:c)_event) {
. for action
. if name = "send"
zmsg_addstr (self->reply, "$(message:)");
. else
$(name:c)_action (self);
. endif
. endfor
. if defined (
self->state = $(next:c)_state;
. endif
. endfor
if (zmsg_size (self->reply) > 1) {
zmsg_send (&self->reply, self->router);
self->reply = zmsg_new ();
zmsg_add (self->reply, zframe_dup (self->address));
Each client is held as an object with various properties, including the variables we need to represent a state machine instance:
[[code language="C"]]
event_t next_event; // Next event
state_t state; // Current state
event_t event; // Current event
You will see by now that we are generating technically-perfect code that has the precise design and shape we want. The only clue that the nom_server class isn't hand-written is that the code is //too good//. People who complain that code generators produce poor code are obviously used to poor code generators. It is trivial to extend our model as we need it. For example, here's how we generate the selftest code.
First, we add a "selftest" item to the state machine and write our tests. We're not using any XML grammar or validators so it really is just a matter of opening the editor and adding half-a-dozen lines of text:
<step send = "OHAI" body = "Sleepy" recv = "WTF" />
<step send = "OHAI" body = "Joe" recv = "OHAI-OK" />
<step send = "ICANHAZ" recv = "CHEEZBURGER" />
<step send = "HUGZ" recv = "HUGZ-OK" />
<step recv = "HUGZ" />
Designing on the fly, I decided that "send" and "recv" were a nice way to express "send this request, then expect this reply". Here's the GSL code that turns this model into real code:
.for class->selftest.step
. if defined (send)
msg = zmsg_new ();
zmsg_addstr (msg, "$(send:)");
. if defined (body)
zmsg_addstr (msg, "$(body:)");
. endif
zmsg_send (&msg, dealer);
. endif
. if defined (recv)
msg = zmsg_recv (dealer);
assert (msg);
command = zmsg_popstr (msg);
assert (streq (command, "$(recv:)"));
free (command);
zmsg_destroy (&msg);
. endif
Finally, one of the more tricky but absolutely essential parts of any state machine generator is //how do I plug this into my own code?// As a minimal example for this exercise I wanted to implement the "check credentials" action by accepting all OHAIs from my friend Joe (Hi Joe!) and reject everyone else's OHAIs. After some thought I decided to grab code directly from the state machine model. So in nom_server.xml, you'll see this:
<action name = "check credentials">
char *body = zmsg_popstr (self->request);
if (body && streq (body, "Joe"))
self->next_event = ok_event;
self->next_event = error_event;
free (body);
And the code generator grabs that custom code and inserts it into the generated nom_server.c file:
.for class.action
static void
$(name:c)_action (client_t *self) {
$(string.trim (.):)
And now we have something quite elegant: a single source file that describes my server state machine, and which also contains the native implementations for my actions. A nice mix of high-level and low-level that is about 90% smaller than the C code.
Beware, as your head spins with notions of all the amazing things you could produce with such leverage. While this approach gives you real power, it also moves you away from your peers, and if you go too far, you'll find yourself working alone.
By the way, this simple little state machine design exposes just three variables to our custom code:
* {{self->next_event}}
* {{self->request}}
* {{self->reply}}
In the Libero state machine model there are a few more concepts that we've not used here, but which we will need when we write larger state machines:
* Exceptions, which lets us write terser state machines. When an action raises an exception, further processing on the event stops. The state machine can then define how to handle exception events.
* Defaults state, where we can define default handling for events (especially useful for exception events).
+++ Authentication using SASL
When we designed AMQP in 2007, we chose [ SASL] for the authentication layer, one of the ideas we took from the BEEP protocol framework. SASL looks complex at first, but it's simple and fits very nicely into a 0MQ-based protocol. What I especially like about SASL is that it's scalable. You can start with anonymous access, or plain text authentication and no security, and grow to more secure mechanisms over time, without changing your protocol one bit.
I'm not going to give a deep explanation now, since we'll see SASL in action somewhat later. But I'll explain the principle so you're already somewhat prepared.
In the NOM protocol the client started with an OHAI command, which the server either accepted ("Hi Joe!") or rejected. This is simple but not scalable since server and client have to agree upfront what kind of authentication they're going to do.
What SASL introduced, and which is genius, is a fully abstracted and negotiable security layer that's still easy to implement at the protocol level. It works as follows:
* The client connects.
* The server challenges the client, passing a list of security "mechanisms" that it knows about.
* The client chooses a security mechanism that it knows about, and answers the server's challenge with a blob of opaque data that (and here's the neat trick) some generic security library calculates and gives to the client.
* The server takes the security mechanism the client choose, and that blob of data, and passes it to its own security library.
* The library either accepts the client's answer, or the server challenges again.
There are a number of free SASL libraries. When we come to real code, we'll implement just two mechanisms, ANONYMOUS and PLAIN, which don't need any special libraries.
To support SASL we have to add an optional challenge/response step to our "open-peering" flow. Here is what the resulting protocol grammar looks like (I'm modifying NOM to do this):
secure-nom = open-peering *use-peering
open-peering = C:OHAI *( S:ORLY C:YARLY ) ( S:OHAI-OK / S:WTF )
ORLY = 1*mechanism challenge
mechanism = string
challenge = *OCTET
YARLY = mechanism response
response = *OCTET
Where ORLY and YARLY contain a string (a list of mechanisms in ORLY, one mechanism in YARLY) and a blob of opaque data. Depending on the mechanism, the initial challenge from the server may be empty. We don't care a jot: we just pass this to the security library to deal with.
The SASL [ RFC] goes into detail about other features (that we don't need), the kinds of ways SASL could be attacked, and so on.
Unless you're a security geek, all you should care about is the impact on the protocol, which is as simple as I've explained here.
+++ Large-scale File Publishing
Let's put all these techniques together into a file distribution system that I'll call FileMQ. This is going to be a real product, living on []. What we'll make here is a first version of FileMQ, as a training tool. If the concept works, the real thing may eventually get its own Guide.
++++ Why make FileMQ?
Why make a file distribution system? I already explained how to send large files over 0MQ, and it's really quite simple. But if you want to make messaging accessible to a million times more people than can use 0MQ, you need another kind of API. An API that my five-year old son can understand. An API that is universal, requires no programming, and works with just about every single application.
Yes, I'm talking about the file system. It's the DropBox pattern: chuck your files somewhere, and they get magically copied somewhere else, when the network connects again.
However what I'm aiming for is a fully decentralized architecture that looks more like git, that doesn't need any cloud services (though we could put FileMQ in the cloud), and which does multicast, i.e. can send files to many places at once.
FileMQ has to be secure(able), has to be easily hooked into random scripting languages, and has to be as fast as possible across our domestic and office networks.
I want to use it to back-up photos from my mobile phone to my laptop, over WiFi. To share presentation slides in real-time across fifty laptops in a conference. To share documents with colleagues in a meeting. To send earthquake data from sensors to central clusters. To back-up video from my phone as I take it, during protests or riots. To synchronize configuration files across a cloud of Linux servers.
A visionary idea, isn't it? Well, ideas are cheap. The hard part is making this, and making it simple.
++++ Initial Design Cut - the API
Here's the way I see the first design. FileMQ has to be distributed, so every node can be a server and a client at the same time. But I don't want the protocol to be symmetric, because that seems forced. We have a natural flow of files from point A to point B, where A is the "server" and B is the "client". If files flow back the other way, we have two flows. So, FileMQ is //not// a synchronization protocol, though synchronizing two directories is going to be a common use case.
Thus, I'm going to build FileMQ as two pieces: a client, and a server. Then, I'll put these together in a main application (the "filemq" tool) that can act both as client and server. The two pieces will look quite similar to the nom_server, with the same kind of API:
[[code language="C"]]
fmq_server_t *server = fmq_server_new ();
fmq_server_bind (server, "tcp://*:6000");
fmq_server_publish (server, "/home/ph/filemq/share", "/public");
fmq_server_publish (server, "/home/ph/photos/stream", "/photostream");
fmq_client_t *client = fmq_client_new ();
fmq_client_connect (client, "tcp://");
fmq_client_subscribe (server, "/public/", "/home/ph/filemq/share");
fmq_client_subscribe (server, "/photostream/", "/home/ph/photos/stream");
while (!zctx_interrupted)
sleep (1);
fmq_server_destroy (&server);
fmq_client_destroy (&client);
If we wrap this C API in other languages, we can easily script FileMQ, embed it applications, port it to smartphones, and so on.
++++ Initial Design Cut - the Protocol
To start with we write down the protocol as an ABNF grammar. The grammar starts with the flow of commands between the client and server. You should recognize these as a combination of the various techniques we've seen already:
filemq-protocol = open-peering *use-peering [ close-peering ]
open-peering = C:OHAI *( S:ORLY C:YARLY ) ( S:OHAI-OK / error )
use-peering = C:ICANHAZ ( S:ICANHAZ-OK / error )
close-peering = C:KTHXBAI / S:KTHXBAI
error = S:SRSLY / S:RTFM
Here are the messages to and from the server:
; The client opens peering to the server
OHAI = %x01 protocol version identity
protocol = string ; Must be "FILEMQ"
string = size *VCHAR
size = OCTET
version = %x01
identity = 16OCTET
; The server challenges the client using the SASL model
ORLY = %x02 mechanisms challenge
mechanisms = size 1*mechanism
mechanism = string
challenge = *OCTET ; 0MQ frame
; The client responds with SASL authentication information
YARLY = %x03 mechanism response
response = *OCTET ; 0MQ frame
; The server grants the client access
OHAI-OK = %x04
; The client subscribes to a path
ICANHAZ = %x05 path options
path = string ; Full or partial path
options = dictionary
dictionary = size *key-value
key-value = string ; Formatted as name=value
; The server confirms the subscription
; The client sends credit to the server
NOM = %x07 credit
credit = number
number = 8OCTET ; 64-bit integer, network order
sequence = number
; The server sends a chunk of file data
CHEEZBURGER = %x08 sequence operation filename
offset headers chunk
sequence = number
operation = OCTET
filename = string
offset = number
headers = dictionary
chunk = FRAME
; Client or server sends a heartbeat, the other peer responds
HUGZ = %x09
HUGZ-OK = %x0A
; Client closes the peering
And here are the different ways the server can tell the client things went wrong:
; Server error replies
S:SRSLY = %x80 reason ; Refused due to access rights
S:RTFM = %x81 reason ; Client sent an invalid command
The FILEMQ/1.0 protocol is specified on the [ 0MQ unprotocols website].
++++ Building and Trying FileMQ
The FileMQ stack is [ on github]. It works like a classic C/C++ project:
git clone git://
cd filemq
make check
You want to be using the latest CZMQ master for this. Now try running the {{filemq}} service:
cd src
And open two file navigator windows, one into {{src/fmqroot/send}} and one into {{src/fmqroot/recv}}. Drop files into the send folder and you'll see them arrive in the recv folder. The server checks once per second for new files. Delete files in the send folder, and they're deleted in the recv folder similarly.
This isn't a full replication protocol though the delete function suggests that it might become one.
++++ Internal Architecture
To build FileMQ I used a lot of code generation, possibly too much. However the code generators are all reusable in other stacks. They are an evolution of the set we saw earlier:
* codec_c.gsl - generates a message codec for a given protocol.
* server_c.gsl - generates a server class for a protocol and state machine.
* client_c.gsl - generates a client class for a protocol and state machine.
The best way to learn to use GSL code generation is to translate these into a language of your choice and make your own demo protocols and stacks. You'll find it fairly easy. FileMQ itself doesn't try to support multiple languages. It could but it'd make things needlessly complex.
The FileMQ architecture actually slices into two layers. There's a generic set of classes to handle chunks, directories, files, patches, SASL security, and configuration files. Then, there's the generated stack: messages, client, and server. If I was creating a new project I'd fork the whole FileMQ project, and go and modify the three models:
* fmq_msg.xml - which defines the message formats
* fmq_client.xml - which defines the client state machine, API, and implementation.
* fmq_server.xml - which does the same for the server.
You'd want to rename things, to avoid confusion. Why didn't I make the reusable classes into a separate library? The answer is two-fold. First, no-one actually needs this (yet). Second, it'd make things more complex for you as you build and play with FileMQ. It's never worth adding complexity to solve a theoretical problem.
Although I wrote FileMQ in C, it's easy to map to other languages. It is quite amazing how nice C becomes when you add CZMQ's generic zlist and zhash containers, and class style. Let me go through the classes quickly:
* fmq_sasl: encodes and decodes a SASL challenge. I only implemented the PLAIN mechanism, which is enough to prove the concept.
* fmq_chunk: works with variable sized blobs. Not as efficient as 0MQ's messages but they do less weirdness and so are easier to understand. The chunk class has methods to read and write chunks from disk.
* fmq_file: works with files, which may or may not exist on disk. Gives you information about a file (like size), lets you read and write to files, remove files, check if a file exists, and check if a file is "stable" (more on that later).
* fmq_dir: works with directories, reading them from disk and comparing two directories to see what changed. When there are changes, returns a list of "patches".
* fmq_patch: works with one patch, which really just says "create this file" or "delete this file" (referring to a fmq_file item each time).
* fmq_config: works with configuration data. I'll come back to client and server configuration later.
Every class has a test method, and the main development cycle is "edit, test". These are mostly simple self tests but they make the difference between code I can trust. and code I know will still break. It's a safe bet that any code that isn't covered by a test case will have undiscovered errors. I'm not a fan of external test harnesses. But internal test code that you write as you write your functionality... that's like the handle on a knife.
You should, really, be able to read the source code and rapidly understand what these classes are doing. If you can't read the code happily, tell me. If you want to port the FileMQ implementation into other languages, start by forking the whole repository and later we'll see if it's possible to do this in one overall repo.
++++ Public API
The public API consists of two classes (as we sketched earlier):
* fmq_client: provides the client API, with methods to connect to a server, configure the client, and subscribe to paths.
* fmq_server: provides the server API, with methods to bind to a port, configure the server, and publish a path.
If I was a keen young developer eager to use FileMQ in another language, I'd probably spend a happy weekend writing a binding for this public API, then stick it in a subdirectory of the filemq project called, say, "bindings/", and make a pull request.
The actual API methods come from the state machine description, like this (for the server):
<method name = "bind">
<argument name = "endpoint" type = "string" />
zmq_bind (self->router, endpoint);
<method name = "publish">
<argument name = "location" type = "string" />
<argument name = "alias" type = "string" />
mount_t *mount = mount_new (location, alias);
zlist_append (self->mounts, mount);
++++ Design Notes
The hardest part of making FileMQ wasn't the protocol part, but maintaining accurate state internally. An FTP or HTTP server is essentially stateless. But a publish/subscribe server //has// to maintain subscriptions, at least.
So I'll go through some of the design aspects:
* The server and client use virtual paths, much like an HTTP or FTP server. You define the "root" to be, e.g. ./fmqroot/recv, and then file names are relative to that root. Sending physical file names across the network is not a good idea.
* The client detects if the server has died by the lack of heartbeats (HUGZ) coming from the server. It then restarts its dialog by sending an OHAI. There's no timeout on the OHAI since the 0MQ DEALER socket will queue an outgoing message indefinitely.
* The server detects if a client has died by its lack of response (HUGZ-OK) to a heartbeat. In that case it deletes all state for the client including its subscriptions.
* The client API holds subscriptions in memory and replays them when it has connected successfully. This means the called can subscribe at any time (and doesn't care when connections and authentication actually happen).
* The server allows multiple "mount points", i.e. real directories in different places that are presented to clients as a single tree they can subscribe against.
* There are some timing issues: if the server is creating its mount points, while clients are connected and subscribing, the subscriptions won't attach to the right mount points. So, we bind the server port as last thing.
++++ Reliabilty
As it stands, FileMQ implements the classic 0MQ publish-subscribe pattern. That is, clients receive a stream of updates but with no guarantees about overall consistency. To make FileMQ reliable we'd have to add some functionality:
* A way for clients to request all patches since a certain time (possibly, all patches).
* A way for the server to store patches and subscriptions persistently.
++++ Configuration
I've written many servers, like the Xitami web server that was popular in the late 90's, and the OpenAMQ messaging server. Getting configuration easy and obvious was a large part of making these servers fun to use.
We typically aim to solve a number of problems:
* Ship default configuration files with the product.
* Allow users to add custom configuration files that are never overwritten.
* Allow users to configure from the command-line.
And then layer these one on the other, so command-line settings override custom settings, which override default settings. It can be a lot of work to do this right. For FileMQ I've taken a somewhat simpler tack: all configuration is done from the API.
So this is how we start and configure the server, for example:
[[code language="C"]]
server = fmq_server_new ();
fmq_server_configure (server, "server_test.cfg");
fmq_server_publish (server, "./fmqroot/send", "/");
fmq_server_publish (server, "./fmqroot/logs", "/logs");
fmq_server_bind (server, "tcp://*:6000");
We do use a specific format for the config files, which is [ ZPL], a minimalist syntax that we started using for 0MQ "devices" a few years ago, but which works well for any server:
# Configure server for plain access
monitor = 1 # Check mount points
heartbeat = 1 # Heartbeat to clients
location = ./fmqroot/logs
virtual = /logs
echo = I: use guest/guest to login to server
# These are SASL mechanisms we accept
anonymous = 0
plain = 1
login = guest
password = guest
group = guest
login = super
password = secret
group = admin
One cute thing (which seems useful) the generated server code does is to parse this config file (when you use the fmq_server_configure() method) and execute any section that matches an API method. Thus the 'publish' section works as a fmq_server_publish() method.
+++ File Stability
It is quite common to poll a directory for changes and then do something 'interesting' with new files. But as one process is writing to a file, other processes have no idea when the file has been fully written. One solution is to add a second "indicator" file which we create after creating the first file. This is intrusive, however.
There is a neater way, which is to detect when a file is "stable", i.e. no-one is writing to it any longer. FileMQ does this by checking the modification time of the file. If it's more than a second old, then the file is considered stable, at least stable enough to be shipped off to clients. If a process comes along after five minutes and appends to the file, it'll be shipped off again.
For this to work, and this is a requirement for any application hoping to use FileMQ successfully, do not buffer more than a second's worth of data in memory before writing. If you use very large block sizes, the file may look stable when it's not.