No description, website, or topics provided.
Python C Other
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.

#caffeinecrypto guide caffeinecrypto is a C library that provides fast cryptography. It also includes Python bindings. Drew intends the following line to be installation instructions, and to be run from the caffeinecrypto directory.

python build && sudo python install

or drag-n-drop into XCode. Not sure what you're going to "drag-n-drop", so good luck!

Or just use caffeine.

WARNING WARNING WARNING: some compilers are known to produce bad binaries, like the default compiler on Debian Squeeze. You should run the unit tests (tests.c, to verify that your version of caffeinecrypto is correctly built. You are recommended to use clang, as that’s what we use ‘round here.

##What’s the matter with good ol SSL?

Well, SSL is slow. Specifically, an SSL handshake involves four-ish parts (2 RTT), depending on how elaborate the cipher negotiation is. If you are on a broadband connection with a ping of 30ms, this is not a big deal. But if you are on a mobile app, with a ping time of 200ms in a reasonably covered urban area, and you cannot maintain an open SSL connection in the background due to battery life concerns, then this introduces a big delay into your load time.

While I was going to the trouble of fixing that problem, I might as well chuck “features” from SSL that are highly dangerous for API use, and have led to many security vulnerabilities in practice. By forcing clients to actually know the identity of the servers they connect to (which is certainly the case for common API applications), I was able to eliminate the fat from the SSL handshake, going from 6.5k to just 72 bytes. It’s a small thing, but it makes a difference in the mobile world.

Then, I switched the cryptographic algorithms. TLS/SSL go through a complicated negotiation phase to pick algorithms that are mostly pretty old. For example, a lot of SSL traffic (including goes through lowest-common-denominator RC4, which has some very concerning weaknesses. Hashing algorithms like SHA perform poorly on small message sizes, which is a common case for “pings” in modern networked applications. Modern algorithms are often simpler and have a brighter future for efficient implementation in mobile devices. I utilized cutting-edge cryptographic primitives from D. J. Bernstein’s NaCL cryptographic library, and borrowed his code heavily. Where possible, I have tried to verify that the output from my library aligns with his paper, but my design diverges just slightly, so it is not always possible.

Then, most SSL/TLS libraries are designed to sit right atop TLS. In caffeine proper we use ZeroMQ as a wire format, and so that won’t work. Notably, encryption has been discussed endlessly in the ZeroMQ community. For various reasons that you can find on the mailing lists, the best option seems to be doing per-message encryption, but key exchange and performance considerations have made this prohibitive. As I have been thinking very carefully about ZeroMQ-is-wire while working on caffeinecrypto, it may spark some new ideas in that discussion, and is at any rate an early serious attempt at exploring the per-message approach.

Please note, however, that I am not a cryptographer. I just play one on TV. I have done my best with this library, and it is secure enough for my purposes. But this stuff is considerably dangerous, and you should expect vulnerabilities. Please don’t use it to protect the nukes.

##How does this work?

caffeinecrypto provides layers of primitives, which are called boxes.

The outermost box would be caffeine itself, if you are using it (caffeinecrypto is designed to be usable alone). It is worth noting that caffeine does not have much in the way of permissions right now. At the present time, the caffeine server simply checks the identity of each client (from the session layer) against a list of authorized clients. Those clients are then allowed to do just about anything.

Notably, caffeine (or whatever) is responsible for determining a wire format (which is not necessarily TCP).

Then we have the session box. The session performs the handshake which exchanges information needed by the layers below, provides the client’s identity to the server for verification uplayer, and confirms the server’s identity.

Then we have the crypto_streambox. Given a private key and another party’s public key, it generates a shared secret that is utilized downlayer. The mechanism used is Curve25519.

Then we have the crypto_secret_streambox. Given a shared secret, it encrypts messages and computes a MAC so that they cannot be modified. The primitives are Salsa20 and crypto_onetimeauth (poly1305).

Let’s take each of these in turn.

A warning about random numbers.. Always use good random numbers when doing cryptography. If in doubt, use secure_random.c or, which is probably a lot better than whatever you were thinking of doing. Bad random numbers went undetected in Debian for years. Don’t let it happen to you.


You can find unit tests for session inside caffeine proper: in CryptoTests.m. It may be helpful to follow along as you read this explanation.

We first initialize a session. In the case of a server, we simply need a private key, which is any 32 random bytes. In the case of a client, we also need the public key of a server to connect to. If we accidentally provide the public key of a different server, the connection will fail.

The client sends the first packet. The packet has three parts:

  • A magic string (8 bytes) identifying the protocol version
  • The client’s public key (32 bytes)
  • 8 random bytes. The 8 bytes will form part of the session’s nonce.

The server sends the second packet. The packet has two parts:

  • A magic string (8 bytes) identifying the protocol version
  • 16 random bytes. The 16 bytes will form the other part of the session’s nonce.

We’re done. We’re now ready to transmit application data.

###what’s going on with the nonce?

If you encrypt the same thing with the same key, you will get the same result. If I encrypt the command rm -rf * and send it to a server, it might look like pqo3kfj. But it will be the same pqo3kfj each time.

If somebody is snooping on the network traffic, they could just record what a legitimate user does, and “play it back” in the future. This is called a replay attack. So they “play back” pqo3kfj and poof, your files are gone.

Even if you number each message, it just means that somebody has to play the entire session in order to run rm -rf *.

Using a nonce as part of the encryption parameters means that no two sessions are the same. In each session, rm -rf * encrypts to something distinct, even if the keys are the same. And so if somebody tries to dig out old, legitimate packets to “replay” to the server, those packets will be encrypted with a different nonce than the one being used in the new session.

Having each party contribute some bytes to the nonce ensures that it’s not under any one party’s control. Because if it was, that party could pick the nonce from an earlier session, and send old packets.


crypto_streambox has unit tests in the file tests.c. It may be helpful to follow along as you read this description. These unit tests verify some of the outputs from DJB’s paper.

In crypto_streambox, we derive our own public key from our private key, and also derive a shared secret from all of that and the public key of the server we are connecting to. It’s byte-for-byte identical to what DJB does in sections 3-5 of his paper.

The important thing about the private keys is that they don’t go over the wire.

The important thing about the public keys is that you can’t generate a private key given a public key. Just the other way around.

The important thing about the shared secret is

  1. It never goes over the wire, and
  2. The only people who can figure it out are parties to the conversation. That is, you have to have one of the party’s private key to derive it. You can’t derive it from both of the public keys.


crypto_secret_streambox has unit tests in tests.c. It may be helpful to follow along as you read this description.

Here’s where the theory gets a little deep.

We start with the shared secret from the crypto_streambox. We take this key, plus the 24-byte nonce, and through a two-step process derive a key called the secondkey. If you want the nitty gritty details of this, you can see section 8 in DJB’s paper. If you really want the details, and why there’s this “hsalsa20” and “xsalsa20” thing that aren’t quite salsa20 and why there are two steps, you can go read this other paper. The executive summary is that it has to do with cramming a 32-byte key plus a 24-byte nonce into a 32-byte secondkey in a way that is OK.

down the rabbit hole: encryption

Now that we’ve got the secondkey which is a per-session key, we are ready to do some encryption. So we generate a Salsa20 bytestream from that second key, which is (in theory) just as good as a one-time pad. A nearly infinite set of bytes that you can only know if you have the second key. Which, as we have discussed, you can only know if you know the shared secret (see crypto_streambox), which you can only know if you are one of the parties to the conversation (see session).

Then we take whatever the message, and we xor it with bytes from the stream. And then on the other side, they are un-xored.

One big way that we diverge from NaCL is that between each message, they increment the nonce. And then obviously they have to generate a new secondkey, and then send the next message. I was concerned that this is slow for small messages, which is a common case. (Although now I am not so sure that my way is faster when you factor in MAC computation.) So what we do is we leave the nonce the same, leave the secondkey the same, but just keep reading ahead in the Salsa20 stream. Due to the nature of Salsa20, it’s efficient to get stream in 64-byte chunks, so for a one-byte message we might “use up” 64-bytes of Salsa20, but that is still quicker than making a new secondkey. It’s very important that we don’t “double use” the same position in the stream between messages, because then we are abusing the nature of the “one time pad”. With DJB’s system, the whole stream changes with each message, so you can just keep reusing the first N bytes of it. But we can’t do that because our nonce doesn’t change, so we have to track the state of where we are in the stream and always increment it, never decrement it.

###down the rabbit hole: MAC

Unfortunately, just encrypting the data in a way that only the client and server can read is, believe it or not, not enough to actually secure it. This fact is illustrated by the following example:

Suppose that Alice is instructing Bob to add $3 to a customer’s account via the crypto_secret_streambox. Further, suppose that Eve is sitting between Alice and Bob on the network (called a man-in-the-middle attack). She cannot understand the contents of the messages, of course, but she can modify them arbitrarily before they get to Bob.

Now Eve has the source code to Alice and Bob’s software, so she knows that the packet that instructs Bob to add $3 to the account has the number $3 stored at, let’s say, the 20th byte of the message. So all Eve has to do is either know or guess that Alice is sending such a packet, and modify the 20th byte. If Eve knows that Alice has sent the number 0x03, she just xors the number that appears at the 20th byte by 0x03, to recover the original byte in the Salsa20 keystream. And then she re-xors that byte with 0xFF. And then when Bob decrypts it, an instruction to add $255 dollars to the customer’s account!

Obiously this will not do, so we use a message authentication code to ensure that the message isn’t tampered with in transit (or if it is, it is rejected). We encrypt the message, and then Alice prepends a 16-byte code. The code is generated from the encrypted message and a shared key—we consume 32 bytes of salsa20 stream for this purpose. Bob generates the same code, and compares it to the one Alice allegedly created. In this way, any tampering with ciphertext is detected.

Note that, because we consume 32 bytes of the salsa20 stream for the MAC key, it’s important to generate our key consistently either before or after our encrypt/decrypt operation. We choose to generate it before encryption.

This also explains why the ciphertext always exceeds the plaintext by exactly MAC_BYTES, or 16. We need room in the message for the MAC.

The reason we encrypt-then-authenticate, rather than building the MAC based on the plaintext, is that it is often easier for an attacker to construct semantically-equivalent plaintexts (for example, by apending spaces). If you have a lot of plaintexts, there is a higher chance you can figure out how to build a good MAC for one of them. This opens the door to exciting attacks like known plaintext attacks and differential attacks. You might be interested in this paper on them.

We use crypto_onetimeauth for MAC creation.

Some things that we should be protected against, ideally

  • any man-in-the-middle tampering in any packet, any time time, ever, before the handshake or after, should produce an error before the next data packet is decrypted.
  • It should not be possible to replay any packet from an old session, or replay packets within a session
  • It should not be possible to authenticate as a particular client without knowing the client’s private key
  • It should not be possible, given only the source code for a client, to build a server that the client can talk to. The missing ingredient is a private key matching the server’s public key that is hardcoded into the client.
  • It should not be possible to understand the communication between client and server, without having one of their private keys.