Attacks on Anonymity Systems: The Theory


By Len Sassaman and Roger Dingledine

Presented at the Blackhat Conference in 2003

Editor's Note: This is a transcript of a really great talk given in 2003. You can watch the full video of it here. I thought it was a shame that the video had under 100 views and also has pretty poor sound quality, so I decided to create a transcript using a custom script, which I then manually edited for accuracy (as well as adding section headers for convenience). You can also see the original slides from the talk here. Note that I disclaim all rights to any of this content. I just want to bring it to the attention of more people in a convenient form, and I'm confident that the late Len Sassaman would be OK with that (RIP Len). If you see any errors in the transcript, please submit a PR to fix them. — Jeff Emanuel

Introduction

We're focusing in both of the talks on high-latency systems like Mixmaster: things for email, remailers. There are some low-latency systems for web browsing or something, but we're not going to focus on those quite as much. Okay, so for this one, before lunch, we've got some stuff about adversaries and threat models. We're going to give you a walkthrough of the Mixminion design process. I'll tell you what that is in a bit.

We've got some alternate designs besides Mixminion so you can get a feel for the flavor of possibilities. And then we've got some controversial issues and some economics to look at. So here's the motivation slide. I think at a conference like this most people have a pretty good sense of the motivation. The obvious one is individuals.

People need their privacy. DoubleClick is tracking you every day. They're going to have such a huge dossier on you in 20 years. They're going to know all of your interests. You try to get insurance from somebody and they say, do you like car racing?

And you say no. And they say, but six years ago you bought all those books on car racing from Amazon. So we're trying to find a way for people to communicate without leaving trails everywhere. On the other hand, political dissidents in oppressive countries need something so that they cannot be monitored all the time. Governments might want to go look at a terrorist site without revealing that they are the ones going to look at the terrorist site.

And corporations need protection against traffic analysis. You'd think that a VPN or encryption or something would be good enough. But if Microsoft is talking a whole lot with Apple suddenly, then maybe people observing the internet will learn something from that. And if you can deal with the traffic analysis side of it also, then you're in better shape. Okay, so the basic idea for anonymity systems is hide users with users.

A lot of people think that you can get pretty good anonymity by breaking into a machine in Korea, going there, and then communicating from there. But if your adversary is large enough to be able to see the machine in Korea, who's talking to it, who's talking from it, then you're really not getting very much. So the idea here is you've got lots of people doing the same basic thing and they blend with each other. You can't really tell who is the one sending the message or receiving the message or something like that. So the more noise, the more people, the more anonymous something in that noise is.

From an economics perspective, we're doing pretty well because senders consume the anonymity. They make use of the anonymity from the other senders. And at the same time, they're providing cover for the other people. So the key point here is users are critical. And that interestingly means that it might be that users are better off on crowded systems even if those are weaker from a security perspective because users are what it's all about.

And we'll get a lot more into that in part two of the talk. So why do we use more than one node? Why is it not simply you go to the machine and then you go out? Why do you spread out? I'll talk a bit more about how you spread out in a bit.

But the idea is you can't simply use an anonymity system for one entity. If the Navy wants to set up a remailer so that nobody knows that messages coming from the remailer are from the Navy, you got to have other people using it. It cannot simply be 20 Navy people saying, aha, you don't know which Navy person I am because that's not what the Navy is trying to do. They're trying to make you not know that it's coming from the Navy. So even for a large corporation or government, you can't simply run the infrastructure yourself because then people know that it's you.

You must carry traffic for other people. And the trick there is other people won't want to use it if only you run the infrastructure. Some guy in Slovenia is not going to be using a remailer that is run entirely by the Navy. He'll set up something that he trusts, the Navy will set up something they trust, and so on and so forth. And that's how you get the distributed trust.

Adversaries and Threat Models

Okay, so there are a bunch of different characteristics we might look at for an adversary. We'll talk more about these concepts in part two, but I'll briefly go over them now. Basically you can pick and choose from this list to create whatever adversary you might be looking at. You might be looking at an adversary who can participate in the system, or maybe he can just look at the wires between nodes. Maybe he just watches and logs like Carnivore, or maybe he owns some hackers and he breaks into things.

Maybe he only has jurisdiction over a couple of the nodes, or can only break into the IRIX boxes or something like that. Or maybe he's global, maybe he can look at the entire internet. Or you can look at static versus adaptive. Maybe he deploys his Carnivores and he's done. Or maybe he has a bunch of Carnivores in the back and as soon as he knows where he wants them he puts a new one out, and that way he can follow you wherever you're going.

Okay, so some sample adversaries. We've got the global passive adversary, that's the one the research people focus on a lot. Somebody who watches the entire internet, but doesn't really do very much. They've hired some smart traffic analysis people and they sit in the back and they learn everything just by watching the traffic. Or you could have some guy who signs up a node in the remailer network and says, aha, I'm going to watch the traffic through my node, I'm going to learn things.

I'm going to learn maybe who sends, who receives. Maybe I can't correlate them that easily, but maybe I can. Or you've got an external attacker, somebody who sits around and floods one of the nodes to influence the anonymity or the service provided by the network. Okay, so the Mixminion threat model. Mixminion is a remailer that we've designed recently to improve over all the issues that showed up in previous designs.

The Mixminion threat model is all three. We want to handle a global passive adversary who can observe everything on the internet. We want to handle somebody who can sign up some of the nodes, hopefully not more than half of them, and somebody who can inject, modify, delete some of the traffic. Not all the traffic, because then we'd be in really bad shape, but he can flood messages, he can delete certain things, he can delay things, stuff like that.

We're not real-time, we're not packet-based, we're not steganographic, those are all the first questions people ask. We're not trying to do this for web browsing or something, it's just for email at this point. We won't actually succeed against an adversary that's this hard, but we'll come kind of close and you'll see which attacks still work towards the end. I mean, this is a pretty hard adversary. Most designs aim for something weaker than this because, I mean, this is large, smart governments.

Remailer Goals and Design

Okay, so basic idea: Alice wants to send a message to Bob and she wants Bob to not know where it came from. So the first thing you can do is set up a direct forwarder. Basically node number one gets the message from Alice. It says, "Hey, give this to Bob," and node number one gives it to Bob. So in the simple naive case, if Bob can only look around him, he doesn't know that it came from Alice because number one strips it off.

On the other hand, if you're observing Alice at the time, then you know, you watch the message come out of her and it says, this is my message, M, please send it to Bob. And you know that Alice is sending it to Bob. So the next step is we add in some encryption. Alice now encrypts her channel to node one and basically that means that somebody watching that channel cannot learn it. On the other hand, you might just own node one already or you might break into node one or subpoena it or all sorts of attacks.

So that's not quite good enough. So we use multiple hops. The basic idea is Alice chooses a path or maybe a path is chosen for her, it depends on the design, and each node, one, two, three, and so on, has a public key. The idea is you take your message and you encrypt it with the public key of the last node in the path and then you encrypt that with the public key of the second to last and so on. And figuratively it gets larger and larger, though in practice it stays the same size, because of how the encryption is structured.

And the idea is basically you can send it through the path and at each point they unwrap one layer of the encryption. So each node knows where it came from and where it's going, but it looks different at each point. So if I own node one and node four, then I can't recognize the message that way. I have to rely on timing attacks or something like that. We'll deal with those later.
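
To make the layering concrete, here's a toy sketch of the wrap-and-unwrap idea in Python, using PyNaCl sealed boxes purely as a stand-in. Real remailer packet formats use their own hybrid encryption, per-hop integrity checks, and fixed-size packets, all of which this sketch omits:

```python
# Toy onion-layering sketch using PyNaCl (pip install pynacl).
# Real remailer formats differ: fixed-size packets, per-hop
# integrity checks, and their own hybrid encryption.
from nacl.public import PrivateKey, SealedBox

path = [PrivateKey.generate() for _ in range(3)]   # nodes 1, 2, 3

def wrap(message: bytes, path) -> bytes:
    """Alice encrypts to the last node first, then wraps each earlier layer."""
    onion = message
    for node in reversed(path):
        onion = SealedBox(node.public_key).encrypt(onion)
    return onion

def unwrap_one(onion: bytes, node: PrivateKey) -> bytes:
    """Each node strips exactly one layer; the bits look different afterward."""
    return SealedBox(node).decrypt(onion)

onion = wrap(b"M: hello Bob", path)
for node in path:                 # the message transits nodes 1 -> 2 -> 3
    onion = unwrap_one(onion, node)
print(onion)                      # b'M: hello Bob' exits the last node
```

In this toy, each layer adds a little overhead, so the onion actually shrinks as layers come off; a real design pads everything to a constant packet size so no hop can tell how many layers remain.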

Okay, so we assume that not all of the hops collude because if the adversary owns one, two, three, and four, then he just watches it go through or he decrypts it all at one or whatever. But hopefully you're okay if you used multiple different jurisdictions or something like that. Okay, so how do you know what the servers are? That's another question. If Alice doesn't know about one, two, three, yes?

That's correct. In this particular case, you see message M going to Bob, that's in plain text. Because we're not assuming Bob has any sort of public key that everybody knows. If Alice knows Bob's public key, then by all means she could use PGP or something like that. So the goal of this, that's a good point, the goal of these remailer systems is to provide anonymity of the sender, not anonymity of the data.

If Alice signs her name on the document, it sucks to be Alice. You got to keep that in mind. We're only providing anonymity of the routing information. We're not anonymizing the data itself. The other parts are a solved problem.

Use those solutions in conjunction with this. There's no reason to tackle that over again. Yeah? So don't try to enable a scenario where Bob can respond to Alice without going to Alice? That comes next.

Directory Servers and Reply Blocks

Okay, well, almost next. So we've got directory servers here because Alice doesn't know which nodes she can choose. So the idea is you've got servers that keep track of all the nodes. When you join the network, you say, hey, Mr. Directory server, please sign me up and let people know about me. And the directory servers can also send test messages through the new nodes.

So they can know which ones are reliable, how often they're reliable. Maybe they can provide an ordered list of nodes so you can know which ones will work well. Because reliability is kind of an issue here because if Alice sends through one, two, and three, and three drops it, Alice never knows that it got dropped. We'll get a bit into that later. And of course you want more than one directory server, because if you have only one, then that's yet another bottleneck.

Okay, so that gets to your issue. What if Bob, what if Alice wants to let Bob send something back? Can we set up some sort of reply block so that Bob can respond without knowing who Alice is? In this case, I'm switching the identities here so it's now Bob sending to Alice. So when Bob wants to send to Alice back, he has some identity of Alice.

And in the simple case, he just goes to the forward, he goes to node number one and says, hey, I've got this message M, please give it to Alice, whoever Alice is. I've got this pseudonym Alice. And one says, excellent, I've got a table, I look up which one Alice is, Alice is this email address, I send it on. In this case, Alice has told number one where she lives. And this is great except number one knows where Alice lives.

And if you go and subpoena them, then you're all set. Or skip the subpoena and just break into their box. Yes. Okay, so the extension to that is reply blocks. Basically, you do the same thing that we've been doing for multiple hops forward as multiple hops backward.

So the idea here is Bob generates a reply block for himself. He chooses a path through the network. He does the multiple encryptions. And then he's got this big blob of bits, which Alice can't really read. But Alice knows, if I use this blob of bits, I'll get to Bob.

And he also says, and by the way, send it to node number one. And then Alice gets this blob of bits. And she says, I want to send this message. She goes to node one and says, here's the blob of bits that I hope is this guy I'm sending it to. And here's the message.

And then the first node decrypts it and then finds that it's supposed to go to node two. And here's the new blob to send. And it goes down the line until, at the very end, the message is going to Bob. And if Alice knows a public key for Bob, she could encrypt it. Or if she doesn't, then that's okay too.
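
Here's a minimal sketch of the reply-block mechanics, with the routing fields elided and PyNaCl secret boxes standing in for the real Type I crypto. Each hop on the return path adds an encryption layer using a key recorded in the reply block, and only Bob, who generated the block and remembers the keys, can strip them all off:

```python
# Toy reply-block sketch (illustrative, not the real Type I format).
import nacl.secret
import nacl.utils

# Bob picks per-hop symmetric keys when he builds the reply block.
hop_keys = [nacl.utils.random(nacl.secret.SecretBox.KEY_SIZE)
            for _ in range(3)]

def node_processes(payload: bytes, key: bytes) -> bytes:
    """A node on the return path adds one encryption layer; it reads nothing."""
    return nacl.secret.SecretBox(key).encrypt(payload)

def bob_unwraps(payload: bytes, hop_keys) -> bytes:
    """Bob strips the layers in reverse order of application."""
    for key in reversed(hop_keys):
        payload = nacl.secret.SecretBox(key).decrypt(payload)
    return payload

msg = b"M: a reply for whoever Bob really is"   # the sender's payload
for key in hop_keys:                            # hops 1 -> 2 -> 3 toward Bob
    msg = node_processes(msg, key)
print(bob_unwraps(msg, hop_keys))               # Bob recovers the plaintext
```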

On the other hand, it's hard for Bob to get this reply block from, it's hard to get a reply block to Alice. Because how would you know who Bob is? So we add in a Nymserver. Basically, it's a machine out there where if you want to receive mail anonymously, then you go to that place and you say, here's my pseudonym. Here's a reply block that will work for it.

And now when somebody wants to send mail to you, they go to the Nymserver and say, here's my message. Send it to that pseudonym. And they don't know where the pseudonym goes. They don't know which email address it is or which country the person lives in or whatever. But the mail gets to them.

And we can... Yes? Yes. Yes. At this point, the reply block that the Nymserver has is the same. The only reason why the owner of the reply block would want to change it is if one of the nodes in that path dies.

And then you say, oh shucks, I'm not getting any mail. I guess I'll go give it a new reply block. And there are some authentication techniques so that other people can't give it a new reply block for you. But that's the basic idea. And yes, there's a problem with that.

So we'll fix that in a bit. Okay. So we can combine these two ideas. Bob can choose a path to the Nymserver and encrypt the message that way. And then the Nymserver uses Alice's reply block to encrypt it to Alice.

So now somebody watching the Nymserver sees a message come in out of the ether and a message go off to some reply block. And they can perhaps correlate that it's the pseudonym that it's going to. But they have no idea who the sender was. And they don't know the location of the recipient. And if you stop here, you get Type I or Cypherpunk remailers, which were done in, what, '92? '92, originally. '91, '92. Something like that. Okay.

Monitoring and Attacks

Well, there's one problem here, which is as a message travels through each node, if you're able to watch the network, you can just watch a message go in, go out, go in, go out. And you don't need to know what is actually in that message because even though you can't see the message and can't tell the two messages are the same, you know basically that they're the same message because you've just watched it hop back and forth through the network. So what we want to do is make it so somebody watching the network, the global adversary, can't determine that a certain input is the same as a certain output. So we end up bringing messages in and reordering them so that they go out in a different order than they came in. Combine that with multiple nodes and you have a pretty difficult time tracking which message is which based on order if you have enough users in the network. There's still a problem with the Cypherpunk system, which is each message is just PGP encrypted in the Type I system and each message then is its own size.

There is not a uniform size for each message. So you could still watch messages and have a pretty good idea which one is which based on their sizes, which are roughly the same for each hop. So to fix that, we pad each message to a fixed size or, if it's too large, split it and pad the fragments if need be. So you have random junk added at the bottom of the header and you just rotate, you pull off the top header as it comes through a node, put more filler down at the bottom, and then the message looks the same size through every hop and all messages look the same. However there's a bigger problem, particularly with the nym reply blocks as was brought up earlier, which is replays.

If you were to take a message and grab it out of the network, you're seeing that Alice is sending this message. You don't know who she's sending it to, but you can grab a copy of that and then pump it through again and watch the network. You will see the same message go through the same hops. Do this a few times. If you're not certain, eventually you figure out this is the message that she's sending.

I'm going to watch it. I'm going to see that it comes out. If she's posting to a public forum, it's even easier. You don't need to be a global adversary. You just need to be able to grab a hold of her message and then watch as it pops up on the public forum multiple times.

So the solution to that or the beginning of a solution to that is a replay cache. When a message comes in, take a hash of it, store that, and then the next time that same message comes in, check it against the hash table and if it's in there, just drop the message. It's already gone through.
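
A replay cache in this spirit is only a few lines; this sketch hashes the whole packet with SHA-256:

```python
# Minimal replay-cache sketch: hash each packet as it arrives and drop
# anything seen before. (Real implementations bound the cache, e.g. by
# flushing it when the node rotates its keys, since messages encrypted
# to the old key stop being decryptable anyway.)
import hashlib

class ReplayCache:
    def __init__(self):
        self._seen = set()

    def accept(self, packet: bytes) -> bool:
        digest = hashlib.sha256(packet).hexdigest()
        if digest in self._seen:
            return False          # replay: drop it
        self._seen.add(digest)
        return True               # first sighting: process it

cache = ReplayCache()
assert cache.accept(b"some fixed-size mix packet")
assert not cache.accept(b"some fixed-size mix packet")   # replay dropped
```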

Resource and Blending Attacks

There's a resource problem with that, which is nobody has unlimited disk space and eventually your table will fill up. It's also an inviting attack there for somebody who wants to cause denial-of-service trouble, which is just start flooding your remailer with individual messages and let the hash table grow.

So a solution to that, then, is to add expiration dates on each message so that if a message hits a node a certain amount of time after it was created, say, seven days, that message is assumed to be a replay and dropped. Well, that leaks some information about the message because you're now time stamping when it was created. So implementations of this generally do not put an accurate time stamp. They just fuzz it by about three days so that it's more difficult for somebody to tell which message came from which sender based on time. There are still possibly statistical attacks against that, though.
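
As a sketch, with illustrative parameters (real implementations pick their own fuzz windows and expiry horizons):

```python
# Sketch of fuzzed expiration stamps. The sender blurs the stamp by a
# random few days so it doesn't pinpoint the send time; nodes drop
# anything older than the replay-cache horizon.
import random
import time

DAY = 86400
FUZZ_DAYS = 3        # stamp is blurred by up to +/- 3 days
MAX_AGE_DAYS = 7     # anything older is treated as a replay

def make_stamp(now: float) -> float:
    return now + random.uniform(-FUZZ_DAYS * DAY, FUZZ_DAYS * DAY)

def still_valid(stamp: float, now: float) -> bool:
    return (now - stamp) < MAX_AGE_DAYS * DAY

stamp = make_stamp(time.time())
print(still_valid(stamp, time.time()))               # True: delivered
print(still_valid(stamp, time.time() + 11 * DAY))    # False: dropped
```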

And then there's the class of attacks known as blending attacks. You can flood a node so that all messages going into this node are your messages except for a given message that you're interested in. So if the node is processing only your messages plus the message you're watching, when those messages then pop out the other side, you know which messages are yours. The one that you don't know is the message you want, and you just repeat this through the nodes, and you've basically just isolated that message out in its own little bundle of your own messages. So there's no anonymity there at all.

You have no crowd. Well, we solve this by creating a message pool. Messages come in, and they don't just get mixed up with the messages that are coming in at that time and sent out. They come in, and they're stored, and a limited number of messages is then allowed to leave at given intervals. This makes it much harder to do a blending attack because not all messages are leaving at once.

You need to control the traffic for a long time, and probably you need to somehow prevent other legitimate senders from actually sending to that node. Which is the next attack. Make sure that during the entire pooling cycle, only your messages get in with the exception of the one message you're looking at, and you're back to the same attack working, just with a little bit more set-up time. If you're the only one who can send to the entire network, if somehow you've stopped all senders from being able to get there, which is possible through a number of different attacks perhaps on the reputation servers, then you win again.
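
Here's a sketch of a timed dynamic pool in that spirit; the constants are illustrative rather than Mixmaster's actual defaults:

```python
# Sketch of a timed dynamic pool. At each firing interval the node keeps
# a minimum number of messages back and sends only a random fraction of
# the rest, in shuffled order, so inputs can't be lined up with outputs.
import random

POOL_MIN = 5      # always retain at least this many messages
SEND_RATE = 0.6   # send at most this fraction of the excess per flush

class TimedDynamicPool:
    def __init__(self):
        self.pool = []

    def receive(self, message):
        self.pool.append(message)

    def flush(self):
        excess = len(self.pool) - POOL_MIN
        if excess <= 0:
            return []                       # too few messages: send nothing
        n_out = random.randint(0, int(excess * SEND_RATE))
        random.shuffle(self.pool)           # reorder: break arrival order
        out, self.pool = self.pool[:n_out], self.pool[n_out:]
        return out

pool = TimedDynamicPool()
for i in range(20):
    pool.receive(f"msg-{i}")
print(len(pool.flush()))   # somewhere between 0 and 9 messages leave
```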

Dummy Messages

So we have this concept of dummy messages now. There are really two classes of dummy messages. There are internal dummy messages, and then there are user dummy messages. The internal dummy messages are easier, so we'll talk about them first. In order to confuse somebody watching the network, you want to add noise to the system. So if the individual nodes are generating dummy traffic, they make it harder for somebody attempting the blending or trickle attacks to know that coming out of that node are the messages they're looking at.

If each node is generating internal dummy messages and pushing them out with each flush, then you will see your messages that you're trying to discount, and then a bunch of unknown messages, one of which is the message you're looking at, but others are dummies. Is there a public key compromised because of dummy messages? No. These are just like regular messages, only they have random data in them.

Actually, in Mixmaster, it's random data. So the idea is that... So the trick is that the dummy messages are encrypted, except at their last hop, just like every other message that's sent through the network. So they just look like random blobs for our purposes, and our purposes are, it looks like another message. It's true that it's very hard to write convincing payloads for dummy messages, and that's an unsolved problem. So you probably don't want to send dummy messages to actual recipients, because then you'd have to write something for them to receive.

But you can send dummy messages between nodes in the system, and they will always just look like encrypted stuff until they get to the node that says, I am the end of the dummy message, and by the way, I was a dummy message, and then that node drops it. So there's no worry about losing public-private keys for nodes through dummies. However, this does only cover internal traffic. It gets to the last node, I'm a dummy, I'm dropped, it doesn't do anything for the users. If you want dummy traffic with users, the users need to be sending the dummy messages, and they actually need to be going somewhere.

If we have dummies that are being sent by the users and not delivered anywhere, somebody watching this may be able to figure out a percentage of that person's messages being dummies and so forth. Really the best dummy messages for a user are other users' legitimate traffic. So if you stop there, basically you get the current Mixmaster design, the improvement on Cypherpunk remailers that's been running on the internet since '95. We will get to that later in the talk.

The first answer to that is you want the users to anonymize their communications to the Nymserver, but they still have a public-private key pair which is associated only with that nym, and then they can authenticate their communications, but they can still be anonymous. Are you asking about the Nymserver or the stats servers with the keys? Nymserver, right. In order to communicate with the Nymserver, you do have to go through... You want to be anonymous talking to the Nymserver, so you want to create messages that... All your control messages should be email messages. That's how a good Nymserver is designed, and you send them all through a chain of remailers just as though it's a one-way anonymous message.

Spam and Abuse

I have given full 90-minute talks on abuses, so I don't want to go too deeply into that. The short answer is spam is not really an issue with high-latency remailer systems except with Usenet spam, because as I will get into in the second part of this talk, it does not make economic sense for a spammer to use a remailer system when there are so many better ways of sending spam. If you're using a... I'm getting ahead of myself here, but in order to be anonymous, you have to behave like everybody else. Not everybody sends billions of messages. So if you're sending millions of messages, you're pretty much caught.

Also, those messages do get delayed over time, because as far as the system is concerned, a spam attack is not really any different than a flooding attack and so forth. It can't tell, so with pooling, those messages will then be trickled out to your recipients. The point here is spammers don't do this, because it doesn't make sense for them. We should save this discussion of what actually happens for Part 2. Yeah, that really is a Part 2 issue.

Improvements to Mixmaster

The goal of Part 1 is to give everybody a crash course in what ought to happen so that we can compare that to what does happen. Okay, so now that we've gone through Mixmaster, let's go through a couple of improvements that needed to happen on Mixmaster. One of them is the passive subpoena attack, and I talk about it as subpoena, but it could also be the passive go beat somebody up attack. It doesn't have to just be a jurisdiction issue. The idea is Eve records messages, and then later on, she shows up and says, hey, decrypt this thing for me.

I saw it on the link, it went into you, tell me what the answer is. And they can do it through jurisdiction or through kneecapping or whatever it takes. And so the fix there is you want to do link encryption. Not only do you want to do link encryption, but you also want to use ephemeral keys and rotate those keys maybe every message, every couple of minutes, something like that. So the idea there is a totally passive adversary just watching the links doesn't get anything out of that, because they see a big pile of bits, it's encrypted, and a few minutes later you don't even know the key, the session key that was used to encrypt that.

So that means that even when they show up and try to learn it from you, you cannot provide it. Mixmaster currently has this kind of feature in it now because it relies on SMTP, and most remailers are running SMTP daemons that have STARTTLS enabled with ephemeral keys. But we really want that in the protocol so that all remailers are always doing this. Also most things like Postfix, I don't think they rotate their keys very often because they're not looking at the threat model that we're looking at. By default they don't rotate them very often, but that's configurable.
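
As a sketch of the ephemeral-link-key idea (PyNaCl is a stand-in here; in practice this is TLS with ephemeral key exchange, plus authentication of the nodes' long-term identities, which this omits):

```python
# Sketch of link encryption with ephemeral keys. Both sides make a
# fresh keypair per connection and discard it afterward, so a later
# subpoena finds no key that can decrypt recorded traffic.
from nacl.public import PrivateKey, Box

def open_link():
    a_eph, b_eph = PrivateKey.generate(), PrivateKey.generate()
    # Each side needs only its own secret key and the peer's public key.
    return Box(a_eph, b_eph.public_key), Box(b_eph, a_eph.public_key)

sender_box, receiver_box = open_link()
wire = sender_box.encrypt(b"next-hop mix packet")
print(receiver_box.decrypt(wire))
# When the connection closes, both ephemeral keys are deleted; the
# ciphertext an adversary recorded off the wire is then useless.
```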

Okay, so if you've resolved that, once we've got link encryption with good key rotation and so on, there's still an attack where if the node sends you the message rather than simply sniffs it off the network in between, if it's the previous hop, then who cares about link encryption? That guy knows what the message looks like. And he comes to you at his leisure and says, hey I sent this message to you last Thursday and it's encrypted to your public key, and I bet you know your private key. Please decrypt it for me. And so at that point you want to do periodic actual public-private key pair rotation.

So we need to deal with multiple public keys so you've got an identity key and stuff like that. But it's pretty straightforward. The basic idea is key rotation is good, otherwise all these attacks where you remember a message and then show up later start working. Yes? I'm proposing that every mix should rotate its keys.

Every node. Yes, every node. Yeah, the users, that's up to them. That also has a side benefit of each time you rotate a key, you can get rid of your replay cache because those messages won't be decryptable anymore, it doesn't matter. Yep.

Esoteric Attacks

Okay, so now we've got a couple slightly more esoteric attacks, but still important to pay attention to. If the clients know different things, so we've got these directory servers and some guy goes to a directory server and learns that directory server's opinion of what the network looks like. And then some other guy goes to a different one and learns that directory server's opinion of what the network looks like. And the opinions are very different. And then the two users send messages.

You can guess which message was which based on which nodes they go to. If one directory server gives one list of nodes and one gives another list of nodes, then they're not mixing with each other. These two users are totally separate. So the issue there is you've got to somehow synchronize the directory servers and you also have to have all the clients having the same algorithm. They all have to go every 10 minutes and learn the new thing.

Because if you go, if one client goes at 6 a.m. and then the network looks totally different by 10 p.m. and another client goes at 10 p.m. and learns it, then you've got the same issue as you had before with the two separate directory servers. So the idea here is you've got to make your users act the same way, at least insofar as knowing what the network looks like. Because otherwise they're going to not be blending with each other as much as you'd like. And a critical thing here is we've got to make directory servers part of the spec. We can't just let people come up with their own ad hoc designs and say, well, this is how I'm going to order them, because I think that people in Austria are more reliable, or something like that.

Because you can't, if not everybody does it, then you're, again, making the clients look different. Okay. So that's one part of the partitioning attack issue. So the partitioning attack idea is you can partition the set of senders into people who send in the morning and people who send in the evening, or whatever you can partition them into. If they're hoping to have all N people blending with them, then you actually make it N over 2 or 1 or something like that.

So the directory servers can be out of sync. Evil directory servers can, if Alice shows up to a compromised directory server, then it gives out a rigged directory only to that Alice, and otherwise looks exactly normal. There are all sorts of tricky attacks like this. One of the fixes is that you've got to deal with directory bundles. Basically the directory servers have to all agree on what the directory ought to be, and some threshold of them have to sign that directory.

And that's still sort of an unsolved problem, because in the real world, everybody wants to be a directory server, and not everybody should be. And who gets to decide who gets to be a directory server, and who decides how much the threshold should be, and so on. So that's a problem that still needs to be worked on. We might get to that a bit in part two. Well, the word peer-to-peer means many, many different things.

The short answer is no. For the remailer networks, you've got a set of servers, and you've got a set of users, so they're probably not peer-to-peer, whatever peer-to-peer means. On the other hand, for the directory servers, I mean, they're all peers, and they all talk to each other, so maybe it's peer-to-peer. The trick there is that there's a pretty small number of directory servers, and they're reasonably static, because you don't want to change who's a directory server very often. Because the public keys of the directory servers ship in the code base.

And if you have too many changes, then people need to upgrade, and so on. Yep. Yes. His requirements are different than ours, but think of it not as peer-to-peer, but more as directory replication. And then there are tricky problems with this, simply because you can't assume that all the directory servers are going to agree.

You have to assume that directory servers could be run by an adversary, and could be trying to cause problems. And we'll get into this more in part two. But the short answer is, the directory servers are going to talk to each other and replicate. If that's peer-to-peer, then sure. If not, no.

Depends what you mean by that. Okay, so there's also another partitioning attack on the message expiration date. So we fixed it, sort of, with what Len described earlier. You generate an expiration date that's three or four days in the future, and then you fuzz it by three or four days, and then people can't really guess based on when it's supposed to expire when it was sent. But the trick is, if you're a mixed node, and a message arrives, then it's got an expiration date of some time in the future.

You hold onto it for a couple of days. And now it's delayed compared to the other people who sent messages at that point. And then you let it through. And it's close to the end of its valid window. But it still works.

And then you can recognize it later on, because not as many messages on the network are about to expire. Sort of a tricky attack. Doesn't always work. It's a statistical attack, not a clear, deterministic attack. But it's still scary if you want to get good security.

So our fix for this is, get rid of this expiration date stuff. If you rotate your key every month, then you keep a replay cache for that month. It doesn't grow more than a couple gigabytes if you're being flooded for the whole month. And that's good enough for us for now. Okay.

Trickier Attacks: Tagging

So now there are some trickier attacks that work against pretty much all deployed remailer systems. One of them is a tagging attack. So the idea is, in Mixmaster headers, so you've got a header that describes each hop you're going to go through. So maybe if you've got a path of length four, then you've got four headers in the message. And the trick here is that Mixmaster headers have a hash to verify the integrity of all the fields inside that particular header.

Not the entire path, but just that one. So when it gets to your node, you decrypt it. You look at your header that says, this is where it goes next. This is the message. Stuff like that.

And you verify the integrity of that stuff. But the problem there is that if you're looking at a message that has, say, four headers, and you just smash the third header. Who knows what it is because it's encrypted, but you know where it is. It's that pile of bits that is going to get shifted up to the top in two more hops. You just smash it.

You flip some bits. Something like that. Now later on, if you happen to own that third node, then you receive a message which is broken. You receive a message where you decrypt it, you try to verify the integrity, and it fails. And in general, people don't send broken messages.

But you can probably identify that that was the one that you sent over here. So now, rather than, so it used to be that if we had a path of length four, and you owned number one and number four, then due to the delaying and the mixing and so on, you couldn't, and also the encryption, you couldn't know that the message you saw at number one was the message you saw at number four. But now you can tag it at number one, recognize it at number four, you know it came from Alice for one, you know that the message maybe was destined for Bob at number four, maybe you didn't trash the entire header, you just trashed one bit of it or something like that. And now you can totally bridge the entire mixnet. So the first fix is, well gosh, you've got a hash in there, why not make the hash cover all of the headers and then you're done, no problem.

So that's pretty straightforward. We've got the same issue with the payload. Maybe Alice is sending a message to Bob saying, hi, I'll meet you on the corner of Foo and Bar and we'll exchange the money, and you flip some bits in the payload. It's encrypted, but it's going to be the same payload each time, it'll just get decrypted. So if you smash a byte in the payload, then it says, hi, I'll meet you in the corner of Foo and, and then there's this high-ASCII byte, and then the message continues.

Then you can recognize that that was the one that you tagged. In fact, since the message is maybe 30 kilobytes long or something, you can tag any of those. Maybe you tag number six and number eight and number 4,006. And then if a message comes out and has funny little ASCII characters on those specific bytes, then you know that that's the one you tagged. So the fix is, you've got a hash, why not have it cover the payload too, no problem.
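
A sketch of that per-hop check: the digest Alice precomputes for each hop covers all remaining headers and the payload, so flipping any bit anywhere gets the message dropped at the very next honest node.

```python
# Sketch of the per-hop integrity check described above (field layout
# is invented for illustration).
import hashlib

def digest(headers: list[bytes], payload: bytes) -> bytes:
    h = hashlib.sha256()
    for header in headers:
        h.update(header)
    h.update(payload)
    return h.digest()

def node_check(expected: bytes, headers: list[bytes], payload: bytes) -> bool:
    return digest(headers, payload) == expected

headers = [b"hop2-routing", b"hop3-routing", b"hop4-routing"]
payload = b"meet me at Foo and Bar"
expected = digest(headers, payload)   # precomputed by Alice for this hop

headers[1] = b"hop3-rXuting"          # adversary tags a later header...
print(node_check(expected, headers, payload))   # False: dropped right here
```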

Yes? Well, okay, so the trick is, let's say Alice is sending to Bob through five hops. The hash covers what the message ought to look like at that point. So if it comes to number one and it's good, you check the hash, everything's great. It's heading towards number two and the adversary stomps on it on the wire.

It arrives at number two, it's been tagged, number two decrypts it, looks at the hash and says, oh gosh, I mean, it's all encrypted binary, but the hash doesn't work. And then he drops it right then. Yes, yes. And that way, you don't have to deal with it when it comes to the end and say, oh look, this thing looks funny, I guess I'll drop it now. Because by the time it gets to the end, it's too late.

Because that could be the adversary looking at it at the end and saying, oh gosh, that looks funny, excellent. So the easy fix is make the hash, for each hop, cover the headers and the payload. Okay, so another issue, we're still using the Cypherpunk replies back from slide four or something. I haven't talked about replies at all at this point. There's no replay detection for replies, no batching, messages still change length at every hop, and so on.

And it's important to notice that Mixmaster doesn't do replies. Mixmaster just uses the Cypherpunk reply mechanism from long, long ago. Well, Mixmaster just tells you to go use the Cypherpunk reply mechanism. Because of the anti-replay functionality in Mixmaster, we can't do replies with it, so we didn't. Okay, and there's a slight detail from the crypto side of things.

Since for forward messages, the user encrypts a whole lot of times, and then it decrypts as it goes. And for reply messages, you want to encrypt at each hop. Because it's the recipient who knows all the keys to decrypt it. So you want to use a crypto system where encrypting is as strong as decrypting. There are plenty of those.

Reply Blocks in Mixmaster

So that, that's pretty straightforward. Okay, now there's a big problem here. I can't write a reply block. I can't write the headers that you will attach your message to, and have the hash in those headers correspond to your message at each hop. Because I don't know what payload you're going to attach to my reply block.

Does that make sense to everybody? Excellent. Okay, so the idea is the author of the reply block can't guess the right hashes for what somebody's going to attach. And that means that the payload tagging attack is back. Because we can't have the hash cover the payload.

We must have the hash only cover the header. On the other hand, that's okay. Because for reply blocks, you're encrypting at each hop. And that means that only the recipient can see if it's been tagged. Because it just turns from garbage bits to garbage bits to garbage bits.

And then it gets to the last guy, and he decrypts it all and says, wait a minute, this thing looks funny. There's a high-ASCII byte over there. And he's the only one who can know that it looks funny. And since he's the one trying to get the anonymity, then we're okay. Because only the recipient can recognize when it's been tagged.

On the other hand, there's still another problem here. Forward messages, messages that Alice sends forward to Bob, and reply messages are now different. Because in the one hand, we want the hash to cover the payload also. On the other hand, we cannot allow the hash to cover the payload, because we don't know what the payload's going to be. If replies are rare relative to forwards, if most people send mail forward in the network, and only a couple of replies are sent every hour, then you can just follow the replies through the network.

Because you see one come in over here, and then you see another reply over there. And you say, well, gosh, only one came in. It's over here. I guess that's the same one. I just bridged the network.

And so what we want to do is find some way for these messages to look the same, even to the nodes. And this is a novel thing compared to any of the previous designs. A number of the previous designs had them indistinguishable as far as the link was concerned. Because you've got link encryption, so you're all set. But I worry that node one will see it, and node four will see it, and then they'll be able to correlate.

Mixminion Design

OK, so I'm going to switch out of attack-defense, attack-defense mode for a little bit to describe the Mixminion design and give you a feel for how these things can be designed. And then we'll go back into attack-defense after that. So the basic idea for Mixminion is we've got three different ways of sending messages. Alice can send a message forward to Bob. Alice is the one getting anonymity.

Bob can reply, and Bob is the one getting anonymity: Alice can send a reply to Bob, where Bob is anonymous, and that's the basic idea. And we've got also the combination, anonymized replies, where Alice sends a message through the path that she chose, and then it goes through the path that Bob chose, and they both don't know who each other is. And maybe there are pseudonyms that they can layer on top of it so they can have an idea of whether it's the same Bob or something.

But we aim to provide bidirectional anonymity. And it's important to notice that in order to get this at all deployable, you don't have to use our software unless you want the anonymity. Otherwise, you can't send a message to your senator anonymously, because otherwise your senator would have to run some of our software to be able to read the message. That's no good. So if you want to be anonymous, then you run our software.

Otherwise, you can run normal mail clients, and it must work. OK. So the basic idea here is we've got two headers and a payload. So imagine Alice is sending a message to Bob, and maybe she's got eight hops. You break it up into four hops here and four hops here.

And we've got two different legs that we're looking at. And it'll become clear soon why that matters. So in the message itself, we've got a header for the first leg, a header for the second leg, and then the payload. So for forward messages, Alice just chooses both of the paths. She chooses the first one and the second one.

And that way, she can be sure of getting her anonymity. For reply messages, Alice is sitting here with a reply block. She just smacks it into the first leg and sends the message. And by the time it gets to the end of the first leg, then it has gotten to its recipient. She doesn't know who that is, but good enough, it works.

For anonymized replies, you've got two legs. Alice chooses the first one, so Alice gets her anonymity. And then Bob chooses the second one. So Bob gets his anonymity. That's the basic design here.

I'll argue security for it in a bit. So the legs are connected by this thing we call the crossover point. And the idea for the crossover point is when it arrives, you decrypt the second header with the hash of the payload, and then you swap the headers. So that means that you were on the first leg, and you swapped them, so now you're on the second leg. And the decryption part means that if the second header or the payload has been tagged, then whatever you're now putting up at the top of the message is going to be decrypted with something different than what it's supposed to be decrypted with.

And that means it's just going to turn into garbage. And okay, let me tell you why that actually works. So in the first case, if the second header or the payload are tagged, while it's traveling in the first leg for a forward anonymous message, then the second header is unrecoverable. If either of them is tagged, you take the hash of the payload, you decrypt the second header, you move it to the top, it looks like garbage. So the crossover point says, gosh, I guess it was tagged, but I have no idea where it was going, because that information was in the second header.

And that means that even if the adversary is able to tag it from Alice to the crossover point, then the adversary cannot learn where it was destined after that. So yes, the adversary knows Alice sent a message, he already knew that, but he doesn't know where the message was going to go. And if it's tagged in the second leg, then Alice has already gotten her anonymity for sure from where she sent it to the crossover point. So each leg has to have enough hops in it to make Alice comfortable. But if it's tagged in the second header, if it's tagged in the second leg, then we're okay because Alice is already anonymous.

And granted, the message may not get to Bob, but our goal here is anonymity, not reliability in this case. And it would be nice to get both, but we'll see. Okay, so replies are anonymous. This one's very easy to argue. Same argument as before.

The adversary can't ever know that he tagged it, because Alice plunks in the reply block and sends it off, and maybe the adversary stomps on a certain byte, and it gets crypted at each point. We don't say decrypted or encrypted, because they should be the same operation now. And the final blob arrives at the recipient, and he says, gosh, this looks like a blob. I wonder if it was a reply message sent to me. So he tries various reply messages until he gets them.

And there's some optimizations on that, but that's the basic approach. And anonymized replies are safe for the combination of these arguments. If you have Alice's leg and Bob's leg, then if the first leg is tagged, then the second leg is unrecoverable, who knows what it is. You know that Alice sent, but you don't know who Alice sent to, or whether it was a reply block or what. And if the second leg is tagged, but the first one isn't, then who cares, because only Bob notices that it's tagged.
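
Here's a sketch of the crossover operation itself. Real Mixminion keys an all-or-nothing cipher construction (LIONESS) with the hash of the payload, so a tagged payload silently turns the second header into garbage; in this sketch a PyNaCl authenticated secret box stands in, so tampering shows up as a decryption failure instead:

```python
# Sketch of the Mixminion crossover point: the second header is keyed
# to the exact payload bits, so tagging either one destroys the
# second-leg routing information.
import hashlib
import nacl.secret
from nacl.exceptions import CryptoError

def build(second_header: bytes, payload: bytes) -> bytes:
    # Alice derives the key from the payload when she builds the packet.
    key = hashlib.sha256(payload).digest()
    return nacl.secret.SecretBox(key).encrypt(second_header)

def crossover(enc_second_header: bytes, payload: bytes):
    # The crossover node re-derives the key from the payload it received.
    key = hashlib.sha256(payload).digest()
    try:
        return nacl.secret.SecretBox(key).decrypt(enc_second_header)
    except CryptoError:
        return None   # payload was tagged: the second leg is unrecoverable

payload = b"the message body" + b"\x00" * 100   # padded to a fixed size
blob = build(b"second-leg routing info", payload)

print(crossover(blob, payload))               # routing info recovered
print(crossover(blob, b"X" + payload[1:]))    # None: the tag destroyed it
```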

Subtle Problems with the Design

Sound good? Okay, so I've sort of convinced you that maybe it's secure, except it's not. Once we worked on this design for a while, we realized that there's another attack that's subtle, but pretty damaging. And the moral there is, gosh, it's hard to design anonymity systems. So the issue is, what happens if Alice sends more than one message?

Imagine if Alice sends a bunch of messages, and she chooses the same path. She sends maybe 15 messages along the same path to the same crossover point, to the same, and so on. Now the hope before was, if it's tagged, then you can't learn the second leg. But in this case, she sends 15 messages, the adversary tags four of them, the adversary owns the crossover point, 15 messages arrive, four of which are tagged, and the others of which have a nice shiny second leg just sitting there. So now you can tag the second leg, and you win.

So that's a pretty bad issue. And you can even attack multiple of them at a time, because you can maybe tag four of these here, or tag eight of them over here, and then when a bunch of messages arrive at the crossover point, you can recognize how many of which one you tag. So that's a pretty bad attack. On the other hand, it only works if your adversary owns the crossover point. If your adversary does not own the crossover point, then 15 messages arrive, four of them get dumped, 11 keep going, and nobody knows which ones got dumped, or whether they got dumped, because that's how the mixing and batching works.

It prevents an observer from knowing how many in some sense, because there's pooling going on. So the fix is, Alice picks K crossover points for her stream of messages. And so the messages sort of move out like this, and then back like that to Bob. And as long as the adversary doesn't own too many, whatever that means, of the crossover points, then he cannot recognize the signal that he put in. And this is sort of tricky to analyze, it goes into covert channels and stuff.

How much information can the adversary stick in through tagging, such that he can still recognize something later on? So this definitely requires more analysis. But it's maybe a little bit more of a fix than just leaving it gaping wide open. Yes? Alice could use a unique path for every message she sends, that's correct.

There are a number of problems with that. So that's basically the logical extension of K, you just use N. So the biggest problem that I have with that is, imagine you have like 50 nodes in the network, 50 mixes. And imagine the adversary runs 10% of them. If the paths are of length 4 or 5 or 8 or something like that, then the probability of choosing some path which is all owned by the adversary, if you send a lot of messages, is pretty much 1. You're pretty much dead if you send a lot of messages.
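
The arithmetic behind "pretty much 1" is worth seeing. Treating each hop as an independent draw (a simplification), with 50 mixes, 5 of them hostile, and paths of length 4:

```python
# Back-of-the-envelope for "send enough messages on fresh random paths
# and eventually some path is fully hostile" (numbers from the example).
p_bad_path = (5 / 50) ** 4                  # ~1e-4 per message
for n_messages in (100, 10_000, 100_000):
    p_caught = 1 - (1 - p_bad_path) ** n_messages
    print(f"{n_messages:>7} messages -> P(some path all-hostile) = {p_caught:.3f}")
# 100 messages -> ~0.010; 10,000 -> ~0.632; 100,000 -> ~1.000
```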

And so we need to, in some sense, place a bet. We say, I bet those four aren't all owned by the adversary. I'm going to use them. And if I'm wrong, sucks to be me, he sees everything. But if I'm right, he doesn't see anything.

It sort of depends on the threat model you're worried about, and whether you're worried about the adversary learning who you're communicating with, or how much you're communicating with that person. And in our case, we worry more that the adversary will learn who you're communicating with. Because if you're a civil rights worker working in Africa, and you send messages back to Peace Corps every evening, and the adversary is sitting there going, he sends a message every evening. I wonder where. If he ever once learns that it goes to the Peace Corps, then he could guess that maybe the rest of them go there also.

So we hope to prevent the adversary ever learning that. And that means that if you just spray them into the network, then you're probably screwed. Does that make sense? OK. Yeah.

Yes. That's why you use... Yep. The attack there is, if Alice chooses only one path ever, then the adversary should look at the first node and take his time about compromising it. And then he learns the second node and take his time about compromising that, and so on. And eventually he'll learn the entire path, and he'll get to read all of Alice's messages every time.

So that argues that Alice should not use the same path all the time. She should probably pick new ones every so often. And unfortunately, I can't give you numbers about what every so often means. That's still a research question. But it's a function of what you think your adversary will be, which is kind of hard to guess in the first place.

OK. We get into the question of who your adversary is and what his methods are in part two. Well, with this system we have single-use reply blocks, and that's pretty inconvenient for Alice and Bob talking to each other, because Bob wants to talk to Alice, he's got to get a new reply block every single time. How do you do that? Come back to the Nymserver approach, where the Nymserver stores messages.

And the initial thought we had was, well, we'll just fill it up with a cache of single-use reply blocks, and it'll tick them off there, and you refill it eventually. That has a number of different attacks on it, particularly denial-of-service attacks. So the system that Mixminion is proposing is a sort of IMAP over Mixminion approach, where messages come in and they're stored in the Nymserver, and the recipient will send in reply blocks to retrieve messages and also retrieve control information about those messages. First message would perhaps come through, and the response would be, you had this many messages waiting of these sizes. Do you want them?

You need to send this many reply blocks to me. If you don't want them, send a delete, and so forth. And that works sort of well. There's a problem of now all that mail is stored on the Nymserver, and the Nymserver needs to have greater resources allocated to it, but it's a better approach than simply storing caches of reply blocks or not using a Nymserver. So ending right there is the current Mixminion design.
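
As a sketch of that store-and-retrieve flow (all class and field names here are invented for illustration, not the actual Mixminion nymserver protocol):

```python
# Toy sketch of the IMAP-over-Mixminion idea: the nymserver stores mail
# and hands it back one message per single-use reply block.
class NymServer:
    def __init__(self):
        self.mailboxes = {}          # pseudonym -> list of stored messages

    def deliver(self, nym: str, message: bytes):
        self.mailboxes.setdefault(nym, []).append(message)

    def list_waiting(self, nym: str):
        # Sent back through a reply block: just counts and sizes.
        msgs = self.mailboxes.get(nym, [])
        return {"count": len(msgs), "sizes": [len(m) for m in msgs]}

    def retrieve(self, nym: str, reply_blocks: list):
        # One single-use reply block is consumed per message returned.
        msgs = self.mailboxes.get(nym, [])
        out = list(zip(reply_blocks, msgs[:len(reply_blocks)]))
        self.mailboxes[nym] = msgs[len(reply_blocks):]
        return out   # each (reply_block, message) goes back through the mixnet

server = NymServer()
server.deliver("alice-nym", b"hello")
print(server.list_waiting("alice-nym"))   # {'count': 1, 'sizes': [5]}
```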

And there's still plenty more to deal with after that. We have a couple of open problems described here, and there are lots more. I think we might have put the paper on the CD. If not, you can Google for Mixminion and you'll find it. So one of the open problems, reputation on the directory servers.

So what we'd like to do is let the directory servers give you an ordered list of who's reliable, because reliability is a big issue. If you use somebody who's been down for three months, then they're probably not going to be up when your message gets to them. So you need to know who's likely to be there when the message arrives. On the other hand, every directory server has to agree on how you should measure reliability, and they also have to be able to measure the reliability without having somebody change that on them. If your test message scenario allows the adversary to say, that's a test message, that's a message from the user, I'll send that one, I'll drop that one, then the directory server says, it's a wonderful remailer, it's great, it's always reliable, but none of the user's messages get through.

So you have to deal with that issue. Or alternatively, you can, maybe the adversary can make the network look a certain way at some point, and then look a different way later, and some clients will update or not, or something like that. There are a bunch of attacks here that we haven't really sorted out, and we need to figure out what they are and how much we care. Key rotation also, again, becomes an issue. There's a vulnerable point where you're rotating your key, and there may be messages still in the mix that apply to the old key, and the directory server, what do you do, and also, how do you keep the directory servers in line, because that's a window of opportunity for a malicious directory server to play funny games with the network there.

Reputation Based Attacks

The other attack that we're worried about, if our adversary is as strong as the threat model I described at the beginning, they probably have a lot of money, they can probably run a lot of nodes, they can probably run a lot of really reliable nodes. So if we have a reputation system, then we have a way for the adversary's nodes to shoot right to the top and look really, really attractive. Not only that, but now that we've got a reputation system, the adversary has an incentive to put his nodes at the top and DoS all the rest of them. Any other allegedly reliable node, gosh, it keeps going down, I can't understand it. You better use these, they're reliable.

So the reputation system seems necessary to let users have reliability, but it also seems to allow attacks that make sure users don't have reliability or anonymity. And we'll get in part two into the interplay between reliability and anonymity, because attacking reliability can attack anonymity. Okay, so there's another potential open problem here, trickle attack on the directory servers. Imagine, if you change your key every 30 days or 45 days or something, if the mix rotates its key every month: Alice sends a message, and now one of those nodes can say, gosh, I bet this message will still work in 25 days or something. And then you send it, and it makes its way through the network, and then maybe later on the directory looks different.

A bunch of nodes have shown up and become reliable, and a bunch of nodes that were reliable are no longer reliable. And now you release the message, it's going to go to the ones that were reliable 20 days ago. And you can distinguish it, because, I mean, nobody uses that node anymore except this message that you just released. And so this is an extension of the trickle attack that Len was talking about earlier. And since we're allowing an expiration date on the order of a month, then this still looks like a pretty bad issue.

More broadly, we're still in an arms race against flooding attacks and trickle attacks and things like that, because there are a lot of different attacks that can work there. And we'll talk a bit about alternate solutions like synchronization and so on. Okay, so here's the biggest open problem. This is the doozy of the attack on all anonymity systems. It's called the long-term intersection attack.

The basic idea is that not all senders are sending all the time. If you send something and somebody receives something, and then later you send something and the same guy receives something, and then later you send something and the same guy receives something, and then later you don't send anything and he doesn't receive anything, then the observer is starting to get a lot of information about what's going on. And really the only fix for this is to have everybody online all the time, sending all the time. And that doesn't seem very practical. And if you don't have that, then it's just a matter of time before the adversary can win against you.
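Here's a toy Python simulation of the long-term intersection attack. The user counts and probabilities are made up, but the mechanics are the standard ones: intersect the set of online senders across the rounds in which the target receives something.

```python
import random

users = [f"user{i}" for i in range(100)]
alice = "user0"

candidates = set(users)           # everyone is a suspect at first
for round_no in range(200):
    online = set(random.sample(users, 30))
    # Alice messages Bob only in some rounds, and only when she's online.
    bob_receives = alice in online and random.random() < 0.5
    if bob_receives:
        candidates &= online      # Bob's sender must be among the online
    if len(candidates) == 1:
        print(f"round {round_no}: Bob's correspondent is {candidates.pop()}")
        break
```

A real attacker can also use the rounds where Bob receives nothing to rule people out, which converges even faster.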

So that's kind of a big open problem. What we're working on right now from the research side is trying to quantify how bad this is in various situations. If you send this much this often, then are you doing better than if you send this much this often? Maybe. We don't even know the answer to that much at this point.

But we're hoping to at least get some idea of how strong the adversary has to be, and how many days or whatever he has to be watching traffic, before he can learn anything. And the other fix is that Alice should never, ever do the same thing twice. If Alice only sends one message, she's doing pretty well. On the other hand, that's not a very useful network if you can only ever send one message and afterwards you can be traced. So this is a major unsolved problem in lots of anonymity systems.

Different Approaches

So we're going to change course a little bit and talk about various designs that are different from Mixminion, so you can get a sense of what they're about. Right. We will reference these a bit in part two. So just as an overview, we're back to single-hop proxies. We talked about them briefly in the beginning.

It's just a node which will forward messages through and hide source information. They're seen a lot with web browsing; Anonymizer and the defunct SafeWeb attempted that. They are... Len and I argued about putting SafeWeb on there, because he said, don't you dare put Anonymizer and SafeWeb in the same category, because SafeWeb has all these attacks and Anonymizer doesn't. And I said, well, this is the theory side. Yes.

So from my perspective, they are identical systems. We'll get in part two to why theory isn't quite enough. Thank you, Roger. The moral of that story is JavaScript is evil. The problem with Anonymizer, Orangatango, et cetera, the other currently running single-hop proxies for the web, is that they are a single point of failure.

If a single-hop proxy were to be compromised, conceivably they could watch the traffic and know who's doing what. There are also legal compromises there: if they're keeping logs, or if they're ordered to keep logs, then they could give that up. And as a user, you have no way of knowing that they're actually honest. They can say they don't keep logs, and you can look at, for instance, Anonymizer's reputation.

Their company is based on not keeping logs and so forth, and on not revealing their users. But in the end, you're trusting a third party entirely. However, there is, of course, the advantage of ease of deployment, ease of use, and reliability. And we'll get into why those are critical security and anonymity aspects in part two. So there are free-route mix networks, which is really what we're talking about when we talk about mix networks with regard to Mixmaster and so forth.

The user picks a path through the network, and your whole intention here is to hide how this message travels, so you can't link sender and receiver. We need dummy traffic, as we discussed earlier. Examples of this are Babel, which was a research project, Mixmaster, and Mixminion. And then we have mix cascades. Okay, so there are a bunch of different topologies you can look at.

You can look at free route, where Alice chooses the path that she's going to take. Or you can look at a cascade, where everybody goes through mix 1, 2, 3, 4, and out. And this sort of forces everybody into the same batch, the same crowd. So 100 messages come in to node 1, node 1 shuffles them around and sends them all to node 2, node 2 shuffles them, and so on. So the idea here is that any one node that refuses to reveal the linkability between a message in and a message out is sufficient to guarantee your anonymity, except that the long-term intersection attack I talked about earlier still works like a charm, just like normal.
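A minimal sketch of the batching behavior being described, in Python (batch size and names are arbitrary):

```python
import random

class CascadeNode:
    """One node in a cascade: collect a fixed-size batch, shuffle it,
    then flush it all at once, so an observer can't match a particular
    input message to a particular output message."""
    def __init__(self, batch_size=100):
        self.batch_size = batch_size
        self.pool = []

    def receive(self, msg):
        self.pool.append(msg)
        if len(self.pool) < self.batch_size:
            return None                    # keep holding until the batch fills
        random.shuffle(self.pool)          # destroy the arrival ordering
        batch, self.pool = self.pool, []
        return batch                       # the whole batch goes to the next node
```

Chain a few of these and you have the cascade; one honest shuffle anywhere in the chain breaks the in/out linkage for that batch, which is the guarantee being described.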

So the anonymity here comes from the users. It doesn't come from more nodes. It comes from having more people send in to the beginning. If you've got 10 nodes or 5 nodes, it doesn't matter, as long as at least one of them is honest. The cascade also assumes a global adversary, whereas the notion of the free-route idea is, well, the message goes through a lot of different jurisdictions, and the adversary probably doesn't own all of them, so he probably can't even see where it goes, and we'll be okay.

Whereas in the cascade idea, if you watch the first node and the last node, then you know all the senders and all the receivers, and our goal is just to keep you from figuring out which is which. Trickle attacks work great against a cascade unless you try to stop them, because there are 100 messages coming in, and you just sort of let one message in, send your own 99, the batch fires, it goes through, 100 messages come out, 99 of which you know, and you win. It's pretty easy to watch the endpoints, because it's only two nodes, compared to the 50 that you might want to watch for Mixminion. The only deployed examples of these tend to be for web browsing, for low-latency systems, and, well, we'll get into that later.
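That let-one-in trick is often called the n-1 attack in the literature; here's a toy Python version where the adversary owns 99 of the 100 slots in a batch:

```python
import random

BATCH = 100
pool = [("attacker", i) for i in range(BATCH - 1)] + [("victim", 0)]

random.shuffle(pool)   # the mix shuffles faithfully...

# ...but the adversary recognizes his own 99 messages on the way out,
# so the one remaining message must be the victim's.
print([m for m in pool if m[0] != "attacker"])   # [('victim', 0)]
```

Okay, so a totally different design here is called Crowds.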

It's for web browsing. The idea is a pretty weak threat model. You have a bunch of people in your crowd, maybe 15, 20 people, and your goal is to not let the web server know that it's you. You just want the web server to know that it's somebody in your crowd. So the idea is, when you want to do a web request, you flip a coin, and if it's heads, you go to the website.

And if it's tails, you pick a random guy, and then he repeats. So maybe, depending on which probability you pick, you can choose the expected path length of three or four or five or whatever through the crowd. So the idea there is your request bounces around, and once you've got your fixed path, then you just use it for all your web browsing. That guy is you as far as the web server is concerned. And the reason for that is a statistical attack that is a bit complex, so I won't get into it here.
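Here's a small Python sketch of that coin-flipping path construction; the forwarding probability of 3/4 is, I believe, the value the Crowds paper used, but treat it as an assumption:

```python
import random

FORWARD_PROB = 0.75   # assumed value of p_f; pick it to tune path length

def build_crowds_path(members, initiator):
    """At each member, flip a biased coin: forward to a random member
    with probability p_f, otherwise submit the request to the server."""
    path = [initiator]
    while random.random() < FORWARD_PROB:
        path.append(random.choice(members))   # picking yourself is allowed
    return path                               # the last hop is what the server sees

crowd = [f"member{i}" for i in range(20)]
path = build_crowds_path(crowd, "member0")
print(path, "-> server sees:", path[-1])
# The number of forwarding hops is geometric: on average p_f/(1-p_f) = 3 here.
```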

But the basic idea is the web server doesn't know who in the crowd actually originated the request. On the other hand, there's no encryption, there's no mixing, there's no nothing. So if you're a global adversary, you just watch everything. Some guy makes a request, he's the one who originated it. You know it.

That's it. And if you're just the web server, some guy in the crowd connected to you; who knows if it was the guy who connected to you that meant to, or what. Okay, there's the Freedom network, which Zero-Knowledge Systems had a while ago. It's also connection-oriented, like Crowds.

They paid ISPs to run Freedom nodes. And the idea there was you tunneled all the traffic: you put something on the user's machine to grab all the packets, and you take each packet, encapsulate it, and send it off. So in the case of Crowds, it's just for web browsing; this is for everything.

You can even ping through this thing, and it comes out the other side of the network. On the other hand, there weren't enough users to make it viable, so they're not running it anymore. Part two can be pretty easily summed up as why Freedom should never have been made. Onion routing is another design that's very similar to Freedom. You've got long-term connections between onion routers.

We're hoping to have volunteers run onion routers rather than pay them, like in Freedom's design. You might want link padding between the routers. Freedom actually found that it was very, very expensive to have link padding between the routers, and they're right. So onion routing probably doesn't want that either. An interesting thing to note for these low-latency systems is that we're aiming for security against traffic analysis, not traffic confirmation.

The trick there is that if the adversary sits at the beginning and sits at the end of the path, then he can use timing attacks and packet counting attacks to notice which one's which. In the case of email, it's okay if it takes five hours to deliver, and that five hours lets it mix a lot. In the case of a webpage, five hours is not good. So we want people to actually use the system, and that means that it has to be fast, and that means if you're watching both endpoints, then you win. So onion routing is trying to deal with a different threat model here.
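A toy version of that endpoint timing attack, in Python: compare the inter-packet gaps seen entering the network with the gaps seen leaving it. The jitter figure is invented, but a low-latency system can't add much more delay without becoming unusable.

```python
import random

def gaps(times):
    return [b - a for a, b in zip(times, times[1:])]

def mismatch(g1, g2):
    """Average difference between two gap sequences; small means linked."""
    return sum(abs(a - b) for a, b in zip(g1, g2)) / len(g1)

entry = sorted(random.uniform(0, 1000) for _ in range(50))        # ms timestamps
same_flow = sorted(t + random.uniform(0, 5) for t in entry)       # tiny added jitter
other_flow = sorted(random.uniform(0, 1000) for _ in range(50))

print(mismatch(gaps(entry), gaps(same_flow)))    # small: the flows match
print(mismatch(gaps(entry), gaps(other_flow)))   # large: unrelated traffic
```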

And users should maybe run a node, or anonymize their connection to the first node, for better privacy. Okay, so we've got a little bit of time to run through some controversial design issues. One of them is cascades versus free routes. If you talk to Andreas Pfitzmann, the professor in Germany who knows everything about anonymity, he explains it very clearly. If you use a free route, it's very complex.

Who knows? If you use a cascade, you have provable anonymity. Which should you pick? And we're still using free routes, because they're easier to actually implement on the internet and deploy with a bunch of different volunteers everywhere. So we need to investigate the anonymity that we're actually getting, because he will admit that you might get good anonymity, maybe even better anonymity out of free routes, but it's not provable.

Yes? Andreas Pfitzmann; ask me later, I'll spell it for you. Okay, synchronous versus asynchronous is another issue. Most of the arguments that Andreas has for why cascades are better than free routes are that all of the messages go through in synchrony. They all go through together.

It could be that if we could set up a synchronous free-route system, where you've got a bunch of different nodes and things can go all over the place, but they all sort of move forward in synchrony, and all the hundred messages that came in come out at the same time. Maybe every 30 minutes there's a deadline by which you have to have moved it one node forward in the path. Then maybe we can get the same advantages of cascades without the disadvantages of them.
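A minimal Python sketch of that synchronous free-route idea (node names, message counts, and the round length are all invented):

```python
import random

class SyncNode:
    """Hold all messages until the global deadline, then shuffle and forward."""
    def __init__(self):
        self.held = []
    def accept(self, msg):
        self.held.append(msg)
    def flush(self):
        random.shuffle(self.held)
        out, self.held = self.held, []
        return out

nodes = {name: SyncNode() for name in "ABCDE"}

# Every message picks its own free route; here, three distinct hops each.
for i in range(100):
    route = random.sample(list(nodes), 3)
    nodes[route[0]].accept((i, route[1:]))

# One global round, say every 30 minutes: gather every node's batch first,
# then redeliver, so each message advances exactly one hop per round.
in_flight = [msg for node in nodes.values() for msg in node.flush()]
for msg_id, rest in in_flight:
    if rest:
        nodes[rest[0]].accept((msg_id, rest[1:]))
```

Right now, we're using free routes, but if we find a cascade, why not put it at the beginning? That's potentially a good idea.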

Another question is, why not put a cascade at the beginning? Because then the adversary... So for choosing the k crossover points, it might be that we want to put a cascade at the beginning, because otherwise the adversary can say, oh, well, five went that way, five went that way, five went that way. I'll tag two out of each, and I'll try to recognize them at the crossover points. Whereas if you've got a cascade at the beginning, then 15 come out, and he has to tag, like, six and hope that it turns out to be two of each going into the free-route piece of it. Another answer: it could potentially be a really good idea to have a cascade in the middle, because then you get the advantage of mixing everything at one point, and you also have the advantage of lots of different jurisdictions around. So yeah, that's potentially a good idea.

On the other hand, it makes your path much longer. And as we'll see in part two, maybe we don't really want a very long path, because then users don't have reliability or speed or things like that. Okay, so another issue here, should you run your own node? Do you get better anonymity by running a remailer than you get by just being a user? Certainly against the intersection attack, you're doing a lot better, because a lot of messages come in and a lot of messages come out, and maybe one more comes out and nobody will notice as much.

So you can send more unobservably. And cover traffic helps here also, because you've got a bunch of other people sending through you, so you can receive messages addressed to you without people realizing that it was anything other than a dummy. This of course presumes that you're able to run a node within your threat model. If you're one of the human rights workers in some foreign country where anonymity is frowned upon, you're already in trouble just for running a node. So that only works in cases where your external circumstances allow it.

The China Issue

Yeah, the question: Brandon was talking about the super worm and the China problem yesterday, and the China problem is a big problem for us also. How do you have a user in China using the remailer system? He has to have our software to get his anonymity, and how does he get it, and how does he keep from getting thrown in jail when they find out that he has it, and so on. So that's yet another issue. Okay, so we've got sort of a teaser for the next talk, and also sort of a summary from this perspective. You can really look at anonymity from an economics perspective to get a sense of which parts ought to work, and things like that.

So the first thing to look at, unlike encryption, you actually need an infrastructure and a lot of other people for anonymity to work. In the case of security, in the case of confidentiality, Alice gets up one morning and says, I want to send a confidential message to Bob. I'm going to use his public key, I'm going to use this PGP software, I'm going to do it all by myself, and it'll work great, I trust it. So she uses the software, sends it off, it's encrypted, she gets the job done. She cannot wake up one morning and say, I want to send an anonymous message to Bob.

I'll just use this software here, and send it off to Bob, and it'll be great. You need an infrastructure of nodes that are willing to forward things for you, and you need other users. You're hiding in the other users. So this is different and harder than encryption. Particularly the issue of other users, what we really want are lots of users who are using this for legitimate reasons, whatever they may be, wherever they are.

The people who say, I don't need an anonymity system because I don't do anything that's sensitive, are the people you want using the system. You want to make it so that using an anonymity system does not mean that you are doing any of these activities that may be frowned upon wherever you're actually sitting. You want plausible deniability: you want to be able to say, well, the person who was using that to post anonymous cheesecake recipes, that was me, if you're ever put in that situation. So this leads into: you really need to make the system inviting to the people who don't particularly care about their anonymity, so that they will use it. If their desire for anonymity is only some small percentage of their overall concerns, you want to make sure you capture that anyway, so you get them into the system, and not make them go through a lot of effort to use a system that doesn't give them as much return as they're actually giving to the system.

Question? Yes, there are a number of research designs on that where, yeah, we'll get into that in part two because we don't have enough time. Come back after lunch. Yes. Yes.

Internal Messages versus External Messages

Well, because that's, again, the problem of internal messages versus external messages. If there's lots of white noise within the network, but only a few messages are actually being allowed to exit the network, and those are the legitimate messages, then that doesn't address the problem. For instance, if everyone using the network is a political dissident in China, it doesn't matter whether they know what you're actually saying. The fact that you're sending something at all says that you're a person on their bad-people list. You need to make sure that everyone using it is doing different legitimate things with it that are all, you know, basically indistinguishable, so you can't nail down one person as being the anonymous sender of this particular bit of information. And we can't really solve that with dummy traffic, because, again, dummy traffic is not any real message.

It's just a message saying, I'm a dummy, delete me, when it hits the end of its chain. It's harder than that because you need to get messages that actually look like the human meant to send them and wrote them at that time and so on. Anything automated can also be automatically detected and filtered out, and then you're left with the messages that were actually sent, and it's a little bit harder than what I just said. Maybe that's a better technique than not doing it. And actually, let's hold.

We're going to get into this more deeply in part two, and we want to finish up the slides here and get to questions, so let's hold that until then. So we need a lot of traffic in order to attract users who want that traffic. On the other hand, most people don't actually care. Most people don't actually want to use a complex system. Most people don't want to download new software that just provides a little bit more anonymity.

Maybe they can use Hotmail and they're happy. So we need to somehow provide value to those people so that we can get them as users, and then we might be able to attract the people that we're actually trying to write this for. That leads into the issue of weak security systems. They have other benefits. They're fast, or they're more reliable, or they don't require client software.

Any of the trade-offs that we need to make in order to make systems more appealing to users can end up getting more users using that system, and therefore, at the end of the day, possibly be stronger anonymity systems than the strong anonymity systems which require users to go through all these hurdles. We will discuss this extensively in part two. The key part here is it depends on what adversary you're up against. You may have a system which is weaker against the global adversary, but much stronger against other adversaries which may be more important to your users simply because of the larger number of users, the larger anonymity set. So the trick there is that that's very counterintuitive to the research world because we look at papers and we say, well, that's not secure.

Look at this attack I can do. And the answer from the legitimate world, from the real world, from the people actually using it, is: I don't really need to deal with that attack, because if I implement the fix that you describe, I will have no more users, and my anonymity will be gone for that reason. All of this is about tradeoffs. Every decision we make usually involves some sort of tradeoff, and we need to keep in mind that usability is a very important security factor, and anything that decreases usability needs to be justified against that tradeoff. So for the economists out there, or the P2P enthusiasts out there, I've got an interesting result.

Free Rider Problem and Incentives

Anybody working on peer-to-peer systems like Gnutella and so on, they say, there are these free riders. They keep using our system, and they're not running nodes themselves, they're not providing service themselves. They're free riding; they're bad. How do I get rid of the free rider problem? And in our case we say, we must have some free riding.

We must have some users who don't really care and don't want to provide anything to the system, using the nodes of people who do. So this is sort of a unique case in the peer-to-peer world, where we require free riding or we don't get our anonymity. And we don't want too much, obviously, because then you won't have enough nodes to handle the traffic. But you need some optimal amount. Yeah, with the free riding issue, it comes back to what we were talking about with synchronous and asynchronous batching and delivery.

Actually in that case, if you want synchronous, you want more free riding, because you have more messages per batch delivered at each remailer. Again, how many resources each individual node operator wants to contribute is a question of how many nodes you have per how many users. Really, the best thing is to have the minimal number of nodes possible that still provide a trusted chain, where each node has a high number of users. Of course that means somebody's donating resources into the system, and that's an issue we'll get into in part two as well. But you have certain people who particularly have a strong reason to want anonymity.

They're the ones who have the incentive to run nodes. If they're users themselves, they can be certain that the first node in their path is good. They basically can keep the global adversary from knowing when they're sending: they just leak it in with all the noise coming out of their node. And of course they have cover traffic from other users coming in through their nodes.

Also, it could be somebody who is not actually using the anonymity system, but wants to receive anonymous correspondence from somebody elsewhere who can't run a node: they set one up and invite them. Perhaps human rights organizations that want to encourage people to report abuses; they'll run nodes. So you come back to the issue of who's willing to donate what resources. There are certain key users who may be willing to put more into the system than they get out, because they do get it back in anonymity by inviting other users to use it for free. So in part two, after lunch, we're going to start looking at which of these attacks we actually see and which we don't see.

How much do they matter, how much damage do they do, and which ones should we actually try to fix rather than just saying, well, if we fix that then we're going to lose all our users. So stay tuned for that, and we've got some time for questions, I think. Okay, we will get into that more in part two, but hold that a second. Are there any other questions? I'll come back to that at the end of this so the other people can... Yes?

... Well, you would have to do certain things to defeat censorship within China. That's sort of an out-of-band issue. I could talk to you more about that afterward. Anonymizer is actually doing things in that particular arena. But it is a big issue because if you've got people in China saying, oh, no, no, no, you don't want to use that one, you want to use this one over here, then... And that's also covered extensively in part two of the talk, so come back after lunch.

P-F-I-T-Z-M-A-N-N. The question about spam basically is: if you're using the remailer network, you also have to burn CPU cycles. You need to create these messages. You need to send them. An entrance node will flag you if you're sending too many messages.

They will see that this is... they'll think it's a flooding attack. They have countermeasures in there, because when somebody sends lots of messages, they don't know or care if you're a spammer; you could be trying to flood the network to compromise anonymity. So you will have to trickle your messages in. Even if there are 50 nodes and you spread it out across those 50 nodes, most spammers send a blast of a million messages.

Each of those nodes is going to notice. So you need to trickle it in over time. And really, you could get your message out much quicker to many more people just by using an open relay in Korea. That's a good question. The spammer could then be introducing more messages into the network and they would look like legitimate traffic.

But again, this is a lot of work. Well, here's the... If it was so economical, why have they not done it? Yeah. Honestly, spam is cheap. Remailer nodes are not cheap.
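For flavor, here's the kind of per-sender throttling an entrance node might apply. This is a generic token bucket with invented numbers, not the actual Mixmaster countermeasure: a spammer's blast exhausts the allowance immediately, while trickled traffic fits under it.

```python
import time

class EntryNode:
    """Toy per-sender rate limit at a remailer entry node (token bucket)."""
    def __init__(self, rate_per_hour=10, burst=5):
        self.rate = rate_per_hour / 3600.0
        self.burst = burst
        self.buckets = {}     # sender -> (tokens, last_timestamp)

    def accept(self, sender, now=None):
        now = time.time() if now is None else now
        tokens, last = self.buckets.get(sender, (self.burst, now))
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens < 1:
            return False      # flagged: too many messages, possible flood
        self.buckets[sender] = (tokens - 1, now)
        return True

node = EntryNode()
blast = [node.accept("spammer", now=1000.0) for _ in range(20)]
print(blast.count(True))      # only the burst allowance gets through
```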
