Hidden Services Specifications for anonymous seeding

Rob Ruigrok edited this page Aug 21, 2015 · 14 revisions

Hidden Seeding Services Specifications

The design specification described here is largely based on the ideas and excellent work of the people behind the Tor Project. Tor hidden services are the leading solution for anonymous web hosting, but they are too slow for video streaming of the kind YouTube offers. Tor also depends on a number of 'trusted' central directory servers. Our approach uses UDP and does not rely on central directory servers.

Circumventing central directory servers

In the original Tor design, central directory servers called HSDirs are used to retrieve information about a hidden service, such as the service key and the public keys of the introduction points. In a peer-to-peer environment no such central server exists: the protocol works in a fully decentralized network of BitTorrent peers. Our solution for retrieving the information needed to connect to a hidden seeding service is to add an extra message to the protocol: key-request. Once an anonymous torrent has been announced in the DHT, the introduction point is asked, by means of a key-request message, for information such as the public key of the seeder's session. Both the seeder and the downloader use circuits for this key-request mechanism, but in the current implementation the introduction point itself knows which info hash is shared and what the rendezvous point will be. This is a known weakness in the protocol, to be addressed in future work once opportunistic encryption in a web of trust is a reality.

Circuit setup

The introduction points and the rendezvous point used for downloading over hidden seeding services must always be connectable, so that a downloader can build a circuit and connect to the introduction point of the seeder, and the seeder can build a circuit to the rendezvous point of the downloader. The current approach in the tunnel community is to require every exit node in the network to be connectable. In that case the connectability of the introduction point is guaranteed, because the introduction point is in fact the exit node of a circuit initiated by the seeder. The same holds for the rendezvous point, which is in fact the exit node of a circuit initiated by the downloader. The connectability problem is not quite the same for the two, however. The introduction point always needs to be connectable for strangers. An unconnectable rendezvous point, by contrast, can be reached by puncturing: the hop that needs to connect to the rendezvous point propagates its identity back to the seeder, the seeder relays it to the downloader via the circuit to the introduction point, and the downloader passes it on to the rendezvous point, which in turn sends a dispersy puncture to that last hop. This only works for the rendezvous point, because a circuit already exists at that stage. The introduction point will always need to be connectable from the outside world.

Dispersy message cells

Most message cells carry a circuit_id and an identifier in their payload. These are required to identify the circuit to which a particular message belongs, and to acknowledge messages. The circuit_id field is 4 bytes long and the identifier field is 2 bytes long. The following sections describe a scenario in which Bob owns some files and wants to share them peer-to-peer over the BitTorrent network. Alice is interested in downloading these files.
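The two fixed-size fields can be illustrated with a minimal sketch. The field order (circuit_id first), the unsigned interpretation and the network byte order are assumptions made for this example; the authoritative layouts are the ones in the figures referenced below. The helper names are hypothetical.

```python
import struct

# Assumed layout: 4-byte circuit_id followed by a 2-byte identifier,
# both unsigned, in network (big-endian) byte order without padding.
CELL_PREFIX = struct.Struct("!IH")


def pack_prefix(circuit_id, identifier):
    """Serialize the circuit_id and identifier fields (6 bytes in total)."""
    return CELL_PREFIX.pack(circuit_id, identifier)


def unpack_prefix(data):
    """Split a cell into (circuit_id, identifier, remaining payload bytes)."""
    circuit_id, identifier = CELL_PREFIX.unpack_from(data)
    return circuit_id, identifier, data[CELL_PREFIX.size:]
```

A receiver would call unpack_prefix on an incoming cell and use the circuit_id to select the circuit and the identifier to match the cell against an outstanding request.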

Setting up Introduction Points

In preparation for seeding files over the BitTorrent network, Bob builds at least one anonymous circuit so that another node can serve as introduction point for his seeding services. The original info hash of the torrent is prepended with the string "tribler anonymous download", and the SHA1 hash is calculated over the resulting string, yielding a modified info hash that is used for finding hidden services. For each file Bob is seeding, a unique keypair is generated and stored in the session.
By sending an establish-intro message with the modified info hash of the torrent file to the last hop of a circuit, Bob turns this last hop into an introduction point for the modified info hash. The payload byte format of the establish-intro message is shown in figure 4.3.
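The derivation of the modified info hash can be sketched as follows. This example assumes the info hash is supplied as its raw 20-byte digest and that the prefix is plain ASCII; an implementation might instead hex-encode the info hash before hashing, so treat the exact encoding as an assumption. The function name is hypothetical.

```python
import hashlib

# Prefix named in the text; the ASCII encoding is an assumption.
PREFIX = b"tribler anonymous download"


def modified_info_hash(info_hash):
    """Return SHA1(prefix + info_hash) as a 20-byte digest.

    Assumes info_hash is given as raw bytes.
    """
    return hashlib.sha1(PREFIX + info_hash).digest()
```

Because the derivation is deterministic, a seeder and a downloader who both know the original info hash arrive at the same lookup key, while the DHT only sees the hashed value.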

After receiving an establish-intro message, the introduction point responds to the seeder with an intro-established acknowledgement message. The introduction point also announces the torrent's info hash on its dispersy port in the Mainline DHT. This way, the DHT can be queried to return introduction points for a given info hash. The intro-established message does not have any additional payload. The payload is shown in figure 4.4.

Finding peers to download from

When Alice knows the info hash of the torrent file seeded by Bob, she can calculate the modified info hash by prepending "tribler anonymous download" and calculating the SHA1 hash over the resulting string. By querying the DHT for the modified info hash, she obtains a list of introduction points that includes the one chosen by Bob. A direct DHT lookup by Alice, however, would reveal her interest in the torrent file and compromise her privacy. To prevent this leak, Alice asks for peers over any circuit from the pool: she sends a dht-request message with the modified info hash over this circuit. The byte format of the payload is shown in figure 4.5.

The last hop receiving the dht-request cell will do a DHT lookup for the info hash, and the returned peers are packed into a dht-response cell. The byte format of the payload is shown in figure 4.6.

When Alice receives the dht-response message, she will likely find the introduction point chosen by Bob in the list of peers.

Getting keys

Once Alice knows the location of Bob's introduction point, she needs the session key for the info hash seeded by Bob. She sends a key-request message to the introduction point, which forwards this message to the seeder. The byte format of the payload is shown in figure 4.7.

Note that the circuit-id is omitted in this cell. This is by design: the key-request cell leaves the exit socket of an existing circuit as a dispersy message, with the introduction point as its final destination. The introduction point therefore does not receive the message over a circuit, and no circuit is involved. The byte format of the payload stays the same after relaying by the introduction point. The only difference is the cache identifier, which is replaced by an identifier pointing to the relay message cache of the introduction point. On receiving the key-request cell, the seeder replies with his session key for the info hash in a key-response cell. This cell contains the session key chosen by the seeder for the torrent being seeded, and an encoded list of other peers and keys the seeder already knows about. The byte format of the payload is shown in figure 4.8.
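The identifier rewriting at the introduction point can be sketched with a small relay cache. The class, its method names and the restoration of the original identifier on the return path are assumptions for illustration; the text only states that the identifier is replaced by one pointing into the relay message cache.

```python
import random


class RelayCache:
    """Hypothetical relay message cache kept by an introduction point."""

    def __init__(self):
        # relay identifier -> (original identifier, address of the requester)
        self._pending = {}

    def add(self, original_identifier, return_address):
        """Store a forwarded request and return the replacement identifier."""
        # Pick an unused value that fits the 2-byte identifier field.
        while True:
            identifier = random.randint(0, 0xFFFF)
            if identifier not in self._pending:
                break
        self._pending[identifier] = (original_identifier, return_address)
        return identifier

    def pop(self, identifier):
        """Look up and remove a pending entry; returns None if unknown."""
        return self._pending.pop(identifier, None)
```

In this sketch the introduction point calls add when it forwards a key-request to the seeder, and pop when the matching key-response arrives, which tells it where to send the response and which identifier the requester expects.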

When Alice receives the key-response message, she has all the information needed to create an end-to-end encrypted circuit to Bob. If Bob included additional peers in the payload, she can also initiate end-to-end encrypted circuits to those other seeders (analogous to PeX).

Create end-2-end

Alice sends a create-e2e message over a circuit. The message leaves the circuit through its exit socket and is delivered to the introduction point of Bob. The byte format of the payload is shown in figure 4.9.

If the introduction point recognizes the info hash in the create-e2e message, it relays the message onto the introduction circuit leading to Bob.

Requesting end-to-end connection

Bob establishes a rendezvous circuit to a rendezvous point (RP). This circuit is required to end in a connectable hop, as the rendezvous point must accept inbound connections from the downloader's circuit. After building the circuit, Bob sends an establish-rendezvous message over it to the last hop, with a randomly chosen single-use rendezvous-cookie as payload. The rendezvous point then waits for an inbound connection with a valid cookie in order to link the end-to-end circuit. The byte format of the payload is shown in figure 4.10.

After receiving the establish-rendezvous message, the node is marked as a rendezvous point and associates the circuit on which the message arrived with the received rendezvous-cookie. It then replies to Bob with a rendezvous-established message. The byte format of the payload of this message is shown in figure 4.11.

When Bob receives the rendezvous-established message, he can acknowledge the create-e2e message with a created-e2e message. This message contains all the information Alice needs to build a circuit ending in the rendezvous point chosen by Bob. It is sent over the circuit ending in the introduction point of Bob. The introduction point looks up the corresponding circuit identifier in the payload and relays the message to the exit socket of the circuit initiated by Alice. The byte format of the payload for this message is shown in figure 4.12.

Linking end-to-end connection

When Alice receives the created-e2e message, she builds a new circuit ending at the rendezvous point of Bob, and sends a link-e2e message along this circuit. The byte format of the payload is shown in figure 4.13.

When the rendezvous point receives a link-e2e message and the rendezvous-cookie in the payload matches the cookie that the seeder communicated earlier, the two circuits are combined into one circuit: inbound and outbound data is relayed between the two circuits, which take the place of the exit sockets. The linking of the circuits is acknowledged by a linked-e2e cell back to Alice, without additional payload. The byte format of the payload is shown in figure 4.14.
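The cookie bookkeeping at the rendezvous point can be sketched as below. The class, the method names, the 20-byte cookie length and the use of plain integers as circuit handles are all assumptions for illustration; the sketch only mirrors the matching and linking logic described above.

```python
import os


def new_cookie(length=20):
    """Generate a random single-use rendezvous-cookie (length is assumed)."""
    return os.urandom(length)


class RendezvousPoint:
    """Hypothetical bookkeeping for cookies and linked circuits."""

    def __init__(self):
        self._waiting = {}  # cookie -> circuit id of the seeder
        self.links = {}     # circuit id -> circuit id it is linked to

    def on_establish_rendezvous(self, seeder_circuit, cookie):
        # Remember which circuit is waiting for this cookie.
        self._waiting[cookie] = seeder_circuit

    def on_link_e2e(self, downloader_circuit, cookie):
        """Link both circuits if the cookie is known; True on success."""
        # pop() makes the cookie single-use: a replay finds nothing.
        seeder_circuit = self._waiting.pop(cookie, None)
        if seeder_circuit is None:
            return False
        self.links[downloader_circuit] = seeder_circuit
        self.links[seeder_circuit] = downloader_circuit
        return True
```

After a successful link, data arriving on either circuit would be relayed to the circuit found in links rather than to an exit socket.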

When Alice receives a linked-e2e message, the handshake is complete. Alice starts the download via a libtorrent session that receives its data from Bob over the circuit, via the rendezvous point. This circuit is end-to-end encrypted between Alice and Bob, making it impossible for outsiders to see what data is transferred over the circuit. Moreover, Bob and Alice communicate with each other without knowing each other's real identity.
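As a recap, the message order from the walkthrough above can be written down as data. The sender and receiver labels are a reading of the step-by-step sections, not an additional specification, and the helper name is hypothetical.

```python
# (message, sender, receiver) in the order the walkthrough introduces them.
HANDSHAKE = [
    ("establish-intro", "seeder", "introduction point"),
    ("intro-established", "introduction point", "seeder"),
    ("dht-request", "downloader", "exit node"),
    ("dht-response", "exit node", "downloader"),
    ("key-request", "downloader", "seeder via introduction point"),
    ("key-response", "seeder", "downloader"),
    ("create-e2e", "downloader", "seeder via introduction point"),
    ("establish-rendezvous", "seeder", "rendezvous point"),
    ("rendezvous-established", "rendezvous point", "seeder"),
    ("created-e2e", "seeder", "downloader via introduction point"),
    ("link-e2e", "downloader", "rendezvous point"),
    ("linked-e2e", "rendezvous point", "downloader"),
]


def next_message(current):
    """Return the name of the message following `current`, or None at the end."""
    names = [name for name, _, _ in HANDSHAKE]
    index = names.index(current)
    return names[index + 1] if index + 1 < len(names) else None
```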

Dispersy cell types

To set up a hidden seeding service, tunnels from different entities have to be created in parallel with each other. Table 4.1 explains which messages are transferred for setting up a connection between a seeder and downloader.

Figure 4.15 provides a schematic overview of the message interaction of all dispersy message cells in the protocol.