Design Document

The Lethe Project

1. Purpose

The Lethe Project is the design and implementation of a decentralized network of highly-configurable nodes. Two primary services are provided by the network. The first is secure and durable data storage. The second is secure communication. While security is one of the core design considerations of this project, the full anonymization of users is considered outside its scope, and is a responsibility left to the user.

2. Design

The Lethe Project is a decentralized network of nodes. Each node has its own unique identity, established using OpenPGP. Following, each node has a Ed448 key pair, of which the private key is used to generate signatures for all transmitted data. Nodes can also sign the public key’s of other nodes, which is the processed that defines the network.

2.1 A Trust-Based Network

A node’s network is the set of nodes that are within its web of trust and that it has not flagged as excluded. It is with this network that the node can utilize the various functions of the Lethe Project, such as messaging and file storage.

2.1.1. Defining Trust

When a node signs the public key of another node, the former node is consider to trust the latter. Specifically, it directly trusts the other node. If the other node trusts a third node, then the first node is considered to indirectly trust the third node, assuming the first node is configured to accept such a degree of separation of trust. Trust can be revoked using revocation signatures.

Figure 1. An example of a network.

In the example in figure 1, Alice and Bob are in direct mutual trust with each other. Bob directly trusts Carol and Carol directly trusts Alice. Direct trust is considered to be a degree of separation of one. If Alice is to trust Carol, directly that is, and Carol is to trust Bob, then both Alice and Carol must accept a degree of separation of at least two. Assuming that is true, Alice's network consists of Bob and Carol; Bob's network consists of Alice and Carol; and Carol's network consists of Alice and Bob.

2.1.2. Network Communication

Nodes that directly trust each other send requests and responses to each other directly. If a node does so with a node it indirectly trusts, it does so indirectly. Specifically it tunnels the request or response through a path of nodes starting with one it directly trusts and ending with the target node. The purpose of this tunneling is to allow for some level of anonymity, in terms of IP addresses, between nodes of indirect trust. Determining the path for communication is done using Dijkstra’s algorithm.

2.1.3. Trusting a Node

The process of one node trusting another is as follows: (1) the first requests the second’s public key, (2) signs the key with its private key, and (3) sends the signature to the second node. This can be done directly, or through the nodes already in the network. The first node must retain the second node’s public key and the signature it generated and provide them too all nodes that request them. This allows for the discovery of other nodes.

2.1.4. Trust Revocation

A node can choose to revoke its trust of another node. The process is as follows: (1) generate a revocation certificate for the target node’s public key and (2) send it to the target node. The node doing the revocation must keep this certificate and provided to any node when requested.

2.1.5. Exclusion Flagging

A node can flag nodes in their network as excluded. This does not constitute as revoking trust. Rather, it is used to ensure the durability of data the node will be storing on their network.

Figure 2. A problematic network structure.

In example in figure 2, there are two network structures, A and B, which contain some set of nodes each. All of the nodes in the structures, Alice, and Bob trust each other. Let there be a node in structure A named Carol and that they have data stored in structure B. If either Alice and Bob were to revoke there trust of the other, Carol would lose access to their data in structure B. If instead, Carol had flagged either Alice or Bob as excluded, prior to either of them revoking their trust of the other, Carol would have not lost any data, as it could not have been stored in structure B prior. Alternatively, some of the nodes in structure A and B could trust some of the nodes in the other, increasing the number of paths between the two structures, if not merging them entirely.

2.2. File Storage

Nodes may store files on their network. These files processed using compression and/or encryption as configured by the user. They are fragmented into shards. These shards are duplicated for the sake of durability and then stored amongst the nodes of the network.

2.2.1. Shards

To store a file on the network, a node first processes its name and contents through some user-configured sequence of transformations; compression and encryption is suggested. The processed file content data is then split into fragments of a 100 MB in size. The last fragment will typically be small than 100 MB, as it will not be padded. The fragments are then encapsulated in shards.

name	file_name	contents	transformation_uuid	file_uuid	shard_uuid	created	signature	checksum
size	?	<= 100 MB	16 Bytes	16 Bytes	16 Bytes	64 Bytes	114 Bytes	32 Bytes

Figure 3. The Design of a shard.

All UUIDs in a shard are generated using UUID V4. The transformation_uuid corresponds to the user-defined sequence of transformations used to compress and encrypt the file_name and contents fields. This is to allow the node that generated the shard to identify what transformations were used, but to obfiscate such information from other nodes. The file_uuid field corresponds to the file and the shard_uuid corresponds to the shard. The created field describes the number of milliseconds between the creation of the shard and 1 January 1970. The signature, using the node’s Ed448 private key, is generated from the concatenation of the file_name, contents, transformation_uuid, file_uuid, shard_uuid, and created fields in that order. The checksum is generated from the concatenation of the file_name, contents, transformation_uuid, file_uuid, shard_uuid, created, and signature fields in that order.

2.2.2. Shard Duplication

The formula $f(x)=floor(min(4, x) + max(0, Glog_10(x) - G))$, where x is the number of nodes in the network and G is the growth factor, is used to determine the number of duplication.

Figure 4. A graph of the function for shard duplication from 1 node to 100, where G = 4.

The formula for determining the number of duplicate shards is completely arbitrary. What nodes are chosen is random, constrained by two conditions: will this exceed the storing node’s storage quota and will this exceed the storing node’s storage bandwidth. If either of these are true, then the node is not considered during selection.

2.2.3. Storage Quota

Every node provides some fraction of their storage capacity, a storage quota, to the nodes in their network. In terms of shards, the formula for the storage quota is $g(x) = S / x$, where x is the number of nodes and S is the total storage capacity of the node. To account for shard duplication, the functional storage quota of each node on the network is $h(x) = g(x) / f(x)$.

Figure 5. The total storage quota of a node in a network, $j(x)=Sx/x/f(x)$, where $S = 100$, $G = 4$, and x is the number of nodes.

In the example in figure 5, despite the total number of nodes increasing, the storage quota for each node decreases. This is because the growth factor G, is greater than 0. If it was equal to 0, the storage quota would flatten at x = 4 onward. The following section describes an approach to mitigate this.

2.2.4. User-less Nodes

A node has the ability to flag itself as user-less. This means it will not be capable of using the messaging or file storage services of the network. However, it will still provide itself for those services. Thus, for the formula for shard duplication, x refers to the number of nodes minus the number of user-less nodes. The formula for determining a node's file storage quota, in terms of shards, on a network should then be $j(x, x_u) = Sx / x_u / f(x_u)$, where x is the number of nodes, x_u is the number of user-nodes, and S is the storage capacity of every node.

Figure 6. Example of each node's storage quota, in terms of shards, where $S = 100$, $G = 4$, and $x_u = log_2(x) + 1$.

The example in figure 6 shows that if user-less node growth is linear and user node growth is logarithmic, then the storage of each node in a network will increase as the number of nodes increases.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!