To understand how Hive2Hive works under the hood, it is necessary to understand the overall concept. This page explains how the components are associated and interact.
Don't be shocked, it's actually pretty easy!
- Distributed Hash Table (DHT)
- User Management
- File Management
- Hive2Hive Model
Distributed Hash Table (DHT)
A DHT is a network that different machines, called peers or nodes, can join. A DHT is a so-called structured overlay network because all peers together form a ring (or tree) structure. Every peer receives an ID that is used to find and locate it within the DHT. With this, data objects can be stored in a distributed manner by asking peers to store them. Thus, every peer can store and retrieve data from any other peer. To store a data object, it is assigned an ID within the same range as the peer IDs. Sharing the same ID range makes it easy to assign responsibility for data objects: for any given data object ID, the peer with the closest peer ID is responsible for storing and replicating that object.
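The closest-ID responsibility rule can be sketched as follows. This is a minimal illustration assuming plain numeric IDs and absolute distance; real DHTs such as TomP2P use 160-bit keys and an XOR-based distance metric.

```java
// Sketch of DHT key responsibility: the peer whose ID is closest to a data
// object's ID is responsible for storing and replicating that object.
// Assumes simple numeric IDs and absolute distance (illustrative only).
public class DhtResponsibility {

    // Returns the peer ID closest to the given object ID.
    public static long responsiblePeer(long[] peerIds, long objectId) {
        long best = peerIds[0];
        for (long peer : peerIds) {
            if (Math.abs(peer - objectId) < Math.abs(best - objectId)) {
                best = peer;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        long[] peers = {10, 42, 90};
        // An object with ID 40 lands on peer 42, the closest peer ID.
        System.out.println(responsiblePeer(peers, 40)); // prints 42
    }
}
```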
To see how Hive2Hive uses such DHT technology, please refer to the TomP2P wiki section.
The central element of the user management in Hive2Hive is the user profile. This user profile contains all relevant information about a user in the network. Thus, a user's profile has little in common with the profiles found on social platforms like Facebook. On the contrary, it must be kept private!
A user profile holds the following information:
Although the user profile itself must remain private, the user ID can be made public, e.g., to retrieve invitations.
Every Hive2Hive user can potentially possess several client machines, and these clients may even be online at the same time. In order to know which clients of a user are online, a lookup mechanism for their respective locations is required. For this reason, a per-user list, called the user locations, is published publicly in the DHT, such that everyone can find it.
Thus, deriving the Location Key for these locations is easy and can be achieved by hashing the User ID.
The user location list contains a list of all online clients (IP address and port).
The user locations should always remain up-to-date. When a user's client logs in, a reference to it must be added to the list. When a user's client logs out, the reference must be removed (friendly logout). Since unfriendly logouts need to be considered as well, the user locations are checked and cleaned every time a client detects an inconsistent state.
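The login/logout bookkeeping described above can be sketched as a simple list. This is an assumption-laden illustration: a plain address string stands in for the real entry, which stores IP address and port.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of maintaining the public user-locations list: friendly login adds
// a client, friendly logout removes it, and a cleanup step handles entries
// left behind by unfriendly logouts.
public class UserLocations {

    private final List<String> onlineClients = new ArrayList<>();

    // Friendly login: add a reference to the client.
    public void login(String address) {
        if (!onlineClients.contains(address)) {
            onlineClients.add(address);
        }
    }

    // Friendly logout: remove the reference again.
    public void logout(String address) {
        onlineClients.remove(address);
    }

    // Cleanup after unfriendly logouts: drop clients that no longer respond.
    public void removeStale(List<String> unreachable) {
        onlineClients.removeAll(unreachable);
    }

    public List<String> getOnlineClients() {
        return onlineClients;
    }
}
```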
The user locations are used for the following:
In order to keep track of a user's files, her User Profile contains a tree of indices. This index tree mirrors the file tree on the user's local disk.
For every file or folder that gets stored with Hive2Hive, an index is created:
A file index comprises the following:
MD5 Version Hash
Each file index also contains an MD5 hash of the newest version of the associated file. This hash greatly simplifies the synchronization process: comparing changes on the local disk and in the network is much faster than re-hashing all files every time a comparison takes place. As a drawback, however, the hash needs to be updated as soon as the content of the associated file changes.
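The hash-based change check can be sketched as follows, assuming the file content is available as a byte array. Comparing the stored hash against a freshly computed one tells the client whether the file differs from the indexed version.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Sketch of the MD5-based synchronization check: a file has changed when its
// current hash no longer matches the hash stored in the file index.
public class VersionHash {

    public static byte[] md5(byte[] content) {
        try {
            return MessageDigest.getInstance("MD5").digest(content);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always available
        }
    }

    // True if the local content no longer matches the stored hash.
    public static boolean hasChanged(byte[] storedHash, byte[] localContent) {
        return !Arrays.equals(storedHash, md5(localContent));
    }

    public static void main(String[] args) {
        byte[] stored = md5("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(hasChanged(stored, "hello".getBytes(StandardCharsets.UTF_8)));  // false
        System.out.println(hasChanged(stored, "hello!".getBytes(StandardCharsets.UTF_8))); // true
    }
}
```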
In contrast to files, a folder can be shared. Thus, a folder index holds the following:
- An Authentication key pair:
- Sharer List
In order to keep track of all users that share the associated folder, a list of users having access to the folder is kept in the folder index. This list is considered when sending Notifications as soon as a file within the folder has been added, updated, moved or deleted.
A meta file is a separate object in the DHT. It contains the following meta information about the associated file:
A file version represents a single version in time of a file. It contains the following information:
- Version Counter
- File Size
- List of File Chunks
For a better distribution of all data in the network, files are chunked to a user-configurable size. All chunks of a file are encrypted with the same public key that is stored in the Meta File. The Location Key of each chunk is randomly generated, ensuring a uniform distribution among all peers in the DHT.
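The chunking step can be sketched as below, assuming the file content is held in memory as a byte array and the chunk size is the user-configurable value mentioned above. Encryption and the random location keys are omitted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of splitting a file's bytes into fixed-size chunks; each chunk would
// then be encrypted and stored in the DHT under a random location key.
public class Chunker {

    public static List<byte[]> split(byte[] data, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += chunkSize) {
            int end = Math.min(offset + chunkSize, data.length);
            chunks.add(Arrays.copyOfRange(data, offset, end));
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] data = new byte[10];
        // 10 bytes at a chunk size of 4 yield chunks of 4, 4 and 2 bytes.
        System.out.println(split(data, 4).size()); // prints 3
    }
}
```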
Chunks are the essential data parts in the DHT. To achieve efficient replication, the chunk size should be rather small and proportional to the assumed bandwidth. The bandwidth may differ per application: some need to synchronize chunks over the Internet, while others act only in a LAN with higher throughput, lower error rates, and lower latency.
This class diagram shows the components mentioned above and their relations. All elements inherit from NetworkContent, which ensures the proper handling of serialization, versioning, conflict handling, and time-to-live.