reduce memory usage of bittorrent v2 merkle trees #4373
I'm leaning towards an experiment with (2). This would also clean up the ownership of the merkle trees. Currently, they are owned by the …
Typically the vast majority of hash requests will be for the piece layer, so I agree option 2 is a good idea. I'm skeptical about moving the merkle trees out of …
Binary trees can be stored quite easily in breadth-first order in an array. Since the padding required in merkle trees is at the rightmost leaf nodes, those nodes don't need to be stored. Consider a worst-case example of 8+1 leaf nodes. Assuming 20 bytes/hash, that gives a minimum of 180 bytes for just the leaves/pieces. 8 nodes would result in a perfect tree of depth 3 with N=15, or 300 bytes (+66% over minimum). The 9th node adds the same number of nodes plus a new root, increasing the depth to 4. But since the remaining 7 leaf nodes are just padding, we don't need to store them. That results in N=24, or 480 bytes (+166% over minimum). This goes towards +200% worst case as the number of pieces goes towards infinity. Would that really be that big of a problem?
The benchmark is the memory usage of v1 torrents. A few things make this problem worse than with v1 torrents:

In v2 the hashes are larger than in v1 (SHA-256 vs SHA-1), so 32 bytes vs 20 bytes.

The info-dictionary is always kept in RAM to support magnet links (sending the metadata to other peers). In v1, the piece hashes are part of the info-dictionary, so they are "free": the hash checker can just look into the info-dictionary blob for those. In v2 the piece hashes (technically, the piece layer) are not part of the info-dictionary, but part of the outer torrent file. This means they have to be allocated separately. Not the end of the world; that memory would have been allocated as part of the info-dict blob anyway.

The root hashes for the files are part of the info-dict though, so they can be referenced directly from the blob. In the current implementation, the root hashes are copied into the "filled" tree, wasting 32 bytes per (non-pad) file.

Each file has its own tree, which means each file has its own padding. Empty files and pad files don't have a tree, but they still have an empty …

My ideas so far are: …
a binary tree in which every level is full except possibly the last, with all empty leaves grouped together at the end, is very simple to represent as a vector, where item [0] is the root and items [1] and [2] are its immediate children. The index in the vector maps directly to the ordering of leaf nodes, and the starting index of each level is always a power of 2 minus 1.
So you don't need any tree structure to store the tree; you can even keep the tree on disk instead of in memory, and direct access is still possible. For checking a file and locating an error, you can read just the minimum from the start of the hash vector with minimal I/O, easily locate which part has a wrong signature, then recurse down using sequential I/O from this vector until you reach the leaf level. All this is easily and efficiently cached. So all you need to keep in memory is the root hash; when there are many torrents, this will just require a few bytes per torrent, just the merkle tree root hash for that file. The cost will be of the same order as with BitTorrent v1, which just uses a single hash (20 bytes with SHA-1) for the whole file.
@verdy-p I don't understand what your point is. You are describing the data structure for a binary tree. Why?
No, I'm describing the storage as a single indexed vector, without any links between nodes; the vector has a predefined size determined by the number of blocks in the file and the size of each hash. This just creates a single integer-indexed blob.
I still don’t see your point. You are describing the current implementation. My point with this issue is that it’s using too much RAM.
Your padding algorithm is incorrect: for a tree with 9 leaf nodes (for 9 file blocks), you need to store 9 leaves. There is no padding to store at that level, which could hold up to 16 nodes. You can skip ALL padding at every level! This is a "worst case" (the worst cases are when there are 2^n+1 leaf nodes, the best cases when there are 2^n leaf nodes), but still the total is 20 nodes, NOT 24 as stated incorrectly above by xnoreq. As the number of leaf nodes grows towards infinity, this worst case tends towards +100% for storing the parent nodes. You then have (for a tree of degree 2): …
And as I stated, leaf nodes (because they represent smaller sections of files) do not require the same hash length as parent nodes. If you want to further reduce the overhead of parent nodes, you can reduce the hash size of the leaf nodes. SHA-1, or truncated SHA-2, is safe for small 4KB blocks of files larger than 4KB, given that you still use SHA-2 for their parent nodes in the merkle tree. This just requires that, when indexing files, blocks are hashed in parallel with SHA-1 and SHA-2, but you store only SHA-1 in the leaf nodes and use SHA-2 for all other parent nodes (or for the root node, which is also a leaf node when the file size is below 4KB).

The additional benefit is that using two different hashing algorithms also dramatically increases the resistance to collisions (better than using SHA-2 alone) if there are known patterns that are attackable with a single algorithm, so you also gain in terms of security. This is like when files are published with two hashes, SHA-1 and MD5: MD5 is known to have low resistance, SHA-1 as well, but not their combination. In a merkle tree you're ready to use two hashing algorithms in parallel; you just don't use the same algorithm across different levels. You only have to use the same algorithm within the same level of the tree, and you don't need to store the extra parallel hashes at higher levels, because they are replaced by the hashes at lower levels.

Now if you have two different algorithms (SHA-1 for leaves only, SHA-2 for all other parents), it is very safe to increase the degree of the tree, and you can save more on the total size of parent nodes. If leaf nodes cover 4KB and parent nodes cover at least 64KB, the degree of the subtrees at the level above the leaves is 16: a single SHA-2 hash is used in the blob for each range of 16 leaf blocks hashed with SHA-1.
For more levels (if they are needed) I suggest keeping the degree at 2 (and still using SHA-2 for combining hashes of nodes at the next level), to preserve the minimum size to re-download from sources if a corruption is detected; and the combination function to use should be an HMAC, possibly keyed by the starting position in the file, based on the appropriate degree and the hashing function (SHA-2).
Now we're getting somewhere. It sounds like what you mean to say is "you can save some memory by not storing zero-hashes at the leaf layer". There is no padding at any of the internal levels, so what do you mean by saving padding at every level? In either case, I still don't see how this addresses the problem. You seem to assume the RAM usage is dominated by padding nodes/hashes. I don't think that's the case. Yes, it doesn't make sense to store zeroes, but I don't think that's going to make a big difference. Do you see any reason not to implement the ideas I outlined above?
You mean in a world where BEP52 says something else, right? If you want to discuss BEP52, please do so here. |
And yes, I mean not storing any zero padding at EVERY level! It is NEVER needed.
@verdy-p It's funny that you wrote pages upon pages to explain something trivial which I had already mentioned before, very succinctly, in two sentences: …
Also, yeah, intermediate nodes, if they were zero, would not need to be stored (which, by the way, directly contradicts the suggested storage; compare with your own indexing examples), but this is not the case in bittorrent anyway, as explained in BEP 30.
This is the main piece of addressing this: #4626 |
libtorrent version (or branch): master
Currently, m_merkle_trees use more than 85% of the total memory usage of client_test when loading many torrents. The more torrents, the larger the share. I would like to come up with a more memory efficient representation. Some ideas:

1. consolidate the allocations of all the sha256_hash nodes in the trees into a single vector, and make the m_merkle_trees reference that storage. The hope is that the overhead of many small memory allocations would be saved.
2. change the data structure for the merkle trees so that not every layer is pre-computed, and asking for a hash would sometimes require some computation. Maybe it would even be acceptable to always recompute nodes (say, all nodes but the root and piece layer).
@ssiloti do you have any ideas?