Motivation
It is in this library's best interest to maximize speed/throughput so that it remains a viable alternative to other existing options: reducing round-trip requests, reducing latency, integrating intelligent request patterns, etc. Below is a non-exhaustive list of options that could be pursued.
The road to 1Gb/s
Some paths forward:
- [IN PROGRESS] [FETCH PARALLELIZATION] Request from multiple gateways at once. KuboCAS is a gateway store that accepts one gateway URL. It should be possible to provide multiple URLs and divide and conquer between them. A naive approach would be selecting a gateway at random, but in theory you can be "smart" about it and request different parts of the graph, or "race" the same request across gateways (a race sketch follows this list). Naive race here: Add multi-gateway support to KuboCAS #63 from Codex
- [CID RESOLUTION] As mentioned by @TheGreatAlgo, an alternative index can be used instead of a HAMT. Right now, if a HAMT is large (many key entries) and therefore deep, a request may require multiple traversals to get the CID you are interested in. In other words, if a HAMT node/bucket has many collisions, another request is needed to go "the next level down", and so forth for every chunk. This means there can be 3-4 round trips, which introduces significant latency per request. The alternative index proposed is one in which chunks are grouped in logical order. A key insight from @TheGreatAlgo is that, unlike a traditional map, zarr chunk keys are ordered and therefore form a predictable structure. As a result, it is known beforehand which keys live where, and a "shard" can be constructed that groups many chunk keys together, for example by simply concatenating many CIDs.
(Ctrl-F Temporal Shard Index Implementation here https://claude.ai/share/b87c1c46-c1e2-413b-ba5b-e9e36a5e2bb7 )
Currently the root node CID of the py_hamt structure points at key/value pairs that return everything the zarr needs, and the structure is traversed for everything it requires. Sometimes precip.json is at the first level (yay!), sometimes it's down at the 3rd level, so the HAMT has to be traversed to find it.
The nice thing with zarr, though, is that it is a predictable structure, and at its core all 3D arrays are stored one-dimensionally in memory. Memory doesn't have "3D" structure; it's just one really long row of bytes. Knowing this, we can apply something similar.
Say our zarr structure is 1440x770x365. For simplicity, let's slice along time.
We can split this into small chunks, say 40x40x1. Storing them in a row, we can predict exactly where 0/34/4 will be, and convert that key into a position.
{ "0/0/0": "qm...", "0/0/1": "qm..." } becomes "qm...qm..." because mathematically we know where each entry will be; there is no longer any need for a key/value store. Instead, if we want 0/0/0 we ask the first shard for its first 64 bytes, which will be the hash "qm...". A gateway handles this quickly because it supports byte offsets. Over bitswap there is one round trip for that year, but then the full year is cached for everything else. So the gateway path will be quicker (it can handle byte offsets), and bitswap will be slower on the first request (it needs to fetch and cache the shard), but either way it is only one predictable round trip (see the sketch below).
With py-hamt the layout is effectively random, so two nearby chunks might need two round trips, whereas this approach doesn't.
So, the speed difference:
Loading the zarr fetches roughly 80 "qm" values (shards) plus all its metadata; I suspect this is near-instant. Future appends work smoothly, since you just add a shard, which costs almost nothing.
Appending doesn't modify the first 79 shards, only the 80th, which in memory is only ~2 MB.
Asking bitswap for data in the first year loads the first shard (~2 MB) and with it all the qm values for that year; over a gateway you can byte-offset into it. One round trip per year, supporting a lot of async work.
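For concreteness, a minimal sketch of the offset arithmetic described above. The fixed CID width (64 bytes here, matching the example) and the chunk-grid shape are illustrative assumptions, not part of the current py-hamt code; the point is that a chunk key maps to a byte range by pure arithmetic, with no key/value lookup.

```python
# Sketch of the flat "concatenated CID" index described above.
# Assumptions: every encoded CID occupies a fixed CID_SIZE bytes, and chunk
# CIDs are laid out in row-major (C) order of the chunk grid within a shard.

CID_SIZE = 64                # assumed fixed width of one encoded CID
CHUNK_GRID = (36, 20, 365)   # illustrative: ceil(1440/40), ceil(770/40), 365 time slices

def chunk_cid_range(key: str, grid=CHUNK_GRID, cid_size=CID_SIZE) -> tuple[int, int]:
    """Byte range of the CID for a zarr chunk key like "0/34/4"
    inside a shard of concatenated, fixed-width CIDs."""
    i, j, k = (int(p) for p in key.split("/"))
    # row-major linear index over the chunk grid
    linear = (i * grid[1] + j) * grid[2] + k
    start = linear * cid_size
    return start, start + cid_size

# e.g. the CID for chunk "0/0/1" lives at bytes [64, 128) of the shard,
# so a single gateway range request for those bytes resolves it.
print(chunk_cid_range("0/0/1"))  # (64, 128)
```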
- [BLOCKED? - SUBCHUNKING] Consider grouping byte-range requests when reading adjacent chunks. Right now, if multiple requests are different byte ranges into the same block, that block may be requested multiple times. If so, we should figure out what those byte ranges are, group them per block, cache the block locally, and flush it once the data is no longer needed (a grouping sketch follows this list). Exploration here: https://chatgpt.com/share/683d4cba-2f44-8002-8a4e-719d0450b961 Mentioned in the ChatGPT o3 suggestion for Partial Reads in feat: get partials #51 (but can likely be done in general)
- Bitswap store vs. gateway content-addressed store. Currently the content-addressed store relies on requests going to a local or remote gateway, which means there is no control over data transmitting over bitswap (where there could be QUIC support). There is no Python IPFS node, but there may be ways to "communicate" with a local node without the gateway, instead via direct commands for bitswap purposes. Tools such as https://pypi.org/project/ipfs-toolkit/ or https://github.com/endomorphosis/ipfs_kit_py can assist here. Difficulty: Medium/High
- [CID RESOLUTION - REDUCE RTT] Graphsync. Currently, if a HAMT is deep, multiple back-and-forth round trips must occur to get the next bucket. In other words, if you make a request for a key and what comes back is instead a link, another request must be made to get the next level of the HAMT; in the worst case this can be 3-4 I/O requests for a very large dataset or an unlucky/deep key. As described in remove experimental graphsync server ipfs/kubo#9747, there were seemingly only 20-30 Kubo nodes signaling graphsync support, so it was removed at the time. However, a relatively simple IPLD selector that gives the node the logic to pull the deep key without any round trips back to the client is perfect for this use case. Difficulty: Medium (? Depends on IPFS to some degree, as we would not want to run a Kubo fork for this. The code already existed, so it would seemingly be a matter of re-integrating it, no new code necessary?)
- [FETCH PARALLELIZATION] RAPIDE - bitswap-like divide and conquer. Presentation: https://www.youtube.com/watch?v=Cv01ePa0G58 POC code: https://github.com/ipfs/go-libipfs-rapide Difficulty: High
- [FETCH PARALLELIZATION] A store that supports S3 and the CAS store together? In theory requests could be parallelized with this too. However, it would entail running an S3 service and losing the data sharing that comes from IPFS infrastructure. The need would only arise if there are datasets already living on S3 that we want to begin transferring over. Prerequisite: S3 Store Support #57 Difficulty: Medium
- [DATA TRANSPORT] For existing gateway requests, instead of the current HTTP/1.1 requests via aiohttp or upcoming HTTP/2 requests with httpx, make requests via HTTP/3. Should be faster and more resilient (UDP vs. TCP drop-off/waiting). Prerequisite: the IPFS Kubo gateway needs to support QUIC/HTTP3 https://chatgpt.com/share/683d475c-b814-8002-8ec1-47e75c97650d (need to check whether this exists). Difficulty: High (Kubo)
- [ ] Bitswap: send a large wantlist to individual nodes?
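To make the [IN PROGRESS] multi-gateway item above concrete, here is a minimal race sketch. This is not the KuboCAS API; the gateway URLs, the `/ipfs/{cid}` path-style fetch, and the lack of error fallback are all simplifying assumptions.

```python
# Race the same block request across several gateways and keep the first answer.
import asyncio
import aiohttp

GATEWAYS = [                      # assumed list of gateway base URLs
    "http://127.0.0.1:8080",
    "https://ipfs.io",
]

async def fetch_from(session: aiohttp.ClientSession, gateway: str, cid: str) -> bytes:
    # Plain path-style gateway fetch; a real store would request raw blocks.
    async with session.get(f"{gateway}/ipfs/{cid}") as resp:
        resp.raise_for_status()
        return await resp.read()

async def race_block(cid: str) -> bytes:
    async with aiohttp.ClientSession() as session:
        tasks = [asyncio.create_task(fetch_from(session, gw, cid)) for gw in GATEWAYS]
        done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        for task in pending:          # cancel the losers
            task.cancel()
        # A real implementation would fall back to the remaining gateways
        # if the first task to finish finished with an error.
        return done.pop().result()
```

A smarter variant would partition the DAG between gateways (divide and conquer) rather than duplicating every request, trading redundancy for bandwidth.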
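Similarly, a sketch of the [SUBCHUNKING] byte-range grouping idea: coalesce reads that touch the same block so the block travels over the wire once, and let each caller slice its own range out of the cached bytes. `fetch_block` is a placeholder for however the store currently gets a block, not an existing function.

```python
# Group (cid, start, end) reads by block so each distinct block is fetched once.
from collections import defaultdict
from typing import Callable

def read_ranges(
    requests: list[tuple[str, int, int]],      # (cid, start, end) per read
    fetch_block: Callable[[str], bytes],       # placeholder block fetcher
) -> list[bytes]:
    by_block: dict[str, list[int]] = defaultdict(list)
    for idx, (cid, _, _) in enumerate(requests):
        by_block[cid].append(idx)

    results: list[bytes] = [b""] * len(requests)
    for cid, indices in by_block.items():
        block = fetch_block(cid)               # one fetch per distinct block
        for idx in indices:
            _, start, end = requests[idx]
            results[idx] = block[start:end]
    return results                             # block can be dropped afterwards
```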
Misc
- Block-size discussions: https://discuss.ipfs.tech/t/supporting-large-ipld-blocks/15093 Increase block sizes toward the ~2 MB limit that can still be transmitted by bitswap?
Libraries
- aiohttp (only HTTP/1.1 support) vs httpx (HTTP/2 support)
HTTP2 support aio-libs/aiohttp#10185
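For reference, a minimal sketch of the httpx side of that comparison with HTTP/2 enabled (assumes `httpx[http2]` is installed so the `h2` dependency is available; the gateway must also speak HTTP/2 for the upgrade to happen):

```python
# httpx client with HTTP/2 negotiation enabled; aiohttp has no equivalent today.
import httpx

async def fetch(url: str) -> bytes:
    async with httpx.AsyncClient(http2=True) as client:
        resp = await client.get(url)
        resp.raise_for_status()
        return resp.content

# e.g. asyncio.run(fetch("http://127.0.0.1:8080/ipfs/<cid>"))
```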
Gateway
- Enable HTTP/2 in the gateway: https://chatgpt.com/share/684a92f0-cd54-8002-99a4-725c4b1ed5dd
- It seems that even requests to a local gateway with cached byte data cap out at 120-200 Mbps (higher end with aiohttp, lower end with HTTP/1.1 via httpx). Maybe there's some tuning to be done here on the gateway and Kubo side?