Sparse files get de-sparsified #409
Comments
@lukts30 We're looking into this one.
I'm trying to focus on the root problem and how we can support this kind of feature without breaking other API requirements (like NFS filesystems, or other filesystems/stores without sparse support), and I want to pose the problem back to you to see if this is the underlying issue. Problem: If that is the problem, a possible solution: As a side note to this problem, you may also be interested in using
Thanks for looking into this issue.
That is an accurate summary of this issue.
Indeed, hardlinking/moving files, or "copying" via reflink/copy_file_range (on supported filesystems), would all similarly avoid the expensive copy operation. I am wondering whether the suggested hardlink approach would also work in a scenario where the workers are distributed, but the storage for both the CAS and the worker filesystems is managed through the same shared NFS/SMB file share?
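To make the hardlink idea concrete, here is a minimal sketch of ingesting a worker output into a CAS directory by hardlinking instead of copying. The function name `ingest_into_cas` is hypothetical (not nativelink's actual API); hardlinks only succeed when the source and the CAS root are on the same filesystem (including the same NFS/SMB mount on every node), so a copy fallback is kept for the cross-device case.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical sketch: place a worker output file into a content-addressed
/// store directory by hardlinking. Falls back to a plain copy when the link
/// fails (e.g. EXDEV across mount points, or a filesystem without link support).
fn ingest_into_cas(src: &Path, cas_root: &Path, digest: &str) -> io::Result<()> {
    let dest = cas_root.join(digest);
    if dest.exists() {
        // Content-addressed: identical digest means identical bytes.
        return Ok(());
    }
    match fs::hard_link(src, &dest) {
        Ok(()) => Ok(()),
        Err(_) => fs::copy(src, &dest).map(|_| ()),
    }
}
```

Note that the hardlink path is O(1) regardless of file size and never touches the file data, so sparseness is preserved trivially.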
Adds `inner_store()` function to all stores that enables the resolution of inner stores recursively to get an underlying store. This is mostly for places that can perform optimizations when specific code paths can be optimized with specific stores. towards: #409
NFS/SMB as a shared medium for distributed workers to write to is not currently supported. This is because we evict items, and we currently don't have the ability for external sources to notify the FilesystemStore of changes. Instead of using NFS, the preferred model is to use a remote store and transfer the data through some kind of pipe (e.g. TCP). The complexity of adding network-filesystem support sounds like it would be very difficult to write and could add a lot of technical debt. Given that, we do currently plan on supporting a FUSE filesystem that will materialize files on demand. This would allow compression of data over the network and deduplication, at the cost of latency. In the case of NFS I would suspect the latency to be about the same, so it "might" be what you are looking for.
Commit a0788fa did not change anything in this regard, right? It still copies and does not use a move/hardlink to relocate the file. I tested again a bit, and RE through nativelink is still noticeably slower than what I would hope for. My testing involved using a buck2 BUILD file first building
Because the file is read twice, the effective speed is only 150 MB/s in practice. But even 300 MB/s is rather slow compared to other tools.
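The halving mentioned above is just arithmetic: if every output byte is read N times (e.g. once to hash and once to copy), the effective upload rate is the raw sequential throughput divided by N. A trivial sketch:

```rust
/// If the storage path sustains `raw_mb_s` of sequential read throughput and
/// every output byte must be read `passes` times (e.g. once for hashing and
/// once for copying), the effective end-to-end rate drops to raw / passes.
fn effective_throughput(raw_mb_s: f64, passes: u32) -> f64 {
    raw_mb_s / passes as f64
}
```

With the numbers from the comment, 300 MB/s read twice yields 150 MB/s effective, which is why combining the hash pass with the copy pass (or eliminating the copy entirely via move/hardlink) matters so much.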
Yes, the commit I made was just lining things up to support special filesystem calls like sparse copy. Can you confirm that you compiled with I saw that you are using
I did some local testing, and yes, I do see it taking much longer than it should. Here are my local results:
Nativelink results (modified source to capture timing):
If I change this line to 1 MiB:
I then wanted to see how much it would improve if I put the hashing function on a different spawn (thread) than the one that reads the file contents:
I then checked whether putting the uploading/copying part onto different spawns/threads would help:
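The "hash on a different thread than the reader" experiment above can be sketched with plain std threads and a bounded channel standing in for tokio spawns, and `DefaultHasher` standing in for the real blake3/sha256 digest (both substitutions are assumptions for the sake of a self-contained example):

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs::File;
use std::hash::Hasher;
use std::io::Read;
use std::path::Path;
use std::sync::mpsc::sync_channel;
use std::thread;

/// Sketch: one thread reads the file in `buf_size` chunks while the calling
/// thread hashes them, so disk I/O and digest computation overlap.
/// `DefaultHasher` is a stand-in for the real digest function.
fn pipelined_hash(path: &Path, buf_size: usize) -> std::io::Result<u64> {
    // Bounded channel: the reader blocks when the hasher falls behind,
    // capping memory at roughly 4 * buf_size.
    let (tx, rx) = sync_channel::<Vec<u8>>(4);
    let mut file = File::open(path)?;
    let reader = thread::spawn(move || -> std::io::Result<()> {
        loop {
            let mut buf = vec![0u8; buf_size];
            let n = file.read(&mut buf)?;
            if n == 0 {
                break;
            }
            buf.truncate(n);
            if tx.send(buf).is_err() {
                break; // hasher side went away
            }
        }
        Ok(())
    });
    let mut hasher = DefaultHasher::new();
    for chunk in rx {
        hasher.write(&chunk);
    }
    reader.join().unwrap()?;
    Ok(hasher.finish())
}
```

This is the shape of the optimization being measured; in nativelink itself the threads would be tokio spawns, with the cooperative-scheduling caveats discussed below.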
So the obvious low-hanging fruit here is to simply increase the default I'm torn on whether we should support multi-spawn/thread in this section of code, since we intentionally try not to create any spawns per gRPC connection in order to keep each user on a single thread. This keeps extreme parallelism fairly cooperative; otherwise one user could create millions of small files and use lots of threads computing digests and such, starving everyone else. I'll think about this one a bit more.
I can confirm that using a 1 MiB buffer helped significantly and brought build times down from 17m45.5s to 10m50.7s (including a negligible few seconds for the RE client to fetch artifacts). Based on these findings I would recommend making Also, is there a way to log how long execution, hashing & uploading took?
Yes. @aaronmondal or @blakehatch, does one of you want to make this As for optimizing the threading, I think we should optimize it. Right now we are already paying a very high cost because tokio actually uses the synchronous filesystem API in a separate threadpool. Since we are already paying that cost, we should instead stream the data from the synchronous filesystem API directly. One of the big advantages of doing it this way is that we could easily wire this up to mmap later, which would give even higher throughput. We do not currently separate out hashing time from other time in the worker/running_actions_manager; currently it's lumped into As an FYI, @lukts30, we are currently working on some tooling around using ReClient/Chromium as a benchmark that will help us understand where we need to improve.
I spent a little more time on this. I have a local change I'll push up soon that fixes the hash time completely. It should bring the total hash time into parity with I'm currently looking at optimizing upload time. In doing so, there's a very high chance I'll also implement it as a file move. This should make local execution overhead nearly zero.
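A "file move" upload can be sketched as a rename with a copy fallback. The function name `move_into_cas` is illustrative, not nativelink's actual API: `rename` is O(1) and preserves sparseness when the worker directory and CAS share a filesystem, and the fallback handles the cross-device case (EXDEV), where a real implementation would use reflink/copy_file_range.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical sketch of uploading via a move: try a cheap rename first,
/// fall back to copy + unlink when rename fails (e.g. EXDEV when src and
/// dest are on different mount points).
fn move_into_cas(src: &Path, dest: &Path) -> io::Result<()> {
    match fs::rename(src, dest) {
        Ok(()) => Ok(()),
        Err(_) => {
            // Cross-filesystem: data must actually be transferred.
            fs::copy(src, dest)?;
            fs::remove_file(src)
        }
    }
}
```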
How about this: allada@cf25b31 It still needs some cleanup, and I decided to clean up some code along the way, so it'll be multiple PRs before it's in. It appears rayon + mmap and plain mmap are about the same speed on my machine, so I may disable multi-threading.
I'm going to bet that we need to optimize localhost data transfer too, so this is likely only one step towards extreme speeds 😄
Computing the digest now happens using mmap when using blake3 and changes default read size to 16k instead of 4k. Towards: TraceMachina#409
When using nativelink with a local worker/CAS setup, adds optimizations which make it faster to upload files from the worker to the CAS. This is specifically useful for Buck2 for users that want to build hermetically. closes: TraceMachina#409
I did not use the
After execution, when output files are copied from the worker into the filesystem CAS directory, they lose their sparse-file properties and are fully de-sparsified (the file is copied byte-by-byte rather than moved or copied via reflink/copy_file_range). This obviously increases disk usage, and the entire copy-and-hash process takes considerably longer than a simple move operation followed by a sha256sum of the sparse file.
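The sparseness being lost can be observed directly from the allocated block count. Below is a minimal Unix-only sketch (exact behavior depends on the filesystem) that creates a sparse file by seeking past the end before writing, and reports the bytes actually allocated via `st_blocks`; a naive byte-by-byte copy of such a file would write out all the zero pages and inflate this number to the full logical size.

```rust
use std::fs::File;
use std::io::{Seek, SeekFrom, Write};
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Create a file with a ~16 MiB logical size but only a few allocated
/// blocks: everything before the final 3-byte write is a hole.
fn make_sparse(path: &Path) -> std::io::Result<()> {
    let mut f = File::create(path)?;
    f.seek(SeekFrom::Start(16 * 1024 * 1024))?;
    f.write_all(b"end")?;
    Ok(())
}

/// Bytes actually allocated on disk (st_blocks is in 512-byte units).
fn allocated_bytes(path: &Path) -> std::io::Result<u64> {
    Ok(std::fs::metadata(path)?.blocks() * 512)
}
```

Running `allocated_bytes` on the file above returns far less than its 16 MiB logical length, which is exactly the property the worker-to-CAS copy currently destroys.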
https://github.com/TraceMachina/native-link/blob/f989e612715a7fe645e69c4c78a50e9b7262ad17/config/examples/basic_cas.json