Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use block-based size for filecache entry size #6376

Merged
merged 3 commits into from
Apr 16, 2024
Merged

Conversation

bduffany
Copy link
Member

@bduffany bduffany commented Apr 16, 2024

When calling AddFile, stat the file to get the number of blocks consumed by the file, and compute the size based on the number of blocks consumed, rather than the content stored in the blocks. The extra stat() syscall does not significantly affect performance according to benchmarks:

goos: linux
goarch: amd64
cpu: AMD Ryzen 9 7950X 16-Core Processor            
                                   │ /tmp/bench_false.txt │        /tmp/bench_true.txt         │
                                   │        sec/op        │   sec/op     vs base               │
FilecacheLink/100%Link/0%Add/1K-32            15.52m ± 1%   15.57m ± 1%       ~ (p=0.631 n=10)
FilecacheLink/95%Link/5%Add/1K-32             14.77m ± 2%   14.75m ± 3%       ~ (p=0.436 n=10)
geomean                                       15.14m        15.15m       +0.07%

A future improvement could be to also account for the file metadata, not just the file contents. But just computing the block-based size should be good enough for now.

Related issues:

@bduffany bduffany changed the title WIP Use size on disk for filecache entry size Apr 16, 2024
@bduffany bduffany changed the title Use size on disk for filecache entry size Use block-based size for filecache entry size Apr 16, 2024
@bduffany bduffany force-pushed the filecache-real-size branch 3 times, most recently from a9c77a3 to e1ea841 Compare April 16, 2024 14:42
@bduffany bduffany marked this pull request as ready for review April 16, 2024 14:43
}
// Stat always returns number of 512 byte blocks, regardless of the FS
// settings.
sizeOnDisk = info.Sys().(*syscall.Stat_t).Blocks * 512
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this doesn't work on Windows. I'm not sure what the status of Windows executors are, but we may need to special case this calculation for this to compile on Windows @sluongng

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated so that we continue to use the content size on windows for now until we can figure out something more accurate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows executor should be functional on HEAD. If you guys need to test something, please ping me on Slack.

@bduffany bduffany merged commit 8fc5a88 into master Apr 16, 2024
16 of 18 checks passed
@bduffany bduffany deleted the filecache-real-size branch April 16, 2024 18:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants