-
Notifications
You must be signed in to change notification settings - Fork 128
Use platform specific read_at when available
#628
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Adam Gutglick <adamgsal@gmail.com>
|
Running the following benchmark, I see a pretty solid improvement on my macbook pro, but I'll try and find the time to benchmark it on some AWS box. Criterion results (compared to most recent Code for the benchmark, mostly doing 8KB reads and some full file reads (64MB), happy to add more cases that might be interesting: use criterion::{BenchmarkId, Criterion, Throughput, criterion_group, criterion_main};
use object_store::local::LocalFileSystem;
use object_store::path::Path;
use object_store::{ObjectStore, ObjectStoreExt};
use rand::Rng;
use std::ops::Range;
use tempfile::TempDir;
const FILE_SIZE: u64 = 64 * 1024 * 1024; // 64 MB
const RANGE_SIZE: u64 = 8 * 1024; // 8 KB ranges
fn generate_random_ranges(file_size: u64, range_size: u64, count: usize) -> Vec<Range<u64>> {
let mut rng = rand::rng();
(0..count)
.map(|_| {
let start = rng.random_range(0..file_size - range_size);
start..start + range_size
})
.collect()
}
fn bench_read_ranges(c: &mut Criterion) {
let rt = tokio::runtime::Builder::new_current_thread()
.enable_all()
.build()
.unwrap();
// Set up the test file
let temp_dir = TempDir::new().unwrap();
let store = LocalFileSystem::new_with_prefix(temp_dir.path()).unwrap();
let path = Path::from("bench_file");
// Create file with random data
let data: Vec<u8> = (0..FILE_SIZE).map(|i| (i % 256) as u8).collect();
rt.block_on(async {
store.put(&path, data.into()).await.unwrap();
});
let mut group = c.benchmark_group("read_ranges");
for num_ranges in [10, 100, 1000] {
let ranges = generate_random_ranges(FILE_SIZE, RANGE_SIZE, num_ranges);
let total_bytes = num_ranges as u64 * RANGE_SIZE;
group.throughput(Throughput::Bytes(total_bytes));
group.bench_with_input(
BenchmarkId::new("local_fs", num_ranges),
&ranges,
|b, ranges| {
b.to_async(&rt)
.iter(|| async { store.get_ranges(&path, ranges).await.unwrap() });
},
);
}
group.finish();
}
fn bench_get_opts_whole_file(c: &mut Criterion) {
let rt = tokio::runtime::Builder::new_current_thread()
.enable_all()
.build()
.unwrap();
let temp_dir = TempDir::new().unwrap();
let store = LocalFileSystem::new_with_prefix(temp_dir.path()).unwrap();
let path = Path::from("bench_file");
let data: Vec<u8> = (0..FILE_SIZE).map(|i| (i % 256) as u8).collect();
rt.block_on(async {
store.put(&path, data.into()).await.unwrap();
});
let mut group = c.benchmark_group("get_opts_whole_file");
group.throughput(Throughput::Bytes(FILE_SIZE));
group.bench_function("local_fs", |b| {
b.to_async(&rt).iter(|| async {
store
.get_opts(&path, Default::default())
.await
.unwrap()
.bytes()
.await
.unwrap()
});
});
group.finish();
}
fn bench_get_range_sequential(c: &mut Criterion) {
let rt = tokio::runtime::Builder::new_current_thread()
.enable_all()
.build()
.unwrap();
let temp_dir = TempDir::new().unwrap();
let store = LocalFileSystem::new_with_prefix(temp_dir.path()).unwrap();
let path = Path::from("bench_file");
let data: Vec<u8> = (0..FILE_SIZE).map(|i| (i % 256) as u8).collect();
rt.block_on(async {
store.put(&path, data.into()).await.unwrap();
});
let mut group = c.benchmark_group("get_range_sequential");
for num_ranges in [10, 100, 1000] {
let ranges = generate_random_ranges(FILE_SIZE, RANGE_SIZE, num_ranges);
let total_bytes = num_ranges as u64 * RANGE_SIZE;
group.throughput(Throughput::Bytes(total_bytes));
group.bench_with_input(
BenchmarkId::new("local_fs", num_ranges),
&ranges,
|b, ranges| {
b.to_async(&rt).iter(|| async {
for range in ranges {
store.get_range(&path, range.clone()).await.unwrap();
}
});
},
);
}
group.finish();
}
criterion_group!(
benches,
bench_read_ranges,
bench_get_opts_whole_file,
bench_get_range_sequential
);
criterion_main!(benches); |
| .map_err(|source| local::Error::UnableToReadBytes { source, path })?; | ||
|
|
||
| Ok(buffer.into()) | ||
| use crate::local::read_range; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes me think that maybe read_range belongs elsewhere, maybe a new file module?
| let mut buf_slice = &mut buf[..]; | ||
| let mut offset = range.start; | ||
|
|
||
| while !buf_slice.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is roughly the implementation of read_exact_at, just allows us to keep the same errors and track how many bytes we've read for the OutOfRange error branch.
src/local.rs
Outdated
| // Safety: | ||
| // Setting the buffer's length to its capacity is safe as it remains within its allocated memory. | ||
| // In cases where `read_exact_at` errors, the contents of the buffer are undefined, | ||
| // but we discard it without using it. | ||
| unsafe { | ||
| buf.set_len(to_read as usize); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the first use of unsafe in the crate, which I'm not thrilled about. The alternative seems to be to pre-fill it with Vec::resize, but that seems wasteful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the later &mut buf[..] makes this UB.
What is the performance when initializing buf with vec![0u8; 8]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alternative would be reimplementing/vendoring part of this API https://doc.rust-lang.org/std/os/unix/fs/trait.FileExt.html#method.read_buf_at
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or a suggestion by Gemini (raw file IO):
use std::os::unix::io::AsRawFd;
let fd = file.as_raw_fd();
let mut buf = Vec::with_capacity(to_read);
let bytes_read = unsafe {
// libc::pread takes a raw pointer (*mut c_void), avoiding the reference creation UB
libc::pread(
fd,
buf.as_mut_ptr() as *mut _,
to_read,
offset as i64
)
};
if bytes_read >= 0 {
unsafe { buf.set_len(bytes_read as usize) };
} else {
// Handle error
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see it even triggers https://rust-lang.github.io/rust-clippy/rust-1.93.0/index.html#uninit_vec now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try it out with pre-allocated data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just preallocated the buffer with 0s (buf.resize(to_read as usize, 0_u8)), results are:
read_ranges/local_fs/10 time: [24.271 µs 24.605 µs 24.979 µs]
thrpt: [3.0543 GiB/s 3.1008 GiB/s 3.1434 GiB/s]
change:
time: [−9.2694% −7.9576% −6.6172%] (p = 0.00 < 0.05)
thrpt: [+7.0861% +8.6456% +10.216%]
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe
read_ranges/local_fs/100
time: [91.591 µs 92.250 µs 92.981 µs]
thrpt: [8.2054 GiB/s 8.2703 GiB/s 8.3299 GiB/s]
change:
time: [−19.988% −19.050% −18.121%] (p = 0.00 < 0.05)
thrpt: [+22.131% +23.532% +24.982%]
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
2 (2.00%) low mild
5 (5.00%) high mild
5 (5.00%) high severe
read_ranges/local_fs/1000
time: [877.00 µs 891.02 µs 907.75 µs]
thrpt: [8.4047 GiB/s 8.5626 GiB/s 8.6994 GiB/s]
change:
time: [−32.377% −30.518% −28.468%] (p = 0.00 < 0.05)
thrpt: [+39.797% +43.921% +47.879%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
4 (4.00%) high mild
4 (4.00%) high severe
get_opts_whole_file/local_fs
time: [8.2127 ms 8.2487 ms 8.2891 ms]
thrpt: [7.5400 GiB/s 7.5770 GiB/s 7.6101 GiB/s]
change:
time: [−3.7223% −2.3673% −1.0502%] (p = 0.00 < 0.05)
thrpt: [+1.0613% +2.4247% +3.8663%]
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) high mild
9 (9.00%) high severe
get_range_sequential/local_fs/10
time: [223.74 µs 225.81 µs 228.25 µs]
thrpt: [342.28 MiB/s 345.97 MiB/s 349.18 MiB/s]
change:
time: [−8.4366% −7.0562% −5.8343%] (p = 0.00 < 0.05)
thrpt: [+6.1958% +7.5919% +9.2139%]
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
get_range_sequential/local_fs/100
time: [2.2283 ms 2.2480 ms 2.2706 ms]
thrpt: [344.07 MiB/s 347.54 MiB/s 350.61 MiB/s]
change:
time: [−8.9263% −7.8699% −6.7098%] (p = 0.00 < 0.05)
thrpt: [+7.1924% +8.5421% +9.8012%]
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
6 (6.00%) high mild
3 (3.00%) high severe
get_range_sequential/local_fs/1000
time: [23.699 ms 23.843 ms 24.011 ms]
thrpt: [325.37 MiB/s 327.66 MiB/s 329.65 MiB/s]
change:
time: [+1.8221% +2.7832% +3.8109%] (p = 0.00 < 0.05)
thrpt: [−3.6710% −2.7079% −1.7895%]
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) low mild
6 (6.00%) high severe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe a bit worse for get_opts but still shows an improvement, otherwise very similar within the noise of a laptop.
read_at when availableread_at when available
| // extents of the file | ||
| if range.start >= file_len { | ||
| // extents of the file, or if its at the end of the file and wants to read a non-empty range. | ||
| if range.start > file_len || (range.start == file_len && !range.is_empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the added condition equivalent to range.end > file_len ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from line 994-995:
// Don't read past end of file
let to_read = range.end.min(file_len) - range.start;I think its actually technically ok to read with a range that ends past the end of the file (I think HTTP RANGE headers allow that too), the only issue is when trying to start at the end and reading more. I'll try and figure out if this error is actually tested, and add a test if not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
|
seems like I need some more work on the windows size |
Which issue does this PR close?
Closes #622.
Rationale for this change
Saves on at least one seek syscall per range
What changes are included in this PR?
FileExt::read_atfunctions.LocalStore::read_rangesand for local files inGetResult::bytes, so this codepath is shared across both implementation which are currently very similar with one minor change.Are there any user-facing changes?
No