[skunkworks] persist: Don't copy into a single Vec for S3 Blob #18692

Conversation

@do-not-use-parker-timmerman do-not-use-parker-timmerman commented Apr 10, 2023

Motivation

Today the implementation of Blob for S3 will copy all of the multi-part responses into a single contiguous Vec. We need to do this because of the return type on Blob::get, but practically it's unnecessary because no consumers of Blob::get need the bytes to be in a single contiguous Vec. Ideally we don't do this copy, and the API specifies some return type that is an iterable, but possibly non-contiguous collection of bytes. There's a TODO that describes this!

This PR adds a new type to ore called SegmentedBytes, which is a collection of individual Byte segments. This new type implements bytes::Buf and std::io::{Read, Seek} which are the two APIs used by consumers of Blob::get. We then update the Blob::get method to return SegmentedBytes instead of Vec<u8>. Finally we update the S3 Blob impl to no longer copy bytes from the responses into a single Vec, and as part of this refactor, we also request each part's body immediately after receiving its header, instead of waiting for all the headers to come back first, which fixes another TODO.
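A minimal sketch of what such a "segmented bytes" type could look like, using only the standard library. This is a hypothetical stand-in, not the actual `mz_ore` implementation: the real type also implements `bytes::Buf` and its internals differ.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// A collection of non-contiguous byte segments, readable as one
/// logical stream without copying into a single `Vec<u8>`.
pub struct SegmentedBytes {
    segments: Vec<Vec<u8>>,
}

impl SegmentedBytes {
    pub fn new(segments: Vec<Vec<u8>>) -> Self {
        SegmentedBytes { segments }
    }

    /// Total number of bytes across all segments.
    pub fn len(&self) -> usize {
        self.segments.iter().map(|s| s.len()).sum()
    }

    /// Consumes `self`, returning a cursor that implements `Read` and `Seek`.
    pub fn reader(self) -> SegmentedReader {
        SegmentedReader { segments: self.segments, pointer: 0 }
    }
}

pub struct SegmentedReader {
    segments: Vec<Vec<u8>>,
    pointer: u64,
}

impl Read for SegmentedReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let mut pos = self.pointer as usize;
        // Find the segment containing `pos` and copy as much of it as fits.
        for segment in &self.segments {
            if pos < segment.len() {
                let n = (segment.len() - pos).min(buf.len());
                buf[..n].copy_from_slice(&segment[pos..pos + n]);
                self.pointer += n as u64;
                return Ok(n);
            }
            pos -= segment.len();
        }
        Ok(0) // past the end: EOF
    }
}

impl Seek for SegmentedReader {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        let len: u64 = self.segments.iter().map(|s| s.len() as u64).sum();
        let new = match pos {
            SeekFrom::Start(n) => Some(n),
            SeekFrom::End(n) => len.checked_add_signed(n),
            SeekFrom::Current(n) => self.pointer.checked_add_signed(n),
        };
        match new {
            Some(n) => {
                self.pointer = n;
                Ok(n)
            }
            None => Err(io::Error::new(io::ErrorKind::InvalidInput, "seek before start")),
        }
    }
}

fn main() {
    // Two segments behave as one 11-byte stream.
    let bytes = SegmentedBytes::new(vec![b"hello".to_vec(), b" world".to_vec()]);
    assert_eq!(bytes.len(), 11);
    let mut r = bytes.reader();
    r.seek(SeekFrom::End(-5)).unwrap();
    let mut out = String::new();
    r.read_to_string(&mut out).unwrap();
    assert_eq!(out, "world");
}
```

The key point is that `Read` and `Seek` can be satisfied over a list of segments with no up-front concatenation; reads only copy the bytes the caller actually asks for.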

Note: Ideally we don't have to define our own "segmented bytes" type, but I couldn't find anything else that really gave us what we wanted. I tried the following:

  1. Making the return type of Blob::get generic with Box<dyn bytes::Buf>, but it appears part of the returned blobs are encoded with arrow2, and the decoder requires the buffer to implement std::io::Read and std::io::Seek, which bytes::Buf does not. I also couldn't find any other impls of arrow2 that used bytes::Buf instead of the std::io traits.
  2. bytes-utils has a SegmentedBuf type, but it similarly doesn't implement Read and Seek. So we'd also have to wrap this type.
  3. aws-smithy-http has an AggregatedBytes type, which is what the S3 client returns. But again, it doesn't implement Read and Seek, so we'd need to wrap the type ourselves.

So because we'd at least need to add a wrapper type to make any existing types work, I implemented our own "segmented bytes" type since it wasn't much more work, and we'd have total control over the impl.

Currently we depend on a personal fork of Amazon's smithy-rs, only because they have yet to release smithy-lang/smithy-rs#2525. I'm more than happy to make a Materialize fork of smithy-rs; I didn't do that yet because I wanted to get feedback on this change first.

Tips for reviewer

There are multiple stacked changes in this PR, it's best if you view each of the commits independently. The changes are:

  1. Create a new type called SegmentedBytes in the ore crate
  2. Update the trait Blob in persist to use SegmentedBytes
  3. Update the S3 client to no longer copy bytes into a single Vec

I see we have an open-loop benchmark for our persist-client, @danhhz would you know how to run this?

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • This PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way) and therefore is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:

@do-not-use-parker-timmerman do-not-use-parker-timmerman force-pushed the ore/segmented-bytes branch 2 times, most recently from 6a5b737 to 3760ac7 Compare April 14, 2023 18:49
@do-not-use-parker-timmerman do-not-use-parker-timmerman changed the title persist: Don't copy into a single Vec for S3 Blob [skunkworks] persist: Don't copy into a single Vec for S3 Blob Apr 14, 2023
@do-not-use-parker-timmerman do-not-use-parker-timmerman marked this pull request as ready for review April 14, 2023 18:57

@danhhz danhhz left a comment


cool! persist changes LGTM modulo the following comments. I didn't look closely at the impl of SegmentedBytes yet

}
}

impl From<&[u8]> for SegmentedBytes {
Contributor

dunno how others feel, but I dislike From impls that are potentially this costly. my preferences in order:

  • remove this and make the user call to_vec
  • a non-From impl function (which has the opportunity to declare the allocation and copy in the rustdoc)

ditto for From<&str>
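The reviewer's second preference — a named, non-`From` constructor that documents the allocation — could look like the following sketch. `SegmentedBytes` here is a hypothetical stand-in type, and `copied_from_slice` is an illustrative name, not the actual API.

```rust
/// Hypothetical stand-in for the real type.
pub struct SegmentedBytes {
    segments: Vec<Vec<u8>>,
}

impl SegmentedBytes {
    /// Copies `slice` into a freshly allocated segment.
    ///
    /// Note: unlike a `From` impl, the name and rustdoc make the
    /// allocation and full copy explicit at every call site.
    pub fn copied_from_slice(slice: &[u8]) -> Self {
        SegmentedBytes { segments: vec![slice.to_vec()] }
    }

    pub fn len(&self) -> usize {
        self.segments.iter().map(|s| s.len()).sum()
    }
}

fn main() {
    // The cost is visible in the method name, not hidden behind `.into()`.
    let b = SegmentedBytes::copied_from_slice(b"hello");
    assert_eq!(b.len(), 5);
}
```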

Contributor Author

Totally makes sense! I ended up removing these impls as they weren't actually needed; they were an artifact of an attempt to make testing more seamless.

impl io::Read for SegmentedReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Ugh. Map from u64 to usize.
        let pointer = usize::try_from(self.pointer).map_err(|_| {
Contributor

materialize declares in one place that it only runs on 64-bit platforms, so we don't have this sort of thing littered everywhere: mz_ore::cast::CastFrom :). import that and use usize::cast_from(self.pointer) here
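A minimal sketch of the `CastFrom` pattern the reviewer is pointing at: declare once (via `cfg`) that only 64-bit platforms are supported, so `u64 -> usize` becomes an infallible, lossless cast everywhere else. This is a hypothetical re-implementation for illustration, not the actual `mz_ore::cast` code.

```rust
/// Casts between machine types that are lossless on supported platforms.
pub trait CastFrom<T> {
    fn cast_from(from: T) -> Self;
}

// The "declared in one place" part: this impl only exists on 64-bit
// targets, where usize is 64 bits and the cast cannot truncate.
#[cfg(target_pointer_width = "64")]
impl CastFrom<u64> for usize {
    fn cast_from(from: u64) -> usize {
        from as usize
    }
}

fn main() {
    let pointer: u64 = 1_024;
    // Instead of `usize::try_from(pointer).map_err(...)` at every call site:
    let idx = usize::cast_from(pointer);
    assert_eq!(idx, 1_024usize);
}
```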

Contributor Author

Sweet! Updated :)


        // We can read it back using persist code v2 and v3.
        assert_eq!(
-           UntypedState::<u64>::decode(&v2, &buf)
+           UntypedState::<u64>::decode(&v2, bytes.clone())
Contributor

persist (and I think all of mz now?) has a lint that forces things like Arc::clone to be called as Arc::clone(&foo) instead of foo.clone(), which makes it much easier to see at a glance that certain clones are cheap. I wonder if we can opt Bytes into that lint?
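The lint in question is likely `clippy::clone_on_ref_ptr`, which covers `Rc`/`Arc` but not `Bytes`. The style it enforces looks like this (using `Arc` from the standard library to keep the example self-contained):

```rust
use std::sync::Arc;

fn main() {
    let data = Arc::new(vec![1u8, 2, 3]);

    // Flagged by the lint: `let shared = data.clone();` — at a glance,
    // this is indistinguishable from a deep copy of the Vec.
    // Preferred: the fully qualified form makes the cheap refcount bump
    // explicit at the call site.
    let shared = Arc::clone(&data);

    assert_eq!(Arc::strong_count(&data), 2);
    assert_eq!(*shared, vec![1u8, 2, 3]);
}
```

The same convention can be followed manually for `Bytes` (`Bytes::clone(&b)`), even though Clippy doesn't enforce it there today.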

Contributor Author

Good thought! I don't think we can opt Bytes into that, but I did file an issue with Clippy to see if we can extend the current lint, rust-lang/rust-clippy#10657

Cargo.toml Outdated
@@ -115,3 +115,4 @@ postgres-protocol = { git = "https://github.com/MaterializeInc/rust-postgres" }
tokio-postgres = { git = "https://github.com/MaterializeInc/rust-postgres" }
serde-value = { git = "https://github.com/MaterializeInc/serde-value.git" }
vte = { git = "https://github.com/alacritty/vte" }
aws-smithy-http = { git = "https://github.com/parker-timmerman/smithy-rs.git", branch = "v/0.53.2" }
Contributor

what's the story here?

Contributor Author

I merged an upstream PR, smithy-lang/smithy-rs#2525, into smithy-rs to add some functionality to their AggregatedBytes type, but they haven't yet released a new version of aws-smithy-http, so I needed to make a fork. Before forking the crate into Materialize, I wanted to get feedback on this PR, so I forked it to my own account.

self.min.fetch_min(nanos, atomic::Ordering::SeqCst);

// Trace if our provided duration was much larger than our minimum.
let new_min = self.min.load(atomic::Ordering::Relaxed);
Contributor

this is the same as std::cmp::min(fetch_min_result, nanos), no?
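The observation works because `fetch_min` returns the value that was stored *before* the update, so the post-update minimum is just `min(previous, nanos)` — no second atomic load needed:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

fn main() {
    let min = AtomicU64::new(100);
    let nanos = 42u64;

    // `fetch_min` atomically stores min(current, nanos) and returns the
    // previous value.
    let prev = min.fetch_min(nanos, Ordering::SeqCst);
    assert_eq!(prev, 100);

    // Equivalent to re-loading the atomic, but without the extra load
    // (and without racing against concurrent updates in between).
    let new_min = std::cmp::min(prev, nanos);
    assert_eq!(new_min, 42);
    assert_eq!(min.load(Ordering::Relaxed), 42);
}
```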

Contributor Author

Ah ha, it is! I updated this.


// Trace if our provided duration was much larger than our minimum.
let new_min = self.min.load(atomic::Ordering::Relaxed);
let current_duration = Duration::from_nanos(nanos);
Contributor

may as well use x here? you don't get the u64::max truncation, but seems fine

Contributor Author

Done!

.body
.collect()
.map_err(|err| Error::from(format!("s3 get first body err: {}", err)));
let elapsed = body_start.elapsed();
Contributor

is this the same? I think you need to await the collect() future before calling elapsed(), no?
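The pitfall here is that constructing a future does no work; the timer must be read only after the work actually runs. A synchronous analogy (using a closure to stand in for the un-awaited `collect()` future, so the example needs no async runtime):

```rust
use std::time::{Duration, Instant};

fn main() {
    let start = Instant::now();

    // Analogous to building the `collect()` future: nothing has run yet.
    let work = || std::thread::sleep(Duration::from_millis(50));

    // Reading the clock here measures (nearly) nothing — the bug.
    let too_early = start.elapsed();

    // Analogous to `.await`: the actual body download happens here.
    work();

    // Only now does `elapsed` include the work.
    let elapsed = start.elapsed();

    assert!(too_early < Duration::from_millis(50));
    assert!(elapsed >= Duration::from_millis(50));
}
```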

Contributor Author

Good call!

@benesch

benesch commented Apr 17, 2023

Neat!

bytes-utils has a SegmentedBuf type, but it similarly doesn't implement Read and Seek. So we'd also have to wrap this type.

Do you think they'd accept a PR that added Read and Seek implementations? Long term, I'd love to not have to maintain our own SegmentedBuf, if we can get the upstream crate to do everything we need.

@do-not-use-parker-timmerman
Contributor Author

bytes-utils has a SegmentedBuf type, but it similarly doesn't implement Read and Seek. So we'd also have to wrap this type.

Do you think they'd accept a PR that added Read and Seek implementations? Long term, I'd love to not have to maintain our own SegmentedBuf, if we can get the upstream crate to do everything we need.

Totally agree, for now I submitted an issue against the repo, vorner/bytes-utils#16, and once I get a free cycle I'll put up a PR, we'll see what they say!

@benesch

benesch commented Apr 17, 2023

Totally agree, for now I submitted an issue against the repo, vorner/bytes-utils#16, and once I get a free cycle I'll put up a PR, we'll see what they say!

Perfect, thanks @parker-timmerman! If you could drop a TODO comment at the top of ore/bytes.rs that links to that issue, that'd be ideal. That way the next person who wonders "why did we write all this code?" can go check out that issue.


@danhhz danhhz left a comment


okay, looked over the SegmentedBytes stuff and that lgtm, too

once normal CI is passing (do you know what's going on there?), I'll want us to run the nightly feature benchmarks before this goes in, just as a sanity check. lemme know if you don't know how to kick that off


    /// The total number of bytes this [`SegmentedReader`] is mapped over.
    pub fn len(&self) -> usize {
        self.len
Contributor

is self.len redundant with the greatest/last key in segments?

Contributor Author

Ah it is redundant, would you prefer if we got rid of len?

Contributor

I have a slight lean toward removing it, but this is well within the bounds of what I'd leave to the discretion of the author :)
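The redundancy the reviewer spots can be illustrated with a hypothetical segment layout (keys are cumulative end offsets; this is a sketch of the idea, not the actual internal representation): the total length falls out of the greatest key, so no separate `len` field is needed.

```rust
use std::collections::BTreeMap;

/// Total length derived from the last segment's end offset.
fn total_len(segments: &BTreeMap<u64, Vec<u8>>) -> u64 {
    // The greatest key is the end offset of the final segment,
    // i.e. the total number of bytes; empty map means zero bytes.
    segments.keys().next_back().copied().unwrap_or(0)
}

fn main() {
    let mut segments = BTreeMap::new();
    segments.insert(5u64, b"hello".to_vec()); // covers bytes [0, 5)
    segments.insert(11u64, b" world".to_vec()); // covers bytes [5, 11)
    assert_eq!(total_len(&segments), 11);
    assert_eq!(total_len(&BTreeMap::new()), 0);
}
```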

        self.pointer = n;
        return Ok(n);
    }
    SeekFrom::End(n) => (u64::cast_from(self.len), n),
Contributor

wow, TIL that SeekFrom::End is kind of confusing. I was completely expecting positive to seek backwards, but I checked the docs and you're correct here

Contributor Author

Yeah, I expected the same, that a positive offset would seek backwards, but the existing API does make some sense too
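The semantics discussed above can be demonstrated with `std::io::Cursor`: `SeekFrom::End(n)` adds a *signed* offset to the end position, so negative values seek backwards and positive values seek past the end (which is allowed, and simply reads as EOF).

```rust
use std::io::{Cursor, Read, Seek, SeekFrom};

fn main() {
    let mut cursor = Cursor::new(b"hello world".to_vec());

    // Negative offset from the end seeks backwards: 11 + (-5) = 6.
    let pos = cursor.seek(SeekFrom::End(-5)).unwrap();
    assert_eq!(pos, 6);
    let mut out = String::new();
    cursor.read_to_string(&mut out).unwrap();
    assert_eq!(out, "world");

    // Positive offset seeks *beyond* the end: not an error,
    // subsequent reads just return 0 bytes.
    let pos = cursor.seek(SeekFrom::End(3)).unwrap();
    assert_eq!(pos, 14);
    let mut buf = [0u8; 4];
    assert_eq!(cursor.read(&mut buf).unwrap(), 0);
}
```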

// The contract for io::Seek denotes that seeking beyond the end
// of the stream is allowed. If we're beyond the end of the stream
// then we won't read back any bytes, but it won't be an error.
match base.checked_add_signed(offset) {
Contributor

wow checked_add_signed really does all the work here, nice!
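`u64::checked_add_signed` (stable since Rust 1.66) adds an `i64` offset to a `u64` base and returns `None` on overflow or on going below zero, which is exactly the guard a `Seek` implementation needs:

```rust
fn main() {
    let base: u64 = 10;

    // Forward seek: ordinary checked addition.
    assert_eq!(base.checked_add_signed(5), Some(15));

    // Backward seek to exactly the start is fine.
    assert_eq!(base.checked_add_signed(-10), Some(0));

    // Seeking before the start underflows: `None`, which the caller
    // can turn into an io::Error.
    assert_eq!(base.checked_add_signed(-11), None);
}
```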

Commits on the branch:

  • …ntedBytes'
  • add more tests to bytes
  • remove From impls for SegementedBytes
  • ignore tests in miri because they're slow
  • add TODO comment to bytes mod
  • fix bug
  • more fixes for non-S3
  • fixes for examples
  • update clients
  • fmt
  • …futures immediately after header futures
  • update S3 persist impl
  • S3 fixes
  • update S3 client
  • update git deps
  • s3 fmt
@do-not-use-parker-timmerman

do-not-use-parker-timmerman commented Apr 19, 2023

@benesch

Perfect, thanks @parker-timmerman! If you could drop a TODO comment at the top of ore/bytes.rs that links to that issue, that'd be ideal. That way the next person who wonders "why did we write all this code?" can go check out that issue.

Done :)

@do-not-use-parker-timmerman

once normal CI is passing (do you know what's going on there?), I'll want us to run the nightly feature benchmarks before this goes in, just as a sanity check. lemme know if you don't know how to kick that off

CI is passing (there was a small bug in SegmentedBytes). I'm pretty sure I can start a new build on the Nightlies pipeline, is there a way for me to run just the single "Feature benchmarks" job, or do I have to run the whole pipeline?

@danhhz

danhhz commented Apr 19, 2023

yep! if a nightlies run is triggered manually, the first step is a selection box that makes you select which sub-jobs you want

@do-not-use-parker-timmerman

yep! if a nightlies run is triggered manually, the first step is a selection box that makes you select which sub-jobs you want

Nice! I kicked off the benchmarks and the maelstrom tests, they both seemed applicable here :)

https://buildkite.com/materialize/nightlies/builds/2232#_

@do-not-use-parker-timmerman do-not-use-parker-timmerman merged commit 14ca1b9 into MaterializeInc:main Apr 20, 2023