
Migrate S3Store from rusoto to AWS SDK for Rust #369

Merged
merged 1 commit into main from s3-aws-sdk on Nov 20, 2023

Conversation

@aaronmondal aaronmondal (Contributor) commented Nov 1, 2023

Fixes #131



@aaronmondal aaronmondal force-pushed the s3-aws-sdk branch 7 times, most recently from 4d21a19 to 4fee1ef on November 7, 2023 00:43
@allada allada (Collaborator) left a comment

Reviewed 6 of 10 files at r1, all commit messages.
Reviewable status: 6 of 10 files reviewed, 16 unresolved discussions (waiting on @aaronmondal)


flake.nix line 30 at r1 (raw file):

              openssl_static # Required explicitly for cargo test support.
              bazel
              pkgs.awscli2

This is needed?


cas/store/BUILD line 406 at r1 (raw file):

)

# rust_test(

reminder to fix this.


cas/store/s3_store.rs line 15 at r1 (raw file):

// limitations under the License.

use std::{

nit: The rest of the repo only uses {} style for the end names. I personally find it easier to read spelled out (but the last one can be joined together if the rest of the prefix matches).


cas/store/s3_store.rs line 67 at r1 (raw file):

            .with_webpki_roots()
            .https_only()
            .enable_http2()

hmmm, might be worth benchmarking http1 vs http2. From what I remember http2 will handle small objects better, but http1 will handle large objects better because large objects will have multiple streams opened in this implementation.


cas/store/s3_store.rs line 75 at r1 (raw file):

    async fn simple_retry(
        &mut self,
        req: Uri,

nit: pass a ref, since we are making a copy internally.


cas/store/s3_store.rs line 85 at r1 (raw file):

            match self.connector.call(req.clone()).await {
                Ok(stream) => return Ok(stream),
                Err(_) if retries < max_retries => {

nit: This is hard to read, can we make this more traditional logic instead of very-rust-specific style?
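
For illustration, a minimal self-contained sketch of the plainer control flow (a hypothetical helper; in the real code the closure body would be the `self.connector.call(req.clone())` call shown above):

    use std::time::Duration;

    // Hypothetical helper showing the plainer loop shape, without the
    // `Err(_) if retries < max_retries` match guard.
    async fn call_with_retries<T, E, Fut>(
        mut attempt: impl FnMut() -> Fut,
        max_retries: usize,
    ) -> Result<T, E>
    where
        Fut: std::future::Future<Output = Result<T, E>>,
    {
        let mut retries = 0;
        loop {
            match attempt().await {
                Ok(value) => return Ok(value),
                Err(err) => {
                    if retries >= max_retries {
                        // Retry budget exhausted: surface the last error.
                        return Err(err);
                    }
                    retries += 1;
                    // Brief pause before the next connection attempt.
                    tokio::time::sleep(Duration::from_millis(100)).await;
                }
            }
        }
    }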


cas/store/s3_store.rs line 85 at r1 (raw file):

            match self.connector.call(req.clone()).await {
                Ok(stream) => return Ok(stream),
                Err(_) if retries < max_retries => {

What about 404? Isn't that an error? This would be a do-not-retry.


cas/store/s3_store.rs line 119 at r1 (raw file):

    fn call(&mut self, req: Uri) -> Self::Future {
        const MAX_RETRIES: usize = 10;

nit: These should be configs.


cas/store/s3_store.rs line 350 at r1 (raw file):

                let stream = ByteStream::new(SdkBody::from(write_buf));
                let s3_client = self.s3_client.clone();
                let inner_s3_path = s3_path.clone();

nit: Just inline these.


cas/store/s3_store.rs line 355 at r1 (raw file):

                completed_part_futs.push(
                    JoinHandleDropGuard::new(tokio::spawn(async move {

nit: While we are here, let's remove these spawns. As a bonus, we won't need JoinHandleDropGuard.


cas/store/s3_store.rs line 398 at r1 (raw file):

            Ok(())
        };
        if complete_result.is_err() {

nit: This should probably be in:
https://docs.rs/scopeguard/latest/scopeguard/fn.guard.html

So if the future ever gets dropped without complete_result ever finishing it also cleans up.
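
For reference, a rough sketch of the suggested pattern; the client, bucket, key, and upload id parameters are hypothetical stand-ins for the values available in the real upload path:

    use aws_sdk_s3::Client;
    use scopeguard::ScopeGuard;

    // Sketch only: abort the multipart upload if this future is dropped before
    // the upload completes, so S3 discards the already-uploaded parts.
    async fn upload_with_abort_guard(client: Client, bucket: String, key: String, upload_id: String) {
        let abort_guard = scopeguard::guard(
            (client.clone(), bucket.clone(), key.clone(), upload_id.clone()),
            |(client, bucket, key, upload_id)| {
                // Runs on drop unless defused below.
                tokio::spawn(async move {
                    let _ = client
                        .abort_multipart_upload()
                        .bucket(bucket)
                        .key(key)
                        .upload_id(upload_id)
                        .send()
                        .await;
                });
            },
        );

        // ... upload the parts and call complete_multipart_upload() here ...

        // Upload finished: defuse the guard so the abort never runs.
        let _ = ScopeGuard::into_inner(abort_guard);
    }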


cas/store/s3_store.rs line 458 at r1 (raw file):

                            // doesn't require a bunch of hackery around poor UX
                            // in the s3 sdk.
                            RetryResult::Err(make_err!(Code::NotFound, "Couldn't get object.")),

nit: add "in s3 store", so it's string-searchable in the code.


cas/store/s3_store.rs line 487 at r1 (raw file):

                                if let Err(e) = writer.send(bytes).await {
                                    return Some((
                                        RetryResult::Err(make_input_err!("Error sending bytes to consumer: {e}")),

nit: In all these errors, say "in s3 store".


cas/store/s3_store.rs line 494 at r1 (raw file):

                    // Copy data from s3 input stream to the writer stream.
                    let result = writer
                        .forward(s3_in_stream, true /* Forward EOF */)

nit: We might be able to remove the .forward() function. Not sure, but I'd be surprised if it's used anywhere else.


config/stores.rs line 412 at r1 (raw file):

    /// Default: 20.
    #[serde(default, deserialize_with = "convert_numeric_with_shellexpand")]
    pub additional_max_concurrent_requests: usize,

we could add this in the global config now.


tools/cargo_shared.bzl line 72 at r1 (raw file):

        "version": "0.8.5",
    },
    "rusoto_s3": {

Reminder: Remove these.

@aaronmondal aaronmondal (Contributor, Author) left a comment

Reviewable status: 2 of 13 files reviewed, 7 unresolved discussions (waiting on @allada)


flake.nix line 30 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

This is needed?

This is only used when entering the dev environment. This way users don't need to install the AWS CLI manually.


cas/store/BUILD line 406 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

reminder to fix this.

Done.


cas/store/s3_store.rs line 67 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

hmmm, might be worth benchmarking http1 vs http2. From what I remember http2 will handle small objects better, but http1 will handle large objects better because large objects will have multiple streams opened in this implementation.

I agree, but propose to postpone this to a later commit when we have better tooling/setup for benchmarking these things. I don't think there is an easy way to keep both http1 and http2 happy, so I'd argue that the performance difference needs to be significant for this to justify the additional complexity.

The current variant works for AWS and if it works for GCP as well I think this should be fine.


cas/store/s3_store.rs line 85 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

What about 404? Isn't that an error? This would be a do-not-retry.

Interestingly, this is the thing we're catching here. S3 tends to lose connections quite frequently with a 404 error that resolves itself after a few seconds. An unrelated instance where you can notice this is when running aws s3 rm s3://somebucket --recursive on a sufficiently large bucket. You'll get connection errors after a while. This retry mechanism counteracts that.


cas/store/s3_store.rs line 398 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit: This should probably be in:
https://docs.rs/scopeguard/latest/scopeguard/fn.guard.html

So if the future ever gets dropped without complete_result ever finishing it also cleans up.

I double-checked with clippy, and I think this is fine. There are no panics raised in the latest version. Or maybe I misunderstood the guard?


config/stores.rs line 412 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

we could add this in the global config now.

Let's do this in a different PR. There are some other things we can do with S3 specifically, since it has its own retry mechanisms (including exponential-backoff-type retries) in a way that is potentially better than what we have at the moment. This seems to require additional effort to evaluate, though.
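
For context, a hedged sketch of what leaning on the SDK's own retry configuration could look like (the attempt count is illustrative, and this would run in an async context):

    use aws_config::retry::RetryConfig;

    // Standard mode retries with exponential backoff and jitter. Whether this
    // should replace or complement our own retrier still needs evaluation.
    let sdk_config = aws_config::from_env()
        .retry_config(RetryConfig::standard().with_max_attempts(5))
        .load()
        .await;
    let s3_client = aws_sdk_s3::Client::new(&sdk_config);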


tools/cargo_shared.bzl line 72 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

Reminder: Remove these.

Done.

@aaronmondal aaronmondal marked this pull request as ready for review November 11, 2023 07:03
@aaronmondal aaronmondal force-pushed the s3-aws-sdk branch 4 times, most recently from f2ceed5 to 1128b9f on November 13, 2023 23:54
@blakehatch blakehatch (Contributor) left a comment

Reviewed 1 of 10 files at r1, 8 of 10 files at r2, 2 of 2 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @aaronmondal, @allada, and @MarcusSorealheis)


cas/store/s3_store.rs line 398 at r1 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

I double-checked with clippy, and I think this is fine. There are no panics raised in the latest version. Or maybe I misunderstood the guard?

From my understanding, there are cases, like a panic or the execution stopping unexpectedly, where this cleanup logic might not be triggered. The guard makes sure the cleanup operation runs no matter how this scope is exited, even with a panic. If a panic were ever re-introduced here accidentally, at least we wouldn't have to worry about this cleanup being affected.


cas/store/tests/s3_store_test.rs line 28 at r4 (raw file):

use traits::StoreTrait;

// TODO(aaronmondal): Figure out how to test the connector retry mechanism.

nit: Seems like this is being tested in fn simple_has_retries()? Could be incorrect, but if it is covered, just remove the TODO.

@allada allada (Collaborator) left a comment

Reviewed 1 of 10 files at r2, 1 of 2 files at r3, 5 of 6 files at r5, all commit messages.
Reviewable status: 12 of 13 files reviewed, 8 unresolved discussions (waiting on @aaronmondal, @blakehatch, and @MarcusSorealheis)


cas/store/s3_store.rs line 67 at r1 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

I agree, but propose to postpone this to a later commit when we have better tooling/setup for benchmarking these things. I don't think there is an easy way to keep both http1 and http2 happy, so I'd argue that the performance difference needs to be significant for this to justify the additional complexity.

The current variant works for AWS and if it works for GCP as well I think this should be fine.

To be honest, I'd rather not take the risk. I'd rather stick to http1 and support http2 after we have benchmarking. I have had some bad experiences with http2 when dealing with high throughput & lots of streams.

Later we can add http2 as a config option once we have time to benchmark the differences.


cas/store/s3_store.rs line 85 at r1 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

Interestingly, this is the thing we're catching here. S3 tends to lose connections quite frequently with a 404 error that resolves itself after a few seconds. An unrelated instance where you can notice this is when running aws s3 rm s3://somebucket --recursive on a sufficiently large bucket. You'll get connection errors after a while. This retry mechanism counteracts that.

Hmmm, did you get the 404 on S3? Because AWS-S3 & GCP-buckets should no longer be eventual-consistent, but other providers are.


cas/store/s3_store.rs line 74 at r5 (raw file):

        let connector = hyper_rustls::HttpsConnectorBuilder::new()
            .with_webpki_roots()
            .https_only()

nit: I don't think we should require https. I could see someone wanting to use their own internal S3 api that is inside their network on their non-ssl port.


cas/store/s3_store.rs line 80 at r5 (raw file):

        let connector_max_retries = config
            .connector_max_retries
            .map_or(DEFAULT_CONNECTOR_MAX_RETRIES, |retries| retries);

nit: I think you want .unwrap_or()
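
That is, roughly (same behavior, just without the identity closure):

    let connector_max_retries = config
        .connector_max_retries
        .unwrap_or(DEFAULT_CONNECTOR_MAX_RETRIES);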


config/stores.rs line 416 at r5 (raw file):

    /// Maximum number of concurrent UploadPart requests per MultipartUpload.
    /// Default: 10.

nit: I usually put a full new line between the comment and the Default line. (just for consistency)


config/stores.rs line 421 at r5 (raw file):

    /// Maximum number of retries in case the connection to S3 drops.
    /// Default: 10.
    pub connector_max_retries: Option<usize>,

Can the connector logic be part of the retry config above? Like make them part of the same internal-retry object?

@aaronmondal aaronmondal (Contributor, Author) left a comment

Reviewable status: 10 of 13 files reviewed, 6 unresolved discussions (waiting on @allada, @blakehatch, and @MarcusSorealheis)


cas/store/s3_store.rs line 67 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

To be honest, I'd rather not take the risk. I'd rather stick to http1 and support http2 after we have benchmarking. I have had some bad experiences with http2 when dealing with high throughput & lots of streams.

Later we can add http2 as a config option once we have time to benchmark the differences.

I've now enabled both HTTP1 and HTTP2. This way the decision is handled by the AWS SDK.
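
Roughly, a minimal sketch assuming the hyper-rustls builder shown earlier in the review:

    // Advertise both protocols; ALPN then picks one per connection.
    let connector = hyper_rustls::HttpsConnectorBuilder::new()
        .with_webpki_roots()
        .https_only()
        .enable_http1()
        .enable_http2()
        .build();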


cas/store/s3_store.rs line 85 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

Hmmm, did you get the 404 on S3? Because AWS-S3 & GCP-buckets should no longer be eventual-consistent, but other providers are.

This is not a 404 "object not found in bucket" error, but a DNS resolution error as in "I can't find the bucket". Sometimes the endpoint for the entire bucket fails to resolve.


cas/store/s3_store.rs line 398 at r1 (raw file):

Previously, blakehatch (Blake Hatch) wrote…

From my understanding, there are cases, like a panic or the execution stopping unexpectedly, where this cleanup logic might not be triggered. The guard makes sure the cleanup operation runs no matter how this scope is exited, even with a panic. If a panic were ever re-introduced here accidentally, at least we wouldn't have to worry about this cleanup being affected.

Seems to add unnecessary complexity, and with the current state of the code it's not an issue. If we want to handle such things we should add global clippy configs to forbid undocumented panics.


cas/store/s3_store.rs line 74 at r5 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit: I don't think we should require https. I could see someone wanting to use their own internal S3 api that is inside their network on their non-ssl port.

We have to default to HTTPS-only because of "safe default" requirements. I've added an insecure_allow_http flag to make HTTP opt-in.
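
A rough sketch of how that opt-in could look (the flag name follows the comment above; the builder is assumed to be the hyper-rustls one from earlier):

    let builder = hyper_rustls::HttpsConnectorBuilder::new().with_webpki_roots();
    // HTTPS stays the safe default; plain HTTP must be opted into explicitly.
    let connector = if config.insecure_allow_http {
        builder.https_or_http()
    } else {
        builder.https_only()
    }
    .enable_http1()
    .enable_http2()
    .build();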


cas/store/tests/s3_store_test.rs line 28 at r4 (raw file):

Previously, blakehatch (Blake Hatch) wrote…

nit: Seems like this is being tested in fn simple_has_retries()? Could be incorrect, but if it is covered, just remove the TODO.

This is still correct. We're only testing the HTTP traffic here, not the underlying TCP connector. I still haven't figured out how to test it properly, and we probably want to put the entire TlsConnector into a separate file and make it handle all connections of native-link globally.


config/stores.rs line 421 at r5 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

Can the connector logic be part of the retry config above? Like make them part of the same internal-retry object?

I tried this, but it turned out to be misleading since we're configuring two different retriers for the S3 store. At the moment, the Retry config is reusable (for instance in some gRPC services) and its max_retries configures global retries in some cases. This is not true for the S3 implementation, which has distinct limits for HTTP failures and TCP failures.

This is certainly not optimal, but I think we should keep it this way and improve the retry logic globally in a way that doesn't lead to different "meanings" in different stores.

@allada allada (Collaborator) left a comment

Reviewed 4 of 8 files at r6, all commit messages.
Reviewable status: 9 of 13 files reviewed, 4 unresolved discussions (waiting on @aaronmondal, @blakehatch, and @MarcusSorealheis)


cas/store/s3_store.rs line 96 at r6 (raw file):

    }

    async fn simple_retry(&mut self, req: &Uri) -> Result<MaybeHttpsStream<TcpStream>, Error> {

After thinking about it and hacking a bit, I really do think we should use Retrier. Even though the API may not be super beautiful. It has the advantage that it is tested, and other parts of the code use it.

In the future if/when we upgrade it to use an external version, finding all parts of the code that need retry ability is easy, but if we use custom logic like this, it makes it harder to code search.

I created a commit on top of this branch that shows how it can be added without too much mess...

The only downside is it requires us to use more Arcs instead of Boxes, but I think it's a minimal cost (~2x atomic add/sub calls on average).

6c2a27a

@aaronmondal aaronmondal force-pushed the s3-aws-sdk branch 2 times, most recently from 0679a05 to 6fa4f0a on November 19, 2023 05:14
@aaronmondal aaronmondal (Contributor, Author) left a comment

Reviewable status: 9 of 17 files reviewed, 2 unresolved discussions (waiting on @allada, @blakehatch, and @MarcusSorealheis)


cas/store/s3_store.rs line 96 at r6 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

After thinking about it and hacking a bit, I really do think we should use Retrier. Even though the API may not be super beautiful. It has the advantage that it is tested, and other parts of the code use it.

In the future if/when we upgrade it to use an external version, finding all parts of the code that need retry ability is easy, but if we use custom logic like this, it makes it harder to code search.

I created a commit on top of this branch that shows how it can be added without too much mess...

The only downside is it requires us to use more Arcs instead of Boxes, but I think it's a minimal cost (~2x atomic add/sub calls on average).

6c2a27a

Done.

@allada allada (Collaborator) left a comment

Reviewable status: 7 of 17 files reviewed, 11 unresolved discussions (waiting on @aaronmondal, @blakehatch, and @MarcusSorealheis)


cas/store/s3_store.rs line 30 at r6 (raw file):

use aws_sdk_s3::primitives::{ByteStream, SdkBody};
use aws_sdk_s3::types::builders::{CompletedMultipartUploadBuilder, CompletedPartBuilder};
// use aws_sdk_s3::types::CompletedPart;

nit: Remove.


cas/store/s3_store.rs line 273 at r6 (raw file):

        // for multipart upload requests.
        if max_size < MIN_MULTIPART_SIZE {
            let (body, content_length) = if let UploadSizeInfo::ExactSize(sz) = upload_size {

By removing this code path we are forcing a copy of our data into a temp buffer when the content size is known.

Keep in mind that most uploads are small and size is known, so most uploads do hit this code path.


cas/store/s3_store.rs line 316 at r6 (raw file):

            .create_multipart_upload()
            .bucket(&self.bucket)
            .key(s3_path.clone())

super nit: I don't think you need to call clone() here, since it's an Into<String> and will do it internally.


cas/store/s3_store.rs line 331 at r6 (raw file):

            // the worst case.
            let mut completed_parts = Vec::with_capacity((max_size / bytes_per_upload_part) + 1);
            // let mut completed_part_futs = Vec::with_capacity(completed_parts.len());

nit: Remove.


cas/store/s3_store.rs line 333 at r6 (raw file):

            // let mut completed_part_futs = Vec::with_capacity(completed_parts.len());

            let max_concurrent_requests = self.multipart_max_concurrent_uploads;

nit: Inline this.


cas/store/s3_store.rs line 335 at r6 (raw file):

            let max_concurrent_requests = self.multipart_max_concurrent_uploads;

            let mut futures = futures::stream::FuturesUnordered::new();

nit: Import this and use the name directly.


cas/store/s3_store.rs line 367 at r6 (raw file):

                    }

                    let current_part_number = part_number;

nit: This is redundant, just use part_number below


cas/store/s3_store.rs line 396 at r6 (raw file):

            // Await remaining results.
            {
                use futures::StreamExt;

nit: Move this to top of file.


cas/store/s3_store.rs line 334 at r8 (raw file):

            // CompletedMultipartUploadBuilder keeps track of the response
            // order and lets S3 know about it.
            loop {

I don't think we should complicate things here. Instead we should just pub-sub into a bounded stream, then in another future use buffer_unordered() to extract them into a vector and sort them.

I created a sample of what I mean here based on this PR:
0dc0133
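
For illustration, a minimal self-contained sketch of that shape (the part-upload future is hypothetical; only the concurrency and re-ordering mechanics are shown):

    use futures::{stream, StreamExt};

    // Hypothetical: each future resolves to (part_number, etag) for one UploadPart.
    async fn upload_part(part_number: i32) -> (i32, String) {
        (part_number, format!("etag-{part_number}"))
    }

    async fn collect_parts(total_parts: i32, max_concurrent: usize) -> Vec<(i32, String)> {
        // Run up to `max_concurrent` part uploads at once, in whatever order
        // they finish, then restore part order for CompleteMultipartUpload.
        let mut parts: Vec<(i32, String)> = stream::iter(1..=total_parts)
            .map(upload_part)
            .buffer_unordered(max_concurrent)
            .collect()
            .await;
        parts.sort_unstable_by_key(|(part_number, _)| *part_number);
        parts
    }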

@aaronmondal aaronmondal (Contributor, Author) left a comment

Reviewable status: 6 of 17 files reviewed, 6 unresolved discussions (waiting on @allada, @blakehatch, and @MarcusSorealheis)


cas/store/s3_store.rs line 30 at r6 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

nit: Remove.

I think there was some commented-out code here, but it's no longer present, potentially due to your changes I merged in.


cas/store/s3_store.rs line 273 at r6 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

By removing this code path we are forcing a copy of our data into a temp buffer when the content size is known.

Keep in mind that most uploads are small and size is known, so most uploads do hit this code path.

I've verified that the removed branch is indeed exercised when added back. I added it back but had to change it a bit to make it compatible with SdkBody. I think this pipes our data through directly now.
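
A hedged sketch of the known-size path (variable names are illustrative; client, bucket, key, body_bytes, and sz are assumed from context):

    use aws_sdk_s3::primitives::{ByteStream, SdkBody};

    // When the exact size is known and below the multipart threshold, send a
    // single PutObject with an explicit content length instead of buffering
    // the data for a multipart upload.
    let body = ByteStream::new(SdkBody::from(body_bytes));
    client
        .put_object()
        .bucket(bucket)
        .key(key)
        .content_length(sz as i64)
        .body(body)
        .send()
        .await?;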


cas/store/s3_store.rs line 316 at r6 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

super nit: I don't think you need to call clone() here, since it's an Into<String> and will do it internally.

Done.


cas/store/s3_store.rs line 334 at r8 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

I don't think we should complicate things here. Instead we should just pub-sub into a bounded stream, then in another future use buffer_unordered() to extract them into a vector and sort them.

I created a sample of what I mean here based on this PR:
0dc0133

Interesting. I think I wasn't aware of the difference between the future-based and stream-based concurrency models. I'll need to look into the stream-based approach more.

Merged your proposed changes into the branch.

@allada allada (Collaborator) left a comment

Reviewed 2 of 7 files at r7, 5 of 8 files at r9.
Reviewable status: 10 of 17 files reviewed, 1 unresolved discussion (waiting on @blakehatch and @MarcusSorealheis)


cas/store/s3_store.rs line 398 at r1 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

Seems to add unnecessary complexity, and with the current state of the code it's not an issue. If we want to handle such things we should add global clippy configs to forbid undocumented panics.

Sorry, I should elaborate. What this branch is doing is telling AWS it can discard all the uploaded parts, so we don't get billed for it.

It has little to do with panics and more to do with a stream being interrupted (upload or download) and the upload getting cancelled somewhere (all those ? can trigger a cancel). ScopeGuard is an inline wrapper around Drop, similar to Go's defer statement, so you can run a function when the variable goes out of scope. This is often used so that if a future is ever dropped for any reason, it'll run the logic. In this case it should just launch a spawn to abort the upload unless the upload succeeded.

An example of it being used for similar stuff:
https://github.com/TraceMachina/native-link/blob/aea37682dbed261c401e5025ffd77dff2711f699/cas/worker/running_actions_manager.rs#L746

@allada allada (Collaborator) left a comment

:lgtm:

Sorry it took so long on this one. Was a bit of logic to review and was pretty busy last week.

Reviewed 1 of 10 files at r2, 1 of 8 files at r6, 3 of 7 files at r7, 3 of 8 files at r9, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @aaronmondal and @MarcusSorealheis)


cas/store/s3_store.rs line 498 at r9 (raw file):

                                // Handle the error, log it, or break the loop
                                return Some((
                                    RetryResult::Err(make_input_err!("Bad bytestream element in S3: {e}")),

nit: Shouldn't this be a RetryResult::Retry case?

This is a breaking change.

- The S3 store now uses the aws-sdk for Rust to handle higher level
  interactions with the S3 API.
- It now also supports HTTP/2 via hyper/rustls.
- The concurrency limits have been removed and are now only configurable
  for multipart uploads.
- The default behavior is now HTTPS. To enable unencrypted
  communication, an `insecure_allow_http` flag has been added.

Fixes TraceMachina#131
@aaronmondal aaronmondal (Contributor, Author) left a comment

Reviewable status: 16 of 17 files reviewed, all discussions resolved (waiting on @allada and @MarcusSorealheis)


cas/store/s3_store.rs line 398 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

Sorry, I should elaborate. What this branch is doing is telling AWS it can discard all the uploaded parts, so we don't get billed for it.

It has little to do with panics and more to do with a stream being interrupted (upload or download) and the upload getting cancelled somewhere (all those ? can trigger a cancel). ScopeGuard is an inline wrapper around Drop, similar to Go's defer statement, so you can run a function when the variable goes out of scope. This is often used so that if a future is ever dropped for any reason, it'll run the logic. In this case it should just launch a spawn to abort the upload unless the upload succeeded.

An example of it being used for similar stuff:
https://github.com/TraceMachina/native-link/blob/aea37682dbed261c401e5025ffd77dff2711f699/cas/worker/running_actions_manager.rs#L746

The only time we can call the abort is when we know the upload id, so the only thing we can guard is the scope constructing the complete_result. Since all ? occurrences would cause the complete_result to contain an Err value (and not exit the entire function), the current logic seems sound to me. So the ? might trigger stream cancelling, but it doesn't matter, because the error value will be captured in complete_result and handled by the (IMO easier to understand) if complete_result.is_err() block.

If this is actually an issue we can fix it in a future PR.

@aaronmondal aaronmondal merged commit 6ce11ab into TraceMachina:main Nov 20, 2023
14 of 15 checks passed
@MarcusSorealheis (Collaborator) commented

I was too late and was still reviewing this one. Let's discuss in the next ticket. Large PR. Nice work @aaronmondal and @allada (on the review)

@allada allada (Collaborator) left a comment

Reviewable status: 16 of 17 files reviewed, all discussions resolved


cas/store/s3_store.rs line 398 at r1 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

The only time we can call the abort is when we know the upload id, so the only thing we can guard is the scope constructing the complete_result. Since all ? occurrences would cause the complete_result to contain an Err value (and not exit the entire function), the current logic seems sound to me. So the ? might trigger stream cancelling, but it doesn't matter, because the error value will be captured in complete_result and handled by the (IMO easier to understand) if complete_result.is_err() block.

If this is actually an issue we can fix it in a future PR.

Yes, but if the client stream disconnects this future will abruptly be destroyed and no error will be returned, resulting in a partial upload. ScopeGuard would protect against that case.

TripleKai pushed a commit to TripleKai/turbo-cache that referenced this pull request Nov 29, 2023

Successfully merging this pull request may close these issues.

Look for rusoto alternatives
4 participants