
Require that early-return compressed-blobs bytestream uploads set committed_size -1 #213

Conversation

@mostynb (Contributor) commented Jan 30, 2022

We require that uncompressed bytestream uploads set committed_size to the size of the blob when returning early (i.e. when the blob already exists on the server).

We also require that for compressed bytestream uploads, committed_size refers to the initial write offset plus the number of compressed bytes uploaded. But if the server wants to return early in this case, it doesn't know how many compressed bytes would have been uploaded (the client might not know this ahead of time either). So let's require that the server set committed_size to -1 in this case.

For early return to work, we also need to ensure that the server does not return an error code.
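
To illustrate the proposed rule, here is a minimal server-side sketch in Go; the helper names (parsedResource, maybeEarlyReturn, blobExists) are hypothetical, not taken from any real server:

```go
package server

import (
	repb "github.com/bazelbuild/remote-apis/build/bazel/remote/execution/v2"
	bspb "google.golang.org/genproto/googleapis/bytestream"
)

// parsedResource is a hypothetical stand-in for whatever the server
// parses out of the ByteStream upload resource name.
type parsedResource struct {
	Digest     *repb.Digest
	Compressor repb.Compressor_Value
}

// maybeEarlyReturn decides what to send when the blob already exists.
// blobExists is assumed to be provided by the server's storage layer.
func maybeEarlyReturn(r parsedResource, blobExists func(*repb.Digest) bool) (*bspb.WriteResponse, bool) {
	if !blobExists(r.Digest) {
		return nil, false // No early return; consume the upload as usual.
	}
	if r.Compressor == repb.Compressor_IDENTITY {
		// Uncompressed upload: committed_size is the full blob size.
		return &bspb.WriteResponse{CommittedSize: r.Digest.SizeBytes}, true
	}
	// Compressed upload: the server cannot know how many compressed bytes
	// would have been sent, so it reports the -1 sentinel. Crucially, this
	// response is sent with an OK status, not an error code.
	return &bspb.WriteResponse{CommittedSize: -1}, true
}
```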

Resolves #212.

@bduffany commented Jan 31, 2022

IIUC, based on the reasoning in #212 (comment) we don't want to return AlreadyExists in this case because it is backwards incompatible with existing clients (notably, all versions of Bazel that currently exist in the wild).

But it seems like this approach of returning -1 is also incompatible, no? Clients currently treat committed_size == -1 as an error. Bazel, in particular, effectively checks for committed_size == total length of the client-compressed stream whenever the server has sent back a WriteResponse and Bazel has finished sending all bytestream chunks to the server, irrespective of whether the server sent back the WriteResponse early or not. If this check fails, Bazel returns the following error: https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/remote/ByteStreamUploader.java#L474
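
For concreteness, a rough Go paraphrase of that check (the real code is Java, in Bazel's ByteStreamUploader linked above; all names here are illustrative):

```go
package client

import (
	"fmt"

	bspb "google.golang.org/genproto/googleapis/bytestream"
)

// checkCommittedSize mirrors the check described above. bytesSent is the
// total length of the client-compressed stream.
func checkCommittedSize(resp *bspb.WriteResponse, bytesSent int64) error {
	if resp.CommittedSize != bytesSent {
		// A -1 sentinel (or any early return partway through the stream)
		// trips this check, even when the server considers the upload done.
		return fmt.Errorf("write incomplete: committed_size %d, expected %d",
			resp.CommittedSize, bytesSent)
	}
	return nil
}
```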

@bduffany commented Jan 31, 2022

Just spitballing here, but another option might be to make use of the new /optional_metadata field that was added to the ByteStream resource name specification, and allow the client to specify that they support short-circuiting, maybe like /?shortCircuit=true. If they do, then the server can safely return AlreadyExists.
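
For example, such an upload resource name might look like the following (purely hypothetical; the spec only reserves the optional_metadata segment and doesn't define any keys):

```
{instance_name}/uploads/{uuid}/compressed-blobs/zstd/{uncompressed_hash}/{uncompressed_size}/?shortCircuit=true
```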

@mostynb (Contributor, Author) commented Jan 31, 2022

To clarify, I think we should maintain backwards compatibility for uncompressed uploads (i.e. clients can check for committed_size == digest size), but we need to make a breaking change for compressed uploads, because some existing clients are always going to be broken in the current setup unless we forbid servers from doing an early return for compressed uploads (which doesn't sound great to me).

This feature is still experimental in bazel, so IMO we can decide on a reasonable behaviour that supports early-return for compressed uploads without changing the requirements for uncompressed uploads, and then submit a fix for bazel. I think specifying -1 for early-returned compressed uploads is simple, but I would also be ok with returning AlreadyExists (though it would be a slightly larger bazel-side fix).

Re using the bytestream resource metadata field, it's generally disliked (and anyway it would require a client-side fix, in which case we can just as well pick another client-side fix).

@mostynb (Contributor, Author) commented Feb 4, 2022

CC @AlessandroPatti, who implemented this functionality in Bazel; it looks like you tried to account for this scenario. Maybe there's a corner case where the uploader sends all the data but the server "returns early" in the last chunk? What do you think about the server returning -1 in this case (or alternatively returning an AlreadyExists error)?

@bergsieker (Collaborator)

My first impulse here is to make compressed and uncompressed uploads match. Both should return the total number of uncompressed bytes captured in the blob (effectively, the size field in the blob's digest). This means that the server must uncompress the uploaded data to calculate the size, but servers must already do this to validate the digest. Of course, then we get back to the issue of returning compressed vs. uncompressed bytes that plagues our implementation of Bytestream.Write.

I think that if we're going to make a breaking change--which it does indeed seem like this situation requires--we should aim for the "correct" semantic of returning AlreadyExists.

@EricBurnett for any relevant context on the history of why we didn't do this in the first place.

@coeuvre for review on the Bazel side.

@bergsieker bergsieker removed the request for review from buchgr February 7, 2022 22:46
@EricBurnett (Collaborator) commented Feb 7, 2022 via email

@mostynb (Contributor, Author) commented Feb 7, 2022

> My first impulse here is to make compressed and uncompressed uploads match. Both should return the total number of uncompressed bytes captured in the blob (effectively, the size field in the blob's digest). This means that the server must uncompress the uploaded data to calculate the size, but servers must already do this to validate the digest. Of course, then we get back to the issue of returning compressed vs. uncompressed bytes that plagues our implementation of Bytestream.Write.

I don't think there is necessarily a mapping from "N compressed bytes transferred" to "M uncompressed bytes transferred" for arbitrary compression formats, unless you allow for possibly confusing scenarios like "the client wrote a bunch of compressed data successfully, but not enough for the server to decode another frame, so the number of uncompressed bytes written did not increase". This is why we chose a kind of fake definition of committed_size for compressed blobs: initial uncompressed offset + compressed bytes written (so, for example, resuming at uncompressed offset 1024 and then sending 700 compressed bytes gives committed_size 1724).

> +1 for setting Committed Size to the uncompressed size of the blob for early returns. That also has the advantage of matching QueryWriteStatusResponse, which should be returning the uncompressed size for a complete blob irrespective of encoding.

The downside of that approach is that committed_size might decrease during a successful upload, which feels super confusing (e.g. it increases monotonically as expected, for zstd-encoded data which turns out to be larger than the uncompressed data, and then the server does an early return near the end of the transfer).

@bergsieker (Collaborator)

> > +1 for setting Committed Size to the uncompressed size of the blob for early returns. That also has the advantage of matching QueryWriteStatusResponse, which should be returning the uncompressed size for a complete blob irrespective of encoding.
>
> The downside of that approach is that committed_size might decrease during a successful upload, which feels super confusing (e.g. it increases monotonically as expected, for zstd-encoded data which turns out to be larger than the uncompressed data, and then the server does an early return near the end of the transfer).

Hmmm, looking at this again, I think that returning uncompressed size contradicts the general guidance that we've given that committed_bytes should be (uncompressed offset) + (sum of compressed data bundles). So in that case successful, full compressed uploads should return the compressed data size, while successful, early-terminated compressed uploads would return the uncompressed data size. That makes no sense.

Side note: I thought we'd settled on returning the uncompressed data size for compressed uploads, for the reason that Eric mentioned--it's the only value that makes sense to provide as an offset to future calls. Am I misremembering?

I continue to believe that returning ALREADY_EXISTS for early termination of compressed uploads is the right approach. I think the only reason we didn't do that for uncompressed uploads is that it would have been a breaking change for Bazel, but if Bazel support for compressed uploads is experimental and we're OK with breaking it, we might as well break it in the right way.

@mostynb (Contributor, Author) commented Feb 8, 2022

> Side note: I thought we'd settled on returning the uncompressed data size for compressed uploads, for the reason that Eric mentioned--it's the only value that makes sense to provide as an offset to future calls. Am I misremembering?

We decided on this mostly useless value in @nodirg's PR #193 but we missed the early-return corner case.

As I see it we probably have two kinds of clients that support compressed-blobs bytestream uploads:

  1. Clients like Bazel which perform the wrong committed_size check and will randomly consider early-returned compressed-blobs uploads as failed. These will need updating regardless of whether we switch to an AlreadyExists error or a -1 sentinel value (the sketch after this list shows the corrected check).

  2. Clients which ignore the committed_size check, but will consider any bytestream upload as failed if any non-EOF error is returned (since we specify no other behaviour for error values, and streaming gRPC is underdocumented IMO). These work now and will continue working if we use a -1 sentinel value, but will break if servers change to use an AlreadyExists error.
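
To make category 1 concrete, here is a sketch in Go of the check such clients would need to adopt under the -1 proposal (names are illustrative; Bazel's real fix would live in ByteStreamUploader.java):

```go
package client

import bspb "google.golang.org/genproto/googleapis/bytestream"

// uploadSucceeded decides whether a finished Write call succeeded, assuming
// bytesSent is the number of bytes the client actually wrote on the stream
// and blobSize is the uncompressed size from the digest.
func uploadSucceeded(resp *bspb.WriteResponse, compressed bool, bytesSent, blobSize int64) bool {
	if compressed {
		if resp.CommittedSize == -1 {
			// Early return: the blob already exists on the server.
			return true
		}
		// Full upload: committed_size counts the initial uncompressed
		// write offset (zero here) plus compressed bytes written.
		return resp.CommittedSize == bytesSent
	}
	// Uncompressed uploads, early return or not: committed_size must
	// equal the blob size from the digest.
	return resp.CommittedSize == blobSize
}
```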

@bergsieker (Collaborator)

Per comments in the meeting: we believe that taking the less-disruptive course of returning -1 is a good idea here. The semantics of compressed vs. uncompressed bytes are already a mess, but we can't meaningfully clean them up until we move away from Bytestream.

Note that returning -1 still breaks Bazel's (experimental) support for compressed uploads.

@bergsieker (Collaborator)

Adding Sven for approval from Bazel before merging. He can stand in for Chi, who is having trouble with his GitHub account.

@AlessandroPatti

I see there has been a consensus towards using -1, so I might be late to the party. FWIW, I'd also lean towards using AlreadyExists, which seems the most semantically correct solution here.

> Re using the bytestream resource metadata field, it's generally disliked

@mostynb Could you elaborate on why that is? It seems like it could allow using AlreadyExists while keeping backward compatibility with existing clients.

@mostynb (Contributor, Author) commented Feb 10, 2022

> @mostynb Could you elaborate on why that is? It seems like it could allow using AlreadyExists while keeping backward compatibility with existing clients.

Using the bytestream metadata came up early in the compressed-blobs feature planning discussions for other purposes, but IIRC it was considered too free-form. We don't know what, if anything, existing clients use it for, so we can't rule out causing conflicts.

Another point against using the AlreadyExists error code came up in the meeting: apparently it was used in an early (pre-v1?) version of the spec and was removed for a reason that nobody can quite remember, but one that was significant enough at the time to justify a breaking change.

mostynb added a commit to mostynb/bazel-remote that referenced this pull request Feb 19, 2022
This is an implementation of this REAPI spec update:
bazelbuild/remote-apis#213

Which is part of the solution to this issue:
bazelbuild/bazel#14654
mostynb added a commit to mostynb/bazel that referenced this pull request Feb 19, 2022
This is an implementation of this REAPI spec update:
bazelbuild/remote-apis#213

Here's a bazel-remote build that can be used to test this change:
buchgr/bazel-remote#527

Fixes bazelbuild#14654
@mostynb (Contributor, Author) commented Feb 19, 2022

Here's a bazel-remote PR that can be used to test this change: buchgr/bazel-remote#527
And a bazel PR: bazelbuild/bazel#14870

@coeuvre (Member) left a comment

Sorry for the delay. My GitHub account was suspended for no reason, so I lost all the notifications during that period.

LGTM from Bazel side.

mostynb added a commit to mostynb/bazel that referenced this pull request Feb 21, 2022
This is an implementation of this REAPI spec update:
bazelbuild/remote-apis#213

Here's a bazel-remote build that can be used to test this change:
buchgr/bazel-remote#527

Fixes bazelbuild#14654
bazel-io pushed a commit to bazelbuild/bazel that referenced this pull request Feb 22, 2022
This is an implementation of this REAPI spec update:
bazelbuild/remote-apis#213

Here's a bazel-remote build that can be used to test this change:
buchgr/bazel-remote#527

Fixes #14654

Closes #14870.

PiperOrigin-RevId: 430167812
brentleyjones pushed a commit to brentleyjones/bazel that referenced this pull request Feb 22, 2022
This is an implementation of this REAPI spec update:
bazelbuild/remote-apis#213

Here's a bazel-remote build that can be used to test this change:
buchgr/bazel-remote#527

Fixes bazelbuild#14654

Closes bazelbuild#14870.

PiperOrigin-RevId: 430167812
(cherry picked from commit d184e48)
Wyverald pushed a commit to bazelbuild/bazel that referenced this pull request Feb 22, 2022
This is an implementation of this REAPI spec update:
bazelbuild/remote-apis#213

Here's a bazel-remote build that can be used to test this change:
buchgr/bazel-remote#527

Fixes #14654

Closes #14870.

PiperOrigin-RevId: 430167812
(cherry picked from commit d184e48)

Co-authored-by: Mostyn Bramley-Moore <mostyn@antipode.se>
@mostynb (Contributor, Author) commented Feb 23, 2022

The fix has landed in Bazel now. Any objections to merging this so I can make a new bazel-remote release?

@bergsieker bergsieker merged commit 04784f4 into bazelbuild:main Feb 23, 2022
@mostynb mostynb deleted the clarify_compressed_bytestream_write_committed_size branch February 23, 2022 18:37
mostynb added a commit to buchgr/bazel-remote that referenced this pull request Feb 23, 2022
This is an implementation of this REAPI spec update:
bazelbuild/remote-apis#213

Which is part of the solution to this issue:
bazelbuild/bazel#14654