-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specify that stderr and stdout streams are resource names #116
Conversation
Could we go further and give an example? In particular, I'm interested in what protocol we use, and how things are formatted. I believe the scheme Bazel uses for bytestreams listed in the Build Event Stream is:
So for example for a RBE bytestream, we'd use
Would there be any objections to using the exact same scheme here? In particular, the protocol is 'bytestream', and the API version is not included in the URI (there's no /v1/ component), only protocol/endpoint/resource are concretely specified. Note that the resource itself is arbitrary - everyone is welcome to use any path resource they wish, no restrictions on components or use of We could also leave the protocol off the URI, since it's listed in the API as being a bytestream URI anyways - that'd look I guess like |
I am not particularly experienced with gRPC, so please bear with me. I definitely like the idea of having the My idea would be that someone could execute a remote command with an RE client then hand off to Now, Bytestream doesn't actually support that at all because it is missing the So, I would suggest that the worker is allowed to return any protocol (http, https, grpc, etc.) and it be up to the client to do its best to understand it (knowing that it will at least be bytestream). Additionally, how does this play with the stripping of |
Roughly speaking, there's two directions you could go with a URI/URL field:
The first one is not really viable though, for reasons which I'll give you a longwinded explanation :). So it really has to be an abstract URI, and then the question is mostly just details of formatting, where "what bazel already standardized" is a pretty good choice. The URI I listed above is an example of (2), the abstract specification - it tells you of a bytestream resource and the endpoint it's available from, but it's up to you how to call it. In practice you'd probably translate this URI to a gRPC request (details below), but you could in theory access it over REST instead. That does come with the cost that you have to translate the URI before you can use it, but that's actually going to be true for gRPC anyways so only really a cost if you'd be returning HTTPS URLs that would be accessed by HTTPS. As for using one of these URIs via gRPC, it breaks down as (approximately):
I was mistaken in mentioning V1 - ByteStream as currently defined is a versionless protocol, so there's no version to include anywhere. But in the abstract, if you imagined there being a V2 of ByteStream, where this would come in is as a client you'd have to decide if you were calling ByteStream V1 or ByteStream V2, and link in the appropriate gRPC library. When you went to call it, it'd be adding the appropriate version to the underlying HTTP URL under the hood. In practice, clients are usually bound to one specific version of a protocol, so whether a client could talk to a server would be a question of whether they speak compatible versions of the protocol. That's independent of the resources being referenced. But as I said, academic for ByteStream - it's a versionless API with no plans to ever have a V2, so there should be no cases where the client and server ever can speak incompatible versions. I should note that By contrast then, going back the idea of (1) - a fully-specified URL that might be gRPC or HTTP - then means the resource can be only accessed over one specific protocol, and if you want to access it over something else you either can't, or have to translate one URL into a different URL which is a more hacky solution and requires essentially the same assumption. It also has a glaring flaw: grpc URLs are not well defined. I mean, gRPC when transported over HTTPS is defined to have a URL like REST translations do have a well-defined URL, but as you pointed out, the ByteStream API doesn't actually embed a REST translation. So we'd have to define our own URLs, but then we're really just back to the "abstract" pattern except trying to embed both the fact that it should be accessed through ByteStream and which specific protocol (gRPC or HTTP or other) that should be done through - which to me is not actually better than just specifying the resource. |
Ah ha, thank you for the explanation! If the client has to deconstruct the URL in some cases to determine the components and the "endpoint" is likely to be the already existing configured bytestream service: should we just leave this as a resource name and allow an optionally specified endpoint addition to override the "default" Bytestream service being used elsewhere in the spec. The |
I'd also be fine defining it to be a resource available on the same endpoint as the CAS Bytestream interface, and so only the relative component. That's how we'd use it, at any rate (for us, I guess specifying that it may be either
or
would serve that purpose, though I'd want to confirm there's anyone expecting to use the endpoint form before we commit to having two variants specified. |
I think it was me who wanted to have a different endpoint :) I understand that not everyone would need that, so separating them into two components doesn't change the spec significantly. The implication being that the resource is available at the same place as your already configured CAS. |
@eric Burnett <ericburnett@google.com> I wanted to clarify the language so
that clients know what level of configurability they should support - if
CAS, Execute, ActionCache and Remote Asset Fetch/Push and potentially log
streams can live on different endpoints, they would need to be able to be
configured. Otherwise servers that implement these services to live on
separate endpoints won't be able to be used with clients that expect
everything on the same endpoint.
In short, I wanted to remove the ambiguity around service endpoints.
…On Fri, Feb 21, 2020 at 7:57 PM Eric Burnett ***@***.***> wrote:
I'd also be fine defining it to be a resource available on the same
endpoint as the CAS Bytestream interface, and so only the relative
component. That's how we'd use it, at any rate (for us,
projects/foo/instances/bar/logstreams/00aabb or similar.) I remember
there being some objection to that assumption though in the monthly
meeting? I think from @sstriker <https://github.com/sstriker>, though I
may be mis-remembering.
I guess specifying that it may be either
`resource` - implied to be a complete resource path relative to the same endpoint as the CAS Bytestream
or
`//endpoint/resource` - a fully specified resource path with endpoint.
would serve that purpose, though I'd want to confirm there's anyone
expecting to use the endpoint form before we commit to having *two*
variants specified.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#116?email_source=notifications&email_token=AAQVXZ7T4WAVYWERXGXK6M3REAPZ5A5CNFSM4KTDMNP2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEMTXGUY#issuecomment-589787987>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAQVXZ5PF2JMN3FCBC6DL2TREAPZ5ANCNFSM4KTDMNPQ>
.
|
Ok, thanks both! Ed: ahh, apologies! I have a terrible memory sometimes. Can you clarify if you're voting for
I'm having trouble parsing from your response which you'd currently prefer to standardize. Sander: I think one risk here is that if we standardize both 'resource' and '//endpoint/resource', but most clients only end up seeing 'resource' paths in practice, that's all they'll support in practice until updated to know the other form explicitly. Not sure how you feel about that. |
Thanks @eric Burnett <ericburnett@google.com>. I was thinking more of
either saying:
1. We expect all of the services defined within the Remote Execution
API and within the Remote Asset API to be exposed via a single endpoint, or
2. We expect each of the services defined within the Remote Execution
API and within the Remote Asset API to potentially be exposed via separate
endpoints, or
3. We expect services X, Y and Z defined within the Remote Execution
API to be exposed via a single endpoint, and service A and B...
We can take this to a separate issue though.
With respect to streams, I agree with you that there is a risk of clients
only really implementing 1, if we choose 2. In other words, if we want to
ability to direct clients to specific endpoints for log streams, we need to
make it 3.
…On Fri, Feb 21, 2020 at 10:22 PM Eric Burnett ***@***.***> wrote:
Ok, thanks both!
Ed: ahh, apologies! I have a terrible memory sometimes. Can you clarify if
you're voting for
1. 'resource'
2. resource' OR '//endpoint/resource'
3. '//endpoint/resource'
I'm having trouble parsing from your response which you'd currently prefer
to standardize.
Sander: I think one risk here is that if we standardize both 'resource'
and '//endpoint/resource', but most clients only end up seeing 'resource'
paths in practice, that's all they'll support in practice until updated to
know the other form explicitly. Not sure how you feel about that.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#116?email_source=notifications&email_token=AAQVXZZM6Z73BXFNEUNBQJLREBAZ5A5CNFSM4KTDMNP2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEMUEBPQ#issuecomment-589840574>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAQVXZZHYF2WNOP6HHMKUDDREBAZ5ANCNFSM4KTDMNPQ>
.
|
Just to clarify, you are talking about stdout and stderr, but will the API
support streaming other log files too? My long running actions sometimes
write other debug log files which would be useful to stream. They are not
written to stdout as that would spam the console output in Bazel.
/Fredrik
|
@moroten : The current API only covers stdout and stderr, but in principle it could be extended to more, and if it were I'm comfortable saying they'd be expressed in a consistent way as these ones. I don't know of anyone working on the spec changes for that at present though, as we'd need to discuss how clients request files to be streamed back and whatnot. @sstriker : The current API doesn't specify whether Services are hosted together or separately, and in principle they could all be different endpoints. No client I'm aware of has that flexibility at present however - bazel can currently support only two iirc, and we ourselves always set them to the same value. I wonder if that suggests another path forward here: specify in this change that all URIs are relative form ( I realize it entirely sidesteps the question of what we expect, but I think that's in part because expectations vary. On the API side, it's possible today to host every API on a different endpoint; on the client side, bazel at least expects use of only one or two, and other clients have potentially different expectations but probably not full variability. @edbaunton : does that also satisfy your use-case? Relative resources, but relative to an endpoint specified out-of-band and explicitly allowed to be something other than the CAS endpoint? And then, in terms of cleaning up the broader endpoint confusion so client expectations can be made more consistent with each other and with services, there are two directions we could go as a community. One is to say that clients should expect all APIs available at a single endpoint, period. The other is to define a clear set of different 'virtual' endpoints (including the concrete Services themselves, like the Execution service, plus 'CAS blob bytestreams', 'streamed bytestreams', etc), specify how they're bucketed in terms of which are allowed to vary independently and which should be co-hosted, and recommend a consistent way to configure clients and/or let them discover it (via Capabilities probably). To me that feels worthwhile to settle, but also a big enough thing we should probably make it its own bug and target it at V3. WDYT? |
FWIW, currently Pants allows specifying the CAS and Execution servers differently, and Scoot, Twitter's server implementation, currently assumes they will be set distinctly. |
@illicitonion Scoot expecting them to be different is interesting, but otherwise I think that fits the idea that we should hash out what are and independently variable outside the scope of this PR. Clearly there are interesting and different constraints to collect! @edbaunton : if you don't have any concerns, mind updating the PR in line with the suggestion above? Copied for convenience:
|
In lieu of a more fully fledged service discovery layer, explicitly specify that the stderr and stdout stream names are in fact resource names from which one can Bytestream.Read. The actual endpoints are to be configured out of band. Signed-off-by: Ed Baunton <ebaunton1@bloomberg.net>
Thanks Ed! LGTM from me. |
I've pulled out the discussion of "what to do in V3?" into its own bug: #132 |
In lieu of a more fully fledged service discovery layer, explicitly specify that the stderr and stdout stream names are in fact URIs from which one can Bytestream.Read.