
Operation stream termination is not an error #18785

Closed
wants to merge 1 commit into from


werkt (Contributor) commented Jun 27, 2023

The GrpcRemoteExecutor already treats operation stream termination as a non-error when it occurs after a !done operation response (one with done == false). Remove the error from the ExperimentalGrpcRemoteExecutor and reinforce both executors with tests.
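To make the rule concrete, here is a minimal Java sketch of the decision a client might make when the server closes an Execute or WaitExecution stream cleanly (no gRPC error). The Operation record, NextStep enum, and StreamCompletion.onStreamCompleted helper are hypothetical simplifications for illustration, not Bazel's or the REAPI's actual types.

```java
import java.util.Objects;

/** Hypothetical, simplified stand-in for google.longrunning.Operation (name and done flag only). */
record Operation(String name, boolean done) {}

/** Hypothetical outcomes once an operation stream has closed without a gRPC error. */
enum NextStep {
  /** The last operation was done; read its ExecuteResponse. */
  FINISHED,
  /** The last operation was !done; resume waiting on it by name. */
  WAIT_EXECUTION
}

final class StreamCompletion {
  private StreamCompletion() {}

  /**
   * Decides how to proceed after a clean end of stream. Ending after a !done operation is
   * not treated as an error: the client keeps the operation name and calls WaitExecution.
   */
  static NextStep onStreamCompleted(Operation lastSeen) {
    // This sketch assumes at least one operation was received before the stream ended.
    Objects.requireNonNull(lastSeen, "no operation received before end of stream");
    return lastSeen.done() ? NextStep.FINISHED : NextStep.WAIT_EXECUTION;
  }
}
```

For example, onStreamCompleted(new Operation("op-1", false)) yields WAIT_EXECUTION rather than an error, which is the behavior this change brings to ExGRE.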

Update the FakeExecutionService to generate compatible error responses that appear in the ExecuteResponse status rather than in the Operation error field, per the REAPI spec. Make the required adjustments to the ExGRE (ExperimentalGrpcRemoteExecutor) test invocations to avoid the ExecutionStatusException interpretation of DEADLINE_EXCEEDED -> FAILED_PRECONDITION in ExecuteResponse.
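Per the REAPI Execute documentation, errors that occur while running an action are reported in the status field of the ExecuteResponse, and the server must not set the error field of the Operation. A fake service following that rule might build its failure responses roughly as below. The types here (Status, ExecuteResponse, FakeOperation, FakeResponses) are hypothetical, simplified stand-ins for the protobuf messages, not the generated REAPI classes or Bazel's FakeExecutionService.

```java
/** Hypothetical stand-in for google.rpc.Status: a canonical code name plus a message. */
record Status(String code, String message) {}

/** Hypothetical stand-in for the REAPI ExecuteResponse; only the status field is modeled. */
record ExecuteResponse(Status status) {}

/**
 * Hypothetical stand-in for google.longrunning.Operation. On a real done operation, only one
 * of error/response is set; in this sketch null means "unset".
 */
record FakeOperation(String name, boolean done, Status error, ExecuteResponse response) {}

final class FakeResponses {
  private FakeResponses() {}

  /** Reports an execution failure the way the REAPI asks: inside ExecuteResponse.status. */
  static FakeOperation failedExecution(String name, Status failure) {
    // Operation.error stays unset; the failure travels in the response instead.
    return new FakeOperation(name, true, null, new ExecuteResponse(failure));
  }
}
```

A client reading such an operation sees done == true with a response, and must inspect ExecuteResponse.status to learn that the execution failed.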

werkt requested a review from a team as a code owner June 27, 2023 04:26
google-cla bot commented Jun 27, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

github-actions bot added the awaiting-review (PR is awaiting review from an assigned reviewer) and team-Remote-Exec (Issues and PRs for the Execution (Remote) team) labels Jun 27, 2023
werkt changed the title from "ExGRE Operation stream termination is not an error" to "Operation stream termination is not an error" Jun 27, 2023
werkt (Contributor, Author) commented Jun 29, 2023

@coeuvre any chance of a review here?

coeuvre (Member) left a comment

Sorry for the delay, the initial notification somehow got lost in my inbox.

werkt force-pushed the exgre-stream-finish branch 3 times, most recently from 5cc12c8 to 461a531 June 30, 2023 20:01
werkt requested a review from coeuvre June 30, 2023 20:01
werkt (Contributor, Author) commented Jul 2, 2023

I followed up on the asynchronous breakage here in bazelbuild/bazel-central-registry#725. @coeuvre, no rush, but it could use some Googler clout to get it fixed.

coeuvre (Member) left a comment

Thanks!

coeuvre added the awaiting-PR-merge label (PR has been approved by a reviewer and is ready to be merged internally) and removed the awaiting-review label (PR is awaiting review from an assigned reviewer) Jul 5, 2023
werkt (Contributor, Author) commented Jul 10, 2023

Can this be merged?

github-actions bot removed the awaiting-PR-merge label (PR has been approved by a reviewer and is ready to be merged internally) Jul 10, 2023
werkt deleted the exgre-stream-finish branch July 10, 2023 18:31
werkt added a commit to werkt/bazel that referenced this pull request Jul 17, 2023
According to the GrpcRemoteExecutor when it occurs after a !done operation response. Remove the error from the ExperimentalGrpcRemoteExecutor and reinforce both with tests.

Update the FakeExecutionService to generate compatible error responses that appear in the ExecuteResponse, rather than the operation error field, per the REAPI spec. Made required adjustments to ExGRE Test invocations to avoid the ExecutionStatusException interpretation of DEADLINE_EXCEEDED -> FAILED_PRECONDITION in ExecuteResponse.

Closes bazelbuild#18785.

PiperOrigin-RevId: 546925894
Change-Id: I7a489c8bc936a83cfd94e0138437f3fe6d152da8
werkt mentioned this pull request Jul 18, 2023
werkt added a commit to werkt/bazel that referenced this pull request Jul 24, 2023
keertk pushed a commit that referenced this pull request Jul 25, 2023
* Include stack trace in all gRPC errors when --verbose_failures is set.

Also refactor a couple places where the stack trace was included in an ad-hoc
manner, and force Utils.grpcAwareErrorMessage callers to be explicit to avoid
future instances.

Closes #16086.

PiperOrigin-RevId: 502854490
Change-Id: Id2d6a1728630fffea9399b406378c7f173b247bd

* Avoid discarding SRE state for IO cause

Unwrapping all StatusRuntimeExceptions in ReferenceCountedChannel when caused by IOException will discard critical tracing and retriability. The Retrier evaluations may not see an SRE in the causal chain, and then presume it is invariably an unretriable exception. In general, IOExceptions as SRE wrappers are unsuitable containers and are routinely misused either for identification (grpc aware status), or capture (handleInitError).

Partially addresses #18764 (retries will occur with SSL handshake timeout, but the actual connection will not be retried)

Closes #18836.

PiperOrigin-RevId: 546037698
Change-Id: I7f6efcb857c557aa97ad3df085fc032c8538eb9a

* Operation stream termination is not an error

According to the GrpcRemoteExecutor when it occurs after a !done operation response. Remove the error from the ExperimentalGrpcRemoteExecutor and reinforce both with tests.

Update the FakeExecutionService to generate compatible error responses that appear in the ExecuteResponse, rather than the operation error field, per the REAPI spec. Made required adjustments to ExGRE Test invocations to avoid the ExecutionStatusException interpretation of DEADLINE_EXCEEDED -> FAILED_PRECONDITION in ExecuteResponse.

Closes #18785.

PiperOrigin-RevId: 546925894
Change-Id: I7a489c8bc936a83cfd94e0138437f3fe6d152da8

* Done operations must be reexecuted

Any operation with done == true as reported by the server is not expected to change its result on subsequent waitExecution calls. To retry properly, the action must be reexecuted (if the failure was truly transient) to achieve a definitive result. Submit a transient status for retry, disallow special behaviors for NOT_FOUND as covered by done observation, and consider method type when handling operation streams.

Closes #18943.

PiperOrigin-RevId: 548680656
Change-Id: Ib2c9887ead1fbd3de97761db6e8b4077783ad03c

---------

Co-authored-by: Tiago Quelhas <tjgq@google.com>
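The cherry-picked commit "Include stack trace in all gRPC errors when --verbose_failures is set" forces callers of Utils.grpcAwareErrorMessage to be explicit about verbosity. The sketch below illustrates that pattern with a hypothetical ErrorMessages.describe helper (not Bazel's actual signature): the verbosity flag is a required parameter with no convenience overload, so no call site can silently drop the stack trace.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

final class ErrorMessages {
  private ErrorMessages() {}

  /**
   * Hypothetical helper. There is deliberately no describe(Throwable) overload, so every
   * caller has to decide explicitly whether the stack trace is included.
   */
  static String describe(Throwable error, boolean verboseFailures) {
    String message = error.getMessage() != null ? error.getMessage() : error.toString();
    if (!verboseFailures) {
      return message;
    }
    // Append the full stack trace, including any chained causes.
    StringWriter trace = new StringWriter();
    PrintWriter out = new PrintWriter(trace);
    error.printStackTrace(out);
    out.flush();
    return message + "\n" + trace;
  }
}
```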
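The cherry-picked commit "Avoid discarding SRE state for IO cause" argues that a retrier can only judge retriability if a StatusRuntimeException survives somewhere in the causal chain. A minimal sketch of that check follows, using a hypothetical FakeStatusRuntimeException in place of io.grpc.StatusRuntimeException and an assumed set of retriable codes; it is not Bazel's Retrier or RemoteRetrier logic.

```java
import java.util.Set;

/** Hypothetical stand-in for io.grpc.StatusRuntimeException: carries a status code name. */
final class FakeStatusRuntimeException extends RuntimeException {
  final String code;

  FakeStatusRuntimeException(String code) {
    super(code);
    this.code = code;
  }
}

final class RetryCheck {
  private RetryCheck() {}

  // Assumed retriable codes, chosen for illustration only.
  private static final Set<String> RETRIABLE =
      Set.of("UNAVAILABLE", "DEADLINE_EXCEEDED", "RESOURCE_EXHAUSTED");

  /** Walks the causal chain; the first status-bearing exception decides retriability. */
  static boolean isRetriable(Throwable t) {
    for (Throwable cur = t; cur != null; cur = cur.getCause()) {
      if (cur instanceof FakeStatusRuntimeException sre) {
        return RETRIABLE.contains(sre.code);
      }
    }
    // No status anywhere in the chain: the failure is presumed unretriable.
    return false;
  }
}
```

Wrapping with new IOException(sre) keeps the SRE reachable through getCause(), so an UNAVAILABLE failure stays retriable; replacing it with new IOException(sre.getMessage()) drops the status, and the same failure is presumed unretriable.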
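The cherry-picked commit "Done operations must be reexecuted" reasons that an operation the server reported as done will not change on later WaitExecution calls, so a transient failure on it can only be retried by calling Execute again. A minimal sketch of that choice follows, with hypothetical RetryAction and RetryPlanner names; it deliberately leaves out the NOT_FOUND and method-type handling the commit also mentions.

```java
/** Hypothetical choice of RPC for the next attempt after a transient failure. */
enum RetryAction { EXECUTE, WAIT_EXECUTION }

final class RetryPlanner {
  private RetryPlanner() {}

  /**
   * A done operation's result is final, so waiting on it again cannot produce a different
   * answer; only a fresh Execute can. A !done operation with a known name can be resumed.
   */
  static RetryAction planRetry(String lastOperationName, boolean lastOperationDone) {
    if (lastOperationDone || lastOperationName == null) {
      // Also re-execute when no operation name was ever received (assumption of this sketch).
      return RetryAction.EXECUTE;
    }
    return RetryAction.WAIT_EXECUTION;
  }
}
```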
iancha1992 (Member) commented
The changes in this PR have been included in Bazel 6.4.0 RC1. Please test out the release candidate and report any issues as soon as possible. If you're using Bazelisk, you can point to the latest RC by setting USE_BAZEL_VERSION=last_rc.
Thanks!

Labels
team-Remote-Exec Issues and PRs for the Execution (Remote) team

3 participants