Retry GrpcStore write #326

@chrisstaite-menlo chrisstaite-menlo commented Oct 19, 2023

Description

When writing to the GrpcStore, retry was not implemented because there was no easy way to rewind the stream. However, most errors caused by high concurrency occur before any data has been written. Therefore, refactor GrpcStore::write so that it can retry as long as the stream has not yet been used.
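
The idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `TrackedStream`, `write_with_retry`, and the `String` error type are hypothetical stand-ins, and the real GrpcStore::write is async and built on gRPC streams.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the upload stream. It records whether any
/// chunk has been requested from it, because a used stream cannot be rewound.
pub struct TrackedStream {
    chunks: VecDeque<Vec<u8>>,
    consumed: bool,
}

impl TrackedStream {
    pub fn new(chunks: Vec<Vec<u8>>) -> Self {
        Self { chunks: chunks.into(), consumed: false }
    }

    /// Hands out the next chunk and marks the stream as used.
    pub fn next_chunk(&mut self) -> Option<Vec<u8>> {
        self.consumed = true;
        self.chunks.pop_front()
    }

    /// True while no chunk has been requested, i.e. a retry is still safe.
    pub fn is_untouched(&self) -> bool {
        !self.consumed
    }
}

/// Calls `attempt` up to `max_attempts` times. A failed attempt is retried
/// only if the stream is still untouched; otherwise the error is returned
/// immediately, since the data already sent cannot be replayed.
pub fn write_with_retry<F>(
    stream: &mut TrackedStream,
    max_attempts: usize,
    mut attempt: F,
) -> Result<usize, String>
where
    F: FnMut(&mut TrackedStream) -> Result<usize, String>,
{
    let mut last_err = String::from("no attempts made");
    for _ in 0..max_attempts {
        match attempt(&mut *stream) {
            Ok(written) => return Ok(written),
            Err(e) if stream.is_untouched() => last_err = e,
            Err(e) => return Err(e),
        }
    }
    Err(last_err)
}
```

In this sketch, an attempt that fails while opening the connection leaves the stream untouched and is retried, while an attempt that fails after the first `next_chunk` call is surfaced to the caller unchanged.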

Fixes #325

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Ran a high-concurrency workload against the GrpcStore; it no longer produces errors from the ByteStreamServer.

Checklist

  • Updated documentation if needed
  • Tests added/amended
  • bazel test //... passes locally
  • PR is contained in a single commit, using git amend (see some docs)

chrisstaite-menlo commented Oct 19, 2023

This still doesn't resolve the issue:

[2023-10-19T12:59:48.159Z ERROR bytestream_server] Write Resp: 0.5610117 None Status { code: Unknown, message: "status: Unknown, message: \"transport error\", details: [], metadata: MetadataMap { headers: {} } : in GrpcStore::write : in GrpcStore::update() : Error updating inner store : In ByteStreamServer::write()", source: None }
[2023-10-19T12:59:48.159Z ERROR bytestream_server] Write Resp: 0.56068933 None Status { code: Unknown, message: "status: Unknown, message: \"transport error\", details: [], metadata: MetadataMap { headers: {} } : in GrpcStore::write : in GrpcStore::update() : Error updating inner store : In ByteStreamServer::write()", source: None }
[2023-10-19T12:59:48.159Z ERROR bytestream_server] Write Resp: 0.561223 None Status { code: Unknown, message: "status: Unknown, message: \"transport error\", details: [], metadata: MetadataMap { headers: {} } : in GrpcStore::write : in GrpcStore::update() : Error updating inner store : In ByteStreamServer::write()", source: None }
[2023-10-19T12:59:48.160Z ERROR bytestream_server] Write Resp: 0.5613128 None Status { code: Unknown, message: "status: Unknown, message: \"transport error\", details: [], metadata: MetadataMap { headers: {} } : in GrpcStore::write : in GrpcStore::update() : Error updating inner store : In ByteStreamServer::write()", source: None }
[2023-10-19T12:59:48.160Z ERROR bytestream_server] Write Resp: 0.56141317 None Status { code: Unknown, message: "status: Unknown, message: \"transport error\", details: [], metadata: MetadataMap { headers: {} } : in GrpcStore::write : in GrpcStore::update() : Error updating inner store : In ByteStreamServer::write()", source: None }
[2023-10-19T12:59:48.160Z ERROR bytestream_server] Write Resp: 0.5612801 None Status { code: Unknown, message: "status: Unknown, message: \"transport error\", details: [], metadata: MetadataMap { headers: {} } : in GrpcStore::write : in GrpcStore::update() : Error updating inner store : In ByteStreamServer::write()", source: None }
[2023-10-19T12:59:48.161Z ERROR bytestream_server] Write Resp: 0.56190264 None Status { code: Unknown, message: "status: Unknown, message: \"transport error\", details: [], metadata: MetadataMap { headers: {} } : in GrpcStore::write : in GrpcStore::update() : Error updating inner store : In ByteStreamServer::write()", source: None }
[2023-10-19T12:59:48.161Z ERROR bytestream_server] Write Resp: 0.56241786 None Status { code: Unknown, message: "status: Unknown, message: \"transport error\", details: [], metadata: MetadataMap { headers: {} } : in GrpcStore::write : in GrpcStore::update() : Error updating inner store : In ByteStreamServer::write()", source: None }
[2023-10-19T12:59:48.161Z ERROR bytestream_server] Write Resp: 0.5626644 None Status { code: Unknown, message: "status: Unknown, message: \"transport error\", details: [], metadata: MetadataMap { headers: {} } : in GrpcStore::write : in GrpcStore::update() : Error updating inner store : In ByteStreamServer::write()", source: None }
[2023-10-19T12:59:48.162Z ERROR bytestream_server] Write Resp: 0.5632102 None Status { code: Unknown, message: "status: Unknown, message: \"transport error\", details: [], metadata: MetadataMap { headers: {} } : in GrpcStore::write : in GrpcStore::update() : Error updating inner store : In ByteStreamServer::write()", source: None }
[2023-10-19T12:59:48.468Z ERROR bytestream_server] Write Resp: 0.000052174 None Status { code: InvalidArgument, message: "Received out of order data. Got 0, expected 491 : In ByteStreamServer::write()", source: None }
[2023-10-19T12:59:48.472Z ERROR bytestream_server] Write Resp: 0.00005941 None Status { code: InvalidArgument, message: "Received out of order data. Got 0, expected 75 : In ByteStreamServer::write()", source: None }
[2023-10-19T12:59:48.487Z ERROR bytestream_server] Write Resp: 0.000068874 None Status { code: InvalidArgument, message: "Received out of order data. Got 0, expected 76 : In ByteStreamServer::write()", source: None }
[2023-10-19T12:59:48.504Z ERROR bytestream_server] Write Resp: 0.000075709 None Status { code: InvalidArgument, message: "Received out of order data. Got 0, expected 160 : In ByteStreamServer::write()", source: None }
[2023-10-19T12:59:48.525Z ERROR bytestream_server] Write Resp: 0.000096294 None Status { code: InvalidArgument, message: "Received out of order data. Got 0, expected 94 : In ByteStreamServer::write()", source: None }
[2023-10-19T12:59:48.527Z ERROR bytestream_server] Write Resp: 0.000058501 None Status { code: InvalidArgument, message: "Received out of order data. Got 0, expected 335 : In ByteStreamServer::write()", source: None }
[2023-10-19T12:59:48.533Z ERROR bytestream_server] Write Resp: 0.000061099 None Status { code: InvalidArgument, message: "Received out of order data. Got 0, expected 81 : In ByteStreamServer::write()", source: None }
[2023-10-19T12:59:48.533Z ERROR bytestream_server] Write Resp: 0.000042423 None Status { code: InvalidArgument, message: "Received out of order data. Got 0, expected 78 : In ByteStreamServer::write()", source: None }
[2023-10-19T12:59:48.680Z ERROR bytestream_server] Write Resp: 0.000070509 None Status { code: InvalidArgument, message: "Received out of order data. Got 0, expected 79 : In ByteStreamServer::write()", source: None }
[2023-10-19T12:59:48.711Z ERROR bytestream_server] Write Resp: 0.000055012 None Status { code: InvalidArgument, message: "Received out of order data. Got 0, expected 306 : In ByteStreamServer::write()", source: None }

The remote logs:

[2023-10-19T12:59:35.565Z ERROR cas] Failed running service : hyper::Error(Http2, Error { kind: GoAway(b"[p]req HEADERS: max concurrency reached", PROTOCOL_ERROR, Remote) })
[2023-10-19T12:59:36.456Z ERROR cas] Failed running service : hyper::Error(Http2, Error { kind: GoAway(b"[p]req HEADERS: stream session alloc failed", INTERNAL_ERROR, Remote) })
[2023-10-19T12:59:37.509Z ERROR cas] Failed running service : hyper::Error(Http2, Error { kind: GoAway(b"[p]req HEADERS: stream session alloc failed", INTERNAL_ERROR, Remote) })
[2023-10-19T12:59:38.404Z ERROR cas] Failed running service : hyper::Error(Http2, Error { kind: GoAway(b"[p]req HEADERS: stream session alloc failed", INTERNAL_ERROR, Remote) })
[2023-10-19T12:59:38.706Z ERROR cas] Failed running service : hyper::Error(Http2, Error { kind: GoAway(b"[p]req HEADERS: max concurrency reached", PROTOCOL_ERROR, Remote) })
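
One plausible reading of the "Received out of order data. Got 0, expected N" errors in the first log is that a write was restarted from offset 0 while the server still held N bytes from an earlier, partially completed attempt. The sketch below illustrates that kind of offset check. It is an assumption-laden illustration, not the actual ByteStreamServer code: `UploadState` and `accept` are hypothetical names, and only the error text is taken from the logs above.

```rust
/// Hypothetical per-upload state kept by a ByteStream-style server.
pub struct UploadState {
    /// Number of bytes accepted so far for this upload.
    pub bytes_received: u64,
}

impl UploadState {
    pub fn new() -> Self {
        Self { bytes_received: 0 }
    }

    /// Accepts a chunk only if its write offset matches the number of bytes
    /// already received; otherwise reports the mismatch.
    pub fn accept(&mut self, write_offset: u64, data: &[u8]) -> Result<(), String> {
        if write_offset != self.bytes_received {
            return Err(format!(
                "Received out of order data. Got {}, expected {}",
                write_offset, self.bytes_received
            ));
        }
        self.bytes_received += data.len() as u64;
        Ok(())
    }
}
```

Under this reading, a client that resends from offset 0 after 491 bytes were accepted would see the first InvalidArgument message shown above, which is why a retry is only safe before any data has been written.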

@allada allada left a comment

+@aaronmondal, do you want to take this one?

Reviewed all commit messages.
Reviewable status: 0 of 1 files reviewed, all discussions resolved (waiting on @aaronmondal)

@aaronmondal aaronmondal left a comment

@chrisstaite-menlo LGTM, just needs rebase onto your other PR ❤️

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @chrisstaite-menlo)

@chrisstaite-menlo chrisstaite-menlo force-pushed the grpc_store_write_retry branch 2 times, most recently from d837348 to ad20db9 Compare October 30, 2023 10:30
@chrisstaite-menlo chrisstaite-menlo merged commit 6006e23 into TraceMachina:main Nov 9, 2023
11 of 14 checks passed
@aaronmondal aaronmondal left a comment

:lgtm:

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@chrisstaite-menlo chrisstaite-menlo deleted the grpc_store_write_retry branch November 15, 2023 10:17
Successfully merging this pull request may close these issues.

GrpcStore doesn't retry on write