Fix trailers race condition #26
Thanks, great catch! This definitely makes sense. I will test it for our main use case (streaming).
It works well for our use case, however your intuition to check with the
You are right. I just tested in our codebase as well, and it hung on a nonexistent method. I'll also have a look.
Tested to not hang on unknown methods, but may need some improvements.
I adapted the solution, and at least in my test it no longer hangs when the method is changed to a non-registered one. The server does not return gRPC headers in that case, so I may have to refine the solution a bit. Let me know how it works for you.
We should probably also probe for the presence of I added a commit that implements this solution.
Thanks again @doctor-pi, it all looks good in my tests now. I'll review and merge soon.
It looks good to me. Thanks again for your contribution!
Nice. I'll merge.
I guess I should have squashed the commits?
Don't worry about that, we don't have any such policy (yet).
We recently noticed that we were getting Status = Unknown with the default message "Server did not return grpc-status", when a server was in fact responding properly with a non-default value.
We have now reproduced the issue in both grpc-async and grpc-lwt, using a variation of the greeter client example with repeated requests. It seems that we randomly fail to get the real status, and we fill in the default instead.
This seems to be a race condition caused by the way this is implemented: the code checks whether a future is filled and treats an empty future as "no status", when in fact the future is simply not filled yet.
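The kind of race described above can be sketched with plain Lwt promises. This is only an illustration, not the code from this PR: the `status` type and the `racy_status` and `awaited_status` functions are hypothetical names, and the sketch assumes the `lwt` library is available.

```ocaml
(* Illustration only: hypothetical names, not the ocaml-grpc implementation.
   Requires the lwt library. *)
type status = Ok_status | Unknown of string

let default_status = Unknown "Server did not return grpc-status"

(* Racy: inspects the promise without waiting. A promise that is still
   pending (Lwt.Sleep) is treated as if no status will ever arrive. *)
let racy_status (trailers_status : status Lwt.t) : status =
  match Lwt.state trailers_status with
  | Lwt.Return s -> s
  | Lwt.Sleep | Lwt.Fail _ -> default_status

(* Safer: wait for the promise, and fall back to the default only if the
   promise is rejected. *)
let awaited_status (trailers_status : status Lwt.t) : status Lwt.t =
  Lwt.catch
    (fun () -> trailers_status)
    (fun _exn -> Lwt.return default_status)

let () =
  let promise, resolver = Lwt.wait () in
  (* Checked before the trailers arrive: the real status is lost. *)
  let early = racy_status promise in
  let awaited = awaited_status promise in
  (* The trailers arrive only now. *)
  Lwt.wakeup resolver Ok_status;
  assert (early = default_status);
  match Lwt.state awaited with
  | Lwt.Return s -> assert (s = Ok_status)
  | Lwt.Sleep | Lwt.Fail _ -> assert false
```

Whether the racy path is taken depends only on timing, which would match the symptom of failing at a random message number.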
How to reproduce
See branch reproduce/unknown-status-issue for a reproduction of the issue (https://github.com/dialohq/ocaml-grpc/tree/reproduce/unknown-status-issue).
Run this in one terminal:
dune exec -- examples/greeter-server-lwt/greeter_server_lwt.exe
Then run either of the following and it should blow up at some random (numbered) message:
dune exec -- examples/greeter-test-async/test_grpc_status_async.exe test
dune exec -- examples/greeter-test-lwt/test_grpc_status_lwt.exe test
Check solution
See branch test/unknown-status-issue, which contains the same commits that are in this branch (the solution), on top of the code to reproduce the issue (https://github.com/dialohq/ocaml-grpc/tree/test/unknown-status-issue).
Run the same commands as above and after 100_000 iterations the programs should terminate without error.
(Feel free to remove or increase the limit, I tested without it but it's convenient.)
Assumptions
We assume that trailers-only responses are not an issue, and that we should never hang when there is no body in the response. This is consistent with the expectations of HTTP/2, so any hang would probably indicate an issue with the underlying HTTP/2 library and should probably be addressed there.
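For context, in the gRPC over HTTP/2 protocol a trailers-only response carries `grpc-status` in its single header block rather than in a separate trailer block. A minimal sketch of a lookup that handles both cases follows. The association-list representation and the `find_grpc_status` and `status_code` names are assumptions made for illustration, not this library's API.

```ocaml
(* Illustration only: header and trailer blocks modelled as association
   lists. These types and names are hypothetical, not part of ocaml-grpc. *)
type block = (string * string) list

let find_grpc_status (b : block) : int option =
  match List.assoc_opt "grpc-status" b with
  | None -> None
  | Some v -> int_of_string_opt v

(* Look in the trailers first, then fall back to the initial headers,
   which is where a trailers-only response carries its status. *)
let status_code ~(headers : block) ~(trailers : block option) : int option =
  let from_trailers =
    match trailers with
    | Some t -> find_grpc_status t
    | None -> None
  in
  match from_trailers with
  | Some _ as found -> found
  | None -> find_grpc_status headers
```

A result of `None` means that neither block contained a parsable `grpc-status`, which is the only case in which filling in a default status would be justified.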
Should be tested with the etcd example, see #1.
Could it be an etcd / proxy issue? I think there might be a missing trailer handler, but the situation needs some confirmation.
Update (13-Apr-2023)
Refactored the code a bit.