-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HttpClient: Improve logging and error messages when mismatch between client and server protocol/TLS #31132
Comments
We do have System.Net tracing already in place. See Can you be more specific about where logging or error messages need to be added or changed? |
I'm going to create a test app and give the errors a developer sees in each of these examples. |
Thanks. That should help us figure out what is happening regarding "errors" which I think you mean exceptions generated and exception message text. But what about "logging"? We do have extensive System.Net tracing available that can be enabled and viewed with tools like PerfView. Are there things there that are missing? We expect developers to use these tracing mechanisms as well as rely on looking at exception message text. |
I'm doing the same tests on the server side in ASP.NET. I copy and pasted most of the issue. Ignore logging in the client. I'm more interested in the exception message for HttpClient. |
Test app: https://github.com/JamesNK/HttpConnectionErrors HttpClient output:
```
Client: HTTP/1.1 - No TLS
Server: HTTP/1.1 - No TLS
Success!
============================= Client: HTTP/1.1 - No TLS ============================= Client: HTTP/1.1 - No TLS ============================= Client: HTTP/1.1 - No TLS ============================= Client: HTTP/1.1 - No TLS ============================= Client: HTTP/1.1 - No TLS ============================= Client: HTTP/1.1 - No TLS ============================= Client: HTTP/1.1 - No TLS ============================= Client: HTTP/1.1 - No TLS ============================= Client: HTTP/1.1 - TLS ============================= Client: HTTP/1.1 - TLS ============================= Client: HTTP/1.1 - TLS ============================= Client: HTTP/1.1 - TLS ============================= Client: HTTP/1.1 - TLS ============================= Client: HTTP/1.1 - TLS ============================= Client: HTTP/1.1 - TLS ============================= Client: HTTP/1.1 - TLS ============================= Client: HTTP/1.1 - TLS ============================= Client: HTTP/2 - No TLS ============================= Client: HTTP/2 - No TLS ============================= Client: HTTP/2 - No TLS ============================= Client: HTTP/2 - No TLS ============================= Client: HTTP/2 - No TLS ============================= Client: HTTP/2 - No TLS ============================= Client: HTTP/2 - No TLS ============================= Client: HTTP/2 - No TLS ============================= Client: HTTP/2 - No TLS ============================= Client: HTTP/2 - TLS ============================= Client: HTTP/2 - TLS ============================= Client: HTTP/2 - TLS ============================= Client: HTTP/2 - TLS ============================= Client: HTTP/2 - TLS ============================= Client: HTTP/2 - TLS ============================= Client: HTTP/2 - TLS ============================= Client: HTTP/2 - TLS ============================= Client: HTTP/2 - TLS =============================
|
I'm going to list the error messages that I think are unclear:
I understand that you may not be able to perfectly understand how the server is configured, but I think there are probably error messages here that could be improved by inspecting returned packets and using them to say why they are invalid. |
That one is also misleading. Include the response version in your results? It's silently downgrading. |
Ah, I just went through the server logs and I was curious about that one. I felt like it shouldn't have been able to succeed, yet the server didn't report an error. I didn't know HTTP/2 had the ability to silently downgrade. Edit:
|
btw, I'm doing this same exercise on ASP.NET Core server - dotnet/aspnetcore#14884 |
That's a client feature. It's probably sending ALPN values for HTTP/1.1 and HTTP/2, and when the server returns only HTTP/1.1 then the client silently downgrades to it, even though you asked for HTTP/2. They would need a different flag on the client to prohibit downgrading (via not including HTTP/1.1 in the ALPN list). |
I don't believe this is going to add enough value to be worthwhile. But you're of course welcome to submit a PR if you believe otherwise. (Part of the reason HTTP/2 is so wishy-washy about non-TLS support, and why our client doesn't officially support non-TLS, is because you need a priori knowledge of whether the server is doing TLS or not, and these failure modes just highlight that.) |
Don't underestimate how difficult it is to get this right. I would guess at least 50% of gRPC support issues are related to TLS and protocol. I think there are two causes:
This is impacting people. We can't solve 1 but we can try to make 2 better. Would it help if I narrowed down the most common reported errors? |
Exactly. |
Do you want me to stop? Narrowing things down, most errors fall under three categories:
|
Pasting all of these call stacks and cases doesn't really help, though I appreciate your efforts to try to improve things. There are generally three categories of changes I think you're hinting at:
I'm against (1). It'd require, for example, having SslStream store data it read and surface it for a higher-level component to interpret, or pre-read the content (understanding the TLS format outside of SslStream) and then pass it off to SslStream. Regardless, it adds a ton of complication, likely adds expense, has non-trivial development/testing/maintenance cost, and all to support a feature that is hidden and unsupported and there only because the argument was made it was useful for testing. (2) is fine in general, but it's also very broad and requires concrete recommendations to make forward progress. If you have such feedback, ideally pointing to the exact code in question and highlighting what change you think should be made to the text, please make them. For example, you've highlighted "The response ended prematurely, with at least 9 additional bytes expected."; at this point in the code, all it knows is that it's waiting for another frame (which has a 9 byte header), and the connection was closed before another frame arrived. What would you like it to say instead? Passing down some kind of context from the call site several frames higher such that it could instead say "The response ended prematurely while waiting for another frame."? I don't think that addresses your concerns, but it's the kind of change that potentially could be made. Please open specific issues for the concrete recommendations. (3) encompasses both your suggestions to a) disable downgrading and b) using http + HTTP/2. If you believe these should be pursued, please open separate issues for each.
I'm of course open to others opinions, cc: @geoffkizer, @scalablecory.
If you want to be robust, you should do that, anyway. |
That's fair. I'm not pushing for large changes, but exploring what opportunities there are for quick wins.
a & b together should solve protocol downgrading issues. Mismatched TLS settings between client and server will still challenge devs, but it has caused less support issues. I'll investigate what feedback is in ASP.NET Core logging for those. |
Re the "9 additional bytes" message, I agree that this is not the most useful message and we should surface a better message here. I have a pending change to improve this that will at least make the message slightly more useful -- something like "the server unexpectedly closed the connection", along with the GOAWAY error code we received, if any. I'm not sure how much this actually improves things in general, but it's something. In particular I think it's useful to expose the error code from GOAWAY. But these sorts of errors mainly indicate a bug in the client or server protocol handling, as opposed to user error. As far as the h2c issue, I think we could do better here by differentiating between failures that occur before the connection is really established -- i.e. we never successfully receive the initial SETTINGS from the server -- and errors that occur later. The former should get surfaced with a better message, indicating that the connection was never really established, as opposed to being established and then failing later. Regarding "The request was aborted" from AcquireWriteLock, there was an issue here where the original exception causing the abort wasn't getting surfaced. This got fixed in dotnet/corefx#40785. |
Triage: Moving to Future |
There are really three separate issues here, I think: (1) How the HTTP1.1 logic deals with unexpected initial server responses We should look at all three of these. I'm specifically calling out "initial server responses" above because I think this is different than other bad server responses. Bad initial responses are generally an indication of mismatched configuration or expectations on either client or server. Unfortunately, currently we tend to treat a bad initial response the same as any other bad response/protocol error. I think the ask here is to be aware that unexpected initial responses should be treated differently, because they usually indicate a different problem. |
I want to call out one specific situation where a descriptive error message would be valuable: #39201 (comment) Edit: I've spun it out into its own issue - #40219 |
My experience with gRPC is that it is very common to run into trouble establishing a connection between the client and the server because of incorrect protocol/TLS.
The logging and error messages in HttpClient should do as much as possible to to help developers fix their own mistakes.
Common problem situations:
Previous issue: https://github.com/dotnet/corefx/issues/37431 (note: now considering making HTTP/2 without TLS a more first-class experience - https://github.com/dotnet/corefx/issues/41621)
The text was updated successfully, but these errors were encountered: