Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken pipe check not portable (i18n) #3348

Closed
rmannibucau opened this issue Jul 19, 2021 · 17 comments
Closed

Broken pipe check not portable (i18n) #3348

rmannibucau opened this issue Jul 19, 2021 · 17 comments

Comments

@rmannibucau
Copy link
Contributor

Hi,

Seems broken pipe error is tested without taking into account i18n so it is missed on most builds not using en:

(french example)
java.net.SocketException: Relais brisé (pipe)

Not sure the best fix (forcing the locale is likely not an option, whitelisting most of translation is maybe not)

@meltsufin
Copy link
Contributor

@rmannibucau Thanks for bringing it to our attention. Maybe there is a way to force en locale just for the purposes of running the tests. What would be the benefit of testing under various locales?

@rmannibucau
Copy link
Contributor Author

@meltsufin Locale.setDefault done very early (way too early for maven or gradle) will set it globally but it will also impact other parts of the application (if jib is embed it breaks rest of the app, same for maven). So forcing the locale can be doable with some JVM internals abuse but I wouldn't go that way, it would stay too fragile and will not solve the issue and let JIB not fix the issue this test was introduced for. I guess a more valid solution would be to dig why there is a broken pipe (Expect: 100-continue? which is testable without checking the exception) and only if needed, extend the http client to throw a more specific exception on such a case.

@meltsufin
Copy link
Contributor

If anyone can come up with a check that is not message-dependent, please make a contribution. Otherwise, I think we'll leave it as a known limitation.

@rmannibucau
Copy link
Contributor Author

The opposite can be worth checking, ie not handling it at all.
It is only about the error message and it is wrong regularly (whereas exception/cause does not lie) so maybe just drop the tentative to guess what happent?
The wrapping exception can point all potential cause instead of picking a random one.
Sounds like a good unification and solution to me.

@meltsufin
Copy link
Contributor

@chanseokoh WDYT?

@chanseokoh
Copy link
Member

chanseokoh commented Jul 19, 2021

I think the status quo is fine. It doesn't do any handling/processing/unrolling on the exception; it doesn't change the behavior of Jib at all, except that it only puts a couple more console output lines for the US locale. And actually this has been very useful, because the only message you get from the broken pipe SocketException is just the two words "broken pipe" and nothing else most of the time. Moreover, I believe the exception is almost all the time a terminal one without a wrapped exception.

@rmannibucau
Copy link
Contributor Author

@chanseokoh well fact is US locale is not the most common one even if common and when it fails with a broken pipe you can end up having a misleading error message (don't have it handy but think it was "check your credentials") which is worse.
side note: when i mentionned a wrapping exception it was about enriching the exception handling with a jib wrapping exception with more details about potential causes.
what is helping is to show the potential causes to let people check them, this is why I proposed to show them all (it is 2-3) in the error message and not try to be over clever until there is a way (which means enriching the http client to catch the connection error lower in the code layers).

@chanseokoh
Copy link
Member

a misleading error message (don't have it handy but think it was "check your credentials")

What we log (only in the case of the US locale, which is not the most common one, of course) is

broken pipe: the server shut down the connection. Check the server log if possible. This could also be a proxy issue. For example, a proxy may prevent sending packets that are too large.

which we believe is not misleading.

until there is a way (which means enriching the http client to catch the connection error lower in the code layers).

I don't think there is a way to reliably determine in Java at any layer (lower or higher) whether a socket exception is due to a broken pipe, until future Java versions introduce a dedicated exception class for that (which I think will not happen). The best effort is to check an exception message string, which can never be reliable, since future Java versions may change the string and messages are different on different locales.

@rmannibucau
Copy link
Contributor Author

I don't think there is a way to reliably determine in Java at any layer (lower or higher) whether a socket exception is due to a broken pipe, until future Java versions introduce a dedicated exception class for that (which I think will not happen).

Since you are not using java http client but apache http client which is at socket level I think it is quite doable to track the socket to know if the client or server cut the connection.
Overall if not reliable I wouldn't add the feature at all to have a reliable behavior whatever locale is used.

which we believe is not misleading.

it is because it happens when SSL is wrongly setup and or the server rejects it due to some overload pike (openshift there).
An interesting comment from the JDK can help to refine these cases if desired (./test/jdk/java/net/httpclient/CancelledResponse.java):

             // if SSL then we might get a "Broken Pipe", or a
             // RuntimeException wrapping an InvalidAlgorithmParameterException
             // (probably if the channel is closed during the handshake),
             // otherwise we get a "Socket closed".

So the isBrokenPipe method can need some refinement of the stack to be more accurate probably.

@chanseokoh
Copy link
Member

chanseokoh commented Jul 20, 2021

Thanks for your input, but I think we want to keep the current logging. It's just that the user additionally sees a bit more longer message "broken pipe: <more explanations>" than the two words "broken pipe" if in the US locale. Other than showing a bit more message, Jib doesn't do anything more than that; whether on US or not, Jib works exactly the same way, since there's no way to reliably determine if it's broken pipe or not.

And I still think the message is not misleading. "Broken pipe" means the server abruptly and unexpectedly "broke" the pipe for whatever reason while it shouldn't, whether intentionally or due to system overload, and the client (Jib) is only left uninformed and speculating. There's nothing much you can do from the client side. In the SSL case too, the gist is that, in some case where the handshake cannot be fulfilled, there is a chance that the client may see various kinds of errors including "broken pipe" depending on the situation and how the server reacts. But from the client side, it is still a broken pipe leading to the ultimate symptom of SSL handshake failure. Something went wrong unexpectedly on the server side (or in the middle) and no fault on the client side. (BTW, in this case, Jib will ultimately propagate SSLException or a subclass of it (which will wrap the lowest SocketException), whether US or not.) The nature of the error tells us that something to be fixed on the server side, which is what the current logging is trying to say.

I think it is quite doable to track the socket to know if the client or server cut the connection.

Not sure I follow "track the socket." In the broken pipe case, the client is abruptly left hanging without any preceding sign. The kernel just reports that the other end of a pipe is broken when a client attempts to write bytes to the pipe.

@rmannibucau
Copy link
Contributor Author

@chanseokoh well, don't want to fight a lot for that since it just misguide the user to the error cause (no proxy error or anything like that in the case I mentionned to be concrete and the different view between users just because they use a different locale does not help to solve the actual issue).
Yes the error is reported but the cause can be known adding more context in the http client, either SSL stage or before/after leading to a finer cause identification (I assume only 3-4 actual causes can be).
So options I see are:

  1. replace the proxy message by a longer one listing SSL, proxy, server issue (load or not) for example
  2. try to detect these cases (the SSL one can be identified in general but can require some HTTP client "instrumention")
  3. (not satisfying but at least enable to work between people more easily) at least document how to force a locale to let people reproduce issues and potentially enable to disable these US only handling (false for maven plugin for example)

@chanseokoh
Copy link
Member

chanseokoh commented Jul 20, 2021

I like the option 1, which is what we intended with the current logging (i.e., providing more context and explanation of what a "broken pipe" can mean for the user). Contributions are welcome. I've added the "good first issue" lable.

how to force a locale to let people reproduce issues and potentially enable to disable these US only handling

Hmm... I don't really get why one needs to disable it? Jib currently doesn't "handle" it, so whether US or not, the same issue and symptom will be reproducible. As I said, the current logic just adds a couple more console output lines to the existing error lines without interfering Jib's usual execution flow in any possible way.

@rmannibucau
Copy link
Contributor Author

Hmm... I don't really get why one needs to disable it? Jib currently doesn't "handle" it, so whether US or not, the same issue and symptom will be reproducible.

Take the case of a distributed team (with N>1 locales).
You get a slack message "hey buddy, I get this error xxxxx", "ok let me try[time]hmm, I get this error here yyyy" and indeed xxx and yyy are not only different due to the locale so it requires more work for nothing to identify it is the same case.
This is why I think that not detecting broken pipes is fine and helping.

@chanseokoh
Copy link
Member

I get what you're saying and also think you have a valid point. Anyways, I think the current message is reasonable and also has a value. The message starts with "broken pipe: ...", which is the most important part and doesn't change between different locales. In the past, we've got many issues opened with people reporting that Jib fails with a broken pipe; obviously many people don't know what it is and thought Jib should fix it.

@rmannibucau
Copy link
Contributor Author

@chanseokoh fully agree on that point. Maybe we could just stack the error messages of the stack (without the stack traces)? The most complicated part of this task is to reproduce several cases (the one I got is quite random on openshift and I dont own the cluster so cant play much with it :(). So maybe let track this issue and link it when we get connection errors (or related) to refine the message when we get enough "examples"?

@chanseokoh
Copy link
Member

chanseokoh commented Jul 20, 2021

Sounds good. Let's leave this issue open for tracking.

Maybe we should try to go simple with just enhancing the current message (which is only for US unfortunately). Whenever there's a failure with an exception, ultimately that is the exception that Jib (and hence Maven and Gradle) throws in the end. Maven and Gradle suggest to check the full stack trace (-X / --stacktrace), which most Java devs do. In the case of broken pipe SocketException, that SocketException will be the lowest one, and any layers on top of it (e.g., SSL) will wrap a lower level exception with an additional context (basically like, "SSLException: SSL handsake failure because of: SocketException: broken pipe"). So, for any exception, you can always have the full stack trace (which is not being altered in any way), the whole chain of events, and all the code locations where each of these chained exceptions is instantiated, so I feel there's no need to re-invent the wheel.

@elefeint
Copy link
Contributor

Closing the issue as part of cleanup -- the existence of this discussion is the documentation in itself. If anyone else encounters a similar issue, we can add an entry to the FAQ.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants