New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better visibility on transport errors #278
Conversation
__Motivation__ We want to surface the close reason from NettyConnection in the exception returned from the request. By exposing that, we also enable users to observe transport errors and drive connection state metrics using a HttpConnectionFilterFactory. We should also consider elaborating the message of the ClosedChannelExceptions produced by CloseHandler.CloseEvent __Modifications__ - expose transport level exception on `ConnectionContext` __Result__ Users have access to transport level `Exception` information to help diagnose failures and drive metrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just a couple questions.
@@ -89,7 +88,7 @@ public void onSubscribe(final Cancellable cancellable) { | |||
|
|||
@Override | |||
public void onSuccess(@Nullable final T result) { | |||
terminate(result); | |||
terminate(ThrowableWrapper.wrapIfThrowable(result)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there benefit to wrapping success (presumably the more common case) instead of wrapping errors?
I think in notifyListeners
, wrapping errors (and replacing terminalSignal instanceof Throwable
with terminalSignal instanceof ThrowableWrapper
) would reduce the number of instanceof
s in the success case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's indeed a great suggestion!
initCause(cause); | ||
} | ||
|
||
@Override | ||
public String getMessage() { | ||
return closeEventName; | ||
return String.format("%s(%s)", event.name(), event.description); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be simple string concatenation? (or StringBuilder
, but I understand that's what the compiler usually turns concatenation into anyway.) I believe format
is fairly slow. (at least until JEP-348 happens 😉)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call!
servicetalk-concurrent-api/src/main/java/io/servicetalk/concurrent/api/SingleProcessor.java
Outdated
Show resolved
Hide resolved
servicetalk-concurrent-api/src/main/java/io/servicetalk/concurrent/api/SingleProcessor.java
Outdated
Show resolved
Hide resolved
...y-internal/src/main/java/io/servicetalk/transport/netty/internal/DefaultNettyConnection.java
Outdated
Show resolved
Hide resolved
servicetalk-transport-api/src/main/java/io/servicetalk/transport/api/ConnectionContext.java
Outdated
Show resolved
Hide resolved
...ternal/src/main/java/io/servicetalk/transport/netty/internal/AbstractChannelReadHandler.java
Show resolved
Hide resolved
servicetalk-transport-api/src/main/java/io/servicetalk/transport/api/ConnectionContext.java
Outdated
Show resolved
Hide resolved
@@ -1,5 +1,5 @@ | |||
/* | |||
* Copyright © 2018 Apple Inc. and the ServiceTalk project authors | |||
* Copyright © 2019 Apple Inc. and the ServiceTalk project authors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2018-2019 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah Intellij is trying to be not smart enough here.
@@ -0,0 +1,167 @@ | |||
/* | |||
* Copyright © 2019 Apple Inc. and the ServiceTalk project authors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2018-2019
...ternal/src/main/java/io/servicetalk/transport/netty/internal/AbstractChannelReadHandler.java
Show resolved
Hide resolved
@@ -137,42 +138,52 @@ public static CloseHandler forPipelinedRequestResponse(boolean client) { | |||
/** | |||
* Outbound protocol close command observed eg. HTTP header: {@code Connection: close}. | |||
*/ | |||
PROTOCOL_CLOSING_OUTBOUND, | |||
PROTOCOL_CLOSING_OUTBOUND("Outbound protocol close command observed eg. HTTP header: Connection: close."), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about:
The application protocol closed the write side of this connection. This maybe the result of sending an HTTP header such as Connection: close.
/** | ||
* User initiated close command, depends on the implementation but usually resembles outbound protocol close. | ||
*/ | ||
USER_CLOSING, | ||
USER_CLOSING("User initiated close command."), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about:
The close* method was called in the local application.
/** | ||
* Outbound {@link SocketChannel} shutdown observed. | ||
*/ | ||
CHANNEL_CLOSED_OUTBOUND, | ||
CHANNEL_CLOSED_OUTBOUND("Outbound SocketChannel shutdown observed"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about:
The transport backing this connection has been shutdown
public ExecutionContext executionContext() { | ||
return context.executionContext(); | ||
public Single<Throwable> transportError() { | ||
return transportError.map(Function.identity()); // don't let SingleProcessor escape |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this necessary, or potentially a sign of a larger issue? my guess is this happens elsewhere we use CompletableProcessor
and SingleProcessor
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't check but it's potentially a widespread issue.
When implementing this I was indeed wondering whether we had to prevent users from mucking with the internal control plane.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created #285
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few comments then LGTM
assertConnectionClosed(); | ||
// Client inbound channel closed - should be same exception as above | ||
Throwable clientThrowable = ((NettyConnectionContext) streamingHttpConnection().connectionContext()) | ||
.transportError().toFuture().get(1000, MILLISECONDS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests have a timeout (in AbstractNettyHttpServerTest
) so we can block indefinetly here.
The timeout rule provides other thread-stacks on timeout which this does not and is generally easier to disable when debugging locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I saw a 1s timeout used in other test-cases in the same file and didn't check why we weren't depending on the rule... I'll remove
public ExecutionContext executionContext() { | ||
return context.executionContext(); | ||
public Single<Throwable> transportError() { | ||
return transportError; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since subscribers of this Single
can be invoked from the eventloop, we have to offload this like we do for onClosing()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right let me fix!
@@ -84,6 +85,11 @@ public ExecutionContext executionContext() { | |||
return executionContext; | |||
} | |||
|
|||
@Override | |||
public Single<Throwable> transportError() { | |||
return Single.error(new UnsupportedOperationException("source to be provided by the associated connection")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't ideal as we pass the ConnectionContext
before creating a connection to the channel initializers which means that none of the entities inside the ChannelInitializer
s can use this method. A way to fix this would be to have the creator of DefaultNettyConnectionContext
instantiate SingleProcessor
and pass it both to DefaultNettyConnectionContext
and DefaultNettyConnection
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed not ideal, but was hesitant to change ordering of this with the upcoming channel init refactoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok if it will be automatically be "fixed" after #288 then it isn't required to be done here. Otherwise, just add a note to the other PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out DefaultNettyConnectionContext
is gone in #288 so this entire problem will be gone, SingleProcessor
can remain entirely inside DefaultNettyConnection
.
@@ -37,4 +38,11 @@ | |||
* connection to a default value. | |||
*/ | |||
Cancellable updateFlushStrategy(UnaryOperator<FlushStrategy> strategyProvider); | |||
|
|||
/** | |||
* Returns a {@link Single}<{@link Throwable}> that may complete when an error is observed at the transport. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we generally use the language terminate with an error, if an error is observed..
as complete
is overloaded for successful completion.
@NiteshKant PTAL re:offloading changes, I think it's ready to merge. |
@@ -13,11 +13,11 @@ | |||
* See the License for the specific language governing permissions and | |||
* limitations under the License. | |||
*/ | |||
package io.servicetalk.concurrent.api.internal; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jayv - one thing I forgot to ask was that does SingleProcessor
need to be part of the api
? We have a few internal packages now where I imagine this class could live.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reasoning: CompletableProcessor
was already public
so we decided to make this one public
too. I think it's a useful API to have, what is your concern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we had the same internal
package structure when CompletableProcessor
was first introduced/moved into the current package. I agree this class is useful, and we use it throughout ServiceTalk for our internal implementations. We should be restricting what goes in the api
package as what is necessary to use the api
, and I don't think SingleProcessor
or CompletableProcessor
are necessary for our users to interact with concurrent.api
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use CompletableProcessor
in concurrent-api hence we can't move it to concurrent-api-internal. However, SingleProcessor
isn't used in concurrent-api so can be moved to concurrent-api-internal if we are OK with one residing in concurrent-api and other doesn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Motivation
We want to surface the close reason from
NettyConnection
in the exception returned from the request. By exposing that, we also enable users to observe transport errors and drive connection state metrics using aHttpConnectionFilterFactory
. We should also consider elaborating the message of theClosedChannelException
produced byCloseHandler.CloseEvent
Modifications
ConnectionContext
SingleProcessor
frominternal
toapi
Result
Users have access to transport level
Exception
information to help diagnose failures and drive metrics