Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better visibility on transport errors #278

Merged
merged 6 commits into from Feb 8, 2019
Merged

Conversation

jayv
Copy link
Contributor

@jayv jayv commented Jan 29, 2019

Motivation

We want to surface the close reason from NettyConnection in the exception returned from the request. By exposing that, we also enable users to observe transport errors and drive connection state metrics using a HttpConnectionFilterFactory. We should also consider elaborating the message of the ClosedChannelException produced by CloseHandler.CloseEvent

Modifications

  • expose transport level exception on ConnectionContext
  • move SingleProcessor from internal to api

Result

Users have access to transport level Exception information to help diagnose failures and drive metrics

__Motivation__

We want to surface the close reason from NettyConnection in the exception returned from the request. By exposing that, we also enable users to observe transport errors and drive connection state metrics using a HttpConnectionFilterFactory. We should also consider elaborating the message of the ClosedChannelExceptions produced by CloseHandler.CloseEvent

__Modifications__

- expose transport level exception on `ConnectionContext`

__Result__

Users have access to transport level `Exception` information to help diagnose failures and drive metrics
Copy link
Contributor

@lewisd32 lewisd32 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a couple questions.

@@ -89,7 +88,7 @@ public void onSubscribe(final Cancellable cancellable) {

@Override
public void onSuccess(@Nullable final T result) {
terminate(result);
terminate(ThrowableWrapper.wrapIfThrowable(result));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there benefit to wrapping success (presumably the more common case) instead of wrapping errors?
I think in notifyListeners, wrapping errors (and replacing terminalSignal instanceof Throwable with terminalSignal instanceof ThrowableWrapper) would reduce the number of instanceofs in the success case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's indeed a great suggestion!

initCause(cause);
}

@Override
public String getMessage() {
return closeEventName;
return String.format("%s(%s)", event.name(), event.description);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be simple string concatenation? (or StringBuilder, but I understand that's what the compiler usually turns concatenation into anyway.) I believe format is fairly slow. (at least until JEP-348 happens 😉)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call!

@@ -1,5 +1,5 @@
/*
* Copyright © 2018 Apple Inc. and the ServiceTalk project authors
* Copyright © 2019 Apple Inc. and the ServiceTalk project authors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2018-2019 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah Intellij is trying to be not smart enough here.

@@ -0,0 +1,167 @@
/*
* Copyright © 2019 Apple Inc. and the ServiceTalk project authors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2018-2019

@@ -137,42 +138,52 @@ public static CloseHandler forPipelinedRequestResponse(boolean client) {
/**
* Outbound protocol close command observed eg. HTTP header: {@code Connection: close}.
*/
PROTOCOL_CLOSING_OUTBOUND,
PROTOCOL_CLOSING_OUTBOUND("Outbound protocol close command observed eg. HTTP header: Connection: close."),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about:

The application protocol closed the write side of this connection. This maybe the result of sending an HTTP header such as Connection: close.

/**
* User initiated close command, depends on the implementation but usually resembles outbound protocol close.
*/
USER_CLOSING,
USER_CLOSING("User initiated close command."),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about:

The close* method was called in the local application.

/**
* Outbound {@link SocketChannel} shutdown observed.
*/
CHANNEL_CLOSED_OUTBOUND,
CHANNEL_CLOSED_OUTBOUND("Outbound SocketChannel shutdown observed"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about:

The transport backing this connection has been shutdown

public ExecutionContext executionContext() {
return context.executionContext();
public Single<Throwable> transportError() {
return transportError.map(Function.identity()); // don't let SingleProcessor escape
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this necessary, or potentially a sign of a larger issue? my guess is this happens elsewhere we use CompletableProcessor and SingleProcessor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't check but it's potentially a widespread issue.
When implementing this I was indeed wondering whether we had to prevent users from mucking with the internal control plane.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #285

Copy link
Contributor

@lewisd32 lewisd32 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Collaborator

@NiteshKant NiteshKant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments then LGTM

assertConnectionClosed();
// Client inbound channel closed - should be same exception as above
Throwable clientThrowable = ((NettyConnectionContext) streamingHttpConnection().connectionContext())
.transportError().toFuture().get(1000, MILLISECONDS);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests have a timeout (in AbstractNettyHttpServerTest) so we can block indefinetly here.
The timeout rule provides other thread-stacks on timeout which this does not and is generally easier to disable when debugging locally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I saw a 1s timeout used in other test-cases in the same file and didn't check why we weren't depending on the rule... I'll remove

public ExecutionContext executionContext() {
return context.executionContext();
public Single<Throwable> transportError() {
return transportError;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since subscribers of this Single can be invoked from the eventloop, we have to offload this like we do for onClosing().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right let me fix!

@@ -84,6 +85,11 @@ public ExecutionContext executionContext() {
return executionContext;
}

@Override
public Single<Throwable> transportError() {
return Single.error(new UnsupportedOperationException("source to be provided by the associated connection"));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't ideal as we pass the ConnectionContext before creating a connection to the channel initializers which means that none of the entities inside the ChannelInitializers can use this method. A way to fix this would be to have the creator of DefaultNettyConnectionContext instantiate SingleProcessor and pass it both to DefaultNettyConnectionContext and DefaultNettyConnection .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed not ideal, but was hesitant to change ordering of this with the upcoming channel init refactoring.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok if it will be automatically be "fixed" after #288 then it isn't required to be done here. Otherwise, just add a note to the other PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out DefaultNettyConnectionContext is gone in #288 so this entire problem will be gone, SingleProcessor can remain entirely inside DefaultNettyConnection.

@@ -37,4 +38,11 @@
* connection to a default value.
*/
Cancellable updateFlushStrategy(UnaryOperator<FlushStrategy> strategyProvider);

/**
* Returns a {@link Single}&lt;{@link Throwable}&gt; that may complete when an error is observed at the transport.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we generally use the language terminate with an error, if an error is observed.. as complete is overloaded for successful completion.

@jayv
Copy link
Contributor Author

jayv commented Feb 7, 2019

@NiteshKant PTAL re:offloading changes, I think it's ready to merge.

@jayv jayv requested a review from NiteshKant February 7, 2019 03:42
@jayv jayv merged commit 8fa25f9 into apple:master Feb 8, 2019
@jayv jayv deleted the transporterr branch February 8, 2019 01:06
@@ -13,11 +13,11 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package io.servicetalk.concurrent.api.internal;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jayv - one thing I forgot to ask was that does SingleProcessor need to be part of the api? We have a few internal packages now where I imagine this class could live.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reasoning: CompletableProcessor was already public so we decided to make this one public too. I think it's a useful API to have, what is your concern?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we had the same internal package structure when CompletableProcessor was first introduced/moved into the current package. I agree this class is useful, and we use it throughout ServiceTalk for our internal implementations. We should be restricting what goes in the api package as what is necessary to use the api, and I don't think SingleProcessor or CompletableProcessor are necessary for our users to interact with concurrent.api.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use CompletableProcessor in concurrent-api hence we can't move it to concurrent-api-internal. However, SingleProcessor isn't used in concurrent-api so can be moved to concurrent-api-internal if we are OK with one residing in concurrent-api and other doesn't.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants