Allow custom content listener in HttpClient.executeAsync #707

mbasmanova · 2019-01-30T02:52:03Z

Extend HttpClient.executeAsync to allow callers to provide custom content listener. We'd like to use this to modify memory allocation currently happening in BufferingResponseListener.allocateCurrentBuffer to use shared pull of buffers to reduce GC activity.

CC: @oerling @nezihyigitbasi @tdcmeehan @elonazoulay @yingsu00

@electrum @dain David, Dain, do you think this would be a reasonable addition?

nezihyigitbasi

First, I am curious how much GC activity the current allocation code in BufferingResponseListener causes. I was thinking like this young garbage should be handled fairly well with the G1 collector, and it shouldn't be a major concern for the most part. Did you or others observe that this doesn't hold?

If we really need to do this, then instead of this approach, how about we add a new default interface method to ResponseHandler for buffer allocation? That would not disturb/change the existing interface as much. Something like:

default byte[] allocateResponseBuffer(int bufferSize) {
    return new byte[bufferSize];
}

and then we can use this custom allocator in BufferingResponseListener. IIRC, that's the main customization that this PR wants to make.

nezihyigitbasi · 2019-01-31T20:46:02Z

http-client/src/main/java/io/airlift/http/client/HttpClient.java

@@ -29,6 +29,8 @@

    <T, E extends Exception> HttpResponseFuture<T> executeAsync(Request request, ResponseHandler<T, E> responseHandler);

+    <T, E extends Exception> HttpResponseFuture<T> executeAsync(Request request, ResponseHandler<T, E> responseHandler, ResponseListener responseListener);


At a high level having both ResponseHandler and ResponseListener in HttpClient.executeAsync is not that intuitive given that the response listener currently calls the response handler, and it's also confusing that one argument handles response as a response handler, the other one listens to response as a response listener -- kind of confusing for the user of the API.

nezihyigitbasi · 2019-01-31T20:46:47Z

http-client/src/main/java/io/airlift/http/client/ResponseListener.java

+{
+    void onContent(ByteBuffer content);
+
+    InputStream onComplete();


It's not usual to have an event handler method such as ResponseListener.onComplete to return a reference, usually event handler methods take an argument and return void.

nezihyigitbasi · 2019-01-31T20:58:07Z

http-client/src/main/java/io/airlift/http/client/ResponseListener.java

+{
+    void onContent(ByteBuffer content);
+
+    InputStream onComplete();


Why is this change needed btw? The PR description talks about reducing the GC activity in BufferingResponseListener, and this seems not related to that.

mbasmanova · 2019-02-01T19:46:54Z

@nezihyigitbasi Nezih, thank you for the feedback. We are trying to accomplish 2 things here: (1) avoid extra allocations; (2) avoid unnecessary copying. (1) might be less important than (2). More specifically, currently we are allocating new byte arrays when receiving the response and copy that data when clients access it via InputStream interface. To avoid the copying, we need clients to have access to the list of byte arrays that hold the response. Adding allocateResponseBuffer method to ResponseHandler solves (1), but not (2).

Having (1) and (2) combined allows for 6% CPU gain for Queen of the Night query on a test cluster using ORC files compressed with GZIP and 12% CPU gain on a desktop.

The query is:

select count(*), sum(l.extendedprice * (1 - l.discount) - l.quantity * p.supplycost) 
from lineitem_sf1000_unpartitioned l, partsupp_sf1000_unpartitioned p 
where l.partkey = p.partkey 
   and l.suppkey = p.suppkey
   and p.availqty < 1000

electrum · 2019-02-02T01:08:08Z

This seems like the wrong approach. We should add a better async API that gets rid of InputStream, which is legacy from before we had async. InputStream is hard to use and makes no sense as we buffer the full response.

Note that as general philosophy, the HTTP client interface is supposed to be simple and high level. If you want full control over all the low level details, you can use the Jetty client directly.

dain · 2019-02-02T19:31:26Z

If this change is just for a dev branch, I'd just experiment in a branch for now.

mbasmanova · 2019-02-04T16:06:52Z

@dain @electrum Dain, David, thanks for the feedback.

If this change is just for a dev branch, I'd just experiment in a branch for now.

Currently, this is only needed for a dev branch, but we expect to move this to master in the next couple of months. However, even using it for a dev branch is challenging because Travis fails to download airlift artifact. Would it be possible to publish airlift snapshot artifact to allow Travis to succeed? I don't seem to have permissions.

We should add a better async API that gets rid of InputStream, which is legacy from before we had async. InputStream is hard to use and makes no sense as we buffer the full response.

That makes sense. Do you have an idea of what you'd like this API to look like? How about the following?

public interface ResponseListener
{
    void onHeaders(ListMultimap<HeaderName, String> headers);
    void onResponseChunk(ByteBuffer chunk);
    void onCompletion(int statusCode, String statusMessage);
}

HttpResponseFuture<Void> executeAsync(Request request, ResponseListener responseListener);

Note that as general philosophy, the HTTP client interface is supposed to be simple and high level. If you want full control over all the low level details, you can use the Jetty client directly.

I feel like there is a lot of value in re-using Airlift HTTP client and the only piece we'd like to have greater control of is processing of the response. This seems to be possible with a more async-friendly API which doesn't buffer, but allows the application to process data as it become available.

martint · 2019-02-04T16:50:49Z

However, even using it for a dev branch is challenging because Travis fails to download airlift artifact. Would it be possible to publish airlift snapshot artifact to allow Travis to succeed?

We prefer not to have snapshots in maven that don't reflect code that's in trunk. You could achieve that by publishing a snapshot to your internal maven repo with a custom version string, instead.

mbasmanova · 2019-02-04T16:52:53Z

@martint

You could achieve that by publishing a snapshot to your internal maven repo with a custom version string, instead.

I do this for local builds, but this doesn't work with Travis. Am I missing something?

martint · 2019-02-04T17:55:48Z

Ah, that's right. Travis can't see your private maven repository. Another alternative would be to publish public artifacts under com.facebook.

mbasmanova · 2019-02-04T18:01:42Z

Another alternative would be to publish public artifacts under com.facebook.

That's an option, but we were hoping to avoid going that route as it involves quite a lot of changes. Given that we'd like to put these changes into master within the next couple of months, what would be the best path forward? David suggested to modify the async API to make it truly async. Should I make PR for that? Does the API I proposed above makes sense?

mbasmanova · 2019-02-08T18:17:17Z

@dain @electrum @martint ping

dain · 2019-02-08T20:47:52Z

@mbasmanova I published the PR as 0.179-masha-SNAPSHOT for you.

mbasmanova · 2019-02-26T21:53:11Z

@dain Dain, thank you. It is helpful to have a version in Maven Central, but that's not quite exactly what I wanted. This snapshot didn't work as is. It turns out that I need to update other airlift components to avoid runtime errors due to mismatch versions of Guava. I'll work on that.

In the meantime, could you suggest how to make progress on implementing the changes we need in a way that would be acceptable to merge into master?

Allow custom content listener in HttpClient.executeAsync

9c81e36

mbasmanova mentioned this pull request Jan 30, 2019

Use the new ResponseListener from airlift prestodb/presto#12278

Closed

nezihyigitbasi reviewed Jan 31, 2019

View reviewed changes

mbasmanova mentioned this pull request Mar 5, 2019

Use airlift artifact from Maven Central prestodb/presto#12430

Merged

dain closed this Nov 7, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow custom content listener in HttpClient.executeAsync #707

Allow custom content listener in HttpClient.executeAsync #707

mbasmanova commented Jan 30, 2019

nezihyigitbasi left a comment

nezihyigitbasi Jan 31, 2019

nezihyigitbasi Jan 31, 2019

nezihyigitbasi Jan 31, 2019

mbasmanova commented Feb 1, 2019

electrum commented Feb 2, 2019

dain commented Feb 2, 2019

mbasmanova commented Feb 4, 2019 •

edited

martint commented Feb 4, 2019

mbasmanova commented Feb 4, 2019

martint commented Feb 4, 2019

mbasmanova commented Feb 4, 2019

mbasmanova commented Feb 8, 2019

dain commented Feb 8, 2019

mbasmanova commented Feb 26, 2019

		@@ -29,6 +29,8 @@

		<T, E extends Exception> HttpResponseFuture<T> executeAsync(Request request, ResponseHandler<T, E> responseHandler);

		<T, E extends Exception> HttpResponseFuture<T> executeAsync(Request request, ResponseHandler<T, E> responseHandler, ResponseListener responseListener);

Allow custom content listener in HttpClient.executeAsync #707

Allow custom content listener in HttpClient.executeAsync #707

Conversation

mbasmanova commented Jan 30, 2019

nezihyigitbasi left a comment

Choose a reason for hiding this comment

nezihyigitbasi Jan 31, 2019

Choose a reason for hiding this comment

nezihyigitbasi Jan 31, 2019

Choose a reason for hiding this comment

nezihyigitbasi Jan 31, 2019

Choose a reason for hiding this comment

mbasmanova commented Feb 1, 2019

electrum commented Feb 2, 2019

dain commented Feb 2, 2019

mbasmanova commented Feb 4, 2019 • edited

martint commented Feb 4, 2019

mbasmanova commented Feb 4, 2019

martint commented Feb 4, 2019

mbasmanova commented Feb 4, 2019

mbasmanova commented Feb 8, 2019

dain commented Feb 8, 2019

mbasmanova commented Feb 26, 2019

mbasmanova commented Feb 4, 2019 •

edited