Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow custom content listener in HttpClient.executeAsync #707

Closed
wants to merge 1 commit into from

Conversation

mbasmanova
Copy link

Extend HttpClient.executeAsync to allow callers to provide custom content listener. We'd like to use this to modify memory allocation currently happening in BufferingResponseListener.allocateCurrentBuffer to use shared pull of buffers to reduce GC activity.

CC: @oerling @nezihyigitbasi @tdcmeehan @elonazoulay @yingsu00

@electrum @dain David, Dain, do you think this would be a reasonable addition?

Copy link
Contributor

@nezihyigitbasi nezihyigitbasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, I am curious how much GC activity the current allocation code in BufferingResponseListener causes. I was thinking like this young garbage should be handled fairly well with the G1 collector, and it shouldn't be a major concern for the most part. Did you or others observe that this doesn't hold?

If we really need to do this, then instead of this approach, how about we add a new default interface method to ResponseHandler for buffer allocation? That would not disturb/change the existing interface as much. Something like:

default byte[] allocateResponseBuffer(int bufferSize) {
    return new byte[bufferSize];
}

and then we can use this custom allocator in BufferingResponseListener. IIRC, that's the main customization that this PR wants to make.

@@ -29,6 +29,8 @@

<T, E extends Exception> HttpResponseFuture<T> executeAsync(Request request, ResponseHandler<T, E> responseHandler);

<T, E extends Exception> HttpResponseFuture<T> executeAsync(Request request, ResponseHandler<T, E> responseHandler, ResponseListener responseListener);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a high level having both ResponseHandler and ResponseListener in HttpClient.executeAsync is not that intuitive given that the response listener currently calls the response handler, and it's also confusing that one argument handles response as a response handler, the other one listens to response as a response listener -- kind of confusing for the user of the API.

{
void onContent(ByteBuffer content);

InputStream onComplete();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not usual to have an event handler method such as ResponseListener.onComplete to return a reference, usually event handler methods take an argument and return void.

{
void onContent(ByteBuffer content);

InputStream onComplete();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change needed btw? The PR description talks about reducing the GC activity in BufferingResponseListener, and this seems not related to that.

@mbasmanova
Copy link
Author

@nezihyigitbasi Nezih, thank you for the feedback. We are trying to accomplish 2 things here: (1) avoid extra allocations; (2) avoid unnecessary copying. (1) might be less important than (2). More specifically, currently we are allocating new byte arrays when receiving the response and copy that data when clients access it via InputStream interface. To avoid the copying, we need clients to have access to the list of byte arrays that hold the response. Adding allocateResponseBuffer method to ResponseHandler solves (1), but not (2).

Having (1) and (2) combined allows for 6% CPU gain for Queen of the Night query on a test cluster using ORC files compressed with GZIP and 12% CPU gain on a desktop.

The query is:

select count(*), sum(l.extendedprice * (1 - l.discount) - l.quantity * p.supplycost) 
from lineitem_sf1000_unpartitioned l, partsupp_sf1000_unpartitioned p 
where l.partkey = p.partkey 
   and l.suppkey = p.suppkey
   and p.availqty < 1000

@electrum
Copy link
Member

electrum commented Feb 2, 2019

This seems like the wrong approach. We should add a better async API that gets rid of InputStream, which is legacy from before we had async. InputStream is hard to use and makes no sense as we buffer the full response.

Note that as general philosophy, the HTTP client interface is supposed to be simple and high level. If you want full control over all the low level details, you can use the Jetty client directly.

@dain
Copy link
Member

dain commented Feb 2, 2019

If this change is just for a dev branch, I'd just experiment in a branch for now.

@mbasmanova
Copy link
Author

mbasmanova commented Feb 4, 2019

@dain @electrum Dain, David, thanks for the feedback.

If this change is just for a dev branch, I'd just experiment in a branch for now.

Currently, this is only needed for a dev branch, but we expect to move this to master in the next couple of months. However, even using it for a dev branch is challenging because Travis fails to download airlift artifact. Would it be possible to publish airlift snapshot artifact to allow Travis to succeed? I don't seem to have permissions.

We should add a better async API that gets rid of InputStream, which is legacy from before we had async. InputStream is hard to use and makes no sense as we buffer the full response.

That makes sense. Do you have an idea of what you'd like this API to look like? How about the following?

public interface ResponseListener
{
    void onHeaders(ListMultimap<HeaderName, String> headers);
    void onResponseChunk(ByteBuffer chunk);
    void onCompletion(int statusCode, String statusMessage);
}

HttpResponseFuture<Void> executeAsync(Request request, ResponseListener responseListener);

Note that as general philosophy, the HTTP client interface is supposed to be simple and high level. If you want full control over all the low level details, you can use the Jetty client directly.

I feel like there is a lot of value in re-using Airlift HTTP client and the only piece we'd like to have greater control of is processing of the response. This seems to be possible with a more async-friendly API which doesn't buffer, but allows the application to process data as it become available.

@martint
Copy link
Member

martint commented Feb 4, 2019

However, even using it for a dev branch is challenging because Travis fails to download airlift artifact. Would it be possible to publish airlift snapshot artifact to allow Travis to succeed?

We prefer not to have snapshots in maven that don't reflect code that's in trunk. You could achieve that by publishing a snapshot to your internal maven repo with a custom version string, instead.

@mbasmanova
Copy link
Author

@martint

You could achieve that by publishing a snapshot to your internal maven repo with a custom version string, instead.

I do this for local builds, but this doesn't work with Travis. Am I missing something?

@martint
Copy link
Member

martint commented Feb 4, 2019

Ah, that's right. Travis can't see your private maven repository. Another alternative would be to publish public artifacts under com.facebook.

@mbasmanova
Copy link
Author

Another alternative would be to publish public artifacts under com.facebook.

That's an option, but we were hoping to avoid going that route as it involves quite a lot of changes. Given that we'd like to put these changes into master within the next couple of months, what would be the best path forward? David suggested to modify the async API to make it truly async. Should I make PR for that? Does the API I proposed above makes sense?

@mbasmanova
Copy link
Author

@dain @electrum @martint ping

@dain
Copy link
Member

dain commented Feb 8, 2019

@mbasmanova I published the PR as 0.179-masha-SNAPSHOT for you.

@mbasmanova
Copy link
Author

@dain Dain, thank you. It is helpful to have a version in Maven Central, but that's not quite exactly what I wanted. This snapshot didn't work as is. It turns out that I need to update other airlift components to avoid runtime errors due to mismatch versions of Guava. I'll work on that.

In the meantime, could you suggest how to make progress on implementing the changes we need in a way that would be acceptable to merge into master?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants