New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow custom content listener in HttpClient.executeAsync #707
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, I am curious how much GC activity the current allocation code in BufferingResponseListener
causes. I was thinking like this young garbage should be handled fairly well with the G1 collector, and it shouldn't be a major concern for the most part. Did you or others observe that this doesn't hold?
If we really need to do this, then instead of this approach, how about we add a new default interface method to ResponseHandler
for buffer allocation? That would not disturb/change the existing interface as much. Something like:
default byte[] allocateResponseBuffer(int bufferSize) {
return new byte[bufferSize];
}
and then we can use this custom allocator in BufferingResponseListener
. IIRC, that's the main customization that this PR wants to make.
@@ -29,6 +29,8 @@ | |||
|
|||
<T, E extends Exception> HttpResponseFuture<T> executeAsync(Request request, ResponseHandler<T, E> responseHandler); | |||
|
|||
<T, E extends Exception> HttpResponseFuture<T> executeAsync(Request request, ResponseHandler<T, E> responseHandler, ResponseListener responseListener); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At a high level having both ResponseHandler
and ResponseListener
in HttpClient.executeAsync
is not that intuitive given that the response listener currently calls the response handler, and it's also confusing that one argument handles response as a response handler, the other one listens to response as a response listener -- kind of confusing for the user of the API.
{ | ||
void onContent(ByteBuffer content); | ||
|
||
InputStream onComplete(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not usual to have an event handler method such as ResponseListener.onComplete
to return a reference, usually event handler methods take an argument and return void.
{ | ||
void onContent(ByteBuffer content); | ||
|
||
InputStream onComplete(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change needed btw? The PR description talks about reducing the GC activity in BufferingResponseListener
, and this seems not related to that.
@nezihyigitbasi Nezih, thank you for the feedback. We are trying to accomplish 2 things here: (1) avoid extra allocations; (2) avoid unnecessary copying. (1) might be less important than (2). More specifically, currently we are allocating new byte arrays when receiving the response and copy that data when clients access it via InputStream interface. To avoid the copying, we need clients to have access to the list of byte arrays that hold the response. Adding Having (1) and (2) combined allows for 6% CPU gain for Queen of the Night query on a test cluster using ORC files compressed with GZIP and 12% CPU gain on a desktop. The query is:
|
This seems like the wrong approach. We should add a better async API that gets rid of InputStream, which is legacy from before we had async. InputStream is hard to use and makes no sense as we buffer the full response. Note that as general philosophy, the HTTP client interface is supposed to be simple and high level. If you want full control over all the low level details, you can use the Jetty client directly. |
If this change is just for a dev branch, I'd just experiment in a branch for now. |
@dain @electrum Dain, David, thanks for the feedback.
Currently, this is only needed for a dev branch, but we expect to move this to master in the next couple of months. However, even using it for a dev branch is challenging because Travis fails to download airlift artifact. Would it be possible to publish airlift snapshot artifact to allow Travis to succeed? I don't seem to have permissions.
That makes sense. Do you have an idea of what you'd like this API to look like? How about the following?
I feel like there is a lot of value in re-using Airlift HTTP client and the only piece we'd like to have greater control of is processing of the response. This seems to be possible with a more async-friendly API which doesn't buffer, but allows the application to process data as it become available. |
We prefer not to have snapshots in maven that don't reflect code that's in trunk. You could achieve that by publishing a snapshot to your internal maven repo with a custom version string, instead. |
I do this for local builds, but this doesn't work with Travis. Am I missing something? |
Ah, that's right. Travis can't see your private maven repository. Another alternative would be to publish public artifacts under |
That's an option, but we were hoping to avoid going that route as it involves quite a lot of changes. Given that we'd like to put these changes into master within the next couple of months, what would be the best path forward? David suggested to modify the async API to make it truly async. Should I make PR for that? Does the API I proposed above makes sense? |
@mbasmanova I published the PR as |
@dain Dain, thank you. It is helpful to have a version in Maven Central, but that's not quite exactly what I wanted. This snapshot didn't work as is. It turns out that I need to update other airlift components to avoid runtime errors due to mismatch versions of Guava. I'll work on that. In the meantime, could you suggest how to make progress on implementing the changes we need in a way that would be acceptable to merge into master? |
Extend HttpClient.executeAsync to allow callers to provide custom content listener. We'd like to use this to modify memory allocation currently happening in BufferingResponseListener.allocateCurrentBuffer to use shared pull of buffers to reduce GC activity.
CC: @oerling @nezihyigitbasi @tdcmeehan @elonazoulay @yingsu00
@electrum @dain David, Dain, do you think this would be a reasonable addition?