Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate Druid HttpClient from Netty 3 to 4 #12032

Closed
wants to merge 84 commits into from

Conversation

xvrl
Copy link
Member

@xvrl xvrl commented Dec 7, 2021

  • HttpResponse no longer includes request content, this changes some
    HttpReponseHandler implementations to no longer assume content is
    generated as part of the handleResponse call, and instead expect all
    request content as part of the handleChunk

  • HttpResponseHandler changes uncovered a long-standing bug in
    AppendableByteArrayInputStream where calling exceptionCaught before
    appending any bytes would not throw the exception to the reader

  • Netty HTTP decoder now stores decoder errors in the message instead of
    triggering exceptionCaught, so we need to check the status of the
    decoder.

  • Netty Messages and underlying buffers are reference counted, so they
    need to be explicitly released after handling. As part of this
    ChannelBuffer became ByteBuf and are reference counted instead of
    garbage collected.

  • To support back-pressure, Channels need to disable AUTO_READ and call
    read() explicitly when needed instead of using setReadable(true/false)

  • ClientBootstrap no longer uses separate bosspool and workerpool sizes,
    instead relying on the workerpool.

  • Netty HTTP codec now supports resuming handling of HTTP messages after
    handling CONNECT, removing the need to manually add the codec back to
    the pipeline.

  • Remove deprecated Netty method calls and rename classes to align with
    Netty class name changes

  • Remove deprecated use of ExpectedException in favor of
    Assert.assertThrows for all tests that had to be touched.

@xvrl xvrl requested a review from gianm December 7, 2021 06:47
@xvrl xvrl force-pushed the remove-netty-3 branch 2 times, most recently from 490c15c to 5c704e6 Compare December 7, 2021 07:17
@lgtm-com
Copy link

lgtm-com bot commented Dec 7, 2021

This pull request introduces 1 alert when merging 5c704e6e962e72691a3a7791439591fcff33e10d into 65cadbe - view on LGTM.com

new alerts:

  • 1 for Potential input resource leak

@lgtm-com
Copy link

lgtm-com bot commented Dec 7, 2021

This pull request introduces 3 alerts when merging f2794fdeac57763a8967109ed731a34d1f0aa3f2 into 0b3f0bb - view on LGTM.com

new alerts:

  • 3 for Potential input resource leak

@lgtm-com
Copy link

lgtm-com bot commented Dec 7, 2021

This pull request introduces 3 alerts when merging 1d38c3a0b06750b0400d74be55f797fe41a9c84a into 1d3c8c1 - view on LGTM.com

new alerts:

  • 3 for Potential input resource leak

@lgtm-com
Copy link

lgtm-com bot commented Dec 8, 2021

This pull request introduces 3 alerts when merging d4264269de51988dd8d3ed7e5b81249fcccb5b7d into 6ac4e2d - view on LGTM.com

new alerts:

  • 3 for Potential input resource leak

@lgtm-com
Copy link

lgtm-com bot commented Dec 10, 2021

This pull request introduces 3 alerts when merging 46f61764a876f7225208d4197b3d25cad6ed88da into a8b9165 - view on LGTM.com

new alerts:

  • 3 for Potential input resource leak

@lgtm-com
Copy link

lgtm-com bot commented Dec 10, 2021

This pull request introduces 3 alerts when merging 69cb01b71d851a930d763b224619b03e5416c633 into 761fe9f - view on LGTM.com

new alerts:

  • 3 for Potential input resource leak

@lgtm-com
Copy link

lgtm-com bot commented Dec 11, 2021

This pull request introduces 3 alerts when merging 3e96349643cd3f52d05acbed7c240d4c8729226a into 761fe9f - view on LGTM.com

new alerts:

  • 3 for Potential input resource leak

@lgtm-com
Copy link

lgtm-com bot commented Dec 12, 2021

This pull request introduces 3 alerts when merging 204aecff6d98bbd4254cf8505019bdc0ac673e8d into e53c3e8 - view on LGTM.com

new alerts:

  • 3 for Potential input resource leak

@@ -192,4 +200,72 @@ public void testSyncListener()

updates.clear();
}

@Test
public void testAssign() throws Exception
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests were added to satisfy the code coverage requirements for the diff.

}

@Test
public void testCompressionCodecHeader() throws Exception
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test added to satisfy code coverage checks

@@ -252,6 +245,39 @@ public void run()
}
}

@Nonnull
private AtomicReference<String> acceptEncodingServer(ExecutorService exec, ServerSocket serverSocket)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test added to satisfy code coverage checks

@lgtm-com
Copy link

lgtm-com bot commented Dec 13, 2021

This pull request introduces 3 alerts when merging 5cec2c2e0c4f099a9dbaabbe75abb2483f684bd6 into e53c3e8 - view on LGTM.com

new alerts:

  • 3 for Potential input resource leak

@lgtm-com
Copy link

lgtm-com bot commented Dec 15, 2021

This pull request introduces 3 alerts when merging cc334bbf6ba9a6af2209be6554254b1889a40456 into 4ede3bb - view on LGTM.com

new alerts:

  • 3 for Potential input resource leak

@lgtm-com
Copy link

lgtm-com bot commented Dec 15, 2021

This pull request introduces 3 alerts when merging 04a526430865ffd0bca8a7f0862f65cfcf3be237 into 4ede3bb - view on LGTM.com

new alerts:

  • 3 for Potential input resource leak

@lgtm-com
Copy link

lgtm-com bot commented Dec 21, 2021

This pull request introduces 3 alerts when merging a9bce507501916791b8474f76a018b85c91e43f8 into f345759 - view on LGTM.com

new alerts:

  • 3 for Potential input resource leak

@lgtm-com
Copy link

lgtm-com bot commented Dec 21, 2021

This pull request introduces 3 alerts when merging be46d9f0aa08165a53992c491e76e3c3f60abc74 into f345759 - view on LGTM.com

new alerts:

  • 3 for Potential input resource leak

Copy link
Contributor

@cheddar cheddar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked over this and nothing jumped out as immediately off. I'm not an expert on Netty 4, but it all seems meaningful. If/once the tests are passing, this seems like it should be good to go.

The http interactions are a pretty integral part of the system as a whole, so there's risk in merging this. Perhaps this can be a first thing to merge post release so that it has a good chunk of time to bake in various environments.

retVal.set((Final) response.getObj());
}

if (httpChunk instanceof LastHttpContent) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is a reason not to merge the logic from this if with the if(response.isFinished()...) above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the past we would only call handleChunk() on chunks that were not the last chunk. Not sure if that was a mistake or if netty 3 happened to always finish with an empty chunk. With netty 4 the last chunk can contain data, so we do need to call handleChunk of all HttpContent messages, irrespective of whether they implemenent LastHttpContent or not.
isFinished() may also return true before we handle all the chunks (e.g. in the case of a streaming response) so we wouldn't want to call finishRequest until we handled all the chunks.

catch (IOException e) {
throw new RuntimeException(e);
try {
// add empty initial buffer since SequenceInputStream will peek the first element right away
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not telling me why it's bad that it's peeking the first element right away? Should we fix that instead of priming the queue?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make the comment more explicit, the SequenceInputStream constructor peeking will block on the empty queue, causing the entire method to block. Previously we were guaranteed at least one chunk in this method, but that's no longer the case now.

To fix this, here are some options:

  1. write our own SequenceInputStream.
  2. change HttpResponseHandler to have an InitialType returned by handleReponse(), so that we can create the SequenceInputStream in handleResponse once we get the first chunk.

I would probably opt for the first one, but either one seems a bit complex and my inclinations is to defer those improvements to avoid further increasing the scope of this PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of leaving this as-is and extending the comment to be more explicit. Something like:

SequenceInputStream constructor blocks if the queue is empty, but we won't have content available until the first chunk comes in. So, add an empty first element.

Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this lgtm I think 👍

i tried to run integration tests to make sure everything is still ok (i think we can ignore the missing remaining coverage) and it looks like a change in another PR has caused a compilation failure:

[ERROR] COMPILATION ERROR : 
[ERROR] /home/travis/build/apache/druid/server/src/test/java/org/apache/druid/client/indexing/HttpIndexingServiceClientTest.java:[281,58] cannot find symbol
  symbol:   class BigEndianHeapChannelBuffer
  location: class org.apache.druid.client.indexing.HttpIndexingServiceClientTest
[ERROR] /home/travis/build/apache/druid/server/src/test/java/org/apache/druid/client/indexing/HttpIndexingServiceClientTest.java:[281,29] cannot find symbol
  symbol:   method getContent()
  location: variable response of type io.netty.handler.codec.http.HttpResponse
[ERROR] /home/travis/build/apache/druid/server/src/test/java/org/apache/druid/client/indexing/HttpIndexingServiceClientTest.java:[286,9] cannot find symbol
  symbol:   variable StandardCharsets
  location: class org.apache.druid.client.indexing.HttpIndexingServiceClientTest
[ERROR] /home/travis/build/apache/druid/server/src/test/java/org/apache/druid/client/indexing/HttpIndexingServiceClientTest.java:[287,6] cannot find symbol
  symbol:   method addChunk(java.lang.String)
  location: class org.apache.druid.java.util.http.client.response.StringFullResponseHolder

if (request.hasContent()) {
channel.write(new DefaultHttpContent(request.getContent()));
}
channel.writeAndFlush(LastHttpContent.EMPTY_LAST_CONTENT).addListener(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the 'flush' change was the one that stuck out to me in https://netty.io/wiki/new-and-noteworthy-in-4.0.html, looks like we got them all afaict 👍

@FrankChen021
Copy link
Member

@xvrl I see the router and the coordinator are using jetty as http client. Do you know why we need two http client libraries? Is there any special consideration?

@lgtm-com
Copy link

lgtm-com bot commented Jan 5, 2022

This pull request introduces 3 alerts when merging ac47cca into b53e7f4 - view on LGTM.com

new alerts:

  • 3 for Potential input resource leak

@lgtm-com
Copy link

lgtm-com bot commented Oct 12, 2022

This pull request introduces 3 alerts when merging 3befeaf into 3e13584 - view on LGTM.com

new alerts:

  • 3 for Potential input resource leak

@lgtm-com
Copy link

lgtm-com bot commented Oct 31, 2022

This pull request introduces 3 alerts when merging a1ade92 into 675fd98 - view on LGTM.com

new alerts:

  • 3 for Potential input resource leak

@liam-verta
Copy link
Contributor

@xvrl We are are using the version of this change that I back ported to 0.23.0 in production. It appears to be working perfectly. Really hope we can get this final version of this merged soon. If there's anything more i can do, just say so.

@xvrl
Copy link
Member Author

xvrl commented Jan 6, 2023

@liam-verta glad to hear this is working! Very brave of you to try this in production :) By any chance do you have any data that would indicate an increate in heap or direct memory consumption after merging this patch?

try {
// SequenceInputStream constructor blocks if the queue is empty, however no content will be queued until
// the first chunk comes in. Adding an empty initial buffer to unblock.
queue.put(new ByteBufInputStream(Unpooled.EMPTY_BUFFER)); // lgtm [java/input-resource-leak]

Check warning

Code scanning / CodeQL

Potential input resource leak

This ByteBufInputStream is not always closed on method exit.
queue.put(channelStream);
try {
// input streams will be closed by the consumer as we iterate through them in SequenceInputStream
queue.put(new ByteBufInputStream(byteBuf.retain(), true)); // lgtm [java/input-resource-leak]

Check warning

Code scanning / CodeQL

Potential input resource leak

This ByteBufInputStream is not always closed on method exit.
@@ -136,19 +132,14 @@
try {
// An empty byte array is put at the end to give the SequenceInputStream.close() as something to close out
// after done is set to true, regardless of the rest of the stream's state.
queue.put(ByteSource.empty().openStream());
queue.put(new ByteBufInputStream(Unpooled.EMPTY_BUFFER, true)); // lgtm [java/input-resource-leak]

Check warning

Code scanning / CodeQL

Potential input resource leak

This ByteBufInputStream is not always closed on method exit.
);
OutputStream out = clientSocket.getOutputStream()
) {
while (!in.readLine().equals("")) {

Check notice

Code scanning / CodeQL

Inefficient empty string test

Inefficient comparison to empty string, check for zero length instead.
) {
StringBuilder request = new StringBuilder();
String line;
while (!"".equals((line = in.readLine()))) {

Check notice

Code scanning / CodeQL

Inefficient empty string test

Inefficient comparison to empty string, check for zero length instead.
requestContent.set(request.toString());
out.write("HTTP/1.1 200 OK\r\n\r\n".getBytes(StandardCharsets.UTF_8));

while (!in.readLine().equals("")) {

Check notice

Code scanning / CodeQL

Inefficient empty string test

Inefficient comparison to empty string, check for zero length instead.
) {
// Read headers
String header;
while (!(header = in.readLine()).equals("")) {

Check notice

Code scanning / CodeQL

Inefficient empty string test

Inefficient comparison to empty string, check for zero length instead.
@liam-verta
Copy link
Contributor

liam-verta commented Jan 25, 2023

@xvrl

By any chance do you have any data that would indicate an increate in heap or direct memory consumption after merging this patch?

We do not have comparison data at this time. We made the cut over a while ago.

@abhishekagarwal87
Copy link
Contributor

@xvrl - are you working on this PR? the PR was approved but now has a bunch of conflicts.

@xvrl
Copy link
Member Author

xvrl commented Apr 4, 2023

@abhishekagarwal87 I've been updating it once in a while to keep it in sync, but there are still some integration test failures that I haven't had time to elucidate. If someone else has time to look into this that would be great, since I don't anticipate to be ale to spend much time on it in the near future.

Copy link

github-actions bot commented Dec 8, 2023

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

@github-actions github-actions bot added the stale label Dec 8, 2023
Copy link

github-actions bot commented Jan 6, 2024

This pull request/issue has been closed due to lack of activity. If you think that
is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Jan 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants