Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avatica protobuf #10543

Merged
merged 1 commit into from
Mar 31, 2021
Merged

Avatica protobuf #10543

merged 1 commit into from
Mar 31, 2021

Conversation

lkm
Copy link
Contributor

@lkm lkm commented Oct 30, 2020

Description

Druid hadn't implemented the AvaticaProtobufHandler and only supported Json serialisation for Avatica connections. Protocol buffers have much less overhead than Json and thus I would expect the transfer rate to be increased when using it. I'll try to build some benchmarking tools to test this, circumstantial evidence from test suites seems to indicate a speed increase:

... Time elapsed: 14.753 s - in org.apache.druid.sql.avatica.DruidAvaticaJsonHandlerTest
... Time elapsed: 11.482 s - in org.apache.druid.sql.avatica.DruidAvaticaProtobufHandlerTest

As a side note, for the officially supported Avatica Golang driver there isn't any Json serialisation support and thus will only work with this patch.

I was forced to add org.apache.calcite.avatica to druid-server in order to parse the pb messages and retrieve the connection id. The connection id is used to stabilise the choice of brokers when the request goes through the router.

One further note, I've chosen to add another url (/druid/v2/sql/avatica-pb/) for protocol buffers, ideally it would simply be controlled with a content-type, that is unfortunately not implemented in the JDBC driver so it will not work for the time being, however, the aforementioned golang driver does send content-type as application/x-google-protobuf.

Fixed the bug ...

Renamed the class ...

Added a forbidden-apis entry ...


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Copy link
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @lkm, thanks for the contribution. I left some comments around the query URL. In addition to them, would you please update doc here as well to explain how to use the new feature?

@@ -449,6 +463,95 @@ static String getAvaticaConnectionId(Map<String, Object> requestMap)
return (String) connectionIdObj;
}

static String getAvaticaProtobufConnectionId(Service.Request request)
{
if (request instanceof Service.CatalogsRequest) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These if clause seems fragile from changes on Avatica request types (such as new request type), but I'm not sure if there is a better way..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, it's a bit fragile, however, that, I'm afraid, is the nature of protobufs. I'm not sure how likely they are to add new request types without significant overhaul to Calcite though

@jihoonson
Copy link
Contributor

@lkm besides my comments above, would you please fix the conflicts?

@lkm lkm force-pushed the avatica-protobuf branch 2 times, most recently from 361fe79 to 14ebeeb Compare December 20, 2020 10:19
@lkm
Copy link
Contributor Author

lkm commented Dec 20, 2020

Thanks for you time, @jihoonson. I've added the spotbugs exclude and documentation for it now.

Copy link
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest change LGTM. Thanks @lkm!

Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delayed review, overall LGTM 👍

We also have ITJdbcQueryTest, it would be useful I think to add coverage for the protobuf endpoint to the integration tests (and pretty easily) by adding additional connection strings, e.g. https://github.com/apache/druid/blob/master/integration-tests/src/test/java/org/apache/druid/tests/query/ITJdbcQueryTest.java#L56 but with the protobuf path and serialization=protobuf, which afaict should control the wire format: https://calcite.apache.org/avatica/docs/client_reference.html

@@ -457,6 +470,95 @@ static String getAvaticaConnectionId(Map<String, Object> requestMap)
return (String) connectionIdObj;
}

static String getAvaticaProtobufConnectionId(Service.Request request)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side note, since we now have avatica stuff as dependencies to this project, it probably makes sense in a follow-up to modify the JSON version to use the types like this is doing instead of deserializing to a map.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how much there is to gain from that, what are you thinking would be the advantage? Also, looking through it I can't find an easy to use class like ProtobufTranslationImpl for json.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I guess the reason would be defensive for the most part, to more strongly type the check so that whenever we upgrade the library, stuff would fail at compile time instead of just fail in strange ways at run time if any of the requests ever change, however unlikely.

Some of the unit tests for the json path are using the avatica types for json to test this area already, so was just thinking since its used as a runtime depend now we could use the expected types for non tests too. But yeah, its definitely not necessary and doesn't provide a lot of gain, nor do I think we should do it in this PR or anything.

@@ -104,8 +104,10 @@ public void openConnection(final ConnectionHandle ch, final Map<String, String>
{
// Build connection context.
final ImmutableMap.Builder<String, Object> context = ImmutableMap.builder();
for (Map.Entry<String, String> entry : info.entrySet()) {
context.put(entry);
if (info != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if this was causing some problem before... looking at other implementations some do check for null on this field, 👍

@@ -101,7 +101,7 @@
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;

public class DruidAvaticaHandlerTest extends CalciteTestBase
public abstract class DruidAvaticaHandlerTest extends CalciteTestBase
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I wonder if this test would be better using @RunWith(JUnitParamsRunner.class) with the different handler implementations instead of making it an abstract class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would that not make the test harder to read? You mean parameterising each test case? I was going for a lighter touch approach, the abstracted methods are called in setUp, which would then need rewriting

@clintropolis
Copy link
Member

very sorry for the delay @lkm, PR lgtm, could you fix up conflicts (again, sorry) and we can get this merged?

@lkm
Copy link
Contributor Author

lkm commented Mar 3, 2021

@clintropolis Sure. Pushed a bit too early there. Will try to get this fixed up before Friday

@lkm
Copy link
Contributor Author

lkm commented Mar 31, 2021

@clintropolis I apologise for dragging my feet on this one. I've properly rebased against most recent master now. So it should be good to go

@clintropolis clintropolis merged commit 782a1d4 into apache:master Mar 31, 2021
@lkm lkm deleted the avatica-protobuf branch April 15, 2021 15:03
@lkm
Copy link
Contributor Author

lkm commented Apr 15, 2021

@clintropolis will this be included in the upcoming 0.21 release? Should I provide a backport?

@clintropolis
Copy link
Member

@clintropolis will this be included in the upcoming 0.21 release? Should I provide a backport?

0.21 branch is already created and we only backport bug and security fixes after the branch has been cut, so this will have to wait until the next release. Since 0.21 was created a few months ago and was delayed by 0.20.1 and 0.20.2 releases, it should be released very soon, but we will probably begin with the 0.22 release within the next month so hopefully it won't be too long.

@lkm
Copy link
Contributor Author

lkm commented Apr 16, 2021

Alright, I understand. Thanks for the quick response.

@clintropolis clintropolis added this to the 0.22.0 milestone Aug 12, 2021
jon-wei added a commit to jon-wei/druid that referenced this pull request Nov 22, 2021
* IMPLY-6556 remove offending settings.xml for intellij inspections

* GCS lookup support (apache#11026)

* GCS lookup support

* checkstyle fix

* review comments

* review comments

* remove unused import

* remove experimental from Kinesis with caveats (apache#10998)

* remove experimental from Kinesis with caveats

* add suggested known issue

* spelling fixes

* Bump aliyun SDK to 3.11.3 (apache#11044)

* Update reset-cluster.md (apache#10990)

fixed Error: Could not find or load main class org.apache.druid.cli.Main

* Make imply-view-manager non-experimental (apache#316)

* Make druid.indexer.task.ignoreTimestampSpecForDruidInputSource default to true, for backwards compat (apache#315)

* Add explicit EOF and use assert instead of exception (apache#11041)

* Add Calcite Avatica protobuf handler (apache#10543)

* bump to latest of same version node and npm versions, bump frontend-maven-plugin (apache#11057)

* request logs through kafka emitter (apache#11036)

* request logs through kafka emitter

* travis fixes

* review comments

* kafka emitter unit test

* new line

* travis checks

* checkstyle fix

* count request lost when request topic is null

* IMPLY-6556 map local repository instead .m2

* remove outdated info from faq (apache#11053)

* remove outdated info from faq

* Add an option for ingestion task to drop (mark unused) all existing segments that are contained by interval in the ingestionSpec (apache#11025)

* Auto-Compaction can run indefinitely when segmentGranularity is changed from coarser to finer.

* Add option to drop segments after ingestion

* fix checkstyle

* add tests

* add tests

* add tests

* fix test

* add tests

* fix checkstyle

* fix checkstyle

* add docs

* fix docs

* address comments

* address comments

* fix spelling

* Allow list for JDBC connection properties to address CVE-2021-26919 (apache#11047)

* Allow list for JDBC connection properties to address CVE-2021-26919

* fix tests for java 11

* Fix compile issue from dropExisting in ingest-service (apache#320)

Co-authored-by: Slava Mogilevsky <triggerwoods91@gmail.com>
Co-authored-by: Parag Jain <pjain1@apache.org>
Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: frank chen <frank.chen021@outlook.com>
Co-authored-by: Tushar Raj <43772524+tushar-1728@users.noreply.github.com>
Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>
Co-authored-by: Jihoon Son <jihoonson@apache.org>
Co-authored-by: Lasse Krogh Mammen <lkm@bookboon.com>
Co-authored-by: Clint Wylie <cwylie@apache.org>
Co-authored-by: Maytas Monsereenusorn <maytasm@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants