
add druid jdbc handler config for minimum number of rows per frame #10880

Merged (6 commits) on Feb 23, 2021

Conversation

clintropolis (Member)

Description

This PR adds a new config, druid.sql.avatica.minRowsPerFrame as a counterpart to druid.sql.avatica.maxRowsPerFrame, to give cluster operators some control over the size of frames (and so indirect control over the overall number of fetch requests required), as large result sets can require many thousands of frames to transfer at the default size of 100.
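For illustration, the new property sits alongside the existing Avatica settings in the Broker's `runtime.properties`. The values below are examples for the sketch, not recommendations:

```properties
# existing: maximum rows Druid returns in a single JDBC frame
druid.sql.avatica.maxRowsPerFrame=5000
# new in this PR: minimum rows per frame, overriding smaller client fetchSize hints
druid.sql.avatica.minRowsPerFrame=1000
```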

Relatedly, this config also serves as a workaround for a missing feature (or bug?) in Avatica, described in https://issues.apache.org/jira/browse/CALCITE-2322, where the fetchSize parameter of a Statement or PreparedStatement is ignored in favor of a hard-coded default of 100. There is an associated PR that is a couple of years old, apache/calcite-avatica#49; with some minor fix-up to make things build, I was able to locally patch my Druid install and confirm it fixes the issue, allowing us to pick up the user-set fetchSize on JDBC queries on our end. That PR seems to have lost momentum though, so I'm going to see if I can pick it up and get the issue fixed.

This configuration is still useful even if the Avatica issue is resolved: that fix lets JDBC users hint at the fetch size they would like, while this PR gives operators the ability to prevent very small fetch sizes on large result sets from choking things with a flood of fetch requests, and more generally to tune default JDBC result sizes for typical workloads, overriding user requests where necessary.
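The resulting frame sizing can be sketched as a simple clamp. This is a hypothetical standalone method written to illustrate the documented behavior (lesser of the client fetchSize and `maxRowsPerFrame`, never below the effective minimum), not Druid's actual handler code, and it ignores the `-1` "unlimited" setting of `maxRowsPerFrame` for brevity:

```java
public class FrameSizeSketch
{
  /**
   * Illustrative clamp: the effective minimum is the lesser of
   * minRowsPerFrame and maxRowsPerFrame (so a too-large minimum cannot
   * exceed the configured maximum), the client hint is capped at
   * maxRowsPerFrame, and a negative hint means "no client limit".
   */
  static int effectiveFetchSize(int clientFetchSize, int minRowsPerFrame, int maxRowsPerFrame)
  {
    final int effectiveMin = Math.min(minRowsPerFrame, maxRowsPerFrame);
    final int capped = (clientFetchSize < 0)
        ? maxRowsPerFrame
        : Math.min(clientFetchSize, maxRowsPerFrame);
    return Math.max(capped, effectiveMin);
  }

  public static void main(String[] args)
  {
    // a client hint of 2 is raised to the configured minimum of 1000
    System.out.println(effectiveFetchSize(2, 1000, 5000));      // 1000
    // a client hint of 100000 is capped at the configured maximum of 5000
    System.out.println(effectiveFetchSize(100_000, 100, 5000)); // 5000
  }
}
```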

Frame size has implications for JDBC performance as well. Using a simple utility, I ran some tests with 500k total results at various fetch sizes (with the patched Avatica, but the same holds when done via the new server config).

$ ./druid-jdbc prepared-statement --jdbc=jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/ --path=./test.sql --fetch=100 > out.csv
Results processed: 2971 millis
$ ./druid-jdbc prepared-statement --jdbc=jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/ --path=./test.sql --fetch=1000 > out.csv
Results processed: 1479 millis
$ ./druid-jdbc prepared-statement --jdbc=jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/ --path=./test.sql --fetch=10000 > out.csv
Results processed: 1166 millis
$ ./druid-jdbc prepared-statement --jdbc=jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/ --path=./test.sql --fetch=100000 > out.csv
Results processed: 2221 millis
$ ./druid-jdbc prepared-statement --jdbc=jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/ --path=./test.sql --fetch=500000 > out.txt
Results processed: 2184 millis

There does appear to be a Goldilocks zone of not-too-small, not-too-big, though it likely varies per query and/or overall result set size; I haven't explored it much.
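The round-trip arithmetic behind these timings: transferring N rows takes roughly ceil(N / fetchSize) fetch requests, so the 500k-row result needs about 5,000 requests at the default fetch size of 100, but only 50 at 10,000. A quick sketch (a hypothetical helper, not part of the utility used above):

```java
public class FetchMath
{
  // approximate number of fetch round trips needed to drain a result set
  static long framesNeeded(long totalRows, int rowsPerFrame)
  {
    return (totalRows + rowsPerFrame - 1) / rowsPerFrame; // ceiling division
  }

  public static void main(String[] args)
  {
    System.out.println(framesNeeded(500_000, 100));    // 5000 frames at the default
    System.out.println(framesNeeded(500_000, 10_000)); // 50 frames
  }
}
```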

Going through a Router also appears to amplify the overhead of small frame sizes:

$ ./druid-jdbc prepared-statement --jdbc=jdbc:avatica:remote:url=http://localhost:8888/druid/v2/sql/avatica/ --path=./test.sql --fetch=100 > out.csv
Results processed: 7394 millis
$ ./druid-jdbc prepared-statement --jdbc=jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/ --path=./test.sql --fetch=100 > out.csv
Results processed: 2219 millis
$ ./druid-jdbc prepared-statement --jdbc=jdbc:avatica:remote:url=http://localhost:8888/druid/v2/sql/avatica/ --path=./test.sql --fetch=10000 > out.csv
Results processed: 1518 millis
$ ./druid-jdbc prepared-statement --jdbc=jdbc:avatica:remote:url=http://localhost:8082/druid/v2/sql/avatica/ --path=./test.sql --fetch=10000 > out.csv
Results processed: 1176 millis

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@@ -918,6 +918,107 @@ public Frame fetch(
);
}


@Test
public void testMinRowsPerFrame() throws Exception
Contributor

Trying to fully understand this test. Is this testing that if a client supplied rows per frame is less than the configured minimum rows per frame, we stick with the minimum in the config?

Member Author

Yep, the statement sets a fetch size of 2 but the minimum is 1000, so prior to this patch the test would fail because the transfer would take multiple frames. It also explicitly checks that fetch is called with the adjusted minimum value here: https://github.com/apache/druid/pull/10880/files#diff-5430fa325aa463d4b6c34cad0fa3816e30e189ad2e3d7dbf56567c60e9c6c785R978

Contributor

thanks for the clarification

@capistrant (Contributor) left a comment

Overall looks good to me. I'm mostly unfamiliar with this area of Druid, so I'm not too privy to the details of Avatica outside the information and links in the PR description. Nevertheless, the change makes sense to me and the tests make me confident it does what it is supposed to. I do see you left the integration tests box unchecked; not sure if that means you plan on writing one or not. Even without one, I'd approve.

}

/**
* coerce frame size to minimum number of rows per frame to the minimum of {@link AvaticaServerConfig#minRowsPerFrame}
Contributor

nit: this is a little hard to read (for me at least).

maybe, coerce frame size to be, at minimum, {@link AvaticaServerConfig#minRowsPerFrame} ?

@capistrant (Contributor)

capistrant commented Feb 16, 2021

@clintropolis wanted to check in and get your thoughts on the integration tests checkbox being there but unchecked. I don't want to merge before you push a test if that is what you plan on doing! But I can merge if you feel you are not going to push any more testing or update the nit comment on the readability of the javadoc.

@clintropolis (Member Author)

> @clintropolis wanted to check in and get your thoughts on the integration tests checkbox being there but unchecked. I don't want to merge before you push a test if that is what you plan on doing! But I can merge if you feel you are not going to push any more testing or update the nit comment on the readability of the javadoc.

Ah, sorry for the delay, I've been swamped. I wasn't planning on adding an integration test at this time, since I'm not sure it would add much beyond the existing unit/functional-ish tests, but I was going to adjust the javadoc as well as rework the documentation to be a bit clearer, so I'll add the WIP label.

@clintropolis clintropolis removed the WIP label Feb 17, 2021
@techdocsmith (Contributor) left a comment

Some stylistic suggestions.

@@ -1644,7 +1644,8 @@ The Druid SQL server is configured through the following properties on the Broke
|`druid.sql.enable`|Whether to enable SQL at all, including background metadata fetching. If false, this overrides all other SQL-related properties and disables SQL metadata, serving, and planning completely.|true|
|`druid.sql.avatica.enable`|Whether to enable JDBC querying at `/druid/v2/sql/avatica/`.|true|
|`druid.sql.avatica.maxConnections`|Maximum number of open connections for the Avatica server. These are not HTTP connections, but are logical client connections that may span multiple HTTP connections.|25|
|`druid.sql.avatica.maxRowsPerFrame`|Maximum number of rows to return in a single JDBC frame. Setting this property to -1 indicates that no row limit should be applied. Clients can optionally specify a row limit in their requests; if a client specifies a row limit, the lesser value of the client-provided limit and `maxRowsPerFrame` will be used.|5,000|
|`druid.sql.avatica.maxRowsPerFrame`|Maximum value which can be supplied to the JDBC `Statement.setFetchSize` method. This setting determines the maximum sized JDBC `ResultSet` which Druid will populate in a single 'fetch'. Setting this property to -1 indicates that no row limit should be applied on the server-side, potentially returning the entire set of rows on the initial statement execution. If the JDBC client calls `Statement.setFetchSize` with a value other than -1, the lesser value of the client-provided limit and `maxRowsPerFrame` will be used. If `maxRowsPerFrame` is smaller than `minRowsPerFrame`, then the `ResultSet` size will be fixed. For handling queries which produce results with a large number of rows, consider increasing this value to reduce the number of fetches required to transfer the result set.|5,000|
Contributor

Suggested change
|`druid.sql.avatica.maxRowsPerFrame`|Maximum value which can be supplied to the JDBC `Statement.setFetchSize` method. This setting determines the maximum sized JDBC `ResultSet` which Druid will populate in a single 'fetch'. Setting this property to -1 indicates that no row limit should be applied on the server-side, potentially returning the entire set of rows on the initial statement execution. If the JDBC client calls `Statement.setFetchSize` with a value other than -1, the lesser value of the client-provided limit and `maxRowsPerFrame` will be used. If `maxRowsPerFrame` is smaller than `minRowsPerFrame`, then the `ResultSet` size will be fixed. For handling queries which produce results with a large number of rows, consider increasing this value to reduce the number of fetches required to transfer the result set.|5,000|
|`druid.sql.avatica.maxRowsPerFrame`|Maximum acceptable value for the JDBC client `Statement.setFetchSize` method. This setting determines the maximum number of rows that will Druid will populate in a single 'fetch' for a JDBC `ResultSet`. Set this property to -1 to enforce no row limit on the server-side and potentially return the entire set of rows on the initial statement execution. If the JDBC client calls `Statement.setFetchSize` with a value other than -1, Druid uses the lesser value of the client-provided limit and `maxRowsPerFrame`. If `maxRowsPerFrame` is smaller than `minRowsPerFrame`, then the `ResultSet` size will be fixed. To handle queries that produce results with a large number of rows, you can increase value of `druid.sql.avatica.maxRowsPerFrame` to reduce the number of fetches required to completely transfer the result set.|5,000|

Member Author

applied this suggestion with a couple of minor fixes (".. will Druid will populate .." and removed an extra space), thanks

@@ -1644,7 +1644,8 @@ The Druid SQL server is configured through the following properties on the Broke
|`druid.sql.enable`|Whether to enable SQL at all, including background metadata fetching. If false, this overrides all other SQL-related properties and disables SQL metadata, serving, and planning completely.|true|
|`druid.sql.avatica.enable`|Whether to enable JDBC querying at `/druid/v2/sql/avatica/`.|true|
|`druid.sql.avatica.maxConnections`|Maximum number of open connections for the Avatica server. These are not HTTP connections, but are logical client connections that may span multiple HTTP connections.|25|
|`druid.sql.avatica.maxRowsPerFrame`|Maximum number of rows to return in a single JDBC frame. Setting this property to -1 indicates that no row limit should be applied. Clients can optionally specify a row limit in their requests; if a client specifies a row limit, the lesser value of the client-provided limit and `maxRowsPerFrame` will be used.|5,000|
|`druid.sql.avatica.maxRowsPerFrame`|Maximum value which can be supplied to the JDBC `Statement.setFetchSize` method. This setting determines the maximum sized JDBC `ResultSet` which Druid will populate in a single 'fetch'. Setting this property to -1 indicates that no row limit should be applied on the server-side, potentially returning the entire set of rows on the initial statement execution. If the JDBC client calls `Statement.setFetchSize` with a value other than -1, the lesser value of the client-provided limit and `maxRowsPerFrame` will be used. If `maxRowsPerFrame` is smaller than `minRowsPerFrame`, then the `ResultSet` size will be fixed. For handling queries which produce results with a large number of rows, consider increasing this value to reduce the number of fetches required to transfer the result set.|5,000|
|`druid.sql.avatica.minRowsPerFrame`|Minimum value which can be supplied to the JDBC `Statement.setFetchSize` method. This property must be set to a value greater than 0. If a `Statement.setFetchSize` is specified with a smaller value, this row count will be used instead. If `maxRowsPerFrame` is set to a value lower than `minRowsPerFrame`, the minimum value of the two will be used. For handling queries which produce results with a large number of rows, consider increasing this value to reduce the number of fetches required to transfer the result set.|100|
Contributor

Suggested change
|`druid.sql.avatica.minRowsPerFrame`|Minimum value which can be supplied to the JDBC `Statement.setFetchSize` method. This property must be set to a value greater than 0. If a `Statement.setFetchSize` is specified with a smaller value, this row count will be used instead. If `maxRowsPerFrame` is set to a value lower than `minRowsPerFrame`, the minimum value of the two will be used. For handling queries which produce results with a large number of rows, consider increasing this value to reduce the number of fetches required to transfer the result set.|100|
|`druid.sql.avatica.minRowsPerFrame`|Minimum acceptable value for the JDBC client `Statement.setFetchSize` method. The value for this property must be greater than 0. If the JDBC client specifies a `Statement.setFetchSize` with a lesser value, Druid uses this row count instead. If `maxRowsPerFrame` is less than `minRowsPerFrame`, Druid uses the minimum value of the two. For handling queries which produce results with a large number of rows, you can increase this value to reduce the number of fetches required to completely transfer the result set.|100|

Member Author

Applied this suggestion with a couple of minor adjustments ("If the JDBC client specifies a Statement.setFetchSize with a lesser value, Druid uses this row count instead." -> "If the JDBC client calls Statement.setFetchSize with a lesser value, Druid uses minRowsPerFrame instead."), thanks

@clintropolis (Member Author)

thanks for review @capistrant and @techdocsmith 👍

@clintropolis clintropolis merged commit f34c6eb into apache:master Feb 23, 2021
@clintropolis clintropolis deleted the jdbc-min-frame-size branch February 23, 2021 10:11
@clintropolis clintropolis added this to the 0.22.0 milestone Aug 12, 2021
@clintropolis (Member Author)

clintropolis commented Sep 20, 2021

It looks like someone has picked up the abandoned Avatica PR to fix the underlying issue about not being able to control this client-side: apache/calcite-avatica#148
