Async reads for JDBC #13196

paul-rogers · 2022-10-09T04:41:22Z

Prevents JDBC timeouts on long queries by returning empty batches when a batch fetch takes too long. Uses an async model to run the result fetch concurrently with JDBC requests.

Release Notes

Druid's Avatica-based JDBC handler is not the preferred way to run long-running queries in Druid.

Druid supports the Avatica JDBC driver. Avatica uses HTTP requests to communicate with the server. When using JDBC with long-running queries, the HTTP request can time out, producing an error. This PR uses an internal feature of JDBC to avoid timeouts by returning "empty batches" of rows when Druid takes too long to return the actual rows. The JDBC client automatically requests more rows, resulting in Druid queries running asynchronously with JDBC requests. The result is that JDBC queries no longer time out.

This feature is enabled by default with a timeout of 5 seconds. You can modify the time out by changing the druid.sql.avatica.fetchTimeoutMs property to the new timeout. Specify the timeout in milliseconds. Druid enforces a minimum of 1000 milliseconds to prevent hammering of the Broker.

Description

Druid's Avatica handler implementation already uses async execution to open and close the "yielder" for a query. This PR extends the idea by using the same executor for fetches. The fetch state (yielder, fetch offset) resides in a new ResultFetcher class that is invoked on each request to get results.

The async logic is simple:

If a future exists from the previous fetch exists, use it.
Otherwise, invoke the fetcher using the existing ExecutorService which returns a future.
Wait for the future to finish, but only up to the fetch timeout.
If the wait times out, save the future and return an empty batch.
If the results arrive in time, return them.

The close operation is modified to wait for any in-flight fetch to return before shutting down the result set. A ResultFetcherFactory creates the fetcher for each query. The factory is needed only to allow introducing artificial delays for testing.

The Avatica handler tests are cleaned up to eliminate some redundancy. This then allowed tests for async to be created with less copy/paste than with the existing code.

This PR has:

been self-reviewed.
- using the concurrency checklist (Remove this item if the PR doesn't have any relation to concurrency.)
added documentation for new or modified features or behaviors.
added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
been tested in a test Druid cluster.

Prevents JDBC timeouts on long queries by returning empty batches when a batch fetch takes too long. Uses an async model to run the result fetch concurrently with JDBC requests.

FrankChen021 · 2022-10-09T05:23:38Z

This PR uses an internal feature of JDBC to avoid timeouts by returning "empty batches" of rows when Druid takes too long to return the actual rows.

I always thought the HTTP timeout is a client behavior, and didn't know that it can be controlled by the server.
And I'm curious that is such internal feature implemented by Avatica JDBC driver only? Could you show me some materials to learn more about it?

FrankChen021

Overall LGTM with some minor comments

sql/src/main/java/org/apache/druid/sql/avatica/DruidJdbcResultSet.java

sql/src/main/java/org/apache/druid/sql/avatica/DruidMeta.java

sql/src/main/java/org/apache/druid/sql/avatica/DruidJdbcResultSet.java

paul-rogers · 2022-10-09T06:16:10Z

@FrankChen021, you get the Fastest Review Ever award! Thanks!

I always thought the HTTP timeout is a client behavior, and didn't know that it can be controlled by the server.
And I'm curious that is such internal feature implemented by Avatica JDBC driver only? Could you show me some materials to learn more about it?

You are right: the client (and proxies and Router) all determine timeout: the shortest timeout wins. The challenge is when that HTTP timeout is shorter than the amount of time Druid needs to compute the next batch of results. In a regular REST API via the /sql endpoint, we have to compute and return all the results within this timeout. This is usually not a problem because Druid is intended for simple queries: those that return in 100ms with a few hundred or thousand rows, and that exploit Druid's time partitioning, filters, etc.

The challenge is with the occasional "BI" style query that, for whatever reason, returns many rows, or takes a long time to compute. Maybe you've got billions of rows and need to find the one event, anywhere in that set, that represents access to some resource. There is nothing for it but to scan all the data, and that might take a while depending on the filter used. Or, maybe someone wants to grab several million rows of data to feed into an ML model. And so on.

Normally you would not want to run such a query on Druid: that;'s not what Druid is designed for. But, sometimes you just gotta do what you gotta do. In this case, if your query takes a minute to run, and one of the network elements in the chain has a 30 second timeout, you can't successfully run the query with request response.

But, JDBC is different. It works by sending multiple requests for each query, each returning some number of rows, generally in the thousands. The client keeps asking for more batches until the server returns EOF. So, if we had a long running query, but each batch of records could be computed in less time than the HTTP timeout, the query would succeed.

Most of the time in a big query, however, is the work before we get the first row. So, we want to generalize the above idea. JDBC allows us to return a batch of 0 rows. If so, the client just turns around and requests another batch until it gets rows or is told that an EOF occurred. Just to be clear, this polling is a feature of the Avatica client. The application just iterates though a ResultSet, blissfully ignorant of the goings-on under the covers.

That's what the async thing in this PR does for us. We tell Druid to go get a batch of rows. If this takes longer than the fetch timeout, we return an empty batch, and remember that Druid is busily working away to get us a batch. Maybe it will be ready next time. If, so, we return it. If not, we again return an empty batch and the whole thing repeats.

For this to work, the fetch timeout has to be less than the HTTP timeout. Given jitter, we might use the Nyquist rule and set the fetch timeout to half of the HTTP timeout. It doesn't have to be exact, any value less than 1/2 the network timeout should be fine.

Voila! We've decoupled query run time from network timeouts by running the query (or at least each batch fetch) asynchronously with the Avatica REST HTTP requests. So, you see, we don't set the HTTP timeout, we just work around it by always returning within the timeout, knowing that the client will do the right thing if we return 0 rows to stay within the timeout.

FrankChen021 · 2022-10-09T06:25:23Z

Just to be clear, this polling is a feature of the Avatica client.

I get it, it's a mechanism provided by the client. No wonder I didn't see such behaviour in other HTTP-based JDBC clients.
This is a good feature that ease client users to set proper timeout settings.

sql/src/main/java/org/apache/druid/sql/avatica/DruidJdbcResultSet.java

FrankChen021 · 2022-10-11T13:34:59Z

I was thinking what's the proper tag to label this PR.

We have a label Apache Avatica, but that seems more related to the Avatica client instead of our server side implementation.

And Area - Querying is more related to query planning, query engine, query execution etc. Maybe we need a tag Area - JDBC to tag any issues about our server side processing related to JDBC.

What do you think @paul-rogers @kfaraz

Fixed race condition in Druid's Avatica server-side handler Fixed issue with no-user connections

paul-rogers · 2022-10-11T22:31:02Z

@FrankChen021, thanks for taking a look at the PR and for working out the proper tag. You suggest:

Maybe we need a tag Area - JDBC to tag any issues about our server side processing related to JDBC.

When used for issues, it is hard for a user to know that an issue is in JDBC vs. the other areas you mention. Us developers know, but JDBC is a rather small area. Maybe Area - Clients or Area - API to encompass JDBC, REST and any other APIs we might invent? But, really, any of those tags are fine if it helps us sort PRs in a useful way.

docs/configuration/index.md

zachjsh

🚀

paul-rogers added 2 commits October 8, 2022 19:43

Async reads for JDBC

f4bc3ea

Prevents JDBC timeouts on long queries by returning empty batches when a batch fetch takes too long. Uses an async model to run the result fetch concurrently with JDBC requests.

Documentation

28358e8

FrankChen021 reviewed Oct 9, 2022

View reviewed changes

kfaraz added the Area - Querying label Oct 10, 2022

zachjsh reviewed Oct 10, 2022

View reviewed changes

sql/src/main/java/org/apache/druid/sql/avatica/DruidJdbcResultSet.java Show resolved Hide resolved

zachjsh reviewed Oct 10, 2022

View reviewed changes

sql/src/main/java/org/apache/druid/sql/avatica/DruidJdbcResultSet.java Show resolved Hide resolved

More testing and debugging

6e77439

Fixed race condition in Druid's Avatica server-side handler Fixed issue with no-user connections

Bug fix

edf32f3

LakshSingla reviewed Oct 12, 2022

View reviewed changes

docs/configuration/index.md Outdated Show resolved Hide resolved

zachjsh approved these changes Oct 12, 2022

View reviewed changes

paul-rogers added 4 commits October 12, 2022 15:19

Fix checkstyle issue

ab35c7e

IntelliJ inspections fix

8969086

Merge branch 'master' into 221010-jdbc-async

8496752

Merge fixes

9a44aed

paul-rogers merged commit b34b435 into apache:master Oct 18, 2022

paul-rogers deleted the 221010-jdbc-async branch October 18, 2022 18:41

kfaraz added this to the 25.0 milestone Nov 22, 2022

This was referenced Dec 18, 2022

[Draft] 25.0.0 Release Notes #13592

Closed

Add SegmentAllocationQueue to batch allocation actions #13369

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Async reads for JDBC #13196

Async reads for JDBC #13196

paul-rogers commented Oct 9, 2022

FrankChen021 commented Oct 9, 2022

FrankChen021 left a comment

paul-rogers commented Oct 9, 2022

FrankChen021 commented Oct 9, 2022

FrankChen021 commented Oct 11, 2022

paul-rogers commented Oct 11, 2022

zachjsh left a comment

Async reads for JDBC #13196

Async reads for JDBC #13196

Conversation

paul-rogers commented Oct 9, 2022

Release Notes

Description

FrankChen021 commented Oct 9, 2022

FrankChen021 left a comment

Choose a reason for hiding this comment

paul-rogers commented Oct 9, 2022

FrankChen021 commented Oct 9, 2022

FrankChen021 commented Oct 11, 2022

paul-rogers commented Oct 11, 2022

zachjsh left a comment

Choose a reason for hiding this comment