Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql support for dynamic parameters #6974

Merged
merged 45 commits into from
Feb 19, 2020

Conversation

clintropolis
Copy link
Member

@clintropolis clintropolis commented Jan 31, 2019

This PR adds support for parameterized SQL queries of the form:

SELECT COUNT(*) FROM data_source WHERE foo = ? AND __time > ?

Over the HTTP API :

{
  "query" : "SELECT COUNT(*) FROM data_source WHERE foo = ? AND __time > ?",
  "parameters": [
    { "type": "VARCHAR", "value": "bar"},
    { "type": "TIMESTAMP", "value": "2000-01-01 00:00:00" }
  ]
}

JDBC:

PreparedStatement statement = connection.prepareStatement("SELECT COUNT(*) AS cnt FROM druid.foo WHERE dim1 = ? OR dim1 = ?");
statement.setString(1, "abc");
statement.setString(2, "def");
final ResultSet resultSet = statement.executeQuery();

Design

Dynamic parameter binding for Druid queries is performed in a "query optimized" (rather than "planner optimized") manner which substitutes the dynamic parameters (SqlDynamicParam and RexDynamicParam) nodes of the expression with the literals of the value that they are bound to as early as we can. This lets us maximize simplification of the query expression as much as possible prior to execution, which is likely always going to be better for query performance. For JDBC, this pushes the actual planning into the 'execute' call, so re-using a prepared statement with different parameters will result in re-planning the query, which is what I meant by the earlier assertion that this approach is "query optimized" instead of "planner optimized". Since planning is expected to be cheap compared to the actual query, this seems like the correct approach to take here, especially since these unoptimized queries could result in either complex or un-necessary virtual column expressions at execution time.

This is done through two layers of expression tree transformation, the first SqlParametizerShuttle transforming the SqlNode expression, and the second RelParameterizerShuttle transforming the RelNode expression after the SqlNode has been parsed and validated and translated into a relational form. Through experimentation I have got the first transformation to catch all of the test cases I have so far, but the ability to create a SqlLiteral from arbitrary JDBC types is less rich than the facilities to create a RexLiteral, so I still have the second transformation in place, just in case. Some additional details are available in a Calcite dev-list thread about how to approach this problem.

@clintropolis
Copy link
Member Author

One of the primary motivators was to provide a mechanism to supply very large literals (serialized bloom filters) to a Druid query without choking the tokenizer with a 10MB+ string literal, as the timings of this test show it's a nice improvement:

screen shot 2019-01-31 at 3 39 32 pm

I meant to mark the slow test as ignore though so it doesn't get run, so will update PR.

@fjy fjy added this to the 0.15.0 milestone Mar 11, 2019
@clintropolis clintropolis removed the WIP label Mar 11, 2019
@peterlittig
Copy link

I found my way here via CALCITE-2873 and am receiving the same AvaticaSqlException when attempting to prepare statements against my Druid cluster. Two questions: 1) How broad is the scope of fix in this PR? (i.e., I'm wondering if my use-case will likely be covered by the fix) and 2) Is there a best-guess timeline for when it will be merged into apache:master?

Many thanks!

@clintropolis
Copy link
Member Author

clintropolis commented Mar 20, 2019

@peterlittig, I expect this will probably be merged to master within the next couple of weeks, barring any major issues uncovered during review, and should be included in release 0.15 (which is quickly sneaking up on us even though 0.14 isn't quite finalized yet).

I'm not sure on your use case, but I hope so? I've attempted this PR to add general purpose support for dynamic parameters to Druid sql queries through both the JSON API and JDBC interfaces. I've added a handful of test cases for a variety of expressions to the new in this PR CalciteParameterQueryTest, have a look to see if the queries you're trying to do look similar to any of those, and if not I'd be happy to add a test case to make sure it works correctly.

One thing to note, sometimes dynamic parameters can trip up the type inference, which will result in a query failing to validate before it can plan. As far as I can tell, this is largely on the calcite side, as queries will confuse the SqlValidator that parses the SqlNode prior to where this PR does it's thing, but there is a sort of work around for this through using a CAST(? as <type>) in the expression to explicitly type the parameters, e.g. like this example from the linked test cases:

WHEN ? THEN SUM(cnt) / CAST(? as INT)

In this example, the first ? passes through the validator, but the 2nd must be explicitly cast to an INT to pass validation. It's unfortunate, but should get better with time and new versions.

@peterlittig
Copy link

Thanks for the response, @clintropolis! The use-cases I currently have in mind are fairly simple/straightforward. I'll definitely take a look at the test class you linked and see I'm covered. Thanks also for the timeline estimate. It would be great to see this go out in 0.15!

@gianm gianm removed this from the 0.15.0 milestone Apr 10, 2019
@clintropolis
Copy link
Member Author

One note: given that parameters are specified in a list, could it be legal to leave out all ordinals and have them default to their 1-based list index if they are all missing?

@glasser I have modified the parameters in the http call to no longer have the ordinal property, in favor of the more user friendly approach you suggested of using list order. I have updated the PR description (and docs) to reflect this.

}

@Test
public void testLongs() throws Exception
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test will fail CI in sql compatible mode until #9251 is merged

@@ -607,7 +641,7 @@ Properties connectionProperties = new Properties();
try (Connection connection = DriverManager.getConnection(url, connectionProperties)) {
try (
final Statement statement = connection.createStatement();
final ResultSet resultSet = statement.executeQuery(query)
final ResultSet resultSet = statement.executeQuery(query);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the original example is better as the semicolon is not needed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@@ -273,10 +275,67 @@ public void testBloomFilters() throws Exception
);
}

@Test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per #6974 (comment), it'd be good to ignore this slow test

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marked both as ignored

this.maxRowCount = maxRowCount;
this.query = query;
ArrayList<AvaticaParameter> params = new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Declare as List instead of ArrayList to separate behavior from implementation

return Long.class;
case REAL:
return Float.class;
case FLOAT:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a comment for why this is not Float.class would be helpful

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added link to docs I used as reference for mappings from JDBC to Java types

}

@Override
public boolean equals(Object o)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest adding an EqualsVerifier unit test for this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@@ -85,13 +110,14 @@ public boolean equals(final Object o)
return header == sqlQuery.header &&
Objects.equals(query, sqlQuery.query) &&
resultFormat == sqlQuery.resultFormat &&
Objects.equals(context, sqlQuery.context);
Objects.equals(context, sqlQuery.context) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest adding an EqualsVerifier unit test for this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

*/
public class CalciteParameterQueryTest extends BaseCalciteQueryTest
{
private final boolean useDefault = NullHandling.replaceWithDefault();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be beneficial to test binding a null?

Copy link
Member Author

@clintropolis clintropolis Feb 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, thanks for asking as the current implementation wasn't accepting null valued parameters because a year ago when I made this branch we didn't really publicly acknowledge that null could exist, and it had a precondition check that value was not in fact null. I've adjusted this to accept null parameters.

However, since there is no null equivalence in sql, it doesn't really seem to be all that useful, and in all cases I could think of you need a different expression to be able to explicitly use a null parameter, making using a parameter to pass null basically pointless afaict unless I'm missing some obvious case.

Additionally, calcite seems to fail to parse stuff like:

SELECT COUNT(*)
FROM table
WHERE column IS ?

producing an error of the form:

org.apache.calcite.sql.parser.SqlParseException: Encountered "IS ?" at line 3, column 10.1

I did add a couple of contrived examples, one using coalesce that optimizes out the parameter if it is null since coalesce returns first non-null, and a bloom filter test which is optimized out because it ends up being a constant, just to test the path of a null parameter getting fed into things.

@@ -275,6 +276,7 @@ public void testBloomFilters() throws Exception
);
}

@Ignore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an explanation of why it's ignored (the annotation takes a value, which can be used for the comment).

@@ -302,6 +304,7 @@ public void testBloomFilterBigNoParam() throws Exception
);
}

@Ignore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about adding explanation

Copy link

@sthetland sthetland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Two suggestions and a typo.

@@ -630,6 +663,17 @@ the necessary stickiness even with a normal non-sticky load balancer. Please see

Note that the non-JDBC [JSON over HTTP](#json-over-http) API is stateless and does not require stickiness.

### Dynamic Parameters

Parameterized queries are supported with JDBC:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misread this at first as meaning JDBC enables parameterized queries somehow. Is this wording clearer:
"You can use parameterized queries in JDBC code, as in this example;"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed


|Property|Type|Description|Required|
|--------|----|-----------|--------|
|`type`|`String` (`SqlType`) | String value of `SqlType` of parameter. [`SqlType`](https://calcite.apache.org/avatica/apidocs/org/apache/calcite/avatica/SqlType.html) is an friendly wrapper around [`java.sql.Types`](https://docs.oracle.com/javase/8/docs/api/java/sql/Types.html?is-external=true)|yes|

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "...is a friendly wrapper..."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -55,6 +55,9 @@ like `100` (denoting an integer), `100.0` (denoting a floating point value), or
timestamps can be written like `TIMESTAMP '2000-01-01 00:00:00'`. Literal intervals, used for time arithmetic, can be
written like `INTERVAL '1' HOUR`, `INTERVAL '1 02:03' DAY TO MINUTE`, `INTERVAL '1-2' YEAR TO MONTH`, and so on.

Druid SQL supports dynamic parameters using the `?` syntax where parameters are bound to `?` in order. Replace any
literal with a `?` and supply parameters to the query and the values will be bound at execution time.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: This description feels like it could be expanded upon just a bit, just to draw out what is meant by "in order," for one thing. How about something like:

"Druid SQL supports dynamic parameters in question mark (?) syntax, where parameters are bound to the ? placeholders at execution time. To use dynamic parameters, replace any  literal in the query with a ? character and ensure that corresponding parameter values are provided at execution time. Parameters are bound to the placeholders in the order in which they are passed."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm, changed

this.parameters = parameters;
}

public PrepareResult prepare(AuthenticationResult authenticationResult) throws ValidationException, RelConversionException, SqlParseException
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please break the line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

private AvaticaParameter createParameter(RelDataTypeField field, RelDataType type)
{
return new AvaticaParameter(
false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you add a comment on why this is always false?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added comment

@@ -142,6 +150,11 @@ public DruidStatement(
return columns;
}

public void setParameters(List<TypedValue> parameters)
{
this.parameters = parameters;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that this needs to lock lock.. Maybe parameters can be passed to execute() directly without having this method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, reworked to make execute take a list of parameters to avoid messing with the lock

Copy link
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@clintropolis clintropolis merged commit b408a6d into apache:master Feb 19, 2020
@clintropolis
Copy link
Member Author

Thanks for review all 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants