[CALCITE-1128] Implement JDBC batch update methods in remote driver #209

joshelser · 2016-03-10T16:52:21Z

This commit provides an implementation for:

Statement.addBatch(String)
PreparedStatement.addBatch()
PreparedStatement.executeBatch()

The implementation is fairly straightforward except for the addition
of a new server interface: ProtobufMeta. This is a new interface which
the Meta implementation can choose to also implement to provide a "native"
implementation on top of Protobuf objects instead of the Avatica POJOs.

During the investigations Avatica performance pre-1.7.0, it was found
that converting protobufs to POJOs was a very hot code path. This short-circuit
helps us avoid extra objects on the heap and computation to create them
in what should be a very hot code path for write-workloads.

vlsi · 2016-03-10T17:00:58Z

I've scanned through the PR and it looks OK.

joshelser · 2016-03-10T17:02:47Z

Thanks for the quick glance @vlsi !

vlsi · 2016-03-10T17:07:39Z

avatica/server/src/main/java/org/apache/calcite/avatica/jdbc/JdbcMeta.java

+        int i = 1;
+        for (TypedValue value : batch) {
+          // Use the value and then increment
+          preparedStmt.setObject(i++, value.value);


Does this ever work for null values?

Proper setNull(index, sqltype, sqltypename) should be used

Not sure, I just copied what the existing prepared statement logic was doing. Let me write up a quick test.

joshelser · 2016-03-10T19:17:35Z

Just pushed two more commits. One gives a better exception in JdbcMeta and the other is a test for setting nulls in JdbcMeta (in the batch and non-batch paths). Thanks against, Vladimir!

vlsi · 2016-03-10T19:40:57Z

@joshelser, regarding null values, consider the following:
Case 1) Stored procedure proc(p_arg number), proc(p_arg varchar), etc
Which one would be called if using just setObject(1, null)?

Case 2) Consider a statement select id, name from entities where id=?.
Do you think execution plan would be different in case of setInt(1, 42) vs setString(1, "42")?
You might ask me something like "wait a minute, case 2 does not involve nulls". If that is the case, think twice )

joshelser · 2016-03-10T19:48:16Z

Uhh, wrong user mention there, Vladimir. I'm @joshelser.

Which one would be called if using just setObject(1, null)?

But the types of the parameters in the statement are already defined at this point. We're just setting binding a value -- this isn't defining the type of that value.

Do you think execution plan would be different in case of setInt(1, 42) vs setString(1, "42")?
You might ask me something like "wait a minute, case 2 does not involve nulls". If that is the case, think twice )

I really don't follow what you're trying to tell me. Can you phrase your concern by not asking me a question?

vlsi · 2016-03-10T19:49:57Z

But the types of the parameters in the statement are already defined at this point. We're just setting binding a value -- this isn't defining the type of that value.

Can you please pin-point where the type is defined?
Do you mean this setObject(.., null) will eventually be translated to proper setNull(..., sqltype)?

vlsi · 2016-03-10T19:53:12Z

Can you phrase your concern by not asking me a question?

My concern is this setObject: https://github.com/apache/calcite/pull/209/files#diff-39e8b3d56f84afa3fbffe40bed7cbbc2R895 does not convey data type properly.

Failing to specify data type causes performance degradation for major RDBMS (e.g. Oracle, PostgreSQL). For instance, PostgreSQL cannot efficiently batch statements if data type of a column changes from row to row.

joshelser · 2016-03-10T19:55:29Z

Can you please pin-point where the type is defined?

I thought that the PreparedStatement is going to know when this value can be coerced into whatever the necessary value at runtime is. Maybe I'm assuming too much on what will happen behind the scenes.

joshelser · 2016-03-10T19:56:53Z

Failing to specify data type causes performance degradation for major RDBMS (e.g. Oracle, PostgreSQL). For instance, PostgreSQL cannot efficiently batch statements if data type of a column changes from row to row.

But is this case any different than what is already done in JdbcMeta? https://github.com/apache/calcite/blob/master/avatica/server/src/main/java/org/apache/calcite/avatica/jdbc/JdbcMeta.java#L795-L800 I can understand where you're coming from, but this seems like a larger issue than just my changeset.

vlsi · 2016-03-10T19:58:30Z

value can be coerced into whatever the necessary value at runtime is

That is a typical JDBC user's pitfall.
All java's nulls are equals. You cannot tell the difference between (Integer) null vs (String) null.
That is why setNull is really important.

joshelser · 2016-03-10T19:58:49Z

this seems like a larger issue than just my changeset

.. which is fine if we'll get better performance out of setObject(int, Object, int). We should just make sure that we make that change everywhere and add tests if it is functionally incorrect or ambiguous (I'm not sure anymore if you're saying it's incorrect or just not as performant as it could be).

vlsi · 2016-03-10T20:02:05Z

But is this case any different than what is already done in JdbcMeta?

I do not track 100% of Calcite development.
I'm known to bug people, often for no reason. Sorry about that.
I've been bitten by those setObject(, null) bugs multiple times.
I had some spare 20 minutes to go through the changeset.

That is why I just highlighted the issue here.

vlsi · 2016-03-10T20:06:04Z

(I'm not sure anymore if you're saying it's incorrect or just not as performant as it could be).

Both apply.

Case 1) Stored procedure proc(p_arg number), proc(p_arg varchar), etc

In such a case, "prepared statement" would fail with null argument since database would have no way to pick the proper procedure. That is functional issue. The query would fail with SQLException instead of execution of proper procedure with null argument.
Well, it is not that often, however, it might be an issue since that is typical use for "argument overloading".

Case 2)

Performance issue: if type of bind value changes over time (e.g. setObject(1, (Integer)42), and later setObject(1, null)), database has to replan the query or split the batch, etc.

vlsi · 2016-03-10T20:09:14Z

which is fine if we'll get better performance out of setObject(int, Object, int)

I'm not sure how struct/array/SQLData types are supported, if supported at all.
In those cases, setNull(int parameterIndex, int sqlType, String typeName) would be required to propagate sql typeName.

joshelser · 2016-03-10T20:13:55Z

I do not track 100% of Calcite development.

That's why I sent you a link to the code :). We're doing the same thing in other areas of the code. Thanks for bringing it up.

This sounds to me like a limitation of how Avatica is handling types in PreparedStatements. Can you write this up in a JIRA issue? I can try to do it, but I don't want to misrepresent the issues (as I apparently don't fully understand the nitty-gritty details of the problem). If not, I can try copy-pasting what you've already typed up here. I just want to make sure that we don't lose the fact that we should try to address this.

vlsi · 2016-03-10T20:13:59Z

should just make sure that we make that change everywhere

It might be just fine to add some // TODO: fix before 2018-18-18 comment (or create a jira ticket) in order to keep that under control.

I just wanted to limit the danger. If it is possible to implement that properly already, it would be better to go that way. At later stages it could be harder to fix the issue due to "backward compatibility", etc, etc.

Ok. I think you've got my point.

julianhyde · 2016-03-11T18:01:11Z

Method javadoc needs to use third person throughout. Sorry to be pedantic.

joshelser · 2016-03-11T18:07:11Z

Method javadoc needs to use third person throughout. Sorry to be pedantic.

No worries! I appreciate you looking. I'll give it another pass over before it gets merged (I had grabbed a few yesterday).

This commit provides an implementation for: * Statement.addBatch(String) * PreparedStatement.addBatch() * PreparedStatement.executeBatch() The implementation is fairly straightforward except for the addition of a new server interface: ProtobufMeta. This is a new interface which the Meta implementation can choose to also implement to provide a "native" implementation on top of Protobuf objects instead of the Avatica POJOs. During the investigations Avatica performance pre-1.7.0, it was found that converting protobufs to POJOs was a very hot code path. This short-circuit helps us avoid extra objects on the heap and computation to create them in what should be a very hot code path for write-workloads.

joshelser · 2016-03-23T19:16:17Z

Created https://issues.apache.org/jira/browse/CALCITE-1164 for the discussion with Vladimir earlier.

vlsi reviewed Mar 10, 2016
View reviewed changes

joshelser force-pushed the 1128-batch-updates branch from 0fa2d87 to 3635151 Compare March 10, 2016 19:16

joshelser added 5 commits March 23, 2016 14:18

Clarify a javadoc comment on ExecuteBatchResult

548826e

Give a better error message on failed setObject() calls

df29e22

Add some tests for inserting null values (batch and non)

73f2890

Third-person javadoc updates

7e9a1e4

TypedValue protobuf updates.

e83aca0

joshelser force-pushed the 1128-batch-updates branch from e8bf2b4 to e83aca0 Compare March 24, 2016 02:03

Some minor nits

0d0f553

joshelser added 3 commits March 23, 2016 22:20

Remove the new API in calcite core.

2cdebe6

Another checkstyle.

d65f3ee

Silly copy paste

2d828e4

asfgit closed this in 5dfa3f1 Mar 24, 2016

joshelser deleted the 1128-batch-updates branch March 24, 2016 18:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CALCITE-1128] Implement JDBC batch update methods in remote driver #209

[CALCITE-1128] Implement JDBC batch update methods in remote driver #209

joshelser commented Mar 10, 2016

vlsi commented Mar 10, 2016

joshelser commented Mar 10, 2016

vlsi Mar 10, 2016

joshelser Mar 10, 2016

joshelser commented Mar 10, 2016

vlsi commented Mar 10, 2016

joshelser commented Mar 10, 2016

vlsi commented Mar 10, 2016

vlsi commented Mar 10, 2016

joshelser commented Mar 10, 2016

joshelser commented Mar 10, 2016

vlsi commented Mar 10, 2016

joshelser commented Mar 10, 2016

vlsi commented Mar 10, 2016

vlsi commented Mar 10, 2016

vlsi commented Mar 10, 2016

joshelser commented Mar 10, 2016

vlsi commented Mar 10, 2016

julianhyde commented Mar 11, 2016

joshelser commented Mar 11, 2016

joshelser commented Mar 23, 2016

[CALCITE-1128] Implement JDBC batch update methods in remote driver #209

[CALCITE-1128] Implement JDBC batch update methods in remote driver #209

Conversation

joshelser commented Mar 10, 2016

vlsi commented Mar 10, 2016

joshelser commented Mar 10, 2016

vlsi Mar 10, 2016

Choose a reason for hiding this comment

joshelser Mar 10, 2016

Choose a reason for hiding this comment

joshelser commented Mar 10, 2016

vlsi commented Mar 10, 2016

joshelser commented Mar 10, 2016

vlsi commented Mar 10, 2016

vlsi commented Mar 10, 2016

joshelser commented Mar 10, 2016

joshelser commented Mar 10, 2016

vlsi commented Mar 10, 2016

joshelser commented Mar 10, 2016

vlsi commented Mar 10, 2016

vlsi commented Mar 10, 2016

vlsi commented Mar 10, 2016

joshelser commented Mar 10, 2016

vlsi commented Mar 10, 2016

julianhyde commented Mar 11, 2016

joshelser commented Mar 11, 2016

joshelser commented Mar 23, 2016