Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-1128] Implement JDBC batch update methods in remote driver #209

Closed
wants to merge 10 commits into from

Conversation

joshelser
Copy link
Member

This commit provides an implementation for:

  • Statement.addBatch(String)
  • PreparedStatement.addBatch()
  • PreparedStatement.executeBatch()

The implementation is fairly straightforward except for the addition
of a new server interface: ProtobufMeta. This is a new interface which
the Meta implementation can choose to also implement to provide a "native"
implementation on top of Protobuf objects instead of the Avatica POJOs.

During the investigations Avatica performance pre-1.7.0, it was found
that converting protobufs to POJOs was a very hot code path. This short-circuit
helps us avoid extra objects on the heap and computation to create them
in what should be a very hot code path for write-workloads.

@vlsi
Copy link
Contributor

vlsi commented Mar 10, 2016

I've scanned through the PR and it looks OK.

@joshelser
Copy link
Member Author

Thanks for the quick glance @vlsi !

int i = 1;
for (TypedValue value : batch) {
// Use the value and then increment
preparedStmt.setObject(i++, value.value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this ever work for null values?

Proper setNull(index, sqltype, sqltypename) should be used

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, I just copied what the existing prepared statement logic was doing. Let me write up a quick test.

@joshelser
Copy link
Member Author

Just pushed two more commits. One gives a better exception in JdbcMeta and the other is a test for setting nulls in JdbcMeta (in the batch and non-batch paths). Thanks against, Vladimir!

@vlsi
Copy link
Contributor

vlsi commented Mar 10, 2016

@joshelser, regarding null values, consider the following:
Case 1) Stored procedure proc(p_arg number), proc(p_arg varchar), etc
Which one would be called if using just setObject(1, null)?

Case 2) Consider a statement select id, name from entities where id=?.
Do you think execution plan would be different in case of setInt(1, 42) vs setString(1, "42")?
You might ask me something like "wait a minute, case 2 does not involve nulls". If that is the case, think twice )

@joshelser
Copy link
Member Author

Uhh, wrong user mention there, Vladimir. I'm @joshelser.

Which one would be called if using just setObject(1, null)?

But the types of the parameters in the statement are already defined at this point. We're just setting binding a value -- this isn't defining the type of that value.

Do you think execution plan would be different in case of setInt(1, 42) vs setString(1, "42")?
You might ask me something like "wait a minute, case 2 does not involve nulls". If that is the case, think twice )

I really don't follow what you're trying to tell me. Can you phrase your concern by not asking me a question?

@vlsi
Copy link
Contributor

vlsi commented Mar 10, 2016

But the types of the parameters in the statement are already defined at this point. We're just setting binding a value -- this isn't defining the type of that value.

Can you please pin-point where the type is defined?
Do you mean this setObject(.., null) will eventually be translated to proper setNull(..., sqltype)?

@vlsi
Copy link
Contributor

vlsi commented Mar 10, 2016

Can you phrase your concern by not asking me a question?

My concern is this setObject: https://github.com/apache/calcite/pull/209/files#diff-39e8b3d56f84afa3fbffe40bed7cbbc2R895 does not convey data type properly.

Failing to specify data type causes performance degradation for major RDBMS (e.g. Oracle, PostgreSQL). For instance, PostgreSQL cannot efficiently batch statements if data type of a column changes from row to row.

@joshelser
Copy link
Member Author

Can you please pin-point where the type is defined?

I thought that the PreparedStatement is going to know when this value can be coerced into whatever the necessary value at runtime is. Maybe I'm assuming too much on what will happen behind the scenes.

@joshelser
Copy link
Member Author

Failing to specify data type causes performance degradation for major RDBMS (e.g. Oracle, PostgreSQL). For instance, PostgreSQL cannot efficiently batch statements if data type of a column changes from row to row.

But is this case any different than what is already done in JdbcMeta? https://github.com/apache/calcite/blob/master/avatica/server/src/main/java/org/apache/calcite/avatica/jdbc/JdbcMeta.java#L795-L800 I can understand where you're coming from, but this seems like a larger issue than just my changeset.

@vlsi
Copy link
Contributor

vlsi commented Mar 10, 2016

value can be coerced into whatever the necessary value at runtime is

That is a typical JDBC user's pitfall.
All java's nulls are equals. You cannot tell the difference between (Integer) null vs (String) null.
That is why setNull is really important.

@joshelser
Copy link
Member Author

this seems like a larger issue than just my changeset

.. which is fine if we'll get better performance out of setObject(int, Object, int). We should just make sure that we make that change everywhere and add tests if it is functionally incorrect or ambiguous (I'm not sure anymore if you're saying it's incorrect or just not as performant as it could be).

@vlsi
Copy link
Contributor

vlsi commented Mar 10, 2016

But is this case any different than what is already done in JdbcMeta?

  1. I do not track 100% of Calcite development.
  2. I'm known to bug people, often for no reason. Sorry about that.
  3. I've been bitten by those setObject(, null) bugs multiple times.
  4. I had some spare 20 minutes to go through the changeset.

That is why I just highlighted the issue here.

@vlsi
Copy link
Contributor

vlsi commented Mar 10, 2016

(I'm not sure anymore if you're saying it's incorrect or just not as performant as it could be).

Both apply.

Case 1) Stored procedure proc(p_arg number), proc(p_arg varchar), etc

In such a case, "prepared statement" would fail with null argument since database would have no way to pick the proper procedure. That is functional issue. The query would fail with SQLException instead of execution of proper procedure with null argument.
Well, it is not that often, however, it might be an issue since that is typical use for "argument overloading".

Case 2)

Performance issue: if type of bind value changes over time (e.g. setObject(1, (Integer)42), and later setObject(1, null)), database has to replan the query or split the batch, etc.

@vlsi
Copy link
Contributor

vlsi commented Mar 10, 2016

which is fine if we'll get better performance out of setObject(int, Object, int)

I'm not sure how struct/array/SQLData types are supported, if supported at all.
In those cases, setNull(int parameterIndex, int sqlType, String typeName) would be required to propagate sql typeName.

@joshelser
Copy link
Member Author

  1. I do not track 100% of Calcite development.

That's why I sent you a link to the code :). We're doing the same thing in other areas of the code. Thanks for bringing it up.

This sounds to me like a limitation of how Avatica is handling types in PreparedStatements. Can you write this up in a JIRA issue? I can try to do it, but I don't want to misrepresent the issues (as I apparently don't fully understand the nitty-gritty details of the problem). If not, I can try copy-pasting what you've already typed up here. I just want to make sure that we don't lose the fact that we should try to address this.

@vlsi
Copy link
Contributor

vlsi commented Mar 10, 2016

should just make sure that we make that change everywhere

It might be just fine to add some // TODO: fix before 2018-18-18 comment (or create a jira ticket) in order to keep that under control.

I just wanted to limit the danger. If it is possible to implement that properly already, it would be better to go that way. At later stages it could be harder to fix the issue due to "backward compatibility", etc, etc.

Ok. I think you've got my point.

@julianhyde
Copy link
Contributor

Method javadoc needs to use third person throughout. Sorry to be pedantic.

@joshelser
Copy link
Member Author

Method javadoc needs to use third person throughout. Sorry to be pedantic.

No worries! I appreciate you looking. I'll give it another pass over before it gets merged (I had grabbed a few yesterday).

This commit provides an implementation for:

* Statement.addBatch(String)
* PreparedStatement.addBatch()
* PreparedStatement.executeBatch()

The implementation is fairly straightforward except for the addition
of a new server interface: ProtobufMeta. This is a new interface which
the Meta implementation can choose to also implement to provide a "native"
implementation on top of Protobuf objects instead of the Avatica POJOs.

During the investigations Avatica performance pre-1.7.0, it was found
that converting protobufs to POJOs was a very hot code path. This short-circuit
helps us avoid extra objects on the heap and computation to create them
in what should be a very hot code path for write-workloads.
@joshelser
Copy link
Member Author

Created https://issues.apache.org/jira/browse/CALCITE-1164 for the discussion with Vladimir earlier.

@asfgit asfgit closed this in 5dfa3f1 Mar 24, 2016
@joshelser joshelser deleted the 1128-batch-updates branch March 24, 2016 18:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants