ARROW-13733 [Java]: Allow JDBC adapters to reuse vector schema roots #10983

liyafan82 · 2021-08-24T12:09:28Z

Under the current design of the JDBC adapter, vector schema roots cannot be reused: a new vector schema root is created and released for each batch.

This can cause performance problems. In many scenarios the client code only reads the data in a vector schema root, so a single root could be reused in the following cycle: populate data -> client uses data -> populate data -> ...

The current design has another problem: most of the time, two alternating vector schema roots are held in memory, which wastes a large amount of memory, especially for large batches.

We solve both problems by adding a flag to the config that allows the user to reuse vector schema roots.
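The reuse cycle described above is only safe if the client finishes reading a batch before asking for the next one. The sketch below illustrates that contract with a plain `int[]` standing in for a reused vector schema root; the class and method names (`ReuseCycleSketch`, `populate`, `consume`) are hypothetical and are not part of the Arrow API. Reading values during the cycle is fine, but holding on to the reused object means every retained reference ends up pointing at the last batch, so a client that needs to keep data must copy it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical stand-in for a reused VectorSchemaRoot: a single int[] buffer
 * that is overwritten with each batch (populate -> client uses -> populate ...).
 */
public class ReuseCycleSketch {

  /** Overwrites the shared buffer with the rows of batch number batchIndex. */
  static void populate(int[] buffer, int batchIndex) {
    for (int i = 0; i < buffer.length; i++) {
      buffer[i] = batchIndex * buffer.length + i;
    }
  }

  /**
   * Runs the reuse cycle and returns the sum of each batch, read during the cycle.
   * Also records raw references in `retained` and defensive copies in `copies`.
   */
  static List<Integer> consume(int batchSize, int numBatches,
                               List<int[]> retained, List<int[]> copies) {
    int[] shared = new int[batchSize]; // allocated once, reused for every batch
    List<Integer> sums = new ArrayList<>();
    for (int b = 0; b < numBatches; b++) {
      populate(shared, b);
      sums.add(Arrays.stream(shared).sum()); // safe: read before the next populate
      retained.add(shared);                  // unsafe: every entry aliases one buffer
      copies.add(shared.clone());            // safe: the client keeps its own copy
    }
    return sums;
  }

  public static void main(String[] args) {
    List<int[]> retained = new ArrayList<>();
    List<int[]> copies = new ArrayList<>();
    System.out.println("sums = " + consume(4, 3, retained, copies)); // [6, 22, 38]
    System.out.println("retained[0] = " + Arrays.toString(retained.get(0))); // [8, 9, 10, 11]
    System.out.println("copies[0] = " + Arrays.toString(copies.get(0)));     // [0, 1, 2, 3]
  }
}
```

Only one buffer is ever allocated here, which is the memory saving the flag is after; the price is that batch lifetimes are now the client's responsibility.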

github-actions · 2021-08-24T12:09:56Z

https://issues.apache.org/jira/browse/ARROW-13733

emkornfield · 2021-09-02T04:18:35Z

java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowVectorIterator.java

  public boolean hasNext() {
-    return nextBatch != null;
+    try {
+      return !resultSet.isAfterLast();


Is this guaranteed to be implemented by most JDBC providers?

I think so.
isAfterLast is a public method of the java.sql.ResultSet interface (https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#isAfterLast()), so every compliant implementation is expected to support it.
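For reference, the `java.sql.ResultSet` Javadoc notes that support for `isAfterLast` is optional for result sets of type `TYPE_FORWARD_ONLY`, and that the method returns `false` when the result set contains no rows. A driver-independent alternative is to look ahead by one row and remember whether `next()` succeeded. The sketch below shows that pattern; `RowCursor` is a hypothetical, minimal stand-in for `ResultSet` (only `next()` and a value getter), and `BatchIterator` is an illustrative name, not the adapter's class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

public class LookAheadSketch {

  /** Hypothetical minimal stand-in for java.sql.ResultSet. */
  interface RowCursor {
    boolean next(); // advances; returns false once past the last row
    int getValue(); // value of the current row
  }

  /** Cursor over an in-memory array, for illustration only. */
  static RowCursor cursorOf(int... values) {
    return new RowCursor() {
      private int pos = -1;
      public boolean next() { pos++; return pos < values.length; }
      public int getValue() { return values[pos]; }
    };
  }

  /** Batch iterator whose hasNext() relies only on next(), not isAfterLast(). */
  static class BatchIterator {
    private final RowCursor cursor;
    private final int batchSize;
    private boolean hasRow; // true if the cursor is positioned on an unread row

    BatchIterator(RowCursor cursor, int batchSize) {
      this.cursor = cursor;
      this.batchSize = batchSize;
      this.hasRow = cursor.next(); // look ahead once; also correct for empty input
    }

    boolean hasNext() { return hasRow; }

    List<Integer> next() {
      if (!hasRow) throw new NoSuchElementException();
      List<Integer> batch = new ArrayList<>();
      while (hasRow && batch.size() < batchSize) {
        batch.add(cursor.getValue());
        hasRow = cursor.next();
      }
      return batch;
    }
  }

  public static void main(String[] args) {
    BatchIterator it = new BatchIterator(cursorOf(1, 2, 3, 4, 5), 2);
    while (it.hasNext()) System.out.println(it.next()); // [1, 2] then [3, 4] then [5]
    System.out.println(new BatchIterator(cursorOf(), 2).hasNext()); // false
  }
}
```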

emkornfield · 2021-09-02T04:19:41Z

java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowVectorIterator.java

-    VectorSchemaRoot returned = nextBatch;
    try {
-      load(createVectorSchemaRoot());
+      VectorSchemaRoot ret = config.isReuseVectorSchemaRoot() ? nextBatch : createVectorSchemaRoot();


Does it make sense to factor this out into a method that takes the config, instead of repeating the ternary logic in a few places?

I checked the code and found that the ternary logic is used twice. However, the logic in the two places is the opposite:

In initialize(), a new vector schema root is created if the reuse flag is enabled.

In next(), a new vector schema root is created if the reuse flag is disabled.

So it seems there is no common logic to factor out here?
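To make the two allocation sites concrete, here is a hypothetical sketch of an iterator with a reuse flag. `Root` stands in for `VectorSchemaRoot`, the `reuse` field plays the role of `config.isReuseVectorSchemaRoot()`, and the `allocations` counter is added purely for illustration; none of these names come from the adapter. It shows why the two conditions are complementary: `initialize()` allocates the single shared root only when reuse is enabled, while `next()` allocates a fresh root only when reuse is disabled.

```java
import java.util.NoSuchElementException;

public class ReuseFlagSketch {

  /** Hypothetical stand-in for VectorSchemaRoot. */
  static final class Root {
    final int[] values;
    int rowCount;
    Root(int capacity) { values = new int[capacity]; }
  }

  static final class Iter {
    private final int[] source;
    private final int batchSize;
    private final boolean reuse; // plays the role of config.isReuseVectorSchemaRoot()
    private Root shared;         // only used when reuse is enabled
    private int offset;
    int allocations;             // illustration only: number of roots created

    Iter(int[] source, int batchSize, boolean reuse) {
      this.source = source;
      this.batchSize = batchSize;
      this.reuse = reuse;
      initialize();
    }

    private Root newRoot() { allocations++; return new Root(batchSize); }

    // initialize(): allocate the single shared root only when reuse is ENABLED.
    private void initialize() {
      if (reuse) shared = newRoot();
    }

    boolean hasNext() { return offset < source.length; }

    // next(): allocate a fresh root only when reuse is DISABLED.
    Root next() {
      if (!hasNext()) throw new NoSuchElementException();
      Root ret = reuse ? shared : newRoot();
      int n = Math.min(batchSize, source.length - offset);
      System.arraycopy(source, offset, ret.values, 0, n);
      ret.rowCount = n; // stale slots past rowCount are ignored by readers
      offset += n;
      return ret;
    }
  }

  static int allocationsFor(boolean reuse) {
    Iter it = new Iter(new int[] {1, 2, 3, 4, 5, 6, 7}, 3, reuse);
    while (it.hasNext()) it.next();
    return it.allocations;
  }

  public static void main(String[] args) {
    System.out.println("reuse=true  -> " + allocationsFor(true) + " root(s)");  // 1
    System.out.println("reuse=false -> " + allocationsFor(false) + " root(s)"); // 3
  }
}
```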

emkornfield · 2021-09-02T04:22:24Z

java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/h2/JdbcToArrowTest.java

    final int targetRows = 600000;
    ResultSet rs = new FakeResultSet(targetRows);
-    try (ArrowVectorIterator iter = JdbcToArrow.sqlToArrowVectorIterator(rs, allocator)) {
+    JdbcToArrowConfig config = new JdbcToArrowConfigBuilder(allocator, JdbcToArrowUtils.getUtcCalendar(), false)


Please add a parameter doc for the new false literal.

Sounds good. Parameter doc added for this line of code, and also added for some other places.
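One common Java convention for documenting a bare literal at a call site is an inline comment naming the parameter it binds to. The sketch below uses a hypothetical `describe` method (not part of Arrow) whose trailing boolean would otherwise be opaque to a reader; the parameter name `includeMetadata` is chosen for illustration.

```java
import java.util.Calendar;
import java.util.TimeZone;

public class ParamDocSketch {

  // Hypothetical factory whose trailing boolean is unreadable at call sites.
  static String describe(String name, Calendar calendar, boolean includeMetadata) {
    return name + "@" + calendar.getTimeZone().getID() + (includeMetadata ? "+metadata" : "");
  }

  public static void main(String[] args) {
    Calendar utc = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
    // The inline comment names the parameter that the literal is passed to.
    System.out.println(describe("config", utc, /* includeMetadata= */ false)); // config@UTC
  }
}
```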

emkornfield

Should there be a test that asserts the VectorSchemaRoot is actually reused when the value is set to true?

liyafan82 · 2021-09-06T04:25:13Z

Should there be a test that asserts the VectorSchemaRoot is actually reused when the value is set to true?

Good suggestion. I've added JdbcToArrowVectorIteratorTest#testVectorSchemaRootReuse for this.
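The contents of `testVectorSchemaRootReuse` are not shown here, so the following is only a sketch of the kind of assertion such a test might make: count the distinct object identities the iterator returns, comparing by reference rather than `equals()`. With reuse enabled there should be exactly one instance; with it disabled, one per batch. The batch source (`batches`, using `StringBuilder` as a mutable holder) is hypothetical.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Set;

public class ReuseTestSketch {

  // Counts distinct object identities returned by an iterator (equals() is ignored).
  static <T> int distinctInstances(Iterator<T> it) {
    Set<T> seen = Collections.newSetFromMap(new IdentityHashMap<>());
    while (it.hasNext()) seen.add(it.next());
    return seen.size();
  }

  // Hypothetical batch source: yields `count` batches, reusing one holder if `reuse` is set.
  static Iterator<StringBuilder> batches(int count, boolean reuse) {
    StringBuilder shared = new StringBuilder();
    return new Iterator<StringBuilder>() {
      private int produced = 0;

      @Override
      public boolean hasNext() { return produced < count; }

      @Override
      public StringBuilder next() {
        if (!hasNext()) throw new NoSuchElementException();
        StringBuilder b = reuse ? shared : new StringBuilder();
        b.setLength(0); // clear previous contents before repopulating
        b.append("batch-").append(produced++);
        return b;
      }
    };
  }

  public static void main(String[] args) {
    System.out.println(distinctInstances(batches(5, true)));  // 1
    System.out.println(distinctInstances(batches(5, false))); // 5
  }
}
```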

emkornfield · 2021-09-12T21:15:43Z

+1 thank you.

Closes apache#10983 from liyafan82/fly_0824_jd
Authored-by: liyafan82 <fan_li_ya@foxmail.com>
Signed-off-by: Micah Kornfield <emkornfield@gmail.com>

ARROW-13733 [Java]: Allow JDBC adapters to reuse vector schema roots

bdaaf51

github-actions bot added the Component: Java label Aug 24, 2021

liyafan82 requested a review from emkornfield August 31, 2021 07:01

emkornfield reviewed Sep 2, 2021

emkornfield requested changes Sep 2, 2021

ARROW-13733 [Java]: Resolve comments

ac5e859

emkornfield closed this in e8ab3ae Sep 12, 2021

asfimport mentioned this pull request Sep 12, 2021

[Java] Allow JDBC adapters to reuse vector schema roots #29366

Closed
