-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-13733 [Java]: Allow JDBC adapters to reuse vector schema roots #10983
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| public boolean hasNext() { | ||
| return nextBatch != null; | ||
| try { | ||
| return !resultSet.isAfterLast(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this guaranteed to be implemented by most JDBC providers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so.
isAfterLast is a public API of interface java.sql.ResultSet (https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#isAfterLast()), so it is supposed to be supported by each legitimate implementation.
| VectorSchemaRoot returned = nextBatch; | ||
| try { | ||
| load(createVectorSchemaRoot()); | ||
| VectorSchemaRoot ret = config.isReuseVectorSchemaRoot() ? nextBatch : createVectorSchemaRoot(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it make sense to factor this out to a method that takes config? instead of repeating ternary logic in a few places?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the code, and find the ternary logic is used twice. However, the logic in the two places are the opposite:
- In
initialize(), a new vector schema root is created, if the resue flag is enabled. - In
next(), a new vector schema root is created, if the reuse flag is diabled.
So there is no common logic here?
| final int targetRows = 600000; | ||
| ResultSet rs = new FakeResultSet(targetRows); | ||
| try (ArrowVectorIterator iter = JdbcToArrow.sqlToArrowVectorIterator(rs, allocator)) { | ||
| JdbcToArrowConfig config = new JdbcToArrowConfigBuilder(allocator, JdbcToArrowUtils.getUtcCalendar(), false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add parameter doc for the new false literaal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Parameter doc added for this line of code, and also added for some other places.
emkornfield
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be a test that asserts the VectorSchemaRoot is actually reused when the value set is true?
Good suggestion. I've added |
|
+1 thank you. |
According to the current design of the JDBC adapter, it is not possible to reuse the vector schema roots. That is, a new vector schema root is created and released for each batch. This can cause performance problems, because in many scenarios, the client code only reads data in vector schema root. So the vector schema roots can be reused in the following cycle: populate data -> client use data -> populate data -> ... The current design has another problem. For most times, it has two alternating vector schema roots in memory, causing a large waste of memory, especially for large batches. We solve both problems by providing a flag in the config, which allows the user to reuse the vector shema roots. Closes apache#10983 from liyafan82/fly_0824_jd Authored-by: liyafan82 <fan_li_ya@foxmail.com> Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
According to the current design of the JDBC adapter, it is not possible to reuse the vector schema roots. That is, a new vector schema root is created and released for each batch. This can cause performance problems, because in many scenarios, the client code only reads data in vector schema root. So the vector schema roots can be reused in the following cycle: populate data -> client use data -> populate data -> ... The current design has another problem. For most times, it has two alternating vector schema roots in memory, causing a large waste of memory, especially for large batches. We solve both problems by providing a flag in the config, which allows the user to reuse the vector shema roots. Closes apache#10983 from liyafan82/fly_0824_jd Authored-by: liyafan82 <fan_li_ya@foxmail.com> Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
According to the current design of the JDBC adapter, it is not possible to reuse the vector schema roots. That is, a new vector schema root is created and released for each batch.
This can cause performance problems, because in many scenarios, the client code only reads data in vector schema root. So the vector schema roots can be reused in the following cycle: populate data -> client use data -> populate data -> ...
The current design has another problem. For most times, it has two alternating vector schema roots in memory, causing a large waste of memory, especially for large batches.
We solve both problems by providing a flag in the config, which allows the user to reuse the vector shema roots.