ARROW-6601: [Java] Improve JDBC adapter performance & add benchmark #5472
tianchen92 wants to merge 1 commit into apache:master
cc @emkornfield
Codecov Report
@@            Coverage Diff             @@
##           master    #5472      +/-   ##
==========================================
+ Coverage   88.64%   89.69%   +1.05%
==========================================
  Files         958      708     -250
  Lines      127522   108173   -19349
  Branches     1498        0    -1498
==========================================
- Hits       113039    97029   -16010
+ Misses      14118    11144    -2974
+ Partials      365        0     -365
   */
  public BinaryConsumer(VarBinaryVector vector, int index) {
    if (vector != null) {
      vector.allocateNewSafe();
Can you clarify why this is necessary?
Since we removed JdbcToArrowUtils#allocateVectors, the vector is initialized with capacity 0, so invoking vector.getOffsetBuffer.get(index * 4) throws an exception.
The vector passed in may also be null when it comes from the iterator API, so I added a null check as well.
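To make the reply above concrete, here is a minimal sketch of why an offset read fails on a never-allocated vector and how the guarded allocation in the constructor avoids it. It is a toy model built on `java.nio.ByteBuffer`, not Arrow code: `ToyVarBinaryVector`, `ToyBinaryConsumer`, `allocate`, and the initial size of 16 are all hypothetical names and values chosen for illustration.

```java
import java.nio.ByteBuffer;

/** Toy stand-in for a variable-width vector; NOT Arrow's VarBinaryVector. */
class ToyVarBinaryVector {
  static final int OFFSET_WIDTH = 4; // bytes per offset entry, as in "index * 4"

  // Starts empty, mirroring a vector that nobody has allocated yet.
  private ByteBuffer offsetBuffer = ByteBuffer.allocate(0);

  /** Hypothetical analogue of an allocate call: reserve room for offsets. */
  void allocate(int valueCount) {
    offsetBuffer = ByteBuffer.allocate((valueCount + 1) * OFFSET_WIDTH);
  }

  /** Reads the start offset of element `index`; throws if the buffer is too small. */
  int startOffset(int index) {
    return offsetBuffer.getInt(index * OFFSET_WIDTH);
  }
}

/** Toy consumer that follows the constructor pattern under review. */
class ToyBinaryConsumer {
  private final ToyVarBinaryVector vector;

  ToyBinaryConsumer(ToyVarBinaryVector vector) {
    if (vector != null) {   // the vector may be absent, so guard the call
      vector.allocate(16);  // give offset reads some backing memory
    }
    this.vector = vector;
  }

  int startOffset(int index) {
    return vector.startOffset(index);
  }
}

class OffsetReadDemo {
  /** Returns true if reading offset 0 from a never-allocated vector throws. */
  static boolean unallocatedReadFails() {
    try {
      new ToyVarBinaryVector().startOffset(0);
      return false;
    } catch (IndexOutOfBoundsException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("unallocated read fails: " + unallocatedReadFails());
    ToyBinaryConsumer consumer = new ToyBinaryConsumer(new ToyVarBinaryVector());
    System.out.println("offset after allocation: " + consumer.startOffset(0));
    new ToyBinaryConsumer(null); // constructing with a null vector must not throw
  }
}
```

The toy only models the offset buffer; the real vector also has data and validity buffers, which this sketch leaves out.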
Even with setSafe, it is surprising that preallocating to the desired size doesn't help; that is an interesting result.
+1, thank you.
Related to [ARROW-6601](https://issues.apache.org/jira/browse/ARROW-6601).
Add a performance test as well to get a baseline number, so that performance regressions are caught when related code changes.
While working on the JDBC adapter benchmark, I found the JMH result was very poor (about 1680000 ns/op). The cause turned out to be that initializing a VectorSchemaRoot invokes JdbcToArrowUtils#allocateVectors, which is time-consuming and unnecessary because the consumers use the setSafe API. After removing it, the JMH result is about 2000 ns/op (3 columns with valueCount = 3000).
I think this should be merged into the 0.15 release.
Closes apache#5472 from tianchen92/ARROW-6601 and squashes the following commits:
fa97680 <tianchen> Improve JDBC adapter performance & add benchmark
Authored-by: tianchen <niki.lj@alibaba-inc.com>
Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
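The argument that up-front allocation is unnecessary when writers use setSafe can be illustrated with a small sketch. This is a toy model in plain Java, not Arrow's implementation: `ToyIntVector`, its doubling growth policy, and `reallocCount` are assumptions made for illustration, and the sketch does not attempt to reproduce the ns/op figures quoted above.

```java
import java.util.Arrays;

/** Toy growable int column; NOT Arrow's IntVector. */
class ToyIntVector {
  private int[] data = new int[0]; // capacity 0, like an unallocated vector
  private int reallocCount = 0;

  /** Unchecked write: the caller must have ensured enough capacity. */
  void set(int index, int value) {
    data[index] = value;
  }

  /** Checked write: doubles the backing array until `index` fits. */
  void setSafe(int index, int value) {
    while (index >= data.length) {
      data = Arrays.copyOf(data, Math.max(1, data.length * 2));
      reallocCount++;
    }
    data[index] = value;
  }

  int get(int index) { return data[index]; }
  int capacity() { return data.length; }
  int reallocCount() { return reallocCount; }
}

class SetSafeDemo {
  /** Fills a fresh, never-allocated vector using only checked writes. */
  static ToyIntVector fill(int valueCount) {
    ToyIntVector v = new ToyIntVector();
    for (int i = 0; i < valueCount; i++) {
      v.setSafe(i, i * 10);
    }
    return v;
  }

  public static void main(String[] args) {
    ToyIntVector v = fill(3000);
    System.out.println("capacity: " + v.capacity());
    System.out.println("reallocations: " + v.reallocCount());
    System.out.println("last value: " + v.get(2999));
  }
}
```

In this toy, the checked write path grows the storage a logarithmic number of times, so correctness never depends on a separate allocation pass. Whether skipping that pass is also faster in the real adapter is what the benchmark in this PR measures.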