Skip to content

ARROW-6601: [Java] Improve JDBC adapter performance & add benchmark#5472

Closed
tianchen92 wants to merge 1 commit intoapache:masterfrom
tianchen92:ARROW-6601
Closed

ARROW-6601: [Java] Improve JDBC adapter performance & add benchmark#5472
tianchen92 wants to merge 1 commit intoapache:masterfrom
tianchen92:ARROW-6601

Conversation

@tianchen92
Copy link
Copy Markdown
Contributor

Related to ARROW-6601.

Add a performance test as well to get a baseline number, to avoid performance regression when we change related code.

When working with Jdbc adapter benchmark, I found the jmh result is very worse (about 1680000 ns/op), and I finally found that when we initialize a VectorSchemaRoot, JdbcToArrowUtils#allocateVectors is invoked which is time consuming, and this is not necessary since we use setSafe API in consumers. After remove this, the jmh result is about 2000 ns/op (3 coulumns with valueCount = 3000).

I think this one should merged into 0.15 release.

@tianchen92
Copy link
Copy Markdown
Contributor Author

cc @emkornfield

@codecov-io
Copy link
Copy Markdown

Codecov Report

Merging #5472 into master will increase coverage by 1.05%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5472      +/-   ##
==========================================
+ Coverage   88.64%   89.69%   +1.05%     
==========================================
  Files         958      708     -250     
  Lines      127522   108173   -19349     
  Branches     1498        0    -1498     
==========================================
- Hits       113039    97029   -16010     
+ Misses      14118    11144    -2974     
+ Partials      365        0     -365
Impacted Files Coverage Δ
cpp/src/plasma/thirdparty/ae/ae.c 70.75% <0%> (-0.95%) ⬇️
r/src/recordbatch.cpp
go/arrow/math/uint64_amd64.go
r/R/filesystem.R
r/R/list.R
r/src/symbols.cpp
r/src/array_to_vector.cpp
go/arrow/ipc/file_reader.go
js/src/builder/index.ts
r/src/arrow_types.h
... and 242 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fd8f628...fa97680. Read the comment docs.

*/
public BinaryConsumer(VarBinaryVector vector, int index) {
if (vector != null) {
vector.allocateNewSafe();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you clarify why this is necessary?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we remove JdbcToArrowUtils#allocateVectors, the vector is initialized with capacity 0, and when we invoke vector.getOffsetBuffer.get(index *4) it will throw Exception.
And the vector passed in may be null from iterator API, so add a null check also.

@emkornfield
Copy link
Copy Markdown
Contributor

even using setSafe it is surprising that preallocating to the desired size doesn't help, that is an interesting result.

@emkornfield
Copy link
Copy Markdown
Contributor

+1, thank you.

pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
Related to [ARROW-6601](https://issues.apache.org/jira/browse/ARROW-6601).

Add a performance test as well to get a baseline number, to avoid performance regression when we change related code.

When working with Jdbc adapter benchmark, I found the jmh result is very worse (about 1680000 ns/op), and I finally found that when we initialize a VectorSchemaRoot,  JdbcToArrowUtils#allocateVectors is invoked which is time consuming, and this is not necessary since we use setSafe API in consumers. After remove this, the jmh result is about 2000 ns/op (3 coulumns with valueCount = 3000).

I think this one should merged into 0.15 release.

Closes apache#5472 from tianchen92/ARROW-6601 and squashes the following commits:

fa97680 <tianchen>  Improve JDBC adapter performance & add benchmark

Authored-by: tianchen <niki.lj@alibaba-inc.com>
Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants