Appender support for Apache Arrow or columnar data #3412

mharmer · 2022-04-11T22:16:35Z

I noticed there is a C API for fetching query results in Arrow format (added in #1978), but I don't currently see any support in the C API or in the C or C++ appenders for bulk inserting Arrow/columnar data into the database.

Currently I have several billion rows of data that I would like to bulk insert. The data is already in columnar form in memory, and the only interfaces I'm aware of are the row-wise appenders. Inserting into a single table runs at around 50,000 rows per second (on older hardware); I'm assuming the translation back and forth between columnar and row-wise form is likely a bottleneck.

It doesn't appear that the duckdb_data_chunk has support for this either.

Mytherin · 2022-04-12T06:27:10Z

You should be able to use duckdb_append_data_chunk to do batch/vectorized appends, which should indeed be much more efficient than the scalar functions.

mharmer · 2022-04-12T15:18:23Z

My mistake. The confusion came from looking at the header, which uses the opaque type, and I wasn't sure how to use it.

For future reference to myself and others:

It appears the implementation of duckdb_create_data_chunk actually returns a DataChunk*.
The documentation for DataChunk isn't entirely clear that data can be written to it; the documented free functions all appear to be for reading data (with the exception of duckdb_vector_assign_string_element).
Presumably the methods on the DataChunk object itself can be used to write the various data types.
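
A rough sketch of how the batched append could look from C, in case it helps anyone landing here. The portable part just walks an in-memory int64 column in fixed-size batches and hands each batch to a callback. The names append_column_in_batches, batch_sink, count_rows_sink and duckdb_bigint_sink are made up for illustration and are not part of DuckDB. The section guarded by USE_DUCKDB (also an illustrative macro) shows how one batch might be copied into a data chunk and passed to duckdb_append_data_chunk. It assumes a table with a single BIGINT column and a batch size no larger than duckdb_vector_size(), and I have not benchmarked it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback type: receives one contiguous batch of a column.
   Returns 0 on success, non-zero on failure. */
typedef int (*batch_sink)(const int64_t *batch, size_t count, void *user);

/* Walk `rows` values of a column in batches of at most `batch_rows`.
   Returns the number of batches delivered, or -1 on error. */
long append_column_in_batches(const int64_t *column, size_t rows,
                              size_t batch_rows, batch_sink sink, void *user) {
    assert(column != NULL || rows == 0);
    if (batch_rows == 0 || sink == NULL) {
        return -1;
    }
    long batches = 0;
    for (size_t offset = 0; offset < rows; offset += batch_rows) {
        size_t remaining = rows - offset;
        size_t count = remaining < batch_rows ? remaining : batch_rows;
        if (sink(column + offset, count, user) != 0) {
            return -1;
        }
        batches++;
    }
    return batches;
}

/* Trivial sink for trying the loop without a database:
   `user` points at a size_t that accumulates the row count. */
int count_rows_sink(const int64_t *batch, size_t count, void *user) {
    (void)batch;
    *(size_t *)user += count;
    return 0;
}

#ifdef USE_DUCKDB
#include "duckdb.h"

/* Sink that appends one batch as a single-column BIGINT data chunk.
   Assumption: `user` is a duckdb_appender created for a table with exactly
   one BIGINT column, and count <= duckdb_vector_size(). */
int duckdb_bigint_sink(const int64_t *batch, size_t count, void *user) {
    duckdb_appender appender = (duckdb_appender)user;
    duckdb_logical_type type = duckdb_create_logical_type(DUCKDB_TYPE_BIGINT);
    duckdb_data_chunk chunk = duckdb_create_data_chunk(&type, 1);
    duckdb_vector vec = duckdb_data_chunk_get_vector(chunk, 0);

    /* The vector's data pointer is writable; copy the batch straight in. */
    memcpy(duckdb_vector_get_data(vec), batch, count * sizeof(int64_t));
    duckdb_data_chunk_set_size(chunk, count);

    duckdb_state state = duckdb_append_data_chunk(appender, chunk);
    duckdb_destroy_data_chunk(&chunk);
    duckdb_destroy_logical_type(&type);
    return state == DuckDBSuccess ? 0 : 1;
}
#endif
```

Without a database, calling append_column_in_batches(col, 5000, 2048, count_rows_sink, &total) delivers three batches (2048, 2048 and 904 rows). With DuckDB linked in, the same call would take duckdb_bigint_sink and an appender from duckdb_appender_create instead. Creating the chunk once and reusing it across batches would probably be cheaper than allocating one per batch as this sketch does.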

mharmer closed this as completed Apr 12, 2022

mharmer mentioned this issue Apr 12, 2022

Insert support with data chunk or Arrow duckdb/duckdb-rs#46
