feat(javascript): add support for streaming ingest #4125
There was a problem hiding this comment.
Is a brand-new IPC reader for every message going to handle things like dictionaries properly? (I guess it would merely be very redundant, we'd send the dictionaries on every message?)
In theory an Arrow implementation could expose the underlying IPC message so you can turn a stream of data into a stream of messages (instead of a stream of serialized files/streams), but I'm not sure if arrow-js + arrow-rs can do this
That said, as long as it works, we can always improve later...
There was a problem hiding this comment.
Yeah I explored that and arrow-js currently just doesn't expose a clean API to do it. It would add a chunk of complexity for minimal benefit currently.
At this stage we're primarily prioritizing functional completeness; after that we'll continue to improve with benchmarking and performance optimization.
There was a problem hiding this comment.
SGTM. Do you think it's worth filing an issue in arrow-js for down the line?
Adds streaming ingest support to the Node.js ADBC driver manager.
- `AdbcConnection`: `ingestStream(tableName, reader, options?)` streams a `RecordBatchReader` into a table, handling the full ingest lifecycle
- `AdbcStatement`: `bindStream(reader)` streams a `RecordBatchReader` as bound parameters and leaves execution to the caller

Implementation:
`ingestStream` uses a Rust `mpsc` channel to bridge the JS push loop and the ADBC `execute_update` call running on the thread pool. JS serializes each batch to Arrow IPC and sends it to the `ChannelBatchReader` on the Rust side, which deserializes it. The channel is unbounded to avoid blocking the JS event loop.

Test Plan
Closes #4117
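
For reviewers who want to picture the channel bridge described under Implementation, here is a minimal, std-only sketch of the pattern. It is not the code in this PR. The `ChannelBatchReader` name is borrowed from the description, but its body here is hypothetical, `IpcBytes`, `consume_all`, and `ingest_demo` are made-up names, and Arrow IPC decoding is stubbed out (each payload is "decoded" to its byte length) so the example needs no external crates.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;

/// Stand-in for one serialized Arrow IPC payload (assumption: raw bytes).
type IpcBytes = Vec<u8>;

/// Hypothetical reader that pulls payloads off the channel until the
/// sending side is dropped. Real decoding is replaced by a stub.
struct ChannelBatchReader {
    rx: Receiver<IpcBytes>,
}

impl Iterator for ChannelBatchReader {
    type Item = usize; // stub "decoded batch": just the payload length

    fn next(&mut self) -> Option<usize> {
        // recv() blocks only the consumer thread; it returns Err once
        // every Sender has been dropped, which ends the iteration.
        self.rx.recv().ok().map(|bytes| bytes.len())
    }
}

/// Stand-in for the blocking `execute_update` call: drains the reader
/// and reports how much it consumed (total bytes, in this sketch).
fn consume_all(reader: ChannelBatchReader) -> usize {
    reader.sum()
}

/// Pushes each payload into an unbounded channel while a worker thread
/// drains it, then returns the worker's total.
fn ingest_demo(payloads: Vec<IpcBytes>) -> usize {
    let (tx, rx): (Sender<IpcBytes>, Receiver<IpcBytes>) = mpsc::channel();
    let worker = thread::spawn(move || consume_all(ChannelBatchReader { rx }));
    for payload in payloads {
        // send() on std's unbounded channel never blocks the producer.
        tx.send(payload).expect("worker hung up early");
    }
    drop(tx); // closing the channel signals end-of-stream to the reader
    worker.join().expect("worker panicked")
}
```

Calling `ingest_demo(vec![vec![0u8; 3], vec![1u8; 5]])` sends two payloads through the channel and returns their combined length. One trade-off of an unbounded channel in general is that it applies no backpressure: if the producer outpaces the consumer, queued payloads accumulate in memory.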