GH-18547: [Java] Support re-emitting dictionaries in ArrowStreamWriter (#35920)

### Rationale for this change

This allows writing IPC streams where dictionary values change between record batches.

### What changes are included in this PR?

* Add a new abstract method `void ensureDictionariesWritten(DictionaryProvider provider, Set<Long> dictionaryIdsUsed)` to the base `ArrowWriter` class
* Move the existing logic, which writes dictionaries only once, into the `ArrowFileWriter` class
* Implement replacement dictionary writing in `ArrowStreamWriter` by keeping copies of previously written dictionaries

### Are these changes tested?

Yes, I've added a new unit test for this.

### Are there any user-facing changes?

Yes, `ArrowStreamWriter` will now write replacement dictionaries when dictionary values change between batches.

**This PR includes breaking changes to public APIs.** `ArrowWriter` has a new abstract `ensureDictionariesWritten` method. This only affects users who inherit directly from `ArrowWriter` rather than from `ArrowFileWriter` or `ArrowStreamWriter`. There is also a behaviour change to `ArrowWriter`: previously, dictionaries were read from the `DictionaryProvider` on construction; this is now delayed until the first batch is written.

* Closes: #18547

Authored-by: Adam Reeve <adreeve@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
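To make the difference between the two strategies concrete, here is a minimal, library-free sketch. It is not the Arrow implementation and uses no Arrow APIs: the class names `SketchWriter`, `FileStyleWriter` and `StreamStyleWriter`, the helper `writeDictionaryBatch`, and the simplified `ensureDictionariesWritten` signature are all hypothetical. Dictionaries are modeled as `String[]` values keyed by id, `Arrays.equals` stands in for whatever vector comparison the real writer uses (the commit message does not say), and "writing" a dictionary batch just records its id. Unlike the real `ArrowWriter`, which receives its `DictionaryProvider` at construction, this sketch passes the provider to each `writeBatch` call; either way, the dictionaries are only read when a batch is written, mirroring the lazy behaviour described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Library-free model of the two dictionary-writing strategies. All names here are
// hypothetical; dictionaries are String[] values keyed by id, and "writing" a
// dictionary batch only records its id.
public class DictionaryRewriteSketch {

  // Stand-in for a base writer with an abstract hook (not Arrow's ArrowWriter).
  abstract static class SketchWriter {
    final List<Long> writtenDictionaryIds = new ArrayList<>();

    // Invoked before every batch, so dictionaries are consulted lazily at write time.
    abstract void ensureDictionariesWritten(Map<Long, String[]> provider, Set<Long> idsUsed);

    void writeBatch(Map<Long, String[]> provider, Set<Long> idsUsed) {
      ensureDictionariesWritten(provider, idsUsed);
      // A real writer would serialize the record batch here.
    }

    void writeDictionaryBatch(long id) {
      writtenDictionaryIds.add(id);
    }
  }

  // File-style strategy: emit every used dictionary once, on the first batch only.
  static class FileStyleWriter extends SketchWriter {
    private boolean dictionariesWritten = false;

    @Override
    void ensureDictionariesWritten(Map<Long, String[]> provider, Set<Long> idsUsed) {
      if (dictionariesWritten) {
        return; // later changes are ignored in this sketch
      }
      dictionariesWritten = true;
      for (long id : idsUsed) {
        writeDictionaryBatch(id);
      }
    }
  }

  // Stream-style strategy: remember a copy of each written dictionary and
  // re-emit it whenever its current values differ from that copy.
  static class StreamStyleWriter extends SketchWriter {
    private final Map<Long, String[]> previous = new HashMap<>();

    @Override
    void ensureDictionariesWritten(Map<Long, String[]> provider, Set<Long> idsUsed) {
      for (long id : idsUsed) {
        String[] current = provider.get(id);
        String[] last = previous.get(id);
        if (last != null && Arrays.equals(last, current)) {
          continue; // unchanged since it was last written
        }
        writeDictionaryBatch(id);
        // Store a copy so in-place mutation of the caller's array is still detected.
        previous.put(id, current.clone());
      }
    }
  }

  public static void main(String[] args) {
    String[] dict = {"a", "b"};
    Map<Long, String[]> provider = new HashMap<>();
    provider.put(1L, dict);
    Set<Long> ids = Set.of(1L);

    SketchWriter stream = new StreamStyleWriter();
    SketchWriter file = new FileStyleWriter();
    for (int batch = 0; batch < 3; batch++) {
      if (batch == 2) {
        dict[1] = "c"; // dictionary values change before the third batch
      }
      stream.writeBatch(provider, ids);
      file.writeBatch(provider, ids);
    }
    System.out.println("stream: " + stream.writtenDictionaryIds); // stream: [1, 1]
    System.out.println("file: " + file.writtenDictionaryIds);     // file: [1]
  }
}
```

The stream-style writer stores a copy rather than a reference because callers typically mutate the same dictionary object in place between batches; comparing against a reference to that same object would never detect a change. The file-style writer in this sketch silently ignores later changes. Whether the real `ArrowFileWriter` does anything beyond writing dictionaries once (for example, rejecting changed dictionaries) is not stated in the commit message.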
Showing 5 changed files with 179 additions and 41 deletions.