Don't use ChainedVector as DictEncoding data array unless necessary (#110)

Fixes #109. When reading arrow record batches with dict encoded columns, we eagerly used `ChainedVector` for the underlying array backing the `DictEncoding`, in case subsequent record batches added elements to the dict encoding. That is too eager: it is probably common, as with "feather" files, for the dict encoding values to be fully known and provided in the first record batch. In fact, several language implementations don't even support these kinds of "delta" dict updates in subsequent record batches. This PR therefore uses a regular array to back the dict encoding for the first record batch, and only promotes it to a `ChainedVector` if a delta update actually arrives.
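
The promote-on-demand idea described above can be sketched roughly as follows. This is an illustration in Python, not the code changed by this commit: `ChainedList`, `DictPool`, `apply_delta`, and `decode` are hypothetical names standing in for a chained vector and a dict encoding, and the real types may behave differently.

```python
class ChainedList:
    """Hypothetical stand-in for a chained vector: several chunks read as one sequence."""

    def __init__(self, chunks):
        self.chunks = [list(c) for c in chunks]

    def append_chunk(self, chunk):
        self.chunks.append(list(chunk))

    def __len__(self):
        return sum(len(c) for c in self.chunks)

    def __getitem__(self, i):
        if i < 0:
            raise IndexError("negative indices are not supported in this sketch")
        # Walk the chunks, shifting the index until it lands inside one.
        for chunk in self.chunks:
            if i < len(chunk):
                return chunk[i]
            i -= len(chunk)
        raise IndexError("index out of range")


class DictPool:
    """Hypothetical dict encoding: a pool of values addressed by integer codes."""

    def __init__(self, values):
        # First record batch: back the pool with a plain list.
        self.data = list(values)

    def apply_delta(self, new_values):
        # Promote to the chained structure only when a delta update arrives.
        if not isinstance(self.data, ChainedList):
            self.data = ChainedList([self.data])
        self.data.append_chunk(new_values)

    def decode(self, codes):
        return [self.data[c] for c in codes]


pool = DictPool(["a", "b"])
first = pool.decode([1, 0, 1])   # served from the plain list
pool.apply_delta(["c"])          # a delta update triggers promotion
second = pool.decode([2, 0])     # served from the chained structure
```

In this sketch, a reader that never sees a delta update keeps the plain list and never pays for the chunk walk in `__getitem__`.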
Showing 1 changed file with 7 additions and 2 deletions.