
Generalize StringViewArrayBuilder / StringArrayBuilder, use more broadly #21539

@neilconway


Is your feature request related to a problem or challenge?

We have `StringViewArrayBuilder` and `StringArrayBuilder`, which are optimized versions of the corresponding string builders in Arrow. However, our versions are only used in two places. We use the Arrow builders much more often, partly because ours have a very narrow API: you can only pass a `ColumnValueRef`. That means we can't use our builders when the caller has transformed the column's value before appending it. This is common, and happens roughly in all of these places:

| File | Lines | Code path |
|------|-------|-----------|
| `string/common.rs` (`case_conversion` Utf8View path) | 359-372 | `to_upper`/`to_lower` for Utf8View |
| `unicode/initcap.rs` | 166-172, 238-244 | Non-ASCII `initcap` for Utf8/LargeUtf8 and Utf8View |
| `unicode/reverse.rs` | 135-153 | `reverse` for all string types |
| `unicode/translate.rs` | 225-263, 319-357 | `translate`, both the all-array and scalar-optimized paths |
| `unicode/substrindex.rs` | 183-241 | `substr_index` all-array path |
| `string/replace.rs` | 166-181, 194-209 | `replace` for Utf8View and generic string |
| `unicode/lpad.rs` | 237, 294, 454, 510 | Various `lpad` code paths |
| `unicode/rpad.rs` | 238, 296, 454, 511 | Various `rpad` code paths |
| `datetime/to_char.rs` | 209-210, 252-253 | `to_char` scalar and array format paths |
| `string/repeat.rs` | 329 | `repeat` for generic string |
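Most of these call sites share one shape. They compute a transformed `String` for each row, then push it (or a null) into a builder one row at a time. The sketch below models that shape with a hypothetical, std-only `NaiveStringBuilder` loosely patterned on `append_option`-style builders. It is not Arrow's implementation; it only illustrates where the per-row temporary allocation and the per-row validity bookkeeping happen.

```rust
/// Hypothetical stand-in for an Arrow-style string builder:
/// offsets + concatenated UTF-8 bytes + one validity flag per row.
struct NaiveStringBuilder {
    offsets: Vec<i32>,
    values: Vec<u8>,
    validity: Vec<bool>,
}

impl NaiveStringBuilder {
    fn new() -> Self {
        Self { offsets: vec![0], values: Vec::new(), validity: Vec::new() }
    }

    /// Per-row append: the validity bit is decided here, one row at a time.
    fn append_option(&mut self, v: Option<&str>) {
        match v {
            Some(s) => {
                self.values.extend_from_slice(s.as_bytes());
                self.validity.push(true);
            }
            None => self.validity.push(false),
        }
        // Every row, null or not, gets an end offset.
        self.offsets.push(self.values.len() as i32);
    }

    fn finish(self) -> (Vec<i32>, Vec<u8>, Vec<bool>) {
        (self.offsets, self.values, self.validity)
    }
}

/// `reverse`-style kernel: allocates a temporary String for every non-null row,
/// then copies it into the builder.
fn reverse_per_row(input: &[Option<&str>]) -> NaiveStringBuilder {
    let mut builder = NaiveStringBuilder::new();
    for v in input.iter().copied() {
        let reversed: Option<String> = v.map(|s| s.chars().rev().collect());
        builder.append_option(reversed.as_deref());
    }
    builder
}
```

For example, `reverse_per_row(&[Some("abc"), None, Some("héllo")]).finish()` produces offsets `[0, 3, 3, 9]`, the bytes of `"cbaolléh"`, and validity `[true, false, true]`.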

If we extended the API of our builders, we could use them in all or most of these places, which should yield a nice performance win (e.g., because the NULL computation would be done in bulk rather than per row). We'd need to add something like `append_value(&str)`, `write_str()` / `write_char()` / `finish_value()`, and `append_empty()` (a placeholder for NULL rows).
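A minimal sketch of what the extended API could look like, assuming a plain `i32`-offset layout. `StringArrayBuilderSketch` and its internals are hypothetical; only the method names (`append_value`, `write_str`, `write_char`, `finish_value`, `append_empty`) come from the proposal above. Implementing `std::fmt::Write` provides `write_char` for free on top of `write_str`. NULL rows get a zero-length placeholder, and the validity mask is handed over once at `finish` (here a `Vec<bool>` standing in for something like the input column's null buffer) instead of being tracked row by row.

```rust
use std::fmt::{self, Write};

/// Hypothetical builder sketch: i32 offsets + concatenated UTF-8 bytes.
/// Validity is NOT tracked per row; it is supplied in bulk at `finish`.
struct StringArrayBuilderSketch {
    offsets: Vec<i32>,
    values: Vec<u8>,
}

impl StringArrayBuilderSketch {
    fn with_capacity(rows: usize, bytes: usize) -> Self {
        let mut offsets = Vec::with_capacity(rows + 1);
        offsets.push(0);
        Self { offsets, values: Vec::with_capacity(bytes) }
    }

    /// Append a complete value as the next row.
    fn append_value(&mut self, s: &str) {
        self.values.extend_from_slice(s.as_bytes());
        self.finish_value();
    }

    /// Close the row whose bytes were written via `write_str` / `write_char`.
    fn finish_value(&mut self) {
        self.offsets.push(self.values.len() as i32);
    }

    /// Zero-length placeholder for a NULL row.
    fn append_empty(&mut self) {
        self.finish_value();
    }

    /// Consume the builder, attaching a validity mask computed for the whole batch.
    fn finish(self, nulls: Option<Vec<bool>>) -> (Vec<i32>, Vec<u8>, Option<Vec<bool>>) {
        (self.offsets, self.values, nulls)
    }
}

/// `fmt::Write` supplies `write_char` (and `write_fmt`) on top of `write_str`.
impl Write for StringArrayBuilderSketch {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.values.extend_from_slice(s.as_bytes());
        Ok(())
    }
}

/// `reverse` kernel: writes chars straight into the builder (no per-row String)
/// and derives the output validity from the input in a single separate pass.
fn reverse(input: &[Option<&str>]) -> (Vec<i32>, Vec<u8>, Option<Vec<bool>>) {
    // Reversal preserves byte length, so the input size is an exact capacity hint.
    let bytes: usize = input.iter().flatten().map(|s| s.len()).sum();
    let mut b = StringArrayBuilderSketch::with_capacity(input.len(), bytes);
    for v in input.iter().copied() {
        match v {
            Some(s) => {
                for c in s.chars().rev() {
                    b.write_char(c).unwrap(); // writing into a Vec<u8> cannot fail
                }
                b.finish_value();
            }
            None => b.append_empty(),
        }
    }
    let nulls = input.iter().map(|v| v.is_some()).collect();
    b.finish(Some(nulls))
}
```

With this shape a kernel like `reverse` or `to_upper` can stream characters directly into the output buffer, and a NULL row costs only one offset push.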

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

Metadata
Labels: `enhancement` (New feature or request)