Is your feature request related to a problem or challenge?
We have `StringViewArrayBuilder` and `StringArrayBuilder`, which are optimized versions of the corresponding string builders in Arrow. However, our versions are only used in two places; we use the Arrow versions much more often, partly because our versions have a very narrow API (you can only pass a `ColumnValueRef`). That means we can't use our builders when the caller has transformed the column's value, which is pretty common. Roughly, it happens in all of these places:
| File | Lines | Function and code path |
| --- | --- | --- |
| `string/common.rs` (case_conversion Utf8View path) | 359-372 | `to_upper`/`to_lower` for Utf8View |
| `unicode/initcap.rs` | 166-172, 238-244 | Non-ASCII `initcap` for Utf8/LargeUtf8 and Utf8View |
| `unicode/reverse.rs` | 135-153 | `reverse` for all string types |
| `unicode/translate.rs` | 225-263, 319-357 | `translate` both all-array and scalar-optimized paths |
| `unicode/substrindex.rs` | 183-241 | `substr_index` all-array path |
| `string/replace.rs` | 166-181, 194-209 | `replace` for Utf8View and generic string |
| `unicode/lpad.rs` | 237, 294, 454, 510 | Various `lpad` code paths |
| `unicode/rpad.rs` | 238, 296, 454, 511 | Various `rpad` code paths |
| `datetime/to_char.rs` | 209-210, 252-253 | `to_char` scalar and array format paths |
| `string/repeat.rs` | 329 | `repeat` for generic string |
If we extended the API of our builders, we could use them in all or most of these places, which would yield a nice perf win (e.g., because we'd do the NULL computation in bulk rather than per row). We'd need to add something like `append_value(&str)`, `write_str()` / `write_char()` / `finish_value()`, and `append_empty()` (a placeholder for NULLs).
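To make the proposed surface concrete, here is a minimal sketch of how those methods could fit together. It is not the existing builder code: `SketchStringBuilder`, `SketchStringArray`, and `reverse_all` are hypothetical names, the buffers are plain `String`/`Vec` stand-ins for Arrow buffers, and the validity mask is a `Vec<bool>` rather than a real null buffer. Only the method names come from the list above.

```rust
/// Hypothetical stand-in for a finished string array: one contiguous
/// value buffer, `len + 1` offsets, and a validity mask.
pub struct SketchStringArray {
    values: String,
    offsets: Vec<usize>,
    validity: Vec<bool>,
}

impl SketchStringArray {
    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    pub fn get(&self, i: usize) -> Option<&str> {
        if self.validity[i] {
            Some(&self.values[self.offsets[i]..self.offsets[i + 1]])
        } else {
            None
        }
    }
}

/// Hypothetical builder that tracks no NULLs per row; the caller
/// supplies validity once, in `finish`.
pub struct SketchStringBuilder {
    values: String,
    offsets: Vec<usize>,
}

impl SketchStringBuilder {
    pub fn with_capacity(rows: usize, bytes: usize) -> Self {
        let mut offsets = Vec::with_capacity(rows + 1);
        offsets.push(0);
        Self { values: String::with_capacity(bytes), offsets }
    }

    /// Append a complete, already-computed value as one row.
    pub fn append_value(&mut self, value: &str) {
        self.values.push_str(value);
        self.finish_value();
    }

    /// Append part of the current row without closing it.
    pub fn write_str(&mut self, s: &str) {
        self.values.push_str(s);
    }

    /// Append a single character to the current row.
    pub fn write_char(&mut self, c: char) {
        self.values.push(c);
    }

    /// Close the current row by recording its end offset.
    pub fn finish_value(&mut self) {
        self.offsets.push(self.values.len());
    }

    /// Zero-length placeholder row, intended for NULL slots.
    pub fn append_empty(&mut self) {
        self.finish_value();
    }

    /// Attach the validity mask in one step instead of per row.
    pub fn finish(self, validity: Vec<bool>) -> SketchStringArray {
        assert_eq!(validity.len(), self.offsets.len() - 1);
        SketchStringArray { values: self.values, offsets: self.offsets, validity }
    }
}

/// Example caller: a `reverse`-style kernel that transforms each value,
/// so it could not hand the builder an untouched column value.
pub fn reverse_all(input: &[Option<&str>]) -> SketchStringArray {
    let bytes: usize = input.iter().flatten().map(|s| s.len()).sum();
    let mut builder = SketchStringBuilder::with_capacity(input.len(), bytes);
    for value in input {
        match value {
            Some(s) => {
                for c in s.chars().rev() {
                    builder.write_char(c);
                }
                builder.finish_value();
            }
            None => builder.append_empty(),
        }
    }
    // Assumption: a real kernel would reuse the input's null buffer here
    // rather than rebuilding a mask from the values.
    let validity = input.iter().map(|v| v.is_some()).collect();
    builder.finish(validity)
}
```

In this sketch the per-row loop only writes bytes and offsets; NULL handling is deferred to `finish`, which is where the bulk NULL computation mentioned above would plug in.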
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response