Add OVERLAY scalar function for SQL-standard substring replacement #18790
Implements the SQL standard TRANSLATE(string, from, to) function, which
replaces each character in 'from' with the corresponding character in
'to'. Characters beyond the length of 'to' are deleted from the output.
This matches the behavior in PostgreSQL, Oracle, and Trino.
translate('hello', 'aeiou', 'AEIOU') → 'hEllO'
translate('abc', 'abc', 'xy') → 'xy' (c deleted)
translate('abc', 'abc', '') → '' (all deleted)
Adds unit tests covering basic replacement, deletion, no-op (no match),
empty inputs, and duplicate characters in 'from'.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
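The mapping rule described in this commit message can be sketched as follows. This is a minimal illustration, not the code merged into `StringFunctions.java`: the class name `TranslateSketch` is hypothetical, the sketch works on UTF-16 `char` values rather than code points, and it assumes that the first occurrence wins when a character is duplicated in `from`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; not part of Pinot.
final class TranslateSketch {
  private TranslateSketch() {
  }

  static String translate(String input, String from, String to) {
    if (input.isEmpty() || from.isEmpty()) {
      return input;
    }
    // Record the position of each character in 'from'.
    // Assumption: for duplicates, the first occurrence wins.
    Map<Character, Integer> positions = new HashMap<>();
    for (int i = 0; i < from.length(); i++) {
      positions.putIfAbsent(from.charAt(i), i);
    }
    StringBuilder result = new StringBuilder(input.length());
    for (int i = 0; i < input.length(); i++) {
      char c = input.charAt(i);
      Integer pos = positions.get(c);
      if (pos == null) {
        // Not listed in 'from': copy through unchanged.
        result.append(c);
      } else if (pos < to.length()) {
        // Listed in 'from' with a counterpart in 'to': substitute.
        result.append(to.charAt(pos));
      }
      // Listed in 'from' without a counterpart in 'to': dropped.
    }
    return result.toString();
  }

  public static void main(String[] args) {
    System.out.println(translate("hello", "aeiou", "AEIOU")); // hEllO
    System.out.println(translate("abc", "abc", "xy"));        // xy
    System.out.println(translate("abc", "abc", ""));          // (empty string)
  }
}
```

The three calls in `main` reproduce the examples from the commit message.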
Implements OVERLAY(string PLACING replacement FROM start [FOR length]),
which replaces `length` characters of `string` starting at 1-based
position `start` with `replacement`.
When `length` is omitted it defaults to the length of `replacement`,
so replaced and inserted substrings are the same width — matching the
SQL standard and the behaviour in PostgreSQL, Trino, and DuckDB.
overlay('hello world', 'there', 7) → 'hello there'
overlay('abcdef', 'XY', 3, 0) → 'abXYcdef' (insert)
overlay('abcdef', 'XY', 3, 4) → 'abXYf' (delete more)
overlay('abcdef', '', 3, 2) → 'abef' (delete only)
Two overloads are registered:
overlay(str, replacement, start) -- length defaults to len(replacement)
overlay(str, replacement, start, length) -- explicit length
Adds unit tests covering basic replacement, insertion (length=0),
out-of-range start/length clamping, empty inputs, and full-string replacement.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
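The semantics above, including the clamping exercised by the tests in this PR, can be sketched roughly as below. This is an assumption-laden illustration rather than the merged implementation: the class name `OverlaySketch` is hypothetical, positions are counted in UTF-16 `char` units, and the clamping rules (start below 1 treated as 1, start past the end treated as an append, negative length treated as 0) are inferred from the review thread and test assertions.

```java
// Hypothetical illustration class; not part of Pinot.
final class OverlaySketch {
  private OverlaySketch() {
  }

  // Three-argument form: length defaults to the replacement's length.
  static String overlay(String input, String replacement, int start) {
    return overlay(input, replacement, start, replacement.length());
  }

  static String overlay(String input, String replacement, int start, int length) {
    int inputLength = input.length();
    // Convert the 1-based start to a 0-based index, clamped to [0, inputLength].
    int begin = Math.min(Math.max(start, 1) - 1, inputLength);
    // A negative length is treated as 0, which makes the call a pure insertion.
    int span = Math.max(length, 0);
    // Clamp the end of the replaced span; long arithmetic avoids int overflow.
    int end = (int) Math.min((long) begin + span, inputLength);
    return input.substring(0, begin) + replacement + input.substring(end);
  }

  public static void main(String[] args) {
    System.out.println(overlay("hello world", "there", 7)); // hello there
    System.out.println(overlay("abcdef", "XY", 3, 0));      // abXYcdef
    System.out.println(overlay("abcdef", "XY", 3, 4));      // abXY
    System.out.println(overlay("abc", "XY", 10));           // abcXY
    System.out.println(overlay("abc", "Z", 2, 100));        // aZ
  }
}
```

Computing `begin` and `end` once and then concatenating prefix, replacement, and suffix keeps each clamping rule on its own line, which makes the edge cases easy to check against the test assertions.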
overlay("abcdef", "XY", 3, 4) deletes positions 3-6 (cdef), leaving
nothing after position 6, so the result is "abXY" not "abXYf".
Also correct the same example in the Javadoc.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
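The corrected expectation can be cross-checked against the JDK's `StringBuilder.replace`, which replaces the half-open range `[begin, end)` and clamps `end` to the string length. The helper below is a hypothetical sketch that is only valid for in-range arguments (`1 <= start <= input.length() + 1` and `length >= 0`); it does not reproduce the clamping of out-of-range `start` or negative `length` values.

```java
// Hypothetical illustration class; not part of Pinot.
final class OverlayCrossCheck {
  private OverlayCrossCheck() {
  }

  // Valid only for 1 <= start <= input.length() + 1 and length >= 0.
  static String viaStringBuilder(String input, String replacement, int start, int length) {
    int begin = start - 1; // 1-based position to 0-based index
    return new StringBuilder(input).replace(begin, begin + length, replacement).toString();
  }

  public static void main(String[] args) {
    // FROM 3 FOR 4 covers 0-based indices 2..5 ("cdef"), so nothing follows "XY".
    System.out.println(viaStringBuilder("abcdef", "XY", 3, 4)); // abXY
  }
}
```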
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #18790 +/- ##
==========================================
Coverage 64.78% 64.79%
Complexity 1309 1309
==========================================
Files 3380 3380
Lines 209540 209646 +106
Branches 32797 32825 +28
==========================================
+ Hits 135751 135838 +87
- Misses 62863 62873 +10
- Partials 10926 10935 +9
@xiangfu0 @Jackie-Jiang all CI checks pass. Please review when you get a chance.
Pull request overview
Adds SQL-standard string manipulation support to Pinot’s scalar function library by introducing OVERLAY (substring replacement) and TRANSLATE (character mapping) implementations in pinot-common, along with unit tests.
Changes:
- Added `translate(string, from, to)` scalar function.
- Added `overlay(string, replacement, start)` and `overlay(string, replacement, start, length)` scalar function overloads with clamping behavior.
- Added unit tests covering `translate` and `overlay` behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java | Adds new translate() and overlay() scalar functions and Javadocs. |
| pinot-common/src/test/java/org/apache/pinot/common/function/scalar/StringFunctionsTest.java | Adds unit tests for the newly introduced translate() and overlay() functions. |
@ScalarFunction
public static String translate(String input, String from, String to) {
  if (input.isEmpty() || from.isEmpty()) {
TRANSLATE was already merged via PR #18779. This PR only adds OVERLAY — it depends on that branch in the history but the final diff vs master will only contain the OVERLAY additions.
// start beyond end: appends replacement
assertEquals(StringFunctions.overlay("abc", "XY", 10), "abcXY");

// length clamped: cannot delete past end of string
assertEquals(StringFunctions.overlay("abc", "Z", 2, 100), "aZ");
Good catch. Added assertions for start <= 0 (clamped to position 1) and length < 0 (clamped to 0, pure insertion) in the latest commit.
// Delete more than replacement length: replacement is shorter than deleted span
// FROM 3 FOR 4 removes positions 3-6 (cdef), nothing remains after position 6
assertEquals(StringFunctions.overlay("abcdef", "XY", 3, 4), "abXY");
Fixed in the latest commit — the test now correctly asserts abXY and the PR description example has been updated to match.
Opened the matching docs PR for this change: pinot-contrib/pinot-docs#876
## Summary
- document the `OVERLAY` string function in the string function reference
- add `OVERLAY` to the SQL function index with its Pinot call syntax
- describe the implementation-backed semantics for 1-based positions, omitted `length`, insertion, append, and clamping behavior

## Source cross-check
- `pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java`
- `pinot-common/src/test/java/org/apache/pinot/common/function/scalar/StringFunctionsTest.java`

## Validation
- `git diff --check`

## Upstream
- apache/pinot#18790
Summary
Implements the SQL standard `OVERLAY(string PLACING replacement FROM start [FOR length])` scalar function.
```sql
SELECT OVERLAY('hello world' PLACING 'there' FROM 7) -- 'hello there'
SELECT OVERLAY('abcdef' PLACING 'XY' FROM 3 FOR 0) -- 'abXYcdef'
SELECT OVERLAY('abcdef' PLACING 'XY' FROM 3 FOR 4) -- 'abXY'
```
Follows the same pattern as the recently added `TRANSLATE` function.
Test plan