
Add OVERLAY scalar function for SQL-standard substring replacement #18790

Merged
xiangfu0 merged 3 commits into apache:master from Akanksha-kedia:feat-overlay-scalar-function on Jun 19, 2026


Akanksha-kedia (Contributor) commented Jun 17, 2026

Summary

Implements the SQL standard `OVERLAY(string PLACING replacement FROM start [FOR length])` scalar function.

  • When `length` is omitted it defaults to `len(replacement)`, matching PostgreSQL, Trino, and DuckDB
  • Two overloads registered: `overlay(str, replacement, start)` and `overlay(str, replacement, start, length)`
  • Handles edge cases: zero-length deletion (insertion), `start` beyond end (append), `length` clamped to remaining string, `start <= 0` clamped to 1, negative `length` clamped to 0

```sql
SELECT OVERLAY('hello world' PLACING 'there' FROM 7) -- 'hello there'
SELECT OVERLAY('abcdef' PLACING 'XY' FROM 3 FOR 0) -- 'abXYcdef'
SELECT OVERLAY('abcdef' PLACING 'XY' FROM 3 FOR 4) -- 'abXY'
```
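The semantics above can be sketched in plain Java. This is a minimal illustration that assumes only the clamping rules listed in the summary; `OverlaySketch` is a hypothetical class name, and the code is not the actual `StringFunctions.overlay` implementation in pinot-common. It also assumes positions count UTF-16 code units (`String.length()`), which may not match how Pinot treats supplementary characters.

```java
// Hypothetical illustration class, not Pinot's StringFunctions.
final class OverlaySketch {

  // OVERLAY(input PLACING replacement FROM start): length defaults to replacement.length().
  static String overlay(String input, String replacement, int start) {
    return overlay(input, replacement, start, replacement.length());
  }

  // OVERLAY(input PLACING replacement FROM start FOR length), with a 1-based start.
  static String overlay(String input, String replacement, int start, int length) {
    // Clamp start <= 0 to position 1, convert to a 0-based index, and cap it at
    // input.length() so a start past the end turns into an append.
    int from = Math.min(Math.max(start, 1) - 1, input.length());
    // Clamp a negative length to 0 (pure insertion) and never delete past the end.
    int deleted = Math.min(Math.max(length, 0), input.length() - from);
    return input.substring(0, from) + replacement + input.substring(from + deleted);
  }

  public static void main(String[] args) {
    System.out.println(overlay("hello world", "there", 7)); // hello there
    System.out.println(overlay("abcdef", "XY", 3, 0));       // abXYcdef
    System.out.println(overlay("abcdef", "XY", 3, 4));       // abXY
  }
}
```

Capping `from` at `input.length()` and `deleted` at the remaining length keeps both `substring` calls in range, so for non-null arguments no combination of `start` and `length` throws.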

Follows the same pattern as the recently added `TRANSLATE` function.

Test plan

  • `StringFunctionsTest#testOverlay` — 14 unit tests covering basic replacement, insertion, deletion, clamping (start<=0, negative length, length beyond end), empty inputs

Akanksha-kedia and others added 2 commits June 16, 2026 15:51
Implements the SQL standard TRANSLATE(string, from, to) function, which
replaces each character in 'from' with the corresponding character in
'to'. Characters beyond the length of 'to' are deleted from the output.
This matches the behavior in PostgreSQL, Oracle, and Trino.

  translate('hello', 'aeiou', 'AEIOU')  → 'hEllO'
  translate('abc',   'abc',   'xy')     → 'xy'    (c deleted)
  translate('abc',   'abc',   '')       → ''      (all deleted)

Adds unit tests covering basic replacement, deletion, no-op (no match),
empty inputs, and duplicate characters in 'from'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
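The TRANSLATE behaviour described in this commit message can be sketched as a single pass over the input. `TranslateSketch` is a hypothetical name and this is not Pinot's code. The commit message only says duplicate characters in `from` are tested, not how they resolve; here a duplicated character is assumed to map by its first occurrence, which is what a first-match lookup such as `String.indexOf` gives.

```java
// Hypothetical illustration class, not Pinot's StringFunctions.
final class TranslateSketch {

  // Replaces each char of `from` found in `input` with the char at the same index in `to`;
  // chars of `from` with no counterpart in `to` are deleted.
  // Assumption: a char repeated in `from` maps by its first occurrence.
  static String translate(String input, String from, String to) {
    if (input.isEmpty() || from.isEmpty()) {
      return input;
    }
    StringBuilder out = new StringBuilder(input.length());
    for (int i = 0; i < input.length(); i++) {
      char c = input.charAt(i);
      int idx = from.indexOf(c); // first occurrence of c in `from`, or -1
      if (idx < 0) {
        out.append(c);              // not in `from`: keep unchanged
      } else if (idx < to.length()) {
        out.append(to.charAt(idx)); // mapped to the corresponding `to` char
      }                             // otherwise no counterpart in `to`: delete
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(translate("hello", "aeiou", "AEIOU")); // hEllO
    System.out.println(translate("abc", "abc", "xy"));        // xy
    System.out.println(translate("abc", "abc", "").isEmpty()); // true
  }
}
```

Calling `from.indexOf(c)` per input character makes this O(n * m) in the lengths of `input` and `from`; a precomputed lookup map would be the usual choice when `from` is long.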
Implements OVERLAY(string PLACING replacement FROM start [FOR length]),
which replaces `length` characters of `string` starting at 1-based
position `start` with `replacement`.

When `length` is omitted it defaults to the length of `replacement`,
so replaced and inserted substrings are the same width — matching the
SQL standard and the behaviour in PostgreSQL, Trino, and DuckDB.

  overlay('hello world', 'there', 7)        → 'hello there'
  overlay('abcdef', 'XY', 3, 0)             → 'abXYcdef'  (insert)
  overlay('abcdef', 'XY', 3, 4)             → 'abXYf'     (delete more)
  overlay('abcdef', '', 3, 2)               → 'abef'       (delete only)

Two overloads are registered:
  overlay(str, replacement, start)             -- length defaults to len(replacement)
  overlay(str, replacement, start, length)     -- explicit length

Adds unit tests covering basic replacement, insertion (length=0),
out-of-range start/length clamping, empty inputs, and full-string replacement.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Akanksha-kedia (Contributor, Author):

@xiangfu0

overlay("abcdef", "XY", 3, 4) deletes positions 3-6 (cdef), leaving
nothing after position 6, so the result is "abXY" not "abXYf".
Also correct the same example in the Javadoc.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
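The position arithmetic behind this fix can be traced step by step. The hypothetical `OverlayTrace.pieces` helper below splits a string into the kept prefix, the removed span, and the kept suffix for `FROM start FOR length`. It assumes `start` lies within the string and `length` is non-negative, so it omits the clamping the real function performs.

```java
// Hypothetical trace helper, for illustrating the index arithmetic only.
final class OverlayTrace {

  // Returns {prefix, removed, suffix} for the 1-based span FROM start FOR length.
  // Assumes 1 <= start <= s.length() + 1 and length >= 0 (no clamping).
  static String[] pieces(String s, int start, int length) {
    int from = start - 1;                         // 1-based position -> 0-based index
    int to = Math.min(from + length, s.length()); // exclusive end, capped at the string end
    return new String[] {s.substring(0, from), s.substring(from, to), s.substring(to)};
  }

  public static void main(String[] args) {
    String[] p = pieces("abcdef", 3, 4);            // FROM 3 FOR 4
    System.out.println(p[0] + "|" + p[1] + "|" + p[2]); // ab|cdef|
    System.out.println(p[0] + "XY" + p[2]);             // abXY
  }
}
```

Because positions 3 through 6 cover `cdef` and `abcdef` has only six characters, the suffix is empty, which is why the corrected expectation is `abXY`.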
codecov-commenter commented Jun 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.79%. Comparing base (6464736) to head (b6ed7ef).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             master   #18790    +/-   ##
==========================================
  Coverage     64.78%   64.79%
  Complexity     1309     1309
==========================================
  Files          3380     3380
  Lines        209540   209646   +106
  Branches      32797    32825    +28
==========================================
+ Hits         135751   135838    +87
- Misses        62863    62873    +10
- Partials      10926    10935     +9
```
| Flag | Coverage Δ |
|---|---|
| custom-integration1 | `100.00% <ø> (ø)` |
| integration | `100.00% <ø> (ø)` |
| integration1 | `100.00% <ø> (ø)` |
| integration2 | `0.00% <ø> (ø)` |
| java-21 | `64.79% <100.00%> (+<0.01%)` ⬆️ |
| temurin | `64.79% <100.00%> (+<0.01%)` ⬆️ |
| unittests | `64.79% <100.00%> (+<0.01%)` ⬆️ |
| unittests1 | `56.98% <100.00%> (+<0.01%)` ⬆️ |
| unittests2 | `37.26% <0.00%> (+<0.01%)` ⬆️ |

Flags with carried forward coverage won't be shown.


Akanksha-kedia (Contributor, Author):

@xiangfu0 @Jackie-Jiang all CI checks pass. Please review when you get a chance.

xiangfu0 requested review from Copilot and xiangfu0 on June 19, 2026 00:21
xiangfu0 added the functions label (Related to scalar or aggregation functions) on June 19, 2026

Copilot AI left a comment


Pull request overview

Adds SQL-standard string manipulation support to Pinot’s scalar function library by introducing OVERLAY (substring replacement) and TRANSLATE (character mapping) implementations in pinot-common, along with unit tests.

Changes:

  • Added translate(string, from, to) scalar function.
  • Added overlay(string, replacement, start) and overlay(string, replacement, start, length) scalar function overloads with clamping behavior.
  • Added unit tests covering translate and overlay behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java | Adds new translate() and overlay() scalar functions and Javadocs. |
| pinot-common/src/test/java/org/apache/pinot/common/function/scalar/StringFunctionsTest.java | Adds unit tests for the newly introduced translate() and overlay() functions. |

Comment on lines +1142 to +1144
```java
@ScalarFunction
public static String translate(String input, String from, String to) {
  if (input.isEmpty() || from.isEmpty()) {
```

Akanksha-kedia (Contributor, Author) replied:

TRANSLATE was already merged via PR #18779. This PR only adds OVERLAY: its branch history includes the TRANSLATE commit, but the final diff against master will contain only the OVERLAY additions.

Comment on lines +636 to +640
```java
// start beyond end: appends replacement
assertEquals(StringFunctions.overlay("abc", "XY", 10), "abcXY");

// length clamped: cannot delete past end of string
assertEquals(StringFunctions.overlay("abc", "Z", 2, 100), "aZ");
```

Akanksha-kedia (Contributor, Author) replied:

Good catch. Added assertions for start <= 0 (clamped to position 1) and length < 0 (clamped to 0, pure insertion) in the latest commit.

Comment on lines +617 to +619
```java
// Delete more than replacement length: replacement is shorter than deleted span
// FROM 3 FOR 4 removes positions 3-6 (cdef), nothing remains after position 6
assertEquals(StringFunctions.overlay("abcdef", "XY", 3, 4), "abXY");
```

Akanksha-kedia (Contributor, Author) replied:

Fixed in the latest commit — the test now correctly asserts abXY and the PR description example has been updated to match.

xiangfu0 merged commit 5526d5b into apache:master on Jun 19, 2026
11 checks passed
xiangfu0 (Contributor):

Opened the matching docs PR for this change: pinot-contrib/pinot-docs#876


xiangfu0 added a commit to pinot-contrib/pinot-docs that referenced this pull request Jun 19, 2026
## Summary
- document the `OVERLAY` string function in the string function
reference
- add `OVERLAY` to the SQL function index with its Pinot call syntax
- describe the implementation-backed semantics for 1-based positions,
omitted `length`, insertion, append, and clamping behavior

## Source cross-check
- `pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java`
- `pinot-common/src/test/java/org/apache/pinot/common/function/scalar/StringFunctionsTest.java`

## Validation
- `git diff --check`

## Upstream
- apache/pinot#18790
cypherean pushed a commit to cypherean/pinot that referenced this pull request Jun 24, 2026
…pache#18790)

* Add TRANSLATE scalar function for character-level string substitution

Implements the SQL standard TRANSLATE(string, from, to) function, which
replaces each character in 'from' with the corresponding character in
'to'. Characters beyond the length of 'to' are deleted from the output.
This matches the behavior in PostgreSQL, Oracle, and Trino.

  translate('hello', 'aeiou', 'AEIOU')  → 'hEllO'
  translate('abc',   'abc',   'xy')     → 'xy'    (c deleted)
  translate('abc',   'abc',   '')       → ''      (all deleted)

Adds unit tests covering basic replacement, deletion, no-op (no match),
empty inputs, and duplicate characters in 'from'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add OVERLAY scalar function for SQL-standard substring replacement

Implements OVERLAY(string PLACING replacement FROM start [FOR length]),
which replaces `length` characters of `string` starting at 1-based
position `start` with `replacement`.

When `length` is omitted it defaults to the length of `replacement`,
so replaced and inserted substrings are the same width — matching the
SQL standard and the behaviour in PostgreSQL, Trino, and DuckDB.

  overlay('hello world', 'there', 7)        → 'hello there'
  overlay('abcdef', 'XY', 3, 0)             → 'abXYcdef'  (insert)
  overlay('abcdef', 'XY', 3, 4)             → 'abXYf'     (delete more)
  overlay('abcdef', '', 3, 2)               → 'abef'       (delete only)

Two overloads are registered:
  overlay(str, replacement, start)             -- length defaults to len(replacement)
  overlay(str, replacement, start, length)     -- explicit length

Adds unit tests covering basic replacement, insertion (length=0),
out-of-range start/length clamping, empty inputs, and full-string replacement.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix wrong test expectation in testOverlay

overlay("abcdef", "XY", 3, 4) deletes positions 3-6 (cdef), leaving
nothing after position 6, so the result is "abXY" not "abXYf".
Also correct the same example in the Javadoc.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

Labels

functions Related to scalar or aggregation functions


4 participants