perf: Optimize NULL handling in substr #21519

Merged
comphead merged 1 commit into apache:main from neilconway:neilc/perf-substr-nulls
Apr 9, 2026

@neilconway
Contributor

Which issue does this PR close?

Rationale for this change

Similar to other recent changes, substr currently checks for NULLs and builds the result NULL bitmap on a per-row basis. It is faster to instead compute the result NULL bitmap in bulk via bitwise AND.
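As a rough illustration of the bulk approach, the sketch below packs validity bits into 64-bit words and combines two inputs one word at a time. `Validity` and its methods are hypothetical names made up for this example; they are not the arrow-rs or DataFusion types, and the PR's actual code may differ.

```rust
/// Validity bitmap: bit i set means row i is non-NULL (the Arrow convention).
/// Hypothetical helper type for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct Validity {
    words: Vec<u64>,
    len: usize,
}

impl Validity {
    /// Pack one bool per row into 64-bit words.
    fn from_bools(valid: &[bool]) -> Self {
        let mut words = vec![0u64; (valid.len() + 63) / 64];
        for (i, &v) in valid.iter().enumerate() {
            if v {
                words[i / 64] |= 1u64 << (i % 64);
            }
        }
        Self { words, len: valid.len() }
    }

    fn is_valid(&self, i: usize) -> bool {
        (self.words[i / 64] >> (i % 64)) & 1 == 1
    }

    /// Bulk combine: a row is valid only if it is valid in both inputs.
    /// Each `&` handles 64 rows at once, with no per-row branch.
    fn and(&self, other: &Self) -> Self {
        assert_eq!(self.len, other.len, "inputs must have the same row count");
        let words: Vec<u64> = self
            .words
            .iter()
            .zip(&other.words)
            .map(|(a, b)| a & b)
            .collect();
        Self { words, len: self.len }
    }
}
```

For a call such as substr(string, start), the result validity would then be `string_validity.and(&start_validity)`, computed once before the loop that produces the values.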

Benchmarks (ARM64):

  - substr, no count, short strings/substr_large_string [size=1024]: 21.4µs → 20.9µs (-2.3%)
  - substr, no count, short strings/substr_large_string [size=4096]: 83.1µs → 83.0µs (-0.1%)
  - substr, no count, short strings/substr_string [size=1024]: 20.5µs → 19.8µs (-3.4%)
  - substr, no count, short strings/substr_string [size=4096]: 78.8µs → 77.0µs (-2.3%)
  - substr, no count, short strings/substr_string_view [size=1024]: 18.9µs → 16.1µs (-14.8%)
  - substr, no count, short strings/substr_string_view [size=4096]: 74.0µs → 61.6µs (-16.8%)
  - substr, scalar args, long strings/substr_large_string [size=1024]: 35.2µs → 34.0µs (-3.4%)
  - substr, scalar args, long strings/substr_large_string [size=4096]: 140.6µs → 134.5µs (-4.3%)
  - substr, scalar args, long strings/substr_string [size=1024]: 35.5µs → 33.8µs (-4.8%)
  - substr, scalar args, long strings/substr_string [size=4096]: 138.9µs → 134.2µs (-3.4%)
  - substr, scalar args, long strings/substr_string_view [size=1024]: 34.0µs → 31.0µs (-8.8%)
  - substr, scalar args, long strings/substr_string_view [size=4096]: 132.0µs → 121.8µs (-7.7%)
  - substr, scalar args, short strings/substr_string [size=1024]: 31.0µs → 29.2µs (-5.8%)
  - substr, scalar args, short strings/substr_string [size=4096]: 120.8µs → 111.5µs (-7.7%)
  - substr, scalar args, short strings/substr_string_view [size=1024]: 26.8µs → 23.1µs (-13.8%)
  - substr, scalar args, short strings/substr_string_view [size=4096]: 101.6µs → 86.4µs (-14.9%)
  - substr, scalar start, no count, long strings/substr_string [size=1024]: 34.5µs → 33.2µs (-3.8%)
  - substr, scalar start, no count, long strings/substr_string [size=4096]: 134.4µs → 133.6µs (-0.6%)
  - substr, scalar start, no count, long strings/substr_string_view [size=1024]: 32.9µs → 29.4µs (-10.6%)
  - substr, scalar start, no count, long strings/substr_string_view [size=4096]: 126.1µs → 115.2µs (-8.6%)
  - substr, scalar start, no count, short strings/substr_string [size=1024]: 20.9µs → 20.1µs (-3.8%)
  - substr, scalar start, no count, short strings/substr_string [size=4096]: 80.1µs → 77.5µs (-3.2%)
  - substr, scalar start, no count, short strings/substr_string_view [size=1024]: 19.9µs → 16.7µs (-16.1%)
  - substr, scalar start, no count, short strings/substr_string_view [size=4096]: 74.4µs → 62.4µs (-16.1%)
  - substr, short count, long strings/substr_large_string [size=1024]: 30.3µs → 28.4µs (-6.3%)
  - substr, short count, long strings/substr_large_string [size=4096]: 117.1µs → 112.0µs (-4.4%)
  - substr, short count, long strings/substr_string [size=1024]: 30.2µs → 28.3µs (-6.3%)
  - substr, short count, long strings/substr_string [size=4096]: 118.0µs → 111.0µs (-5.9%)
  - substr, short count, long strings/substr_string_view [size=1024]: 26.1µs → 22.8µs (-12.6%)
  - substr, short count, long strings/substr_string_view [size=4096]: 101.5µs → 87.7µs (-13.6%)
  - substr, with count, long strings/substr_large_string [size=1024]: 34.6µs → 32.8µs (-5.2%)
  - substr, with count, long strings/substr_large_string [size=4096]: 136.7µs → 133.0µs (-2.7%)
  - substr, with count, long strings/substr_string [size=1024]: 34.2µs → 32.7µs (-4.4%)
  - substr, with count, long strings/substr_string [size=4096]: 136.6µs → 132.3µs (-3.1%)
  - substr, with count, long strings/substr_string_view [size=1024]: 33.3µs → 30.3µs (-9.0%)
  - substr, with count, long strings/substr_string_view [size=4096]: 129.1µs → 119.6µs (-7.4%)

What changes are included in this PR?

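  • Compute the result NULL bitmap in bulk via bitwise AND of the input NULL bitmaps, rather than checking for NULLs row by row.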
  • Rename make_and_append_view to append_view, and have callers deal with NULL handling; making it part of append_view encourages per-row NULL computations, which should be avoided when possible.
  • Mark append_view as never-inline; this avoids a performance regression on some of the substr microbenchmarks, where LLVM is a little eager to inline a large-ish function into a hot loop.
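The division of labor described in the last two items might look roughly like the sketch below. Everything here is an assumption made for illustration: the `View` struct, the `append_view` signature, and the `substr_from` caller are simplified stand-ins rather than the code from this PR, and the start position is a 0-based character offset rather than SQL's 1-based one.

```rust
/// Simplified stand-in for a string view: a byte range within the source string.
#[derive(Debug, Clone, Copy, PartialEq)]
struct View {
    offset: u32,
    len: u32,
}

/// Hypothetical value-only helper: it appends a view and knows nothing about NULLs.
/// `#[inline(never)]` asks the compiler to keep the call out of line, so the
/// helper's body is not duplicated into the hot loop of each caller.
#[inline(never)]
fn append_view(views: &mut Vec<View>, offset: u32, len: u32) {
    views.push(View { offset, len });
}

/// Hypothetical caller for a scalar, non-NULL start: NULL handling happens once,
/// outside the loop, and the loop only produces values.
fn substr_from(
    values: &[&str],
    start_chars: usize,
    input_validity: &[bool],
) -> (Vec<View>, Vec<bool>) {
    // With a non-NULL scalar start, the result validity is just the input's.
    let validity = input_validity.to_vec();
    let mut views = Vec::with_capacity(values.len());
    for s in values {
        // Byte offset of the `start_chars`-th character, or the end of the
        // string if it has fewer characters than that.
        let byte_start = s
            .char_indices()
            .nth(start_chars)
            .map_or(s.len(), |(i, _)| i);
        append_view(&mut views, byte_start as u32, (s.len() - byte_start) as u32);
    }
    (views, validity)
}
```

Calling `substr_from(&["hello", "héllo", ""], 2, &[true, false, true])` yields one view per row, including the NULL row, plus the unchanged validity vector. Whether a NULL row's value slot is computed or skipped is a design choice this sketch does not settle.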

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@github-actions (bot) added the functions (Changes to functions implementation) label Apr 9, 2026
Contributor

@comphead comphead left a comment

Thanks @neilconway

@comphead comphead added this pull request to the merge queue Apr 9, 2026
Merged via the queue into apache:main with commit 8939726 Apr 9, 2026
31 checks passed
@neilconway neilconway deleted the neilc/perf-substr-nulls branch April 10, 2026 00:13

Development

Successfully merging this pull request may close these issues.

Optimize NULL handling in substr

2 participants