Skip to content

[fix](doc) string-functions en: backport zh-only examples (locate, instr, length, lpad, from-base64)#3833

Merged
morningman merged 1 commit into
apache:masterfrom
boluor:fix/string-functions-en-backport-zh-examples-batchA
May 30, 2026
Merged

[fix](doc) string-functions en: backport zh-only examples (locate, instr, length, lpad, from-base64)#3833
morningman merged 1 commit into
apache:masterfrom
boluor:fix/string-functions-en-backport-zh-examples-batchA

Conversation

@boluor
Copy link
Copy Markdown
Contributor

@boluor boluor commented May 28, 2026

Summary

For these five pages the ZH version had grown to 9-10 examples while EN still only documented 3-5. Backported the missing ZH-only examples to EN with English headers. All expected outputs were captured directly from a fresh Doris 4.1.1 cluster (not transcribed from the ZH page) so the EN side is verified against the implementation.

Page Examples before Examples added Examples after
locate.md 3 +7 10
instr.md 4 +6 10
length.md 3 +6 9
lpad.md 3 +6 9
from-base64.md 5 +5 10

Examples added cover the natural corner cases each function tends to have: NULL / empty / boundary inputs, UTF-8 multi-byte, repeated occurrences, numeric/URL/email/JSON shapes, round-trip pairs (FROM_BASE64(TO_BASE64(x))), etc. All were already passing on ZH; this PR just brings EN to parity.

What's not in this PR

ltrim.md was on the original P1-zh-richer list but the only zh-only example ("多种空白字符处理", LTRIM(' \t\n hello world')) has an incorrect expected output: by default LTRIM strips only the ASCII space character, not \t / \n, so the actual result still contains the tab and newline. Did not backport this one — will file the ZH expected-output bug separately.

Verification

For each file:

  • Ran every added SQL against Doris 4.1.1 and pasted the exact ASCII output the cluster returned
  • Re-extracted each EN markdown file with the verifier's SQL extractor; statement counts now match the ZH side (10 / 10 / 9 / 9 / 10)
  • No fence / markdown-parsing changes were needed

Test plan

  • Each added example runs cleanly on Doris 4.1.1 and matches the printed expected output
  • EN stmt structure now parallels ZH after the change
  • No SQL was authored or modified — only EN prose headers translated; SQL and result tables are byte-identical to the corresponding ZH blocks

🤖 Generated with Claude Code

…str, length, lpad, from-base64)

For these five pages the ZH version had grown to 9-10 examples while EN
still only documented 3-5. Backported the missing ZH-only examples to EN
with English headers. All expected outputs were captured directly from a
fresh Doris 4.1.1 cluster, not transcribed from the ZH page.

Per-file delta:

  * locate.md      +7 examples (NULL / empty / pos / boundary / UTF-8 /
                                case / empty-substr-with-pos)
  * instr.md       +6 examples (repeated / specials / UTF-8 / digits /
                                multi-word / paths-and-URLs)
  * length.md      +6 examples (empty / mixed / escapes / UTF-8 vs
                                CHAR_LENGTH / emoji / digits)
  * lpad.md        +6 examples (zero-len / empty-pad / repeat / UTF-8 /
                                zero-pad numeric / negative-len)
  * from-base64.md +5 examples (long / UTF-8 / email / JSON / round-trip
                                with TO_BASE64)

ltrim.md was originally on the same list but the only zh-only example
('多种空白字符处理') has an incorrect expected output -- by default LTRIM
strips ASCII space only, not \t/\n; the ZH expected row dropped the
remaining whitespace, which is wrong. Skipped backporting that one;
filing as a doc bug separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@morningman morningman merged commit 2456946 into apache:master May 30, 2026
3 checks passed
morningman pushed a commit that referenced this pull request May 30, 2026
… NOT strip \t / \n by default) (#3838)

## Summary

The original ZH example 3 ("多种空白字符处理") in `ltrim.md` ran:

```sql
SELECT LTRIM('  \t\n  hello world');
```

and claimed the result was `hello world`. That's wrong: by default
`LTRIM` strips only ASCII space, not tab (`\t`) or newline (`\n`), so
the actual result is `\t\n hello world` — two characters shorter than
the input (the two leading spaces), not six chars shorter.

Rewrote the example along the same lines as the corresponding RTRIM
"默认只剥除 ASCII 空格" example: compare `LENGTH()` before and after the call
so the behavior is unambiguous and the doc renders cleanly without
embedded tabs/newlines in the result table.

## Verification

On Doris 4.1.1:

```
LENGTH('  \t\n  hello world')          -> 17
LENGTH(LTRIM('  \t\n  hello world'))   -> 15
```

So only 2 chars (the two leading spaces) are removed, as documented now.

## Context

Originally flagged when this example was excluded from the bulk ZH→EN
backport in PR #3833 (the only zh-only ltrim example whose expected
output didn't match reality on the cluster).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
morningman pushed a commit that referenced this pull request May 30, 2026
…orner-case adds (en+zh) (#3848)

## Summary

Mechanical port of the 4.x fixes in #3833, #3837, #3838, #3839 to
dev/master. Verified against today's master build (selectdb-qa-test
tarball).

**Skipped** (already applied on dev):
- strleft.md ZH dedup (#3837)
- milliseconds-add.md EN BIGINT-range example (#3834 EN-side; the ZH
duplicate-removal piece is in the sibling PR)

## Files (13)

### EN string-functions
- **from-base64.md, instr.md, length.md, locate.md, lpad.md** — backport
corner-case examples (NULL / empty / multi-byte UTF-8 / numeric / etc.)
added in #3833
- **rtrim.md** — add the LENGTH-based 'default-only-strips-ASCII-space'
example (#3837)
- **substring.md** — add the missing 'empty source string' example +
'NULL passed directly' (#3839)
- **trim.md** — replace example 2's prose + expected output
(`trim('ababccaab', 'ab')` is `cca`, not `ababcca` — trim repeatedly
strips from both ends), plus the UTF-8 multi-byte-pattern example
(#3839)

### EN other-functions
- **field.md** — full replace with 4.x post-fix version: adds CREATE
TABLE setup for `baseall` and `class_test` (which the page references
but never created), adds a NULL row to `class_test`, adds DESC and NULLS
FIRST examples that exercise NULL handling (#3789 + #3839 combined)

### ZH
- **field.md** — add the simple `SELECT FIELD(2, 3, 1, 2, 5)` example
(#3839)
- **ltrim.md** — rewrite example 3 with correct expected output; LTRIM
strips ASCII space only, NOT `\t`/`\n`, so the result still contains the
tab/newline. Switched to a LENGTH() comparison for clarity (#3838)
- **substring.md** — add the SUBSTR alias example (#3839)
- **trim.md** — add three examples (no-match returns original; repeated
pattern strips until exhausted; asymmetric removal with multi-char
pattern). Also drop a trailing `;;` typo (#3839)

## Verification

Verified end-to-end against today's master cluster — all added /
modified examples behave identically on master as on 4.1.1, so the doc
fixes apply unchanged.

## Related 4.x PRs
#3833 #3837 #3838 #3839 (and #3789 for the field.md setup blocks).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants