[fix](doc) string-functions en: backport zh-only examples (locate, instr, length, lpad, from-base64) by boluor · Pull Request #3833 · apache/doris-website

boluor · 2026-05-28T14:43:46Z

Summary

For these five pages the ZH version had grown to 9-10 examples while EN still only documented 3-5. Backported the missing ZH-only examples to EN with English headers. All expected outputs were captured directly from a fresh Doris 4.1.1 cluster (not transcribed from the ZH page) so the EN side is verified against the implementation.

Page	Examples before	Examples added	Examples after
`locate.md`	3	+7	10
`instr.md`	4	+6	10
`length.md`	3	+6	9
`lpad.md`	3	+6	9
`from-base64.md`	5	+5	10

Examples added cover the natural corner cases each function tends to have: NULL / empty / boundary inputs, UTF-8 multi-byte, repeated occurrences, numeric/URL/email/JSON shapes, round-trip pairs (FROM_BASE64(TO_BASE64(x))), etc. All were already passing on ZH; this PR just brings EN to parity.

What's not in this PR

ltrim.md was on the original P1-zh-richer list but the only zh-only example ("多种空白字符处理", LTRIM(' \t\n hello world')) has an incorrect expected output: by default LTRIM strips only the ASCII space character, not \t / \n, so the actual result still contains the tab and newline. Did not backport this one — will file the ZH expected-output bug separately.

Verification

For each file:

Ran every added SQL against Doris 4.1.1 and pasted the exact ASCII output the cluster returned
Re-extracted each EN markdown file with the verifier's SQL extractor; statement counts now match the ZH side (10 / 10 / 9 / 9 / 10)
No fence / markdown-parsing changes were needed

Test plan

Each added example runs cleanly on Doris 4.1.1 and matches the printed expected output
EN stmt structure now parallels ZH after the change
No SQL was authored or modified — only EN prose headers translated; SQL and result tables are byte-identical to the corresponding ZH blocks

🤖 Generated with Claude Code

…str, length, lpad, from-base64) For these five pages the ZH version had grown to 9-10 examples while EN still only documented 3-5. Backported the missing ZH-only examples to EN with English headers. All expected outputs were captured directly from a fresh Doris 4.1.1 cluster, not transcribed from the ZH page. Per-file delta: * locate.md +7 examples (NULL / empty / pos / boundary / UTF-8 / case / empty-substr-with-pos) * instr.md +6 examples (repeated / specials / UTF-8 / digits / multi-word / paths-and-URLs) * length.md +6 examples (empty / mixed / escapes / UTF-8 vs CHAR_LENGTH / emoji / digits) * lpad.md +6 examples (zero-len / empty-pad / repeat / UTF-8 / zero-pad numeric / negative-len) * from-base64.md +5 examples (long / UTF-8 / email / JSON / round-trip with TO_BASE64) ltrim.md was originally on the same list but the only zh-only example ('多种空白字符处理') has an incorrect expected output -- by default LTRIM strips ASCII space only, not \t/\n; the ZH expected row dropped the remaining whitespace, which is wrong. Skipped backporting that one; filing as a doc bug separately. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… NOT strip \t / \n by default) (#3838) ## Summary The original ZH example 3 ("多种空白字符处理") in `ltrim.md` ran: ```sql SELECT LTRIM(' \t\n hello world'); ``` and claimed the result was `hello world`. That's wrong: by default `LTRIM` strips only ASCII space, not tab (`\t`) or newline (`\n`), so the actual result is `\t\n hello world` — two characters shorter than the input (the two leading spaces), not six chars shorter. Rewrote the example along the same lines as the corresponding RTRIM "默认只剥除 ASCII 空格" example: compare `LENGTH()` before and after the call so the behavior is unambiguous and the doc renders cleanly without embedded tabs/newlines in the result table. ## Verification On Doris 4.1.1: ``` LENGTH(' \t\n hello world') -> 17 LENGTH(LTRIM(' \t\n hello world')) -> 15 ``` So only 2 chars (the two leading spaces) are removed, as documented now. ## Context Originally flagged when this example was excluded from the bulk ZH→EN backport in PR #3833 (the only zh-only ltrim example whose expected output didn't match reality on the cluster). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…orner-case adds (en+zh) (#3848) ## Summary Mechanical port of the 4.x fixes in #3833, #3837, #3838, #3839 to dev/master. Verified against today's master build (selectdb-qa-test tarball). **Skipped** (already applied on dev): - strleft.md ZH dedup (#3837) - milliseconds-add.md EN BIGINT-range example (#3834 EN-side; the ZH duplicate-removal piece is in the sibling PR) ## Files (13) ### EN string-functions - **from-base64.md, instr.md, length.md, locate.md, lpad.md** — backport corner-case examples (NULL / empty / multi-byte UTF-8 / numeric / etc.) added in #3833 - **rtrim.md** — add the LENGTH-based 'default-only-strips-ASCII-space' example (#3837) - **substring.md** — add the missing 'empty source string' example + 'NULL passed directly' (#3839) - **trim.md** — replace example 2's prose + expected output (`trim('ababccaab', 'ab')` is `cca`, not `ababcca` — trim repeatedly strips from both ends), plus the UTF-8 multi-byte-pattern example (#3839) ### EN other-functions - **field.md** — full replace with 4.x post-fix version: adds CREATE TABLE setup for `baseall` and `class_test` (which the page references but never created), adds a NULL row to `class_test`, adds DESC and NULLS FIRST examples that exercise NULL handling (#3789 + #3839 combined) ### ZH - **field.md** — add the simple `SELECT FIELD(2, 3, 1, 2, 5)` example (#3839) - **ltrim.md** — rewrite example 3 with correct expected output; LTRIM strips ASCII space only, NOT `\t`/`\n`, so the result still contains the tab/newline. Switched to a LENGTH() comparison for clarity (#3838) - **substring.md** — add the SUBSTR alias example (#3839) - **trim.md** — add three examples (no-match returns original; repeated pattern strips until exhausted; asymmetric removal with multi-char pattern). Also drop a trailing `;;` typo (#3839) ## Verification Verified end-to-end against today's master cluster — all added / modified examples behave identically on master as on 4.1.1, so the doc fixes apply unchanged. ## Related 4.x PRs #3833 #3837 #3838 #3839 (and #3789 for the field.md setup blocks). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This was referenced May 28, 2026

[fix](doc) ltrim.md zh example 3: correct expected output (LTRIM does NOT strip \t / \n by default) #3838

Merged

[fix](doc) dev: backport 4.x string-function example expansions and corner-case adds (en+zh) #3848

Merged

boluor temporarily deployed to Production May 30, 2026 02:44 — with GitHub Actions Inactive

morningman merged commit 2456946 into apache:master May 30, 2026
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[fix](doc) string-functions en: backport zh-only examples (locate, instr, length, lpad, from-base64)#3833

[fix](doc) string-functions en: backport zh-only examples (locate, instr, length, lpad, from-base64)#3833
morningman merged 1 commit into
apache:masterfrom
boluor:fix/string-functions-en-backport-zh-examples-batchA

boluor commented May 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

boluor commented May 28, 2026

Summary

What's not in this PR

Verification

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants