[fix](doc) string-functions en: backport zh-only examples (locate, instr, length, lpad, from-base64)#3833
Merged
morningman merged 1 commit on May 30, 2026
…str, length, lpad, from-base64)
For these five pages the ZH version had grown to 9-10 examples while EN
still only documented 3-5. Backported the missing ZH-only examples to EN
with English headers. All expected outputs were captured directly from a
fresh Doris 4.1.1 cluster, not transcribed from the ZH page.
Per-file delta:
* locate.md +7 examples (NULL / empty / pos / boundary / UTF-8 /
case / empty-substr-with-pos)
* instr.md +6 examples (repeated / specials / UTF-8 / digits /
multi-word / paths-and-URLs)
* length.md +6 examples (empty / mixed / escapes / UTF-8 vs
CHAR_LENGTH / emoji / digits)
* lpad.md +6 examples (zero-len / empty-pad / repeat / UTF-8 /
zero-pad numeric / negative-len)
* from-base64.md +5 examples (long / UTF-8 / email / JSON / round-trip
with TO_BASE64)
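The "UTF-8 vs CHAR_LENGTH" case in the length.md delta comes down to bytes versus characters. Below is a minimal Python sketch of that distinction, assuming LENGTH counts UTF-8 bytes and CHAR_LENGTH counts characters. The helper names `byte_length` and `char_length` are hypothetical, and the sketch only models the semantics; it does not call Doris.

```python
def byte_length(s: str) -> int:
    """Model of LENGTH: number of bytes in the UTF-8 encoding (assumption)."""
    return len(s.encode("utf-8"))


def char_length(s: str) -> int:
    """Model of CHAR_LENGTH: number of Unicode code points (assumption)."""
    return len(s)


# Empty, ASCII, CJK, and emoji inputs mirror the corner cases listed above.
for sample in ["", "hello", "中文", "😀"]:
    print(repr(sample), byte_length(sample), char_length(sample))
```

The two counts agree for ASCII input and diverge for multi-byte text, which is why a side-by-side example is useful on the length.md page.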
ltrim.md was originally on the same list but the only zh-only example
('多种空白字符处理', "handling multiple whitespace characters") has an
incorrect expected output -- by default LTRIM strips ASCII space only,
not \t/\n; the ZH expected row dropped the remaining whitespace, which
is wrong. Skipped backporting that one; filing as a doc bug separately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
morningman pushed a commit that referenced this pull request on May 30, 2026
… NOT strip \t / \n by default) (#3838)

## Summary

The original ZH example 3 ("多种空白字符处理", "handling multiple whitespace characters") in `ltrim.md` ran:

```sql
SELECT LTRIM(' \t\n hello world');
```

and claimed the result was `hello world`. That's wrong: by default `LTRIM` strips only ASCII space, not tab (`\t`) or newline (`\n`), so the actual result is `\t\n hello world`, which is two characters shorter than the input (the two leading spaces), not six.

Rewrote the example along the same lines as the corresponding RTRIM "默认只剥除 ASCII 空格" ("strips only ASCII space by default") example: compare `LENGTH()` before and after the call so the behavior is unambiguous and the doc renders cleanly without embedded tabs/newlines in the result table.

## Verification

On Doris 4.1.1:

```
LENGTH(' \t\n hello world') -> 17
LENGTH(LTRIM(' \t\n hello world')) -> 15
```

So only 2 chars (the two leading spaces) are removed, as documented now.

## Context

Originally flagged when this example was excluded from the bulk ZH→EN backport in PR #3833 (the only zh-only ltrim example whose expected output didn't match reality on the cluster).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
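The length arithmetic in the verification above can be reproduced with a small Python model. This is a sketch under stated assumptions, not Doris code: `ltrim_ascii_space` is a hypothetical helper that strips only leading ASCII spaces, and the exact spacing of the input literal (two spaces, tab, newline, two spaces) is assumed so that it matches the reported lengths of 17 and 15.

```python
def ltrim_ascii_space(s: str) -> str:
    """Model of default LTRIM: drop leading ASCII spaces (0x20) only."""
    return s.lstrip(" ")


# Assumed input: two spaces, tab, newline, two spaces, then the text.
value = "  \t\n  hello world"
trimmed = ltrim_ascii_space(value)

print(len(value), len(trimmed))  # 17 15
print(repr(trimmed))             # the tab and newline are still there

# For contrast: stripping *all* leading whitespace removes six characters,
# which is what the incorrect expected output implied.
print(len(value.lstrip()))       # 11
```

Comparing lengths rather than printing the result keeps the tab and newline out of the rendered output, which is the same reason the doc example switched to `LENGTH()`.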
morningman pushed a commit that referenced this pull request on May 30, 2026
…orner-case adds (en+zh) (#3848)

## Summary

Mechanical port of the 4.x fixes in #3833, #3837, #3838, #3839 to dev/master. Verified against today's master build (selectdb-qa-test tarball).

**Skipped** (already applied on dev):

- strleft.md ZH dedup (#3837)
- milliseconds-add.md EN BIGINT-range example (#3834 EN-side; the ZH duplicate-removal piece is in the sibling PR)

## Files (13)

### EN string-functions

- **from-base64.md, instr.md, length.md, locate.md, lpad.md**: backport corner-case examples (NULL / empty / multi-byte UTF-8 / numeric / etc.) added in #3833
- **rtrim.md**: add the LENGTH-based 'default-only-strips-ASCII-space' example (#3837)
- **substring.md**: add the missing 'empty source string' example + 'NULL passed directly' (#3839)
- **trim.md**: replace example 2's prose + expected output (`trim('ababccaab', 'ab')` is `cca`, not `ababcca`, because trim repeatedly strips from both ends), plus the UTF-8 multi-byte-pattern example (#3839)

### EN other-functions

- **field.md**: full replace with 4.x post-fix version: adds CREATE TABLE setup for `baseall` and `class_test` (which the page references but never created), adds a NULL row to `class_test`, adds DESC and NULLS FIRST examples that exercise NULL handling (#3789 + #3839 combined)

### ZH

- **field.md**: add the simple `SELECT FIELD(2, 3, 1, 2, 5)` example (#3839)
- **ltrim.md**: rewrite example 3 with correct expected output; LTRIM strips ASCII space only, NOT `\t`/`\n`, so the result still contains the tab/newline. Switched to a LENGTH() comparison for clarity (#3838)
- **substring.md**: add the SUBSTR alias example (#3839)
- **trim.md**: add three examples (no-match returns original; repeated pattern strips until exhausted; asymmetric removal with multi-char pattern). Also drop a trailing `;;` typo (#3839)

## Verification

Verified end-to-end against today's master cluster: all added / modified examples behave identically on master as on 4.1.1, so the doc fixes apply unchanged.

## Related 4.x PRs

#3833 #3837 #3838 #3839 (and #3789 for the field.md setup blocks).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
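The trim.md correction listed above (`trim('ababccaab', 'ab')` is `cca`, not `ababcca`) follows from stripping the pattern repeatedly from both ends. A minimal Python model of that rule is sketched below; `trim_pattern` is a hypothetical helper written for illustration, not the Doris implementation.

```python
def trim_pattern(s: str, pat: str) -> str:
    """Model: remove pat from the start, then the end, until no match remains."""
    if not pat:
        return s  # nothing to strip; also avoids an endless loop
    while s.startswith(pat):
        s = s[len(pat):]
    while s.endswith(pat):
        s = s[:-len(pat)]
    return s


print(trim_pattern("ababccaab", "ab"))  # cca
print(trim_pattern("hello", "xy"))      # hello (no match returns the input)
```

Two leading `ab` occurrences and one trailing occurrence are removed, which also illustrates the "repeated pattern" and "asymmetric removal" cases added to the ZH page.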
Summary
For these five pages the ZH version had grown to 9-10 examples while EN still only documented 3-5. Backported the missing ZH-only examples to EN with English headers. All expected outputs were captured directly from a fresh Doris 4.1.1 cluster (not transcribed from the ZH page) so the EN side is verified against the implementation.
locate.md, instr.md, length.md, lpad.md, from-base64.md
Examples added cover the natural corner cases each function tends to have: NULL / empty / boundary inputs, UTF-8 multi-byte, repeated occurrences, numeric/URL/email/JSON shapes, round-trip pairs (FROM_BASE64(TO_BASE64(x))), etc. All were already passing on ZH; this PR just brings EN to parity.
What's not in this PR
ltrim.md was on the original P1-zh-richer list but the only zh-only example ("多种空白字符处理", LTRIM(' \t\n hello world')) has an incorrect expected output: by default LTRIM strips only the ASCII space character, not \t/\n, so the actual result still contains the tab and newline. Did not backport this one; will file the ZH expected-output bug separately.
Verification
For each file:
Test plan
🤖 Generated with Claude Code