[fix](doc) correct defaults and example outputs that disagreed with the engine #3725
Merged
These were verified by reading the Doris source (and the doris-streamloader and doris-flink-connector source where the docs describe those tools).
doris-streamloader.md: `workers` default
- Best Practices said "default is the number of CPU cores"; the tool's default is `0`, which enables automatic mode (the worker count is derived from import size, `disk_throughput`, and `streamload_throughput`, typically resolving to 1, 2, 4, or 8). CPU cores are not involved.
flink-doris-connector.md: `source.use-flight-sql` default
- Parameter table said `FALSE`; the connector default has been `TRUE` since 25.1.0 (commit e691bf89). Update the docs/4.x parameter table to `TRUE` to match the current connector. v2.1 / v3.x left untouched — the connector versions paired with those Doris releases used `FALSE` and would need a larger rewrite to disambiguate.
json.md: `read_json_by_line` default
- The detailed description said "Default: false", contradicting the compatibility matrix and tip at the top of the same page. The actual default is `true` if `strip_outer_array` is not set; setting `strip_outer_array=true` flips it to `false`. Broker Load and Routine Load force `true`.
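The default resolution described above can be sketched as a small decision function. This is a hedged illustration of the documented behavior, not the code in the engine: the function name, the `load_kind` parameter, and its string values are hypothetical, and two details are assumptions (an explicitly supplied `read_json_by_line` wins, and an explicit `strip_outer_array=false` behaves like an unset value).

```python
from typing import Optional

# Load types that always force read_json_by_line to true (per the PR text).
# The string labels are hypothetical names chosen for this sketch.
FORCED_TRUE_LOADS = {"broker", "routine"}


def effective_read_json_by_line(
    read_json_by_line: Optional[bool] = None,
    strip_outer_array: Optional[bool] = None,
    load_kind: str = "stream",
) -> bool:
    """Return the value read_json_by_line ends up with.

    None means "the user did not supply this property".
    """
    if load_kind in FORCED_TRUE_LOADS:
        return True
    # Assumption: an explicitly supplied value is used as-is.
    if read_json_by_line is not None:
        return read_json_by_line
    # Not supplied: true unless strip_outer_array=true was set.
    return not bool(strip_outer_array)


if __name__ == "__main__":
    print(effective_read_json_by_line())                        # True
    print(effective_read_json_by_line(strip_outer_array=True))  # False
    print(effective_read_json_by_line(strip_outer_array=True,
                                      load_kind="routine"))     # True
```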
JSON.md: `json_type` returns "int", not "TINYINT"
- The prose claimed the second `123` was of type `TINYINT`, but the sample output right above shows `int`. `json_type` returns the string `"int"` for all sub-bigint integer widths (per `be/src/util/jsonb_document.h` `typeName()`); it never returns `"tinyint"`.
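A minimal sketch of the mapping this fix relies on, covering only the three integer widths named in this PR. The dictionary and helper are hypothetical stand-ins for illustration; the real logic lives in `typeName()` and handles many more types that are deliberately left out here.

```python
# Type-name strings for the sub-bigint integer tags, as described in this PR.
# Other JSONB types are intentionally omitted from this sketch.
JSONB_INT_TYPE_NAMES = {
    "T_Int8": "int",
    "T_Int16": "int",
    "T_Int32": "int",
}


def json_type_name(tag: str) -> str:
    """Look up the user-visible type name for one of the integer tags above."""
    return JSONB_INT_TYPE_NAMES[tag]


if __name__ == "__main__":
    for tag in JSONB_INT_TYPE_NAMES:
        print(tag, "->", json_type_name(tag))
```

Every width maps to the same string, which is why a value that fits in one byte is still reported as `int` rather than `tinyint`.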
to-base64-binary.md: edge case wording
- Said "If input is an empty string, returns an empty string", but the function takes `VARBINARY`. Reword to "If input is an empty VARBINARY (zero bytes), returns an empty string".
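The edge case is simply what Base64 does with zero input bytes. The sketch below uses Python's standard `base64` module to show it; the helper name mirrors the SQL function for readability only and is not Doris code, and it assumes the standard Base64 alphabet with padding.

```python
import base64


def to_base64_binary(data: bytes) -> str:
    """Encode raw bytes as a Base64 string (standard alphabet, with padding)."""
    return base64.b64encode(data).decode("ascii")


if __name__ == "__main__":
    print(repr(to_base64_binary(b"")))       # ''
    print(repr(to_base64_binary(b"Doris")))  # 'RG9yaXM='
```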
date-floor.md: QUARTER missing from the type list
- `date_ceil` lists QUARTER but `date_floor` did not. The engine supports `date_floor(x, INTERVAL n QUARTER)` (FE `QuarterFloor.java`, `BuiltinScalarFunctions.java:987`); v3.x already has a tip noting "QUARTER is supported since 3.0.8 and 3.1.0". Add QUARTER to the type list in docs/, version-4.x, and version-3.x. v2.1 left untouched (engine didn't support QUARTER for floor there).
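For readers unfamiliar with the QUARTER unit, here is a rough sketch of what flooring to an `n`-quarter boundary means. It is an illustration of the semantics, not the engine's implementation: the function is hypothetical, and it assumes periods are counted from an origin of 0001-01-01, so results for `n > 1` depend on that assumption.

```python
from datetime import datetime


def quarter_floor(ts: datetime, n: int = 1) -> datetime:
    """Floor ts to the start of its n-quarter period.

    Assumption: periods are counted from 0001-01-01 00:00:00.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Zero-based index of the quarter containing ts, counted from year 1.
    index = (ts.year - 1) * 4 + (ts.month - 1) // 3
    # Step back to the first quarter of the enclosing n-quarter period.
    index -= index % n
    year_offset, quarter = divmod(index, 4)
    return datetime(year_offset + 1, quarter * 3 + 1, 1)


if __name__ == "__main__":
    print(quarter_floor(datetime(2025, 8, 17, 13, 5)))  # 2025-07-01 00:00:00
    print(quarter_floor(datetime(2025, 11, 20), n=2))   # 2025-07-01 00:00:00
```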
minutes-sub / minutes-add / months-sub / months-add: singular vs plural in error tag
- Error messages were transcribed as `Operation minutes_add of …` / `Operation months_add of …`, but the engine's `get_time_unit_name` (`be/src/exprs/function/datetime_errors.h`) returns the singular tags `minute_add` / `month_add`. Fixed.
previous-day.md: frontmatter description + TIMESTAMPTZ claim
- Frontmatter was missing `description`. The body said the function supports DATE, DATETIME, and TIMESTAMPTZ; the parameter table only lists DATE and DATETIME, and the FE signature (`PreviousDay.java:40-41`) is DATE_V2 only. Aligned the body wording to the parameter table (drop TIMESTAMPTZ).
Scope: docs/, versioned_docs/version-{2.1,3.x,4.x}/ where the affected text exists, plus i18n/zh-CN/ counterparts when the same English string appears (workers wording was translated; param-table values are language-independent). Findings apache#239 (from_hex) and apache#240 (to_hex), which were on the audit list, are skipped because the source confirms the docs are already correct.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
A pass of doc fixes where the documentation disagreed with what the engine (or the companion tools `doris-streamloader` and `doris-flink-connector`) actually does. Each one was verified against the source.
doris-streamloader.md: `workers` default
- Best Practices said the default was "the number of CPU cores".
- `apache/doris-streamloader/main.go` defines `flag.IntVar(&workers, "workers", 0, ...)`: the default is `0`, which means automatic mode (the tool computes a value from import size, `disk_throughput`, and `streamload_throughput`, typically resolving to 1, 2, 4, or 8). CPU cores are not consulted anywhere in `calculateAndCheckWorkers`.
flink-doris-connector.md: `source.use-flight-sql` default
- The parameter table said default = `FALSE`, contradicting the prose nearby ("Starting from Doris 2.1, ADBC is the default read protocol").
- `apache/doris-flink-connector` `ConfigurationOptions.java` defines `USE_FLIGHT_SQL_DEFAULT = true` since the 25.1.0 connector (PR #574, commit `e691bf89`, 2025-03-13).
- Updated the parameter table to `TRUE` in `docs/` and `version-4.x/` (which are paired with the current 25.x connector). `version-2.1/` and `version-3.x/` were left untouched: the connector versions paired with those Doris releases did default to `FALSE`, and the prose claim there is a separate issue that warrants a wider rewrite.
json.md: `read_json_by_line` default
- The detailed description further down the page said "Default: false", contradicting the matrix and tip at the top of the same page.
- The actual default in `JsonFileFormatProperties.java:62-69` is `true` if neither `read_json_by_line` nor `strip_outer_array` is supplied; setting `strip_outer_array=true` flips it to `false`. Broker Load and Routine Load always force `true`.
JSON.md: `json_type` returns `"int"`, not `"TINYINT"`
- The prose claimed the second `123` was of type `TINYINT`, but the sample output immediately above shows the result `int`.
- `json_type` (see `be/src/util/jsonb_document.h` `typeName()`, around lines 647-680) returns the string `"int"` for `T_Int8`, `T_Int16`, and `T_Int32`; there is no `"tinyint"` / `"smallint"` value. Reworded the prose to match what users actually see.
to-base64-binary.md: edge-case wording
- Said "If input is an empty string, returns an empty string". The function takes `VARBINARY`, not string.
- Reworded to "If input is an empty VARBINARY (zero bytes), returns an empty string".
date-floor.md: QUARTER missing from the type list
- `date_ceil` lists `QUARTER` in its type list but `date_floor` did not.
- The engine supports `date_floor(x, INTERVAL n QUARTER)`; see `fe/fe-core/.../scalar/QuarterFloor.java` and `BuiltinScalarFunctions.java:987` (`scalar(QuarterFloor.class, "quarter_floor")`).
- `version-3.x/` already had a tip stating "QUARTER is supported since 3.0.8 and 3.1.0", but the type list still didn't mention it.
- Added `QUARTER` to the type list in `docs/`, `version-4.x/`, and `version-3.x/`. `version-2.1/` left untouched (engine in 2.1 didn't support `QUARTER` for floor).
minutes-sub / minutes-add / months-sub / months-add: singular vs plural in error tag
- Error messages were transcribed as `Operation minutes_add of …` / `Operation months_add of …`, but the engine's `get_time_unit_name` in `be/src/exprs/function/datetime_errors.h` returns the singular tags `minute_add` / `month_add`.
- So the actual error a user sees is `Operation minute_add of …`.
- The throw-on-overflow refactor that introduced these messages landed in Sept 2025, so this only applies to `docs/` / `version-4.x/` and their zh counterparts (in 2.1/3.x these calls return NULL with no error message).
previous-day.md: frontmatter `description` + TIMESTAMPTZ claim
- Frontmatter was missing the `description` field.
- The body said the function supports `DATE`, `DATETIME`, and `TIMESTAMPTZ`, but the parameter table only lists `DATE` and `DATETIME`, and the FE signature (`PreviousDay.java:40-41`) is `DATE_V2` only.
Scope
Changes applied to `docs/`, `versioned_docs/version-{2.1,3.x,4.x}/` where the same content exists, and to `i18n/zh-CN/` for the cases where the English string is reused verbatim (param-table values, error messages, the QUARTER type list) or where the zh prose exists as a direct translation (the `workers` default sentence). For `read_json_by_line`, `JSON_TYPE`, `to-base64-binary`, and `previous-day`, the zh content is structured differently and was left for a separately translated edit.
Findings `#239` (`from_hex`) and `#240` (`to_hex`) were on the audit list but the source confirms the existing docs are already correct, so those are not in this PR.
Test plan