[fix](doc) correct defaults and example outputs that disagreed with the engine #3725

Merged
morningman merged 1 commit into apache:master from boluor:fix-batch35-source-verified on May 22, 2026

boluor (Contributor) commented May 21, 2026

Summary

A pass of doc fixes where the documentation disagreed with what the engine (or the companion tools doris-streamloader and doris-flink-connector) actually does. Each one was verified against the source.

doris-streamloader.md: workers default

Best Practices said the default was "the number of CPU cores". apache/doris-streamloader/main.go defines flag.IntVar(&workers, "workers", 0, ...) — the default is 0, which means automatic mode (the tool computes a value from import size, disk_throughput, and streamload_throughput, typically resolving to 1, 2, 4, or 8). CPU cores are not consulted anywhere in calculateAndCheckWorkers.

flink-doris-connector.md: source.use-flight-sql default

The parameter table said default = FALSE, contradicting the prose nearby ("Starting from Doris 2.1, ADBC is the default read protocol"). In apache/doris-flink-connector, ConfigurationOptions.java has defined USE_FLIGHT_SQL_DEFAULT = true since connector 25.1.0 (PR #574, commit e691bf89, 2025-03-13).

Updated the parameter table to TRUE in docs/ and version-4.x/ (which are paired with the current 25.x connector). version-2.1/ and version-3.x/ were left untouched — the connector versions paired with those Doris releases did default to FALSE, and the prose claim there is a separate issue that warrants a wider rewrite.

json.md: read_json_by_line default

The detailed description further down the page said "Default: false", contradicting the matrix and tip at the top of the same page. The actual default in JsonFileFormatProperties.java:62-69 is true if neither read_json_by_line nor strip_outer_array is supplied; setting strip_outer_array=true flips it to false. Broker Load and Routine Load always force true.
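To make the rule above easier to check at a glance, here is a minimal Java sketch of the decision. It is not the code from JsonFileFormatProperties.java: the class name, the `LoadPath` enum, and the `effective` method are hypothetical, and `null` stands for "property not supplied". One case is an assumption rather than something stated above: when `strip_outer_array` is supplied as `false` and `read_json_by_line` is omitted, the sketch returns `false`.

```java
final class ReadJsonByLineDefault {
    // Hypothetical labels for the load paths mentioned in the description.
    enum LoadPath { BROKER_LOAD, ROUTINE_LOAD, OTHER }

    // null means "the user did not supply this property".
    static boolean effective(LoadPath path, Boolean readJsonByLine, Boolean stripOuterArray) {
        // Broker Load and Routine Load always force true.
        if (path == LoadPath.BROKER_LOAD || path == LoadPath.ROUTINE_LOAD) {
            return true;
        }
        // An explicit user value wins.
        if (readJsonByLine != null) {
            return readJsonByLine;
        }
        // Default: true only when strip_outer_array was not supplied either.
        // (Treating an explicit strip_outer_array=false like true is an assumption.)
        return stripOuterArray == null;
    }

    public static void main(String[] args) {
        System.out.println(effective(LoadPath.OTHER, null, null));        // true
        System.out.println(effective(LoadPath.OTHER, null, true));        // false
        System.out.println(effective(LoadPath.BROKER_LOAD, false, true)); // true
    }
}
```

The point for the docs is the first two cases: "Default: false" is only what a user sees after setting strip_outer_array=true.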

JSON.md: json_type returns "int", not "TINYINT"

The prose claimed the second 123 was of type TINYINT, but the sample output immediately above shows the result int. json_type (see be/src/util/jsonb_document.h typeName() around lines 647-680) returns the string "int" for T_Int8, T_Int16, and T_Int32 — there is no "tinyint" / "smallint" value. Reworded the prose to match what users actually see.

to-base64-binary.md — edge-case wording

Said "If input is an empty string, returns an empty string". The function takes VARBINARY, not string. Reworded to "If input is an empty VARBINARY (zero bytes), returns an empty string".

date-floor.md — QUARTER missing from the type list

date_ceil lists QUARTER in its type list but date_floor did not. The engine supports date_floor(x, INTERVAL n QUARTER) — see fe/fe-core/.../scalar/QuarterFloor.java and BuiltinScalarFunctions.java:987 scalar(QuarterFloor.class, "quarter_floor"). version-3.x/ already had a tip stating "QUARTER is supported since 3.0.8 and 3.1.0", but the type list still didn't mention it. Added QUARTER to the type list in docs/, version-4.x/, and version-3.x/. version-2.1/ left untouched (engine in 2.1 didn't support QUARTER for floor).
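For readers unfamiliar with quarter flooring, the following Java sketch shows what rounding a date down to a quarter boundary means for the simplest case, an interval of one quarter. It uses `java.time` and is not the QuarterFloor.java implementation; the class and method names are hypothetical, and it does not attempt to model intervals with n > 1.

```java
import java.time.LocalDate;

final class QuarterFloorSketch {
    // Round a date down to the first day of its calendar quarter (n = 1 only).
    static LocalDate floorToQuarter(LocalDate d) {
        // Months 1-3 map to 1, 4-6 to 4, 7-9 to 7, 10-12 to 10.
        int firstMonthOfQuarter = ((d.getMonthValue() - 1) / 3) * 3 + 1;
        return LocalDate.of(d.getYear(), firstMonthOfQuarter, 1);
    }

    public static void main(String[] args) {
        System.out.println(floorToQuarter(LocalDate.of(2026, 5, 21)));  // 2026-04-01
        System.out.println(floorToQuarter(LocalDate.of(2026, 12, 31))); // 2026-10-01
    }
}
```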

minutes-sub / minutes-add / months-sub / months-add — singular vs plural in error tag

Error messages were transcribed as Operation minutes_add of … / Operation months_add of …, but the engine's get_time_unit_name in be/src/exprs/function/datetime_errors.h returns the singular tags minute_add / month_add:

```cpp
case TimeUnit::MINUTE: return "minute_add";
case TimeUnit::MONTH:  return "month_add";
```

So the actual error a user sees is Operation minute_add of …. The throw-on-overflow refactor that introduced these messages landed in Sept 2025, so this only applies to docs/ / version-4.x/ and their zh counterparts (in 2.1/3.x these calls return NULL with no error message).

previous-day.md — frontmatter description + TIMESTAMPTZ claim

  • Frontmatter was missing a description field.
  • The body said the function supports DATE, DATETIME, and TIMESTAMPTZ, but the parameter table only lists DATE and DATETIME, and the FE signature (PreviousDay.java:40-41) is DATE_V2 only:
    ```java
    private static final List<FunctionSignature> SIGNATURES = ImmutableList.of(
        FunctionSignature.ret(DateV2Type.INSTANCE).args(DateV2Type.INSTANCE, StringType.INSTANCE));
    ```
    Aligned the body wording to the parameter table (dropped TIMESTAMPTZ).

Scope

Changes applied to docs/, versioned_docs/version-{2.1,3.x,4.x}/ where the same content exists, and to i18n/zh-CN/ for the cases where the English string is reused verbatim (param-table values, error messages, the QUARTER type list) or where the zh prose exists as a direct translation (the workers default sentence). For read_json_by_line, JSON_TYPE, to-base64-binary, and previous-day, the zh content is structured differently and was left for a separately translated edit.

Findings #239 (from_hex) and #240 (to_hex) were on the audit list but the source confirms the existing docs are already correct — those are not in this PR.

Test plan

  • For each finding, verify the relevant Doris source (config constant, FE signature, BE implementation, error-message helper) and quote the file:line in the commit body.
  • Spot-check each rendered diff.
  • CI build (docusaurus + sidebar checks).

…he engine

These were verified by reading the Doris source (and the doris-streamloader and doris-flink-connector source where the docs describe those tools).

doris-streamloader.md: `workers` default
- Best Practices said "default is the number of CPU cores"; the tool's default is `0`, which enables automatic mode (the worker count is derived from import size, `disk_throughput`, and `streamload_throughput`, typically resolving to 1, 2, 4, or 8). CPU cores are not involved.

flink-doris-connector.md: `source.use-flight-sql` default
- Parameter table said `FALSE`; the connector default has been `TRUE` since 25.1.0 (commit e691bf89). Update the docs/4.x parameter table to `TRUE` to match the current connector. v2.1 / v3.x left untouched — the connector versions paired with those Doris releases used `FALSE` and would need a larger rewrite to disambiguate.

json.md: `read_json_by_line` default
- The detailed description said "Default: false", contradicting the compatibility matrix and tip at the top of the same page. The actual default is `true` if `strip_outer_array` is not set; setting `strip_outer_array=true` flips it to `false`. Broker Load and Routine Load force `true`.

JSON.md: `json_type` returns "int", not "TINYINT"
- The prose claimed the second `123` was of type `TINYINT`, but the sample output right above shows `int`. `json_type` returns the string `"int"` for all sub-bigint integer widths (per `be/src/util/jsonb_document.h` `typeName()`); it never returns `"tinyint"`.

to-base64-binary.md: edge case wording
- Said "If input is an empty string, returns an empty string", but the function takes `VARBINARY`. Reword to "If input is an empty VARBINARY (zero bytes), returns an empty string".

date-floor.md: QUARTER missing from the type list
- `date_ceil` lists QUARTER but `date_floor` did not. The engine supports `date_floor(x, INTERVAL n QUARTER)` (FE `QuarterFloor.java`, `BuiltinScalarFunctions.java:987`); v3.x already has a tip noting "QUARTER is supported since 3.0.8 and 3.1.0". Add QUARTER to the type list in docs/, version-4.x, and version-3.x. v2.1 left untouched (engine didn't support QUARTER for floor there).

minutes-sub / minutes-add / months-sub / months-add: singular vs plural in error tag
- Error messages were transcribed as `Operation minutes_add of …` / `Operation months_add of …`, but the engine's `get_time_unit_name` (`be/src/exprs/function/datetime_errors.h`) returns the singular tags `minute_add` / `month_add`. Fixed.

previous-day.md: frontmatter description + TIMESTAMPTZ claim
- Frontmatter was missing `description`. The body said the function supports DATE, DATETIME, and TIMESTAMPTZ; the parameter table only lists DATE and DATETIME, and the FE signature (`PreviousDay.java:40-41`) is DATE_V2 only. Aligned the body wording to the parameter table (drop TIMESTAMPTZ).

Scope: docs/, versioned_docs/version-{2.1,3.x,4.x}/ where the affected text exists, plus i18n/zh-CN/ counterparts when the same English string appears (workers wording was translated; param-table values are language-independent). Findings apache#239 (from_hex) and apache#240 (to_hex), which were on the audit list, are skipped because the source confirms the docs are already correct.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
morningman merged commit 641fa35 into apache:master on May 22, 2026
3 checks passed
boluor deleted the fix-batch35-source-verified branch on May 22, 2026 08:41