
Restore default config in generate-system-tables-docs to cover system logs#104287

Open
alexey-milovidov wants to merge 9 commits into master from restore-system-logs-docs-config


@alexey-milovidov
Member

Reinstates utils/generate-system-tables-docs-config.xml as the default config for utils/generate-system-tables-docs and adds <prepare_system_log_tables_on_startup>true</prepare_system_log_tables_on_startup> so that all configured *_log tables are created up-front in clickhouse-local and their schemas are visible in system.columns without requiring a flush.

clickhouse-local already initializes system logs whenever any <*_log> section is present in the config (via hasAnySystemLogConfigured, see programs/local/LocalServer.cpp:1187-1197), so no server-side change was needed; the previous claim in the dropped XML that "log tables are not created in clickhouse-local because initializeSystemLogs requires full server infrastructure" was stale and has been removed.
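
As a rough illustration of the condition described above, the following Python sketch parses a minimal config and reports which log sections it declares and whether eager table preparation is switched on. The helper names and the "tag ends with `_log`" heuristic are assumptions made for this example only; the real check is `hasAnySystemLogConfigured` in `programs/local/LocalServer.cpp` and may enumerate sections differently (for instance, `dead_letter_queue` does not end in `_log`).

```python
import xml.etree.ElementTree as ET

# Minimal illustrative config; the actual file is
# utils/generate-system-tables-docs-config.xml and lists many more sections.
CONFIG = """
<clickhouse>
    <prepare_system_log_tables_on_startup>true</prepare_system_log_tables_on_startup>
    <query_log>
        <database>system</database>
        <table>query_log</table>
    </query_log>
    <trace_log>
        <database>system</database>
        <table>trace_log</table>
    </trace_log>
</clickhouse>
""".strip()


def configured_log_sections(xml_text):
    """Return top-level section names ending in `_log` (hypothetical heuristic)."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root if child.tag.endswith("_log")]


def eager_preparation_enabled(xml_text):
    """Return True if prepare_system_log_tables_on_startup is set to true."""
    root = ET.fromstring(xml_text)
    value = root.findtext("prepare_system_log_tables_on_startup", default="false")
    return value.strip() == "true"


sections = configured_log_sections(CONFIG)
eager = eager_preparation_enabled(CONFIG)
print(sections, eager)
```

With both conditions true, the expectation stated in the description is that the listed tables exist at startup and show up in `system.columns` without a flush.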

The included docs regeneration adds previously-missing pages for azure_queue_log, s3queue_log, filesystem_cache_log, filesystem_read_prefetches_log, delta_lake_metadata_log, and transactions_info_log, and refreshes the <!--AUTOGENERATED--> blocks for the remaining *_log tables (query_log, trace_log, text_log, metric_log, error_log, query_thread_log, part_log, crash_log, session_log, opentelemetry_span_log, query_views_log, zookeeper_log, processors_profile_log, asynchronous_insert_log, backup_log, blob_storage_log, query_metric_log, dead_letter_queue, zookeeper_connection_log, aggregated_zookeeper_log, iceberg_metadata_log, predicate_statistics_log, histogram_metric_log, asynchronous_metric_log, background_schedule_pool_log).

Changelog category (leave one):

  • Documentation (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Auto-generated documentation for system log tables (query_log, trace_log, etc.) is now produced from the source by utils/generate-system-tables-docs.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

🤖 Generated with Claude Code

…em logs

Reinstates `utils/generate-system-tables-docs-config.xml` as the default
config used by the docs generator and adds
`prepare_system_log_tables_on_startup` so that all configured `*_log`
tables are created up-front in `clickhouse-local` and their schemas are
visible in `system.columns` without requiring a flush.

`clickhouse-local` already initializes system logs whenever any
`<*_log>` section is present in the config (via
`hasAnySystemLogConfigured`), so no server-side change is needed; the
previous claim that "log tables are not created in clickhouse-local
because initializeSystemLogs() requires full server infrastructure" was
stale.

With the default config restored, the docs job now generates entries
for `query_log`, `trace_log`, `text_log`, `metric_log`, `error_log`,
`query_thread_log`, `part_log`, `crash_log`, `session_log`,
`opentelemetry_span_log`, `query_views_log`, `zookeeper_log`,
`processors_profile_log`, `asynchronous_insert_log`, `backup_log`,
`blob_storage_log`, `query_metric_log`, `dead_letter_queue`,
`zookeeper_connection_log`, `aggregated_zookeeper_log`,
`iceberg_metadata_log`, `delta_lake_metadata_log`,
`predicate_statistics_log`, `histogram_metric_log`,
`asynchronous_metric_log`, `background_schedule_pool_log`, and the
previously-undocumented `azure_queue_log`, `s3queue_log`,
`filesystem_cache_log`, `filesystem_read_prefetches_log`, and
`transactions_info_log`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@clickhouse-gh
Contributor

clickhouse-gh Bot commented May 7, 2026

Workflow [PR], commit [0f0d517]

Summary:

| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| Unit tests (msan, function_prop_fuzzer) | | FAIL | | |

AI Review

Summary

This PR restores the default config for utils/generate-system-tables-docs, enables eager system-log table preparation in clickhouse-local, and regenerates system-table docs from source metadata. I did not find additional correctness or safety issues beyond already discussed review threads.

ClickHouse Rules
| Item | Status | Notes |
|---|---|---|
| Deletion logging | | |
| Serialization versioning | | |
| Core-area scrutiny | | |
| No test removal | | |
| Experimental gate | | |
| No magic constants | | |
| Backward compatibility | | |
| SettingsChangesHistory.cpp | | |
| PR metadata quality | | |
| Safe rollout | | |
| Compilation time | | |
| No large/binary files | | |

Final Verdict
  • Status: ✅ Approve

clickhouse-gh Bot added the pr-documentation (Documentation PRs for the specific code PR) label May 7, 2026

Style check was failing because the regenerated `*_log` docs surface
identifiers and descriptions from C++ code that the aspell dictionary
hasn't seen before:

- Add 181 technical terms (class names, methods, events, ZooKeeper
  error codes, etc.) to `ci/jobs/scripts/check_style/aspell-ignore/en/aspell-dict.txt`.
- Fix five typos in metric/event descriptions that ended up in the docs
  via `system.columns`: `constanstans` → `constants`,
  `aggragating` → `aggregating`, `oudated` → `outdated`,
  `acquite` → `acquire`, `idnex` → `index`.

Style check report:
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104287&sha=437e062f7e7b750dd2147373d010221c890d8539&name_0=PR&name_1=Style%20check

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
doc_type: 'reference'
---

Contains a history of all prefetches done during reading from MergeTables backed by a remote filesystem.
Contributor

Wording typo: MergeTables looks incorrect here; this should likely be MergeTree tables (or a more precise term if you intended a different engine family).

Could you fix the source description and regenerate this page?

doc_type: 'reference'
---

Contains logging entries with the information files processes by S3Queue engine.
Contributor

Typo/wording issue in the table description: information files processes is grammatically incorrect, and this page is for azure_queue_log but the sentence says S3Queue.

Please update the source comment (likely in SystemLog.h, then regenerate) to something like: "Contains log entries with information about files processed by the AzureQueue engine." (and similarly fix s3queue_log wording).

- `replication_lag` ([Nullable(UInt32)](/sql-reference/data-types/nullable)) — The replication lag of the `Replicated` database replica (for clusters that belong to a Replicated database).
- `recovery_time` ([Nullable(UInt64)](/sql-reference/data-types/nullable)) — The recovery time of the `Replicated` database replica (for clusters that belong to a Replicated database), in milliseconds.

**Aliases:**
Contributor

This regeneration introduces a duplicate aliases section in the same page: one inside the <!--AUTOGENERATED_*--> block and another immediately after it. The same duplication appears in multiple files touched by this PR (clusters.md, columns.md, metrics.md, etc.), which makes the rendered docs repetitive.

Please keep aliases in one place only (preferably inside the autogenerated block) and remove the trailing manual **Aliases:** blocks from affected pages.

alexey-milovidov and others added 3 commits May 7, 2026 20:42
Three follow-ups for the docs regeneration in PR #104287:

1. `docs/en/operations/system-tables/histogram_metric_log.md` had two
   blank lines between the `## Columns {#columns}` header and the
   `<!--AUTOGENERATED_START-->` marker, which tripped the markdownlint
   `MD012/no-multiple-blanks` rule. Drop the extra blank line.

2. `utils/generate-system-tables-docs` produced that double blank line
   itself: `add_markers_to_file` reads `before = content[:after_header]`
   and then appends `"\n\n"`, but the header pattern's trailing `\s*$`
   greedily consumes the line's terminating newline, leaving `before`
   with one trailing `\n`. The result was `header\n` + `\n\n` =
   `header\n\n\n`, i.e. two blank lines. Strip trailing newlines from
   `before` before reattaching the blank-line separator.

3. The autogenerator created a new minimal `delta_lake_metadata_log.md`
   while the existing `delta_metadata_log.md` already declared
   `slug: /operations/system-tables/delta_lake_metadata_log` (filename
   did not match the table name). Two pages with the same slug failed
   the docusaurus build with `Duplicate routes found!`. Move the rich
   hand-written content into `delta_lake_metadata_log.md` (matching the
   table name so future regenerations stay in place) and delete
   `delta_metadata_log.md`.
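
The regex behaviour behind item 2 can be reproduced in isolation. The sketch below is not the generator's actual `add_markers_to_file`; the function name `add_marker`, the exact header pattern, and the `strip_newlines` switch are assumptions chosen to show the effect of the trailing `\s*$` and of the fix.

```python
import re

# Assumed header pattern: with re.MULTILINE, the greedy `\s*` swallows the
# header's own newline whenever a blank line follows, because `$` can still
# match just before the second newline.
HEADER_RE = re.compile(r"^## Columns \{#columns\}\s*$", re.MULTILINE)
START_MARKER = "<!--AUTOGENERATED_START-->"


def add_marker(content, strip_newlines):
    """Insert the start marker after the header, with or without the fix."""
    match = HEADER_RE.search(content)
    if match is None:
        return content
    after_header = match.end()
    before = content[:after_header]
    if strip_newlines:
        # The fix: drop the newline captured by the pattern before
        # reattaching the blank-line separator.
        before = before.rstrip("\n")
    rest = content[after_header:].lstrip("\n")
    return before + "\n\n" + START_MARKER + "\n" + rest


page = "## Columns {#columns}\n\n- `event_time` (DateTime)\n"
buggy = add_marker(page, strip_newlines=False)
fixed = add_marker(page, strip_newlines=True)
print(repr(buggy))
print(repr(fixed))
```

In the unfixed call, `before` ends with one `\n`, so the output contains three consecutive newlines (two blank lines); with the `rstrip` it contains exactly one blank line before the marker.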

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104287&sha=cf04a7c101f979c2ec406ea755eea5fddeabf2db&name_0=PR&name_1=Docs%20check
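
The duplicate-route failure from item 3 could be caught before the docusaurus build with a small check over front-matter slugs. This is a hypothetical sketch, not part of the generator: `find_duplicate_slugs` and the in-memory `pages` mapping are invented for illustration, and the front matter is assumed to contain a single-line `slug:` entry.

```python
import re
from collections import defaultdict

# Matches a single-line `slug:` entry, optionally quoted.
SLUG_RE = re.compile(r"^slug:\s*['\"]?([^'\"\s]+)['\"]?\s*$", re.MULTILINE)


def find_duplicate_slugs(pages):
    """Map each slug declared by more than one page to the sorted file names."""
    by_slug = defaultdict(list)
    for filename, text in pages.items():
        match = SLUG_RE.search(text)
        if match:
            by_slug[match.group(1)].append(filename)
    return {slug: sorted(names) for slug, names in by_slug.items() if len(names) > 1}


# Illustrative page contents mirroring the situation described above.
pages = {
    "delta_metadata_log.md": "---\nslug: /operations/system-tables/delta_lake_metadata_log\n---\n",
    "delta_lake_metadata_log.md": "---\nslug: /operations/system-tables/delta_lake_metadata_log\n---\n",
    "query_log.md": "---\nslug: /operations/system-tables/query_log\n---\n",
}
duplicates = find_duplicate_slugs(pages)
print(duplicates)
```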

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address review feedback on the regenerated docs:

- `filesystem_read_prefetches_log` referred to `MergeTables` (typo);
  rewrite as `MergeTree tables`.
- `s3queue_log` and `azure_queue_log` shared an identical, ungrammatical
  description (`logging entries with the information files processes by
  S3Queue engine`) — and the `azure_queue_log` variant incorrectly
  pointed at the `S3Queue` engine. Rephrase both as
  `Contains log entries with information about files processed by the
  S3Queue/AzureQueue engine.`

The source comments live in `src/Interpreters/SystemLog.h`; update both
the source and the regenerated markdown so they stay in sync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The regenerator now emits `**Aliases:**` inside the
`<!--AUTOGENERATED_*-->` block, but the hand-written copies that pre-dated
this change were left in place, so 15 system-table pages rendered the
same alias list twice (once inside the block and once immediately after
it).

Remove the duplicate blocks from `clusters.md`, `columns.md`,
`databases.md`, `dimensional_metrics.md`, `dropped_tables_parts.md`,
`events.md`, `histogram_metrics.md`, `jemalloc_bins.md`, `metrics.md`,
`parts.md`, `processes.md`, `projection_parts.md`,
`projection_parts_columns.md`, `replicas.md`, and `user_processes.md`,
keeping only the autogenerated one.
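
A check along these lines could flag pages that still carry a hand-written alias list after the autogenerated block. It is a sketch under assumptions: the end-marker spelling `<!--AUTOGENERATED_END-->` and the function name are guesses for illustration, since only the start marker is quoted in this PR.

```python
START = "<!--AUTOGENERATED_START-->"
END = "<!--AUTOGENERATED_END-->"  # assumed spelling of the closing marker
ALIASES = "**Aliases:**"


def has_duplicate_aliases(text):
    """True if an alias list appears inside the block and again after it."""
    start = text.find(START)
    end = text.find(END)
    if start == -1 or end == -1 or end < start:
        return False
    inside = text[start:end]
    after = text[end + len(END):]
    return ALIASES in inside and ALIASES in after


duplicated_page = (
    "## Columns {#columns}\n\n"
    + START + "\n- `name` (String)\n\n" + ALIASES + "\n- `n`\n" + END + "\n\n"
    + ALIASES + "\n- `n`\n"
)
clean_page = (
    "## Columns {#columns}\n\n"
    + START + "\n- `name` (String)\n\n" + ALIASES + "\n- `n`\n" + END + "\n"
)
print(has_duplicate_aliases(duplicated_page), has_duplicate_aliases(clean_page))
```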

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- `bytes` ([UInt64](/sql-reference/data-types/int-uint)) — Number of inserted bytes.
- `rows` ([UInt64](/sql-reference/data-types/int-uint)) — Number of inserted rows.
- `exception` ([String](/sql-reference/data-types/string)) — Exception message.
- `status` ([Enum8('Ok' = 0, 'ParsingError' = 1, 'FlushError' = 2)](/sql-reference/data-types/enum)) — Status of the view. Values: 'Ok' = 1 — Successful insert, 'ParsingError' = 2 — Exception when parsing the data, 'FlushError' = 3 — Exception when flushing the data
Contributor

The enum values and prose are inconsistent on this line: the type shows 'Ok'=0, 'ParsingError'=1, 'FlushError'=2, but the text says 'Ok'=1 and 'FlushError'=3. This can mislead users reading the docs.

Please align the description with the actual enum values (and consider changing 'Status of the view' to 'Status of the insert'). Since this page is autogenerated, the source column description should be fixed and docs regenerated.

The `Build docusaurus` job fails to parse `histogram_metric_log.md` and
`backup_log.md` because their autogenerated column descriptions contain
the bare sequence `<=`, which MDX treats as the start of a JSX element.

Update the column comments in `HistogramMetricLog.cpp` and `BackupLog.cpp`
to use the Unicode `≤` character (and wrap the `num_entries ≤ num_files`
expression in a code span, matching the previous wording in `backup_log.md`
on `master`), and refresh the corresponding autogenerated blocks.
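
To find other descriptions with the same problem before the docs build does, a simple scan for `<=` outside code can help. The sketch below is hypothetical (the function name and sample text are made up) and deliberately simplistic: it only skips fenced blocks and single-backtick code spans.

```python
import re

INLINE_CODE_RE = re.compile(r"`[^`]*`")


def lines_with_bare_le(markdown):
    """Return 1-based line numbers containing `<=` outside code spans and fences."""
    hits = []
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        # Remove inline code spans, then look for the bare sequence.
        if "<=" in INLINE_CODE_RE.sub("", line):
            hits.append(number)
    return hits


sample = (
    "- `num_entries` (UInt64): always num_entries <= num_files.\n"
    "- `num_files` (UInt64): `num_entries <= num_files` in a code span.\n"
    "- `bucket` (Float64): values ≤ the upper bound.\n"
)
hits = lines_with_bare_le(sample)
print(hits)
```

Only the first sample line is reported; the code-span and `≤` variants chosen in this commit pass the scan.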

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104287&sha=6671bce9fe64d2e29b1c70a5514ee9748fecc1ab&name_0=PR&name_1=Docs%20check
PR: #104287
The autogenerated description of the `status` column listed enum values
that did not match the actual `Status` enum defined in
`AsynchronousInsertLog.h` (`Ok` = 0, `ParsingError` = 1, `FlushError` = 2)
and described the column as "Status of the view", which is leftover from
`query_views_log` and not applicable here.

Update the column comment in `AsynchronousInsertLog.cpp` to reference
the correct values and to call it the status of the insert, and refresh
the autogenerated block.
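
The mismatch between the declared enum and its prose can be detected mechanically. The following sketch is an illustration only, with invented helper names; it compares the `'Name' = N` pairs in a column's type string against the pairs quoted in its description, using the values reported in the review.

```python
import re

PAIR_RE = re.compile(r"'(\w+)'\s*=\s*(-?\d+)")


def enum_pairs(text):
    """Extract {'Name': value} from every 'Name' = N occurrence in text."""
    return {name: int(value) for name, value in PAIR_RE.findall(text)}


def enum_mismatches(type_string, description):
    """Return {name: (declared, documented)} for names whose values differ."""
    declared = enum_pairs(type_string)
    documented = enum_pairs(description)
    return {
        name: (declared[name], documented[name])
        for name in declared
        if name in documented and declared[name] != documented[name]
    }


type_string = "Enum8('Ok' = 0, 'ParsingError' = 1, 'FlushError' = 2)"
old_description = "Values: 'Ok' = 1, 'ParsingError' = 2, 'FlushError' = 3"
new_description = "Values: 'Ok' = 0, 'ParsingError' = 1, 'FlushError' = 2"
mismatches = enum_mismatches(type_string, old_description)
print(mismatches)
```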

PR: #104287
`ObjectStorageQueueLogElement::getColumnsDescription` is shared between
`system.s3queue_log` and `system.azure_queue_log`, but the column
comments hardcoded "S3Queue table" and "the object in s3", which is
misleading on the `azure_queue_log` page after the docs regeneration.

Rephrase the affected comments in `ObjectStorageQueueLog.cpp` to
reference both engines (`S3Queue` or `AzureQueue`) and to talk about
"object storage" instead of `s3`, then refresh the autogenerated blocks
in both pages. Also add trailing periods so the column descriptions are
consistent with the rest of the file.

PR: #104287
@clickhouse-gh
Contributor

clickhouse-gh Bot commented May 8, 2026

LLVM Coverage Report

| Metric | Baseline | Current | Δ |
|---|---|---|---|
| Lines | 84.10% | 84.10% | +0.00% |
| Functions | 91.10% | 91.10% | +0.00% |
| Branches | 76.60% | 76.60% | +0.00% |

Changed lines: 100.00% (73/73) · Uncovered code

Full report · Diff report


Labels

pr-documentation Documentation PRs for the specific code PR


1 participant