Restore default config in generate-system-tables-docs to cover system logs #104287
alexey-milovidov wants to merge 9 commits into master
…em logs

Reinstates `utils/generate-system-tables-docs-config.xml` as the default config used by the docs generator and adds `prepare_system_log_tables_on_startup` so that all configured `*_log` tables are created up-front in `clickhouse-local` and their schemas are visible in `system.columns` without requiring a flush.

`clickhouse-local` already initializes system logs whenever any `<*_log>` section is present in the config (via `hasAnySystemLogConfigured`), so no server-side change is needed; the previous claim that "log tables are not created in clickhouse-local because initializeSystemLogs() requires full server infrastructure" was stale.

With the default config restored, the docs job now generates entries for `query_log`, `trace_log`, `text_log`, `metric_log`, `error_log`, `query_thread_log`, `part_log`, `crash_log`, `session_log`, `opentelemetry_span_log`, `query_views_log`, `zookeeper_log`, `processors_profile_log`, `asynchronous_insert_log`, `backup_log`, `blob_storage_log`, `query_metric_log`, `dead_letter_queue`, `zookeeper_connection_log`, `aggregated_zookeeper_log`, `iceberg_metadata_log`, `delta_lake_metadata_log`, `predicate_statistics_log`, `histogram_metric_log`, `asynchronous_metric_log`, `background_schedule_pool_log`, and the previously-undocumented `azure_queue_log`, `s3queue_log`, `filesystem_cache_log`, `filesystem_read_prefetches_log`, and `transactions_info_log`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Workflow [PR], commit [0f0d517] Summary: ❌
AI Review
Summary
This PR restores the default config for
ClickHouse Rules
Final Verdict
Style check was failing because the regenerated `*_log` docs surface identifiers and descriptions from C++ code that the aspell dictionary hasn't seen before:
- Add 181 technical terms (class names, methods, events, ZooKeeper error codes, etc.) to `ci/jobs/scripts/check_style/aspell-ignore/en/aspell-dict.txt`.
- Fix five typos in metric/event descriptions that ended up in the docs via `system.columns`: `constanstans` → `constants`, `aggragating` → `aggregating`, `oudated` → `outdated`, `acquite` → `acquire`, `idnex` → `index`.

Style check report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104287&sha=437e062f7e7b750dd2147373d010221c890d8539&name_0=PR&name_1=Style%20check

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
doc_type: 'reference'
---

Contains a history of all prefetches done during reading from MergeTables backed by a remote filesystem.
Wording typo: MergeTables looks incorrect here; this should likely be MergeTree tables (or a more precise term if you intended a different engine family).
Could you fix the source description and regenerate this page?
doc_type: 'reference'
---

Contains logging entries with the information files processes by S3Queue engine.
Typo/wording issue in the table description: information files processes is grammatically incorrect, and this page is for azure_queue_log but the sentence says S3Queue.
Please update the source comment (likely in SystemLog.h, then regenerate) to something like: "Contains log entries with information about files processed by the AzureQueue engine." (and similarly fix s3queue_log wording).
- `replication_lag` ([Nullable(UInt32)](/sql-reference/data-types/nullable)) — The replication lag of the `Replicated` database replica (for clusters that belong to a Replicated database).
- `recovery_time` ([Nullable(UInt64)](/sql-reference/data-types/nullable)) — The recovery time of the `Replicated` database replica (for clusters that belong to a Replicated database), in milliseconds.

**Aliases:**
This regeneration introduces a duplicate aliases section in the same page: one inside the <!--AUTOGENERATED_*--> block and another immediately after it. The same duplication appears in multiple files touched by this PR (clusters.md, columns.md, metrics.md, etc.), which makes the rendered docs repetitive.
Please keep aliases in one place only (preferably inside the autogenerated block) and remove the trailing manual **Aliases:** blocks from affected pages.
Three follow-ups for the docs regeneration in PR #104287:

1. `docs/en/operations/system-tables/histogram_metric_log.md` had two blank lines between the `## Columns {#columns}` header and the `<!--AUTOGENERATED_START-->` marker, which tripped the markdownlint `MD012/no-multiple-blanks` rule. Drop the extra blank line.
2. `utils/generate-system-tables-docs` produced that double blank line itself: `add_markers_to_file` reads `before = content[:after_header]` and then appends `"\n\n"`, but the header pattern's trailing `\s*$` greedy-consumes the line's terminating newline, leaving `before` with one trailing `\n`. The result was `header\n` + `\n\n` = `header\n\n\n`, i.e. two blank lines. Strip trailing newlines from `before` before reattaching the blank-line separator.
3. The autogenerator created a new minimal `delta_lake_metadata_log.md` while the existing `delta_metadata_log.md` already declared `slug: /operations/system-tables/delta_lake_metadata_log` (filename did not match the table name). Two pages with the same slug failed the docusaurus build with `Duplicate routes found!`. Move the rich hand-written content into `delta_lake_metadata_log.md` (matching the table name so future regenerations stay in place) and delete `delta_metadata_log.md`.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104287&sha=cf04a7c101f979c2ec406ea755eea5fddeabf2db&name_0=PR&name_1=Docs%20check

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
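The trailing-newline behaviour described in item 2 can be reproduced in a few lines. This is a minimal sketch, not the generator's actual code: `HEADER_RE`, `MARKER`, `add_marker`, and the `strip_before` flag are hypothetical names, and the pattern is only assumed to resemble the real header regex (multiline mode, ending in `\s*$`).

```python
import re

# Hypothetical stand-in for the generator's header pattern (assumed to be
# compiled with re.MULTILINE and to end in `\s*$`).
HEADER_RE = re.compile(r"^## Columns \{#columns\}\s*$", re.MULTILINE)
MARKER = "<!--AUTOGENERATED_START-->"


def add_marker(content: str, strip_before: bool) -> str:
    """Insert MARKER after the header, separated by one blank line."""
    match = HEADER_RE.search(content)
    if match is None:
        return content
    after_header = match.end()
    before = content[:after_header]
    if strip_before:
        # The fix: drop any newline the regex already consumed.
        before = before.rstrip("\n")
    return before + "\n\n" + MARKER + content[after_header:]


# A page whose header is already followed by a blank line.
page = "## Columns {#columns}\n\n- `event_date` (Date)\n"
buggy = add_marker(page, strip_before=False)
fixed = add_marker(page, strip_before=True)
```

With this input, `\s*` backtracks until `$` can match before the second newline, so the match (and therefore `before`) ends with one `\n`. Appending `"\n\n"` without stripping yields three consecutive newlines, which is what `MD012` flags.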
Address review feedback on the regenerated docs:
- `filesystem_read_prefetches_log` referred to `MergeTables` (typo); rewrite as `MergeTree tables`.
- `s3queue_log` and `azure_queue_log` shared an identical, ungrammatical description (`logging entries with the information files processes by S3Queue engine`), and the `azure_queue_log` variant incorrectly pointed at the `S3Queue` engine. Rephrase both as `Contains log entries with information about files processed by the S3Queue/AzureQueue engine.`

The source comments live in `src/Interpreters/SystemLog.h`; update both the source and the regenerated markdown so they stay in sync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The regenerator now emits `**Aliases:**` inside the `<!--AUTOGENERATED_*-->` block, but the hand-written copies that pre-dated this change were left in place, so 15 system-table pages rendered the same alias list twice (once inside the block and once immediately after it). Remove the duplicate blocks from `clusters.md`, `columns.md`, `databases.md`, `dimensional_metrics.md`, `dropped_tables_parts.md`, `events.md`, `histogram_metrics.md`, `jemalloc_bins.md`, `metrics.md`, `parts.md`, `processes.md`, `projection_parts.md`, `projection_parts_columns.md`, `replicas.md`, and `user_processes.md`, keeping only the autogenerated one.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
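A check along the following lines could flag pages that still carry a hand-written alias block next to the autogenerated one. It is a rough sketch under stated assumptions: `alias_counts` is a hypothetical helper, and the end marker name `<!--AUTOGENERATED_END-->` is a guess, since only the start marker is spelled out in this PR.

```python
START = "<!--AUTOGENERATED_START-->"
END = "<!--AUTOGENERATED_END-->"  # assumed name; not confirmed by the PR text
ALIASES = "**Aliases:**"


def alias_counts(page: str) -> tuple[int, int]:
    """Return (inside, outside) counts of ALIASES relative to the marker block."""
    start = page.find(START)
    end = page.find(END, start + len(START)) if start != -1 else -1
    if start == -1 or end == -1:
        # No complete marker block: every occurrence counts as outside.
        return 0, page.count(ALIASES)
    inside = page[start:end].count(ALIASES)
    return inside, page.count(ALIASES) - inside


# Illustrative page with one autogenerated and one trailing manual block.
page = (
    START + "\n- `name` (String)\n\n" + ALIASES + "\n- `n`\n" + END
    + "\n\n" + ALIASES + "\n- `n`\n"
)
# Same page with everything after the end marker dropped.
cleaned = page[: page.find(END) + len(END)] + "\n"
```

A page needs attention when the second number is non-zero while the first is also non-zero, which is the duplicate case the review comment describes.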
- `bytes` ([UInt64](/sql-reference/data-types/int-uint)) — Number of inserted bytes.
- `rows` ([UInt64](/sql-reference/data-types/int-uint)) — Number of inserted rows.
- `exception` ([String](/sql-reference/data-types/string)) — Exception message.
- `status` ([Enum8('Ok' = 0, 'ParsingError' = 1, 'FlushError' = 2)](/sql-reference/data-types/enum)) — Status of the view. Values: 'Ok' = 1 — Successful insert, 'ParsingError' = 2 — Exception when parsing the data, 'FlushError' = 3 — Exception when flushing the data
The enum values and prose are inconsistent on this line: the type shows 'Ok'=0, 'ParsingError'=1, 'FlushError'=2, but the text says 'Ok'=1 and 'FlushError'=3. This can mislead users reading the docs.
Please align the description with the actual enum values (and consider changing 'Status of the view' to 'Status of the insert'). Since this page is autogenerated, the source column description should be fixed and docs regenerated.
The `Build docusaurus` job fails to parse `histogram_metric_log.md` and `backup_log.md` because their autogenerated column descriptions contain the bare sequence `<=`, which MDX treats as the start of a JSX element. Update the column comments in `HistogramMetricLog.cpp` and `BackupLog.cpp` to use the Unicode `≤` character (and wrap the `num_entries ≤ num_files` expression in a code span, matching the previous wording in `backup_log.md` on `master`), and refresh the corresponding autogenerated blocks.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104287&sha=6671bce9fe64d2e29b1c70a5514ee9748fecc1ab&name_0=PR&name_1=Docs%20check

PR: #104287
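To catch this class of failure before the docs build, a description could be scanned for `<=` outside inline code. The snippet below is only a sketch: `has_bare_le` is a hypothetical helper, the sample strings are illustrative rather than the real column comments, and it assumes single-backtick code spans are the only safe context.

```python
import re

# Matches single-backtick inline code spans only (an assumption of this sketch).
CODE_SPAN = re.compile(r"`[^`]*`")


def has_bare_le(description: str) -> bool:
    """Return True if `<=` appears outside an inline code span."""
    return "<=" in CODE_SPAN.sub("", description)


# Illustrative wording, not the actual descriptions from the C++ sources.
raw = "Number of entries (<= number of files)."
spanned = "Number of entries (`num_entries <= num_files`)."
unicode_form = "Number of entries (`num_entries ≤ num_files`)."
```

Only `raw` is flagged here; the code-span and `≤` variants pass, which corresponds to the two remedies the commit applies.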
The autogenerated description of the `status` column listed enum values that did not match the actual `Status` enum defined in `AsynchronousInsertLog.h` (`Ok` = 0, `ParsingError` = 1, `FlushError` = 2) and described the column as "Status of the view", which is leftover from `query_views_log` and not applicable here. Update the column comment in `AsynchronousInsertLog.cpp` to reference the correct values and to call it the status of the insert, and refresh the autogenerated block.

PR: #104287
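A mismatch of this kind can be detected mechanically by comparing the `'Name' = N` pairs in the column type with those repeated in the description. The following is a minimal sketch, assuming both strings use that quoting style; `PAIR` and `enum_mismatches` are hypothetical names, and the description is abbreviated from the line quoted in the review.

```python
import re

# Captures pairs such as 'Ok' = 0 from a type string or a description.
PAIR = re.compile(r"'([^']+)'\s*=\s*(-?\d+)")


def enum_mismatches(type_str: str, description: str) -> dict[str, tuple[int, int]]:
    """Map each enum name to (declared, documented) where the two differ."""
    declared = {name: int(value) for name, value in PAIR.findall(type_str)}
    mismatches = {}
    for name, value in PAIR.findall(description):
        if name in declared and declared[name] != int(value):
            mismatches[name] = (declared[name], int(value))
    return mismatches


column_type = "Enum8('Ok' = 0, 'ParsingError' = 1, 'FlushError' = 2)"
# Abbreviated form of the description flagged in the review.
old_description = "Values: 'Ok' = 1, 'ParsingError' = 2, 'FlushError' = 3"
result = enum_mismatches(column_type, old_description)
```

For the old description, all three values are reported as off by one; a corrected description produces an empty mapping.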
`ObjectStorageQueueLogElement::getColumnsDescription` is shared between `system.s3queue_log` and `system.azure_queue_log`, but the column comments hardcoded "S3Queue table" and "the object in s3", which is misleading on the `azure_queue_log` page after the docs regeneration. Rephrase the affected comments in `ObjectStorageQueueLog.cpp` to reference both engines (`S3Queue` or `AzureQueue`) and to talk about "object storage" instead of `s3`, then refresh the autogenerated blocks in both pages. Also add trailing periods so the column descriptions are consistent with the rest of the file.

PR: #104287
LLVM Coverage Report
Changed lines: 100.00% (73/73)
Reinstates `utils/generate-system-tables-docs-config.xml` as the default config for `utils/generate-system-tables-docs` and adds `<prepare_system_log_tables_on_startup>true</prepare_system_log_tables_on_startup>` so that all configured `*_log` tables are created up-front in `clickhouse-local` and their schemas are visible in `system.columns` without requiring a flush. `clickhouse-local` already initializes system logs whenever any `<*_log>` section is present in the config (via `hasAnySystemLogConfigured`, see `programs/local/LocalServer.cpp:1187-1197`), so no server-side change was needed; the previous claim in the dropped XML that "log tables are not created in `clickhouse-local` because `initializeSystemLogs` requires full server infrastructure" was stale and has been removed.

The included docs regeneration adds previously-missing pages for `azure_queue_log`, `s3queue_log`, `filesystem_cache_log`, `filesystem_read_prefetches_log`, `delta_lake_metadata_log`, and `transactions_info_log`, and refreshes the `<!--AUTOGENERATED-->` blocks for the remaining `*_log` tables (`query_log`, `trace_log`, `text_log`, `metric_log`, `error_log`, `query_thread_log`, `part_log`, `crash_log`, `session_log`, `opentelemetry_span_log`, `query_views_log`, `zookeeper_log`, `processors_profile_log`, `asynchronous_insert_log`, `backup_log`, `blob_storage_log`, `query_metric_log`, `dead_letter_queue`, `zookeeper_connection_log`, `aggregated_zookeeper_log`, `iceberg_metadata_log`, `predicate_statistics_log`, `histogram_metric_log`, `asynchronous_metric_log`, `background_schedule_pool_log`).

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Auto-generated documentation for system log tables (`query_log`, `trace_log`, etc.) is now produced from the source by `utils/generate-system-tables-docs`.

Documentation entry for user-facing changes
🤖 Generated with Claude Code