Skip to content

Pretty formats: optional non-breaking-space leading padding#103559

Open
ashrithb wants to merge 1 commit intoClickHouse:masterfrom
ashrithb:fix/issue-95122-pretty-leading-non-breaking-space
Open

Pretty formats: optional non-breaking-space leading padding#103559
ashrithb wants to merge 1 commit intoClickHouse:masterfrom
ashrithb:fix/issue-95122-pretty-leading-non-breaking-space

Conversation

@ashrithb
Copy link
Copy Markdown
Contributor

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Add a session setting output_format_pretty_use_nbsp_for_leading_padding (default false). When enabled, the leading row-number padding and the indent before grid borders in Pretty, PrettyCompact, and PrettySpace formats are rendered with the UTF-8 encoding of U+00A0 NO-BREAK SPACE instead of an ASCII space. The visual width is identical in monospace, but the padding survives copy-paste through tools that compress or trim runs of regular spaces. Only takes effect when output_format_pretty_grid_charset = UTF-8; suppressed under ASCII charset by design.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Motivation

When users copy Pretty/PrettyCompact output through tools that
compress runs of ASCII spaces (Slack, certain markdown renderers, some
terminal pasteboards), the leading row-number padding and the indent
before grid borders get mangled — the columns no longer line up.

Fix

Add a session setting output_format_pretty_use_nbsp_for_leading_padding
(default false). When the setting is enabled AND
output_format_pretty_grid_charset is UTF-8, the leading whitespace
left_blank string in PrettyBlockOutputFormat is rendered with the
UTF-8 encoding of U+00A0 NO-BREAK SPACE (0xC2 0xA0) instead of an
ASCII space (0x20). The visual width is identical in monospace, but
NBSP survives the space-collapsing tools because they do not normalise
non-ASCII whitespace.

The setting is suppressed under ASCII charset by design — the grid is
rendered with ASCII characters, so NBSP would look out of place.

Default false so existing reference output stays bit-identical and no
existing .reference files need to be regenerated. The
SettingsChangesHistory.cpp entry is in the 26.5 block so the new
setting is visible to settings-history consumers.

Test

Adds tests/queries/0_stateless/04123_pretty_nbsp_leading_padding.sh
which uses od -An -v -tx1 to inspect the actual leading bytes of the
output and asserts:

  • default: first byte of the column-header line is 0x20 (ASCII space)
  • setting on, UTF-8 charset: first two bytes are 0xC2 0xA0 (NBSP)
  • setting on, ASCII charset: first byte stays 0x20 (NBSP suppressed)

Closes #95122
nbsp_demo

Add output_format_pretty_use_nbsp_for_leading_padding (default false).
When the setting is enabled and the grid charset is UTF-8, the leading
whitespace at the start of each Pretty/PrettyCompact line - the
row-number indent and the indent before grid borders - is rendered as
the UTF-8 encoding of U+00A0 NO-BREAK SPACE (0xC2 0xA0) instead of
ASCII space.

The output stays visually identical in monospace, but the padding
survives copy-paste through tools that compress or trim runs of
regular spaces, which preserves the table layout end-to-end. Default
is off so existing reference outputs and consumers are untouched; the
setting is a behaviour opt-in. ASCII charset (output_format_pretty_grid_charset = ASCII) keeps ASCII spaces even when the setting is on,
because non-ASCII bytes in an ASCII grid would defeat the point of
the ASCII output.

Adds a stateless .sh test that pins the leading bytes for the three
combinations (default, opt-in, opt-in + ASCII charset).

Resolves ClickHouse#95122
@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label Apr 27, 2026
@alexey-milovidov
Copy link
Copy Markdown
Member

The second screenshot shows, that it would be better to use it for all type of padding.
Let's rename the setting, and change the behavior accordingly.

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Apr 27, 2026

Workflow [PR], commit [f0f71d6]

Summary:

job_name test_name status info comment
Stress test (amd_msan) FAIL
Logical error: Port is already connected, (header: [A]) (STID: 0996-3f42) FAIL cidb
AST fuzzer (amd_debug) FAIL
Logical error: Variant A (B) has size C, but expected D (STID: 3068-3371) FAIL cidb

AI Review

Summary

This PR adds a new setting, output_format_pretty_use_nbsp_for_leading_padding, and wires it through Pretty format rendering plus a stateless test. The implementation change looks consistent, but there is one test-quality issue that should be fixed before merge.

Findings

⚠️ Majors

  • [tests/queries/0_stateless/04123_pretty_nbsp_leading_padding.sh:17] The test comment says it validates the "second line" (column header row), but the command uses sed -n '1p'. This mismatch makes the test fragile and can miss regressions in row-prefix padding code paths. Update the sampled line (or comment) so the test verifies the intended row and covers both leading-padding paths.
ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
No large/binary files
Final Verdict
  • Status: ⚠️ Request changes
  • Minimum required actions: align the test’s sampled line with its stated intent (or adjust the intent/comment) and ensure the assertion covers the intended Pretty leading-padding path.

@clickhouse-gh clickhouse-gh Bot added the pr-improvement Pull request with some product improvements label Apr 27, 2026
# which carries leading row-number padding) and look for a leading 0x20.
default_first_bytes=$(
$CLICKHOUSE_CLIENT --query="SELECT 1 AS x FORMAT PrettyCompact" \
| sed -n '1p' | head -c 3 | od -An -tx1 | tr -d ' \n'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sed -n '1p' contradicts the comment above ("second line") and only samples the very first rendered line. That can miss regressions in the data-row prefix path (row_num_string padding), which is generated separately.

Please either switch to 2p (to actually test the column-header line) or update the assertion to explicitly target a data row so both leading-padding paths are covered.

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Apr 27, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.00% 84.00% +0.00%
Functions 91.10% 91.10% +0.00%
Branches 76.50% 76.50% +0.00%

Changed lines: 97.56% (40/41) · Uncovered code

Full report · Diff report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Try to use non-breaking space characters as leading symbols in Pretty formats

2 participants