Pretty formats: optional non-breaking-space leading padding#103559
ashrithb wants to merge 1 commit into ClickHouse:master
Add output_format_pretty_use_nbsp_for_leading_padding (default false). When the setting is enabled and the grid charset is UTF-8, the leading whitespace at the start of each Pretty/PrettyCompact line (the row-number indent and the indent before grid borders) is rendered as the UTF-8 encoding of U+00A0 NO-BREAK SPACE (0xC2 0xA0) instead of ASCII space. The output stays visually identical in monospace, but the padding survives copy-paste through tools that compress or trim runs of regular spaces, which preserves the table layout end-to-end.
Default is off so existing reference outputs and consumers are untouched; the setting is a behaviour opt-in. ASCII charset (output_format_pretty_grid_charset = ASCII) keeps ASCII spaces even when the setting is on, because non-ASCII bytes in an ASCII grid would defeat the point of the ASCII output.
Adds a stateless .sh test that pins the leading bytes for the three combinations (default, opt-in, opt-in + ASCII charset).
Resolves ClickHouse#95122
The second screenshot shows that it would be better to use it for all types of padding.
Workflow [PR], commit [f0f71d6] Summary: ❌
AI Review
Summary: This PR adds a new setting, …
# which carries leading row-number padding) and look for a leading 0x20.
default_first_bytes=$(
    $CLICKHOUSE_CLIENT --query="SELECT 1 AS x FORMAT PrettyCompact" \
        | sed -n '1p' | head -c 3 | od -An -tx1 | tr -d ' \n'
sed -n '1p' contradicts the comment above ("second line") and only samples the very first rendered line. That can miss regressions in the data-row prefix path (row_num_string padding), which is generated separately.
Please either switch to 2p (to actually test the column-header line) or update the assertion to explicitly target a data row so both leading-padding paths are covered.
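One way to cover both paths is to parameterise the line being sampled. The sketch below is only an illustration: leading_bytes_hex and mock_output are hypothetical names, and the mock is hand-written text rather than real ClickHouse output. It shows how a check that samples only line 1 would pass even when line 2 still starts with an ASCII space.

```shell
#!/bin/sh
# Hypothetical helper: print the first COUNT bytes of line LINE of stdin
# as lowercase hex with no separators.
leading_bytes_hex() {
    line=$1
    count=$2
    sed -n "${line}p" | head -c "$count" | od -An -v -tx1 | tr -d ' \n'
}

# Hand-written stand-in for formatted output (NOT real ClickHouse output):
# line 1 starts with NBSP padding, line 2 still starts with an ASCII space.
mock_output() {
    printf '\302\240\302\240\302\240+-x-+\n'
    printf ' 1. | 1 |\n'
}

first_line_prefix=$(mock_output | leading_bytes_hex 1 2)
second_line_prefix=$(mock_output | leading_bytes_hex 2 1)

echo "line 1 prefix: $first_line_prefix"
echo "line 2 prefix: $second_line_prefix"
```

Sampling each line separately makes a regression in either padding path show up as a distinct failing assertion.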
LLVM Coverage Report
Changed lines: 97.56% (40/41)
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Add a session setting output_format_pretty_use_nbsp_for_leading_padding (default false). When enabled, the leading row-number padding and the indent before grid borders in Pretty, PrettyCompact, and PrettySpace formats are rendered with the UTF-8 encoding of U+00A0 NO-BREAK SPACE instead of an ASCII space. The visual width is identical in monospace, but the padding survives copy-paste through tools that compress or trim runs of regular spaces. Only takes effect when output_format_pretty_grid_charset = UTF-8; suppressed under ASCII charset by design.
Documentation entry for user-facing changes
Motivation
When users copy Pretty/PrettyCompact output through tools that compress runs of ASCII spaces (Slack, certain markdown renderers, some terminal pasteboards), the leading row-number padding and the indent before grid borders get mangled, and the columns no longer line up.
Fix
Add a session setting output_format_pretty_use_nbsp_for_leading_padding (default false). When the setting is enabled AND output_format_pretty_grid_charset is UTF-8, the leading whitespace left_blank string in PrettyBlockOutputFormat is rendered with the UTF-8 encoding of U+00A0 NO-BREAK SPACE (0xC2 0xA0) instead of an ASCII space (0x20). The visual width is identical in monospace, but NBSP survives the space-collapsing tools because they do not normalise non-ASCII whitespace.
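The effect can be reproduced without ClickHouse. In this minimal sketch, tr -s ' ' stands in for a space-collapsing tool; that is an assumption, since real chat clients and renderers may behave differently. The helper names collapse_spaces and to_hex are hypothetical, and the input line is hand-written rather than real formatter output.

```shell
#!/bin/sh
# Stand-in for a tool that collapses runs of ASCII spaces (assumption:
# real tools may trim or normalise differently).
collapse_spaces() {
    LC_ALL=C tr -s ' '
}

# Dump stdin as lowercase hex with no separators.
to_hex() {
    od -An -v -tx1 | tr -d ' \n'
}

# Three ASCII spaces of padding versus three NBSPs (0xC2 0xA0 each).
ascii_after=$(printf '   | 1 |' | collapse_spaces | to_hex)
nbsp_after=$(printf '\302\240\302\240\302\240| 1 |' | collapse_spaces | to_hex)

echo "ASCII padding after collapse: $ascii_after"
echo "NBSP padding after collapse:  $nbsp_after"
```

The ASCII padding shrinks to a single 0x20 byte, while the three 0xC2 0xA0 pairs pass through unchanged.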
The setting is suppressed under ASCII charset by design — the grid is
rendered with ASCII characters, so NBSP would look out of place.
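The resulting rule can be modelled as a small truth table. The function below is a hypothetical illustration of the behaviour described in this PR, not the C++ code in PrettyBlockOutputFormat; the name leading_pad_hex and its arguments are made up for the sketch.

```shell
#!/bin/sh
# Hypothetical model of the padding rule: NBSP only when the setting is on
# AND the grid charset is UTF-8; ASCII space in every other case.
leading_pad_hex() {
    use_nbsp=$1   # 0 or 1, mirrors output_format_pretty_use_nbsp_for_leading_padding
    charset=$2    # "UTF-8" or "ASCII", mirrors output_format_pretty_grid_charset
    if [ "$use_nbsp" = "1" ] && [ "$charset" = "UTF-8" ]; then
        printf 'c2a0'
    else
        printf '20'
    fi
}

default_case=$(leading_pad_hex 0 UTF-8)
opt_in_case=$(leading_pad_hex 1 UTF-8)
ascii_case=$(leading_pad_hex 1 ASCII)

echo "default:         $default_case"
echo "opt-in:          $opt_in_case"
echo "opt-in + ASCII:  $ascii_case"
```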
Default false so existing reference output stays bit-identical and no existing .reference files need to be regenerated. The SettingsChangesHistory.cpp entry is in the 26.5 block so the new setting is visible to settings-history consumers.
Test
Adds tests/queries/0_stateless/04123_pretty_nbsp_leading_padding.sh which uses od -An -v -tx1 to inspect the actual leading bytes of the output and asserts:
- default: 0x20 (ASCII space)
- opt-in: 0xC2 0xA0 (NBSP)
- opt-in + ASCII charset: 0x20 (NBSP suppressed)
Closes #95122
