feat: paddy-format — lisp-style closing-bracket formatter #684

Merged
paddymul merged 27 commits into main from feat/paddy-format on Apr 30, 2026

paddymul (Collaborator) commented Apr 29, 2026

Summary

Adds scripts/paddy_format.py — a libcst-based formatter that rewrites Python source toward compact lisp-style brackets:

  1. Collapse on trailing comma — multiline bracket groups whose collapsed form fits in 120 chars become a single line; trailing comma is dropped. (Inverse of Black's magic trailing comma.)
  2. Stack the close — multiline groups without a trailing comma keep their layout, but the closing ) ] } moves up to the previous line.
  3. Empty multiline → flat: func(\n) becomes func(), same for [\n] / {\n}.
  4. Wrap at 120 — single-line groups that exceed 120 chars are greedy-packed; continuation lines align one column past the open bracket.

A comment anywhere in the affected whitespace blocks all four transforms: the formatter never absorbs a comment. Idempotent. Returns input unchanged on syntax errors.

Bracket types covered: Call, List, Set, Dict, Tuple, FunctionDef.params, parenthesized ImportFrom (collapse + stack); Call, List, Set, Dict (wrap). Extending wrap to FunctionDef/Tuple/ImportFrom is a follow-up.

Scope (intentionally narrow)

  • Tool only. Not wired into CI as --check. Not run across the codebase.
  • A separate PR will reformat buckaroo/ and tests/ and add paddy-format --check to LintPython.
  • Adds libcst as a dev dep.

CLI

uv run python scripts/paddy_format.py <files>           # rewrite in place
uv run python scripts/paddy_format.py --check <files>   # exit 1 if any would change
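A minimal sketch of the `--check` contract, assuming a hypothetical `format_source(str) -> str` callable stands in for the script's real transform (its internal names are not shown on this page). With `--check` it only reports and returns 1 if anything would change; without it, it rewrites changed files in place and returns 0.

```python
import argparse
import sys
from pathlib import Path
from typing import Callable, List


def run_cli(argv: List[str], format_source: Callable[[str], str]) -> int:
    """Rewrite files in place, or with --check only report whether any would change."""
    parser = argparse.ArgumentParser(prog="paddy_format.py")
    parser.add_argument("--check", action="store_true",
                        help="exit 1 if any file would change; write nothing")
    parser.add_argument("files", nargs="+", type=Path)
    args = parser.parse_args(argv)

    would_change = False
    for path in args.files:
        original = path.read_text()
        formatted = format_source(original)
        if formatted == original:
            continue  # already formatted
        would_change = True
        if args.check:
            print(f"would reformat {path}", file=sys.stderr)
        else:
            path.write_text(formatted)
    # Only --check turns "something would change" into a failing exit status.
    return 1 if args.check and would_change else 0
```

A script entry point would then be `sys.exit(run_cli(sys.argv[1:], format_source))`.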

Testing

  • 19 golden input/output cases covering each bracket type, collapse vs stack vs wrap, comments, syntax errors, idempotence.
  • Smoke-tested across all 92 Python files in buckaroo/: zero parse failures, zero idempotency failures.

Notes

  • uv.lock diff is large because current uv bumps the schema (revision 2 → 3, upload_time → upload-time). Real additions are libcst + transitive pyyaml-ft.

🤖 Generated with Claude Code

paddymul and others added 2 commits April 29, 2026 17:18
paddy-format will rewrite Python source so closing brackets ) ] }
stack on the previous line instead of dangling on their own (the
Black/ruff convention). This commit is the failing-test half of TDD:
golden input/output cases for Call/List/Dict/Tuple/Set/FunctionDef/
Import, plus idempotence and graceful-syntax-error tests. The
implementation is a stub passthrough so every interesting case
fails — fix lands in the next commit.

uv.lock churn includes incidental schema bump (revision 2 → 3,
upload_time → upload-time) from current uv; libcst + transitive
pyyaml-ft are the only real additions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-push ruff-format hook required this. Mostly blank lines after
docstrings. Once paddy-format is the canonical formatter for these
files we'll undo this.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
github-actions Bot (Contributor) commented Apr 29, 2026

📦 TestPyPI package published

pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.13.5.dev25181081785

or with uv:

uv pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.13.5.dev25181081785

MCP server for Claude Code

claude mcp add buckaroo-table -- uvx --from "buckaroo[mcp]==0.13.5.dev25181081785" --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo-table

📖 Docs preview

🎨 Storybook preview

Walks the CST and rewrites the whitespace immediately before each
closing bracket so that it stacks on the previous line. Two patterns:

  Type A — Call / FunctionDef params / ImportFrom: the dangling-close
  whitespace lives in `last_item.comma.whitespace_after` (or in
  `whitespace_after_arg` when there is no trailing comma). Drop the
  trailing comma and clear the post-arg whitespace.

  Type B — List / Set / Dict / Tuple-with-parens: the whitespace
  lives on the close-bracket node itself (rbracket / rbrace /
  rpar[0].whitespace_before). Drop the trailing comma and clear that
  whitespace.

Skipped when a comment lives in the affected whitespace — never
absorb a comment by stacking the close. Returns input unchanged on
syntax errors. Idempotent: a second pass is a no-op because the
whitespace is no longer a ParenthesizedWhitespace.

Also adds a CLI: `paddy-format file.py` rewrites in place,
`--check` exits 1 if any file would change.

Smoke-tested against all 92 .py files in the buckaroo package: every
file parses after the rewrite, every file is idempotent, 52 would
be lispified.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
paddymul marked this pull request as ready for review April 29, 2026 22:05

chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7370898c97

Comment thread on scripts/paddy_format.py (outdated)

Comment on scripts/paddy_format.py, lines +63 to +64:

if not params.params:
    return updated

P2: Handle function defs without regular params.params

Avoid returning early when params.params is empty: a multiline function signature can still carry closing-paren whitespace in its other parameter groups (for example, a signature with only keyword-only parameters, such as def f(*, a, b,\n):). In those cases this branch skips formatting entirely, so the tool does not apply its advertised closing-bracket rewrite to valid function definitions.

New rule (and inverse of Black): a trailing comma in a multiline
bracket group signals "this fits on a line" — collapse it. Updates
only the call_with_trailing_comma case for now; this commit is
purposefully red so the fix lands as a separate commit per TDD.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two-rule transformer per bracket group (Call args, Function params,
List/Set/Dict/Tuple literals, parenthesized ImportFrom):

  1. Trailing comma → collapse the whole group to one line, drop
     the trailing comma. Trailing comma is the "this fits" signal —
     the inverse of Black's magic-trailing-comma convention.
  2. No trailing comma + multiline → stack the close on the previous
     line (the existing behavior).

Comments still block both transforms — never absorb a comment by
moving whitespace. Single-element tuples (`(x,)`) keep their trailing
comma since it's semantic. Idempotent.
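The two rules can be modeled at the string level as below. This is an illustrative sketch, not the libcst implementation: `collapse` and `stack_close` are hypothetical names, the group is assumed to be multiline already, and the caller is assumed to know whether it has a trailing comma or a comment.

```python
from typing import List, Optional


def collapse(prefix: str, open_br: str, items: List[str], close_br: str,
             trailing_comma: bool, has_comment: bool,
             is_tuple: bool = False) -> Optional[str]:
    """Rule 1 for a multiline group: a trailing comma means 'this fits on one line'."""
    if not trailing_comma or has_comment:
        return None  # no collapse signal, or a comment we must not absorb
    body = ", ".join(items)  # rejoin items; the trailing comma is dropped
    if is_tuple and len(items) == 1:
        body += ","  # (x,) keeps its comma: it is what makes x a tuple
    return f"{prefix}{open_br}{body}{close_br}"


def stack_close(lines: List[str], close_br: str, has_comment: bool) -> List[str]:
    """Rule 2: no trailing comma, so keep the layout but pull a dangling close up a line."""
    if has_comment or len(lines) < 2 or not lines[-1].lstrip().startswith(close_br):
        return lines
    return lines[:-2] + [lines[-2].rstrip() + lines[-1].lstrip()]
```

Running `stack_close` on its own output returns it unchanged, because the last line no longer starts with the close bracket; that is the same reason the real pass is idempotent.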

Updates the other golden test cases to expect the collapsed form,
matching the call_with_trailing_comma case set up in the previous
commit.

Smoke-tested across all 92 .py files in the buckaroo package: zero
parse failures, zero idempotency failures, 54 files would change.
Note that without a line-length budget, some long signatures
collapse to very long single lines — a length cap is a reasonable
follow-up if real use surfaces problems.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
paddymul and others added 2 commits April 29, 2026 22:02
func(\n) should become func(). Failing test, fix lands next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
func(\n) → func(), [\n] → [], {\n} → {}, def f(\n): → def f():.
The trailing-comma rule doesn't apply (there's no comma), so this
is a separate path: empty body + multiline interior whitespace =>
flatten to single line.
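A sketch of the empty-group rule on raw text, assuming the caller has already split out the open bracket, the interior, and the close bracket (`flatten_empty` is a hypothetical name). A comment in the interior makes it non-blank, so the comment survives untouched.

```python
def flatten_empty(open_br: str, interior: str, close_br: str) -> str:
    """Flatten an empty group whose interior is just a line break plus indentation."""
    if "\n" in interior and interior.strip() == "":
        return open_br + close_br  # func(<newline>) -> func()
    # Non-empty (including a lone comment) or already flat: leave the text alone.
    return open_br + interior + close_br
```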

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four new golden cases:
- long_call_greedy_wrap: 203-char call wraps greedily; continuation
  lines align one column past the open paren.
- multiline_collapse_target_too_long_wraps_instead: trailing-comma
  multiline whose collapsed form exceeds 120 — wraps greedily,
  trailing comma is dropped (the collapse rule loses to the budget).
- long_list_greedy_wrap: 162-char list wraps greedily.
- unsplittable_single_arg_overflows: a single arg longer than 120
  has nothing to break on; line stays as-is.

Three of the four are red against the current implementation —
fix lands next commit. The fourth passes incidentally because the
current code returns the input unchanged for that shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a second pass that runs after the existing collapse / stack /
empty-flatten transforms. The wrap pass:

  1. Re-parse the source with libcst + PositionProvider metadata.
  2. Find every wrappable bracket group (Call, List, Set, Dict with
     >= 2 items) whose containing line exceeds the budget.
  3. Pick the outermost (leftmost start column, earliest line).
  4. Greedy-pack: lay items left-to-right at column (open_bracket+1);
     break the line whenever adding the next item with its trailing
     comma would push past 120.
  5. Continuation lines are aligned with the column right after the
     open bracket — the lispy style discussed.
  6. Repeat until no more over-budget lines are wrappable.
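Steps 4 and 5 amount to a greedy packer like the one below. It is a hypothetical `greedy_pack` working on item source strings, not the libcst-based code in scripts/paddy_format.py. Passing `cont_col=len(head)` gives the align-one-past-the-open-bracket layout, and a lone item that is too long is emitted as-is, which mirrors the unsplittable-case fallback this commit describes.

```python
from typing import List

BUDGET = 120


def greedy_pack(head: str, items: List[str], close_br: str,
                cont_col: int, budget: int = BUDGET) -> List[str]:
    """Lay items left to right after `head` (the text up to and including the open bracket).

    Each item carries its trailing comma (or the close bracket if it is last);
    a new line starts at `cont_col` whenever the next piece would overflow.
    """
    lines: List[str] = []
    line, placed = head, 0
    for i, item in enumerate(items):
        piece = item + ("," if i < len(items) - 1 else close_br)
        candidate = line + (" " if placed else "") + piece
        if placed and len(candidate) > budget:
            lines.append(line)  # break before the piece that would overflow
            line, placed = " " * cont_col + piece, 1
        else:
            line, placed = candidate, placed + 1
    lines.append(line)
    return lines
```

An over-long first item is never broken, because a break is only taken once at least one item sits on the current line.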

Trailing-comma collapse from pass 1 still wins when the collapsed
form fits in 120 chars; when it doesn't, pass 2 breaks the line and
the trailing comma stays dropped.

Falls back gracefully on unsplittable cases (single arg longer than
the budget — the line stays over-budget).

This commit handles Call/List/Set/Dict. FunctionDef params, Tuple,
and parenthesized ImportFrom remain on the table as follow-ups —
they're not covered by the test fixtures yet.

All 19 tests pass. Smoke-tested against the buckaroo package: 92
files parse, 92 idempotent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A multiline Call whose continuation line sits at column 0 (legal
inside parens, visually broken) should be re-indented to
original_indent + 4 spaces. Trailing whitespace after the comma is
cleaned up too.

This is a new transformation, distinct from wrap-at-120 (which
aligns continuation with the open paren). Filed as a failing test;
implementation deliberately not in this commit so we can talk about
the apparent conflict with the existing wrap style first — see PR
discussion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Unifies wrap and re-indent under one rule per discussion: continuation
lines always sit at line_indent + 4, never aligned with the open
bracket. Updates the existing wrap test fixtures to expect the new
indent (col 4 instead of cols 28 / 11). Implementation follows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related changes that share a single rule: continuation lines of
any multi-line bracket group sit at line_indent + 4.

  1. New _reindent_pass: walks every multi-line Call/List/Set/Dict and
     normalizes its continuation whitespace to line_indent + 4.
     ParenthesizedWhitespace.indent is forced to False so the indent
     count is exactly the last_line value (libcst would otherwise add
     the parent statement's indent on top, breaking idempotency on the
     second run).

  2. Wrap pass updated: greedy_pack now takes separate first_col and
     continuation_col. first_col stays "right after the open bracket"
     (where line 1 places its first item); continuation_col is
     line_indent + 4 (same rule as re-indent). Continuation lines no
     longer align with the open bracket.
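A text-level sketch of the line_indent + 4 rule, assuming `lines` holds one bracket group's start line followed by its continuation lines. It ignores nesting (the real pass edits libcst `ParenthesizedWhitespace` nodes), and it leaves a line that begins with a closing bracket at its original column, which matches the later fix in this PR rather than this commit. `reindent_group` is a hypothetical name.

```python
from typing import List


def reindent_group(lines: List[str], closers: str = ")]}") -> List[str]:
    """Put every continuation line of one bracket group at line_indent + 4."""
    first = lines[0]
    indent = len(first) - len(first.lstrip(" "))  # indent of the group's start line
    out = [first]
    for line in lines[1:]:
        body = line.strip()  # also drops trailing whitespace after commas
        if not body or body[0] in closers:
            out.append(line)  # blank lines and a dangling close keep their column
        else:
            out.append(" " * (indent + 4) + body)
    return out
```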

Idempotency assertion added to every fixture in
test_paddy_format_golden — every case is now automatically checked
to be a no-op on a second pass.

20 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A new directive that, when placed on the line above an assignment,
formats the value as a column table — one element per row, decimal
points / least-significant digits aligned vertically.

Three failing fixtures:
- single-col floats: decimal points line up
- mixed ints + floats: ints align with the column where the decimal
  point would have been (least-significant digit)
- multi-col tuples: each tuple position is an independent column,
  right-aligned

Design choices baked in (push back if wrong):
- Directive syntax: comment `# table-format` on the line above.
- Output: trailing comma kept, close bracket on its own line at the
  original indent (Black-style data block) — overrides the usual
  trailing-comma-collapse rule when the directive is present.
- Per-column right-alignment of the integer part. Inter-column
  separator stays a plain ", "; tests use examples where col widths
  happen to align — the right-pad-vs-no-trailing-space tradeoff for
  variable col widths is deliberately not in any fixture yet.

Implementation deferred to the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ings

Same fixtures, easier to read — no \n escapes. Matches the style of the
other parametrized cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous hand-typed fixture had a math error (29 input items vs 25
in expected) and inconsistent decimal strides. Regenerated
programmatically with path-1 rules:

  * Each cell is a uniform max_int_width + 1 + max_frac_width chars
    wide (here, 3 + 1 + 2 = 6).
  * Cells are left-padded for the integer part and right-padded for
    the fraction part. Cells like " 45.6 " have a trailing space
    inside before the comma — the cost of strict cross-row alignment.
  * Continuation indent = position right after the open bracket
    (col 8 for "data = [") so the first decimal of each row sits
    at the same column.
  * Decimal stride across cells is exactly 8 chars (cell + ", ").

Input: 24 items (six full cycles of [1.23, 45.6, 7.89, 100.5]).
Output: 14 items on row 1 (length 119), 10 items on row 2
(length 87, ends with `]`).
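The path-1 cell rules can be checked with the sketch below. `format_cell` and `table_rows` are hypothetical names; the input is each number's source text, and negative numbers or exponents are not handled. Packing uses the same greedy rule as the wrap pass, with row 2+ starting at the column right after `[`. For the 24-item input it yields the 119- and 87-character rows quoted above, with every decimal point at a column congruent to 3 mod 8.

```python
from typing import List


def format_cell(text: str, max_int: int, max_frac: int) -> str:
    """Left-pad the integer part and right-pad the fraction so decimals share a column."""
    int_part, dot, frac = text.partition(".")
    cell = int_part.rjust(max_int)
    if max_frac:
        # An int in a float column gets blanks where ".xx" would have been.
        cell += (dot + frac).ljust(1 + max_frac)
    return cell


def table_rows(head: str, numbers: List[str], budget: int = 120) -> List[str]:
    """Format uniform cells, then pack them greedily; row 2+ starts right after '['."""
    max_int = max(len(n.partition(".")[0]) for n in numbers)
    max_frac = max(len(n.partition(".")[2]) for n in numbers)
    cells = [format_cell(n, max_int, max_frac) for n in numbers]
    rows: List[str] = []
    line, placed = head, 0
    for i, cell in enumerate(cells):
        piece = cell + ("," if i < len(cells) - 1 else "]")
        candidate = line + (" " if placed else "") + piece
        if placed and len(candidate) > budget:
            rows.append(line)  # next cell would overflow: start a new row
            line, placed = " " * len(head) + piece, 1
        else:
            line, placed = candidate, placed + 1
    rows.append(line)
    return rows
```

Here `table_rows("data = [", ["1.23", "45.6", "7.89", "100.5"] * 6)` puts 14 cells on row 1 and 10 on row 2.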

Implementation pending — multi_col_tuples fixture left as-is for now;
will revisit once the rule for "directive on a short list" is settled.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per option (1) — directive is a wrap-time hint, not a force-expand.
Short list with `# table-format` stays on a single line; only the
long-input variants get the table layout.

Changes:
- table_format_multi_col_tuples: now a no-op (input == expected).
- table_format_multi_col_tuples_wrap: new fixture, 12 tuples,
  single-line form exceeds 120 chars, expected output is one tuple
  per line with cells aligned across rows. Continuation indent is
  the standard line_indent + 4 (every row has the same shape, so
  cross-row alignment is automatic).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a `# table-format` (or `#table-format`) comment sits on the
line immediately above an assignment whose value is a List, and that
list's single-line form would exceed the 120-char budget, the list
is laid out as a column table instead of greedy-wrapped.

Two shapes:

  Single-column list of numbers — uniform cells of width
  max_int_width + 1 + max_frac_width. Integer parts are left-padded
  and fraction parts are right-padded, so decimals (or the position
  the decimal would occupy for an int) line up at a fixed offset
  within every cell. Cells are packed greedily into rows; the
  continuation column for row 2+ is the column right after `[`,
  so decimals also line up across rows at exactly an 8-char stride.
  Trailing spaces inside cells before commas are accepted as the
  cost of strict alignment.

  Multi-column list of tuples — each tuple gets its own row at the
  standard line_indent + 4 continuation. Each tuple position is an
  independent column with its own padding; cells inside a row line
  up across rows. Output is data-block style: trailing comma after
  every tuple, close bracket on its own line at the original
  statement indent.

Detection: `# table-format` may live in the module header (for the
first statement) or in a statement's leading_lines (for subsequent
statements, including those nested in IndentedBlocks). Comment text
is matched permissively (with or without space after `#`).

Implementation runs as the final pass in paddy_format(), so it
overrides any prior transforms — Pass 1's collapse rule, Pass 2's
re-indent, and Pass 3's wrap will all touch a directive-marked list
on a re-parse, but Pass 4 always reasserts the table layout, so the
final output is idempotent. Path-1 alignment chosen per the design
discussion in the PR.

25/25 unit tests pass including the per-fixture idempotency check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two minimal fixtures distilled from buckaroo files that the smoke
test flagged as non-idempotent:

- idempotent_outer_call_continuation_shifts_inner_dict (from
  pluggable_analysis_framework/safe_summary_df.py): an outer Call
  with a continuation row, containing a Dict whose inner key sits
  at a column that's tied to the OLD outer continuation. After the
  outer continuation gets re-indented to line_indent+4, the inner
  key still references the old column, and a second pass is needed
  to re-resolve.

- idempotent_nested_list_inside_dict_value (from ddd_library.py):
  same root cause — Pass 1 collapses an outer trailing-comma block,
  shifting the line that holds the start of an inner List. The
  inner List's continuation lines are computed off the inner List's
  start line, which moves between passes.

Both expected outputs are the steady-state (run-2) form. With the
current implementation, paddy_format(input) != steady_state, so the
golden assertion fails AND the in-fixture idempotency check would
fail. Fix follows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two interacting passes were each idempotent on their own but together
required a second run to settle:

  1. _reindent_pass uses the *current* line indent of each multi-line
     bracket group's start line. When an outer bracket is re-indented,
     an inner bracket's start line shifts, but the inner indent was
     already computed off the old line.

  2. _wrap_pass on a long single-line collapsed list (Pass 1's output)
     turns it multi-line, which moves the start line of any nested
     multi-line value. The next re-indent run then sees a different
     line_indent and produces a different continuation column.

Two changes:

  * _reindent_pass now wraps a single sweep (_reindent_pass_once) in
    a fixed-point loop. Handles purely nested re-indent cascades.

  * paddy_format() now loops re-indent + wrap + table-format until
    the source stops changing. Handles the cross-pass interaction
    where wrap reveals a new line_indent for an inner group.
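The outer loop can be sketched as a generic fixed-point driver. `run_to_fixed_point` is a hypothetical name, the passes stand in for re-indent, wrap and table-format, and the `max_rounds` guard is an assumption added so that oscillating passes fail loudly instead of looping forever.

```python
from typing import Callable, Sequence

Pass = Callable[[str], str]


def run_to_fixed_point(source: str, passes: Sequence[Pass], max_rounds: int = 20) -> str:
    """Apply every pass in order, repeating whole rounds until one changes nothing."""
    for _ in range(max_rounds):
        before = source
        for run_pass in passes:
            source = run_pass(source)
        if source == before:
            return source  # steady state: a further run would be a no-op
    raise RuntimeError(f"passes did not settle within {max_rounds} rounds")
```

A toy pass such as `lambda s: s.replace("aa", "a")` turns `"aaaa"` into `"aa"` on one run, so it is not idempotent by itself; the driver keeps going until it reaches `"a"`.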

Both red repros (idempotent_outer_call_continuation_shifts_inner_dict
and idempotent_nested_list_inside_dict_value) now pass. Smoke test
across all 92 buckaroo .py files: 60 would change, 0 non-idempotent,
0 parse failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two additions to lock down current table-format behavior on number
shapes the existing fixtures didn't cover:

- table_format_ints_only_wrap: max_int_width = 5, max_frac = 0,
  cell width 5. Cells right-aligned to width 5; least-significant
  digits line up across rows at a 7-char stride.

- table_format_mixed_ints_floats_wrap: max_int_width = 2,
  max_frac_width = 3, cell width 6. Ints in a float column are
  padded out with trailing spaces to fill the cell (so "1" renders
  as " 1    ": one leading space, the digit, then four trailing
  spaces for the missing ".XXX"). Decimal column lines up at offset 2
  within every cell.

Both pass against the current implementation — they're regression
locks, not red repros. Idempotency assertion in test_paddy_format_golden
covers them automatically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a failing fixture for `# table-format` applied to a list of
dicts that share keys and have numeric values. Per the design
discussion: each dict < 100 chars, no nested dicts. Each key
becomes a column; values per column are decimal-aligned with
uniform cell widths so the keys themselves line up across rows.

Currently `_TableFormatter.leave_List` only routes to the multi-col
path when every element is a `cst.Tuple`; dict elements fall
through to the single-col / no-op branches. The implementation to
make this pass follows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the directive's target is a list of Dicts that share the same
keys in the same order, lay it out one dict per line with values
in each column decimal-aligned (uniform cells, leading int-pad +
trailing frac-pad). Each key — `'a':`, `'b':` — therefore lines
up across rows because the preceding cells are all the same width.

Constraint per the design discussion: each dict has to be a flat
mapping of atom keys (strings or names) to atom values (numbers or
tuples of numbers). Mismatched keys, nested dicts, or non-atom
values make the function return None, so the list falls through to
the no-op branch in `_TableFormatter.leave_List`.

`_atom_text` extended to render Dict / SimpleString / Name so the
budget check (`_list_compact_length`) knows what a flat dict costs.

The leading int-pad of each value lives in `whitespace_after_colon`
(default one space + the leading-pad width); the trailing frac-pad
lives in the comma's `whitespace_before` for non-last columns and
in the dict's `rbrace.whitespace_before` for the last column. Outer
list uses the same data-block style as the multi-col tuples path:
trailing comma after every dict, close bracket on its own line.
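A text-level sketch of the dict-row layout, assuming each dict arrives as a list of `(key_text, value_text)` pairs with plain numeric values (the tuple-of-numbers values mentioned above are not handled). `dict_table` is a hypothetical name. Because every value cell in a column has the same width, each key starts at the same column in every row; mismatched keys return `None`, mirroring the fall-through described above.

```python
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[str, str]  # (key source text, numeric value source text)


def _cell(text: str, max_int: int, max_frac: int) -> str:
    int_part, dot, frac = text.partition(".")
    cell = int_part.rjust(max_int)  # leading int-pad
    return cell + (dot + frac).ljust(1 + max_frac) if max_frac else cell  # trailing frac-pad


def dict_table(target: str, rows: Sequence[Sequence[Pair]]) -> Optional[str]:
    """One dict per row; each key is a column whose values are decimal-aligned."""
    if not rows:
        return None
    keys = [k for k, _ in rows[0]]
    if any([k for k, _ in row] != keys for row in rows):
        return None  # keys must match, in the same order, in every dict
    widths: List[Tuple[int, int]] = []
    for j in range(len(keys)):
        values = [row[j][1] for row in rows]
        widths.append((max(len(v.partition(".")[0]) for v in values),
                       max(len(v.partition(".")[2]) for v in values)))
    lines = [f"{target} = ["]
    for row in rows:
        cells = [f"{k}: {_cell(v, mi, mf)}" for (k, v), (mi, mf) in zip(row, widths)]
        lines.append("    {" + ", ".join(cells) + "},")  # data-block: trailing comma per row
    lines.append("]")  # close bracket on its own line
    return "\n".join(lines)
```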

30/30 unit tests pass with idempotency. Smoke-tested across all 92
buckaroo .py files: 0 non-idempotent, 0 parse failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
paddymul and others added 3 commits April 30, 2026 13:28
…y collapse

Four parametrized cases that all fail today:

- comment_in_args_blocks_collapse_close_stays_at_col0: comment-protected
  collapse leaves the source as-is, but the reindent pass still drifts
  the close bracket from col 0 to col 4.
- kwonly_only / kwonly_after_regular / posonly: _collapse_funcdef only
  walks Parameters.params, ignoring posonly_params, kwonly_params, and
  star_arg, so signatures using `*` or `/` get partial / broken collapses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- test_table_format_directive_outside_top_level_is_ignored: pin that
  `# table-format` only fires on top-level / IndentedBlock Assign+List;
  a directive in front of a list nested inside a Call argument is a
  no-op by design.
- test_cli_*: --check returns 1 when changes needed, 0 otherwise; the
  unflagged form rewrites in place and handles a multi-file run.
- test_paddy_format_smoke_on_buckaroo: parametrized round-trip over
  every `buckaroo/**/*.py` file, asserting the output still parses and
  is idempotent. Marked `slow` (~30s for ~90 files); CI's
  `-m "not slow"` skips it. Run locally with `pytest -m slow`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five collapse paths (Call, FunctionDef, Collection, Tuple, ImportFrom)
shared the same shape: validate trailing comma + cleanliness of every
whitespace slot, then rebuild items with `, ` separators and an empty
post-item whitespace. Pull the shape into two module-level helpers and
let each method shrink to ~10 lines. Tuple's single-element trailing-
comma preservation is now a flag (`preserve_singleton_comma=True`).

No behavior change — all existing tests still pass, the four RED
parametrize cases for the comment+reindent and kwonly/posonly bugs
still fail the same way.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
paddymul and others added 2 commits April 30, 2026 13:58
…lapse

Two fixes for the RED tests in 06557d0:

1. _Reindenter no longer reindents the LAST item's comma.whitespace_after
   on a Call / List / Set / Dict. That whitespace sits before the close
   bracket, not before another item — when a comment had blocked the
   collapse pass and left the close on its own line at the user's chosen
   column, the reindent was relocating it to indent+4. Skip the last
   item's comma; whitespace_before for the close stays put.

2. _collapse_funcdef now walks every parameter slot — posonly_params,
   posonly_ind (`/`), params, star_arg (`*` or `*args`), kwonly_params,
   star_kwarg (`**kwargs`) — instead of just `params.params`. New
   `_iter_param_slots` helper yields (kind, item) in source order; the
   collapse rebuilds Parameters with `, ` separators on every non-last
   slot and DEFAULT on the last. Signatures using `*` or `/` now
   collapse fully instead of producing partial multi-line layouts.
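The slot order can be illustrated with the standard-library `ast` module standing in for libcst: `ast.arguments` keeps positional-only, regular, `*args`, keyword-only and `**kwargs` parameters in separate fields, much like libcst's `Parameters`. This hypothetical `iter_param_slots` re-inserts the implicit `/` and bare `*` markers. The sketch keeps only names and drops defaults and annotations, so it shows ordering, not a full collapse.

```python
import ast
from typing import Iterator, Tuple


def iter_param_slots(args: ast.arguments) -> Iterator[Tuple[str, str]]:
    """Yield (kind, text) for every parameter slot in source order, markers included."""
    for a in args.posonlyargs:
        yield "posonly", a.arg
    if args.posonlyargs:
        yield "posonly_ind", "/"
    for a in args.args:
        yield "param", a.arg
    if args.vararg:
        yield "star_arg", "*" + args.vararg.arg
    elif args.kwonlyargs:
        yield "star_arg", "*"  # bare * that introduces keyword-only params
    for a in args.kwonlyargs:
        yield "kwonly", a.arg
    if args.kwarg:
        yield "star_kwarg", "**" + args.kwarg.arg


def collapsed_header(src: str) -> str:
    """One-line header built from slot names only (defaults/annotations dropped)."""
    fn = ast.parse(src).body[0]
    names = [text for _, text in iter_param_slots(fn.args)]
    return f"def {fn.name}({', '.join(names)}):"
```

For `def f(*, a, b,\n):` the regular `args` list is empty, which is exactly the case the earlier review comment flagged; walking every slot still yields `def f(*, a, b):`.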

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
paddymul added this pull request to the merge queue Apr 30, 2026
Merged via the queue into main with commit 4002406 Apr 30, 2026
27 checks passed