add support for formatting reStructuredText code snippets #9003

Merged

BurntSushi merged 6 commits into main from ag/fmt/rest

Dec 5, 2023

Member

BurntSushi commented Dec 5, 2023

(This is not possible to actually use until #8854 is merged.)

ruff_python_formatter: add reStructuredText docstring formatting support

This commit makes use of the refactoring done in prior commits to slot
in reStructuredText support. Essentially, we add a new type of code
example and look for both literal blocks and code block directives.
Literal blocks are treated as Python by default because it seems to be a
common practice.

That is, literal blocks like this:

def example():
    """
    Here's an example::

        foo( 1 )

    All done.
    """
    pass

Will get reformatted. And code blocks (via reStructuredText directives)
will also get reformatted:

def example():
    """
    Here's an example:

    .. code-block:: python

        foo( 1 )

    All done.
    """
    pass

When looking for a code block, it is possible for it to become invalid.
In which case, we back out of looking for a code example and print the
lines out as they are. As with doctest formatting, if reformatting the
code would result in invalid Python or if the code collected from the
block is invalid, then formatting is also skipped.
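
For illustration, the detection described above can be approximated in a few lines of Python. This is only a sketch of the idea, not the Rust code in docstring.rs: the names DIRECTIVE, indent_of and find_rst_code are hypothetical, only the `code-block:: python` spelling shown above is recognized, and the real handling of mixed indentation and of blocks that become invalid is not modeled.

```python
import re

# Hypothetical pattern: matches only the directive spelling used in this PR text.
DIRECTIVE = re.compile(r"^\s*\.\.\s+code-block::\s+python\s*$")


def indent_of(line):
    """Count leading spaces (tabs are ignored in this simplified sketch)."""
    return len(line) - len(line.lstrip(" "))


def find_rst_code(lines):
    """Return (start, end) index ranges of candidate Python snippets.

    A snippet starts after a line ending in `::` (literal block) or after a
    `.. code-block:: python` directive, and runs for as long as the lines are
    blank or indented more deeply than the opening line.
    """
    blocks = []
    i = 0
    while i < len(lines):
        line = lines[i]
        is_literal = line.rstrip().endswith("::") and not line.lstrip().startswith("..")
        if not (DIRECTIVE.match(line) or is_literal):
            i += 1
            continue
        opener_indent = indent_of(line)
        j = i + 1
        # Skip the blank line(s) separating the opener from the code.
        while j < len(lines) and not lines[j].strip():
            j += 1
        start = j
        while j < len(lines) and (not lines[j].strip() or indent_of(lines[j]) > opener_indent):
            j += 1
        # Trailing blank lines belong to the surrounding prose, not the snippet.
        end = j
        while end > start and not lines[end - 1].strip():
            end -= 1
        if end > start:
            blocks.append((start, end))
        i = max(j, i + 1)
    return blocks
```

A formatter built on this would hand `lines[start:end]` (dedented) to the Python formatter and fall back to printing the original lines if the snippet does not parse.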

A number of tests have been added to check both the formatting and
resetting behavior. Mixed indentation is also tested a fair bit, since
one of my initial attempts at dealing with mixed indentation ended up
not working.

I recommend working through this PR commit-by-commit. There is in
particular a somewhat gnarly refactoring before reST support is added.

Closes #8859

BurntSushi added docstring formatter labels

BurntSushi requested review from MichaReiser and charliermarsh

December 5, 2023 00:11

BurntSushi force-pushed the ag/fmt/rest branch from 782586b to 9e0f147 Compare

December 5, 2023 00:12

BurntSushi added this to the Formatter: Stable milestone

Contributor

github-actions bot commented Dec 5, 2023

`ruff-ecosystem` results

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

MichaReiser approved these changes

Member

MichaReiser left a comment

Nice work.

My only concern is that I find it difficult to assess how the changes introducing the action queue impact performance.

I think we can get a good answer to this by simply enabling docstring formatting for our benchmarks until we find the time to add dedicated docstring tests. This gives us at least an idea of how expensive the feature is for docstrings that do not contain code examples.

crates/ruff_python_formatter/src/expression/string/docstring.rs

+                          self.kind = Some(CodeExampleKind::Rst(litblock));
+                          queue.push_back(CodeExampleAddAction::Print { original });
+                      } else {
+                          queue.push_back(CodeExampleAddAction::Print { original });

Member

MichaReiser Dec 5, 2023

Does this mean that we'll push to the VecDeque even in the case where the docstring contains no code examples? I would be interested in understanding the performance implications of allocating here.

Member Author

BurntSushi Dec 5, 2023

Yeah, it will. Although it will push to the queue for every line, it should only allocate the first time. Even in the case where a code example is present, I don't expect the queue to ever grow beyond a couple of elements, so the allocation should be amortized. But it's only amortized within each docstring.

It does indeed look like the happy path in the status quo does not allocate. I still perceive this to likely be a marginal cost, but I'll see about adding some benchmarks to this PR.

crates/ruff_python_formatter/src/expression/string/docstring.rs

+                          // Pad to the next multiple of tab_width
+                          seen_indent_len += 8 - (seen_indent_len.rem_euclid(8));
+                          line = &line[1..];
+                      } else if char.is_whitespace() {

Member

MichaReiser Dec 5, 2023

Python only supports \t and space as valid indentation characters (and \f form feeds, which reset the indentation). It's unclear to me whether supporting all whitespace is okay because we're inside a docstring, or whether we should limit it to Python whitespace only. (The same applies to indent_with_suffix; we have a trim_whitespace_start helper to do so.)

Member Author

BurntSushi Dec 5, 2023

Hmmm. I see. I hadn't known about this or the trim_whitespace_start helper.

One possible issue I see here is that the docstring whitespace normalization uses just the bare trim_start() and trim_end() routines in places. Is that correct, or should those be switched to the Python-specific definition too? Same deal with indentation_length. It uses char::is_whitespace from std, which covers all the Unicode forms of whitespace too.

In any case, I've updated my uses of whitespace to trim_whitespace_start and is_python_whitespace. (Probably that module should get re-tooled a bit. i.e., Add an extension trait for char and rename trim_whitespace_start and assorted methods to trim_python_start.)
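
To make the distinction concrete, here is a small Python sketch of measuring indentation with the Python-only definition of whitespace, with tabs padded to the next multiple of 8 as in the hunk above. The function names mirror the helpers mentioned in this thread, but the bodies are illustrative assumptions rather than the Rust implementation; in particular, treating a form feed as resetting the width follows the review comment and may not match what the formatter does.

```python
PYTHON_WHITESPACE = " \t\x0c"  # space, tab, form feed


def is_python_whitespace(ch):
    """True only for the characters Python itself accepts in indentation."""
    return ch in PYTHON_WHITESPACE


def trim_whitespace_start(line):
    """Strip leading Python whitespace, leaving other Unicode spaces alone."""
    return line.lstrip(PYTHON_WHITESPACE)


def indentation_width(line, tab_width=8):
    """Width of the leading indentation, expanding tabs to tab stops."""
    width = 0
    for ch in line:
        if ch == "\t":
            # Pad to the next multiple of tab_width.
            width += tab_width - (width % tab_width)
        elif ch == "\x0c":
            # Assumption: a form feed resets the indentation count.
            width = 0
        elif ch == " ":
            width += 1
        else:
            break
    return width
```

Note that `"\u00a0".isspace()` is true in Python, so a check based on general Unicode whitespace would count a no-break space as indentation, while `is_python_whitespace` would not.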

Member

MichaReiser commented Dec 5, 2023

@zanieb should we remove the formatter label from the PR, considering that the functionality isn't available yet or will we remove the changelog entry manually?

Member

charliermarsh commented Dec 5, 2023

@MichaReiser - We clean the changelogs manually, so no issue keeping the label on there.

BurntSushi added 4 commits

December 5, 2023 09:06


          ruff_python_formatter: fix bug in doctest formatting

8a9a877

This fixes a bug where if the very last line in a docstring is a
doctest, then it wasn't getting formatted.

We do a little more than fix that bug in this commit. In particular, we
industrialize how we handle actions by using a queue. Instead of trying
to do too much with each action, we make each one a bit simpler and
build the infrastructure necessary to permit more arbitrary composition.
This in particular lets us handle weird corner cases like "the code
example and the docstring end at the same time" without much fuss.

In essence, actions like Format and Reset no longer carry the "original"
line. Instead of doing that, we generate two actions: the first to
format and the second (if necessary) to print the original line.

This infrastructure will become more apparently useful when dealing with
reStructuredText blocks.
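
A rough Python sketch of the queue idea, for readers who want to see the shape of it. Everything here is hypothetical (add_line, finish, drain, run, and the tuple-shaped actions); the "format" action merely records the collected code instead of formatting it. The point is only that each line produces zero or more small actions, and that a final flush handles the case where the docstring ends while a doctest is still being collected.

```python
from collections import deque


def add_line(line, pending, queue):
    """Translate one docstring line into zero or more simple actions."""
    stripped = line.lstrip()
    if stripped.startswith(">>> "):
        # Still inside a doctest: collect the code, emit nothing yet.
        pending.append(stripped[4:])
        return
    # The doctest (if any) has ended: first format it, then print this line.
    finish(pending, queue)
    queue.append(("print", line))


def finish(pending, queue):
    """Flush a code example that is still being collected."""
    if pending:
        queue.append(("format", list(pending)))
        pending.clear()


def drain(queue, out):
    """Run every queued action; here that just means recording it."""
    while queue:
        out.append(queue.popleft())


def run(lines):
    pending, queue, out = [], deque(), []
    for line in lines:
        add_line(line, pending, queue)
        drain(queue, out)
    # The docstring may end in the middle of a code example.
    finish(pending, queue)
    drain(queue, out)
    return out
```

Because "format" and "print" are separate actions, the end-of-docstring case needs no special handling beyond the final flush.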


          ruff_python_formatter: simplify format's contract

6483d89

There was really no good reason to have it panic if the code block given
is empty. Instead, we can just return `None` and everything will get
handled correctly. (It will turn into a `Reset` with zero lines, which
will in turn be a no-op.)

In practice I do believe empty code blocks are not possible, but I felt
like this makes the code a bit more robust.


          ruff_python_formatter: force space indent in code snippets

6cebe63

The comment in the source code should explain, but basically,
if the user has tab indentation enabled, then this winds up
screwing with code snippet formatting. The essential problem
seems to be that docstring normalization strips tabs, and this
interplay between code snippet reformatting with tabs winds up
making formatting non-idempotent.

We justify this hard-coded option by pointing out that tabs
get stripped as part of docstring normalization anyway.


          ruff_python_formatter: group imports

148372c

Small style tweak.

. o O { too bad `imports_granularity` isn't stable in rustfmt yet }

BurntSushi force-pushed the ag/fmt/rest branch from 9e0f147 to 9769bf6 Compare

December 5, 2023 16:36

BurntSushi added 2 commits

December 5, 2023 11:46


          ruff_python_formatter: add reStructuredText docstring formatting support

e53053e

This commit makes use of the refactoring done in prior commits to slot
in reStructuredText support. Essentially, we add a new type of code
example and look for *both* literal blocks and code block directives.
Literal blocks are treated as Python by default because it seems to be a
common practice[1].

That is, literal blocks like this:

```
def example():
    """
    Here's an example::

        foo( 1 )

    All done.
    """
    pass
```

Will get reformatted. And code blocks (via reStructuredText directives)
will also get reformatted:

```
def example():
    """
    Here's an example:

    .. code-block:: python

        foo( 1 )

    All done.
    """
    pass
```

When looking for a code block, it is possible for it to become invalid.
In which case, we back out of looking for a code example and print the
lines out as they are. As with doctest formatting, if reformatting the
code would result in invalid Python or if the code collected from the
block is invalid, then formatting is also skipped.

A number of tests have been added to check both the formatting and
resetting behavior. Mixed indentation is also tested a fair bit, since
one of my initial attempts at dealing with mixed indentation ended up
not working.

Closes #8859

[1]: adamchainz/blacken-docs#195


          ruff_python_formatter: add line breaks between fields

618e48d

This is what seems consistent with the prevailing code.

BurntSushi force-pushed the ag/fmt/rest branch from 9769bf6 to 618e48d Compare

December 5, 2023 16:47

Member Author

BurntSushi commented Dec 5, 2023

In lieu of micro-benchmarks, I decided to do a little ad hoc benchmarking with hyperfine on dagster.

Using this branch with a cherry-pick of #8854, I ran formatting with the default config and formatting with format-code-in-docstrings enabled:

$ hyperfine \
    --warmup 3 \
    --prepare 'git reset --hard master' \
    --cleanup 'git reset --hard master' \
    'ruff format --config /tmp/emptyruff.toml ./' \
    'ruff format --config /tmp/ruff.toml ./'
Benchmark 1: ruff format --config /tmp/emptyruff.toml ./
  Time (mean ± σ):      84.4 ms ±   1.8 ms    [User: 1084.2 ms, System: 155.3 ms]
  Range (min … max):    80.8 ms …  88.2 ms    16 runs

Benchmark 2: ruff format --config /tmp/ruff.toml ./
  Time (mean ± σ):      85.1 ms ±   2.1 ms    [User: 1104.4 ms, System: 160.0 ms]
  Range (min … max):    82.1 ms …  89.5 ms    16 runs

Summary
  ruff format --config /tmp/emptyruff.toml ./ ran
    1.01 ± 0.03 times faster than ruff format --config /tmp/ruff.toml ./

Where:

$ git remote -v
origin  git@github.com:dagster-io/dagster (fetch)
origin  git@github.com:dagster-io/dagster (push)

$ git rev-parse HEAD
dbb064c2ddda74265b8174edd9775e1302ca6ba0

$ cat /tmp/emptyruff.toml
$ cat /tmp/ruff.toml
[format]
format-code-in-docstrings = true

I ran it multiple times, and the same result occurred. So there is just a very slight observable slow-down here. But, this particular pile of code does have a large number of reStructuredText code blocks. So you'd expect it to possibly run a little more slowly since it is doing more work.
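
As a sanity check on how small the difference is, the summary line can be reproduced from the two means and standard deviations. The sketch below assumes ordinary first-order error propagation for a ratio of two independent measurements; that this is the formula behind the summary is an assumption, and relative_speed is a made-up helper name.

```python
import math


def relative_speed(mean_fast, sd_fast, mean_slow, sd_slow):
    """Ratio of two mean timings and its propagated uncertainty."""
    ratio = mean_slow / mean_fast
    # Relative errors of independent measurements add in quadrature.
    rel_err = math.sqrt((sd_fast / mean_fast) ** 2 + (sd_slow / mean_slow) ** 2)
    return ratio, ratio * rel_err


# Means and standard deviations (ms) from the two benchmarks above.
ratio, err = relative_speed(84.4, 1.8, 85.1, 2.1)
print(f"{ratio:.2f} ± {err:.2f}")  # prints "1.01 ± 0.03"
```

The uncertainty is several times larger than the measured 1% difference, so the slowdown sits within run-to-run noise.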

On a profile, I can see run_action_queue, but it is barely a blip. To be sure, I tweaked the top-level entry point for code snippet formatting to do this:

if !self.code_example.kind.is_none()
    || CodeExampleDoctest::new(line).is_some()
    || CodeExampleRst::new(line).is_some()
{
    self.code_example.add(line, &mut self.action_queue);
    self.run_action_queue()
} else {
    self.print_one(&line.as_output())
}

So basically, as long as we weren't already collecting a code example and the current line didn't look like the start of one, then we could avoid the queue and just print the line directly. I then baked this off against what I had:

$ hyperfine \
    --warmup 3 \
    --prepare 'git reset --hard master' \
    --cleanup 'git reset --hard master' \
    'ruff-rst-formatting-with-queue format --config /tmp/emptyruff.toml ./' \
    'ruff-rst-formatting-with-queue format --config /tmp/ruff.toml ./' \
    'ruff-rst-formatting-with-fast-path format --config /tmp/ruff.toml ./'
Benchmark 1: ruff-rst-formatting-with-queue format --config /tmp/emptyruff.toml ./
  Time (mean ± σ):      84.9 ms ±   2.2 ms    [User: 1077.6 ms, System: 163.4 ms]
  Range (min … max):    80.1 ms …  89.4 ms    17 runs

Benchmark 2: ruff-rst-formatting-with-queue format --config /tmp/ruff.toml ./
  Time (mean ± σ):      85.2 ms ±   2.5 ms    [User: 1119.1 ms, System: 148.7 ms]
  Range (min … max):    81.7 ms …  92.3 ms    17 runs

Benchmark 3: ruff-rst-formatting-with-fast-path format --config /tmp/ruff.toml ./
  Time (mean ± σ):      86.0 ms ±   2.2 ms    [User: 1098.4 ms, System: 163.7 ms]
  Range (min … max):    81.4 ms …  89.7 ms    16 runs

Summary
  ruff-rst-formatting-with-queue format --config /tmp/emptyruff.toml ./ ran
    1.00 ± 0.04 times faster than ruff-rst-formatting-with-queue format --config /tmp/ruff.toml ./
    1.01 ± 0.04 times faster than ruff-rst-formatting-with-fast-path format --config /tmp/ruff.toml ./

I ran this a few times and sometimes the fast path would be a hair faster and other times it would be flipped. Running without docstring formatting enabled at all was consistently faster by 1.00-1.01 times.

I think this satisfies me personally that the queue overhead is probably negligible, at least until we have some data suggesting otherwise. (I'm sure there are more synthetic benchmarks one could construct that might show a bigger difference.)

Member Author

BurntSushi commented Dec 5, 2023

Going to bring this in, but @MichaReiser feel free to leave more feedback and I'll address it in a follow-up PR. :-)

BurntSushi merged commit c48ba69 into main

17 checks passed

BurntSushi deleted the ag/fmt/rest branch

December 5, 2023 19:14

Member

MichaReiser commented Dec 5, 2023

Thanks for running the manual benchmark. This looks good to me.

sciyoshi mentioned this pull request

Formatter: Uses space indentation in docstrings when using indent-style = "tab" #8430

Closed

krehel mentioned this pull request

ruff 0.1.8 Homebrew/homebrew-core#157266

Merged
