Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split implicit concatenated strings before binary expressions #7145

Merged
merged 3 commits into from
Sep 8, 2023

Conversation

MichaReiser
Copy link
Member

@MichaReiser MichaReiser commented Sep 5, 2023

Summary

This PR improves Black compatibility by breaking implicit concatenated string before any binary expression.

(
    b + c + d + "aaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
    "bbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
    "cccccccccccccccccccccccccc" % aaaaaaaaaaaa + x
)

You can see how Black inserts line breaks between the implicit concatenated string parts but it keeps the binary expression operators and operands on the same line.

The challenge with implementing this formatting is that it is necessary that the outermost group is the group around the implicit concatenated strings but this doesn't match our AST structure: The strings are leaf nodes in the AST. See the tree for a very basic example a + 'b' 'c'

graph
	+ --> a
	+ --> A
    A["'b' 'c'"]
Loading

The existing implementation used a custom layout for when the implicit concatenated string is on the left side but it isn't able to handle implicit concatenated strings on the right or in nested binary expressions.

This PR solves this by flattening the binary expression. From the code documentation

/// Binary chain represented as a flat vector where operands are stored at even indices and operators
/// add odd indices.
///
/// ```python
/// a + 5 * 3 + 2
/// ```
///
/// Gets parsed as:
///
/// ```text
/// graph
/// +
/// ├──a
/// ├──*
/// │   ├──b
/// │   └──c
/// └──d
/// ```
///
/// The slice representation of the above is closer to what you have in source. It's a simple sequence of operands and operators,
/// entirely ignoring operator precedence (doesn't flatten parenthesized expressions):
///
/// ```text
/// -----------------------------
/// | a | + | 5 | * | 3 | + | 2 |
/// -----------------------------
/// ```
///
/// The advantage of a flat structure are:
/// * It becomes possible to adjust the operator / operand precedence. E.g splitting implicit concatenated strings before `+` operations.
/// * It allows arbitrary slicing of binary expressions for as long as a slice always starts and ends with an operand.
///
/// A slice is guaranteed to always start and end with an operand. The smallest valid slice is a slice containing a single operand.
/// Operands in multi-operand slices are separated by operators.

The flat representation allows to first split the binary expression by implicit concatenated strings and only then, operator by operator based on the operator precedence.

Test Plan

I added new tests, reviewed that the changes now match Black's formatting.

The PR improves the black compatibility

PR

project similarity index total files changed files
cpython 0.76083 1789 1633
django 0.99966 2760 58
transformers 0.99928 2587 454
twine 0.99982 33 1
typeshed 0.99978 3496 2173
warehouse 0.99823 648 23
zulip 0.99948 1437 28

Base

project similarity index total files changed files
cpython 0.76083 1789 1632
django 0.99957 2760 67
transformers 0.99928 2587 456
twine 0.99982 33 1
typeshed 0.99978 3496 2173
warehouse 0.99818 648 24
zulip 0.99942 1437 32

@MichaReiser MichaReiser added the formatter Related to the formatter label Sep 5, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Sep 6, 2023

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

@MichaReiser MichaReiser added this to the Formatter: Beta milestone Sep 6, 2023
@MichaReiser MichaReiser marked this pull request as ready for review September 6, 2023 12:42
@MichaReiser

This comment was marked as outdated.

}

pub(crate) enum FormatTrailingComments<'a> {
Node(AnyNodeRef<'a>),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the Node variance because this PR removes the last trailing_node_comments reference

use ruff_python_ast::Operator;

#[derive(Copy, Clone)]
pub struct FormatOperator;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this out of the bin_expr formatting because it is also used in augmented assign and it removes the size of the bin_expr file a bit. The code itself is unchanged.

@@ -168,10 +168,9 @@ pub fn format_node<'a>(
}

/// Public function for generating a printable string of the debug comments.
pub fn pretty_comments(formatted: &Formatted<PyFormatContext>, source: &str) -> String {
let comments = formatted.context().comments();
pub fn pretty_comments(root: &Mod, comment_ranges: &CommentRanges, source: &str) -> String {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not strictly related to the PR but I ran into situations where I missed to format some comments but was unable to see the comments in the playground because it extracted them from the formatted output...

This change now allows to see the comments even if the formatting itself fails.

"bbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
"cccccccccccccccccccccccccc" % aaaaaaaaaaaa
+ x
"cccccccccccccccccccccccccc" % aaaaaaaaaaaa + x
)

(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add support for compare expressions in a separate PR

let chain = BinaryChain::from_binary_expression(item, &comments, f.context().source());

let source = f.context().source();
let mut string_operands = chain
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add support for f-strings in a separate PR

@@ -27,120 +29,188 @@ pub struct FormatExprBinOp;
impl FormatNodeRule<ExprBinOp> for FormatExprBinOp {
fn fmt_fields(&self, item: &ExprBinOp, f: &mut PyFormatter) -> FormatResult<()> {
let comments = f.context().comments().clone();

match Self::layout(item, f.context()) {
BinOpLayout::LeftString(expression) => {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LeftString was a "cheap" implementation of the new formatting logic that only supported implicit concatenated strings at the left of a binary expression. The new formatting supports implicit concatenated strings in arbitrary positions.

Comment on lines 414 to 419
#[allow(unsafe_code)]
unsafe {
// SAFETY: `BinaryChainSlice` has the same layout as a slice because it uses `repr(transparent)`
&*(slice as *const [OperandOrOperator<'a>] as *const BinaryChainSlice<'a>)
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the same layout is necessary to implement Deref on BinaryChain

@MichaReiser
Copy link
Member Author

I moved the formatting to its own file as preparation for re-using it for ExprCompare. It makes rebasing and reviewing the followup PR easier

@codspeed-hq
Copy link

codspeed-hq bot commented Sep 6, 2023

CodSpeed Performance Report

Merging #7145 will degrade performances by 2.44%

Comparing flatten-binary-expression (668d1c4) with main (fa0b6f4)

Summary

❌ 1 regressions
✅ 24 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main flatten-binary-expression Change
linter/all-rules[numpy/ctypeslib.py] 32.9 ms 33.8 ms -2.44%

@MichaReiser
Copy link
Member Author

CodSpeed Performance Report

Merging #7145 will degrade performances by 2.44%

Comparing flatten-binary-expression (668d1c4) with main (fa0b6f4)

Summary

❌ 1 regressions ✅ 24 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main flatten-binary-expression Change
linter/all-rules[numpy/ctypeslib.py] 32.9 ms 33.8 ms -2.44%

I doubt that ;)

@MichaReiser MichaReiser enabled auto-merge (squash) September 8, 2023 06:39
@MichaReiser MichaReiser merged commit e376c3f into main Sep 8, 2023
15 checks passed
@MichaReiser MichaReiser deleted the flatten-binary-expression branch September 8, 2023 06:51
MichaReiser added a commit that referenced this pull request Sep 8, 2023
## Summary

This PR implements the logic for breaking implicit concatenated strings before compare expressions by building on top of  #7145 

The main change is a new `BinaryLike` enum that has the `BinaryExpression` and `CompareExpression` variants. Supporting both variants requires some downstream changes but doesn't introduce any new concepts. 

## Test Plan

I added a few more tests. The compatibility improvements are minor but we now perfectly match black on twine 🥳 


**PR**

| project      | similarity index  | total files       | changed files     |
|--------------|------------------:|------------------:|------------------:|
| cpython      |           0.76083 |              1789 |              1632 |
| django       |           0.99966 |              2760 |                58 |
| transformers |           0.99928 |              2587 |               454 |
| **twine**        |           1.00000 |                33 |                 0 | <-- improved
| typeshed     |           0.99978 |              3496 |              2173 |
| **warehouse**    |           0.99824 |               648 |                22 | <-- improved
| zulip        |           0.99948 |              1437 |                28 |


**Base**

| project      | similarity index  | total files       | changed files     |
|--------------|------------------:|------------------:|------------------:|
| cpython      |           0.76083 |              1789 |              1633 |
| django       |           0.99966 |              2760 |                58 |
| transformers |           0.99928 |              2587 |               454 |
| twine        |           0.99982 |                33 |                 1 | 
| typeshed     |           0.99978 |              3496 |              2173 |
| warehouse    |           0.99823 |               648 |                23 |
| zulip        |           0.99948 |              1437 |                28 |
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
formatter Related to the formatter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants