Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formatter removes parentheses around call chaining #7050

Closed
cnpryer opened this issue Sep 1, 2023 · 12 comments · Fixed by #7109
Closed

Formatter removes parentheses around call chaining #7050

cnpryer opened this issue Sep 1, 2023 · 12 comments · Fixed by #7109
Assignees
Labels
bug Something isn't working formatter Related to the formatter help wanted Contributions especially welcome

Comments

@cnpryer
Copy link
Contributor

cnpryer commented Sep 1, 2023

Version: ruff 0.0.287

Line-length: 88

Related: #5343

My team does a lot of exploratory scripting with Pandas (most are analysts, not developers), so often some analysts will chain together Pandas methods for SQL-like operations.

Note that the removed parentheses still results in valid Python. I actually prefer this change since the expression is already wrapped by pd.concat and shouldn't confuse the original author or subsequent readers. The removed parentheses, IMO, are redundant (see (...).groupby(..)).

I figured this kind of code is probably common outside the current ecosystem checks (and in Python's data analysis world), especially with Pandas users.

The following is an example of a diff snippet from one of our repositories:

Input, formatted with black (23.7.0):

merged = pd.concat(
    [
        df1_aaaaaaaaaaaa[
            df1_aaaaaaaaaaaa["a"].isin(["aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"])
        ],
        (
            df1_aaaaaaaaaaaa.loc[
                df1_aaaaaaaaaaaa["a"] == "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
                [
                    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
                    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
                    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
                    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
                ],
            ]
            .merge(
                df2_aaaaaaaaaaaa[["aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"]],
                on=["aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"],
                how="a",
                validate="a",
            )
            .drop(columns=["a"])
            .merge(
                df3_aaaaaaaaaaaa[
                    [
                        "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
                        "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
                        "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
                    ]
                ],
                on=["aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"],
                how="a",
                validate="a",
            )
        )
        .groupby(
            [
                "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
                "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
                "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
            ]
        )
        .sum()
        .reset_index(),
    ]
)

ruff format diff:

diff --git a/scratch.py b/scratch.py
index d2ee558..2e4900c 100644
--- a/scratch.py
+++ b/scratch.py
@@ -3,35 +3,33 @@ merged = pd.concat(
         df1_aaaaaaaaaaaa[
             df1_aaaaaaaaaaaa["a"].isin(["aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"])
         ],
-        (
-            df1_aaaaaaaaaaaa.loc[
-                df1_aaaaaaaaaaaa["a"] == "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
+        df1_aaaaaaaaaaaa.loc[
+            df1_aaaaaaaaaaaa["a"] == "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
+            [
+                "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
+                "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
+                "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
+                "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
+            ],
+        ]
+        .merge(
+            df2_aaaaaaaaaaaa[["aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"]],
+            on=["aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"],
+            how="a",
+            validate="a",
+        )
+        .drop(columns=["a"])
+        .merge(
+            df3_aaaaaaaaaaaa[
                 [
                     "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
                     "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
                     "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
-                    "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
-                ],
-            ]
-            .merge(
-                df2_aaaaaaaaaaaa[["aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"]],
-                on=["aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"],
-                how="a",
-                validate="a",
-            )
-            .drop(columns=["a"])
-            .merge(
-                df3_aaaaaaaaaaaa[
-                    [
-                        "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
-                        "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
-                        "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
-                    ]
-                ],
-                on=["aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"],
-                how="a",
-                validate="a",
-            )
+                ]
+            ],
+            on=["aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"],
+            how="a",
+            validate="a",
         )
         .groupby(
             [

Playground: https://play.ruff.rs/11aa202b-fee6-4a8f-bc61-330acffeca9c

More to come :)

@cnpryer

This comment was marked as outdated.

@charliermarsh charliermarsh added the formatter Related to the formatter label Sep 1, 2023
@charliermarsh
Copy link
Member

Interesting, it looks like Black will preserve parentheses around the first part of a call chain, no matter how many components are included in the parentheses (example). Not obvious to me whether we want to preserve this or not.

@charliermarsh charliermarsh self-assigned this Sep 1, 2023
@cnpryer
Copy link
Contributor Author

cnpryer commented Sep 2, 2023

Hmm maybe I'm misunderstanding your example. It seems to preserve with Ruff (playground). But I can boil my original example down as:

Within some parenthesized context you can get the following behavior:

I can get both Ruff and Black to preserve the parenthesized dataframe/series expression with one method call

# Input
[(df1_aaaaaaaaaaaa.loc[df1_aaaaaaaaaaaa["a"] == "aaaaaaaaaaaaaaaaa"]).groupby()]

# Black
[(df1_aaaaaaaaaaaa.loc[df1_aaaaaaaaaaaa["a"] == "aaaaaaaaaaaaaaaaa"]).groupby()]

# Ruff
[(df1_aaaaaaaaaaaa.loc[df1_aaaaaaaaaaaa["a"] == "aaaaaaaaaaaaaaaaa"]).groupby()]

Black Playground

Ruff Playground

When you add a second method call Ruff will strip the parentheses

# Input
[(df1_aaaaaaaaaaaa.loc[df1_aaaaaaaaaaaa["a"] == "aaaaaaaaaaaaaaaaa"]).groupby().sum()]

# Black
[(df1_aaaaaaaaaaaa.loc[df1_aaaaaaaaaaaa["a"] == "aaaaaaaaaaaaaaaaa"]).groupby().sum()]

# Ruff
[df1_aaaaaaaaaaaa.loc[df1_aaaaaaaaaaaa["a"] == "aaaaaaaaaaaaaaaaa"].groupby().sum()]

Black Playground

Ruff Playground

What I want to play with would be examples requiring us to retain order of operations for dataframe/series operations.

@cnpryer
Copy link
Contributor Author

cnpryer commented Sep 2, 2023

And even simpler the same behavior occurs with:

[((d)).call()]

Which is a good example of what you're pointing out.

@cnpryer
Copy link
Contributor Author

cnpryer commented Sep 2, 2023

Oh wow, you don't need the parenthesized context either.

((d)).call()

Black Playground

Ruff Playground

@MichaReiser
Copy link
Member

Ruff should preserve expression-parentheses except in clause headers (like if (True)). Not sure why it isn't in this case. Thanks for narrowing down the example.

@MichaReiser MichaReiser added the bug Something isn't working label Sep 2, 2023
@MichaReiser MichaReiser added this to the Formatter: Beta milestone Sep 2, 2023
@MichaReiser MichaReiser added the help wanted Contributions especially welcome label Sep 2, 2023
@MichaReiser
Copy link
Member

@charliermarsh are you working on this or is it up for grabs?

@charliermarsh
Copy link
Member

I was working on this but didn’t get to PR yet. I believe it’s specific to call chain formatting.

@charliermarsh
Copy link
Member

If I don’t find time to fix today, I’ll unassign myself :)

@MichaReiser
Copy link
Member

sounds good. @konstin has context on this as well.

@charliermarsh
Copy link
Member

For reference, using this to demonstrate Black's behavior are parenthesized "heads" on call chains.

@charliermarsh
Copy link
Member

charliermarsh commented Sep 2, 2023

I made some progress on this at charlie/parens but haven't fully figured it out. I thought this could be solved by changing expr_attribute.rs alone but I'm now thinking we may need to change the fluent style detection code too. If anyone beats me to this they're welcome to take it over (\cc @konstin as a good contender :)). I'll re-assign if I find more time for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working formatter Related to the formatter help wanted Contributions especially welcome
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants