Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

airbyte-ci: format python using ruff instead of isort+black #33251

Closed
wants to merge 4 commits into from

Conversation

postamar
Copy link
Contributor

@postamar postamar commented Dec 8, 2023

This picks up where #29853 stalled. At the time, we didn't have airbyte-ci format. I re-used @aaronsteers 's pyproject.toml settings here. The resulting diff is surprisingly not huge.

Copy link

vercel bot commented Dec 8, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Dec 13, 2023 6:20pm

@postamar postamar requested review from aaronsteers and removed request for oustynova December 8, 2023 16:24
@octavia-squidington-iv octavia-squidington-iv requested a review from a team December 8, 2023 16:25
Copy link
Contributor

@bnchrch bnchrch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to before. No objections to Ruff.

Its faster, and lets us move from 2 tools to 1.

The fact that the rules are slightly different is annoying but formatting is to enforce a consistent style. The details of the style we choose are not important.

However, Given that @alafanechere is actively working in the area I would defer to him for final approval

@postamar
Copy link
Contributor Author

postamar commented Dec 8, 2023

Thanks for the review! Yes, I'll wait until his pre-commit hook + workflow changes have merged.

@aaronsteers
Copy link
Collaborator

aaronsteers commented Dec 8, 2023

@bnchrch, @postamar

I have a bit of good news and bad news. It looks like between the prior PR and this one, ruff announced and released ruff format as a replacement for black. @bnchrch - you actually were the one who made me aware this was even in the works with the prior PR. So, that's the good news: ruff can now do everything black could do, and (apart from mypy), ruff now replaces "all the things".

Bad news: this PR uses ruff format (the black replacement) but not also ruff check --fix. That full changeset has over 1K files changed and is here:

Worse news: some rules from ruff format and ruff check --fix and incompatible, so running those in sequence will lead to an infinite chain of modified files. Since black rules were always the "opinionated" and "correct" way to format files, the format rules should win when in conflict with check --fix, and indeed, the blog post tells us some rules we need to disable in order to make them compatible.

So - we have a choice in front of us:

  • We can merge this as-is, and take check --fix as a follow-on PR.
  • We can do all in one go, but we should at minimum disable conflicting rules between format and check.

PS - I'm a bit bummed that ruff check --fix and ruff format are different operations, but I guess I can see the logic from their side.

@bnchrch
Copy link
Contributor

bnchrch commented Dec 9, 2023

Very excited to see that they expanded.

@aaronsteers I can't seem to find any reference to where ruff check --fix and ruff format conflict.

I could find a list of deviations in behaviour from black but so far it seems like if we are running ruff's version of linting and formatting we should be safe.

But! I imagine im just being blind in my reading on Friday.

I did run a little test in this PR
#33275

Where I had the following order of operations

  1. ruff format
  2. commited
  3. run check --fix
  4. commited
  5. ruff format
  6. commited
  7. ruff check --fix

And these two things seem to be true

  1. ruff check --fix changes are not formatted properly
  2. ruff format changes are correct according to ruff check

This means during fix if we ever run ruff check --fix we must run ruff format directly after.

and during check we should ensure we run ruff check before ruff format --check

But I think at this point I would be for us adding the formatting code in one PR and the check code in another. Since we werent doing much linting to begin with and I would love to keep the momentum going and get ruff into our code base. Because youre right, it is fast.

@aaronsteers
Copy link
Collaborator

aaronsteers commented Dec 9, 2023

@bnchrch :

Here's the link for documented conflicts: https://docs.astral.sh/ruff/formatter/#format-suppression

Also: here's the commit from format after check --fix: 71ac7b1

After a quick manual review, some of the updates from this commit seem to be driven by what could be a difference in line length preferences. Might be worth more research to see if an inconsistent line length in my process caused some of those additional diffs.

@@ -21,7 +21,9 @@ class DefaultFileBasedAvailabilityStrategy(AbstractFileBasedAvailabilityStrategy
def __init__(self, stream_reader: AbstractFileBasedStreamReader):
self.stream_reader = stream_reader

def check_availability(self, stream: "AbstractFileBasedStream", logger: logging.Logger, _: Optional[Source]) -> Tuple[bool, Optional[str]]: # type: ignore[override]
def check_availability(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will generate mypy error as # type: ignore[override] needs to be applied to the line where stream: "AbstractFileBasedStream" is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'll check the other annotations touched by this diff as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The others are OK, I checked with mypy using the config at airbyte-cdk/python/mypy.ini

@postamar
Copy link
Contributor Author

@aaronsteers somehow I knew it was too good to be true. Still, this change in this PR seems like it's not only safe but also desirable, considering that ruff format is like black but better.

There's nothing stopping anyone from adding check --fix and its ruleset in an incremental fashion. It seems like a nontrivial amount of work, though.

Copy link
Contributor

@alafanechere alafanechere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in for using ruff in place of black and isort but I'd love to make sure this tool change is not changing our formatting rules. Ideally this tool change should not require a new formatting of the repo.
I belive there might be some configuration tweaking to get this result.

@postamar
Copy link
Contributor Author

That's legitimate. I don't have the time nor the interest in getting to understand the ruff rules, so I'll let someone else take over from here.

@aaronsteers
Copy link
Collaborator

@postamar re:

@aaronsteers somehow I knew it was too good to be true. Still, this change in this PR seems like it's not only safe but also desirable, considering that ruff format is like black but better.

There's nothing stopping anyone from adding check --fix and its ruleset in an incremental fashion. It seems like a nontrivial amount of work, though.

Yes, 100% agreed. Let's go forward with this as a first step. And incrementally, we can add additional format/lint autofixes in a future iteration.

@alafanechere - can you say more about your point above? The changes auto-applied here are basically bringing our code to meet black auto format standards, but using Ruff which is faster and better long-term. I didn't think we had any similar auto formatting for Python code before this, but I could be mistaken. (Or maybe we have auto formatting for some directories but not others?)

Copy link
Collaborator

@aaronsteers aaronsteers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. 👍 Approving for my part. ✅

As discussed in the threads above, we can always do a follow-up to add more linting and autofixes.

One thing that might be nice is (maybe under "contributing" and maybe in a follow-up pr?) a section on "code style" or python "dev standards" - with a mention of what auto-formatter we use and how developers/contributors can invoke it manually. (Just a thought... please disregard if this already exists.)

Copy link
Contributor

@girarda girarda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good but you might have to approve and merge despite the failing cdk mypy check unless you plan on resolving the typing errors too. The check runs on all modified cdk files so a lot of failures are surfaced on this PR

@postamar
Copy link
Contributor Author

@alafanechere I looked into it some more and no, we can't minimize the diff more by tweaking the config. The short of it is, there is no config, both black and ruff are opinionated formatters. Ruff has somewhat different opinions on certain matters, which explains a lot of the diff: https://docs.astral.sh/ruff/formatter/black/ . Note that black's style is also continuously evolving, see https://black.readthedocs.io/en/stable/the_black_code_style/future_style.html . I admittedly don't have a horse in this race but it all strikes me as quite benign.

Given these facts, I'm hoping that you might change your mind. Otherwise the logical alternative is that we not just keep using black, but this exact version of black, which doesn't strike me as better.

@natikgadzhi
Copy link
Contributor

Yolo, one tool is better than four, we should switch and clean out our documentation. It's going to be easier on us, and it's going to be easier on external contributors and new engineers starting on the team.

@natikgadzhi natikgadzhi reopened this Apr 20, 2024
@natikgadzhi
Copy link
Contributor

Just continuing dicussion here, obviously we would need to reformat everything.

Here's what I think we should do:

  1. Replace Black + isort with Ruff:
  • Merge Python Ruff support (config only) #29866
  • Then, replace black and isort in airbyte-ci format check and format fix. To clarify, we will use ruff format, not ruff check. Why? Well, because the lift to get check to pass is big.
  • Push up formatting fixes after running ruff format locally, this will get all python code in compliance.
  1. Remove black and isort from all packages dependencies. Nice side-effect: we're using 2 years old version of black (22.*) and it has security holes in it, so — we'd get rid of those.
  2. Remove black and isort from all internal Python packages, including airbyte-ci and the CDK.
  3. In CDK specifically, remove flake8. Instead, replace check with ruff check. This is going to hurt, I suppose.
  4. While we're at it, we should make sure our CDK and airbyte-ci mypy configuration targets Python 3.10+ and make any neccesary fixes.

@postamar postamar closed this Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues CDK Connector Development Kit connectors/destination/amazon-sqs connectors/destination/aws-datalake connectors/destination/chroma connectors/destination/databend connectors/destination/firebolt connectors/destination/google-sheets connectors/destination/kvdb connectors/destination/qdrant connectors/destination/scaffold-destination-python connectors/destination/sftp-json connectors/destination/sqlite connectors/source/airtable connectors/source/amazon-ads connectors/source/amazon-seller-partner connectors/source/aws-cloudtrail connectors/source/bigcommerce connectors/source/bing-ads connectors/source/cart connectors/source/close-com connectors/source/commcare connectors/source/facebook-pages connectors/source/fastbill connectors/source/freshdesk connectors/source/genesys connectors/source/github connectors/source/gitlab connectors/source/google-analytics-v4 connectors/source/hubspot connectors/source/intercom connectors/source/iterable connectors/source/kyve connectors/source/lever-hiring connectors/source/linkedin-pages connectors/source/mailchimp connectors/source/microsoft-dataverse connectors/source/my-hours connectors/source/netsuite connectors/source/notion connectors/source/okta connectors/source/outbrain-amplify connectors/source/paypal-transaction connectors/source/paystack connectors/source/pivotal-tracker connectors/source/primetric connectors/source/public-apis connectors/source/qualaroo connectors/source/rki-covid connectors/source/s3 connectors/source/salesloft connectors/source/sftp-bulk connectors/source/shopify connectors/source/snapchat-marketing connectors/source/stripe connectors/source/surveycto connectors/source/surveymonkey connectors/source/todoist connectors/source/trello connectors/source/weatherstack connectors/source/webflow connectors/source/zendesk-chat connectors/source/zenloop connectors/source/zuora normalization
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants