Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement pandas-vet #1235

Merged
merged 6 commits into from
Dec 15, 2022
Merged

Conversation

edgarrmondragon
Copy link
Contributor

@edgarrmondragon edgarrmondragon commented Dec 14, 2022

  • PD001: pandas should always be imported as 'import pandas as pd'
  • PD002: 'inplace = True' should be avoided; it has inconsistent behavior
  • PD003: '.isna' is preferred to '.isnull'; functionality is equivalent
  • PD004: '.notna' is preferred to '.notnull'; functionality is equivalent
  • PD005: Use arithmetic operator instead of method
  • PD006: Use comparison operator instead of method
  • PD007: '.ix' is deprecated; use more explicit '.loc' or '.iloc'
  • PD008: Use '.loc' instead of '.at'. If speed is important, use numpy.
  • PD009: Use '.iloc' instead of '.iat'. If speed is important, use numpy.
  • PD010 '.pivot_table' is preferred to '.pivot' or '.unstack'; provides same functionality
  • PD011 Use '.array' or '.to_array()' instead of '.values'; 'values' is ambiguous
  • PDO12 '.read_csv' is preferred to '.read_table'; provides same functionality
  • PD013 '.melt' is preferred to '.stack'; provides same functionality
  • PD015 Use '.merge' method instead of 'pd.merge' function. They have equivalent functionality.
  • PD901 'df' is a bad variable name. Be kinder to your future self.

Closes #1141

@edgarrmondragon
Copy link
Contributor Author

edgarrmondragon commented Dec 14, 2022

I'll try to implement PDV005 and PDV006 in the coming days.

@ksdaftari
Copy link

@edgarrmondragon Just fyi I had opened following ticket with pandas vet bit ago, that don't know if worth considering (again would double check if true, but wanted to bring up just in case) deppen8/pandas-vet#113

@ksdaftari
Copy link

@edgarrmondragon Additionally do remember then having some what unresolved issue with deppen8/pandas-vet#74 where some checks could give false negatives, don't know if concern or if still run into here but wanted to bring up just in case

@edgarrmondragon
Copy link
Contributor Author

edgarrmondragon commented Dec 15, 2022

@edgarrmondragon Additionally do remember then having some what unresolved issue with deppen8/pandas-vet#74 where some checks could give false negatives, don't know if concern or if still run into here but wanted to bring up just in case

@ksdaftari I have not implemented PD005 but we can leave it out for now. TLDR it's hard to solve with static linting such as Ruff's or flake8's as pointed out by the maintainer.

The workaround

A quick hack (that could be optional for example) would be to check if pandas is imported in the file. Sure, it won't work for a lot of cases, but for a lot of cases it also will work.

could lead to false negatives instead.

Wdyt?

@edgarrmondragon
Copy link
Contributor Author

@edgarrmondragon Just fyi I had opened following ticket with pandas vet bit ago, that don't know if worth considering (again would double check if true, but wanted to bring up just in case) deppen8/pandas-vet#113

Yeah, makes sense to recommend to_numpy() here 👍

@edgarrmondragon edgarrmondragon marked this pull request as ready for review December 15, 2022 01:17
@charliermarsh
Copy link
Member

@edgarrmondragon - Just double-confirming, all set for review here?

@edgarrmondragon
Copy link
Contributor Author

@edgarrmondragon - Just double-confirming, all set for review here?

Yup, all set 🙂

@charliermarsh
Copy link
Member

Great, will review and hopefully merge today :)

@charliermarsh charliermarsh merged commit 27de342 into astral-sh:main Dec 15, 2022
@edgarrmondragon edgarrmondragon deleted the pandas-vet branch December 15, 2022 19:33
@andersk
Copy link
Contributor

andersk commented Dec 18, 2022

Any reason we renamed these codes from the upstream PDnnn to PDVnnn?

@charliermarsh
Copy link
Member

I’ve been advocating the use of more specific prefixes for plugins, to avoid collisions in the present and future. For example, flake8-tidy-imports uses the “I” prefix, which is maybe fine if you have to explicitly instead the plugin to activate it, but causes collisions in the Ruff model. So I’d generally been biasing towards three-letter prefixes for all plugins.

It’s a questionable choice and maybe one that merits case-by-case consideration, would love your opinion. In this case, PD might be confusing if we implemented another Pandas-related plugin.

We could also consider adding redirects from PD to PDV like we do for some other codes (e.g., U to UP).

@andersk
Copy link
Contributor

andersk commented Dec 18, 2022

Hmm. It seems to me that the main benefit of the ABC123 system is compatibility with the upstream codes. Yeah, we’ll need to deviate occasionally when there are collisions. But if we’re also going to deviate on account of hypothetical collisions (even for plugins following the upstream recommendation of 2 or 3 characters), then maybe we’d be better off with a different naming system designed for humans instead of military radio dials?

This has sort of been in the back of my mind—something more like ESLint’s system where e.g. eslint-plugin-unicorn provides errors like unicorn/no-array-for-each would reduce the need to constantly refer back to the documentation or keep comments next to every error code to help remember what they mean.

@charliermarsh
Copy link
Member

Yeah that's a very reasonable argument. Perhaps the right approach for now is to only deviate if it will cause a conflict with an existing plugin or knowingly cause a conflict with another known plugin that we might reasonably implement. I can revert to PD for this specific plugin.

And yes -- I do want to redo the error code system (ideally in a way that's compatible with the existing system and thus with Flake8), and I think ESLint is probably the right model to follow (though I also want to retain unique IDs for each code like Pylint, which is necessary for compatibility anyway). I wrote a bit about it here: #967.

This will probably come with some more flexibility around noqa directives... like ESLint's ignore-next-line behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for pandas-vet
4 participants