Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement flake8-type-checking #1785

Closed
nstarman opened this issue Jan 11, 2023 · 31 comments · Fixed by #2147
Closed

Implement flake8-type-checking #1785

nstarman opened this issue Jan 11, 2023 · 31 comments · Fixed by #2147
Labels
plugin Implementing a known but unsupported plugin

Comments

@nstarman
Copy link

nstarman commented Jan 11, 2023

It's a useful tool for determining which imports aren't run-time required, reducing import time. Thanks!

@charliermarsh charliermarsh added the plugin Implementing a known but unsupported plugin label Jan 12, 2023
@mkniewallner
Copy link
Contributor

Note that the project was renamed to flake8-type-checking (https://github.com/snok/flake8-type-checking).

@schrockn
Copy link

Came here to file this exact same issue. I had no idea it was supported in flake8. This would be huge for us! cc: cc @smackesey

@charliermarsh charliermarsh changed the title Implement flake8-typing-only-imports Implement flake8-type-checking Jan 19, 2023
@sondrelg
Copy link

I'd be very happy to take an initial stab at porting the library. I think most error codes can be auto-fixed, so the ported version might be significantly more useful in this form.

Are there any docs or resources I should read up on before diving into the ruff source code?

@charliermarsh
Copy link
Member

Awesome! We have some high-level docs on adding new rules in the contributing guide. Those are mostly helpful for guiding you on the mechanics of defining the rule, navigating the various files, etc. We'll also want to add a module in src/rules/flake8_type_checking to define the rules and settings (you could look at src/rules/flake8_bugbear as an example of the preferred structure).

Beyond the rule logic, I suspect we'll also need to adjust src/checkers/ast.rs to track some additional state -- that file is kind of a jumbo AST visitor that calls out to the various rule implementations at the appropriate time, and tracks a bunch of state (e.g., bindings, imports, scopes). For example, I'm guessing we need to track whether every import usage takes place within a type annotation or not? We already have state on the Checker for detecting whether we're in an annotation, and we already have used and unused import tracking, so hopefully those pieces can be leveraged in some way.

My suggestion, just to get familiar with the abstractions and conventions, would be to start with maybe the empty type-checking block rule? It seems like that wouldn't require any significant changes to the Checker apart from calling out to src/rules/flake8_type_checking/rules/non_empty_type_checking_block.rs when we process a StmtKind::If. So it'll mostly give you a chance to focus on how it all fits together, and lay the groundwork for the plugin itself.

@ofek
Copy link
Contributor

ofek commented Jan 20, 2023

Let me know if I can help, I'm competent in Rust but I'm not familiar with the code base. This would be an awesome feature and would reduce a lot of cognitive overhead for me personally because for Hatch I painstakingly use conditional and/or lazy imports to reduce CLI startup time.

@charliermarsh
Copy link
Member

I’m happy to do a first PR today to lay some of the groundwork here.

@ofek
Copy link
Contributor

ofek commented Jan 20, 2023

That would be neat since I'm going to cut a release in a few days which will generate projects that use Ruff and it would be nice to enable this.

@sondrelg
Copy link

sondrelg commented Jan 20, 2023

It might be worth discussing whether the rules are worth porting 1:1 and whether any APIs should change.

For example, TC006 has not been as helpful as I'd hoped. Pycharm's built-in type checker (not sure about other IDEs) struggles to understand that stringified imports still need to be there when type checking, so will remove the imports related to casts if you run the auto-clean imports command. I don't think I would have accepted that rule had I known about this ahead of time.

The TC error code also conflicts with another plugin, called tryceratops, so it might be nice to opt for another error code here; although it would mean it's not a complete drop-in replacement.

I was thinking I'd sit down an take a stab at a first rule tomorrow. I agree, the empty type checking block is probably a good candidate. If you've already started though @charliermarsh, feel free to go ahead and I'll try to get involved after the groundwork has been done 👍

@charliermarsh
Copy link
Member

A couple things:

  • Would definitely like to learn from your experience regarding which rules are reliable or not. Seems like we should omit TC006 based those comments.
  • There's some precedent for changing the rule code prefixes. For example, we use TID for flake8-tidy-imports, so that it doesn't conflict with the isort rules (under I). We typically do it if a plugin uses a single-letter prefix which is very likely to cause conflicts, or if a plugin conflicts with something we've already implemented (e.g., we used COM for flake8-commas since C conflicts with flake8-comprehensions). In this case, I'm open to either choice. We could use TC, and then expect tryceratops to use a different code (like TRY) if it gets implemented. Or we could use something like TYP.

I may get started on this tonight (at least to setup the plugin scaffolding), but either way, I'll post here at the end of the day so that you know whether there's any overlap!

@charliermarsh
Copy link
Member

@sondrelg - I went ahead and added the scaffolding for the plugin in #2048, along with a basic implementation of TC005. Let me know if you have any questions on that PR.

I think the biggest thing to figure out is what additional state we need to track in Checker to support those other checks. We already track a Binding for every import, which tracks usages (and powers our unused-import check). We probably need to extend that in some way to detect whether the usages occurred in a type definition or not?

charliermarsh added a commit that referenced this issue Jan 21, 2023
This PR adds the scaffolding files for `flake8-type-checking`, along with the simplest rule (`empty-type-checking-block`), just as an example to get us started.

See: #1785.
@sondrelg
Copy link

The way the plugin works today is by exhaustively checking every element except annotations, then matching imports to "uses". IIRC, most uses are either ast.Value, ast.Name, or ast.Constant. Since annotation slot attributes can be, e.g., ast.Name, we return early when we're in an annotation node, in the visitor.

It's also important to map the line start/end of all type checking blocks, since use of a guarded import within the scope of that guard, is fine:

from typing import TYPE_CHECKING as some_alias

if some_alias:
    import pandas as pd

    df = pd.DataFrame()  # this is fine

df = pd.DataFrame()  # this is not

Then we need a good way of normalizing aliases, if possible. Identifying whether pd.DataFrame means the pandas import was used or not really just boils down to identifying that the pandas import is aliased, then matching that string. Imports, the typing module, and the TYPE_CHECKING value can all be aliased.

Finally there's minor things like keeping track of futures imports, function scopes, and classifying imports as local or remote. The exhaustive list of plugin state can be found here: https://github.com/snok/flake8-type-checking/blob/main/flake8_type_checking/checker.py#L372:L434

I'll read over the PR now 🙂

@charliermarsh
Copy link
Member

Then we need a good way of normalizing aliases, if possible. Identifying whether pd.DataFrame means the pandas import was used or not really just boils down to identifying that the pandas import is aliased, then matching that string. Imports, the typing module, and the TYPE_CHECKING value can all be aliased.

Cool -- we have support for and a pattern for this (checker.resolve_call_path(...) will detect aliases and normalize to the imported path, like ["pd", "DataFrame"]).

Finally there's minor things like keeping track of futures imports, function scopes, and classifying imports as local or remote.

👍 We do track future imports and function scopes as Checker state, and we do have logic for classifying imports via stuff in the isort directory. So hopefully we can leverage all that :)

@sondrelg
Copy link

Yeah I noticed, that's great!

What do you think about import classification? The plugin uses classify-imports to distinguish between local imports, builtin imports, and library imports. The rationale behind that mainly being that only local application imports lead to import circularity issues, so for some users it might be nice to limit the linting to that class of imports.

I was thinking I might start taking a stab at porting the import classification logic we need as rust code?

@charliermarsh
Copy link
Member

Our import classification logic is actually a port of classify-imports, for the most part! (It might deviate in some small ways at this point.) Though we also have support for isort-style known-first-party, etc. That code is in src/rules/isort/categorize.rs, if helpful.

The other question is whether we want to have separate rules for different import classes, as in your existing plugin. I don't feel strongly! If we have the ability to categorize imports, then it makes sense to me to support different rules. But I'm curious if people turns those on and off in practice, what's your sense from issues, etc.?

@charliermarsh
Copy link
Member

charliermarsh commented Jan 22, 2023

@sondrelg - I realize it's probably unhelpful for me to just keep saying "We have something that does X" 😅 so let me take a second to outline how this plugin might be implemented in a bit of detail... If you're still interested, definitely let me know. If not, no worries at all.

@charliermarsh
Copy link
Member

So, for the rules around whether an import should / shouldn't be a in TYPE_CHECKING block, we need a few pieces of information:

  1. Was the import itself in a type-checking block?
  2. Was the import used in a type annotation?
  3. Was the import used outside of a type annotation? (If an import isn't used at all, I think the plugin avoids flagging it?)

We probably want to have this information available to use when we run check_dead_scopes, which is also where we flag unused imports.

One way to implement this would be as follows:

  1. Add a type_checking_blocks field to Checker. This could be Vec<&'a Stmt>, or Vec<Range>, and would contain references to the if-blocks or their ranges.
  2. In src/ast/types.rs, we have a Binding struct that represents all bindings, and tracks usages. We could split that into two fields, used_runtime and used_annotation, or even some kind of Usages struct that has runtime and annotation fields, to track the most recent usage at runtime and as an annotation. Alternatively, we could make that a vector, and store all usages. Lots of options... (Note that we have checker.in_annotation, which we can leverage when setting those fields.)
  3. In check_dead_scopes, iterate over all import bindings (see the if self.settings.rules.enabled(&Rule::UnusedImport) { ... } block for an example). For each binding, check its runtime and annotation usages, then check if the binding was created within a type-checking block. (Every Binding has a range field that tells you where it was defined, so we can check if that range is within any of the type-checking blocks, or whatever the exact logic should be).

(All of this ignores how we should treat redefinitions, i.e., modules that are imported multiple times.)

@sondrelg
Copy link

That seems great.

If an import isn't used at all, I think the plugin avoids flagging it?

This is true, but has been a bit of a pain to get right. We essentially try to shadow pyflake's F401 and avoid flagging an error if we suspect F401 would have been reported already. This way a user, using default flake8 settings would get one error instead of two. The trade-off is that this requires us to track all annotation values, which we wouldn't otherwise have to do iirc.

Add a type_checking_blocks field to Checker

I think there might also be a need to track function scopes, but perhaps the functionality that hinges on that information can be added later, or done in another way. Essentially I think we handle this case by tracking function scopes:

if TYPE_CHECKING:
    from pandas import DataFrame

foo: DataFrame  # this is fine

def bar():
    from pandas import DataFrame

    baz = DataFrame()  # this is also fine, but only because of the function-scoped import

This is a bit of a contrived example, but there are cases where this type of behavior can happen; typically in large files with circular import problems.


Did you say there is tooling to classify imports as stdlib/application/third-party/future already as well? I know the futures import is mapped 👍

You are probably familiar, but TC001, TC002, and TC003 are the same error, just for different classes of imports.


Lastly, I'd be very happy to write some or most of the code, but to be realistic I think I'm down to being able to contribute 3-4 hours a week. If that's fine, I'm happy to commit to giving that a go, but if you or anyone else wants to jump on this and get it done, then I'd of course be happy to help wherever I can, but let you do most of the work 🙂

@charliermarsh
Copy link
Member

Did you say there is tooling to classify imports as stdlib/application/third-party/future already as well?

Yeah, it's all in src/rules/isort/categorize.rs. It's only used for import-sorting right now, but could be extracted out and generalized for this check!

Lastly, I'd be very happy to write some or most of the code, but to be realistic I think I'm down to being able to contribute 3-4 hours a week.

No prob at all. I may get to it, I may not -- if I do, I'll comment here so that we don't duplicate work.

@ngnpope
Copy link
Contributor

ngnpope commented Jan 24, 2023

For example, TC006 has not been as helpful as I'd hoped. Pycharm's built-in type checker (not sure about other IDEs) struggles to understand that stringified imports still need to be there when type checking, so will remove the imports related to casts if you run the auto-clean imports command. I don't think I would have accepted that rule had I known about this ahead of time.

Yeah. Sorry about that rule @sondrelg. In principle it was a nice idea, but there are certainly problems as you've mentioned that we perhaps didn't foresee.

The main thing it trying to achieve was to reduce the runtime performance hit from building complex types when they're only relevant for static type checking. An alternative approach is to use type aliases in global scope to avoid the hit for building these complex types repetitively in every function call, loop, or whatever. As such TC006 could be ignored for this port, but I wonder if there is some way we can still gain some benefit by highlighting that a cast type is "overly complex" and may benefit from being shoved in a type alias?

Ah, sweet hindsight...

@sondrelg
Copy link

It's really only the lack of IDE support that makes it hard to work with in my experience. The rationale for the rule itself was great haha 👏

@charliermarsh
Copy link
Member

charliermarsh commented Jan 25, 2023

Okay, I just knocked this out in #2146 and #2147. It'll go out as soon as we can release again (we're blocked by PyPI limits right now and waiting for an exemption -- see #2136).

The current version doesn't include any autofix capabilities (it's challenging because we need to remove imports from one part of the AST and re-add them elsewhere), but I tested on Dagster and the inference seems reliable.

\cc @schrockn @smackesey @ofek

@charliermarsh
Copy link
Member

My only hesitation here is that I used TYP as the prefix, but the codes conflict with flake8-typing-imports. I don't intend to implement that plugin, but it is weird to have overlapping rule codes. Any other suggestions here? We could use TC, since we used TRY for tryceratops.

@charliermarsh
Copy link
Member

On second thought: TC causes the same issue, since it conflicts with the actual tryceratops. So noqa: TC002 would be interpreted incorrectly in the wild.

@charliermarsh
Copy link
Member

charliermarsh commented Jan 25, 2023

I'm going to rename to TYC, which is an uglier prefix, but it avoids all of the possible conflicts. I've added a redirect from TYP to TYC for now (with a warning), since a few users are already using TYP.

@sondrelg
Copy link

I originally used TCH. Not sure if that's taken, but could be appropriate. Ugliest candidate yet though 😄

@charliermarsh
Copy link
Member

Oh, cool. If you used that originally, let's go with that over TYC. Neither seems to be used.

@smackesey
Copy link

Nice to see all the progress on this. We haven't turned it on yet because I think autofix is essential here (to avoid annoying scenario where you are writing type annotations, import something via auto-complete, then have to go fix a lint error and manually move into a TYPE_CHECKING block). But if this gets autofix and works smoothly, it will provide huge value for us.

@eli-schwartz
Copy link
Contributor

eli-schwartz commented Apr 27, 2023

Yeah. Sorry about that rule @sondrelg. In principle it was a nice idea, but there are certainly problems as you've mentioned that we perhaps didn't foresee.

The main thing it trying to achieve was to reduce the runtime performance hit from building complex types when they're only relevant for static type checking. An alternative approach is to use type aliases in global scope to avoid the hit for building these complex types repetitively in every function call, loop, or whatever. [...]

Please don't apologize, I'm quite fond of that rule and am not worried about pycharm since I don't use it. :P

(I would be interested in using it from ruff, if it was implemented. It seems useful to avoid the runtime hit even once.)

@charliermarsh
Copy link
Member

FYI: autofix for these rules will be enabled in the next release.

@messense
Copy link
Contributor

messense commented Jul 4, 2023

It seems that this doesn't work well with mashumaro, I got mashumaro.exceptions.UnresolvedTypeReferenceError when running auto-fixed code.

<string>:2: in to_dict
    ???
.venv/lib/python3.10/site-packages/mashumaro/core/meta/code/builder.py:884: in add_pack_method
    self._add_pack_method_lines(method_name)
.venv/lib/python3.10/site-packages/mashumaro/core/meta/code/builder.py:674: in _add_pack_method_lines
    field_types = self.field_types
.venv/lib/python3.10/site-packages/mashumaro/core/meta/code/builder.py:155: in field_types
    return self.__get_field_types()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <mashumaro.core.meta.code.builder.CodeBuilder object at 0x7effba8e02e0>, recursive = True, include_extras = False

    def __get_field_types(
        self, recursive: bool = True, include_extras: bool = False
    ) -> typing.Dict[str, typing.Any]:
        fields = {}
        try:
            field_type_hints = typing_extensions.get_type_hints(
                self.cls, include_extras=include_extras
            )
        except NameError as e:
            name = get_name_error_name(e)
>           raise UnresolvedTypeReferenceError(self.cls, name) from None
E           mashumaro.exceptions.UnresolvedTypeReferenceError: Class CrossValidatedModel has unresolved type reference Self in some of its fields

@Fatal1ty
Copy link

Fatal1ty commented Jul 4, 2023

It seems that this doesn't work well with mashumaro, I got mashumaro.exceptions.UnresolvedTypeReferenceError when running auto-fixed code.

That’s interesting. Can you share the original and auto-fixed reproducible code? I think it’s worth to create a separate issue for that here.

P.S. From what I’ve read, putting imports to if TYPE_CHECKING block will be a problem not only for mashumaro but for other libraries that operate with types at runtime:

The plugin assumes that the imports you only use for type hinting are not required at runtime.

This assumption can’t be applied to all the existing code, so this dangerous feature in auto-fixing must be disabled by default.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plugin Implementing a known but unsupported plugin
Projects
None yet
Development

Successfully merging a pull request may close this issue.