False positives with common method names #81

deppen8 · 2020-02-07T17:30:53Z

This is a dedicated issue for the big discussion in #74

The problem is that many of our checks rely on the object being a pandas object. This is a fundamental limitation of static linting in Python: the AST records names and syntax, not the type of the object a name refers to. This leads to false positives for things like re.sub() or dict.values().
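To make the limitation concrete, here is a minimal sketch using only the standard-library ast module. The variable names (pattern, df, other) and the helper attribute_names are made up for illustration. Both calls to .sub come out as the same kind of Attribute node, and nothing in the tree says which receiver is a pandas object.

```python
import ast

# Illustrative source: one regex method call and one pandas-style method call.
# The source is only parsed, never executed, so `df` and `other` need not exist.
SOURCE = '''
import re
pattern = re.compile("a")
cleaned = pattern.sub("b", "banana")
total = df.sub(other)
'''


def attribute_names(source):
    """Return sorted (line, receiver name, attribute) for each `name.attr` access."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            found.append((node.lineno, node.value.id, node.attr))
    return sorted(found)


# Both `.sub` accesses look the same to a syntax-only check.
sub_accesses = [item for item in attribute_names(SOURCE) if item[2] == "sub"]
print(sub_accesses)
```

A check keyed only on the attribute name "sub" has to flag both lines or neither.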

I am open to suggestions on how to get around this, but it will likely be a big job. Some kind of integration with mypy or some other way to leverage type annotations might be a way to fix this, at least for folks who use those type annotations. What exactly that looks like is unclear to me, so please let me know if you have any ideas.

For now, the undesirable workaround is to turn off checks that are particularly bothersome.


thomasjohns · 2020-02-10T19:45:39Z

Here is a snippet of the AST produced with type annotations in Python 3.8 (I imagine you would get the same or similar output using typed_ast (https://github.com/python/typed_ast) in earlier Python versions). Note that the annotations are attached to the function argument and to the return value. (AST printed with https://github.com/asottile/astpretty.)

>>> import pandas as pd; import numpy as np; import ast; import astpretty
>>> src = "def f(s: pd.core.series.Series) -> np.ndarray: return s.values"
>>> astpretty.pprint(ast.parse(src).body[0])
FunctionDef(
    lineno=1,
    col_offset=0,
    end_lineno=1,
    end_col_offset=62,
    name='f',
    args=arguments(
        posonlyargs=[],
        args=[
            arg(
                lineno=1,
                col_offset=6,
                end_lineno=1,
                end_col_offset=30,
                arg='s',
                annotation=Attribute(
                    lineno=1,
                    col_offset=9,
                    end_lineno=1,
                    end_col_offset=30,
                    value=Attribute(
                        lineno=1,
                        col_offset=9,
                        end_lineno=1,
                        end_col_offset=23,
                        value=Attribute(
                            lineno=1,
                            col_offset=9,
                            end_lineno=1,
                            end_col_offset=16,
                            value=Name(lineno=1, col_offset=9, end_lineno=1, end_col_offset=11, id='pd', ctx=Load()),
                            attr='core',
                            ctx=Load(),
                        ),
                        attr='series',
                        ctx=Load(),
                    ),
                    attr='Series',
                    ctx=Load(),
                ),
                type_comment=None,
            ),
        ],
        vararg=None,
        kwonlyargs=[],
        kw_defaults=[],
        kwarg=None,
        defaults=[],
    ),
    body=[
        Return(
            lineno=1,
            col_offset=47,
            end_lineno=1,
            end_col_offset=62,
            value=Attribute(
                lineno=1,
                col_offset=54,
                end_lineno=1,
                end_col_offset=62,
                value=Name(lineno=1, col_offset=54, end_lineno=1, end_col_offset=55, id='s', ctx=Load()),
                attr='values',
                ctx=Load(),
            ),
        ),
    ],
    decorator_list=[],
    returns=Attribute(
        lineno=1,
        col_offset=35,
        end_lineno=1,
        end_col_offset=45,
        value=Name(lineno=1, col_offset=35, end_lineno=1, end_col_offset=37, id='np', ctx=Load()),
        attr='ndarray',
        ctx=Load(),
    ),
    type_comment=None,
)

However, to pick this up in visit_Attribute, we need the type information attached to the Attribute lookup inside the FunctionDef's body. The annotations are not propagated down to that level of the tree. To get type information there, we'd need type inference.
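As a rough illustration of the gap, the sketch below carries argument annotations down into the function body by hand. This is not type inference and not how the library works: the class name AnnotatedValuesVisitor, the helper dotted_name, and the PANDAS_ANNOTATIONS set are all hypothetical. It only handles directly annotated arguments and ignores reassignment, return values, and everything else real inference would cover.

```python
import ast

# Assumed, hand-written list of annotation spellings treated as "pandas".
PANDAS_ANNOTATIONS = {"pd.Series", "pd.DataFrame", "pd.core.series.Series"}


def dotted_name(node):
    """Rebuild a dotted name like 'pd.core.series.Series' from nested nodes, or None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


class AnnotatedValuesVisitor(ast.NodeVisitor):
    """Record `.values` accesses only on arguments annotated with a pandas type."""

    def __init__(self):
        self.pandas_names = set()  # argument names known to be pandas objects
        self.hits = []             # (line, name) for each flagged access

    def visit_FunctionDef(self, node):
        outer = self.pandas_names
        self.pandas_names = set(outer)
        for arg in node.args.args:
            if arg.annotation is not None and dotted_name(arg.annotation) in PANDAS_ANNOTATIONS:
                self.pandas_names.add(arg.arg)
            else:
                self.pandas_names.discard(arg.arg)  # argument shadows an outer name
        for stmt in node.body:
            self.visit(stmt)
        self.pandas_names = outer  # restore the enclosing scope

    def visit_Attribute(self, node):
        receiver = node.value
        if node.attr == "values" and isinstance(receiver, ast.Name) and receiver.id in self.pandas_names:
            self.hits.append((node.lineno, receiver.id))
        self.generic_visit(node)


src = (
    "def f(s: pd.core.series.Series) -> np.ndarray: return s.values\n"
    "def g(d: dict): return d.values()\n"
)
visitor = AnnotatedValuesVisitor()
visitor.visit(ast.parse(src))
print(visitor.hits)
```

With this bookkeeping, s.values in f is flagged and d.values() in g is not, but any unannotated code falls straight back to the original problem.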

Type inference is likely out of scope for this library, but mypy, pytype, jedi, etc. all do some amount of type inference internally to do their respective jobs. Here is a brief view into what is out there.

Mypy doesn't seem to expose a fully type inferred ast as something you can use in a library based on

pytype does have this functionality on its roadmap

Question: usage as a library google/pytype#385
https://github.com/google/pytype/tree/master/pytype/tools/annotate_ast
but it is reported as "in progress".

jedi seems to provide something like this feature, but I think it's using its own AST and not the built-in ast module

[Question] Using Jedi for AST parsing + type deduction davidhalter/jedi#920

In summary, I don't see an obvious way forward without some effort, but I will continue to research this.

thomasjohns · 2020-02-10T21:28:14Z

I think this is difficult for mypy to support because mypy immediately converts the built-in ast into a mypy-specific AST, on which it does type inference (https://github.com/python/mypy/wiki/Implementation-Overview). I think the best short-term option is the new feature from Google's pytype, but it will require some defensive coding since the feature is "in progress".

I'll run some experiments with the pytype feature and see where it leads.

thomasjohns · 2020-02-10T21:34:39Z

Another project trying to solve this problem is https://github.com/mbdevpl/static-typing. It looks similar to pytype with regard to its "in progress" readme messaging.

lleites · 2020-02-11T19:12:44Z

Checking if pandas is imported will help, but is not super safe.
I see two options. The first is to disable checks per line: if you have a conflicting line, you mark it to ignore these checks. But conflicts happen a lot, with boto3 and DynamoDB as an example.
Another possibility is to add a safe_pandas option that disables the checks known to be problematic.
I will check some mid-size projects to try to get a list of the rules that are hard to handle with the current AST tooling.
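A minimal sketch of the "is pandas imported" check, using only the standard-library ast module. The function name imports_pandas is hypothetical. As noted above, this only narrows the problem: a file that imports both pandas and boto3 would still see false positives.

```python
import ast


def imports_pandas(source):
    """Return True if the module has `import pandas...` or `from pandas... import ...`."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] == "pandas" for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            # level > 0 is a relative import; module is None for `from . import x`.
            if node.level == 0 and node.module and node.module.split(".")[0] == "pandas":
                return True
    return False


print(imports_pandas("import pandas as pd"))      # pandas imported under an alias
print(imports_pandas("import re\nimport boto3"))  # no pandas import at all
```

A linter could skip its pandas checks entirely when this returns False.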

simchuck · 2020-02-11T19:43:34Z

Could you also scan the file for multiple pandas "signatures", rather than relying on an explicit import statement? Might be able to check existing code using pandas to determine the most used methods and score the likelihood of the methods in question being on a pandas object.
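One way this scoring idea might look, as a sketch only. The PANDAS_SIGNATURES set is a hand-picked guess rather than the result of surveying real pandas code, and the function name pandas_signature_score is hypothetical.

```python
import ast

# Hypothetical list of attribute names that rarely appear outside pandas code.
PANDAS_SIGNATURES = {
    "DataFrame", "Series", "read_csv", "groupby", "iloc", "loc", "merge", "to_numpy",
}


def pandas_signature_score(source):
    """Count the distinct pandas-looking attribute names used in the source."""
    seen = {
        node.attr
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Attribute) and node.attr in PANDAS_SIGNATURES
    }
    return len(seen)


pandas_like = "df = pd.read_csv('x.csv')\nout = df.groupby('a').sum()\nrow = df.iloc[0]\n"
boto_like = "resp = table.scan()\nitems = resp.values()\n"
print(pandas_signature_score(pandas_like), pandas_signature_score(boto_like))
```

A threshold on this score could decide whether ambiguous checks such as .values run for a given file. Picking that threshold would need data from real projects.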


deppen8 · 2020-02-11T20:12:36Z

Thanks for chiming in, @lleites .

> Checking if pandas is imported will help, but is not super safe.

Just to confirm, by "not super safe" you mean it won't fix all issues, correct? Or is there some actual security vulnerability I am not seeing?

> I see two options, disable check per line, if you have a conflicting line you mark it to ignore these checks, but this happens a lot with boto3 and dynamdb as an example.

Do you mean something like # noqa comments? I haven't checked, but flake8 might already have this.

lleites · 2020-02-11T20:37:52Z

Sorry that I was not clear. I mean it won't fix all issues.
Yes, you can disable specific checks per line in flake8:
example = lambda: 'example' # noqa: E731
http://flake8.pycqa.org/en/3.1.1/user/ignoring-errors.html
From my point of view, documenting in the README.md which rules can have false positives and how to disable them would also be a step in the right direction.
I will check the projects I mentioned and come back here with a list of rules.
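Applied to the false positives discussed in this thread, the per-line suppression might look like the sketch below. PD005 is the code reported for the regex sub case in the linked issue #108. The rest of the snippet is illustrative.

```python
import re

pattern = re.compile(r"\s+")

# `.sub` here is a regex method, not a pandas one, so only the check that
# misfires is silenced, and only on this line.
collapsed = pattern.sub(" ", "too   many    spaces")  # noqa: PD005
print(collapsed)
```

Naming the code after noqa keeps every other check active on that line, which is safer than a bare # noqa.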

deppen8 · 2020-02-11T20:43:46Z

Adding this to the README is an excellent idea. I will create a small issue to track that.


deppen8 mentioned this issue Feb 11, 2020: Confirm pandas imported #93 (Closed)

deppen8 added the "help wanted" (Extra attention is needed) label Feb 11, 2020

deppen8 mentioned this issue Feb 11, 2020: Add notes about disabling checks to README #94 (Closed)

lsorber mentioned this issue Jan 28, 2021: False positive: dict().values() #106 (Closed)

deppen8 mentioned this issue Feb 20, 2021: False positive: PD005 for regex sub method #108 (Closed)
