Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

False positives with common method names #81

Open
deppen8 opened this issue Feb 7, 2020 · 8 comments
Open

False positives with common method names #81

deppen8 opened this issue Feb 7, 2020 · 8 comments
Labels
help wanted Extra attention is needed

Comments

@deppen8
Copy link
Owner

deppen8 commented Feb 7, 2020

This is a dedicated issue for the big discussion in #74

The problem is that many of our checks rely on the type of the object being a pandas object. This is a fundamental issue with static linting in Python because the AST doesn't know what type a thing is. This leads to false positives for things like re.sub() or dict.values()

I am open to suggestions on how to get around this, but it will likely be a big job. Some kind of integration with mypy or some other way to leverage type annotations might be a way to fix this, at least for folks who use those type annotations. What exactly that looks like is unclear to me, so please let me know if you have any ideas.

For now, the undesirable workaround is to turn off checks that are particularly bothersome.

@thomasjohns
Copy link

Here is a snippet of the ast produced with type annotations in python 3.8 (I imagine you would get the same or similar output using typed_ast (https://github.com/python/typed_ast) in earlier python versions). Note the annotations are decorated in the function argument and in the return value. (ast printing with https://github.com/asottile/astpretty)

>>> import pandas as pd; import numpy as np; import ast; import astpretty
>>> src = "def f(s: pd.core.series.Series) -> np.ndarray: return s.values"
>>> astpretty.pprint(ast.parse(src).body[0])
FunctionDef(
    lineno=1,
    col_offset=0,
    end_lineno=1,
    end_col_offset=62,
    name='f',
    args=arguments(
        posonlyargs=[],
        args=[
            arg(
                lineno=1,
                col_offset=6,
                end_lineno=1,
                end_col_offset=30,
                arg='s',
                annotation=Attribute(
                    lineno=1,
                    col_offset=9,
                    end_lineno=1,
                    end_col_offset=30,
                    value=Attribute(
                        lineno=1,
                        col_offset=9,
                        end_lineno=1,
                        end_col_offset=23,
                        value=Attribute(
                            lineno=1,
                            col_offset=9,
                            end_lineno=1,
                            end_col_offset=16,
                            value=Name(lineno=1, col_offset=9, end_lineno=1, end_col_offset=11, id='pd', ctx=Load()),
                            attr='core',
                            ctx=Load(),
                        ),
                        attr='series',
                        ctx=Load(),
                    ),
                    attr='Series',
                    ctx=Load(),
                ),
                type_comment=None,
            ),
        ],
        vararg=None,
        kwonlyargs=[],
        kw_defaults=[],
        kwarg=None,
        defaults=[],
    ),
    body=[
        Return(
            lineno=1,
            col_offset=47,
            end_lineno=1,
            end_col_offset=62,
            value=Attribute(
                lineno=1,
                col_offset=54,
                end_lineno=1,
                end_col_offset=62,
                value=Name(lineno=1, col_offset=54, end_lineno=1, end_col_offset=55, id='s', ctx=Load()),
                attr='values',
                ctx=Load(),
            ),
        ),
    ],
    decorator_list=[],
    returns=Attribute(
        lineno=1,
        col_offset=35,
        end_lineno=1,
        end_col_offset=45,
        value=Name(lineno=1, col_offset=35, end_lineno=1, end_col_offset=37, id='np', ctx=Load()),
        attr='ndarray',
        ctx=Load(),
    ),
    type_comment=None,
)

However, we need the type information to be decorated on the Function.body's Attribute lookup to pick this up in visit_Attribute. The type information is not decorated down to this level of the tree. To get type information there, we'd need type inference.

Type inference is likely out of scope for this library, but mypy, pytype, jedi, etc. all do some amount of type inference internally to do their respective jobs. Here is a brief view into what is out there.

Mypy doesn't seem to expose a fully type inferred ast as something you can use in a library based on

pytype does have this functionality on its roadmap

jedi seems to provide something like this feature, but I think its using its own ast and not the built-in ast module

In summary, I don't see an obvious way forward without some effort, but I will continue to research this.

@thomasjohns
Copy link

thomasjohns commented Feb 10, 2020

I think this is difficult for mypy to support because mypy immediately converts the build-in ast into a mypy specific ast where it does type inference https://github.com/python/mypy/wiki/Implementation-Overview. I think the best option here in the short term is the new feature from Google's pytype, but it will require some defensive coding since the feature is "in progress".

I'll run some experiments with the pytype feature and see where it leads.

@thomasjohns
Copy link

Another project trying to solve this problem is https://github.com/mbdevpl/static-typing. It looks similar to pytype in regards to its "in progress" readme messaging.

@deppen8 deppen8 mentioned this issue Feb 11, 2020
4 tasks
@lleites
Copy link

lleites commented Feb 11, 2020

Checking if pandas is imported will help, but is not super safe.
I see two options, disable check per line, if you have a conflicting line you mark it to ignore these checks, but this happens a lot with boto3 and dynamdb as an example.
Another possibility is to add a safe_pandas option that disables known to be problematic checks.
I will check some mid-size projects to try to get a list of these rules that are hard to deal with the current AST tooling.

@simchuck
Copy link
Collaborator

simchuck commented Feb 11, 2020 via email

@deppen8
Copy link
Owner Author

deppen8 commented Feb 11, 2020

Thanks for chiming in, @lleites .

Checking if pandas is imported will help, but is not super safe.

Just to confirm, by "not super safe" you mean it won't fix all issues, correct? Or is there some actual security vulnerability I am not seeing?

I see two options, disable check per line, if you have a conflicting line you mark it to ignore these checks, but this happens a lot with boto3 and dynamdb as an example.

Do you mean something like #no-qa comments? I haven't checked, but flake8 might already have this.

@deppen8 deppen8 added the help wanted Extra attention is needed label Feb 11, 2020
@lleites
Copy link

lleites commented Feb 11, 2020

Sorry that I was not clear, I mean it won't fix all issues.
Yes, you can disable specific checks per line in flake8
example = lambda: 'example' # noqa: E731
http://flake8.pycqa.org/en/3.1.1/user/ignoring-errors.html
From my point of view documenting which are those rules that can have false positives and how to disable them in the README.md would be also a step in the right direction.
I will check these projects I mention and come with a list of rules and comment here.

@deppen8
Copy link
Owner Author

deppen8 commented Feb 11, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants