Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a structured list reporter to pyflakes.reporter #462

Closed
wants to merge 3 commits into from

Conversation

mjbommar
Copy link

@mjbommar mjbommar commented Aug 12, 2019

Adding a simple reporter class to log to a list of dictionaries with a consistent format across all message types and sources:

  • message_class: analogous to E/W/L, e.g., "error"
  • message_type: more specific type or description, e.g., "syntax_error"
  • filename: file name if provided, else None
  • lineno: line number if provided, else None
  • message: "message" from the perspective of the sender
  • source: source code if provided, else None

Sample output:

>>> reporter = pyflakes.reporter.ListReporter()
>>> pyflakes.api.check(source_bytes, file_name, reporter)
>>> print(reporter.entries)
[{'message_class': 'warning',
  'message_type': 'flake_warning',
  'filename': '0-core-client-1.1.0a8/zeroos/core0/client/__init__.py',
  'lineno': 1,
  'offset': 0,
  'message': "'.client.Client' imported but unused",
  'source': None},
 {'message_class': 'warning',
  'message_type': 'flake_warning',
  'filename': '0-core-client-1.1.0a8/zeroos/core0/client/__init__.py',
  'lineno': 1,
  'offset': 0,
  'message': "'.client.ResultError' imported but unused",
  'source': None},
 {'message_class': 'warning',
  'message_type': 'flake_warning',
  'filename': '0-core-client-1.1.0a8/zeroos/core0/client/__init__.py',
  'lineno': 1,
  'offset': 0,
  'message': "'.client.JobNotFoundError' imported but unused",
  'source': None},
 {'message_class': 'warning',
  'message_type': 'flake_warning',
  'filename': '0-core-client-1.1.0a8/zeroos/core0/client/client.py',
  'lineno': 742,
  'offset': 8,
  'message': "'raise NotImplemented' should be 'raise NotImplementedError'",
  'source': None}]

Copy link
Member

@asottile asottile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the code you added seems unused outside of tests -- or am I reading this incorrectly?

@mjbommar
Copy link
Author

Correct. The goal was to provide a convenient interface for those who might want to use pyflakes to generate structured information (e.g., populate a database or CSV record) instead of simply printing to stdout/stderr.

Is there a preferred way to do this currently? Apologies if so, but we didn't find any cleaner way.

@asottile
Copy link
Member

pyflakes doesn't take any options so even with this code added here I'm not sure it really helps anything?

probably the easiest way (and what I would suggest) would be to use a formatter via flake8 instead, for example:

$ pip install flake8 flake8-json
...
$ echo 'import os' > t.py
$ flake8 --select=F --format=json t.py  # --select=F limits it to pyflakes codes only
{"t.py": [{"code": "F401", "filename": "t.py", "line_number": 1, "column_number": 1, "text": "'os' imported but unused", "physical_line": "import os\n"}]}

@mjbommar
Copy link
Author

mjbommar commented Aug 12, 2019

For our purposes, we're trying to run flake8 and pycodestyle on millions of files. We're trying to avoid the overhead of starting new processes and piping data around, and flake8 does more than pyflakes that may not be necessary in some cases.

I can understand if you don't see the point for the use case if pyflakes is a library only for flake8.
But then perhaps pyflakes should be vendored inside of flake8?

@asottile
Copy link
Member

For our purposes, we're trying to run flake8 and pycodestyle on millions of files. We're trying to avoid the overhead of starting new processes and piping data around, and flake8 does more than pyflakes that may not be necessary in some cases.

this is why I'm suggesting --select=F -- that'll only do pyflakes. if you don't want the multiprocessing use -j1

@mjbommar
Copy link
Author

  1. The files are loaded from network into memory and never created on disk.
  2. Even if we write into stdin for flake8, why launch millions of additional interpreter processes and create a second copy of data already?
  3. Is pyflakes meant to be a library for use solely by flake8? If not, how are users expected to interact with pyflakes?

@asottile
Copy link
Member

1. The files are loaded from network into memory and never created on disk.

2. Even if we write into stdin for flake8, why launch millions of additional interpreter processes and create a second copy of data already?

could probably do something clever with ramdisks, though I don't know your usecase (it seems quite different from normal uses of these tools)

3. Is pyflakes meant to be a library for use solely by flake8?  If not, how are users expected to interact with pyflakes?

no it's not, I was just suggesting a way that already exists. pyflakes is usuable as a cli tool (and presumably to a limited extent as a library via the api in pyflakes.api) on its own. note that pyflakes tends to do a single thing and has no options -- generally steering towards more comprehensive tools such as flake8 for output manipulation, inclusion, exclusion, filtering, ignoring, etc.

@sigmavirus24
Copy link
Member

As @asottile said, pyflakes doesn't accept command-line arguments. This is because it doesn't accept checks that could be false-positives. As such, there's little need for command-line arguments and people who want to disable things always do so using Flake8.

Completely disregarding whether you could even use this if it were part of PyFlakes, Python reprs are barely structured data.

@sigmavirus24
Copy link
Member

It occurred to me after finishing my last comment that

Even if we write into stdin for flake8, why launch millions of additional interpreter processes and create a second copy of data already?

Is likely because you think that you can't disable Flake8's use of multiprocessing. You can. Flake8 hooks directly into pyflakes and pycodestyle and doesn't launch sub-processes or sub-interpreters for those tools (if anything, Flake8 reduces the amount of processing since each tool can generate AST from the same file, tokenize it, etc.).

Since you can bypass the chief performance bottleneck you seem worried about, I think it's safe to close this.

@mjbommar
Copy link
Author

We're parsing literally all of PyPI - every source release. And flake8's approach to creating copies of the data and AST parsing is not helpful for our use case.

This is such a simple PR with no mainline code impact and huge performance improvement for our use case; it's hard to understand your position as anything other than a desire not to support pyflakes as a Python library for developers to use directly. It would be nice if this was made clear to the community before anyone else dedicates time to PRs that are unwanted.

@mjbommar mjbommar closed this Aug 12, 2019
@sigmavirus24
Copy link
Member

sigmavirus24 commented Aug 12, 2019

This is such a simple PR with no mainline code impact and huge performance improvement for our use case; it's hard to understand your position as anything other than a desire not to support pyflakes as a Python library for developers to use directly.

This is hardly contributing in good faith. It would seem that you care more about landing a pull request than the burden it may place on everyone else who maintains this project, present and future.

The default stance for any open source project should roughly amount to "No is temporary. Yes is forever". The default answer should always be "No". PyFlakes can be used as a python library one can import. You're already doing that and may projects also do that.

Just because it can be imported doesn't mean it's a library that has to include everything and then the kitchen sink. In fact, it's a better library if it doesn't. It's easier to maintain. It's more likely to be maintained into the future so that whatever you're doing doesn't need to be rewritten.

But yes, absolutely. We all hate you. We hate our users who import PyFlakes. And whatever other things you want to believe that make it easier for you to go on with the rest of your day.*


* This statement is sarcasm. We don't hate anyone.

@mjbommar
Copy link
Author

I understand; our research is in part an attempt to measure dead code and maintenance burden across the entire ecosystem. Unfortunately, we're going to maintain our own fork to help do this.

However, there is literally no way to interact with your modules other than CLI or reverse-engineering undocumented APIs, which only makes it harder for others to help the team. It would be nice if the pyflakes/pycodestyle docs at least included something like these Reporter classes as examples for those with similar needs.

@sigmavirus24
Copy link
Member

As someone who has written an integration with PyFlakes, I agree there should be docs to make it easier. I think a PR to add those would be tremendously helpful and valuable and would definitely be merged.

@mjbommar
Copy link
Author

Once more unto the breach - in terms of what one might "expect" of a library and what another PyCQA project is comfortable maintaining:
https://github.com/PyCQA/bandit/blob/master/bandit/core/manager.py#L69

Integrating with bandit, which shares a huge amount of very similar code and is also primarily a CLI application, is this simple:

bandit_config = BanditConfig()
bandit_manager = BanditManager(bandit_config, 'file')
bandit_manager._parse_file(file_name, io.BytesIO(source_bytes), [file_name])
for result in bandit_manager.results:
    # do something with structured bandit.core.issue.Issue object

I understand this is a much larger "feature (set)" required for this kind of example to be mirrored by pyflakes, but would it be worth backlogging/roadmapping?

(PS, as an entirely different can of worms - has the PyCQA group as a whole considered whether the common infrastructure - walking paths, parsing ASTs, etc. - could be shared across the many similar package needs? We have some code related to reliable AST parsing across legacy Python versions from within Python 3.x that we would like to contribute, but it would be much nicer to contribute to one place instead of three.)

@asottile
Copy link
Member

is this simple

You're using a private api

@mjbommar
Copy link
Author

Yes, a risk we are willing to take. Compared to maintaining a completely stand-alone fork or monkey-patching, I'll take a private API.

@asottile
Copy link
Member

I'm confused why you think you need this PR or a fork or monkeypatching, can't you just do:

from your_own_module import ListReporter
from pyflakes.api import check

reporter = ListReporter()
check(source_bytes, file_name, reporter=reporter)

@mjbommar
Copy link
Author

mjbommar commented Aug 13, 2019

  1. Yes, we can. We simply wanted to commit it to mainline since, as a user, it was otherwise unclear how to access results efficiently. Now, we'll just submit a doc PR under examples somewhere. (PS don't you think all of the pyflakes unit tests would be so much cleaner with a structured Reporter class?)

  2. This latest comment was equal parts "would you accept a feature request issue asking for an interface like bandit?" and "why is everyone in PyCQA re-writing the same code to walk packages and parse ASTs instead of sharing the burden?"

@asottile
Copy link
Member

The pyflakes unit tests do use structured responses, they utilize the Checker directly which has error instances

Most of the PyCQA tools act independently, some of us are involved in multiple projects but for the most part they're separate

@sigmavirus24
Copy link
Member

The PyCQA was also born to be a loose group of projects sharing a goal of improving code quality. To attempt to share resources would require far more coordination than anyone in the group presently has time or energy for. Further, many of us (myself, doc8, bandit) have seen the downsides of trying to have shared infrastructure like that which works everyone and trying to enforce people use that.

While, as @asottile mentions, there are very few of us who work on many projects, most people here have only ever signed up to maintain 1. Asking they start using another dependency and potentially contributing to it/maintaining it under some forced pretense of "shared libraries" that isn't strictly necessary as 99% of our users interact with us through a CLI or Continuous Integration tool, seems demoralizing.

If others were interested in this, I would have expected it would have come up already more widely.

@asmeurer
Copy link
Contributor

There was discussion about this a while back at davidhalter/jedi#630.

parso I'm sure would simplify the pyflakes code a lot, but I'm not sure if that is enough motivation to transition the codebase to it. I don't know if any additional features that are now impossible would be unlocked by doing so, but that would be the sort of thing to motivate this.

@asmeurer
Copy link
Contributor

I guess one sort of big feature would be the ability to parse Python versions other than the one pyflakes is running in. But now that Python 2 is going away this isn't as important as it was.

@mjbommar
Copy link
Author

If you go out into enterprise, there's an awful lot of Python 2 out there, and I suspect there will be for another decade or two to come. Plenty of applications bundled it. Go ask an AS/400 or Fortran developer for their opinion...

Exception-handling and 1.x/2.x code handling are exactly what we were looking to submit via PR, but I think you've already done a way better job of what we were trying to do, @asmeurer! I'll ask one of our team to do a feature comparison and we'll submit an issue/PR to your repo if we think there's something we can add.

@asmeurer
Copy link
Contributor

To be clear, I'm not the author of parso (unless you are referring to something else).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants