Add a structured list reporter to pyflakes.reporter #462

mjbommar · 2019-08-12T01:05:47Z

Adding a simple reporter class to log to a list of dictionaries with a consistent format across all message types and sources:

message_class: analogous to E/W/L, e.g., "error"
message_type: more specific type or description, e.g., "syntax_error"
filename: file name if provided, else None
lineno: line number if provided, else None
message: "message" from the perspective of the sender
source: source code if provided, else None

Sample output:

>>> reporter = pyflakes.reporter.ListReporter()
>>> pyflakes.api.check(source_bytes, file_name, reporter)
>>> print(reporter.entries)
[{'message_class': 'warning',
  'message_type': 'flake_warning',
  'filename': '0-core-client-1.1.0a8/zeroos/core0/client/__init__.py',
  'lineno': 1,
  'offset': 0,
  'message': "'.client.Client' imported but unused",
  'source': None},
 {'message_class': 'warning',
  'message_type': 'flake_warning',
  'filename': '0-core-client-1.1.0a8/zeroos/core0/client/__init__.py',
  'lineno': 1,
  'offset': 0,
  'message': "'.client.ResultError' imported but unused",
  'source': None},
 {'message_class': 'warning',
  'message_type': 'flake_warning',
  'filename': '0-core-client-1.1.0a8/zeroos/core0/client/__init__.py',
  'lineno': 1,
  'offset': 0,
  'message': "'.client.JobNotFoundError' imported but unused",
  'source': None},
 {'message_class': 'warning',
  'message_type': 'flake_warning',
  'filename': '0-core-client-1.1.0a8/zeroos/core0/client/client.py',
  'lineno': 742,
  'offset': 8,
  'message': "'raise NotImplemented' should be 'raise NotImplementedError'",
  'source': None}]

asottile

the code you added seems unused outside of tests -- or am I reading this incorrectly?

mjbommar · 2019-08-12T02:04:31Z

Correct. The goal was to provide a convenient interface for those who might want to use pyflakes to generate structured information (e.g., populate a database or CSV record) instead of simply printing to stdout/stderr.

Is there a preferred way to do this currently? Apologies if so, but we didn't find any cleaner way.

asottile · 2019-08-12T02:19:38Z

pyflakes doesn't take any options so even with this code added here I'm not sure it really helps anything?

probably the easiest way (and what I would suggest) would be to use a formatter via flake8 instead, for example:

$ pip install flake8 flake8-json
...
$ echo 'import os' > t.py
$ flake8 --select=F --format=json t.py  # --select=F limits it to pyflakes codes only
{"t.py": [{"code": "F401", "filename": "t.py", "line_number": 1, "column_number": 1, "text": "'os' imported but unused", "physical_line": "import os\n"}]}

mjbommar · 2019-08-12T02:23:28Z

For our purposes, we're trying to run flake8 and pycodestyle on millions of files. We're trying to avoid the overhead of starting new processes and piping data around, and flake8 does more than pyflakes that may not be necessary in some cases.

I can understand if you don't see the point for the use case if pyflakes is a library only for flake8.
But then perhaps pyflakes should be vendored inside of flake8?

asottile · 2019-08-12T02:24:37Z

For our purposes, we're trying to run flake8 and pycodestyle on millions of files. We're trying to avoid the overhead of starting new processes and piping data around, and flake8 does more than pyflakes that may not be necessary in some cases.

this is why I'm suggesting --select=F -- that'll only do pyflakes. if you don't want the multiprocessing use -j1

mjbommar · 2019-08-12T02:28:50Z

The files are loaded from network into memory and never created on disk.
Even if we write into stdin for flake8, why launch millions of additional interpreter processes and create a second copy of data already?
Is pyflakes meant to be a library for use solely by flake8? If not, how are users expected to interact with pyflakes?

asottile · 2019-08-12T02:33:04Z

1. The files are loaded from network into memory and never created on disk.

2. Even if we write into stdin for flake8, why launch millions of additional interpreter processes and create a second copy of data already?

could probably do something clever with ramdisks, though I don't know your usecase (it seems quite different from normal uses of these tools)

3. Is pyflakes meant to be a library for use solely by flake8?  If not, how are users expected to interact with pyflakes?

no it's not, I was just suggesting a way that already exists. pyflakes is usuable as a cli tool (and presumably to a limited extent as a library via the api in pyflakes.api) on its own. note that pyflakes tends to do a single thing and has no options -- generally steering towards more comprehensive tools such as flake8 for output manipulation, inclusion, exclusion, filtering, ignoring, etc.

sigmavirus24 · 2019-08-12T12:28:15Z

As @asottile said, pyflakes doesn't accept command-line arguments. This is because it doesn't accept checks that could be false-positives. As such, there's little need for command-line arguments and people who want to disable things always do so using Flake8.

Completely disregarding whether you could even use this if it were part of PyFlakes, Python reprs are barely structured data.

sigmavirus24 · 2019-08-12T12:34:40Z

It occurred to me after finishing my last comment that

Even if we write into stdin for flake8, why launch millions of additional interpreter processes and create a second copy of data already?

Is likely because you think that you can't disable Flake8's use of multiprocessing. You can. Flake8 hooks directly into pyflakes and pycodestyle and doesn't launch sub-processes or sub-interpreters for those tools (if anything, Flake8 reduces the amount of processing since each tool can generate AST from the same file, tokenize it, etc.).

Since you can bypass the chief performance bottleneck you seem worried about, I think it's safe to close this.

mjbommar · 2019-08-12T12:41:02Z

We're parsing literally all of PyPI - every source release. And flake8's approach to creating copies of the data and AST parsing is not helpful for our use case.

This is such a simple PR with no mainline code impact and huge performance improvement for our use case; it's hard to understand your position as anything other than a desire not to support pyflakes as a Python library for developers to use directly. It would be nice if this was made clear to the community before anyone else dedicates time to PRs that are unwanted.

sigmavirus24 · 2019-08-12T12:58:18Z

This is such a simple PR with no mainline code impact and huge performance improvement for our use case; it's hard to understand your position as anything other than a desire not to support pyflakes as a Python library for developers to use directly.

This is hardly contributing in good faith. It would seem that you care more about landing a pull request than the burden it may place on everyone else who maintains this project, present and future.

The default stance for any open source project should roughly amount to "No is temporary. Yes is forever". The default answer should always be "No". PyFlakes can be used as a python library one can import. You're already doing that and may projects also do that.

Just because it can be imported doesn't mean it's a library that has to include everything and then the kitchen sink. In fact, it's a better library if it doesn't. It's easier to maintain. It's more likely to be maintained into the future so that whatever you're doing doesn't need to be rewritten.

But yes, absolutely. We all hate you. We hate our users who import PyFlakes. And whatever other things you want to believe that make it easier for you to go on with the rest of your day.*

* This statement is sarcasm. We don't hate anyone.

mjbommar · 2019-08-12T13:04:41Z

I understand; our research is in part an attempt to measure dead code and maintenance burden across the entire ecosystem. Unfortunately, we're going to maintain our own fork to help do this.

However, there is literally no way to interact with your modules other than CLI or reverse-engineering undocumented APIs, which only makes it harder for others to help the team. It would be nice if the pyflakes/pycodestyle docs at least included something like these Reporter classes as examples for those with similar needs.

sigmavirus24 · 2019-08-12T13:08:47Z

As someone who has written an integration with PyFlakes, I agree there should be docs to make it easier. I think a PR to add those would be tremendously helpful and valuable and would definitely be merged.

mjbommar · 2019-08-13T15:01:58Z

Once more unto the breach - in terms of what one might "expect" of a library and what another PyCQA project is comfortable maintaining:
https://github.com/PyCQA/bandit/blob/master/bandit/core/manager.py#L69

Integrating with bandit, which shares a huge amount of very similar code and is also primarily a CLI application, is this simple:

bandit_config = BanditConfig()
bandit_manager = BanditManager(bandit_config, 'file')
bandit_manager._parse_file(file_name, io.BytesIO(source_bytes), [file_name])
for result in bandit_manager.results:
    # do something with structured bandit.core.issue.Issue object

I understand this is a much larger "feature (set)" required for this kind of example to be mirrored by pyflakes, but would it be worth backlogging/roadmapping?

(PS, as an entirely different can of worms - has the PyCQA group as a whole considered whether the common infrastructure - walking paths, parsing ASTs, etc. - could be shared across the many similar package needs? We have some code related to reliable AST parsing across legacy Python versions from within Python 3.x that we would like to contribute, but it would be much nicer to contribute to one place instead of three.)

asottile · 2019-08-13T15:03:55Z

is this simple

You're using a private api

mjbommar · 2019-08-13T15:11:49Z

Yes, a risk we are willing to take. Compared to maintaining a completely stand-alone fork or monkey-patching, I'll take a private API.

asottile · 2019-08-13T15:19:14Z

I'm confused why you think you need this PR or a fork or monkeypatching, can't you just do:

from your_own_module import ListReporter
from pyflakes.api import check

reporter = ListReporter()
check(source_bytes, file_name, reporter=reporter)

mjbommar · 2019-08-13T15:24:44Z

Yes, we can. We simply wanted to commit it to mainline since, as a user, it was otherwise unclear how to access results efficiently. Now, we'll just submit a doc PR under examples somewhere. (PS don't you think all of the pyflakes unit tests would be so much cleaner with a structured Reporter class?)
This latest comment was equal parts "would you accept a feature request issue asking for an interface like bandit?" and "why is everyone in PyCQA re-writing the same code to walk packages and parse ASTs instead of sharing the burden?"

asottile · 2019-08-13T15:28:58Z

The pyflakes unit tests do use structured responses, they utilize the Checker directly which has error instances

Most of the PyCQA tools act independently, some of us are involved in multiple projects but for the most part they're separate

sigmavirus24 · 2019-08-14T13:25:14Z

The PyCQA was also born to be a loose group of projects sharing a goal of improving code quality. To attempt to share resources would require far more coordination than anyone in the group presently has time or energy for. Further, many of us (myself, doc8, bandit) have seen the downsides of trying to have shared infrastructure like that which works everyone and trying to enforce people use that.

While, as @asottile mentions, there are very few of us who work on many projects, most people here have only ever signed up to maintain 1. Asking they start using another dependency and potentially contributing to it/maintaining it under some forced pretense of "shared libraries" that isn't strictly necessary as 99% of our users interact with us through a CLI or Continuous Integration tool, seems demoralizing.

If others were interested in this, I would have expected it would have come up already more widely.

asmeurer · 2019-08-14T17:18:46Z

There was discussion about this a while back at davidhalter/jedi#630.

parso I'm sure would simplify the pyflakes code a lot, but I'm not sure if that is enough motivation to transition the codebase to it. I don't know if any additional features that are now impossible would be unlocked by doing so, but that would be the sort of thing to motivate this.

asmeurer · 2019-08-14T17:20:02Z

I guess one sort of big feature would be the ability to parse Python versions other than the one pyflakes is running in. But now that Python 2 is going away this isn't as important as it was.

mjbommar · 2019-08-14T17:26:23Z

If you go out into enterprise, there's an awful lot of Python 2 out there, and I suspect there will be for another decade or two to come. Plenty of applications bundled it. Go ask an AS/400 or Fortran developer for their opinion...

Exception-handling and 1.x/2.x code handling are exactly what we were looking to submit via PR, but I think you've already done a way better job of what we were trying to do, @asmeurer! I'll ask one of our team to do a feature comparison and we'll submit an issue/PR to your repo if we think there's something we can add.

asmeurer · 2019-08-14T19:16:47Z

To be clear, I'm not the author of parso (unless you are referring to something else).

mjbommar added 3 commits August 11, 2019 20:49

Adding structured reporter for pyflakes output in pyflakes.reporter

6272b54

Adding test cases for ListReporter

7fa28e3

Hard-breaking docstring < max line length

cb7f129

asottile reviewed Aug 12, 2019

View reviewed changes

mjbommar mentioned this pull request Aug 12, 2019

Structured reporter PyCQA/pycodestyle#881

Closed

mjbommar closed this Aug 12, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a structured list reporter to pyflakes.reporter #462

Add a structured list reporter to pyflakes.reporter #462

mjbommar commented Aug 12, 2019 •

edited

Loading

asottile left a comment

mjbommar commented Aug 12, 2019

asottile commented Aug 12, 2019

mjbommar commented Aug 12, 2019 •

edited

Loading

asottile commented Aug 12, 2019

mjbommar commented Aug 12, 2019

asottile commented Aug 12, 2019

sigmavirus24 commented Aug 12, 2019

sigmavirus24 commented Aug 12, 2019

mjbommar commented Aug 12, 2019

sigmavirus24 commented Aug 12, 2019 •

edited by asottile

Loading

mjbommar commented Aug 12, 2019

sigmavirus24 commented Aug 12, 2019

mjbommar commented Aug 13, 2019

asottile commented Aug 13, 2019

mjbommar commented Aug 13, 2019

asottile commented Aug 13, 2019

mjbommar commented Aug 13, 2019 •

edited

Loading

asottile commented Aug 13, 2019

sigmavirus24 commented Aug 14, 2019

asmeurer commented Aug 14, 2019

asmeurer commented Aug 14, 2019

mjbommar commented Aug 14, 2019

asmeurer commented Aug 14, 2019

Add a structured list reporter to pyflakes.reporter #462

Add a structured list reporter to pyflakes.reporter #462

Conversation

mjbommar commented Aug 12, 2019 • edited Loading

asottile left a comment

Choose a reason for hiding this comment

mjbommar commented Aug 12, 2019

asottile commented Aug 12, 2019

mjbommar commented Aug 12, 2019 • edited Loading

asottile commented Aug 12, 2019

mjbommar commented Aug 12, 2019

asottile commented Aug 12, 2019

sigmavirus24 commented Aug 12, 2019

sigmavirus24 commented Aug 12, 2019

mjbommar commented Aug 12, 2019

sigmavirus24 commented Aug 12, 2019 • edited by asottile Loading

mjbommar commented Aug 12, 2019

sigmavirus24 commented Aug 12, 2019

mjbommar commented Aug 13, 2019

asottile commented Aug 13, 2019

mjbommar commented Aug 13, 2019

asottile commented Aug 13, 2019

mjbommar commented Aug 13, 2019 • edited Loading

asottile commented Aug 13, 2019

sigmavirus24 commented Aug 14, 2019

asmeurer commented Aug 14, 2019

asmeurer commented Aug 14, 2019

mjbommar commented Aug 14, 2019

asmeurer commented Aug 14, 2019

mjbommar commented Aug 12, 2019 •

edited

Loading

mjbommar commented Aug 12, 2019 •

edited

Loading

sigmavirus24 commented Aug 12, 2019 •

edited by asottile

Loading

mjbommar commented Aug 13, 2019 •

edited

Loading