-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a structured list reporter to pyflakes.reporter #462
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code you added seems unused outside of tests -- or am I reading this incorrectly?
Correct. The goal was to provide a convenient interface for those who might want to use pyflakes to generate structured information (e.g., populate a database or CSV record) instead of simply printing to stdout/stderr. Is there a preferred way to do this currently? Apologies if so, but we didn't find any cleaner way. |
pyflakes doesn't take any options so even with this code added here I'm not sure it really helps anything? probably the easiest way (and what I would suggest) would be to use a formatter via flake8 instead, for example: $ pip install flake8 flake8-json
...
$ echo 'import os' > t.py
$ flake8 --select=F --format=json t.py # --select=F limits it to pyflakes codes only
{"t.py": [{"code": "F401", "filename": "t.py", "line_number": 1, "column_number": 1, "text": "'os' imported but unused", "physical_line": "import os\n"}]} |
For our purposes, we're trying to run flake8 and pycodestyle on millions of files. We're trying to avoid the overhead of starting new processes and piping data around, and flake8 does more than pyflakes that may not be necessary in some cases. I can understand if you don't see the point for the use case if pyflakes is a library only for flake8. |
this is why I'm suggesting |
|
could probably do something clever with ramdisks, though I don't know your usecase (it seems quite different from normal uses of these tools)
no it's not, I was just suggesting a way that already exists. pyflakes is usuable as a cli tool (and presumably to a limited extent as a library via the api in |
As @asottile said, pyflakes doesn't accept command-line arguments. This is because it doesn't accept checks that could be false-positives. As such, there's little need for command-line arguments and people who want to disable things always do so using Flake8. Completely disregarding whether you could even use this if it were part of PyFlakes, Python reprs are barely structured data. |
It occurred to me after finishing my last comment that
Is likely because you think that you can't disable Flake8's use of multiprocessing. You can. Flake8 hooks directly into pyflakes and pycodestyle and doesn't launch sub-processes or sub-interpreters for those tools (if anything, Flake8 reduces the amount of processing since each tool can generate AST from the same file, tokenize it, etc.). Since you can bypass the chief performance bottleneck you seem worried about, I think it's safe to close this. |
We're parsing literally all of PyPI - every source release. And flake8's approach to creating copies of the data and AST parsing is not helpful for our use case. This is such a simple PR with no mainline code impact and huge performance improvement for our use case; it's hard to understand your position as anything other than a desire not to support pyflakes as a Python library for developers to use directly. It would be nice if this was made clear to the community before anyone else dedicates time to PRs that are unwanted. |
This is hardly contributing in good faith. It would seem that you care more about landing a pull request than the burden it may place on everyone else who maintains this project, present and future. The default stance for any open source project should roughly amount to "No is temporary. Yes is forever". The default answer should always be "No". PyFlakes can be used as a python library one can import. You're already doing that and may projects also do that. Just because it can be imported doesn't mean it's a library that has to include everything and then the kitchen sink. In fact, it's a better library if it doesn't. It's easier to maintain. It's more likely to be maintained into the future so that whatever you're doing doesn't need to be rewritten. But yes, absolutely. We all hate you. We hate our users who import PyFlakes. And whatever other things you want to believe that make it easier for you to go on with the rest of your day.* * This statement is sarcasm. We don't hate anyone. |
I understand; our research is in part an attempt to measure dead code and maintenance burden across the entire ecosystem. Unfortunately, we're going to maintain our own fork to help do this. However, there is literally no way to interact with your modules other than CLI or reverse-engineering undocumented APIs, which only makes it harder for others to help the team. It would be nice if the pyflakes/pycodestyle docs at least included something like these Reporter classes as examples for those with similar needs. |
As someone who has written an integration with PyFlakes, I agree there should be docs to make it easier. I think a PR to add those would be tremendously helpful and valuable and would definitely be merged. |
Once more unto the breach - in terms of what one might "expect" of a library and what another PyCQA project is comfortable maintaining: Integrating with
I understand this is a much larger "feature (set)" required for this kind of example to be mirrored by pyflakes, but would it be worth backlogging/roadmapping? (PS, as an entirely different can of worms - has the PyCQA group as a whole considered whether the common infrastructure - walking paths, parsing ASTs, etc. - could be shared across the many similar package needs? We have some code related to reliable AST parsing across legacy Python versions from within Python 3.x that we would like to contribute, but it would be much nicer to contribute to one place instead of three.) |
You're using a private api |
Yes, a risk we are willing to take. Compared to maintaining a completely stand-alone fork or monkey-patching, I'll take a private API. |
I'm confused why you think you need this PR or a fork or monkeypatching, can't you just do: from your_own_module import ListReporter
from pyflakes.api import check
reporter = ListReporter()
check(source_bytes, file_name, reporter=reporter) |
|
The pyflakes unit tests do use structured responses, they utilize the Most of the PyCQA tools act independently, some of us are involved in multiple projects but for the most part they're separate |
The PyCQA was also born to be a loose group of projects sharing a goal of improving code quality. To attempt to share resources would require far more coordination than anyone in the group presently has time or energy for. Further, many of us (myself, doc8, bandit) have seen the downsides of trying to have shared infrastructure like that which works everyone and trying to enforce people use that. While, as @asottile mentions, there are very few of us who work on many projects, most people here have only ever signed up to maintain 1. Asking they start using another dependency and potentially contributing to it/maintaining it under some forced pretense of "shared libraries" that isn't strictly necessary as 99% of our users interact with us through a CLI or Continuous Integration tool, seems demoralizing. If others were interested in this, I would have expected it would have come up already more widely. |
There was discussion about this a while back at davidhalter/jedi#630. parso I'm sure would simplify the pyflakes code a lot, but I'm not sure if that is enough motivation to transition the codebase to it. I don't know if any additional features that are now impossible would be unlocked by doing so, but that would be the sort of thing to motivate this. |
I guess one sort of big feature would be the ability to parse Python versions other than the one pyflakes is running in. But now that Python 2 is going away this isn't as important as it was. |
If you go out into enterprise, there's an awful lot of Python 2 out there, and I suspect there will be for another decade or two to come. Plenty of applications bundled it. Go ask an AS/400 or Fortran developer for their opinion... Exception-handling and 1.x/2.x code handling are exactly what we were looking to submit via PR, but I think you've already done a way better job of what we were trying to do, @asmeurer! I'll ask one of our team to do a feature comparison and we'll submit an issue/PR to your repo if we think there's something we can add. |
To be clear, I'm not the author of parso (unless you are referring to something else). |
Adding a simple reporter class to log to a list of dictionaries with a consistent format across all message types and sources:
message_class
: analogous to E/W/L, e.g.,"error"
message_type
: more specific type or description, e.g.,"syntax_error"
filename
: file name if provided, else Nonelineno
: line number if provided, else Nonemessage
: "message" from the perspective of the sendersource
: source code if provided, else NoneSample output: