New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Checking for non-ast syntax errors #20
Conversation
This required removing a test that checked that a lone return didn't crash pyflakes (the new test added for the error should test the same), and changing the doctest scope to a module scope instead of a function scope.
Some cases are not caught, such as while True: def f(): continue but there should be no false positives. Also the special syntax error "SyntaxError: 'continue' not supported inside 'finally' clause" is not handled yet either.
Just pushed a basic implementation of catching |
Just to be sure, I pulled down the source for Python 2.5 and it looks like the relevant code for continue and break outside of loops is the same as in CPython master. There will definitely be differences in Python versions if I move on to some of the more esoteric errors. |
Looks like a good start. Thanks for taking on this work! |
For example, while True: def f(): continue is invalid syntax.
This is a special syntax error in Python with its own error message. Pyflakes should now, I believe, cover every syntax error surrounding the 'continue' and 'break' statements.
OK, I believe I've handled every syntax error surrounding the I do intend to finish this with every syntax error, but if you've noticed that I haven't pushed anything in a while and are happy with what's here, feel free to merge early. |
Your tests failed on flake8: https://travis-ci.org/pyflakes/pyflakes/jobs/64644431#L252 |
Yeah, I noticed that, but didn't get a chance to fix it yet. I want to reiterate #8 (comment) For some reason, when I run flake8 locally I get a lot more errors:
which is why I missed this (I didn't notice the one of those that was added by my changes). |
if isinstance(node, ast.Continue): | ||
self.report(messages.ContinueOutsideLoop, node) | ||
else: # ast.Break | ||
self.report(messages.BreakOutsideLoop, node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be broken up into functions so it can be more clear what the overall algorithm is?
Flat is better than nested.
-- from The Zen of Python, by Tim Peters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coming back to this (I told you I would get stalled). It's not clear to me how to break this up. Would it be clearer if it were written recursively instead of using a while loop? I can definitely add more comments explaining the algorithm and what it's checking for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than comments, I'd look for loops which could be broken out into specific functions. Then the name of the functions can serve as the documentation.
The high-level algorithm here, as I understand it, is "when a continue or break is encountered, it's an error if that's not in a loop". So I might expect to find a function named isInLoop()
. Then maybe CONTINUE()
might be something like:
def CONTINUE(self, node):
if self.isInLoop(node):
return
if isinstance(node, ast.Continue):
self.report(messages.ContinueOutsideLoop, node)
else: # ast.Break
self.report(messages.BreakOutsideLoop, node)
Also, the operation, "traverse the parents until finding a node matching some predicate" has become a pretty common operation, so that can be a function too. Just grepping for parent
I find a few, the poorly named getParent
, and also on_conditional_branch()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's roughly the algorithm (and more or less what I had in the first pass), but not completely correct. More accurate is "continue or break is only valid if it is in a loop, and if it a function or class definition, the loop must also be inside the function or class definition. And also for continue if it is in a finally block the loop must also be inside the finally block." I'm not sure if that can be written as a simple predicate (but I may just need to think about it harder).
I guess the algorithm itself is not too complicated (it's just, walk up the tree and note what you see first, a loop, a function definition, a class definition, or (for continue
) a finally block). The complications are arising from the fact that for loops and finally blocks, the nodes have different subtrees, so just saying "the continue
is in the ast.For
is not enough because it could be in the orelse
part. So maybe that's what really should be abstracted here.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also occurs to me that this algorithm is a rather inefficient way of dealing with this issue. When I see a loop, I don't know if I am in the loop or the orelse part, so I walk the whole tree to make sure I'm not in the orelse part. But I really ought to be keeping track of the previous node and checking that way. Fixing this may also make the whole algorithm clearer (and I also probably should comment the code a little more). Let me give it a try and you can see what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, the latest version should be much easier to read. I don't think it's necessary to split it into a function at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plus the algorithm itself no longer uses the for/else construct that it was so concerned about, which is a plus, as few people understand how to read that correctly.
Sounds like you have a very new version of pep8 installed |
OK, I've got everything now except for some errors around starred assignment unpacking. I'm a little confused about the exact logic for them, as it isn't immediately obvious from reading compile.c. Furthermore, the things that are allowed are changing in Python 3.5 (https://www.python.org/dev/peps/pep-0448/). There are also some 3.5 errors around the new await and async syntax that I haven't handled. I'm probably going to punt on these last few errors. I don't want to bother with 3.5 stuff when it isn't even released yet. I'm fairly certain I understand how starred assignments work in 3.4, but PEP 448 confuses things. I say merge this pull request and open issues for those things. |
Those flake8 warnings are bullshit. I'm not going to fix that. Read the PEP. |
Sorry about the pep8 annoyance. I can appreciate the irony, but also pyflakes is probably most popular as a dependency for flake8, or at least that's what I infer from the download numbers on PyPI. Overall this looks pretty good. Thanks for writing good tests. I'll want to verify that this still performs well before merging it. I'm out of town and pretty busy this week so I'll try to tackle next week. |
Also comment on what it is doing.
These actually make the code harder to read, and not to mention that these examples were copied exactly from the CPython source, but we've got to make the robots happy (and also the checker tools ;)
OK, I've fixed the merge conflicts, and even the stupid flake8 "errors" (which actually make the code harder to read and no longer even follow what PEP 8 actually says, not to mention that the code in question was copied exactly from the CPython source). |
I don't see any reason this couldn't be merged if we can demonstrate that everything still performs well. I'll get to that eventually, but if someone else wanted to expedite the process, it could be a simple as running and timing a few runs of pyflakes against some large codebase with and without this patch. |
I'll test this today against some OpenStack projects because most are absolutely gigantic. |
I did a little test against one of my repos
I ran those a few times and there was a little bit of variance. I suspect the runtime in any case is dominated by IO (meaning if you really care about performance you might look at doing things asynchronously). |
OK, that looks good. Would you mind squashing and rebasing? Then we are good to merge. |
I'd prefer not to, if it's all the same to you. |
I'm happy to squash and rebase for you, though probably you would be able to write better descriptions. It's a big enough change that it could be a couple commits, as long as each is a complete and coherent unit of work. We don't need a whole bunch of merge commits and commits fixing the errors in the previous commits in the history, though. |
I'm going to give @sigmavirus24 an opportunity to complete his testing, then I'll proceed on squashing (if necessary) and merging this. |
I'm of the opinion that rebasing and squashing are generally bad things. I prefer for my commit history to be just that. Of course, it's your project, so you are completely free to destroy that history if you so desire. I personally just want this feature to get into pyflakes (because I'm tired of it not catching syntax errors). |
Results: With this branch:
Without the patch:
So there doesn't seem to be a measurable difference. (Also I'm testing on Python 2.7.10 on OSX.) |
would you mind resolving conflicts with master please? |
Done. |
Committed as 7f751be |
This is the successor to #8.
So far, I have only extended #13 to check for return at a module scope.
I have worked on checking for
break
andcontinue
outside of loops, but it's tricky to get all the corner cases right. I'll need to double check what Python does with every type of ast node to make sure I don't miss anything. For instance, a naive "bubble up until we see a loop" is wrong, because the following are all syntax errorsThe compile.c code in CPython uses some different kind of ast object that already knows what kinds of valid blocks a node is in, so doesn't seem to be directly helpful in this case.
I do plan to push more changes up here, though, including this one when I finish it. Feel free to merge early, however, if the work so far is fine (I may get stalled on this, so don't feel that you need to hold off until everything is finished).