Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Checking for non-ast syntax errors #20

Closed
wants to merge 21 commits into from
Closed

Conversation

asmeurer
Copy link
Contributor

This is the successor to #8.

So far, I have only extended #13 to check for return at a module scope.

I have worked on checking for break and continue outside of loops, but it's tricky to get all the corner cases right. I'll need to double check what Python does with every type of ast node to make sure I don't miss anything. For instance, a naive "bubble up until we see a loop" is wrong, because the following are all syntax errors

while True:
    def f():
        continue
while True:
    pass
else:
    continue
while True:
    try:
        pass
    finally:
        continue

The compile.c code in CPython uses some different kind of ast object that already knows what kinds of valid blocks a node is in, so doesn't seem to be directly helpful in this case.

I do plan to push more changes up here, though, including this one when I finish it. Feel free to merge early, however, if the work so far is fine (I may get stalled on this, so don't feel that you need to hold off until everything is finished).

This required removing a test that checked that a lone return didn't crash
pyflakes (the new test added for the error should test the same), and changing
the doctest scope to a module scope instead of a function scope.
Some cases are not caught, such as

while True:
    def f():
        continue

but there should be no false positives. Also the special syntax error
"SyntaxError: 'continue' not supported inside 'finally' clause" is not handled
yet either.
@asmeurer
Copy link
Contributor Author

Just pushed a basic implementation of catching continue and break outside of loops. It doesn't handle some cases (such as the finally thing or function definitions inside a loop), but I believe there are no false positives. But do check the logic to see if anything looks wrong.

@asmeurer
Copy link
Contributor Author

Just to be sure, I pulled down the source for Python 2.5 and it looks like the relevant code for continue and break outside of loops is the same as in CPython master. There will definitely be differences in Python versions if I move on to some of the more esoteric errors.

@bitglue
Copy link
Member

bitglue commented May 28, 2015

Looks like a good start. Thanks for taking on this work!

For example,

    while True:
        def f():
            continue

is invalid syntax.
This is a special syntax error in Python with its own error message.

Pyflakes should now, I believe, cover every syntax error surrounding the
'continue' and 'break' statements.
@asmeurer
Copy link
Contributor Author

OK, I believe I've handled every syntax error surrounding the continue and break statements.

I do intend to finish this with every syntax error, but if you've noticed that I haven't pushed anything in a while and are happy with what's here, feel free to merge early.

@sigmavirus24
Copy link
Member

Your tests failed on flake8: https://travis-ci.org/pyflakes/pyflakes/jobs/64644431#L252

@asmeurer
Copy link
Contributor Author

Yeah, I noticed that, but didn't get a chance to fix it yet. I want to reiterate #8 (comment)

For some reason, when I run flake8 locally I get a lot more errors:

$ flake8 pyflakes
pyflakes/scripts/pyflakes.py:8:1: E402 module level import not at top of file
pyflakes/test/harness.py:11:5: E731 do not assign a lambda expression, use a def
pyflakes/test/harness.py:12:5: E731 do not assign a lambda expression, use a def
pyflakes/messages.py:152:1: E302 expected 2 blank lines, found 1
pyflakes/checker.py:26:1: E402 module level import not at top of file
pyflakes/checker.py:230:21: W503 line break before binary operator
pyflakes/checker.py:231:21: W503 line break before binary operator
pyflakes/checker.py:725:21: W503 line break before binary operator

which is why I missed this (I didn't notice the one of those that was added by my changes).

if isinstance(node, ast.Continue):
self.report(messages.ContinueOutsideLoop, node)
else: # ast.Break
self.report(messages.BreakOutsideLoop, node)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be broken up into functions so it can be more clear what the overall algorithm is?

Flat is better than nested.

-- from The Zen of Python, by Tim Peters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming back to this (I told you I would get stalled). It's not clear to me how to break this up. Would it be clearer if it were written recursively instead of using a while loop? I can definitely add more comments explaining the algorithm and what it's checking for.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than comments, I'd look for loops which could be broken out into specific functions. Then the name of the functions can serve as the documentation.

The high-level algorithm here, as I understand it, is "when a continue or break is encountered, it's an error if that's not in a loop". So I might expect to find a function named isInLoop(). Then maybe CONTINUE() might be something like:

def CONTINUE(self, node):
    if self.isInLoop(node):
        return
    if isinstance(node, ast.Continue):
        self.report(messages.ContinueOutsideLoop, node)
     else:  # ast.Break
        self.report(messages.BreakOutsideLoop, node)

Also, the operation, "traverse the parents until finding a node matching some predicate" has become a pretty common operation, so that can be a function too. Just grepping for parent I find a few, the poorly named getParent, and also on_conditional_branch().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's roughly the algorithm (and more or less what I had in the first pass), but not completely correct. More accurate is "continue or break is only valid if it is in a loop, and if it a function or class definition, the loop must also be inside the function or class definition. And also for continue if it is in a finally block the loop must also be inside the finally block." I'm not sure if that can be written as a simple predicate (but I may just need to think about it harder).

I guess the algorithm itself is not too complicated (it's just, walk up the tree and note what you see first, a loop, a function definition, a class definition, or (for continue) a finally block). The complications are arising from the fact that for loops and finally blocks, the nodes have different subtrees, so just saying "the continue is in the ast.For is not enough because it could be in the orelse part. So maybe that's what really should be abstracted here.

Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also occurs to me that this algorithm is a rather inefficient way of dealing with this issue. When I see a loop, I don't know if I am in the loop or the orelse part, so I walk the whole tree to make sure I'm not in the orelse part. But I really ought to be keeping track of the previous node and checking that way. Fixing this may also make the whole algorithm clearer (and I also probably should comment the code a little more). Let me give it a try and you can see what you think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, the latest version should be much easier to read. I don't think it's necessary to split it into a function at this point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plus the algorithm itself no longer uses the for/else construct that it was so concerned about, which is a plus, as few people understand how to read that correctly.

@sigmavirus24
Copy link
Member

Sounds like you have a very new version of pep8 installed

@asmeurer
Copy link
Contributor Author

OK, I've got everything now except for some errors around starred assignment unpacking. I'm a little confused about the exact logic for them, as it isn't immediately obvious from reading compile.c. Furthermore, the things that are allowed are changing in Python 3.5 (https://www.python.org/dev/peps/pep-0448/). There are also some 3.5 errors around the new await and async syntax that I haven't handled.

I'm probably going to punt on these last few errors. I don't want to bother with 3.5 stuff when it isn't even released yet. I'm fairly certain I understand how starred assignments work in 3.4, but PEP 448 confuses things. I say merge this pull request and open issues for those things.

@asmeurer
Copy link
Contributor Author

Those flake8 warnings are bullshit. I'm not going to fix that. Read the PEP.

@bitglue
Copy link
Member

bitglue commented Jul 22, 2015

Sorry about the pep8 annoyance. I can appreciate the irony, but also pyflakes is probably most popular as a dependency for flake8, or at least that's what I infer from the download numbers on PyPI.

Overall this looks pretty good. Thanks for writing good tests. I'll want to verify that this still performs well before merging it. I'm out of town and pretty busy this week so I'll try to tackle next week.

These actually make the code harder to read, and not to mention that these
examples were copied exactly from the CPython source, but we've got to make
the robots happy (and also the checker tools ;)
@asmeurer
Copy link
Contributor Author

OK, I've fixed the merge conflicts, and even the stupid flake8 "errors" (which actually make the code harder to read and no longer even follow what PEP 8 actually says, not to mention that the code in question was copied exactly from the CPython source).

@bitglue
Copy link
Member

bitglue commented Nov 5, 2015

I don't see any reason this couldn't be merged if we can demonstrate that everything still performs well. I'll get to that eventually, but if someone else wanted to expedite the process, it could be a simple as running and timing a few runs of pyflakes against some large codebase with and without this patch.

@sigmavirus24
Copy link
Member

I'll test this today against some OpenStack projects because most are absolutely gigantic.

@asmeurer
Copy link
Contributor Author

asmeurer commented Nov 5, 2015

I did a little test against one of my repos

master:

real    1m4.563s
user    0m59.102s

nonast2 (this branch):

real    1m4.555s
user    0m59.319s

nonast (my older branch):

real    1m16.832s
user    1m9.387s

I ran those a few times and there was a little bit of variance. I suspect the runtime in any case is dominated by IO (meaning if you really care about performance you might look at doing things asynchronously).

@bitglue
Copy link
Member

bitglue commented Nov 5, 2015

OK, that looks good. Would you mind squashing and rebasing? Then we are good to merge.

@asmeurer
Copy link
Contributor Author

asmeurer commented Nov 5, 2015

I'd prefer not to, if it's all the same to you.

@bitglue
Copy link
Member

bitglue commented Nov 5, 2015

I'm happy to squash and rebase for you, though probably you would be able to write better descriptions. It's a big enough change that it could be a couple commits, as long as each is a complete and coherent unit of work. We don't need a whole bunch of merge commits and commits fixing the errors in the previous commits in the history, though.

@bitglue
Copy link
Member

bitglue commented Nov 5, 2015

I'm going to give @sigmavirus24 an opportunity to complete his testing, then I'll proceed on squashing (if necessary) and merging this.

@asmeurer
Copy link
Contributor Author

asmeurer commented Nov 5, 2015

I'm of the opinion that rebasing and squashing are generally bad things. I prefer for my commit history to be just that.

Of course, it's your project, so you are completely free to destroy that history if you so desire. I personally just want this feature to get into pyflakes (because I'm tired of it not catching syntax errors).

@sigmavirus24
Copy link
Member

Results:

With this branch:

~/openstack/nova $ for i in {1..3} ; do  time pyflakes nova ; end
real    0m16.336s
user    0m15.684s
sys     0m0.509s

real    0m15.164s
user    0m14.809s
sys     0m0.347s

real    0m15.766s
user    0m15.371s
sys     0m0.361s
~/openstack/glance $ for i in {1..3} ; do time pyflakes glance ; end
real    0m3.622s
user    0m3.355s
sys     0m0.186s

real    0m3.497s
user    0m3.357s
sys     0m0.130s

real    0m3.329s
user    0m3.195s
sys     0m0.128s

Without the patch:

~/openstack/nova $ for i in {1..3}; do time pyflakes nova ; end
real    0m15.705s
user    0m15.251s
sys     0m0.434s

real    0m16.423s
user    0m15.975s
sys     0m0.386s

real    0m15.914s
user    0m15.520s
sys     0m0.348s
~/openstack/glance $ for i in {1..3} ; do time pyflakes glance ; end
real    0m3.440s
user    0m3.300s
sys     0m0.121s

real    0m3.338s
user    0m3.222s
sys     0m0.113s

real    0m3.270s
user    0m3.157s
sys     0m0.105s

So there doesn't seem to be a measurable difference. (Also I'm testing on Python 2.7.10 on OSX.)

@bitglue
Copy link
Member

bitglue commented Nov 12, 2015

would you mind resolving conflicts with master please?

@asmeurer
Copy link
Contributor Author

Done.

@bitglue
Copy link
Member

bitglue commented Nov 13, 2015

Committed as 7f751be

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants