Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

End column/lineno #454

Open
asmeurer opened this issue Jul 15, 2019 · 13 comments · May be fixed by #649
Open

End column/lineno #454

asmeurer opened this issue Jul 15, 2019 · 13 comments · May be fixed by #649

Comments

@asmeurer
Copy link
Contributor

It would be useful to have end_col and end_lineno attributes on Message objects. The way it would work is that (col, lineno) to (end_col, end_lineno) would represent the range of characters for the error. For instance, for undefined names, the col, lineno already points to the first character, and the end versions would point to the last character in the word.

I haven't looked yet how easy this is to implement. I wanted to post it here to see if anyone had any thoughts. I might make a pull request for it later (time permitting).

@asottile
Copy link
Member

This would only be useful / easy in python3.8+ where end_lineno / end_col_offset are part of the ast nodes -- elsewhere I believe this to be intractably difficult

@asmeurer
Copy link
Contributor Author

I hadn't even considered whether the ast itself had this built in. I was just thinking of computing it manually. For nodes that represent names or keywords, there is a natural length.I guess for other nodes it could be hard. Just glossing over messages.py, I'd say over half the messages relate to either a name or a keyword, so for those at least it would be quite easy to get this.

I forgot to mention that my use case here is that I'm writing a tool that shows errors in an interactive program, and I want to highlight the exact chunk of code that has a warning associated with it. I don't think the command line API should use this, unless there is an already accepted way to do so.

Another question is whether (end_col, end_lineno) should point to the last character of a word or the character after the last character of a word (for "fake" 0-indexing). What does the Python 3.8 AST do?

@asottile
Copy link
Member

here's some output from py38, I haven't had time to look at the indexes and what they mean

$ cat t.py 
print(
    a + b +
    """\
    foo
    """ +
    a.b.c(d).e()
)
$ astpretty t.py 
Module(
    body=[
        Expr(
            lineno=1,
            col_offset=0,
            end_lineno=7,
            end_col_offset=1,
            value=Call(
                lineno=1,
                col_offset=0,
                end_lineno=7,
                end_col_offset=1,
                func=Name(lineno=1, col_offset=0, end_lineno=1, end_col_offset=5, id='print', ctx=Load()),
                args=[
                    BinOp(
                        lineno=5,
                        col_offset=8,
                        end_lineno=6,
                        end_col_offset=16,
                        left=BinOp(
                            lineno=2,
                            col_offset=10,
                            end_lineno=5,
                            end_col_offset=7,
                            left=BinOp(
                                lineno=2,
                                col_offset=4,
                                end_lineno=2,
                                end_col_offset=9,
                                left=Name(lineno=2, col_offset=4, end_lineno=2, end_col_offset=5, id='a', ctx=Load()),
                                op=Add(),
                                right=Name(lineno=2, col_offset=8, end_lineno=2, end_col_offset=9, id='b', ctx=Load()),
                            ),
                            op=Add(),
                            right=Constant(lineno=3, col_offset=4, end_lineno=5, end_col_offset=7, value='    foo\n    ', kind=None),
                        ),
                        op=Add(),
                        right=Call(
                            lineno=6,
                            col_offset=4,
                            end_lineno=6,
                            end_col_offset=16,
                            func=Attribute(
                                lineno=6,
                                col_offset=4,
                                end_lineno=6,
                                end_col_offset=14,
                                value=Call(
                                    lineno=6,
                                    col_offset=4,
                                    end_lineno=6,
                                    end_col_offset=12,
                                    func=Attribute(
                                        lineno=6,
                                        col_offset=4,
                                        end_lineno=6,
                                        end_col_offset=9,
                                        value=Attribute(
                                            lineno=6,
                                            col_offset=4,
                                            end_lineno=6,
                                            end_col_offset=7,
                                            value=Name(lineno=6, col_offset=4, end_lineno=6, end_col_offset=5, id='a', ctx=Load()),
                                            attr='b',
                                            ctx=Load(),
                                        ),
                                        attr='c',
                                        ctx=Load(),
                                    ),
                                    args=[Name(lineno=6, col_offset=10, end_lineno=6, end_col_offset=11, id='d', ctx=Load())],
                                    keywords=[],
                                ),
                                attr='e',
                                ctx=Load(),
                            ),
                            args=[],
                            keywords=[],
                        ),
                    ),
                ],
                keywords=[],
            ),
        ),
    ],
    type_ignores=[],
)

@asottile
Copy link
Member

(I happen to know BinOp is slightly broken, but will be fixed in the next beta)

@asmeurer
Copy link
Contributor Author

It looks like the end column is after the last character, so that end_col - col gives the length of the item (if it's on the same line). For instance: Name(lineno=1, col_offset=0, end_lineno=1, end_col_offset=5, id='print', ctx=Load()).

I would suggest just using this. For < 3.8 or things where offsets don't make sense, we can set the end column/lineno to be the same as the start.

I'm curious where this work in CPython is going? Will there someday be a standard roundtrippable AST? It would also be nice if this sort of thing made its way to the actual parser, so that syntax errors could point to ranges instead of just positions in the source.

@asottile
Copy link
Member

I've been ~loosely following it, it seems there's two directions right now:

  • improve the current ast for diagnostic purposes (mypy, tracebacks, etc.)
  • build a replacement CST (essentially a rewrite of the ast and/or lib2to3's CST)

the latter I don't think has a concrete (:laughing:) plan yet

@Akuli
Copy link

Akuli commented Aug 26, 2020

would be nice if python-language-server would show pyflakes unused import errors so that it ends in the right place

for example, if I currently do from typing import List, Tuple, Dict, and I don't use Tuple anywhere but I use List and Dict, then my editor highlights all of Tuple, Dict with an error message that says "Tuple imported but unused"

@krassowski
Copy link

Would it be a good idea to expose the original loc as a member of Message and allow reporters to use it as-is?

class Message(object):
message = ''
message_args = ()
def __init__(self, filename, loc):
self.filename = filename
self.lineno = loc.lineno
self.col = getattr(loc, 'col_offset', 0)

Alternatively, would you accept a PR that adds .end_col and .end_lineno conditionally on presence of end_col_offset and end_lineno? This seems like a minimal change which could be as simple as adding:

self.end_col_offset = getattr(loc, 'end_col_offset', None)
self.end_lineno = getattr(loc, 'end_lineno', None)

I would really like to improve reporting from pyflakes in editors using revived python-lsp-server such as Spyder and JupyterLab-LSP.

@asmeurer
Copy link
Contributor Author

My understanding from the discussions here is that that the pyflakes team is in support of this change. I think adding end_offset attributes is the better way to go (although IMO a better default would be the same as the start rather than None).

@krassowski
Copy link

Thank you for the quick reply.

although IMO a better default would be the same as the start rather than None

I don't have a strong feeling, but a slight preference to None as we cannot draw zero character width underline to highlight such errors. If we see None we can just default to end of line (as we do currently). If it is indeed present and the same as the start then we would likely want to fix it (except for very specific situations where we indeed cannot pinpoint any range whatsoever, like truncation errors - and we would want to invoke a different code path to highlight those).

@krassowski krassowski linked a pull request Jul 17, 2021 that will close this issue
3 tasks
@sigmavirus24
Copy link
Member

I would disagree with @asmeurer. I see no statement of support. I see statements of fact (this is possible in 3.8+ but not before it). It would be more possible if we replaced ast but that's not tractable.

I think there's going to be a problem here where projects supporting multiple versions with multiple developers across those versions will see different information about the project. I don't think changing the metadata about reports across versions of Python is going to be a good user experience and I'm not confident we should add this until we can drop Python versions prior to 3.8

@krassowski
Copy link

:(

I think there's going to be a problem here where projects supporting multiple versions with multiple developers across those versions will see different information about the project.

The proposed change does not influence default reporters at all, it only enables custom reporters to make use of it. As a researcher and as an instructor I would like to get better highlighting of errors and warnings. We are currently looking at PEP 657 in Python 3.11 which makes wonders for quick understanding of tracebacks. I would have thought this PEP is a statement of direction towards making user life easier (and it is directly related to this requests as it pertains narrowing down where a flagged issue arises in the code).

I don't think changing the metadata about reports across versions of Python is going to be a good user experience and I'm not confident we should add this until we can drop Python versions prior to 3.8

What is you policy on dropping support? I was a bit confused by seeing that you still support versions 2.7, 3.4 and 3.5 although those no longer receive security fixes. I am genuinely asking because if the pre-condition of dropping versions prior to Python 3.8 will only become true in 4 years or 5 years, I will just monkey-patch it downstream.

@asmeurer
Copy link
Contributor Author

I don't have a strong feeling, but a slight preference to None as we cannot draw zero character width underline to highlight such errors. If we see None we can just default to end of line (as we do currently). If it is indeed present and the same as the start then we would likely want to fix it (except for very specific situations where we indeed cannot pinpoint any range whatsoever, like truncation errors - and we would want to invoke a different code path to highlight those).

Why can't you draw a zero width line? Most code that you would write to draw a line would just do the right thing (draw nothing) if you give it an equal start and stop. On the other hand, it would raise a TypeError if you pass it None.

I would disagree with @asmeurer. I see no statement of support. I see statements of fact (this is possible in 3.8+ but not before it). It would be more possible if we replaced ast but that's not tractable.

This comment very much reads like a statement of support to me ("useful") #454 (comment).

I think there's going to be a problem here where projects supporting multiple versions with multiple developers across those versions will see different information about the project. I don't think changing the metadata about reports across versions of Python is going to be a good user experience and I'm not confident we should add this until we can drop Python versions prior to 3.8

As @krassowski has pointed out, this doesn't change any metadata, only adds new attributes. Existing code that doesn't know about these attributes will do nothing (just to be clear, the suggestion here is only to add new attributes to the Python classes, not to add end column information to the command line output, which I agree would be a backwards incompatible change).

As for different versions of Python doing different things, this is already how pyflakes works. If you run pyflakes in Python 3.7 on code that uses assignment expressions, it will fail (and as you well know, this used to be a huge problem back when Python 2 was more commonly used). The general consensus has always been that that's just the way things are. The only way to avoid that would be to manually backport all new Python syntax features, which would entail basically using a custom parser (I've suggested using parso do to this in the past, which does in fact support parsing multiple versions of Python from a single version).

Incidentally, for messages that involve a variable name (like unused variable warnings), it's pretty easy to manually create the end column. Just add len(variable) to the start.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants