End column/lineno #454

asmeurer · 2019-07-15T23:25:48Z

It would be useful to have end_col and end_lineno attributes on Message objects. The way it would work is that (col, lineno) to (end_col, end_lineno) would represent the range of characters for the error. For instance, for undefined names, the col, lineno already points to the first character, and the end versions would point to the last character in the word.

I haven't looked yet how easy this is to implement. I wanted to post it here to see if anyone had any thoughts. I might make a pull request for it later (time permitting).

The text was updated successfully, but these errors were encountered:

asottile · 2019-07-15T23:30:21Z

This would only be useful / easy in python3.8+ where end_lineno / end_col_offset are part of the ast nodes -- elsewhere I believe this to be intractably difficult

asmeurer · 2019-07-15T23:34:42Z

I hadn't even considered whether the ast itself had this built in. I was just thinking of computing it manually. For nodes that represent names or keywords, there is a natural length.I guess for other nodes it could be hard. Just glossing over messages.py, I'd say over half the messages relate to either a name or a keyword, so for those at least it would be quite easy to get this.

I forgot to mention that my use case here is that I'm writing a tool that shows errors in an interactive program, and I want to highlight the exact chunk of code that has a warning associated with it. I don't think the command line API should use this, unless there is an already accepted way to do so.

Another question is whether (end_col, end_lineno) should point to the last character of a word or the character after the last character of a word (for "fake" 0-indexing). What does the Python 3.8 AST do?

asottile · 2019-07-15T23:36:59Z

here's some output from py38, I haven't had time to look at the indexes and what they mean

$ cat t.py 
print(
    a + b +
    """\
    foo
    """ +
    a.b.c(d).e()
)
$ astpretty t.py 
Module(
    body=[
        Expr(
            lineno=1,
            col_offset=0,
            end_lineno=7,
            end_col_offset=1,
            value=Call(
                lineno=1,
                col_offset=0,
                end_lineno=7,
                end_col_offset=1,
                func=Name(lineno=1, col_offset=0, end_lineno=1, end_col_offset=5, id='print', ctx=Load()),
                args=[
                    BinOp(
                        lineno=5,
                        col_offset=8,
                        end_lineno=6,
                        end_col_offset=16,
                        left=BinOp(
                            lineno=2,
                            col_offset=10,
                            end_lineno=5,
                            end_col_offset=7,
                            left=BinOp(
                                lineno=2,
                                col_offset=4,
                                end_lineno=2,
                                end_col_offset=9,
                                left=Name(lineno=2, col_offset=4, end_lineno=2, end_col_offset=5, id='a', ctx=Load()),
                                op=Add(),
                                right=Name(lineno=2, col_offset=8, end_lineno=2, end_col_offset=9, id='b', ctx=Load()),
                            ),
                            op=Add(),
                            right=Constant(lineno=3, col_offset=4, end_lineno=5, end_col_offset=7, value='    foo\n    ', kind=None),
                        ),
                        op=Add(),
                        right=Call(
                            lineno=6,
                            col_offset=4,
                            end_lineno=6,
                            end_col_offset=16,
                            func=Attribute(
                                lineno=6,
                                col_offset=4,
                                end_lineno=6,
                                end_col_offset=14,
                                value=Call(
                                    lineno=6,
                                    col_offset=4,
                                    end_lineno=6,
                                    end_col_offset=12,
                                    func=Attribute(
                                        lineno=6,
                                        col_offset=4,
                                        end_lineno=6,
                                        end_col_offset=9,
                                        value=Attribute(
                                            lineno=6,
                                            col_offset=4,
                                            end_lineno=6,
                                            end_col_offset=7,
                                            value=Name(lineno=6, col_offset=4, end_lineno=6, end_col_offset=5, id='a', ctx=Load()),
                                            attr='b',
                                            ctx=Load(),
                                        ),
                                        attr='c',
                                        ctx=Load(),
                                    ),
                                    args=[Name(lineno=6, col_offset=10, end_lineno=6, end_col_offset=11, id='d', ctx=Load())],
                                    keywords=[],
                                ),
                                attr='e',
                                ctx=Load(),
                            ),
                            args=[],
                            keywords=[],
                        ),
                    ),
                ],
                keywords=[],
            ),
        ),
    ],
    type_ignores=[],
)

asottile · 2019-07-15T23:38:42Z

(I happen to know BinOp is slightly broken, but will be fixed in the next beta)

asmeurer · 2019-07-16T16:59:00Z

It looks like the end column is after the last character, so that end_col - col gives the length of the item (if it's on the same line). For instance: Name(lineno=1, col_offset=0, end_lineno=1, end_col_offset=5, id='print', ctx=Load()).

I would suggest just using this. For < 3.8 or things where offsets don't make sense, we can set the end column/lineno to be the same as the start.

I'm curious where this work in CPython is going? Will there someday be a standard roundtrippable AST? It would also be nice if this sort of thing made its way to the actual parser, so that syntax errors could point to ranges instead of just positions in the source.

asottile · 2019-07-16T17:04:54Z

I've been ~loosely following it, it seems there's two directions right now:

improve the current ast for diagnostic purposes (mypy, tracebacks, etc.)
build a replacement CST (essentially a rewrite of the ast and/or lib2to3's CST)

the latter I don't think has a concrete (:laughing:) plan yet

Akuli · 2020-08-26T09:52:51Z

would be nice if python-language-server would show pyflakes unused import errors so that it ends in the right place

for example, if I currently do from typing import List, Tuple, Dict, and I don't use Tuple anywhere but I use List and Dict, then my editor highlights all of Tuple, Dict with an error message that says "Tuple imported but unused"

krassowski · 2021-07-16T16:11:21Z

Would it be a good idea to expose the original loc as a member of Message and allow reporters to use it as-is?

pyflakes/pyflakes/messages.py

Lines 6 to 13 in 4cf2189

    
           class Message(object): 
        
               message = '' 
        
               message_args = () 
        
               def __init__(self, filename, loc): 
        
                   self.filename = filename 
        
                   self.lineno = loc.lineno 
        
                   self.col = getattr(loc, 'col_offset', 0)

Alternatively, would you accept a PR that adds .end_col and .end_lineno conditionally on presence of end_col_offset and end_lineno? This seems like a minimal change which could be as simple as adding:

self.end_col_offset = getattr(loc, 'end_col_offset', None)
self.end_lineno = getattr(loc, 'end_lineno', None)

I would really like to improve reporting from pyflakes in editors using revived python-lsp-server such as Spyder and JupyterLab-LSP.

asmeurer · 2021-07-16T20:20:21Z

My understanding from the discussions here is that that the pyflakes team is in support of this change. I think adding end_offset attributes is the better way to go (although IMO a better default would be the same as the start rather than None).

krassowski · 2021-07-17T12:59:08Z

Thank you for the quick reply.

although IMO a better default would be the same as the start rather than None

I don't have a strong feeling, but a slight preference to None as we cannot draw zero character width underline to highlight such errors. If we see None we can just default to end of line (as we do currently). If it is indeed present and the same as the start then we would likely want to fix it (except for very specific situations where we indeed cannot pinpoint any range whatsoever, like truncation errors - and we would want to invoke a different code path to highlight those).

sigmavirus24 · 2021-07-17T14:20:44Z

I would disagree with @asmeurer. I see no statement of support. I see statements of fact (this is possible in 3.8+ but not before it). It would be more possible if we replaced ast but that's not tractable.

I think there's going to be a problem here where projects supporting multiple versions with multiple developers across those versions will see different information about the project. I don't think changing the metadata about reports across versions of Python is going to be a good user experience and I'm not confident we should add this until we can drop Python versions prior to 3.8

krassowski · 2021-07-17T15:02:47Z

:(

I think there's going to be a problem here where projects supporting multiple versions with multiple developers across those versions will see different information about the project.

The proposed change does not influence default reporters at all, it only enables custom reporters to make use of it. As a researcher and as an instructor I would like to get better highlighting of errors and warnings. We are currently looking at PEP 657 in Python 3.11 which makes wonders for quick understanding of tracebacks. I would have thought this PEP is a statement of direction towards making user life easier (and it is directly related to this requests as it pertains narrowing down where a flagged issue arises in the code).

I don't think changing the metadata about reports across versions of Python is going to be a good user experience and I'm not confident we should add this until we can drop Python versions prior to 3.8

What is you policy on dropping support? I was a bit confused by seeing that you still support versions 2.7, 3.4 and 3.5 although those no longer receive security fixes. I am genuinely asking because if the pre-condition of dropping versions prior to Python 3.8 will only become true in 4 years or 5 years, I will just monkey-patch it downstream.

asmeurer · 2021-07-18T00:27:39Z

I don't have a strong feeling, but a slight preference to None as we cannot draw zero character width underline to highlight such errors. If we see None we can just default to end of line (as we do currently). If it is indeed present and the same as the start then we would likely want to fix it (except for very specific situations where we indeed cannot pinpoint any range whatsoever, like truncation errors - and we would want to invoke a different code path to highlight those).

Why can't you draw a zero width line? Most code that you would write to draw a line would just do the right thing (draw nothing) if you give it an equal start and stop. On the other hand, it would raise a TypeError if you pass it None.

I would disagree with @asmeurer. I see no statement of support. I see statements of fact (this is possible in 3.8+ but not before it). It would be more possible if we replaced ast but that's not tractable.

This comment very much reads like a statement of support to me ("useful") #454 (comment).

I think there's going to be a problem here where projects supporting multiple versions with multiple developers across those versions will see different information about the project. I don't think changing the metadata about reports across versions of Python is going to be a good user experience and I'm not confident we should add this until we can drop Python versions prior to 3.8

As @krassowski has pointed out, this doesn't change any metadata, only adds new attributes. Existing code that doesn't know about these attributes will do nothing (just to be clear, the suggestion here is only to add new attributes to the Python classes, not to add end column information to the command line output, which I agree would be a backwards incompatible change).

As for different versions of Python doing different things, this is already how pyflakes works. If you run pyflakes in Python 3.7 on code that uses assignment expressions, it will fail (and as you well know, this used to be a huge problem back when Python 2 was more commonly used). The general consensus has always been that that's just the way things are. The only way to avoid that would be to manually backport all new Python syntax features, which would entail basically using a custom parser (I've suggested using parso do to this in the past, which does in fact support parsing multiple versions of Python from a single version).

Incidentally, for messages that involve a variable name (like unused variable warnings), it's pretty easy to manually create the end column. Just add len(variable) to the start.

krassowski linked a pull request Jul 17, 2021 that will close this issue

Add end coordinates for column and line number #649

Open

3 tasks

krassowski mentioned this issue Aug 24, 2022

Unused tag applied to too big range python-lsp/python-lsp-server#253

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

End column/lineno #454

End column/lineno #454

asmeurer commented Jul 15, 2019

asottile commented Jul 15, 2019

asmeurer commented Jul 15, 2019

asottile commented Jul 15, 2019

asottile commented Jul 15, 2019

asmeurer commented Jul 16, 2019

asottile commented Jul 16, 2019

Akuli commented Aug 26, 2020

krassowski commented Jul 16, 2021

asmeurer commented Jul 16, 2021

krassowski commented Jul 17, 2021

sigmavirus24 commented Jul 17, 2021

krassowski commented Jul 17, 2021

asmeurer commented Jul 18, 2021

End column/lineno #454

End column/lineno #454

Comments

asmeurer commented Jul 15, 2019

asottile commented Jul 15, 2019

asmeurer commented Jul 15, 2019

asottile commented Jul 15, 2019

asottile commented Jul 15, 2019

asmeurer commented Jul 16, 2019

asottile commented Jul 16, 2019

Akuli commented Aug 26, 2020

krassowski commented Jul 16, 2021

asmeurer commented Jul 16, 2021

krassowski commented Jul 17, 2021

sigmavirus24 commented Jul 17, 2021

krassowski commented Jul 17, 2021

asmeurer commented Jul 18, 2021