Split/Remove Backward tokenization #6363

MichaReiser · 2023-08-05T07:39:51Z

The SimpleTokenizer supports backward lexing. That implementation will break once we support Python 3.12's new f-string parsing (PEP 701), because comments can now appear inside text that looks like string content:

f"test{more  # quote
}"
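To make the ambiguity concrete, here is a minimal sketch of a context-free heuristic that treats a `#` as a comment start only when it sits outside quotes on the current line. `naive_comment_start` is a hypothetical helper written for illustration, not the SimpleTokenizer's actual logic, and it ignores escapes and triple-quoted strings. On the first line of the example above it concludes there is no comment, although under Python 3.12 `# quote` is one.

```python
def naive_comment_start(line):
    """Return the index of the first `#` outside quotes on `line`, else None.

    Hypothetical heuristic: it only looks at the current line and knows
    nothing about f-string replacement fields.
    """
    quote = None  # the quote character we are currently inside, if any
    for i, ch in enumerate(line):
        if quote is None:
            if ch in "'\"":
                quote = ch
            elif ch == "#":
                return i
        elif ch == quote:
            quote = None
    return None


# An ordinary trailing comment is found.
plain = "x = 1  # note"
found = naive_comment_start(plain)

# First line of the f-string example: the `#` sits after an unclosed quote,
# so the heuristic classifies it as string content and reports no comment.
fstring_line = 'f"test{more  # quote'
missed = naive_comment_start(fstring_line)
```

Telling the two cases apart requires knowing that the `#` sits inside a replacement field, which is context a lexer walking backward from an arbitrary offset does not have.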

My preferred option would be to remove backward lexing altogether, but I'm unsure how to support is_parenthesized_expression in the formatter without it. The problem is that we need to look backward from the start of the expression. One option I can think of is to integrate the parenthesization detection into the CommentsVisitor, where we already track the parent nodes (necessary to avoid mistaking a for a parenthesized expression in call(a)); we could store the last start/position and start lexing from there. This would require skipping over some tokens, which I'm not sure we can handle.

The other alternative is to split the SimpleTokenizer into two: one implementation that only supports forward lexing, and a BackwardTokenizer that supports backward lexing but takes the CommentRanges as a second argument.
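As a rough illustration of the second alternative, the sketch below walks backward from an offset and uses precomputed comment ranges to jump over comments instead of trying to detect them. It is written in Python for brevity; `previous_non_trivia` and the `(start, end)` tuple layout are assumptions made for this example, not the actual SimpleTokenizer or CommentRanges API.

```python
from bisect import bisect_right


def previous_non_trivia(source, offset, comment_ranges):
    """Return the index of the last character before `offset` that is
    neither whitespace nor inside a comment, or None if there is none.

    `comment_ranges` is assumed to be a sorted list of non-overlapping
    (start, end) offsets with `end` exclusive.
    """
    starts = [start for start, _ in comment_ranges]
    i = offset - 1
    while i >= 0:
        # Locate the last comment starting at or before `i`, if any.
        k = bisect_right(starts, i) - 1
        if k >= 0 and i < comment_ranges[k][1]:
            # `i` is inside a comment: jump to just before its start.
            i = comment_ranges[k][0] - 1
            continue
        if source[i].isspace():
            i -= 1
            continue
        return i
    return None


source = "(  # note\n    a\n)"
comment_start = source.index("#")
comment_end = source.index("\n")
ranges = [(comment_start, comment_end)]

# Looking back from `a`, the comment is skipped and the `(` is found.
with_ranges = previous_non_trivia(source, source.index("a"), ranges)

# Without the ranges, the scan stops on the last character of the comment.
without_ranges = previous_non_trivia(source, source.index("a"), [])
```

Because the ranges come from a forward pass that already understood f-strings, the backward scan never has to decide on its own whether a `#` starts a comment. Finding a preceding `(` is still not sufficient to call the expression parenthesized, since the same check would succeed for `a` in `call(a)`.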

CC: @dhruvmanila


dhruvmanila · 2023-09-14T08:52:27Z

/cc @konstin who's working on this, thanks for taking this up!

charliermarsh · 2023-09-21T00:54:47Z

I believe this landed.

charliermarsh added the internal (An internal refactor or improvement) label Aug 8, 2023

dhruvmanila mentioned this issue Aug 24, 2023

Support PEP 701: Syntactic formalization of f-strings #6502

Closed

5 tasks

dhruvmanila assigned dhruvmanila and konstin and unassigned dhruvmanila Sep 12, 2023

charliermarsh closed this as completed Sep 21, 2023
