Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewriting within f-strings #572

Closed
adamchainz opened this issue Nov 25, 2021 · 6 comments · Fixed by #837
Closed

Rewriting within f-strings #572

adamchainz opened this issue Nov 25, 2021 · 6 comments · Fixed by #837

Comments

@adamchainz
Copy link
Sponsor Contributor

This issue also affects django-upgrade: adamchainz/django-upgrade#97

pyupgrade rewrites some names, so we might expect it to do so inside f-strings:

-f"{six.text_type(1)}"'
+f"{str(1)}"

Unfortunately it fails to do so. This is because there's no tokens for the contents of the f-string. Whilst the AST includes the f-strings, the tokenizer leaves the f-string as a single STRING. It's later tokenized inside ast.parse:

In [1]: import ast, tokenize_rt

In [2]: print(ast.dump(ast.parse('f"{x()}"'), indent=2))
Module(
  body=[
    Expr(
      value=JoinedStr(
        values=[
          FormattedValue(
            value=Call(
              func=Name(id='x', ctx=Load()),
              args=[],
              keywords=[]),
            conversion=-1)]))],
  type_ignores=[])

In [3]: tokenize_rt.src_to_tokens('f"{x()}"')
Out[3]:
[Token(name='STRING', src='f"{x()}"', line=1, utf8_byte_offset=0),
 Token(name='NEWLINE', src='', line=1, utf8_byte_offset=8),
 Token(name='ENDMARKER', src='', line=2, utf8_byte_offset=0)]

Perhaps a fix would be to recursively invoke rewriting on f-string nodes.

@asottile
Copy link
Owner

asottile commented Nov 25, 2021

the first hard blocker is the ast does not accurately indicate positions inside f-strings inside the current versions supported so it's difficult to even start approaching this problem (as such I haven't even attempted it)

an idea though, since we have a format string parser already, is to identify the f-strings during a tokenize pass and rewrite each of their format chunks separately? it may work (and potentially avoids the incorrect offsets)

@adamchainz
Copy link
Sponsor Contributor Author

an idea though, since we have a format string parser already, is to identify the f-strings during a tokenize pass and rewrite each of their format chunks separately? it may work (and potentially avoids the incorrect offsets)

Yeah that was what I poorly explained as "recursively invoke rewriting". I will look at trying it in django-upgrade first.

@terryjreedy
Copy link

I believe https://peps.python.org/pep-0701/ Syntactic formalization of f-strings
python/cpython#102856
may facilitate this proposal by parsing fields separate from string parts.
Not sure if will all make it into 3.12.

@sunmy2019
Copy link

I believe https://peps.python.org/pep-0701/ Syntactic formalization of f-strings python/cpython#102856 may facilitate this proposal by parsing fields separate from string parts. Not sure if will all make it into 3.12.

Seems no one is working on the new tokenizer. The tokenizer in Python will still emit the old tokens. (sometimes buggy with new grammar)

@pablogsal
Copy link

I believe https://peps.python.org/pep-0701/ Syntactic formalization of f-strings python/cpython#102856 may facilitate this proposal by parsing fields separate from string parts. Not sure if will all make it into 3.12.

Seems no one is working on the new tokenizer. The tokenizer in Python will still emit the old tokens. (sometimes buggy with new grammar)

This is not correct, we are indeed working on changes to the python tokenizer and we plan to emit the new tokens there

@adamchainz
Copy link
Sponsor Contributor Author

Thanks Python 3.12!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants