What's the status? Help needed? #35

Closed
lysnikolaou opened this issue Oct 13, 2023 · 9 comments

Comments

@lysnikolaou
Contributor

Hey folks! 👋

I was wondering what the status of this project is. I'm interested in helping out, if it's still going on.

I know that there's an initial implementation, but it should probably be changed to work well with PEP 701. I could offer a helping hand and then assist in writing the PEP as well.

@jimbaker
Owner

@lysnikolaou thanks for reaching out! It's very much still going on, with slow, steady work by @pauleveritt and myself to figure out exactly how we want to present this as a rather useful addition to Python. We can very much use your help.

First, @gvanrossum already got us to something that's close to PEP 701 with his branch, but that branch needs a number of updates:

  1. Update with respect to main, including any bug fixes around PEP 701 and subsequent work.

  2. Determine exactly how we want to handle named Unicode escapes with \N. Given:

>>> def tag(*args):
...     return args
...
>>> fr'\N{{GRINNING FACE}}'
'\\N{GRINNING FACE}'
>>> tag'\N{{GRINNING FACE}}'
('\\N{', 'GRINNING FACE}')
>>> f'\N{GRINNING FACE}'
'😀'

So it should probably tokenize the way fr does; we should be consistent in using raw strings.

  3. Currently the args are constructed either as a raw str or a tuple. Arguably we want to provide a better UX where each arg is either a Chunk, subclassing str:
from typing import Self

class Chunk(str):
    def __new__(cls, value: str) -> Self:
        chunk = super().__new__(cls, value)
        chunk._decoded = None
        return chunk

    @property
    def decoded(self) -> str:
        """Apply unicode decoding and cache the result, then return.

        Uses the same internal code functionality as Python's parser
        does to perform the actual decode.
        """
        if self._decoded is None:
            self._decoded = self.encode('utf-8').decode('unicode-escape')
        return self._decoded

The specific usage of str.encode('utf-8').decode('unicode-escape') here is correct; it follows what is done in PyTokenizer_translate_into_utf8, which was recently refactored in main, but it is certainly obscure.
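
To illustrate, here's a minimal usage sketch of the Chunk class above (a hypothetical REPL session, not output from the branch):

>>> chunk = Chunk(r'\N{GRINNING FACE} says hi\n')
>>> chunk
'\\N{GRINNING FACE} says hi\\n'
>>> chunk.decoded
'😀 says hi\n'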

Also, instead of a tuple, use a Thunk, subclassing the C-internal equivalent of NamedTuple:

from typing import Any, Callable, Literal, NamedTuple

Conversion = Literal['a', 'r', 's'] | None

class Thunk(NamedTuple):
    getvalue: Callable[[], Any]
    text: str
    conv: Conversion = None
    formatspec: str | None = None

In both cases, Chunk and Thunk should be implemented as core types. Note that the specific attribute names still need to be finalized: we use format_spec in PEP 701, but then symmetry argues getvalue should be get_value. I have simply kept them as-is since we started this work.
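
As a rough sketch of how a tag function might consume these args (assuming the Chunk and Thunk sketches above; greet and its behavior are hypothetical, not code from the reference implementation):

def greet(*args):
    # args is a mixed sequence of Chunks (static text) and Thunks (interpolations).
    parts = []
    for arg in args:
        if isinstance(arg, Thunk):
            value = arg.getvalue()
            if arg.conv == 'r':
                value = repr(value)
            elif arg.conv == 's':
                value = str(value)
            elif arg.conv == 'a':
                value = ascii(value)
            parts.append(format(value, arg.formatspec or ''))
        else:
            # A static chunk: decode escapes in the raw text.
            parts.append(arg.decoded)
    return ''.join(parts)

With the proposed syntax, greet'Hello {name}!' would then render much like f'Hello {name}!', except that the tag function controls the rendering.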

As for the PEP itself, I have some work that I'm in the process of wrapping up to figure out how to build production-quality tag functions for target DSLs like HTML. I will push this up shortly. Note that it's not code we would present directly in the PEP, but the approach is something I believe should be mentioned.

In a nutshell:

  1. Parse to an AST, preserving interpolations by using an appropriate placeholder that is compatible with the parser. (For HTML or SQL, this can be x$x; for Python, x_x; etc.) html.parser in the stdlib works well; https://github.com/tobymao/sqlglot is popular for SQL; and so forth. We have lots of parsers available for Python that we can use.

  2. Compile the AST to target code. For HTML, this is targeting WSGI with a code-generated Python generator. (After the fact, I noticed that Jinja 2 uses a similar approach for code generation.)

  3. Cache the tag string -> compiled code function, with a cache key that depends on the string literals (or string chunks) and the (conv, formatspec) from the thunks.

Alternatively, the parse tree can be walked directly, so long as the interpolations are appropriately handled in context; in other words, we are driving this with a parser for the target DSL. Fundamentally this is what we need to ensure that we have f-string-like template syntax without code injection being possible. As we have seen in the JavaScript space, this is incredibly popular when available, especially for HTML. SQL tagged template literal support for JS is also appearing: https://marketplace.visualstudio.com/items?itemName=frigus02.vscode-sql-tagged-template-literals and https://dev.to/newbie012/please-dont-manually-parameterize-your-sql-queries-3m7k are two examples. (A rough sketch of the parse/compile/cache approach follows below.)
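
Here is a small, self-contained sketch of the caching side of this approach. All names below are hypothetical and the real tag functions differ; it assumes the Chunk and Thunk sketches from earlier in this thread.

from functools import cache

PLACEHOLDER = 'x$x'  # stands in for each interpolation while parsing the DSL

def cache_key(args):
    # Step 3: key on the static chunks plus the (conv, formatspec) of each thunk.
    return tuple(
        (a.conv, a.formatspec) if isinstance(a, Thunk) else str(a)
        for a in args
    )

@cache
def compile_template(key):
    # Steps 1 and 2 would happen here: join the static parts around PLACEHOLDER,
    # parse with the target DSL's parser (e.g. html.parser), and generate code.
    # This sketch just returns the static parts unchanged.
    return [part for part in key if isinstance(part, str)]

def html(*args):
    template = compile_template(cache_key(args))
    values = [a.getvalue() for a in args if isinstance(a, Thunk)]
    # A real implementation would render the compiled template with the live
    # values, escaping as appropriate for the target DSL; we return both here.
    return template, values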

@pauleveritt
Collaborator

It's great that you are joining in. You in particular will be a huge help, given your background, and now is the ideal time.

For the PEP, I only had one thing that I wanted to finish writing. I feel strongly that we need a solid tag string site at first public mention: explainers, demos, Docker images, even videos. I expect pushback from a loud minority.

There's some machinery needed, like better doctest setup and usage.

Beyond that, like Jim, my interest lies in building a compelling system as an example. I have a big body of work I hope to move over.

@gvanrossum
Collaborator

gvanrossum commented Oct 14, 2023

Wow. Are you sure? This would be the first PR with its own PR campaign. It could even backfire.

@pauleveritt
Collaborator

I hope it doesn't feel at all like a PR campaign; rather, it's putting in extra work on the why. But point taken, it could backfire.

@lysnikolaou
Contributor Author

lysnikolaou commented Oct 15, 2023

Thanks for all the info, everyone! I guess the first step for me would be to have a look at Guido's branch, get a feel for the code (which I'll do next week), and then slowly start working on any fixes needed.

Regarding the tag string site, another thing we need to consider is whether it'd put unnecessary pressure on the SC, because of the noise the loud crowds will inevitably create, either for or against. A better possibility might be to split it into multiple PEPs, as the team did for the pattern matching PEPs, which feel similar in size and/or controversy potential?

@pauleveritt
Collaborator

Points taken, thank you both for steering on this. I will focus on the linked examples.

@pauleveritt
Collaborator

I propose moving 1/2/3 above to separate tickets and marking them all as post-PEP implementation. @jimbaker if you agree, I'll do so.

@jimbaker
Owner

> I propose moving 1/2/3 above to separate tickets and marking them all as post-PEP implementation. @jimbaker if you agree, I'll do so.

+1, let's do this.

@pauleveritt
Collaborator

Actually, I think that's all handled. I'm closing this, but we can re-open if you disagree.

@lysnikolaou look for some text to review, really soon.
