What's the status? Help needed? #35

Closed
lysnikolaou opened this issue Oct 13, 2023 · 9 comments

Comments

@lysnikolaou
Contributor

Hey folks! 👋

I was wondering what the status of this project is. I'm interested in helping out, if it's still going on.

I know that there's an initial implementation, but it should probably be changed to work well with PEP 701. I could offer a helping hand and then assist in writing the PEP as well.

@jimbaker
Owner

@lysnikolaou thanks for reaching out! It's very much still going on, with slow, steady work by @pauleveritt and myself to figure out exactly how we want to present this as a rather useful addition to Python. We can very much use your help.

First, @gvanrossum already got us to something that's close to PEP 701 with his branch, but that branch needs a number of updates:

  1. Update with respect to main, including any bug fixes around PEP 701 and subsequent work.

  2. Determine exactly how we want to handle named Unicode escapes with \N. Given:

>>> def tag(*args):
...     return args
...
>>> fr'\N{{GRINNING FACE}}'
'\\N{GRINNING FACE}'
>>> tag'\N{{GRINNING FACE}}'
('\\N{', 'GRINNING FACE}')
>>> f'\N{GRINNING FACE}'
'😀'

So it should probably tokenize the way fr does; we should be consistent in using raw strings.

  3. Currently the args are constructed either as a raw str or a tuple. Arguably we want to provide a better UX where each arg is either a Chunk, subclassing str:
from typing import Self

class Chunk(str):
    def __new__(cls, value: str) -> Self:
        chunk = super().__new__(cls, value)
        chunk._decoded = None
        return chunk

    @property
    def decoded(self) -> str:
        """Apply unicode decoding and cache the result, then return.

        Uses the same internal code functionality as Python's parser
        does to perform the actual decode.
        """
        if self._decoded is None:
            self._decoded = self.encode('utf-8').decode('unicode-escape')
        return self._decoded

The specific usage of str.encode('utf-8').decode('unicode-escape') here is correct; it follows what is done in PyTokenizer_translate_into_utf8, which was recently refactored in main, but it is certainly obscure.
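
To illustrate, here's a minimal usage sketch of the Chunk class above (a hypothetical REPL session, not output from the branch):

>>> chunk = Chunk(r'\N{GRINNING FACE} says hi\n')
>>> chunk
'\\N{GRINNING FACE} says hi\\n'
>>> chunk.decoded
'😀 says hi\n'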

Also, instead of a tuple, use a Thunk, subclassing the C-internal equivalent of NamedTuple:

from typing import Any, Callable, Literal, NamedTuple

Conversion = Literal['a', 'r', 's'] | None

class Thunk(NamedTuple):
    getvalue: Callable[[], Any]
    text: str
    conv: Conversion = None
    formatspec: str | None = None

In both cases, Chunk and Thunk should be implemented as core types. Note that the specific attribute names still need to be finalized: we use format_spec in PEP 701, but then symmetry argues getvalue should be get_value. I have simply kept them as-is since we started this work.
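
As a rough sketch of how a tag function might consume these args (assuming the Chunk and Thunk sketches above; greet and its behavior are hypothetical, not code from the reference implementation):

def greet(*args):
    # args is a mixed sequence of Chunks (static text) and Thunks (interpolations).
    parts = []
    for arg in args:
        if isinstance(arg, Thunk):
            value = arg.getvalue()
            if arg.conv == 'r':
                value = repr(value)
            elif arg.conv == 's':
                value = str(value)
            elif arg.conv == 'a':
                value = ascii(value)
            parts.append(format(value, arg.formatspec or ''))
        else:
            # A static chunk: decode escapes in the raw text.
            parts.append(arg.decoded)
    return ''.join(parts)

With the proposed syntax, greet'Hello {name}!' would then render much like f'Hello {name}!', except that the tag function controls the rendering.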

As for the PEP itself, I have some work that I'm in the process of wrapping up to figure out how to build production-quality tag functions for target DSLs like HTML. I will push this up shortly. Note that it's not code we would present directly in the PEP, but the approach is something I believe should be mentioned.

In a nutshell:

  1. Parse to an AST, preserving interpolations by using an appropriate placeholder that is compatible with the parser. (For HTML or SQL, this can be x$x; for Python, x_x; etc.) html.parser in the stdlib works well; https://github.com/tobymao/sqlglot is popular for SQL; and so forth. We have lots of parsers available for Python that we can use.

  2. Compile the AST to target code. For HTML, this is targeting WSGI with a code-generated Python generator. (After the fact, I noticed that Jinja 2 uses a similar approach for code generation.)

  3. Cache the tag string -> compiled code function, with a cache key that depends on the string literals (or string chunks) and the (conv, formatspec) from the thunks.

Alternatively, the parse tree can be walked directly, so long as the interpolations are appropriately handled in context; in other words, we are driving this with a parser for the target DSL. Fundamentally this is what we need to ensure that we have f-string-like template syntax without code injection being possible. As we have seen in the JavaScript space, this is incredibly popular when available, especially for HTML. SQL tagged template literal support for JS is also appearing: https://marketplace.visualstudio.com/items?itemName=frigus02.vscode-sql-tagged-template-literals and https://dev.to/newbie012/please-dont-manually-parameterize-your-sql-queries-3m7k are two examples. (A rough sketch of the parse/compile/cache approach follows below.)
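
Here is a small, self-contained sketch of the caching side of this approach. All names below are hypothetical and the real tag functions differ; it assumes the Chunk and Thunk sketches from earlier in this thread.

from functools import cache

PLACEHOLDER = 'x$x'  # stands in for each interpolation while parsing the DSL

def cache_key(args):
    # Step 3: key on the static chunks plus the (conv, formatspec) of each thunk.
    return tuple(
        (a.conv, a.formatspec) if isinstance(a, Thunk) else str(a)
        for a in args
    )

@cache
def compile_template(key):
    # Steps 1 and 2 would happen here: join the static parts around PLACEHOLDER,
    # parse with the target DSL's parser (e.g. html.parser), and generate code.
    # This sketch just returns the static parts unchanged.
    return [part for part in key if isinstance(part, str)]

def html(*args):
    template = compile_template(cache_key(args))
    values = [a.getvalue() for a in args if isinstance(a, Thunk)]
    # A real implementation would render the compiled template with the live
    # values, escaping as appropriate for the target DSL; we return both here.
    return template, values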

@pauleveritt
Collaborator

It's great that you are joining in. You in particular will be a huge help, given your background, and now is the ideal time.

For the PEP, I only had one thing that I wanted to finish writing. I feel strongly that we need a solid tag string site at first public mention: explainers, demos, Docker images, even videos. I expect pushback from a loud minority.

There's some machinery needed, like better doctest setup and usage.

Beyond that, like Jim, my interest lies in building a compelling system as an example. I have a big body of work I hope to move over.

@gvanrossum
Collaborator

gvanrossum commented Oct 14, 2023

Wow. Are you sure? This would be the first PR with its own PR campaign. It could even backfire.

@pauleveritt
Collaborator

I hope it doesn't feel at all like a PR campaign; rather, it's putting in extra work on the why. But point taken, it could backfire.

@lysnikolaou
Contributor Author

lysnikolaou commented Oct 15, 2023

Thanks for all the info, everyone! I guess the first step for me would be to have a look at Guido's branch, get a feel for the code (which I'll do next week), and then slowly start working on any fixes needed.

Regarding the tag string site, another thing we need to consider is whether it'd put unnecessary pressure on the SC, because of the noise the loud crowds will inevitably create, either for or against. A better possibility might be to split it into multiple PEPs, as the team did for the pattern matching PEPs, which feel similar in size and/or controversy potential?

@pauleveritt
Collaborator

Points taken, thank you both for steering on this. I will focus on the linked examples.

@pauleveritt
Collaborator

I propose moving 1/2/3 above to separate tickets and marking them all as post-PEP implementation. @jimbaker if you agree, I'll do so.

@jimbaker
Owner

> I propose moving 1/2/3 above to separate tickets and marking them all as post-PEP implementation. @jimbaker if you agree, I'll do so.

+1, let's do this.

@pauleveritt
Collaborator

Actually, I think that's all handled. I'm closing this, but we can re-open if you disagree.

@lysnikolaou look for some text to review, really soon.
