
Bikesheds #3

Open
gvanrossum opened this issue May 2, 2022 · 22 comments

Comments

@gvanrossum
Collaborator

gvanrossum commented May 2, 2022

Eventually these may each need their own issue? For now, some bullets, some with recommendations.

Syntax

  • Space between tag and string literal? I.e. is tag "abc" the same as tag"abc"? [rec: disallow]
    • If yes, do we recommend writing a space between tag and string? I.e. do we prefer tag"abc" or tag "abc"?
  • Multi-string? I.e. is tag"a" "b" "c" allowed, presumably meaning tag"abc"? [rec: disallow]
  • Raw strings? I suppose we should disallow things like tag r"abc" -- but perhaps we should not interpret backslashes? I.e. tag "\a\b\c" would effectively call tag(r"\a\b\c"), i.e. a string of six characters.
  • Do we only allow a single identifier in front of the string, or do we allow a dotted name? Or perhaps any atomic expression (e.g. a parenthesized expression) or even any primary (e.g. foo().tag"abc")? [rec: single name only]
  • Do we allow a call, subscript or attribute after the string literal, e.g. tag "abc" [i]? (You can write f"xyz"[i].) [rec: yes]
  • Do we support all the special syntax allowed in f-string interpolations, i.e. {x!r}, {x!s}, {x!a}, {x = }? [rec: yes]
  • It definitely looks a little weird to see len"foo" (which is the same as len("foo")). I'm guessing that's why JavaScript uses backticks for tagged templates?
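A quick check of the current-Python behavior behind several of the bullets above: a raw string keeps its backslashes (so r"\a\b\c" is the six-character string a non-interpreting tag would receive), the result of an f-string is an ordinary str that can be subscripted, and the !r / !s / !a conversions and the = specifier produce the outputs shown. No tag-string syntax is involved; this only illustrates existing semantics.

```python
# Raw strings keep backslashes: r"\a\b\c" is six characters.
raw = r"\a\b\c"
print(len(raw))  # 6

# Without the r prefix, \a (bell) and \b (backspace) are one character each.
cooked = "\a\b"
print(len(cooked))  # 2

# The result of an f-string is an ordinary str, so a subscript can follow it.
i = 1
print(f"xyz"[i])  # y

# The conversions and the = specifier that tag strings would need to support.
x = "é"
print(f"{x!r}")   # 'é'
print(f"{x!s}")   # é
print(f"{x!a}")   # '\xe9'
print(f"{x = }")  # x = 'é'
```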

Semantics

  • Should the string we put in the thunk representing the raw string include the enclosing curly braces? [rec: it should just be the evaluatable expression]
  • We need to specify the name and fields of the Thunk class.
  • We need to specify how the tag function gets called. [rec: with a sequence of non-empty strings and Thunks]
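A minimal sketch of the recommended calling convention (a sequence of strings and Thunks), assuming a hypothetical Thunk with four fields. The names getvalue and formatspec echo those used elsewhere in this thread; raw, conv, and the field order are assumptions, not a settled spec. Because tag"..." is not valid syntax in current Python, the call the compiler would generate is written out by hand, and fstring_like is a made-up tag function that simply reproduces f-string output.

```python
from typing import Callable, NamedTuple, Optional, Union


class Thunk(NamedTuple):
    """Hypothetical interpolation record; field names and order are assumptions."""

    getvalue: Callable[[], object]  # evaluates the expression on demand
    raw: str  # expression text, without the enclosing curly braces
    conv: Optional[str]  # "r", "s", "a", or None
    formatspec: str  # "" when no format spec was written


def convert(value: object, conv: Optional[str]) -> object:
    """Apply an f-string style !r / !s / !a conversion."""
    if conv == "r":
        return repr(value)
    if conv == "s":
        return str(value)
    if conv == "a":
        return ascii(value)
    return value


def fstring_like(*args: Union[str, Thunk]) -> str:
    """Hypothetical tag function that reproduces f-string output."""
    parts = []
    for arg in args:
        if isinstance(arg, str):
            parts.append(arg)  # literal text passes through unchanged
        else:
            value = convert(arg.getvalue(), arg.conv)
            parts.append(format(value, arg.formatspec))
    return "".join(parts)


name = "world"
# Hand-written stand-in for the proposed syntax  fstring_like"Hello {name!r}!"
result = fstring_like("Hello ", Thunk(lambda: name, "name", "r", ""), "!")
print(result)  # Hello 'world'!
```

Since the tag decides when (or whether) to call getvalue, a different tag could escape, reorder, or skip interpolations entirely, which is the main thing a plain f-string cannot offer.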
@ericvsmith
Collaborator

ericvsmith commented May 2, 2022

Re: Multi-string? I.e. is tag "a" "b" "c" allowed, presumably meaning tag "abc"?

f-strings do not do this. If x=3, then f"{x}" "{x}" is "3{x}".
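This is easy to confirm in current Python: when adjacent literals are concatenated, only the f-prefixed piece is interpolated.

```python
x = 3
# Adjacent literals are joined, but only the f-prefixed piece is a template;
# the plain "{x}" stays literal text.
s = f"{x}" "{x}"
print(s)  # 3{x}
```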

@jimbaker
Owner

jimbaker commented May 2, 2022

Re: Space between tag and string literal? No, this is not possible with other prefixes. It is also a potential point of confusion, since users may want to combine prefixes, such as tag1 tag2"abc". This also addresses raw strings with tag r"abc": all literal strings in the tagstring will be raw, and then decoded by the tag. This allows someone to write a LaTeX tag, such as latex"{member} \in X".
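A sketch of what "literal parts arrive raw, decoding is up to the tag" could look like for the LaTeX case. Since latex"..." is not valid syntax yet, the call is written out by hand, with an r-string standing in for the raw literal part. The latex function and the (getvalue, expression text) tuple used for interpolations are hypothetical.

```python
from typing import Callable, Tuple, Union

# Hypothetical interpolation: (getvalue, expression text).
Interp = Tuple[Callable[[], object], str]


def latex(*args: Union[str, Interp]) -> str:
    """Hypothetical LaTeX tag: literal parts are used raw, so backslash
    commands such as \\in survive untouched."""
    out = []
    for arg in args:
        if isinstance(arg, str):
            out.append(arg)  # no escape decoding for LaTeX
        else:
            getvalue, _expr = arg
            out.append(str(getvalue()))
    return "".join(out)


member = "x"
# Hand-written stand-in for the proposed syntax  latex"{member} \in X"
result = latex((lambda: member, "member"), r" \in X")
print(result)  # x \in X
```

A tag that does want Python's usual escapes would have to decode its literal parts itself; keeping them raw by default is what lets this tag pass \in through unchanged.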

@jimbaker
Owner

jimbaker commented May 4, 2022

> Do we support all the special syntax allowed in f-string interpolations, i.e. {x!r}, {x!s}, {x!a}, {x = }?

Yes. Note that for some text {x=}, this is rewritten at the parse stage as being the equivalent of some text x={x}. So we will need to add some extra support to recover the original raw string.

@ericvsmith
Collaborator

> Yes. Note that for some text {x=}, this is rewritten at the parse stage as being the equivalent of some text x={x}. So we will need to add some extra support to recover the original raw string.

This is not in general reversible, though. We had this same issue with stringized annotations. It didn't matter there, since we weren't concerned with complete fidelity. f'x={x!r}' and f'{x=}' generate the same AST.

>>> import ast
>>> ast.dump(ast.parse("f'x={x!r}'"))
"Module(body=[Expr(value=JoinedStr(values=[Constant(value='x=', kind=None), FormattedValue(value=Name(id='x', ctx=Load()), conversion=114, format_spec=None)]))], type_ignores=[])"
>>> ast.dump(ast.parse("f'{x=}'"))
"Module(body=[Expr(value=JoinedStr(values=[Constant(value='x=', kind=None), FormattedValue(value=Name(id='x', ctx=Load()), conversion=114, format_spec=None)]))], type_ignores=[])"
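The same collision can be observed with the standard library alone. Both spellings render identically because {x=} defaults to the !r conversion; the tree comparison at the end checks the point above directly (the exact dump text differs between Python versions, so only the comparison matters).

```python
import ast

x = "hi"
# {x=} defaults to the !r conversion, so both spellings render the same text.
print(f"{x=}")     # x='hi'
print(f"x={x!r}")  # x='hi'

# Compare the trees for the two spellings; if they match, the original
# spelling cannot be recovered from the AST.
tree_a = ast.dump(ast.parse("f'x={x!r}'"))
tree_b = ast.dump(ast.parse("f'{x=}'"))
print(tree_a == tree_b)
```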

@jimbaker
Owner

jimbaker commented Jun 13, 2022

formatspec should be an empty string '', not None. This allows for direct usage of the formatspec with

format(getvalue(), formatspec)

as opposed to requiring

format(getvalue(), '' if formatspec is None else formatspec)

Interestingly, I haven't really been working with formatspec except to explore creative 😁 usages, which are not so likely in real usage.
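The built-in format() shows why an empty string is the convenient default: "" means default formatting and can be passed straight through, while None is rejected with a TypeError. The getvalue function below is a hypothetical stand-in for a thunk's deferred value.

```python
def getvalue() -> float:
    """Hypothetical stand-in for a thunk's deferred value."""
    return 3.14159


# An empty spec means default formatting, so it can be passed straight through.
default_text = format(getvalue(), "")
print(default_text)  # 3.14159

fixed_text = format(getvalue(), ".2f")
print(fixed_text)  # 3.14

# None is not accepted as a spec, which is what would force the
# '' if formatspec is None else formatspec workaround.
try:
    format(getvalue(), None)
    rejected = False
except TypeError as exc:
    rejected = True
    print("TypeError:", exc)
```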

@rmorshea
Collaborator

rmorshea commented Jun 21, 2022

As I'm writing, I'm finding the name "tag string" to be a bit awkward to use. I don't have the grammatical chops to justify it, but having two nouns right next to each other doesn't feel right to me. Given that, I think it would be better if we used the term "tagged string", as that uses an adjective to describe the string as being "tagged". This also aligns more closely with the name "f-string", which is meant to abbreviate "formatted string". For comparison, the JS equivalent is similarly called a "tagged template".

I had also been using "string tag" to refer to the thing which interpolates a "tagged string". However, @jimbaker has been using "tag function", and I think "string tag" could easily be confused with "tagged string" while reading. Thus, I'll be using "tag function" from now on, even though it could technically be an object with a __call__ method.

@jimbaker
Owner

@rmorshea So we have two choices for "tag string" per this insightful Wikipedia article on noun adjuncts, https://en.wikipedia.org/wiki/Noun_adjunct:

  • Use a noun adjunct construction. So this is "tag string" or possibly "tag-string" (noun adjuncts are occasionally hyphenated with the noun being modified).
  • Use an adjectivally inflected construction. So this is "tagged string".

I had not really thought about this aspect of English before!

@rmorshea
Collaborator

rmorshea commented Jun 23, 2022

In retrospect, I don't know why "tag function" was fine to my eye, but "tag string" was not, since they're effectively the same grammatical structure. Your original noun adjunct naming also has a nice symmetry I didn't appreciate before with the common "tag" prefix that we could use in naming other things.

@rmorshea
Collaborator

rmorshea commented Jun 23, 2022

Also, while we're on the topic of naming, where did "Thunk" come from? I've been assuming it's a combination of "tuple" and "chunk", but I haven't seen any explicit reference to that.

@gvanrossum
Collaborator Author

Thunk is a very old term used for a compiler-generated piece of code representing a parameter. It dates back to Algol-60 (of which I have very fond memories). https://en.wikipedia.org/wiki/Thunk
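In everyday Python, the closest analogue is a zero-argument lambda passed in place of a value, which emulates Algol-60 call-by-name: the callee decides whether and when the expression is evaluated. The names expensive and log_if are made up for illustration.

```python
calls = []


def expensive() -> str:
    calls.append("expensive")  # record each evaluation
    return "computed"


def log_if(enabled: bool, message_thunk) -> str:
    # The thunk is evaluated only if the message is actually needed.
    return message_thunk() if enabled else ""


print(repr(log_if(False, lambda: expensive())))  # ''
print(repr(log_if(True, lambda: expensive())))   # 'computed'
print(calls)  # ['expensive']
```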

@benji-york

There was a discussion on Lobste.rs today that led to some interesting thoughts that seem closely related to this work, so I thought I would share: https://dotat.at/@/2020-09-17-generalized-string-literal-syntax-10-years-later.html

@Archmonger

Archmonger commented Feb 9, 2023

Deleting/moving this comment from #20


I prefer an html @ "<div>" syntax for all tagstrs over the proposed html"<div>" syntax.

Single-letter prefixes, such as f, r, and b, can be visually attached to a string since they are not typical PEP 8 variable names. But html"..." looks like someone accidentally forgot a space between the variable html and the string literal "...".

A couple of other benefits come to mind:

  1. It reads better. When voicing it, it sounds like: "My HTML is at 'string'"
  2. It would allow for compatibility with f-strings, such as html @ f"<div>{my_val}</div>"
  3. The pattern feels comfortable because email addresses have made the @ symbol commonplace.

@jimbaker
Owner

The problem with @ is that it already has a valid parse in Python. So let's write the following:

>>> class HTML:
...   def __matmul__(self, other):
...     print(f'Multiplying {self=} with {other=}')
...     return 42
...
>>>
>>> html = HTML()
>>> html @ '<div>'
Multiplying self=<__main__.HTML object at 0x7faa5602f0d0> with other='<div>'
42

Obviously we can build something with this @ functionality, but we lose laziness, interpolation control more generally, etc.

The advantage of using the tag string approach is that it is not currently valid syntax, so we can use it in this interesting way.
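To make the laziness point concrete: with @, an f-string on the right-hand side is fully rendered before __matmul__ runs, so the object only ever sees the finished string and cannot tell literal markup from interpolated data (for example, to escape only the value). The HTML class here is a hypothetical variant of the one above; the last lines show by hand what a tag function receiving the parts separately could do instead.

```python
import html as html_lib


class HTML:
    def __matmul__(self, other: str) -> str:
        # By the time we get here the f-string has already been rendered;
        # there is no way to tell literal markup from interpolated data.
        return other


html = HTML()
my_val = "<script>alert(1)</script>"
result = html @ f"<div>{my_val}</div>"
print(result)  # <div><script>alert(1)</script></div>

# With the parts kept separate, only the interpolated value needs escaping.
safe = "<div>" + html_lib.escape(my_val) + "</div>"
print(safe)  # <div>&lt;script&gt;alert(1)&lt;/script&gt;</div>
```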

@Archmonger

But html'mystr' visually looks confusing/ambiguous. It's possible the Python community will have the same initial visceral reaction to it as they did to the PEP for the walrus operator :=.

@rmorshea
Collaborator

It seems a little hyperbolic to assume that's how people will react, as I think that could be true of almost any change to the language. But regardless, this syntax has the advantage that Python already has string prefixes and that JavaScript's template literals do the same, but with backticks. I'm not sure that we ruled out the possibility of using backticks, but that's another way this could be made visually distinct from normal string declarations.

@gvanrossum
Collaborator Author

Could we update this issue with (tentative) decisions made on various issues brought up above?

Or perhaps close this in favor of more pointed issues for the remaining open issues? E.g. \N{...}. We already have separate issues about some points, e.g. #4, #5.

@jimbaker
Owner

Yes, I will take care of that, along with other issues. There's too much that's been left open, and we can always reopen any if it comes up.

@arogozhnikov

Following this discussion for f-strings, it's worth documenting that tagged strings can't be used as docstrings.
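The f-string behavior this refers to can be observed today: a function whose first statement is an f-string gets no docstring, because only a plain string literal in that position is stored as __doc__. KIND is an arbitrary name used for illustration.

```python
KIND = "string"


def plain():
    """Repeat a string."""


def formatted():
    f"""Repeat a {KIND}."""


print(plain.__doc__)      # Repeat a string.
print(formatted.__doc__)  # None
```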

@gvanrossum
Collaborator Author

> Following this discussion for f-strings, it's worth documenting that tagged strings can't be used as docstrings.

Do you have a use case?

@arogozhnikov

> Do you have a use case?

Maybe:

  • trivial checks for right names in docstrings, to ease refactoring, or maybe better links:
    def repeat(string, n_repetitions: int):
        doc'''
            {string} - template
            {n_repetitions} - number of times to repeat
        # these are turned into links in documentation
        See also: {repeat_twice} and {string_utils}
        '''
        return string * n_repetitions
  • conditional reformatting (e.g. reST docstrings produce OK documentation, but are not readable IMO). If a special flag is passed, reST can be produced
  • (if supported) syntax highlight for in-doc examples

None of these are really important.

@Jacob-Flasheye

Jacob-Flasheye commented Dec 11, 2023

Hi, I stumbled upon this repo and I think the features proposed in this PEP are nice, but I have some thoughts and questions. I hope this is the right place for them :)

My understanding of the word "Thunk" is that it means any deferred computation unit (at least that's how I understand it from the little Haskell I've done). It seems a shame to use the name of such a generic concept for this very specific use case. Maybe Thunk should be some sort of base class and the Thunks in the PEP could be subclasses?

And related to the last syntax point in the OP, I also find foobar"a b c" a bit jarring, and the syntax in PEP 501 looks more natural (in Python at least) to me. Why did you decide not to go with PEP 501 syntax (foobar(i"a {b} c"))?

I'm also a bit concerned that this is essentially just another way of calling a function, with double quotes instead of parentheses. Like, wouldn't foobar(a, b, c) and foobar"{a}{b}{c}" be equivalent? Continuing that thought, wouldn't this essentially give people all the power of lazy evaluation, just with a funky function call, thus potentially leading to its use for non-string things? I don't think that's the purpose of the PEP, but it seems like a natural consequence to me. But maybe there's something I'm not understanding?

EDIT: @merwok explained that I should go somewhere else with my questions, editing the post so I (hopefully) don't disturb more people than I already have!

@merwok

merwok commented Dec 11, 2023

The right place would be here I think: https://discuss.python.org/t/allow-for-arbitrary-string-prefix-of-strings/19740/6

This tracker (I think) is used to work out some issues between the people working on the proposal, and this ticket specifically is for some cosmetic details («bikesheds») and not the project in general.

Hope this helps!

9 participants