[RFC] String Interpolation #77

sttaft · 2021-06-01T15:32:31Z

This is the first draft of an RFC for the string interpolation feature.

First draft of RFC

Modest formatting fixes

Use code block formatting.

More work on formatting.

glacambre · 2021-06-01T15:48:12Z

Link to the rendered version: https://github.com/sttaft/ada-spark-rfcs/blob/topic/rfc-string-interpolation/considered/rfc-string-interpolation.rst

sttaft · 2021-06-01T16:26:04Z

Rendered version also here (click on "Files changed" at the top, and then "View File" on the "..." menu on the right):
https://github.com/AdaCore/ada-spark-rfcs/blob/7559ac5bd340373b344c4b92d755aadc682f47cb/considered/rfc-string-interpolation.rst

Note that I just fixed a few typos, so this rendered version is somewhat different than what was linked to originally.

Fix a number of typographical errors.

Fabien-Chouteau · 2021-06-16T09:03:56Z

Thanks @sttaft ,

My two cents on this proposal.

1. Referencing variable names

I am not a huge fan of referencing variable names in the template. I think especially with Ada and its most common programming style, names tend to be very long and will result in a template string that is not readable.
If I take an example from the original issue:

Put_Line ($"(Name) is a (Profession)"$)

I think it's going to be pretty rare to have such a clean template. In practice you will have names that are part of a record, or something else more complex. So in my opinion we will most likely see patterns that look like this:

Put_Line ($"$(My_Object.Name) is a $(Professions_Img_Array (My_Object.Profession_Index))"$);

Of course there will be the options to declare renames for strings:

declare
   Name : String renames My_Object.Name;
   Profession : String renames Professions_Img_Array (My_Object.Profession_Index);
begin
   Put_Line ($"$(Name) is a $(Profession)"$);
end;

But in the end I don't think this brings as much readability improvement as we could get.
Of course I would like to have other opinions on this.

I see two other ways around this, and I am introducing a placeholder "syntax" $" "$ (...) just for the examples:

a. Positional:

Put_Line ($"$(0) is $(1)"'$ --  This is a lot of `$` I must say
          (My_Object.Name, 
           Professions_Img_Array (My_Object.Profession_Index));

b. Another variation on naming (I saw this in Python at least)

Put_Line ($"$(Name) is $(Profession)"'$
          (Name       => My_Object.Name, 
           Profession => Professions_Img_Array (My_Object.Profession_Index));
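
For comparison, both alternatives have direct counterparts in Python's `str.format`. The sketch below is only an illustration of the two styles; `name` and `profession` are hypothetical stand-ins for `My_Object.Name` and the `Professions_Img_Array` lookup.

```python
# Hypothetical data standing in for My_Object.Name and the profession lookup.
name = "Ada"
profession = "mathematician"

# a. Positional: placeholders are indices into the argument list.
positional = "{0} is a {1}".format(name, profession)

# b. Named: placeholders are keys bound at the call site.
named = "{name} is a {profession}".format(name=name, profession=profession)

# Swapping positional arguments still runs, but silently changes the meaning.
swapped = "{0} is a {1}".format(profession, name)

print(positional)
print(named)
print(swapped)
```

The named form keeps the template short while leaving the long expressions outside the string, which is the readability point made above.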

2. Expression in templates:

Put_Line ($"Plop is $(X + Y)"$);

In the same vein as my comment above, I think this is likely to generate templates that will be hard to read. And I am not an implementer, but I am guessing that this will introduce complexity that can be avoided, in my opinion.

3. Formatting options

In the RFC $(X+Y, Width => 13) is given as an example. I am personally a fan of printf's %2.3f, %5x, %04d, etc.
If we can make something as compact as that it would be great.

Looking in Python's direction, it could be something like:

Put_Line($"Plop is $(0.2f:Angle)"$)
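
As a point of reference, this is how the compact specifiers look in actual Python f-strings. Note that Python puts the specifier after the value (`{angle:0.2f}`), the reverse of the order sketched above; the variable names are made up for the example.

```python
angle = 3.14159
count = 42
mask = 255

fixed = f"Plop is {angle:0.2f}"   # fixed-point, 2 digits after the point
padded = f"{count:04d}"           # decimal, zero-padded to width 4
hexa = f"{mask:5x}"               # hexadecimal, right-aligned in width 5

print(fixed)
print(padded)
print(hexa)
```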

4. Embedded

As most of you know, my focus is on embedded, and in particular bare-metal and small devices. So I am always worried about how a given feature can be implemented and used in this kind of situation. I don't know how compatible with embedded the current proposal is, but I just want to state that it is important to keep that in mind.

sttaft · 2021-06-16T13:01:26Z

Fabien Chouteau wrote: Thanks @sttaft, my two cents on this proposal. 1. Referencing variable names: I am not a huge fan of referencing variable names in the template. I think especially with Ada and its most common programming style, names tend to be very long and will result in a template string that is not readable. If I take an example from the original issue: Put_Line ($"(Name) is a (Profession)"$) I think it's going to be pretty rare to have such a clean template. In practice you will have names that are part of a record, or something else more complex. So in my opinion we will most likely see patterns that look like this: Put_Line ($"$(My_Object.Name) is a $(Professions_Img_Array (My_Object.Profession_Index))"$); Of course there will be the option to declare renames for strings: declare Name : String renames My_Object.Name; Profession : String renames Professions_Img_Array (My_Object.Profession_Index); begin Put_Line ($"$(Name) is a $(Profession)"$); end; But in the end I don't think this brings as much readability improvement as we could get. Of course I would like to have other opinions on this.

I have been using a language with string interpolation over the past few years, and my experience is that it is much more readable and quite intuitive. This is something where I would suggest we do some prototyping, and then take some existing programs that do a lot of concatenation of literal strings and 'Image (or Put) calls, and see how they look with various approaches to string interpolation.

I see two other ways around this, and I am introducing a placeholder "syntax" $" "$ (...) just for the examples: a. Positional: Put_Line ($"$(0) is $(1)"'$ (My_Object.Name, Professions_Img_Array (My_Object.Profession_Index)); -- This is a lot of `$` I must say. b. Another variation on naming (I saw this in Python at least): Put_Line ($"$(Name) is $(Profession)"'$ (Name => My_Object.Name, Profession => Professions_Img_Array (My_Object.Profession_Index));

Using integers as placeholders followed by a list of names adds complexity and gets us into the kinds of bugs that are common with printf, where you need to match up parameters with placeholders, and it is easy to introduce problems during maintenance. Using named parameters is an interesting alternative, but it feels like an extra step, and it means that any use of an interpolated string must be in a context where it makes sense to add named parameters. String literals can appear in many contexts in Ada, and defining exactly where the extra named parameters would go would be a completely new concept in Ada, and could add complexity to overload analysis, which is already pretty complex in Ada.

2. Expression in templates: Put_Line ($"Plop is $(X + Y)"$); In the same vein as my comment above, I think this is likely to generate templates that will be hard to read. And I am not an implementer, but I am guessing that this will introduce complexity that can be avoided, in my opinion.

I don't see the added implementation complexity here, since our general approach is to apply <type>'Image, and whether it is a name or an expression makes little difference in Ada, particularly since function calls are considered "name"s in Ada, and both can involve overloading. You have to be able to resolve the type without external context, but that is true in other contexts in Ada (e.g. the operand of a type conversion), so this is not an added complexity.

3. Formatting options In the RFC $(X+Y, Width => 13) is given as an example. I am personally a fan of printf's %2.3f, %5x, %04d, etc. If we can make something as compact as that it would be great. Looking in Python's direction, it could be something like: Put_Line($"Plop is $(0.2f:Angle)"$)

Seems like an interesting approach. It is somewhat redundant to specify the type (with the "f") in this case, and using letters for specifying the radix seems a bit odd given that Ada currently never does that, but some kind of Ada-oriented pattern such as "3.4E2" for specifying the format for a floating-point number, and "16#5#" for specifying a hex integer might be pretty intuitive. "*" might be used for cases where the programmer doesn't care, such as "16#*#" for hex, but no specified number of digits. We probably still need to think about whether we want to pass this to the 'Image function, or to do it all with postprocessing (which is clearly harder for some of these patterns).

4. Embedded: As most of you know, my focus is on embedded, and in particular bare-metal and small devices. So I am always worried about how a given feature can be implemented and used in this kind of situation. I don't know how compatible with embedded the current proposal is, but I just want to state that it is important to keep that in mind.

This is syntactic sugar, so there is no magic here. It is going to be equivalent to a string concatenation and calls on 'Image, and the programmer will need to keep that in mind.

-Tuck

mgrojo · 2021-06-19T12:16:09Z

I hope editors start to highlight the interpolated variables differently from the surrounding characters; otherwise this would be less readable instead of more.

I suppose escape characters (i.e. "\t\n") will not work in traditional string literals, for compatibility. But then people new to the language will start to complain about being unable to write "\tHello, World!\n". Is the introduction of this C-ism really needed? Wouldn't using ASCII entities be more respectful of current Ada: use ASCII; $"$(HT)Hello, World!$(LF)"$?

This is syntactic sugar, so there is no magic here. It is going to be equivalent to a string concatenation and calls on 'Image, and the programmer will need to keep that in mind.

Does that mean that interpolated positive integers will have a leading space? I would understand it for the sake of consistency, but that would shock many.

sttaft · 2021-06-19T14:13:44Z

Manuel wrote: ... This is syntactic sugar, so there is no magic here. It is going to be equivalent to a string concatenation and calls on 'Image, and the programmer will need to keep that in mind. Does that mean that interpolated positive integers will have a leading space? I would understand it for the sake of consistency, but that would shock many.

No, there would be a "Trim" applied to the result of 'Image so no extra spaces. The RFC should have made that clear.
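
The effect of that "Trim" can be modelled in a few lines. This is a Python sketch, not Ada and not the RFC's wording: `ada_image` is a hypothetical stand-in that mimics only the leading-space convention of `Integer'Image`, and `interpolate` models the "concatenation plus 'Image" desugaring described earlier.

```python
def ada_image(value: int) -> str:
    """Model of Integer'Image: leading space for non-negative values, '-' otherwise."""
    return f" {value}" if value >= 0 else str(value)


def interpolate(*parts) -> str:
    """Model of the desugaring: string pieces are concatenated as-is,
    integer pieces go through the image function and are then trimmed."""
    out = []
    for part in parts:
        if isinstance(part, str):
            out.append(part)
        else:
            out.append(ada_image(part).strip())
    return "".join(out)


untrimmed = "X = " + ada_image(5) + "."   # plain concatenation keeps the space
trimmed = interpolate("X = ", 5, ".")     # interpolation with trimming

print(repr(untrimmed))
print(repr(trimmed))
```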

-Tuck

mgrojo · 2021-06-19T16:04:23Z

considered/rfc-string-interpolation.rst

+of string literal syntaxes, chosen by a prefix letter, but our sense is that
+the string interpolation syntax has emerged as the favorite.  We do not want
+to have lots of different syntaxes, so we have included the escape mechanism
+as part of both of the new string literal syntaxes.  We have chosen '\' as the


This is rendered as ''. It probably needs to be quoted using backticks, like this: `\`

Escape the escape character, and mention that leading and trailing whitespace is trimmed when interpolating the result of 'Image.

Fix the other place we had an un-escaped escape character.

sttaft · 2021-06-19T16:25:26Z

Thanks. I fixed it in both places, and added a mention of trimming leading and trailing white space.

-Tuck

Some minor clarifications. Fix the sentence fragment in the Drawbacks section, and include a concern mentioned by Fabien.

Two more places needing doubled backslashes.

raph-amiard · 2021-07-06T14:49:35Z

Overall, looking really good!
'Image doesn't help when you want to interpolate, say, an unbounded string into a string, because it'll typically have quotes around it, but you don't want that. How do we accommodate this common use case?
One drawback of multi-line strings is that they'll be the first multi-line token in the Ada language. Having only single-line tokens has made it very easy to tokenize/highlight Ada so far.
Not a fan of the terminating $.
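
The quoting concern has a close analogue in Python, shown here purely as prior art: an f-string inserts the plain text of a string by default, and the quoted form has to be requested with `!r`. Whether the RFC should adopt a similar split is an open question in this thread.

```python
name = "Ada"

plain = f"Hello {name}"      # str(): Hello Ada
quoted = f"Hello {name!r}"   # repr(): Hello 'Ada'

print(plain)
print(quoted)
```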

Fabien-Chouteau · 2021-08-14T08:41:58Z

@zertovitch I was just thinking about this, maybe it would be interesting to prototype this string interpolation in HAC.

zertovitch · 2021-08-15T15:47:31Z

Why not? A few hints on how to do it: there is an internal type called String_Literals. For adding the string interpolation feature, look at lines 163+ (currently) in src/hac_sys-parser-expressions.adb. You can add a Boolean mode switch, say `Interpolated`, activated by the leading '$', then parse the string literal in the special mode if Interpolated = True.

procedure Primary (FSys_Prim : Symset; X : out Exact_Typ) is  --  RM 4.4 (7)
  F : Opcode;
  Ident_Index : Integer;
begin
  X := Type_Undefined;
  Test (CD, Primary_Begin_Symbol + StrCon, FSys_Prim, err_primary_unexpected_symbol);
  case CD.Sy is
>>> when StrCon =>
      X.TYP := String_Literals;
      Emit_1 (CD, k_Push_Discrete_Literal, Operand_2_Type (CD.SLeng));  --  String Literal Length
      Emit_1 (CD, k_Push_Discrete_Literal, Operand_2_Type (CD.INum));   --  Index To String IdTab
      InSymbol (CD);
    when IDent =>
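
To give an idea of what "parsing the literal in the special mode" involves, here is a minimal sketch in Python (not HAC code) that splits a `$(...)` template into literal and expression pieces. The function name and the piece tags are invented for the example; escape sequences and string literals nested inside expressions are deliberately not handled.

```python
def split_template(text: str):
    """Split a '$( ... )' template into ('lit', s) and ('expr', s) pieces.

    Parentheses inside an expression are balanced by counting.
    """
    pieces = []
    literal = []
    i = 0
    while i < len(text):
        if text.startswith("$(", i):
            # Flush the literal text collected so far.
            if literal:
                pieces.append(("lit", "".join(literal)))
                literal = []
            # Scan to the parenthesis that closes this placeholder.
            depth = 1
            j = i + 2
            while j < len(text) and depth > 0:
                if text[j] == "(":
                    depth += 1
                elif text[j] == ")":
                    depth -= 1
                j += 1
            if depth != 0:
                raise ValueError("unterminated $( in template")
            pieces.append(("expr", text[i + 2 : j - 1]))
            i = j
        else:
            literal.append(text[i])
            i += 1
    if literal:
        pieces.append(("lit", "".join(literal)))
    return pieces


for piece in split_template("$(Name) is a $(Img (Obj.Index))"):
    print(piece)
```

In a compiler, each `expr` piece would then be handed back to the expression parser, and the pieces would be emitted as a concatenation.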

dsibai · 2021-09-29T08:03:34Z

As discussed with Yannick/Tucker, the 'double $' syntax is extremely difficult to read.

I'd rather either have a letter indicating a formatting string such as F :
F"This is $X and $(X+Y)"

or a whole word :
Format"This is $X and $(X+Y)"

The main problem with the word approach is that it looks like a mistyped function call, so a special, compiler-supported function such as
Format("This is $X and $(X+Y)")
might be more readable.

I'll also add that bash/python/rust/etc. reflexes make me want to write:
F"This is ${X} and ${X+Y}" or Format("This is ${X} and ${X+Y}")

yannickmoy · 2021-09-29T08:21:22Z

As discussed with Yannick/Tucker, the 'double $' syntax is extremely difficult to read.

agreed

I'd rather either have a letter indicating a formatting string such as F : F"This is $X and $(X+Y)"

let's see what others think?

or a whole word : Format"This is $X and $(X+Y)"

I also like that, as it is clearer that this starts a format string.

The main problem with the word approach is that it looks like a mistyped function call, so a special, compiler-supported function such as Format("This is $X and $(X+Y)") might be more readable.

That would conflict with a user function called Format taking a string and returning a string, so I don't think we can do this.

I'll also add that bash/python/rust/etc. reflexes make me want to write: F"This is ${X} and ${X+Y}" or Format("This is ${X} and ${X+Y}")

I'd rather avoid useless curly braces, unless there is both an advantage in readability and uniformity.

glacambre · 2021-09-29T08:48:05Z

I'd rather either have a letter indicating a formatting string such as F : F"This is $X and $(X+Y)"

I'm not a fan of format specifiers. What do you think of using backticks? e.g.

Put_Line (`X = $(X)`);

This way interpolated strings are as light as regular strings in terms of syntax but still distinguishable from regular strings.

raph-amiard · 2021-09-29T09:45:14Z

I'm not a fan of backticks because they introduce a new token in the lexer, and probably new escaping rules in string. I prefer prefixed strings simply because they're simpler to implement, and also pretty familiar to people because of Python. They're also a more generic syntax extension, that you might be able to reuse if you need another type of literals someday.

kevlar700 · 2021-09-29T09:47:15Z

I can't say that I have analysed this discussion in great detail but I shall throw in a couple of cents from my experience with Go and Dart.

Dart has string interpolation and Go does not.

Backticks allow multi-line strings in Go; perhaps a multi-line feature could allow the nicer interpolation that Dart has with longer names?

Go has a Printf, but Print is what is generally used: it is type aware, so you can do this, though I often miss spaces out.

fmt.Print("Hello", worldVar, "oops worldVar has no leading space")

To be clear, Go's multi-line strings are not compatible with formatting, but are used for constants, etc.

However, you can do multi-line strings with formatting, so long as the last character of the line is + (& for Ada). I switched it out for a string builder for my SQL statements anyway, which, strangely, seems nicer as well as being more efficient (when pre-buffered).

Ada has terminating semicolons that might help?

dsibai · 2021-09-29T13:50:07Z

I found this interesting comment by Claire Dross:
#26 (comment)
She suggests having an attribute 'Format.

So we could have an interpolation function as an attribute:
String'Format("$(X+Y) = $(X) + $(Y)")

We'd have a readable word
It would look like a function and wouldn't look mistyped like Format"..."
More importantly, the function would be allowed to be magical, since it's an attribute
There couldn't be any confusion with a regular Format(String)-> String function.

mhatzl · 2021-09-29T15:40:13Z

So we could have an interpolation function as an attribute: String'Format("$(X+Y) = $(X) + $(Y)")

I like this syntax mentioned by @dsibai

As an addition, the formatting options mentioned by @sttaft could optionally be used like String'Format("$(X+Y, "3.4E2") = $(X) + $(Y)")

Seems like an interesting approach. It is somewhat redundant to specify
the type (with the "f") in this case, and using letters for specifying the
radix seems a bit odd given that Ada currently never does that, but some
kind of Ada-oriented pattern such as "3.4E2" for specifying the format for
a floating-point number, and "16#5#" for specifying a hex integer might be
pretty intuitive. "*" might be used for cases where the programmer doesn't
care, such as "16#*#" for hex, but no specified number of digits. We
probably still need to think about whether we want to pass this to the
'Image function, or to do it all with postprocessing (which is clearly
harder for some of these patterns).

Having the option to set formatting options by passing a variable instead of a literal might also be useful.
Like String'Format("$(X+Y, myFormat) = $(X) + $(Y)"). However, I am not sure how wrong formatting options should be handled in this case.

pyjarrett · 2021-09-30T02:37:50Z

A major strength of Ada is the ability to intuit what code is doing thanks to minimal symbology, and the fact that many of the things you come across can immediately be thrown into a search engine with good results. Adding a dollar sign introduces additional symbology while reducing readability and complicating searches for this syntax. From one of these, I also can't tell the type of the formatted string. In this way, I don't think it agrees with the Ada principles of readability or strong typing.

Both Rust and C++ allow a format function, with contained elements in braces, followed by a variable number of parameters in the call. Indexing parameters by integers tends to be very error prone.

In usual fashion, I would want to describe exactly what I'm doing, with double braces being a literal brace:

Put_Line (String'Format ("{X + Y} = {Y} + {Y}"));

For a multi-line string, it's formatting multiple lines, it would be nice to just have String'Format_Multiline, but there probably does need to be a new type of string literal to cross lines. Python uses triple quotes. The $" ... " syntax might be useful here.

Format specifiers are convenient and terse, but they can be unintuitive and painful to remember correctly. Combining this with named keys, like in Rust or Python f-strings, makes this read like what it does without any googling:

-- Using "when" to indicate named parameters with attached aspect-like
Put_Line (String'Format("{Sum} = {X} + {Y}",
    when Sum => X + Y with Justify => Right, Fill => '0', Width => 5, Precision => 3,
    when X => X with Justify => Right, Fill => ' ', Width => 4,
    when Y => Y with Justify => Left,
    when others => <> -- use interpolation for anything left over, only needed if any renames exist
   ));

-- Renames to indicate named parameters with attached aspect-like
-- Parameters with aggregate-like to remove ambiguities due to commas
Put_Line (String'Format("{Sum} = {X} + {Y}",
    renames
    Sum => X + Y with (Justify => Right, Fill => '0', Width => 5, Precision => 3),
    X => X with (Justify => Right, Fill => ' ', Width => 4),
    Y => Y with (Justify => Left),
    others => <> -- use interpolation for anything left over, only needed if any renames exist
   ));

Obviously neither is a final syntax, just an idea.
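
For readers who want to see the terse counterpart of the two sketches above, this is roughly what the same intent looks like with Python's `str.format`, using named keys plus format specifiers. The values and the key name `total` are chosen for the example; the width of 8 (rather than 5) is used only so that the zero fill is visible.

```python
x = 1.5
y = 2.25

line = "{total:0>8.3f} = {x:>4} + {y:<4}".format(
    total=x + y,  # right-justified, filled with '0', width 8, precision 3
    x=x,          # right-justified, filled with ' ', width 4
    y=y,          # left-justified, width 4
)

print(line)
```

The aspect-like spelling proposed above trades this compactness for names (`Justify`, `Fill`, `Width`, `Precision`) that can be read without consulting a specifier table.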

pyjarrett · 2021-10-07T21:19:00Z

I don't understand where {"..."} comes from as a syntax; I've used quite a few languages and don't ever remember coming across something like that before. I understand they probably want a new token to simplify the language tooling, but it would preclude an in-line dictionary (map) syntax later on. Prefixing strings isn't unheard of for special string types: f-strings in Python, L"..." for wide strings in C++, R"( for raw strings in C++, Rust raw string literals, etc.

onox · 2021-10-09T12:42:04Z

Does {"..."} provide some safety benefit over F"..." or is it just a preference? I think F"..." is preferred because it is similar to how other languages do it as @pyjarrett said. (New) Ada users who would see the syntax for the first time can probably guess what it does if they have experience with one of these languages.

yannickmoy · 2021-10-09T18:29:42Z

Does {"..."} provide some safety benefit over F"..." or is it just a preference?

It's just a syntax discussion, no change in semantics.

I think F"..." is preferred because it is similar to how other languages do it as @pyjarrett said. (New) Ada users who would see the syntax for the first time can probably guess what it does if they have experience with one of these languages.

That's also my preference. Others prefer {"..."} because Ada does not otherwise use letters as syntactic elements, but I think that's the same for other languages using these F-strings.

mgrojo · 2021-10-09T19:25:44Z

The first time I saw those prefixes in other languages, they seemed like odd syntax to me. Now they have a little advantage of familiarity, but forgetting about that, I like the symmetry of {"..."}. Ideally, they should be another pair of quote symbols, but there are no more in ASCII. Some languages use the `grave accents` for some string constructs, but I suppose they have been disregarded because they are difficult to input in some local keyboards, and they are not actually quotes.

We have adjusted the discussion so that {" ... "} and F" ... " are on a more even footing, as there seems to be significant interest in both syntaxes.

Use code-block formatting for the multi-line string literal example.

yannickmoy · 2021-10-14T11:04:20Z

I agree with @reznikmm that the current proposal for using \ as escaping character, in particular for newline as \n is not convincing at all. I'd rather use only {{ and }} as additional escaping sequence, like in other languages, in addition to the existing "" sequence in Ada for a double quote " inside a string.

I agree with @pyjarrett that the use of a different syntax {"..."} than all other mainstream programming languages is an issue if we want to reach out to people outside of the existing small Ada user community, especially as multiple of these languages have adopted the same syntax F"...". Do we really want Ada to be an outlier again?

I disagree with the current proposal to trim space characters in front of a digit during replacement, I think this should be reserved for the insertion of a single integer value, so in effect using a special version of the predefined T'Image in that case that does not have this initial space problem, rather than forcing that solution on all T'Image of all types including non-integer ones.

sttaft · 2021-10-14T12:20:46Z

Yannick Moy wrote:

I disagree with the current proposal to trim space characters in front of a digit during replacement, I think this should be reserved for the insertion of a single integer value, so in effect using a special version of the predefined T'Image ...

It is likely that 'Image for a private type that is used to represent a numeric type (e.g. Ada 2022's big numbers) will also include a space in front of the first digit, to be consistent with Ada's builtin numeric types. So it doesn't make sense to omit the space when interpolating integer'Image and float'image, but not omit the space when interpolating big_integer'image. So it is safer to base the removal of the leading space on the string representation of the 'Image, rather than the type of the value. Furthermore, by default a private type uses the 'Image of its full type, so for any private type that happens to be implemented using a numeric type (e.g. something like GNAT's node-ids), you are breaking privacy if you say the leading space will be stripped if it happens to be implemented by a numeric type, but not otherwise.
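
A small model may make the distinction concrete. The Python sketch below implements the text-based rule ("drop a leading space when the next character is a digit") without looking at the type at all; the sample images, including the big-number and record-like ones, are hypothetical.

```python
def strip_by_text(image: str) -> str:
    """Drop one leading space when the next character is a digit."""
    if len(image) >= 2 and image[0] == " " and image[1].isdigit():
        return image[1:]
    return image


samples = [
    " 42",                    # image of a built-in integer
    "-42",                    # negative values have no leading space
    " 12345678901234567890",  # hypothetical image of a big number
    " (1, 2)",                # hypothetical image that merely starts with a space
]

for image in samples:
    print(repr(image), "->", repr(strip_by_text(image)))
```

Because the rule inspects only the string, a private type gets the same treatment whether or not its full type happens to be numeric.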

sttaft · 2021-10-14T12:30:09Z

By the way, it seems a bit inconsistent to avoid use of "\" even though it is widely used in other languages, and then say that we should not use {" ... "} because it is not used widely. I believe we should focus on readability and usability for Ada, and make choices that are sensitive to both Ada's tradition (e.g. mirrored syntaxes for bracketing syntax) and to conventions adopted widely (e.g. "\"), with readability being a very high priority.

One problem with doubling is that you always have to know exactly which characters are and are not to be doubled, whereas with "\" you can presumably use it with characters that you are not sure about. So for example, in these new strings, would you have to double "}"? It is not really necessary. And what about single apostrophe (')?

yannickmoy · 2021-10-14T12:30:20Z

If the only concern is to deal with the standard Ada 2022 big numbers, we can include them in the special case. And we can adapt the RM wording to deal with the privacy issue, I don't see it being an issue. The surprising removal of initial space for arbitrary T'Image looks like a bad idea, which is sure to surprise users.

yannickmoy · 2021-10-14T12:32:55Z

Regarding doubling, other languages are happy with doubling { and } to escape them, why do differently? Regarding the use of \, precisely it is not used for the newer format strings in these other languages, and @reznikmm did a good job at pointing at all the problems with it.

raph-amiard · 2021-10-14T12:41:33Z

Regarding the use of \, precisely it is not used for the newer format strings in these other languages, and @reznikmm did a good job at pointing at all the problems with it.

That's incorrect: most languages have escape sequences (Python, Rust, Swift, C/C++, JavaScript). In fact it's harder to find a language that doesn't have them. I disagree completely with @reznikmm's justification for not adding them. Newlines are hard, yes. So let's not let the user solve that problem ...
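
For instance, a Python f-string accepts doubled braces and backslash escapes side by side (shown only as an illustration of existing practice in one of the languages listed):

```python
x = 5

# {{ and }} produce literal braces; \n is the usual newline escape.
line = f"{{x}} = {x}\n"

print(repr(line))
```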

I agree with @pyjarrett that the use of a different syntax {"..."} than all other mainstream programming languages is an issue if we want to reach out to people outside of the existing small Ada user community, especially as multiple of these languages have adopted the same syntax F"...". Do we really want Ada to be an outlier again?

I agree with this. I'd rather use F"", not because I vastly prefer it to {"..."}, but just because it will be more familiar for people coming from other languages, and statistical familiarity is almost the only objective thing you can base yourself on with syntax.

sttaft · 2021-10-14T12:49:56Z

My own preference is to strip both leading and trailing whitespace, as it seems the simplest and most uniform rule, allowing the implementor of 'Image to use extra whitespace for standalone situations where some kind of alignment might be appropriate.

I think it might be a bigger surprise for a user if they write:

 {"The input is {X} and the result is {F(X)}."}

and they end up with extra white space before or after the interpolation, when clearly the user did not expect that since they allowed for exactly one space on either side.

One simple way, if you really want to see all of the whitespace that 'Image produces, might be to write:

{"The image of the input is {X'Image} and the image of the result is {F(X)'Image}"}

But normally I would say you are more interested in seeing the value of X or F(X), not its verbatim image which might have additional whitespace to provide some kind of formatting in a standalone usage.

The other option, as we have discussed, would be to allow for a second parameter inside the { ... } to provide control over how the 'Image is transformed during interpolation:

{"The image of the input is '{X, Verbatim}' and the image of the result is '{F(X), Verbatim}'"}

yannickmoy · 2021-10-14T13:04:34Z

That's incorrect, most languages have escape sequences (Python, Rust, Swift, C/C++, Javascript). In fact it's harder to find a language that doesn't have them. I disagree completely with @reznikmm justification for not adding them. Newlines are hard yes. So let's not let the user solve that problem ...

Thanks for correcting, indeed it's not mentioned explicitly in most description pages I've seen (like in the Rust online doc https://doc.rust-lang.org/std/fmt/index.html) but indeed \n is interpreted as newline. But what's the purpose of allowing a large number of escape sequences, if this is only meant for newline? In particular, why adopt this \{ and \} which are not used elsewhere? (but maybe you'll correct me on that too!)

If newline is to be supported, the RFC should describe how it is handled on various platforms (Unix/Windows/other). Plus if that's only to support this case, we could have a different syntax, e.g. use concatenation between F-strings to indicate newline: F"hello" & F"world" instead of F"hello\nworld".

Fabien-Chouteau · 2021-10-14T14:42:57Z

My own preference is to strip both leading and trailing whitespace,

I am worried about the run-time penalty for this.

onox · 2021-10-16T07:57:48Z

e.g. use concatenation between F-strings to indicate newline: F"hello" & F"world" instead of F"hello\nworld".

I think it would be surprising to the user if implicit newlines appear in the output because of concatenation. Also how would you avoid an undesired newline if you need to split some F-strings into multiple parts to avoid long lines (> 79 chars) because of style checks (-gnatyM)?
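
Python is one data point here: adjacent string literals are joined with nothing in between, so a long f-string can be split across source lines without changing the output. A small sketch (the variable name is arbitrary):

```python
name = "world"

# Adjacent literals are concatenated at compile time; no newline is inserted.
message = (
    f"hello, {name}; "
    f"this literal is split across source lines "
    "but contains no newline"
)

print(message)
```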

pyjarrett · 2021-10-18T11:45:46Z

it seems a bit inconsistent to avoid use of "\" even though it is widely used in other languages, and then say that we should not use {" ... "} because it is not used widely.

My argument against escape characters is based on it breaking conceptual integrity within the language of the behavior within strings, not on popularity. As a C++ programmer who came to Ada recently, this behavior confused me, but it would be confusing for it to work for one type of strings, but not another one. On the other hand, there are languages which differ in this behavior for raw and plain strings, so perhaps it's fine to do anyways?

My argument against {"..."} is due to never recalling this construct in the dozen or so languages I've worked in. If the argument for escape characters is popularity, it seems odd to purposely include a syntax which has no precedent in any mainstream language. I also feel like braces should be reserved for as long as possible, since some of the clunkiness of Ada syntax I've dealt with reminds me of Objective-C before they added the @[a, b, ...] and @{ ...} syntax for NSArray and NSDictionary literals respectively. The only languages I can think of which might even have a valid {" "} syntax would be as an implicit function return in Rust, fn foo() -> &'static str { "foo" }, or as a braced initializer in C++: StringType foo { "foo" };.

There's merit to wanting a balanced delimiter, but " itself isn't a balanced element in the manner of () or {} or <>. With all of these considerations from the previous discussion, the original recommendation of $"..." might actually be best: it is a formatted string with a non-lettered prefix, albeit with another non-standard syntax, and it preserves braces for future use.

Incorporate the "language-design circle" decision about using {" ... "} syntax. Move things around a bit so alternatives are discussed in the appropriate sections.

We updated the RFC to incorporate decisions on trimming the leading space for "numeric" types where types with Integer_Literal or Real_Literal aspects are considered numeric. We also included the multi-line string literal, and confirmed the decision to use backslash as the escape mechanism.
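For readers unfamiliar with why trimming matters: in standard Ada, 'Image of a non-negative integer starts with a space, so naive concatenation produces an unwanted gap. The following is only a sketch of that behavior using plain Integer; Trim_Leading_Space is a hypothetical helper, not something defined by the RFC, and it does not cover types with Integer_Literal or Real_Literal aspects.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Trim_Demo is

   --  Hypothetical helper (not from the RFC): drop the single leading
   --  space that 'Image yields for non-negative numeric values.
   function Trim_Leading_Space (Image : String) return String is
   begin
      if Image'Length > 0 and then Image (Image'First) = ' ' then
         return Image (Image'First + 1 .. Image'Last);
      else
         return Image;
      end if;
   end Trim_Leading_Space;

   Count  : constant Integer := 42;
   Offset : constant Integer := -7;

begin
   --  Untrimmed: the leading space from 'Image is kept.
   Put_Line ("[" & Integer'Image (Count) & "]");                       --  [ 42]

   --  Trimmed: what an interpolated numeric value would look like.
   Put_Line ("[" & Trim_Leading_Space (Integer'Image (Count)) & "]");  --  [42]

   --  Negative values have no leading space, so nothing is removed.
   Put_Line ("[" & Trim_Leading_Space (Integer'Image (Offset)) & "]"); --  [-7]
end Trim_Demo;
```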

yannickmoy · 2022-01-04T16:30:33Z

Hi Tuck, currently it reads "trimmed of a leading space if the second character is a digit", I assume that's just a leftover from the previous version?

Also, you allowed the use of "" for double-quote character in format strings, and we discussed that it was also a leftover.

Add a Reference-Level section that clarifies the static and dynamic semantics of the feature. Clarify that doubled characters are not treated as anything special, and in particular, that only backslash is used for escaping.

yannickmoy · 2022-01-05T08:10:23Z

@sttaft you mention in the first paragraph the use of backslash to enter unicode characters, is that intentional? It's not described in the rest of the RFC, and we did not discuss it.

yannickmoy · 2022-01-05T08:16:18Z

You also mention \n as the character used for newline in multi-line strings, but shouldn't it depend on the platform, like the characters inserted by New_Line, so that it would be \r\n on Windows?
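To make the platform question concrete, here is a rough sketch in standard Ada of expanding \n into a caller-chosen newline sequence. Expand_Newlines, Unix_Newline and Windows_Newline are names invented for this illustration; how the RFC's implementation would actually pick the sequence is not specified here.

```ada
with Ada.Characters.Latin_1;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Newline_Demo is

   package L1 renames Ada.Characters.Latin_1;

   --  Two candidate newline representations (illustrative names).
   Unix_Newline    : constant String := (1 => L1.LF);
   Windows_Newline : constant String := L1.CR & L1.LF;

   --  Hypothetical helper: replace each two-character sequence \n by
   --  Newline. Any other backslash pair is copied through unchanged.
   function Expand_Newlines
     (Template : String; Newline : String) return String
   is
      Result : Unbounded_String;
      I      : Integer := Template'First;
   begin
      while I <= Template'Last loop
         if Template (I) = '\' and then I < Template'Last then
            if Template (I + 1) = 'n' then
               Append (Result, Newline);
            else
               Append (Result, Template (I .. I + 1));
            end if;
            I := I + 2;
         else
            Append (Result, Template (I));
            I := I + 1;
         end if;
      end loop;
      return To_String (Result);
   end Expand_Newlines;

   Text : constant String := "hello\nworld";  --  12 characters in Ada

   Unix_Text    : constant String := Expand_Newlines (Text, Unix_Newline);
   Windows_Text : constant String := Expand_Newlines (Text, Windows_Newline);

begin
   --  The same template yields results of different lengths.
   Ada.Text_IO.Put_Line ("LF length:   " & Integer'Image (Unix_Text'Length));
   Ada.Text_IO.Put_Line ("CRLF length: " & Integer'Image (Windows_Text'Length));
end Newline_Demo;
```

With the LF sequence the result is 11 characters long, with CR LF it is 12, which is the sense in which a newline may be represented by a sequence of characters rather than a single one.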

yannickmoy · 2022-01-05T08:19:19Z

small fix: the rule for interpolated_string_literal should have double-quotes at start and end, not only curly braces.

yannickmoy · 2022-01-05T08:24:49Z

You say: "An escaped_character represents the given graphic_character" which seems to indicate that \k would be interpreted as just the character k. Shouldn't we ask for an error to be issued in such a case, and have explicitly a rule that \\ and \" denote respectively the backslash and double quote?
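The kind of check being asked for could look roughly like the sketch below, written in standard Ada. It assumes, purely for illustration, that the only characters allowed after a backslash are backslash, double quote and n; the RFC's actual list may differ, and First_Illegal_Escape is a hypothetical name.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Escape_Check_Demo is

   --  Assumed set of characters allowed after a backslash.
   --  This list is an illustration, not the RFC's definition.
   function Is_Allowed_Escape (C : Character) return Boolean is
     (C in '\' | '"' | 'n');

   --  Return the index of the first backslash that starts an illegal
   --  escape (including a trailing lone backslash), or 0 if all are legal.
   function First_Illegal_Escape (Content : String) return Natural is
      I : Integer := Content'First;
   begin
      while I <= Content'Last loop
         if Content (I) = '\' then
            if I = Content'Last
              or else not Is_Allowed_Escape (Content (I + 1))
            then
               return I;
            end if;
            I := I + 2;  --  skip the backslash and the escaped character
         else
            I := I + 1;
         end if;
      end loop;
      return 0;
   end First_Illegal_Escape;

begin
   --  Backslash is an ordinary character in a regular Ada string literal,
   --  so these literals contain the backslashes exactly as written.
   Put_Line (Natural'Image (First_Illegal_Escape ("a\\b\nc")));  --  returns 0
   Put_Line (Natural'Image (First_Illegal_Escape ("oops \k")));  --  returns 6
end Escape_Check_Demo;
```

A compiler would report an error at the returned position instead of silently treating \k as the character k.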

yakobowski · 2022-01-05T08:51:22Z

considered/rfc-string-interpolation.rst

+the value of the variable or the expression is "interpolated" directly
+into the value of the enclosing string upon use at run-time.  In addition,
+an escape character ('\\') is provided for inserting certain standard control
+characters (such as newline) or unicode characters within


This is the unicode reference @yannickmoy is talking about.

I moved this discussion to "future possibilities".

Added a legality rule restricting what characters may follow backslash. Moved the discussion about unicode to future possibilities. Indicated that newline might be represented by a sequence of characters.

sttaft added 5 commits June 1, 2021 10:52

Create rfc-string-interpolation

2a20ab8

First draft of RFC

Rename rfc-string-interpolation to rfc-string-interpolation.rst

b472fcc

Update rfc-string-interpolation.rst

6763fe6

Modest formatting fixes

Update rfc-string-interpolation.rst

c4b83a4

Use code block formatting.

Update rfc-string-interpolation.rst

bfe06ab

More work on formatting.

Update rfc-string-interpolation.rst

7559ac5

Fix a number of typographical errors.

sttaft changed the title ~~Topic/rfc string interpolation~~ [RFC] Topic/rfc string interpolation Jun 1, 2021

mgrojo reviewed Jun 19, 2021

sttaft added 2 commits June 19, 2021 12:20

Fix formatting of backslash, and mention trimming

aaa69b9

Escape the escape character, and mention that leading and trailing whitespace is trimmed when interpolating the result of 'Image.

Another escape for backslash

0f1c4ab

Fix the other place we had an un-escaped escape character.

sttaft added 2 commits July 6, 2021 10:35

Update rfc-string-interpolation.rst

b38bb61

Some minor clarifications. Fix the sentence fragment in the Drawbacks section, and include a concern mentioned by Fabien.

Update rfc-string-interpolation.rst

c71907a

Two more places needing doubled backslashes.

sttaft added 2 commits October 13, 2021 22:51

Make {" ... "} a full-fledged option.

a07c4c0

We have adjusted the discussion so that {" ... "} and F" ... " are on a more even footing, as there seems to be significant interest in both syntaxes.

Fix up the multi-line literal example

abc461e

Use code-block formatting for the multi-line string literal example.

sttaft added 2 commits November 2, 2021 15:41

Incorporate decision about use of {" ... "}

42c2056

Incorporate the "language-design circle" decision about using {" ... "} syntax. Move things around a bit so alternatives are discussed in the appropriate sections.

Update to include detailed semantics

1ffcfc4

Add a Reference-Level section that clarifies the static and dynamic semantics of the feature. Clarify that doubled characters are not treated as anything special, and in particular, that only backslash is used for escaping.

yakobowski changed the title ~~[RFC] Topic/rfc string interpolation~~ [RFC] String Interpolation Jan 5, 2022

yakobowski reviewed Jan 5, 2022

Additional updates based on comments

6baed8c

Added a legality rule restricting what characters may follow backslash. Moved the discussion about unicode to future possibilities. Indicated that newline might be represented by a sequence of characters.

raph-amiard approved these changes Nov 23, 2022

raph-amiard merged commit 67aadc0 into AdaCore:master Nov 23, 2022
