[RFC] String Interpolation #77

Merged
merged 17 commits into from Nov 23, 2022

Conversation

sttaft
Contributor

@sttaft sttaft commented Jun 1, 2021

This is the first draft of an RFC for the string interpolation feature.

@glacambre

@sttaft
Contributor Author

sttaft commented Jun 1, 2021

Rendered version also here (click on "Files changed" at the top, and then "View File" on the "..." menu on the right):
https://github.com/AdaCore/ada-spark-rfcs/blob/7559ac5bd340373b344c4b92d755aadc682f47cb/considered/rfc-string-interpolation.rst

Note that I just fixed a few typos, so this rendered version is somewhat different than what was linked to originally.

Fix a number of typographical errors.
@sttaft sttaft changed the title Topic/rfc string interpolation [RFC] Topic/rfc string interpolation Jun 1, 2021
@Fabien-Chouteau
Member

Thanks @sttaft ,

My two cents on this proposal.

1. Referencing variable names

I am not a huge fan of referencing variable names in the template. I think especially with Ada and its most common programming style, names tend to be very long and will result in a template string that is not readable.
If I take an example from the original issue:

Put_Line ($"(Name) is a (Profession)"$)

I think it's going to be pretty rare to have such a clean template. In practice you will have names that are part of a record, or something else more complex. So in my opinion we will most likely see patterns that look like this:

Put_Line ($"$(My_Object.Name) is a $(Professions_Img_Array (My_Object.Profession_Index))"$);

Of course there will be the options to declare renames for strings:

declare
   Name : String renames My_Object.Name;
   Profession : String renames Professions_Img_Array (My_Object.Profession_Index);
begin
   Put_Line ($"$(Name) is a $(Profession)"$);
end;

But in the end I don't think this brings as much readability improvement as we could get.
Of course I would like to have other opinions on this.

I see two other ways around this, and I am introducing a placeholder "syntax" $" "$ (...) just for the examples:

  • a. Positional:

    Put_Line ($"$(0) is $(1)"'$ --  This is a lot of `$` I must say
              (My_Object.Name, 
               Professions_Img_Array (My_Object.Profession_Index)));
  • b. Another variation on naming (I saw this in Python at least)

    Put_Line ($"$(Name) is $(Profession)"'$
              (Name       => My_Object.Name, 
               Profession => Professions_Img_Array (My_Object.Profession_Index)));

2. Expression in templates:

Put_Line ($"Plop is $(X + Y)"$);

In the same vein as my comment above, I think this is likely to generate templates that will be hard to read. I am not an implementer, but I am guessing that this will introduce complexity that could be avoided.

3. Formatting options

In the RFC $(X+Y, Width => 13) is given as an example. I am personally a fan of printf's %2.3f, %5x, %04d, etc.
If we can make something as compact as that it would be great.

Looking in Python's direction, it could be something like:

Put_Line($"Plop is $(0.2f:Angle)"$)

4. Embedded

As most of you know, my focus is on embedded, and in particular bare-metal and small devices. So I am always worried about how a given feature can be implemented and used in this kind of situation. I don't know how compatible with embedded the current proposal is, but I just want to state that it is important to keep that in mind.

@sttaft
Contributor Author

sttaft commented Jun 16, 2021 via email

@mgrojo

mgrojo commented Jun 19, 2021

I hope editors start to highlight the interpolated variables differently from the characters around them; otherwise this would be less readable instead of more.

I suppose escape characters (i.e. "\t\n") will not work in traditional string literals, for compatibility. But then people new to the language will start to complain about being unable to write "\tHello, World!\n". Is the introduction of this C-ism really needed? Wouldn't using ASCII entities be more respectful of current Ada: use ASCII; $"$(HT)Hello, World!$(LF)"$?

This is syntactic sugar, so there is no magic here. It is going to be equivalent to a string concatenation and calls on 'Image, and the programmer will need to keep that in mind.

Does that mean that interpolated positive integers will have a leading space? I would understand it for the sake of consistency, but that would shock many.
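
For illustration, the leading space in question can be seen in current Ada with a minimal sketch. It assumes, as stated above, that interpolation expands to concatenation plus calls on 'Image; the procedure name Image_Space and the variable Count are hypothetical, and the explicit Trim call only stands in for whatever trimming rule the RFC ends up specifying.

```ada
with Ada.Text_IO;       use Ada.Text_IO;
with Ada.Strings;       use Ada.Strings;
with Ada.Strings.Fixed; use Ada.Strings.Fixed;

procedure Image_Space is
   Count : constant Integer := 42;  --  hypothetical example value
begin
   --  'Image of a non-negative integer starts with a space (the
   --  position a minus sign would occupy), so no space is written
   --  after "is" here and the output is still "Count is 42."
   Put_Line ("Count is" & Integer'Image (Count) & ".");

   --  Trimming the image explicitly lets the surrounding text carry
   --  the spacing, which is what a reader of a template would expect.
   Put_Line ("Count is " & Trim (Integer'Image (Count), Both) & ".");
end Image_Space;
```

Both calls print the same line; the difference is only in where the space comes from, which is the source of the surprise raised above.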

@sttaft
Contributor Author

sttaft commented Jun 19, 2021 via email

of string literal syntaxes, chosen by a prefix letter, but our sense is that
the string interpolation syntax has emerged as the favorite. We do not want
to have lots of different syntaxes, so we have included the escape mechanism
as part of both of the new string literal syntaxes. We have chosen '\' as the

This is rendered as ''. It probably needs to be quoted using backticks, like this: `\`

Escape the escape character, and mention that leading and trailing whitespace is trimmed when interpolating the result of 'Image.
Fix the other place we had an un-escaped escape character.
@sttaft
Contributor Author

sttaft commented Jun 19, 2021 via email

Some minor clarifications.  Fix the sentence fragment in the Drawbacks section, and include a concern mentioned by Fabien.
Two more places needing doubled backslashes.
@raph-amiard
Member

  • Overall, looking really good!
  • ‘Image doesn’t help when you want to interpolate say an unbounded string into a string, because it’ll typically have quotes around it, but you don’t want that. How do we accommodate this common use case?
  • One drawback about multiline strings is that they’ll be the first multi-line token in the Ada language. Single line tokens only makes it very easy to tokenize/highlight Ada so far.
  • Not a fan of the terminating $

@Fabien-Chouteau
Member

@zertovitch I was just thinking about this, maybe it would be interesting to prototype this string interpolation in HAC.

@zertovitch

zertovitch commented Aug 15, 2021 via email

@dsibai

dsibai commented Sep 29, 2021

As discussed with Yannick/Tucker, the 'double $' syntax is extremely difficult to read.

I'd rather either have a letter indicating a formatting string such as F :
F"This is $X and $(X+Y)"

or a whole word :
Format"This is $X and $(X+Y)"

The main problem with the word approach is that it looks like a mistyped function call, so a special, compiler-supported function such as
Format("This is $X and $(X+Y)")
might be more readable.

I'll also add that bash/python/rust/etc reflexes make me want to write:
F"This is ${X} and ${X+Y}" or Format("This is ${X} and ${X+Y}")

@yannickmoy
Collaborator

As discussed with Yannick/Tucker, the 'double $' syntax is extremely difficult to read.

agreed

I'd rather either have a letter indicating a formatting string such as F : F"This is $X and $(X+Y)"

let's see what others think?

or a whole word : Format"This is $X and $(X+Y)"

I also like that, as it is clearer that this starts a format string.

The main problem with the word approach is that it looks like a mistyped function call, so a special, compiler-supported function such as Format("This is $X and $(X+Y)") might be more readable.

That would conflict with a user function called Format taking a string and returning a string, so I don't think we can do this.

I'll also add that bash/python/rust/etc reflexes make me want to write: F"This is ${X} and ${X+Y}" or Format("This is ${X} and ${X+Y}")

I'd rather avoid useless curly braces, unless there is both an advantage in readability and uniformity.

@glacambre

I'd rather either have a letter indicating a formatting string such as F : F"This is $X and $(X+Y)"

I'm not a fan of format specifiers. What do you think of using backticks? e.g.

Put_Line (`X = $(X)`);

This way interpolated strings are as light as regular strings in terms of syntax but still distinguishable from regular strings.

@raph-amiard
Member

I'm not a fan of backticks because they introduce a new token in the lexer, and probably new escaping rules in string. I prefer prefixed strings simply because they're simpler to implement, and also pretty familiar to people because of Python. They're also a more generic syntax extension, that you might be able to reuse if you need another type of literals someday.

@kevlar700

kevlar700 commented Sep 29, 2021

I can't say that I have analysed this discussion in great detail but I shall throw in a couple of cents from my experience with Go and Dart.

Dart has string interpolation and Go does not.

Backticks allow multi line strings in Go, perhaps a multiline feature, could allow the nicer interpolation that Dart has with longer names?

Go has a Printf, but Print is more generally used. Print is type-aware, so you can do this, though I often leave spaces out.

fmt.Print("Hello", worldVar, "oops worldVar has no leading space")

To be clear, Go's multiline strings are not compatible with formatting but are used for constants etc.

However, you can do multiline strings with formatting, so long as the last character of the line is + (& for Ada). I switched to a string builder for my SQL statements anyway, which, strangely, seems nicer as well as being more efficient (when pre-buffered).

Ada has terminating semi colons that might help?

@dsibai

dsibai commented Sep 29, 2021

I found this interesting comment by Claire Dross:
#26 (comment)
She suggests having an attribute 'Format.

So we could have an interpolation function as an attribute:
String'Format("$(X+Y) = $(X) + $(Y)")

  • We'd have a readable word
  • It would look like a function and wouldn't look mistyped like Format"..."
  • More importantly, the function would be allowed to be magical, since it's an attribute
  • There couldn't be any confusion with a regular Format(String)-> String function.

@mhatzl

mhatzl commented Sep 29, 2021

So we could have an interpolation function as an attribute: String'Format("$(X+Y) = $(X) + $(Y)")

I like this syntax mentioned by @dsibai

As an addition, the formatting options mentioned by @sttaft could optionally be used like String'Format("$(X+Y, "3.4E2") = $(X) + $(Y)")

Seems like an interesting approach. It is somewhat redundant to specify
the type (with the "f") in this case, and using letters for specifying the
radix seems a bit odd given that Ada currently never does that, but some
kind of Ada-oriented pattern such as "3.4E2" for specifying the format for
a floating-point number, and "16#5#" for specifying a hex integer might be
pretty intuitive. "" might be used for cases where the programmer doesn't
care, such as "16##" for hex, but no specified number of digits. We
probably still need to think about whether we want to pass this to the
'Image function, or to do it all with postprocessing (which is clearly
harder for some of these patterns).

Having the option to set formatting options by passing a variable instead of a literal might also be useful.
Like String'Format("$(X+Y, myFormat) = $(X) + $(Y)"). However, I am not sure how wrong formatting options should be handled in this case.

@pyjarrett

A major strength of Ada is the ability to intuit what code is doing due to minimal symbology, and that many of the things you come across can immediately be thrown into a search engine with good results. Adding a dollar sign introduces additional symbology while reducing readability and complicating searches for this syntax. From one of these, I also can't tell the type of the formatted string. In this way, I don't think it agrees with the Ada principles of readability or strong typing.

Both Rust and C++ allow a format function, with contained elements in braces, followed by a variable number of parameters in the call. Indexing parameters by integers tends to be very error prone.

In usual fashion, I would want to describe exactly what I'm doing, with double braces being a literal brace:

Put_Line (String'Format ("{X + Y} = {X} + {Y}"));

For a multi-line string, it's formatting multiple lines, it would be nice to just have String'Format_Multiline, but there probably does need to be a new type of string literal to cross lines. Python uses triple quotes. The $" ... " syntax might be useful here.

Format specifiers are convenient and terse, but can be painful to remember to do correctly and unintuitive. Combining this with named keys like in Rust or python f-strings makes this read like what it does without any googling:

-- Using "when" to indicate named parameters with attached aspect-like
Put_Line (String'Format("{Sum} = {X} + {Y}",
    when Sum => X + Y with Justify => Right, Fill => '0', Width => 5, Precision => 3,
    when X => X with Justify => Right, Fill => ' ', Width => 4,
    when Y => Y with Justify => Left,
    when others => <> -- use interpolation for anything left over, only needed if any renames exist
   ));
-- Renames to indicate named parameters with attached aspect-like
-- Parameters with aggregate-like to remove ambiguities due to commas
Put_Line (String'Format("{Sum} = {X} + {Y}",
    renames
    Sum => X + Y with (Justify => Right, Fill => '0', Width => 5, Precision => 3),
    X => X with (Justify => Right, Fill => ' ', Width => 4),
    Y => Y with (Justify => Left),
    others => <> -- use interpolation for anything left over, only needed if any renames exist
   ));

Obviously neither is a final syntax, just an idea.

@pyjarrett

I don't understand where {"..."} comes from as a syntax. I've used quite a few languages and don't ever remember coming across something like that before. I understand they probably want a new token to simplify the language tooling, but it would preclude an in-line dictionary (map) syntax later on. Prefixing strings isn't unheard of for special string types: f-strings in Python, L"..." for multi-byte strings in C++, R"( for raw strings in C++, Rust raw string literals, etc.

@onox

onox commented Oct 9, 2021

Does {"..."} provide some safety benefit over F"..." or is it just a preference? I think F"..." is preferred because it is similar to how other languages do it as @pyjarrett said. (New) Ada users who would see the syntax for the first time can probably guess what it does if they have experience with one of these languages.

@yannickmoy
Collaborator

Does {"..."} provide some safety benefit over F"..." or is it just a preference?

It's just a syntax discussion, no change in semantics.

I think F"..." is preferred because it is similar to how other languages do it as @pyjarrett said. (New) Ada users who would see the syntax for the first time can probably guess what it does if they have experience with one of these languages.

That's also my preference. Others prefer {"..."} because Ada does not otherwise use letters as syntactic elements, but I think that's the same for other languages using these F-strings.

@mgrojo

mgrojo commented Oct 9, 2021

The first time I saw those prefixes in other languages, they seemed like odd syntax to me. Now they have a little advantage of familiarity, but forgetting about that, I like the symmetry of {"..."}. Ideally, they should be another pair of quote symbols, but there are no more in ASCII. Some languages use the `grave accents` for some string constructs, but I suppose they have been disregarded because they are difficult to input in some local keyboards, and they are not actually quotes.

We have adjusted the discussion so that {" ... "} and F" ... " are on a more even footing, as there seems to be significant interest in both syntaxes.
Use code-block formatting for the multi-line string literal example.
@yannickmoy
Collaborator

I agree with @reznikmm that the current proposal for using \ as escaping character, in particular for newline as \n is not convincing at all. I'd rather use only {{ and }} as additional escaping sequence, like in other languages, in addition to the existing "" sequence in Ada for a double quote " inside a string.

I agree with @pyjarrett that the use of a different syntax {"..."} than all other mainstream programming languages is an issue if we want to reach out to people outside of the existing small Ada user community, especially as multiple of these languages have adopted the same syntax F"...". Do we really want Ada to be an outlier again?

I disagree with the current proposal to trim space characters in front of a digit during replacement, I think this should be reserved for the insertion of a single integer value, so in effect using a special version of the predefined T'Image in that case that does not have this initial space problem, rather than forcing that solution on all T'Image of all types including non-integer ones.

@sttaft
Contributor Author

sttaft commented Oct 14, 2021

Yannick Moy wrote:

I disagree with the current proposal to trim space characters in front of a digit during replacement, I think this should be reserved for the insertion of a single integer value, so in effect using a special version of the predefined T'Image ...

It is likely that 'Image for a private type that is used to represent a numeric type (e.g. Ada 2022's big numbers) will also include a space in front of the first digit, to be consistent with Ada's builtin numeric types. So it doesn't make sense to omit the space when interpolating integer'Image and float'image, but not omit the space when interpolating big_integer'image. So it is safer to base the removal of the leading space on the string representation of the 'Image, rather than the type of the value. Furthermore, by default a private type uses the 'Image of its full type, so for any private type that happens to be implemented using a numeric type (e.g. something like GNAT's node-ids), you are breaking privacy if you say the leading space will be stripped if it happens to be implemented by a numeric type, but not otherwise.
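
A rule based on the string representation rather than on the type could look roughly like the sketch below, written in current Ada. The helper name Strip_Leading_Space is hypothetical, and the condition used (a leading space followed by a digit) is only one possible reading of the rule under discussion, not the RFC's final wording.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Strip_Demo is

   --  Hypothetical helper: drop one leading space when it is
   --  immediately followed by a digit. It looks only at the text of
   --  the image, never at the type that produced it.
   function Strip_Leading_Space (Img : String) return String is
   begin
      if Img'Length >= 2
        and then Img (Img'First) = ' '
        and then Img (Img'First + 1) in '0' .. '9'
      then
         return Img (Img'First + 1 .. Img'Last);
      else
         return Img;
      end if;
   end Strip_Leading_Space;

begin
   Put_Line ("[" & Strip_Leading_Space (Integer'Image (42)) & "]");    --  [42]
   Put_Line ("[" & Strip_Leading_Space (Integer'Image (-7)) & "]");    --  [-7]
   Put_Line ("[" & Strip_Leading_Space (Boolean'Image (True)) & "]");  --  [TRUE]
end Strip_Demo;
```

Because the helper never inspects the type, a private type whose 'Image happens to look numeric is treated the same way whether or not its full type is numeric, which is the privacy point made above.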

@sttaft
Contributor Author

sttaft commented Oct 14, 2021

By the way, it seems a bit inconsistent to avoid use of "\" even though it is widely used in other languages, and then say that we should not use {" ... "} because it is not used widely. I believe we should focus on readability and usability for Ada, and make choices that are sensitive to both Ada's tradition (e.g. mirrored syntaxes for bracketing syntax) and to conventions adopted widely (e.g. "\"), with readability being a very high priority.

One problem with doubling is that you always have to know exactly which characters are and are not to be doubled, whereas with "\" you can presumably use it with characters that you are not sure about. So for example, in these new strings, would you have to double "}"? It is not really necessary. And what about single apostrophe (')?

@yannickmoy
Collaborator

If the only concern is to deal with the standard Ada 2022 big numbers, we can include them in the special case. And we can adapt the RM wording to deal with the privacy issue, I don't see it being an issue. The surprising removal of initial space for arbitrary T'Image looks like a bad idea, which is sure to surprise users.

@yannickmoy
Collaborator

Regarding doubling, other languages are happy with doubling { and } to escape them, why do differently? Regarding the use of \, precisely it is not used for the newer format strings in these other languages, and @reznikmm did a good job at pointing at all the problems with it.

@raph-amiard
Member

Regarding the use of \, precisely it is not used for the newer format strings in these other languages, and @reznikmm did a good job at pointing at all the problems with it.

That's incorrect: most languages have escape sequences (Python, Rust, Swift, C/C++, JavaScript). In fact it's harder to find a language that doesn't have them. I disagree completely with @reznikmm's justification for not adding them. Newlines are hard, yes. So let's not let the user solve that problem ...

I agree with @pyjarrett that the use of a different syntax {"..."} than all other mainstream programming languages is an issue if we want to reach out to people outside of the existing small Ada user community, especially as multiple of these languages have adopted the same syntax F"...". Do we really want Ada to be an outlier again?

I agree with this. I'd rather use F"", not because I vastly prefer it to {"..."}, but just because it will be more familiar for people coming from other languages, and statistical familiarity is almost the only objective thing you can base yourself on with syntax.

@sttaft
Contributor Author

sttaft commented Oct 14, 2021

My own preference is to strip both leading and trailing whitespace, as it seems the simplest and most uniform rule, allowing the implementor of 'Image to use extra whitespace for standalone situations where some kind of alignment might be appropriate.

I think it might be a bigger surprise for a user if they write:

 {"The input is {X} and the result is {F(X)}."}

and they end up with extra white space before or after the interpolation, when clearly the user did not expect that since they allowed for exactly one space on either side.

One simple way, if you really want to see all of the whitespace that 'Image produces, might be to write:

{"The image of the input is {X'Image} and the image of the result is {F(X)'Image}"}

But normally I would say you are more interested in seeing the value of X or F(X), not its verbatim image which might have additional whitespace to provide some kind of formatting in a standalone usage.

The other option, as we have discussed, would be to allow for a second parameter inside the { ... } to provide control over how the 'Image is transformed during interpolation:

{"The image of the input is '{X, Verbatim}' and the image of the result is '{F(X), Verbatim}'"}

@yannickmoy
Collaborator

That's incorrect: most languages have escape sequences (Python, Rust, Swift, C/C++, JavaScript). In fact it's harder to find a language that doesn't have them. I disagree completely with @reznikmm's justification for not adding them. Newlines are hard, yes. So let's not let the user solve that problem ...

Thanks for correcting, indeed it's not mentioned explicitly in most description pages I've seen (like in the Rust online doc https://doc.rust-lang.org/std/fmt/index.html) but indeed \n is interpreted as newline. But what's the purpose of allowing a large number of escape sequences, if this is only meant for newline? In particular, why adopt this \{ and \} which are not used elsewhere? (but maybe you'll correct me on that too!)

If newline is to be supported, the RFC should describe how it is handled on various platforms (Unix/Windows/other). Plus if that's only to support this case, we could have a different syntax, e.g. use concatenation between F-strings to indicate newline: F"hello" & F"world" instead of F"hello\nworld".

@Fabien-Chouteau
Member

My own preference is to strip both leading and trailing whitespace,

I am worried about the run-time penalty for this.

@onox

onox commented Oct 16, 2021

e.g. use concatenation between F-strings to indicate newline: F"hello" & F"world" instead of F"hello\nworld".

I think it would be surprising to the user if implicit newlines appear in the output because of concatenation. Also how would you avoid an undesired newline if you need to split some F-strings into multiple parts to avoid long lines (> 79 chars) because of style checks (-gnatyM)?
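
For reference, this is how concatenation and explicit line breaks behave with ordinary string literals in current Ada, as a small sketch. The procedure and constant names are hypothetical, and nothing here uses the proposed interpolation syntax.

```ada
with Ada.Text_IO;            use Ada.Text_IO;
with Ada.Characters.Latin_1;

procedure Concat_Newline is
   --  A literal split into two parts, for example to stay under a
   --  line-length style check. "&" inserts nothing between the parts.
   Joined : constant String := "hello" & "world";

   --  A line break inside the value has to be spelled out explicitly.
   Two_Lines : constant String :=
     "hello" & Ada.Characters.Latin_1.LF & "world";
begin
   Put_Line (Joined);     --  prints helloworld on one line
   Put_Line (Two_Lines);  --  prints hello and world on separate lines
end Concat_Newline;
```

Giving "&" an implicit newline for interpolated strings only would therefore differ from what the same operator does for every other string today.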

@pyjarrett

pyjarrett commented Oct 18, 2021

it seems a bit inconsistent to avoid use of "\" even though it is widely used in other languages, and then say that we should not use {" ... "} because it is not used widely.

My argument against escape characters is based on it breaking conceptual integrity within the language of the behavior within strings, not on popularity. As a C++ programmer who came to Ada recently, this behavior confused me, but it would be confusing for it to work for one type of strings, but not another one. On the other hand, there are languages which differ in this behavior for raw and plain strings, so perhaps it's fine to do anyways?

My argument against {"..."} is due to never recalling this construct in the dozen or so languages I've worked in. If the argument for escape characters is popularity, it seems odd to purposely include a syntax which has no precedent in any mainstream language. I also feel like braces should be reserved for as long as possible, since some of the clunkiness of Ada syntax I've dealt with reminds me of Objective-C before they added the @[a, b, ...] and @{ ...} syntax for NSArray and NSDictionary literals respectively. The only languages I can think of which might even have a valid {" "} syntax would be as an implicit function return in Rust fn foo() -> &'static str { "foo" } or as a braced initializer in C++: StringType foo { "foo" };.

There's merit to wanting a balanced delimiter, but " itself isn't a balanced element, in the manner of () or {} or <>. With all of these considerations from the previous discussion, the original recommendation of $"..." might actually be best since it's a non-lettered prefix formatted string, albeit with another non-standard syntax, as it preserves usage of braces for the future.

Incorporate the "language-design circle" decision about using {" ... "} syntax.  Move things around a bit so alternatives are discussed in the appropriate sections.
We updated the RFC to incorporate decisions on trimming the leading space for "numeric" types where types with Integer_Literal or Real_Literal aspects are considered numeric.  We also included the multi-line string literal, and confirmed the decision to use backslash as the escape mechanism.
@yannickmoy
Collaborator

Hi Tuck, currently it reads "trimmed of a leading space if the second character is a digit", I assume that's just a leftover from the previous version?

Also, you allowed the use of "" for double-quote character in format strings, and we discussed that it was also a leftover.

Add a Reference-Level section that clarifies the static and dynamic semantics of the feature.  Clarify that doubled characters are not treated as anything special, and in particular, that only backslash is used for escaping.
@yannickmoy
Collaborator

@sttaft you mention in the first paragraph the use of backslash to enter unicode characters, is that intentional? It's not described in the rest of the RFC, and we did not discuss it.

@yannickmoy
Collaborator

You also mention \n as the character used for newline in multi-line strings, but shouldn't it depend on the platform, like the characters inserted by New_Line, so that it would be \r\n on Windows?

@yannickmoy
Collaborator

small fix: the rule for interpolated_string_literal should have double-quotes at start and end, not only curly braces.

@yannickmoy
Collaborator

You say: "An escaped_character represents the given graphic_character" which seems to indicate that \k would be interpreted as just the character k. Shouldn't we ask for an error to be issued in such a case, and have explicitly a rule that \\ and \" denote respectively the backslash and double quote?

@yakobowski yakobowski changed the title [RFC] Topic/rfc string interpolation [RFC] String Interpolation Jan 5, 2022
the value of the variable or the expression is "interpolated" directly
into the value of the enclosing string upon use at run-time. In addition,
an escape character ('\\') is provided for inserting certain standard control
characters (such as newline) or unicode characters within
Contributor

This is the unicode reference @yannickmoy is talking about.

Contributor Author

I moved this discussion to "future possibilities".

Added a legality rule restricting what characters may follow backslash.  Moved the discussion about unicode to future possibilities.  Indicated that newline might be represented by a sequence of characters.
@raph-amiard raph-amiard merged commit 67aadc0 into AdaCore:master Nov 23, 2022