[RFC] String Interpolation #77
First draft of RFC
Modest formatting fixes
Use code block formatting.
More work on formatting.
Link to the rendered version: https://github.com/sttaft/ada-spark-rfcs/blob/topic/rfc-string-interpolation/considered/rfc-string-interpolation.rst |
Rendered version also here (click on "Files changed" at the top, and then "View File" on the "..." menu on the right): Note that I just fixed a few typos, so this rendered version is somewhat different than what was linked to originally. |
Fix a number of typographical errors.
Thanks @sttaft,

My two cents on this proposal.

1. Referencing variable names

I am not a huge fan of referencing variable names in the template. I think especially with Ada and its most common programming style, names tend to be very long and will result in a template string that is not readable.

If I take an example from the original issue:

Put_Line ($"(Name) is a (Profession)"$)

I think it's going to be pretty rare to have such a clean template. In practice you will have names that are part of a record, or something else more complex. So in my opinion we will most likely see patterns that look like this:

Put_Line ($"$(My_Object.Name) is a $(Professions_Img_Array (My_Object.Profession_Index))"$);

Of course there will be the option to declare renames for strings:

declare
   Name : String renames My_Object.Name;
   Profession : String renames Professions_Img_Array (My_Object.Profession_Index);
begin
   Put_Line ($"$(Name) is a $(Profession)"$);
end;

But in the end I don't think this brings as much readability improvement as we could get.

I see two other ways around this, and I am introducing a placeholder "syntax" $" "$ (...) just for the examples:

a. Positional:

Put_Line ($"$(0) is $(1)"'$ -- This is a lot of `$` I must say
   (My_Object.Name,
    Professions_Img_Array (My_Object.Profession_Index)));

b. Another variation on naming (I saw this in Python at least):

Put_Line ($"$(Name) is $(Profession)"'$
   (Name => My_Object.Name,
    Profession => Professions_Img_Array (My_Object.Profession_Index)));

2. Expression in templates

Put_Line ($"Plop is $(X + Y)"$);

In the same vein as my comment above, I think this is likely to generate templates that will be hard to read. And I am not an implementer, but I am guessing that this will introduce complexity that can be avoided in my opinion.

3. Formatting options

In the RFC $(X+Y, Width => 13) is given as an example. I am personally a fan of printf's %2.3f, %5x, %04d, etc. If we can make something as compact as that it would be great. Looking in Python's direction, it could be something like:

Put_Line($"Plop is $(0.2f:Angle)"$)

4. Embedded

As most of you know my focus is on embedded, and in particular bare-metal and small devices. So I am always worried about how a given feature can be implemented and used in this kind of situation. I don't know how compatible with embedded the current proposal is, but I just want to state that it is important to keep that in mind.
On Wed, Jun 16, 2021 at 5:04 AM Fabien Chouteau ***@***.***> wrote:
Thanks @sttaft <https://github.com/sttaft> ,
My two cents on this proposal.
1. Referencing variable names
I am not a huge fan of referencing variable names in the template. I think
especially with Ada and its most common programming style, names tend to be
very long and will result in a template string that is not readable.
If I take an example from the original issue:
Put_Line ($"(Name) is a (Profession)"$)
I think it's going to be pretty rare to have such a clean template. In
practice you will have names that are part of a record, or something else
more complex. So in my opinion we will most likely see patterns that
look like this:
Put_Line ($"$(My_Object.Name) is a $(Professions_Img_Array (My_Object.Profession_Index))"$);
Of course there will be the options to declare renames for strings:
declare
   Name : String renames My_Object.Name;
   Profession : String renames Professions_Img_Array (My_Object.Profession_Index);
begin
   Put_Line ($"$(Name) is a $(Profession)"$);
end;
But in the end I don't think this brings as much readability improvement
as we could get.
Of course I would like to have other opinions on this.
I have been using a language with string interpolation over the past few
years, and my experience is that it is much more readable and quite
intuitive. This is something where I would suggest we do some prototyping,
and then take some existing programs that do a lot of concatenation of
literal strings and 'Image (or Put) calls, and see how they look with
various approaches to string interpolation.
I see two other ways around this, and I am introducing a placeholder "syntax" $"
"$ (...) just for the examples:
- a. Positional:
Put_Line ($"$(0) is $(1)"'$ -- This is a lot of `$` I must say
   (My_Object.Name,
    Professions_Img_Array (My_Object.Profession_Index)));
- b. Another variation on naming (I saw this in Python at least)
Put_Line ($"$(Name) is $(Profession)"'$
   (Name => My_Object.Name,
    Profession => Professions_Img_Array (My_Object.Profession_Index)));
Using integers as place holders followed by a list of names adds complexity
and gets us into the kinds of bugs that are common with printf, where you
need to match up parameters with placeholders, and it is easy to have
problems during maintenance.
Using named parameters is an interesting alternative, but it feels like an
extra step and creates a case where any use of an interpolated string must
be in a context where it makes sense to add named parameters. String
literals can appear in many contexts in Ada, and trying to figure out how
to define exactly where the extra named parameters would go would be a
completely new concept in Ada, and potentially add complexity to overload
analysis, which is already pretty complex in Ada.
2. Expression in templates:
Put_Line ($"Plop is $(X + Y)"$);
In the same vein as my comment above, I think this is likely to generate
templates that will be hard to read. And I am not an implementer, but I am
guessing that this will introduce complexity that can be avoided in my
opinion.
I don't see the added implementation complexity here, since our general
approach is to apply <type>'Image, and whether it is a name or an
expression makes little difference in Ada, particularly since function
calls are considered "name"s in Ada, and both can involve overloading. You
have to be able to resolve the type without external context, but that is
true in other contexts in Ada (e.g. the operand of a type conversion), so
this is not an added complexity.
3. Formatting options
In the RFC $(X+Y, Width => 13) is given as an example. I am personally a
fan of printf's %2.3f, %5x, %04d, etc.
If we can make something as compact as that it would be great.
Looking in Python's direction, it could be something like:
Put_Line($"Plop is $(0.2f:Angle)"$)
Seems like an interesting approach. It is somewhat redundant to specify
the type (with the "f") in this case, and using letters for specifying the
radix seems a bit odd given that Ada currently never does that, but some
kind of Ada-oriented pattern such as "3.4E2" for specifying the format for
a floating-point number, and "16#5#" for specifying a hex integer might be
pretty intuitive. "*" might be used for cases where the programmer doesn't
care, such as "16#*#" for hex, but no specified number of digits. We
probably still need to think about whether we want to pass this to the
'Image function, or to do it all with postprocessing (which is clearly
harder for some of these patterns).
4. Embedded
As most of you know my focus is on embedded, and in particular bare-metal
and small devices. So I am always worried about how a given feature can be
implemented and used in this kind of situation. I don't know how compatible
with embedded the current proposal is, but I just want to state that it is
important to keep that in mind.
This is syntactic sugar, so there is no magic here. It is going to be
equivalent to a string concatenation and calls on 'Image, and the
programmer will need to keep that in mind.
-Tuck
|
I hope editors start to highlight the interpolated variables differently from the surrounding characters; otherwise this would be less readable instead of more. I suppose escape characters (i.e.
Does that mean that interpolated positive integers will have a leading space? I would understand it for the sake of consistency, but that would shock many.
On Sat, Jun 19, 2021 at 8:16 AM Manuel ***@***.***> wrote:
...
This is syntactic sugar, so there is no magic here. It is going to be
equivalent to a string concatenation and calls on 'Image, and the
programmer will need to keep that in mind.
Does that mean that interpolated positive integers will have a leading
space? I would understand it for the sake of consistency, but that would
shock many.
No, there would be a "Trim" applied to the result of 'Image so no extra
spaces. The RFC should have made that clear.
-Tuck
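To make the "syntactic sugar" description concrete, here is a minimal sketch in standard Ada 2012 of what a template such as "Plop is $(X + Y)" could expand to by hand. The helper name Interp is hypothetical and is not part of the RFC; it only assumes the behaviour described above, namely 'Image followed by a trim.

```ada
with Ada.Strings;        use Ada.Strings;
with Ada.Strings.Fixed;  use Ada.Strings.Fixed;
with Ada.Text_IO;        use Ada.Text_IO;

procedure Interpolation_Sketch is
   X : constant Integer := 3;
   Y : constant Integer := 4;

   --  Hypothetical helper standing in for what might be generated for
   --  each interpolated expression: 'Image, then trim both ends.
   function Interp (Value : Integer) return String is
     (Trim (Integer'Image (Value), Both));
begin
   --  Plain 'Image keeps the leading space reserved for the sign,
   --  so no space is written after "is" here.
   Put_Line ("Plop is" & Integer'Image (X + Y));

   --  Hand-written equivalent of the interpolated template.
   Put_Line ("Plop is " & Interp (X + Y));

   --  Checked only when assertions are enabled.
   pragma Assert (Integer'Image (X + Y) = " 7");
   pragma Assert (Interp (X + Y) = "7");
end Interpolation_Sketch;
```

Both calls print "Plop is 7". The difference is only in who supplies the space: the literal in the second call, the 'Image result in the first.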
|
of string literal syntaxes, chosen by a prefix letter, but our sense is that
the string interpolation syntax has emerged as the favorite. We do not want
to have lots of different syntaxes, so we have included the escape mechanism
as part of both of the new string literal syntaxes. We have chosen '\' as the
This is rendered as ''. It probably needs to be quoted using backticks like this: \
Escape the escape character, and mention that leading and trailing whitespace is trimmed when interpolating the result of 'Image.
Fix the other place we had an un-escaped escape character.
Thanks. I fixed it in both places, and added a mention of trimming leading
and trailing white space.
-Tuck
On Sat, Jun 19, 2021 at 12:04 PM Manuel ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In considered/rfc-string-interpolation.rst
<#77 (comment)>:
> +the beginning of all of them.
+
+Drawbacks
+=========
+
+Hopefully the semantics will be fairly intuitive, but this is certainly
+adding complexity to string literals, and
+
+Prior art
+=========
+
+String interpolation has begun to show up in many languages. Python has a number
+of string literal syntaxes, chosen by a prefix letter, but our sense is that
+the string interpolation syntax has emerged as the favorite. We do not want
+to have lots of different syntaxes, so we have included the escape mechanism
+as part of both of the new string literal syntaxes. We have chosen '\' as the
This is rendered as ''. It probably needs to be quoted using backticks
like this: \
|
Some minor clarifications. Fix the sentence fragment in the Drawbacks section, and include a concern mentioned by Fabien.
Two more places needing doubled backslashes.
|
@zertovitch I was just thinking about this, maybe it would be interesting to prototype this string interpolation in HAC. |
Why not? A few hints on how to do it: there is an internal type called
String_Literals.
For adding the string interpolation feature, look at the lines 163+
(currently) in src/hac_sys-parser-expressions.adb .
You can add a boolean mode switch, say `Interpolated` activated by the
front '$', then parse the literal string in the special mode if Interpolated = True.
procedure Primary (FSys_Prim : Symset; X : out Exact_Typ) is -- RM 4.4 (7)
F : Opcode;
Ident_Index : Integer;
begin
X := Type_Undefined;
Test (CD, Primary_Begin_Symbol + StrCon, FSys_Prim, err_primary_unexpected_symbol);
case CD.Sy is
>>> when StrCon =>
X.TYP := String_Literals;
Emit_1 (CD, k_Push_Discrete_Literal, Operand_2_Type (CD.SLeng)); -- String Literal Length
Emit_1 (CD, k_Push_Discrete_Literal, Operand_2_Type (CD.INum)); -- Index To String IdTab
InSymbol (CD);
when IDent =>
|
As discussed with Yannick/Tucker, the 'double $' syntax is extremely difficult to read. I'd rather either have a letter indicating a formatting string such as F : or a whole word : The main problem with the word approach is that it looks like a mistyped function call, so a special, compiler-supported function such as I'll also add that bash/python/rust/etc reflexes make me want to write:
agreed
let's see what others think?
I also like that, as it is clearer that this starts a format string.
That would conflict with a user function called Format taking a string and returning a string, so I don't think we can do this.
I'd rather avoid useless curly braces, unless there is both an advantage in readability and uniformity. |
I'm not a fan of format specifiers. What do you think of using backticks? e.g.
This way interpolated strings are as light as regular strings in terms of syntax but still distinguishable from regular strings. |
I'm not a fan of backticks because they introduce a new token in the lexer, and probably new escaping rules in string. I prefer prefixed strings simply because they're simpler to implement, and also pretty familiar to people because of Python. They're also a more generic syntax extension, that you might be able to reuse if you need another type of literals someday. |
I can't say that I have analysed this discussion in great detail, but I shall throw in a couple of cents from my experience with Go and Dart. Dart has string interpolation and Go does not. Backticks allow multi-line strings in Go; perhaps a multi-line feature could allow the nicer interpolation that Dart has with longer names?

Go has a Printf, but generally Print is used. Print is type aware, so you can do this, but I often miss spaces out:

fmt.Print("Hello", worldVar, "oops worldVar has no leading space")

To be clear, Go's multi-line strings are not compatible with formatting, but are used for constants etc. However, you can do multi-line strings with formatting, so long as the last character of the line is + (& for Ada). I switched to a string builder for my SQL statements anyway, which, strangely, seems nicer as well as being more efficient (when pre-buffered). Ada has terminating semicolons that might help?
I found this interesting comment by Claire Dross: So we could have an interpolation function as an attribute:
|
I like this syntax mentioned by @dsibai As an addition, the formatting options mentioned by @sttaft could optionally be used like
Having the option to set formatting options by passing a variable instead of a literal might also be useful. |
A major strength of Ada is the ability to intuit what code is doing thanks to minimal symbology, and that many of the things you come across can immediately be thrown into a search engine with good results. Adding a dollar sign introduces additional symbology while reducing readability and complicating searching for this syntax. From one of these, I also can't tell the type of the formatted string. In this way, I don't think it agrees with the Ada principles of readability or strong typing. Both Rust and C++ allow a format function, with contained elements in braces, followed by a variable number of parameters in the call. Indexing parameters by integers tends to be very error prone. In usual fashion, I would want to describe exactly what I'm doing, with double braces being a literal brace:
For a multi-line string, it's formatting multiple lines, it would be nice to just have Format specifiers are convenient and terse, but can be painful to remember to do correctly and unintuitive. Combining this with named keys like in Rust or python f-strings makes this read like what it does without any googling:
Obviously neither is a final syntax, just an idea. |
I don't understand where |
Does |
It's just a syntax discussion, no change in semantics.
That's also my preference. Others prefer |
The first time I saw those prefixes in other languages, they seemed like odd syntax to me. Now they have a little advantage of familiarity, but forgetting about that, I like the symmetry of |
We have adjusted the discussion so that {" ... "} and F" ... " are on a more even footing, as there seems to be significant interest in both syntaxes.
Use code-block formatting for the multi-line string literal example.
I agree with @reznikmm that the current proposal for using I agree with @pyjarrett that the use of a different syntax I disagree with the current proposal to trim space characters in front of a digit during replacement, I think this should be reserved for the insertion of a single integer value, so in effect using a special version of the predefined |
Yannick Moy wrote:
It is likely that 'Image for a private type that is used to represent a numeric type (e.g. Ada 2022's big numbers) will also include a space in front of the first digit, to be consistent with Ada's builtin numeric types. So it doesn't make sense to omit the space when interpolating integer'Image and float'image, but not omit the space when interpolating big_integer'image. So it is safer to base the removal of the leading space on the string representation of the 'Image, rather than the type of the value. Furthermore, by default a private type uses the 'Image of its full type, so for any private type that happens to be implemented using a numeric type (e.g. something like GNAT's node-ids), you are breaking privacy if you say the leading space will be stripped if it happens to be implemented by a numeric type, but not otherwise. |
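One way to picture a rule based on the string representation rather than on the type is the sketch below. The helper Strip_Numeric_Lead is hypothetical and not taken from the RFC; it assumes the rule "drop one leading space when the character after it is a digit", so it behaves the same for Integer'Image, a big-number image, or the image of a private type.

```ada
with Ada.Characters.Handling; use Ada.Characters.Handling;
with Ada.Text_IO;             use Ada.Text_IO;

procedure Trim_By_Representation is

   --  Hypothetical helper: remove a single leading space only when the
   --  image looks numeric, i.e. the second character is a digit.
   --  The decision never looks at the type that produced the image.
   function Strip_Numeric_Lead (Image : String) return String is
   begin
      if Image'Length >= 2
        and then Image (Image'First) = ' '
        and then Is_Digit (Image (Image'First + 1))
      then
         return Image (Image'First + 1 .. Image'Last);
      else
         return Image;
      end if;
   end Strip_Numeric_Lead;

begin
   Put_Line ("[" & Strip_Numeric_Lead (Integer'Image (42)) & "]");
   Put_Line ("[" & Strip_Numeric_Lead (Integer'Image (-42)) & "]");
   Put_Line ("[" & Strip_Numeric_Lead (" padded text") & "]");
end Trim_By_Representation;
```

This prints [42], [-42] and [ padded text]: the sign position is dropped for the positive number, while a leading space in front of non-numeric text is left alone.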
By the way, it seems a bit inconsistent to avoid use of "\" even though it is widely used in other languages, and then say that we should not use {" ... "} because it is not used widely. I believe we should focus on readability and usability for Ada, and make choices that are sensitive to both Ada's tradition (e.g. mirrored syntaxes for bracketing syntax) and to conventions adopted widely (e.g. "\"), with readability being a very high priority. One problem with doubling is that you always have to know exactly which characters are and are not to be doubled, whereas with "\" you can presumably use it with characters that you are not sure about. So for example, in these new strings, would you have to double "}"? It is not really necessary. And what about single apostrophe (')? |
If the only concern is to deal with the standard Ada 2022 big numbers, we can include them in the special case. And we can adapt the RM wording to deal with the privacy issue, I don't see it being an issue. The surprising removal of initial space for arbitrary |
Regarding doubling, other languages are happy with doubling |
That's incorrect, most languages have escape sequences (Python, Rust, Swift, C/C++, Javascript). In fact it's harder to find a language that doesn't have them. I disagree completely with @reznikmm justification for not adding them. Newlines are hard yes. So let's not let the user solve that problem ...
I agree with this. I'd rather use |
My own preference is to strip both leading and trailing whitespace, as it seems the simplest and most uniform rule, allowing the implementor of 'Image to use extra whitespace for standalone situations where some kind of alignment might be appropriate. I think it might be a bigger surprise for a user if they write:
and they end up with extra white space before or after the interpolation, when clearly the user did not expect that since they allowed for exactly one space on either side. One simple way, if you really want to see all of the whitespace that 'Image produces, might be to write:
But normally I would say you are more interested in seeing the value of X or F(X), not its verbatim image which might have additional whitespace to provide some kind of formatting in a standalone usage. The other option, as we have discussed, would be to allow for a second parameter inside the { ... } to provide control over how the 'Image is transformed during interpolation:
|
Thanks for correcting, indeed it's not mentioned explicitly in most description pages I've seen (like in the Rust online doc https://doc.rust-lang.org/std/fmt/index.html) but indeed If newline is to be supported, the RFC should describe how it is handled on various platforms (Unix/Windows/other). Plus if that's only to support this case, we could have a different syntax, e.g. use concatenation between F-strings to indicate newline: |
I am worried about the run-time penalty for this. |
I think it would be surprising to the user if implicit newlines appear in the output because of concatenation. Also how would you avoid an undesired newline if you need to split some F-strings into multiple parts to avoid long lines (> 79 chars) because of style checks ( |
My argument against escape characters is based on it breaking conceptual integrity within the language of the behavior within strings, not on popularity. As a C++ programmer who came to Ada recently, this behavior confused me, but it would be confusing for it to work for one type of strings, but not another one. On the other hand, there are languages which differ in this behavior for raw and plain strings, so perhaps it's fine to do anyway? My argument against There's merit to wanting a balanced delimiter, but
Incorporate the "language-design circle" decision about using {" ... "} syntax. Move things around a bit so alternatives are discussed in the appropriate sections.
We updated the RFC to incorporate decisions on trimming the leading space for "numeric" types where types with Integer_Literal or Real_Literal aspects are considered numeric. We also included the multi-line string literal, and confirmed the decision to use backslash as the escape mechanism.
Hi Tuck, currently it reads "trimmed of a leading space if the second character is a digit", I assume that's just a leftover from the previous version? Also, you allowed the use of "" for double-quote character in format strings, and we discussed that it was also a leftover. |
Add a Reference-Level section that clarifies the static and dynamic semantics of the feature. Clarify that doubled characters are not treated as anything special, and in particular, that only backslash is used for escaping.
@sttaft you mention in the first paragraph the use of backslash to enter unicode characters, is that intentional? It's not described in the rest of the RFC, and we did not discuss it. |
You also mention \n as the character used for newline in multi-line strings, but shouldn't it depend on the platform, like the characters inserted by |
small fix: the rule for |
You say: "An escaped_character represents the given graphic_character" which seems to indicate that |
the value of the variable or the expression is "interpolated" directly
into the value of the enclosing string upon use at run-time. In addition,
an escape character ('\\') is provided for inserting certain standard control
characters (such as newline) or unicode characters within
This is the unicode reference @yannickmoy is talking about.
I moved this discussion to "future possibilities".
Added a legality rule restricting what characters may follow backslash. Moved the discussion about unicode to future possibilities. Indicated that newline might be represented by a sequence of characters.
This is the first draft of an RFC for the string interpolation feature.