New Unicode-related String Interpolation standard -- consider for Ada? #37

Open
sttaft opened this issue Jan 9, 2023 · 11 comments
Labels
Feature Request A proposal for a new language feature or capability

Comments

@sttaft

sttaft commented Jan 9, 2023

At AdaCore we have been discussing possible ways of supporting string "interpolation" where a special syntax for string literals allows direct "interpolation" of the values of variables and expressions into the string, such as:

"Name = {First_Name} {Last_Name}, Address = {Address}, and Age = {(Now - Birthday) / Year}."

Of course, we would need some way of distinguishing such strings from "normal" string literals, and we have considered various options, such as:

  • {"Name = {...}."}
    or, with a python-like prefix
  • F"Name = {...}."
    or, using '$' consistently
  • $"Name = $First_Name $Last_Name, ... $((Now - Birthday) / Year)."$

Today I noticed that the Unicode consortium is working on a standard for something that approximates string interpolation, which they call "Message Format 2" (great name ;-):

Message Format 2.0 syntax

which is a follow-on to a relatively old existing standard "ICU MessageFormat", which had some "pain points":

ICU MessageFormat pain points

Here are a couple of simple examples (drawn from Message Format 2.0 syntax):

  {Hello, {$userName}!}

A message with an interpolated $date variable formatted with the :datetime function:

  {Today is {$date :datetime weekday=long}.}

If we want to consider something like this for standardizing, it would make sense to look at the work the Unicode consortium is doing, as it seems to be based on significant experience, both bad and good, with the ICU MessageFormat.

-Tuck

@Richard-Wai
Contributor

Richard-Wai commented Jan 11, 2023

As much as it makes me cringe to imagine Ada with curly braces, I get the value of such format strings. However, it seems to me that the usual way of concatenating strings is very close to what we'd get from any additional complex syntax.

Maybe there is a way we can add a simple syntactic-sugar approach that collapses the " & ... & " sequence, such as via the dollar sign. For example, what if we say that a '$' within a string, with a matching '$' more than zero characters away, is exactly equivalent to (using single quotes as delimiters) '" & ' for the first and ' & "' for the second?

Put_Line ("My name is $First_Name$ and my age is$Year'Image(Current_Year - Birth_Year)$");

This would be syntactically equivalent to

Put_Line ("My name is " & First_Name & " and my age is" & Year'Image(Current_Year - Birth_Year) & "");

And could be formed through simple text replacement.

We could also follow the same convention as for the double quote, where two '$' in a row are replaced by a single '$'.

Put_Line ("Your total is $$$Money'Image(Total)$.");

Which would become

Put_Line ("Your total is $" & Money'Image(Total) & ".");
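
As a rough illustration of the "simple text replacement" idea, here is a minimal Python sketch (Python only because it is easy to run; this is not a proposed implementation). The function name `expand_dollar_string` is hypothetical. It assumes the input is the body of a single string literal without its outer quotes, that '$' characters are resolved strictly left to right, and that the interpolated expression itself contains no '$'.

```python
def expand_dollar_string(body: str) -> str:
    """Rewrite the body of one string literal (outer quotes removed)
    into an Ada concatenation, following the '$' rules sketched above."""
    out = ['"']        # opening quote of the first literal piece
    i = 0
    in_expr = False    # True while we are between a '$' pair
    while i < len(body):
        ch = body[i]
        if ch == "$" and not in_expr and body[i + 1 : i + 2] == "$":
            # '$$' outside an expression stands for one literal '$'.
            out.append("$")
            i += 2
        elif ch == "$":
            # The first '$' of a pair closes the literal piece,
            # the second one reopens it.
            out.append(' & "' if in_expr else '" & ')
            in_expr = not in_expr
            i += 1
        else:
            out.append(ch)
            i += 1
    if in_expr:
        raise ValueError("unmatched '$' in string literal")
    out.append('"')    # closing quote of the last literal piece
    return "".join(out)


print(expand_dollar_string("Your total is $$$Money'Image(Total)$."))
```

Applied to the two string bodies above, this sketch reproduces the two expanded forms given in this comment, including the trailing & "" in the first one.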

I'm a bit wary of introducing the complexity given in the Unicode standard, particularly since Ada 2022 has such rich user-defined image facilities.

@sttaft
Author

sttaft commented Jan 12, 2023

Now that Ada has a universal 'Image, it makes it very annoying to have to specify it all over the place when producing textual output. So the goal is to replace " & X'Image & " (or worse, " & Integer'Image(X + Y) & "), with simply $X or $(X + Y) when appearing in the middle of an "interpolated" string literal.

@ARG-Editor
Collaborator

ARG-Editor commented Jan 12, 2023 via email

@sttaft
Author

sttaft commented Jan 12, 2023

> How are you proposing to differentiate "interpolated" string literals from regular ones?

The string literal would start (and possibly end) with a unique sequence, such as:

  {" ... "}

or

  F" ... "

or

 $" ... "

as mentioned above in the original note.

So a complete interpolated string literal might be:

Put_Line ({"Name = {First_Name} {Last_Name}, Address = {Address}, and Age = {(Now - Birthday) / Year}."});

or

Put_Line ($"Name = $First_Name $Last_Name, Address = $Address, and Age = $((Now - Birthday) / Year).");

@jprosen

jprosen commented Jan 13, 2023 via email

@briot

briot commented Jan 13, 2023 via email

@sttaft
Author

sttaft commented Jan 13, 2023

> I see nothing annoying in having to type a few more characters. This
> looks like another feature justified by ease-of-writing, and because
> some other interpreted popular language has something like it.
> This increases the complexity of the language, defeats orthogonality,
> and I doubt it will have any effect on the popularity of the language...

It really makes quite a difference to readability, and reduces silly errors. An intern and I were writing a compiler in ParaSail that generated LLVM intermediate representation (which has a textual form), and at some point we realized all of the calls on ToString and the various concatenation operations were making the code unbelievably hard to read. Since we could, we added string interpolation to ParaSail, and the improvement was enormous. Yes, it made it easier to write, but it also made it much easier to read, and hence much easier to notice mistakes.

@jprosen

jprosen commented Jan 13, 2023 via email

@sttaft
Author

sttaft commented Jan 20, 2023

> Fair enough. But couldn't you achieve the same thing with a couple of
> subprograms?

I don't see how. String interpolation requires the Ada lexer, parser, and semantic analysis to work together. For example, take an interpolated string literal like:

$"The solution to your problem is $(F(X + Y/3, "abc") + G('a', 7.5)) presuming X = $X"

I can't quite imagine how a couple of subprograms could handle that. The point is that we are interpolating the 'Image of the value of an arbitrary Ada expression into the middle of a string literal. The equivalent non-interpolated syntax would be:

"The solution to your problem is " & My_Type'Image(F(X + Y/3, "abc") + G('a', 7.5)) & " presuming X = " & X'Image

Both readability and writability are improved in the interpolated version, I would claim.

-Tuck

@Richard-Wai
Contributor

> I don't see how.

My interpretation of JP's point, which is one I sympathize with, is that the "solution to your problem" should be produced by a function itself. Particularly if that function was nearer in scope, and would see Y, G, et al directly. In such a case you'd simply say:

"The solution to your problem is " & My_Type'Image(Compute_Solution (X)) & " presuming X = " & X'Image

Or even better, you could have another expression function that returned My_Type'Image of Compute_Solution at X:

"The solution to your problem is " & Solution_String (X) & " presuming X = " & X'Image

I have used this kind of approach regularly, and I'm having a hard time seeing this proposal as being anything more than yet another lazy programmer feature, which alienates people (like myself) who really don't want to see Ada go that route, and does nothing to satisfy people using languages that are structurally faster to type, such as Rust.

Sure it might be more readable in isolation, but I don't think you can as easily argue that it is any more readable than abstracting things out to more specialized subprograms.

@ARG-Editor ARG-Editor added the Feature Request A proposal for a new language feature or capability label May 26, 2023
@eggrobin

Writing here what I said in the ARG: One should bear in mind that MessageFormat 2.0 (like its ancestor, ICU MessageFormat) is about localized strings; as such, it comes with a rather fancy domain-specific language which is needed to handle the complexities of grammar in localized strings (the main example being pluralization; English is easy here, with just two plural cases—singular for 1, plural for everything else—but many languages are more interesting; consider Russian’s 4 plural cases or Arabic’s 6, depending on the last two digits).

See https://unicode-org.github.io/icu/userguide/format_parse/messages/#complex-argument-types in the old MessageFormat and https://github.com/unicode-org/message-format-wg/blob/main/spec/syntax.md#complex-messages in the draft new one.

String interpolation syntaxes in programming languages usually do not deal with that; as Tucker mentioned in the ARG meeting, it is common to see such things as the following Python:

f"{n} cat{'' if n == 1 else 's'}"

The reason why the MessageFormat syntax exists is that the above construct is impossible to localize: translators do not get to change the program, and no amount of playing with the s and the cat will yield 1 кошка, 2 кошки, 5 кошек, 21 кошка.
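
To make the pluralization point concrete, here is a minimal Python sketch (not MessageFormat syntax and not the ICU API) of what selecting a Russian plural form for an integer involves. The names `russian_plural_category`, `CAT_FORMS`, and `count_cats` are hypothetical, and the rule is written from the usual CLDR-style description for integers only. The fourth category, used for fractional values, is left out.

```python
def russian_plural_category(n: int) -> str:
    """Plural category for a non-negative integer n (integers only)."""
    last_digit, last_two = n % 10, n % 100
    if last_digit == 1 and last_two != 11:
        return "one"     # 1, 21, 31, ... but not 11
    if 2 <= last_digit <= 4 and not 12 <= last_two <= 14:
        return "few"     # 2-4, 22-24, ... but not 12-14
    return "many"        # 0, 5-20, 25-30, ...


# Word forms are data chosen by the translator, not program logic.
CAT_FORMS = {"one": "кошка", "few": "кошки", "many": "кошек"}


def count_cats(n: int) -> str:
    return f"{n} {CAT_FORMS[russian_plural_category(n)]}"


for n in (1, 2, 5, 21):
    print(count_cats(n))
```

The selection rule and the table of word forms both depend on the target language, which is why a localization-oriented format keeps them in the message rather than in the program text.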

6 participants