| Field | Value |
|---|---|
| DIP: | 1027 |
| Review Count: | 2 |
| Author: | Walter Bright walter@digitalmars.com |
| Implementation: | |
| Status: | Rejected |
Instead of requiring a format string followed by an argument list, string interpolation enables embedding the arguments in the string itself.
- Rationale
- Prior Work
- Description
- Breaking Changes and Deprecations
- Reference
- Copyright & License
- Reviews
While the conventional format string followed by the argument list is fine for short strings and a small number of arguments, it tends to break down with longer strings that have many arguments. Omitted arguments, extra arguments, and mismatches between format specifiers and their corresponding arguments are common errors. Embedding arguments in the format strings can eliminate these errors. Readability is improved and the code is visually easier to review for correctness.
- Interpolated strings have been implemented and well-received in many languages. For many such examples, see String Interpolation.
- Jason Helson submitted a DIP String Syntax for Compile-Time Sequences.
- Adam Ruppe's string interpolation proposal
- Jonathan Marler's Interpolated Strings
writefln(i"I ate $apples and ${%d}bananas totalling $(apples + bananas) fruit.");
gets rewritten as:
writefln("I ate %s and %d totalling %s fruit.", apples, bananas, apples + bananas);
This will also work with printf:
printf(i"I ate ${%d}apples and ${%d}bananas totalling ${%d}(apples + bananas) fruit.\n");
becomes:
printf("I ate %d and %d totalling %d fruit.\n", apples, bananas, apples + bananas);
The {%d} syntax is for circumstances when the format specifier needs to be anything other than %s,
which is the default. What goes between the { } is not specified, so this capability
can work with future format specification improvements without needing to update
the core language. It also makes interpolated strings agnostic about what the format
specifications are.
The interpolated string starts as a special string token, InterpolatedString, which is the same as a
DoubleQuotedString but with an i prefix and no StringPostFix.
InterpolatedString:
i" DoubleQuotedCharacters "
The InterpolatedString appears in the parser grammar
as an InterpolatedExpression, which is under PrimaryExpression.
InterpolatedExpression:
InterpolatedString
InterpolatedString StringLiterals
StringLiterals:
StringLiteral
StringLiteral StringLiterals
InterpolatedExpresssions undergo semantic analysis similar to MixinExpression.
The contents of the InterpolatedExpression must conform to the following grammar:
Elements:
Element
Element Elements
Element:
Character
'$$'
'$' Argument
'$' FormatString Argument
FormatString:
'{' FormatString '}'
CharacterNoBraces
CharacterNoBraces:
CharacterNoBrace
CharacterNoBrace CharacterNoBraces
CharacterNoBrace:
characters excluding '{' and '}'
Argument:
Identifier
Expression
Expression:
'(' Expression ')'
CharacterNoParens
CharacterNoParens:
CharacterNoParen
CharacterNoParen CharacterNoParens
CharacterNoParen:
characters excluding '(' and ')'
The InterpolatedExpression is converted to a tuple expression, where the first tuple element
is the transformed string literal and the Arguments form the remainder of the tuple elements.
The transformed string literal is constructed as follows:
If the Element is:
Character, it is written to the output string.'$$', a '$' is written to the output string.'$' Argument, then '%s' is written to the output string.'$' '{' FormatString '}' Argument, thenFormatStringis written to the output string.
The string becomes the first element of the tuple.
If the Argument is an Identifier, it is appended to the tuple as an IdentifierExpression.
If the Argument is an Expression, it is lexed and parsed (including the surrounding parentheses)
like MixinExpressions and appended to the tuple as an Expression.
Compile-time errors will be generated if the Elements do not fit the grammar.
In order to facilitate convenient formatting of long strings, if a StringLiteral follows an InterpolatedString,
it is appended to the InterpolatedString in the parsing pass. (Not the lexer pass.)
A StringPostfix may not appear on any of them.
Such concatenations enable the various string literal types to be used as InterpolatedStrings,
such as:
i""q{apples and $("bananas")}
yielding a tuple expression:
"apples and %s", "bananas"
Interpolated string formats cannot be mixed with conventional elements:
string tool = "hammer";
writefln(i"hammering %s with $tool", "nails");
will produce:
string tool = "hammer";
writefln("hammering %s with %s", tool, "nails");
as the interpolated arguments will always come first, and out of order with the conventional arguments. This error is not detected.
No attempt is made to diagnose format specification errors, such as attempting to format an integer as a floating point value. The meaning of the format specifications is unknown to the core language.
wchar and dchar interpolated strings are not allowed. These can be added
later if necessary, but the use of such literals
has turned out to be rare. They can always be formed with a cast if desired.
Since the interpolated string is a new token, no existing code is broken.
- [DIP 1027---String Interpolation---Community Review Round 1] (https://digitalmars.com/d/archives/digitalmars/D/DIP_1027---String_Interpolation---Community_Review_Round_1_333503.html)
Copyright (c) 2019 by the D Language Foundation
Licensed under Creative Commons Zero 1.0
The review generated a tremendous volume of feedback and discussion. Key points that were raised include:
- the choice of
%instead of something more familiar from other language, like$. Related, the lowering of%%to%was raised multiple times as being problematic. The DIP author modified the proposal to use$instead. - the requirement of the
iprefix. The DIP author explained that an interpolated string has to be a separate token. - what about concatenating multiple strings to the interpolated string with
~? The DIP author explained that this presents a problem during semantic analysis. He said it can be solved by allowing automatic concatenation of any strings following an interpolated string. Although such behavior is deprecated for string literals, and interpolated string is not treated as a string literal. - the choice of lowering an interpolated string to a tuple of format string + arguments as opposed to a tuple of strings and arguments. The DIP author responded that he didn't see the point.
- why not lower to a function call? The DIP author responded that functions currently can't return tuples and it wouldn't work with
printf. - why not a library solution? The DIP author said he tried, but it requires an awkward syntax.
- why no assignment to string? The DIP author presented several reasons why this was undesirable, including: requires GC allocation, worse performance due to intermediate buffer, will not work with BetterC, and more.
- the lowering of
%{FormatString}only works withprintfandwritef. The DIP author decided to change it to simply{FormatString}, e.g.,{%d}instead of%{d}, and treating everything in the braces as a whole string rather than as a suffix to%.
Much of the discussion revolved around the syntax and the choice of lowering to a tuple of format string and arguments, and questions about interaction with other language features (e.g., templates, token strings). The DIP author's position, established over several comments, was essentially:
- tuples can easily be manipulated to suit the user's needs
- the implementation must be compatible with BetterC, meaning
printfand similar C functions - the implementation must not trigger GC allocations
- the implementation must not depend on Phobos
- the implementation must be performant
Some of the feedback from this review round consisted of items brought up in the previous round. A summary:
- the focus on
printfandwritef. The DIP author again noted that these were just used as examples and the feature is not restricted to those functions. - the compiler should attempt to verify format strings. The DIP author responded that this should not be a core feature of the language, but could be added as an extension in a separate proposal.
- interpolated strings should lower to a struct instead of a tuple
- the DIP should separate specification and implementation details. The DIP author disagreed.
- incorporate the rationale for decisions arising from feedback in the previous review round.
- clarify if interpolated strings can be used in contexts other than function arguments and if new functions can be written to accept them as arugments. The DIP author said the text is already clear that they are lowered to tuples. He added that if the spec is not clear on how tuples should be manipulated, it should be amended rather than the DIP.
- it's a mistake to inject
%into the output when allowing the mixing of conventional specifiers such as%s. The DIP author noted that%is only injected in the default case when replacing${foo}with%sin the output. The user is expected to know the requirements of the target format string and can change the output as needed. - why does
InterpolatedExpressionundergo semantic analysis similar toMixinExpression? - the "Concatenation" section needs more detail.
- ambiguous grammar.
- what about nested interpolated strings?
- what about compatibility with the
c,w, anddsuffixes? - more discussion of design goals, comparison with other languages and proposals.
- a proposal to lower to a templated library functions rather than a tuple. The DIP author disagreed.
The language maintainers rejected this proposal due to overwhelming negative feedback in the two review rounds. Many reviewers were unhappy with the proposed tuple-based implementation. Some would prefer an implementation that resolves to a library template, but the language maintainers were not keen on such an implementation, seeing it as an equivalent of AST macros. The lack of consensus regarding the syntax was also a factor in their decision.