Skip to content

Latest commit

 

History

History
268 lines (196 loc) · 11.9 KB

DIP1027.md

File metadata and controls

268 lines (196 loc) · 11.9 KB

String Interpolation

Field Value
DIP: 1027
Review Count: 2
Author: Walter Bright walter@digitalmars.com
Implementation:
Status: Rejected

Abstract

Instead of requiring a format string followed by an argument list, string interpolation enables embedding the arguments in the string itself.

Contents

Rationale

While the conventional format string followed by the argument list is fine for short strings and a small number of arguments, it tends to break down with longer strings that have many arguments. Omitted arguments, extra arguments, and mismatches between format specifiers and their corresponding arguments are common errors. Embedding arguments in the format strings can eliminate these errors. Readability is improved and the code is visually easier to review for correctness.

Prior Work

Description

writefln(i"I ate $apples and ${%d}bananas totalling $(apples + bananas) fruit.");

gets rewritten as:

writefln("I ate %s and %d totalling %s fruit.", apples, bananas, apples + bananas);

This will also work with printf:

printf(i"I ate ${%d}apples and ${%d}bananas totalling ${%d}(apples + bananas) fruit.\n");

becomes:

printf("I ate %d and %d totalling %d fruit.\n", apples, bananas, apples + bananas);

The {%d} syntax is for circumstances when the format specifier needs to be anything other than %s, which is the default. What goes between the { } is not specified, so this capability can work with future format specification improvements without needing to update the core language. It also makes interpolated strings agnostic about what the format specifications are.

The interpolated string starts as a special string token, InterpolatedString, which is the same as a DoubleQuotedString but with an i prefix and no StringPostFix.

InterpolatedString:
   i" DoubleQuotedCharacters "

The InterpolatedString appears in the parser grammar as an InterpolatedExpression, which is under PrimaryExpression.

InterpolatedExpression:
   InterpolatedString
   InterpolatedString StringLiterals

StringLiterals:
   StringLiteral
   StringLiteral StringLiterals

InterpolatedExpresssions undergo semantic analysis similar to MixinExpression. The contents of the InterpolatedExpression must conform to the following grammar:

Elements:
    Element
    Element Elements

Element:
    Character
    '$$'
    '$' Argument
    '$' FormatString Argument

FormatString:
    '{' FormatString '}'
    CharacterNoBraces

CharacterNoBraces:
    CharacterNoBrace
    CharacterNoBrace CharacterNoBraces

CharacterNoBrace:
    characters excluding '{' and '}'


Argument:
    Identifier
    Expression

Expression:
    '(' Expression ')'
    CharacterNoParens

CharacterNoParens:
    CharacterNoParen
    CharacterNoParen CharacterNoParens

CharacterNoParen:
    characters excluding '(' and ')'

The InterpolatedExpression is converted to a tuple expression, where the first tuple element is the transformed string literal and the Arguments form the remainder of the tuple elements.

The transformed string literal is constructed as follows:

If the Element is:

  • Character, it is written to the output string.
  • '$$', a '$' is written to the output string.
  • '$' Argument, then '%s' is written to the output string.
  • '$' '{' FormatString '}' Argument, then FormatString is written to the output string.

The string becomes the first element of the tuple.

If the Argument is an Identifier, it is appended to the tuple as an IdentifierExpression. If the Argument is an Expression, it is lexed and parsed (including the surrounding parentheses) like MixinExpressions and appended to the tuple as an Expression.

Compile-time errors will be generated if the Elements do not fit the grammar.

Concatenations

In order to facilitate convenient formatting of long strings, if a StringLiteral follows an InterpolatedString, it is appended to the InterpolatedString in the parsing pass. (Not the lexer pass.) A StringPostfix may not appear on any of them. Such concatenations enable the various string literal types to be used as InterpolatedStrings, such as:

i""q{apples and $("bananas")}

yielding a tuple expression:

"apples and %s", "bananas"

Limitations

Mixing Conventional Format Arguments With Interpolated Strings

Interpolated string formats cannot be mixed with conventional elements:

string tool = "hammer";
writefln(i"hammering %s with $tool", "nails");

will produce:

string tool = "hammer";
writefln("hammering %s with %s", tool, "nails");

as the interpolated arguments will always come first, and out of order with the conventional arguments. This error is not detected.

Format Specification Errors

No attempt is made to diagnose format specification errors, such as attempting to format an integer as a floating point value. The meaning of the format specifications is unknown to the core language.

W and D Interpolated Strings

wchar and dchar interpolated strings are not allowed. These can be added later if necessary, but the use of such literals has turned out to be rare. They can always be formed with a cast if desired.

Breaking Changes and Deprecations

Since the interpolated string is a new token, no existing code is broken.

Reference

Copyright & License

Copyright (c) 2019 by the D Language Foundation

Licensed under Creative Commons Zero 1.0

Reviews

Community Review Round 1

Reviewed Version

Discussion

The review generated a tremendous volume of feedback and discussion. Key points that were raised include:

  • the choice of % instead of something more familiar from other language, like $. Related, the lowering of %% to % was raised multiple times as being problematic. The DIP author modified the proposal to use $ instead.
  • the requirement of the i prefix. The DIP author explained that an interpolated string has to be a separate token.
  • what about concatenating multiple strings to the interpolated string with ~? The DIP author explained that this presents a problem during semantic analysis. He said it can be solved by allowing automatic concatenation of any strings following an interpolated string. Although such behavior is deprecated for string literals, and interpolated string is not treated as a string literal.
  • the choice of lowering an interpolated string to a tuple of format string + arguments as opposed to a tuple of strings and arguments. The DIP author responded that he didn't see the point.
  • why not lower to a function call? The DIP author responded that functions currently can't return tuples and it wouldn't work with printf.
  • why not a library solution? The DIP author said he tried, but it requires an awkward syntax.
  • why no assignment to string? The DIP author presented several reasons why this was undesirable, including: requires GC allocation, worse performance due to intermediate buffer, will not work with BetterC, and more.
  • the lowering of %{FormatString} only works with printf and writef. The DIP author decided to change it to simply {FormatString}, e.g., {%d} instead of %{d}, and treating everything in the braces as a whole string rather than as a suffix to %.

Much of the discussion revolved around the syntax and the choice of lowering to a tuple of format string and arguments, and questions about interaction with other language features (e.g., templates, token strings). The DIP author's position, established over several comments, was essentially:

  • tuples can easily be manipulated to suit the user's needs
  • the implementation must be compatible with BetterC, meaning printf and similar C functions
  • the implementation must not trigger GC allocations
  • the implementation must not depend on Phobos
  • the implementation must be performant

Final Review

Reviewed Version

Discussion

Feedback

Some of the feedback from this review round consisted of items brought up in the previous round. A summary:

  • the focus on printf and writef. The DIP author again noted that these were just used as examples and the feature is not restricted to those functions.
  • the compiler should attempt to verify format strings. The DIP author responded that this should not be a core feature of the language, but could be added as an extension in a separate proposal.
  • interpolated strings should lower to a struct instead of a tuple
  • the DIP should separate specification and implementation details. The DIP author disagreed.
  • incorporate the rationale for decisions arising from feedback in the previous review round.
  • clarify if interpolated strings can be used in contexts other than function arguments and if new functions can be written to accept them as arugments. The DIP author said the text is already clear that they are lowered to tuples. He added that if the spec is not clear on how tuples should be manipulated, it should be amended rather than the DIP.
  • it's a mistake to inject % into the output when allowing the mixing of conventional specifiers such as %s. The DIP author noted that % is only injected in the default case when replacing ${foo} with %s in the output. The user is expected to know the requirements of the target format string and can change the output as needed.
  • why does InterpolatedExpression undergo semantic analysis similar to MixinExpression?
  • the "Concatenation" section needs more detail.
  • ambiguous grammar.
  • what about nested interpolated strings?
  • what about compatibility with the c, w, and d suffixes?
  • more discussion of design goals, comparison with other languages and proposals.
  • a proposal to lower to a templated library functions rather than a tuple. The DIP author disagreed.

Formal Assessment

The language maintainers rejected this proposal due to overwhelming negative feedback in the two review rounds. Many reviewers were unhappy with the proposed tuple-based implementation. Some would prefer an implementation that resolves to a library template, but the language maintainers were not keen on such an implementation, seeing it as an equivalent of AST macros. The lack of consensus regarding the syntax was also a factor in their decision.