Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String Interpolation #140

Closed
wants to merge 30 commits into from
Closed

String Interpolation #140

wants to merge 30 commits into from

Conversation

ghost
Copy link

@ghost ghost commented Jan 23, 2019

No description provided.

Jonathan Marler and others added 30 commits December 7, 2018 08:30
Added some links and implementation details
Need to support arbitrary expressions
The proposed "lazy alias" feature is unnecessary, and its presence in
the DIP distracts readers from the main focus: string sequence literals.

To achieve the same results without language changes, it suffices in
almost all cases to wrap the string sequence literal in an explicit
AliasSeq instantiation, or a call to std.typecons.tuple:

    alias ctSeq = AliasSeq!(i"hello, {name}");
    auto rtSeq = tuple(i"a + b = ${a+b}");

If lazy evaluation is desired, it can be opted-into explicitly using
existing language features (lambdas, lazy parameters).
As @pbackus mentioned in #5, `std.typecons.tuple` can be used as an alternative to support expressions:
```
import std.typecons;
auto foo = tuple(i"a+1 is $(a+1).");
writeln(foo.expand);
```
Remove "expressions" section
@12345swordy
Copy link

12345swordy commented Jan 29, 2019

Awkward syntax

Why is it awkward? Examples are badly needed here. The dip needs an actual library to compare with instead of an hypothetical one.

Bad performance

This needed actual benchmark data to say that it has bad performance.

Depends on a library for a trivial feature

So what? Complex numbers depends on a library. Not a good con.

Cannot be used with betterC

You need to explain why. An actual implementation of an library would be useful here in demonstrating this.

@ghost
Copy link
Author

ghost commented Jan 29, 2019

@mdparker
Copy link
Member

Just wanted to let you know I've seen this. It will be some time yet before I get to it, but please take that time to encourage folks to look over it for draft review.

@quickfur
Copy link
Member

Also, a precise description of exactly how the compiler is to interpolate the string is required. Even though we already have an implementation, this still needs to be spelled out exactly in detail (e.g., what characters may follow a $, what to do if the identifier clashes with subsequent characters that shouldn't be part of the identifier, how to include a literal $ in the string, are complex expressions like array lookups allowed, what their syntax is, etc.).

I've rather limited time on my hands, but if I get some free time I could try to throw together a draft of such a description. But this needs some serious fleshing out if it's to stand any chance of being accepted by Walter & Andrei.

@quickfur
Copy link
Member

Also, there should be more detailed examples as to what to do with the resulting tuple / sequence of tokens. A sample implementation of string interpolation is a good basic example (I see it's already implied in the call to text, but this needs to be explained more explicitly and clearly -- and with more examples of why current syntax sux and why this proposal is better. Like examples with mixins, HTML interpolation, form letter interpolation, what-have-you).

But it would do wonders if we included a more powerful example, like Steven's database example, accompanied with a detail explanation of just why it's so awesome. Any other possible applications should also be listed -- otherwise it would sound weak ("fancy language feature with limited applications").

@thewilsonator
Copy link
Contributor

This should give an example of interpolated qstrings.

@andre2007
Copy link

In the examples 3 different styles are used:
${foo}
$foo
$(foo)
You can list them at the beginning and say there are different possibilities. For readability the examples should all use the same style.

@quickfur
Copy link
Member

quickfur commented Feb 5, 2019

In the examples 3 different styles are used:
${foo}
$foo
$(foo)
You can list them at the beginning and say there are different possibilities. For readability the examples should all use the same style.

And moreover, there should be a section clearly describing these different syntaxes as different possibilities, rather than implying that all of them must be supported simultaneously. Or, if the latter was intended, it should be clearly stated and not merely left up to implication.

@mdparker
Copy link
Member

@Jash11 Please edit the status field to "Draft". Thanks!

Copy link
Contributor

@ntrel ntrel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Presumably the user has to also import std.conv : text. This is a significant downside compared to other interpolated strings in other languages - creating a new string is the common case, not creating a sequence. It would be better if the interpolated string expression was actually a struct instance that implicitly converts to a value sequence instead of being just a value sequence, and also has built-in fields something like:
U u; V v;
// string s = i"u = $u, v = $v".text;
string s = struct {
  alias expand this;
  alias expand = __dmdseq("u = ", u, ", v = ", v);
  alias text = {import std.conv : text; return text(expand);}`;
  enum literalLength = "u = ".length + ", v = ".length;
}().text;

The text import is only made if the alias is used. (Having a dependency on Phobos for a language feature has precedence - a ^^ b lowers to std.math.pow IIRC). It would also be useful if the struct had a name so template code can recognise that interpolated arguments have been passed rather than just variadic arguments.

Bonus points if you can do i"v = $v".text!myAllocator.

Having literalLength could help the implementation of e.g. text to avoid having to sum the length of each string fragment to avoid reallocations when assembling a long query string. (With format("u = %s, v = %s", u, v), the first approximation for string length is just arg1.length). In any case, using a struct instance over just a value sequence gives us more opportunity for enhancements.

  1. Re: Library implementation 'awkward syntax': A simpler more general feature than this DIP would be to have $templateName(args) mean mixin(templateName!args). Then $i("v = $v") syntax reads OK. The i template would need to be in object.d.


`db.exec("UPDATE Foo SET a = ?, b = ?, c = ?, d = ? WHERE id = ?", aval, bval, cval, dval, id);`<br>
Becomes:<br>
`db.exec(i"UPDATE Foo SET a = $(aval), b = $(bval), c = $(cval), d = $(dval) WHERE id = $(id)");`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not equivalent, the implementation of exec has to change to receive interleaved strings and variables.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, and actually, I would never, as a library author, write a library function that works with i"..." as it is defined right now because it would be so easy to do wrong.

What would stop you from writing db.exec("UPDATE foo SET a = ", a)? Well, nothing, and it would work fine, but just kinda coincidentally.

But what about db.exec(i"UPDATE $(tablePrefix)table SET a = ?", 10). OK, it is fair to say "don't do that", but there's no way to tell, in the library, that they did!

It translates to db.exec("UPDATE ",tablePrefix,"table SET a = ?", 10). At run time, you can issue a sql syntax error, but it is impossible to tell the difference at compile time.

I hate to be the downer, cuz I basically like this proposal, but I still think we should be making this thing give a new type. Instead of a naked tuple, do an anonymous struct. Change text and writeln etc. to recognize this new kind of struct, or at the usage site, use .tupleof.

`i"foo $(a)".tupleof.whatever is still an option... but then library authors who actually want to work with the details have them available. (and we should probably change text and writeln so those just work first anyway.)

Copy link
Contributor

@ntrel ntrel May 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think we should be making this thing give a new type. Instead of a naked tuple, do an anonymous struct

Amen. We need to be able to tell on the callee side if an interpolated sequence was passed vs just variadic args. And if it's a struct we can also have a text method wrapping std.conv.text, so we can actually get a string easily without having to keep adding a local import every time we use interpolated strings. Getting a string is a major use case, however nice parsing a sequence of elements is for efficiency.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I can live with the convenience .text method. (i almost want to call it .toString to leverage existing things that call that name, but whatever, i don't care that much about the name).

I am slightly disgusted at having a magic struct created by the compiler include such a reference, but I think it works and the compiler already magically references std.math in syntax, so I can live with it. I'd just make its magic .text thing be a zero-arg template, so you don't pay for it if it isn't actually called.

I think I will write more about this in today's "this week in D"; I need to write that in a couple hours anyway and that gives me some content to dual-purpose since i missed most of dconf lol

I'll link it back here in a while. It might be a fairly different implementation though, since a magic struct cannot - I think anyway - be done with just a lexer hack. Even if you wrapped it in a call syntax to punt the details to the runtime library, you'd actually lose some functionality. (consider: foo!i"$(some_alias)". If that is passed to __d_is(some_alias), the aliasness is gone. On the other hand, foo(i"$(a + 4)") being passed to __d_is!(a+4) is liable to fail with "variable a is not accessible at compile time". So I don't think they both can work with a library function injected in the middle. Now, if I had to choose, I think the latter case is more important and I'd sacrifice the former case, but with the current PR, both work, and I think that is actually kinda cool. So I wanna preserve that. And a compiler-generated magic struct can do it.)

Anyway, I'll write more in a few hours.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, since we have alias this, we can default to expanding the interpolated struct into a tuple while still giving access to methods to work with the interpolated data structure.

writeln(i"1 + 2 is $(1+2)");

// don't need to import std.conv.text because .text is just a member of the
// interpolated string type
audo s = "1 + 2 is $(1 + 2)".text;

I'd like to know where @WalterBright and @andralex stand on this. Since they are the ones to decide what design will be accepted, I'd like to get their input on this particular aspect. But I think I agree with you on this one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake, I forgot that Tuple doesn't automatically expand to a "tuple", that's kind of the whole reason why Tuple exists :)

In any case I do see alot of versatility in using the structure. It's a very good idea and I'm on board with it. It does make a few use cases harder by sometimes requiring the user to use tupleof and/or requiring a wrapper template, but a small cost compared to the benefits. There will likely only be a handful of functions that need to accept interpolated strings. Though, I would like to be able to do things like:

// contrived example, you wouldn't actually do it this way
writeln(i"foo is $(foo) and bar is ", bar, i" and baz is $(baz)");

This would mean I don't think we could just use a wrapper, but would need to add a static if inside writeln to handle interpolated strings to automatically expand them.

And yes, having a toString(sink) would also be good.

Thanks for taking the time to write this up, I like it alot. I'll think about it and may update my implementation to it.

Copy link
Contributor

@marler8997 marler8997 May 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more comment on your interpolation proposal. I think a big use case for interpolated strings is to make mixins much easier to write/maintain. Using q{} string literals along with interpolated strings is a big win. You mentioned in your article that you don't think they should be supported, and provided an example that you thought was ambiguous. However, it's currently well-defined in my implementation.

There's 2 ways you could go about string interpolation. You could either parse the interpolated string at the same time you are you are parsing the string literal, or you could first parse the string literal and then interpolate it afterwards. I opted for the second design. With this model, it's easier to determine what will happen in cases like yours because the string is processed by each part one at a time. So in your example:

mixin(iq{
   string s = i"foo $(bar)";
});
  1. The token string is parsed. Token strings don't have escapes so it stays the same.
  2. Then the interpolater runs. It ignores everything except when it sees a '$' character. The rule is made intentionally simple so a developer will be able to determine what will happen. The the string above will become:
mixin(__d_interpolatedString(`
   string s = i"foo `, bar, `";
`));
  1. Now mixin gets involved, of which interpolatedString support should be added, and it will convert bar to a string. Let's assume its string representation is "barvalue".
mixin(`
   string s = i"foo barvalue";
`));
   string s = i"foo barvalue";
  1. Now the string literal parser and interpolator run again which gives us:
     string s = __d_interpolatedString("foo barvalue");

Now if you wanted the other behavior where the $(bar) expression is expanded later, then you would escape the $ with $$:

mixin(iq{
   string s = i"foo $$(bar)";
});

Will become:

   string s = i"foo $(bar)";

Which becomes:

     string s = __d_interpolatedString("foo ", bar);

So by default, any $(...) expressions are always expanded by the initial interpolation, and if you want to delay expansion then you can escape the dollar. The interpolation parser doesn't try to understand the characters in between the '$' expressions and escape quoted sections etc... that just ends up causing more confusion and will cause the developer headaches when they try to figure out how to get what they want from their interpolated strings.

Copy link
Contributor

@marler8997 marler8997 May 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, double quoted strings have the same issue. Your example doesn't only apply to token strings:

mixin("
   string s = i\"foo $(bar)\";
");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I think it is trivial with double-quoted strings because it is all syntax highlighted as a big string :) with token strings, the D lexer runs inside too so you might think it is different.

But that said, your reasoning is solid and simple to understand, so I could go with that too. (just gotta be sure we mention all these things - and explicitly mention what we decide to exclude - to get past the Walter barrier :) )

And also I guess mixin should probably be able to handle the interpolated thing too. The alias this could handle that, so could an explicit call to toString (or text), but we should prolly document it regardless.

PS this is a kinda weird place to be having this conversation, as comments on a random line of example code :P but whatever.

Copy link
Contributor

@marler8997 marler8997 May 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using alias this with toString would mean that using mixins with interpolated strings would require phobos.

enum SomeValue = 1234;
mixin(iq{enum AnotherValue = $(SomeValue + 1);});

Seems wrong to me....I'll have to think about this one.

Token:
...
StringLiteral
i StringLiteral
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense to have an interpolated WysiwygString, it isn't WYSIWYG. You also can't have escapes so you can't have a $ character.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way the DIP should specify what happens.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string literal itself is still WYSIWYG. It's just that afterwards, it's processed by the interpolator. You can think of the interpolator like an operator and/or function call. For example, if you passed a WYSIWYG string to the "toUpper" function, the string wouldn't by WYSIWYG.

src/build.d:556:<br>
`auto hostDMDURL = "http://downloads.dlang.org/releases/2.x/"~hostDMDVer~"/dmd."~hostDMDBase;`<br>
Becomes:<br>
`auto hostDMDURL = i"http://downloads.dlang.org/releases/2.x/$hostDMDVer/dmd.$hostDMDBase".text;`<br>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better example would be to use a non-string variable:

import std.conv : to;
string folder = "releases/" ~ dmd_major_version.to!string ~ ".x";
// no `to` import needed (but `text` might be)
string folder = "releases/$dmd_major_version.x".text;

@thewilsonator
Copy link
Contributor

The author seems to have deleted his GH account?

@marler8997
Copy link
Contributor

I would take it over, but alas, it would be irresponsible for me to invest so much into yet another DIP when the process is so broken. Hopefully the leadership takes our feedback and fixes the process. I would love to create a good DIP and work with the community to fine tune this feature.

@12345swordy
Copy link

12345swordy commented Mar 8, 2019

The author seems to have deleted his GH account?

...The hell!? Why am I not surprise by this? It quite clear that we need an issue tracker to track fundamental language issues that people have with the language as not everyone have the time/skill to write an DIP and wait through the 180+ response. There are two advantages to this approach:

  1. Track the popularity of an given issue.
  2. Allow A&W to shoot down any ideas that are bad.

This is no means a replacement for the DIP. This allow easy information gathering when writing a DIP.

@marler8997
Copy link
Contributor

@12345swordy sounds like a fine idea. Maybe you should write a DIP? jk :)

@WalterBright
Copy link
Member

@marler8997 I suggest you reboot this as a new DIP to match your proposal. It seems to be the easiest way forward.

@adamdruppe
Copy link
Contributor

adamdruppe commented May 14, 2019 via email

@marler8997
Copy link
Contributor

I don't really have an issue that it's using phobos necessarily. More an issue that it's hard coded to use std.conv.text, or whatever. Maybe we could make that configurable in object.d?

// somewhere in object.d
// make sure this a template so it doesn't try to import anything unless it's used
template __d_valuesToString ()
{
    import std.conv : text;
    alias __d_valuesToString = std.conv.text;
}

@adamdruppe
Copy link
Contributor

adamdruppe commented May 14, 2019 via email

@marler8997
Copy link
Contributor

Gotcha, I'm good with that.

@mdparker
Copy link
Member

Given that the DIP author's account has been deleted and Walter has submitted an alternative proposal, I'm closing this.

@mdparker mdparker closed this Sep 26, 2019
@marler8997
Copy link
Contributor

Can you link to his proposal?

@mdparker
Copy link
Member

#165

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.