Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestion: optimize codegen for "simple" string interpolation #9212

Closed
jods4 opened this issue Feb 25, 2016 · 19 comments
Closed

Suggestion: optimize codegen for "simple" string interpolation #9212

jods4 opened this issue Feb 25, 2016 · 19 comments
Labels
Area-Compilers Feature Request Resolution-Duplicate The described behavior is tracked in another issue

Comments

@jods4
Copy link

jods4 commented Feb 25, 2016

String interpolation simplicity and clearness makes it very attractive for formatting text.
In fact, at my company I've seen it used for things as trivial as var greeting = $"Hello {name}", which admittedly is only slightly nicer than "Hello " + name.

The thing is: it doesn't have the same performance characteristics. Currently C# converts an interpolated string into a string.Format("Hello {0}", name) call. This has the benefit of being the simplest implementation (from a compiler perspective). But it has quite some overhead: boxing of value type placeholders (such as numbers), parsing of the format string, validation of the parameters, etc.

Most of the time you may not care. But sometimes in hot path code you want to be very careful with those allocations and computations.

I suggest that the compiler could generate more optimized code in at least some (or all?) situations.

It is evident that when you convert to FormattableString or IFormattable you'd need to generate a specialized class for each template, which may be excessive (or not?).

In the common case of interpolating strings, you could turn the parameters into strings at interpolation site, with specialized code, so that value types don't require boxing.
Concatenation of the results could be done with string.Concat, which doesn't require allocating an array for up to 4 parts.

This means that code like string s = $"x: {x}, y: {y}", which is a common use of string interpolation, could generate the IL equivalent of string s = string.Concat("x: ", x.ToString(), ", y: ", y.ToString()) which is probably the most efficient code you could have to create that string.

If the compiler did that work, we would get both benefits of clean syntax and most efficient specialized code.

@alrz
Copy link
Contributor

alrz commented Feb 26, 2016

Related: #194.

@jods4
Copy link
Author

jods4 commented Feb 26, 2016

Sorry about that, I did a search for issues about "interpolation", not sure how I missed this one.

Apparently the idea was dismissed, I will read why later on.

@jods4
Copy link
Author

jods4 commented Feb 26, 2016

I read the other thread and @bbarry seemed to have made the argument that led to closing the issue.

I think not all arguments were convincing. Let me recap some of that thread objections.

First, keep in mind that we could choose to draw the line between what is optimized and what falls back to simpler codegen anywhere we want to. Most interpolations I see in code at work are basic: few parameters, no format, no alignment. Those could easily compile to specialized code.

(1) Thread #194 focuses on StringBuilder and dismissed string.Concat because "it allocates an array". Up to 4 string parts, string.Concat does not need an array.
For 5+ parts you need an array, it's true. Could be a compiler-generated thread-static string[] if you really wanted to but I don't think it's worth the complexity. It's about minimizing allocations, you can't bring them down to 0 for complex interpolations anyway.
Note that if you pool your StringBuffer, acquisition is not free and the final ToString() incurs an additional full result copy (because it's pooled).

(2) Alignment is perceived as a huge show stopper... Just add a helper method to align strings and we're done: RightAlign(x.ToString(format), 30).
This could be optimized further to avoid the additional intermediate string copy, if you really want to: var s = x.ToString(format); var pad = Pad(s.Length, 30); string.Concat("template ", pad, s, "template"). The pad helper could return predefined constants for small lengths.

(3) Boxing can't be removed most of the time. Argument is: when there's a format cast to IFormattable can't be avoided.
I think it could. 90% of the time, parameters of an interpolation either don't have a format or are of a value type. Now in the case of value types with a format: if they implement IFormattable publicly the compiler can take advantage of that. All BCL primitives do so, and most value types should, precisely because a public implementation avoids boxing. Changing a value type to add an explicit implementation of IFormattable when it previously was public would be a breaking change anyway.

I understand MS has finite resources and maybe there are more important problems..
In an ideal world, I see no technical reasons that common pieces of code can't be optimal, such as:

string s;
double x, y;
s = $"x: {x:N2}, y: {y:N2}";
// can become:
s = "x: " + x.ToString("N2") + ", y: " + y.ToString("N2");

string task;
Exception ex;
s = $"Failed {task} with error {ex.Message}";
// can become:
s = "Failed " + task + " with error " + ex.Message;

Such examples are very common and the generated code could be optimal.
If some people choose to write the longer version by hand because of perceived efficiency advantage, it's a fail 😢.

(Micro 😦) benchmark:
https://gist.github.com/mattwarren/d2be775eec5a7adba928

@gafter
Copy link
Member

gafter commented Feb 26, 2016

@jods4
$"{1}" is defined as a call to string.Format("{0}", 1). The spec for string.Format defines this as equivalent to string.Format(CultureInfo.CurrentCulture, "{0}", 1), which is defined as equivalent to

(CultureInfo.CurrentCulture as ICustomFormatter)?.Format(null, 1, CultureInfo.CurrentCulture) ?? (1 is IFormattable) ? (1 as IFormattable).ToString(CultureInfo.CurrentCulture) : 1.ToString()

Of course, since the compiler knows that 1 is of type int, which it can see implements IFormattable, it can simplify this to

(CultureInfo.CurrentCulture as ICustomFormatter)?.Format(null, 1, CultureInfo.CurrentCulture) ?? 1.ToString(CultureInfo.CurrentCulture)

Is that what you're hoping we'll do? I don't see how to get it any simpler than that.

Things are not much better for var greeting = $"Hello {name}":

var greeting = "Hello " + ((CultureInfo.CurrentCulture as ICustomFormatter)?.Format(null, name, CultureInfo.CurrentCulture) ?? name)

@jods4
Copy link
Author

jods4 commented Feb 26, 2016

@gafter I was unaware that all formatting operations look whether CurrentCulture is a ICustomFormatter before doing anything else. At least I learned something, not sure I'll ever use it though ;)

This is indeed a roadblock for generating efficient formatting code.

Idea: given that almost all code runs with the built-in CultureInfo and those do not implement ICustomFormatter the compiler could generate a quick check and bail out to string.Format. Maybe the IL equivalent of:

string result = CultureInfo.CurrentCulture is ICustomFormatter ?
  string.Format("Hello {0}", name) :
  "Hello " + name;

I admit that it's getting complicated 😞
But perf-wise it would be optimal code most of the time, with just an additional typecheck.

@leppie
Copy link
Contributor

leppie commented Feb 26, 2016

@jods4 A good example is using the currency formatter.

@jods4
Copy link
Author

jods4 commented Feb 26, 2016

@leppie can you elaborate? I'm curious and not sure what you refer to.

@leppie
Copy link
Contributor

leppie commented Feb 26, 2016

@jods4 $"{100.0:C2}". Dates, decimals also.

@jods4
Copy link
Author

jods4 commented Feb 26, 2016

Oh sure. I use those all the time (numbers and dates mostly).
And they can convert to (100.0).ToString("C2").
This works because double implicitly implements IFormattable and this doesn't require boxing.

The feature I didn't know about was that string.Format and co. will first look at CultureInfo.CurrentCulture to see if it implements ICustomFormatter. And I checked on my machine, your usual locale does not.

@leppie
Copy link
Contributor

leppie commented Feb 26, 2016

It could take a fast path for string args though if it does not do that already.

@jods4
Copy link
Author

jods4 commented Feb 26, 2016

@leppie
Amazingly @gafter wrote that according to the specs even strings without specified format have to go through ICustomFormatter if there is one.

I'm kind of wondering what a proper usage of that feature would be. I mean: if you want to override formatting for a class you don't have control of (so can't implement IFormattable), you can always pass your own IFormatProvider to the string.Format call. Creating your own CultureInfo to set an ambient override for all formatting calls may seem convenient but also very wrong at several levels.
Anyway, that's the specs so we have to live with it.

@leppie
Copy link
Contributor

leppie commented Feb 26, 2016

@jods4 is is pretty cheap check, so I guess it does not hurt much.

@leppie
Copy link
Contributor

leppie commented Feb 26, 2016

@jods4 Also such optimizations might even be performed by the JIT, or at least it could hint that way.

@alrz
Copy link
Contributor

alrz commented Feb 26, 2016

@leppie On the other hand, you'd have no idea there is a type check when you use string interpolation. kind of wtf.

@bbarry
Copy link

bbarry commented Feb 26, 2016

#6738

In the pull request I initially submitted for this I reached the conclusion that a few helper methods added to the BCL would ultimately save a few instructions worth of time but that this would require an effort across multiple projects both open and closed all to make the compiler more complex.

@jods4
Copy link
Author

jods4 commented Feb 26, 2016

@bbarry Interesting to see the PR as I now can see this is where most of the discussion about this issue has taken place.

Like you (judging by your comments), I feel sad that C# introduces clean, attractive new features that devs quickly adopt (and are sometimes suggested by refactoring tools such as Resharper) to later find out that in perf. critical code you should avoid them (e.g. the Kestrel case you pointed out). It makes me even more sad that the feature is not impossible to implement efficiently (look at Rust formatting) but of course hindsight is always 20/20.

I would agree that trying to generate specialized code for all cases is too complicated.

But taking advantage of edge cases, common code could be (near-)optimal.
By that I mean: (a) bail out at runtime if the current culture is ICustomFormatter; (b) use public IFormattable implementations of sealed classes (incl. all value types such as int, double, DateTime); (c) optimize string; (d) alignment could be inserted as a new String(' ', Math.Max(0, alignment - param.Length)), or maybe bail out at compile-time.

Undeniably this is more complexity in the compiler, but it's isolated (interpolation codegen only) and it would benefit all C# users.
In the end not sure if it's worth it but I see that the topic has been brought up several times before. Including this week by Nick Craver (from StackOverflow) on twitter.

@gafter
Copy link
Member

gafter commented Mar 31, 2016

This was previously considered in #6669, #4678, #194, #76.

@gafter gafter closed this as completed Mar 31, 2016
@gafter gafter added the Resolution-Duplicate The described behavior is tracked in another issue label Mar 31, 2016
@jods4
Copy link
Author

jods4 commented Mar 31, 2016

@gafter I didn't see all those duplicates! Maybe that's a sign that something should be done?

Optimizing common cheap cases at runtime is possible yet complicated.
Another approach could be to tweak the language itself?

Here's one idea:
In .NET BCL, all formatting functions accept a specific culture. E.g. usually when you format "technical" string you are supposed to pass InvariantCulture or OrdinalCulture rather than using the thread culture.
What if C# was extended in some way so that a specific CultureInfo can be passed to string interpolation? Or even just a flag to say: "use InvariantCulture please".
This would:

  1. bring more flexibility to string interpolation, and arguably some parity with string.Format & co.
  2. enable compile-time optimization in the codegen. Constants could be formatted at compile time and the rest could use the optimizations outlined in all those tickets...

I guess the complicated part will be to come up with a good syntax 😒

@ghost
Copy link

ghost commented Mar 29, 2018

Or even just a flag to say: "use InvariantCulture please".

@jods4, this was suggested in dotnet/csharplang#177 using $i syntax.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers Feature Request Resolution-Duplicate The described behavior is tracked in another issue
Projects
None yet
Development

No branches or pull requests

6 participants