Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why isn't the new System.FormattableString a struct? #518

Closed
justinvp opened this issue Feb 15, 2015 · 8 comments
Closed

Why isn't the new System.FormattableString a struct? #518

justinvp opened this issue Feb 15, 2015 · 8 comments
Labels
Area-Language Design Language-C# Language-VB Question Resolution-Answered The question has been answered Tenet-Performance Regression in measured performance of the product from goals. Verification Not Required
Milestone

Comments

@justinvp
Copy link

Why isn't the new System.FormattableString type a struct?

It's currently a class. This means that formatting a string with the invariant culture using the new string interpolation language feature and the built-in FormattableString.Invariant method results in an unnecessary allocation of the System.FormattableString class.

For example:

float latitude = 47.644961f;
float longitude = -122.130690f;
FormattableString.Invariant($"{{ lat = {latitude}; lon = {longitude} }}");

Compiles to:

float latitude = 47.644961f;
float longitude = -122.130690f;
FormattableString.Invariant(FormattableStringFactory.Create("{{ lat = {0}; lon = {1} }}", new object[] { latitude, longitude }));

The call to System.Runtime.CompilerServices.FormattableStringFactory.Create allocates a new instance of the FormattableString class.

(Not to mention the object[] array and boxing allocations; I'll get back to those.)

In the comments of a related issue (#506), I asked why FormattableString isn't a struct.

@gafter replied:

We wanted to allow specialized subclasses that store, for example, 1, 2, or 3 arguments without allocating an array. That is not done in the current version but we expect to do that.

The compiler uses a class called FormattableStringFactory to create the instances using a static factory method Create which is selected by overload resolution.

In such specialized subclasses, you would be able to avoid the array allocation, but in exchange you have to allocate a new FormattableString subclass.

I would have liked to see something like the following:

namespace System
{
    public struct FormattableString : IEquatable<FormattableString>, IFormattable
    {
        private readonly string _format;
        private readonly object[] _arguments;

        public FormattableString(string format, params object[] arguments)
        {
            if (format == null)
                throw new ArgumentNullException("format");
            if (arguments == null)
                throw new ArgumentNullException("arguments");

            _format = format;
            _arguments = arguments;
        }

        public string Format
        {
            get { return _format; }
        }

        public object[] Arguments
        {
            get { return _arguments; }
        }

        public override bool Equals(object obj)
        {
            return obj is FormattableString && Equals((FormattableString)obj);
        }

        public bool Equals(FormattableString other)
        {
            return _format == other._format && _arguments == other._arguments;
        }

        public static bool operator ==(FormattableString left, FormattableString right)
        {
            return left.Equals(right);
        }

        public static bool operator !=(FormattableString left, FormattableString right)
        {
            return !left.Equals(right);
        }

        public override int GetHashCode()
        {
            return (null == _format ? 0 : _format.GetHashCode())
                 ^ (null == _arguments ? 0 : _arguments.GetHashCode());
        }

        public override string ToString()
        {
            return string.Format(_format, _arguments);
        }

        public string ToString(IFormatProvider formatProvider)
        {
            return string.Format(formatProvider, _format, _arguments);
        }

        string IFormattable.ToString(string ignored, IFormatProvider formatProvider)
        {
            return string.Format(formatProvider, _format, _arguments);
        }

        public static string Invariant(FormattableString formattable)
        {
            return formattable.ToString(CultureInfo.InvariantCulture);
        }
    }
}

With this, FormattableString.Invariant can be used without allocating FormattableString on the heap.

The implementation of System.Runtime.CompilerServices.FormattableStringFactory becomes the following (which isn't all that useful now):

namespace System.Runtime.CompilerServices
{
    public static class FormattableStringFactory
    {
        public static FormattableString Create(string format, params object[] arguments)
        {
            return new FormattableString(format, arguments);
        }
    }
}

Instead of relying on System.Runtime.CompilerServices.FormattableStringFactory.Create overloads to construct specialized instances, why not allow any type that implements IFormattable and do overload resolution on the type's constructors? Along the lines of how the collection initializer feature works (where a type must implement IEnumerable and have a public Add method), this would work with any type that implements IFormattable and has a constructor that takes a string format argument followed by one or more arguments.

To avoid the array and boxing allocations, just like we have Action, Action<T>, Action<T1, T2>, Action<T1, T2, T3>, etc., there could be a few built-in generic struct types (e.g. FormattableString<T>, FormattableString<T1, T2>, and FormattableString<T1, T2, T3>) along with similar generic overloads to String.Format, as an optimization for common number of arguments.

Such a design has other benefits.

For example, logging libraries such as NLog use the following pattern:

public class Logger
{
    public void Info<TArgument>(string message, TArgument argument)
    { 
        if (this.IsInfoEnabled)
        {
            this.Write(LogLevel.Info, string.Format(message, new object[] { argument }));
        }
    }

    public void Info<TArgument1, TArgument2>(string message, TArgument1 argument1, TArgument2 argument2)
    { 
        if (this.IsInfoEnabled)
        {
            this.Write(LogLevel.Info, string.Format(message, new object[] { argument1, argument2 }));
        }
    }

    public void Info<TArgument1, TArgument2, TArgument3>(string message, TArgument1 argument1, TArgument2 argument2, TArgument3 argument3)
    { 
        if (this.IsInfoEnabled)
        {
            this.Write(LogLevel.Info, string.Format(message, new object[] { argument1, argument2, argument3 }));
        }
    }
}

Usage:

long id = 1;
int foo = 500;
_logger.Info("Id: {0}. Something happened: {1}", id, foo);

The whole purpose of these methods is to avoid the array, boxing, and string.Format allocations when the Info log level isn't enabled, making the logging functionality "pay-for-play".

With the existing design of FormattableString as a class, you couldn't really leverage the new string interpolation feature with allocation-conscious logging APIs like this.

However, with the suggested design change, you would be able to implement this and avoid the allocations (the array and boxing allocations could be eliminated altogether):

public class Logger
{
    public void Info<TArgument>(FormattableString<TArgument> formattable)
    { 
        if (this.IsInfoEnabled)
        {
            this.WriteToTargets(LogLevel.Info, formattable.ToString());
        }
    }

    public void Info<TArgument1, TArgument2>(FormattableString<TArgument1, TArgument2> formattable)
    { 
        if (this.IsInfoEnabled)
        {
            this.WriteToTargets(LogLevel.Info, formattable.ToString());
        }
    }

    public void Info<TArgument1, TArgument2, TArgument3>(FormattableString<TArgument1, TArgument2, TArgument3> formattable)
    { 
        if (this.IsInfoEnabled)
        {
            this.WriteToTargets(LogLevel.Info, formattable.ToString());
        }
    }
}

Which would allow string interpolation feature to be used without sacrificing any of the optimizations to avoid allocations.

long id = 1;
int foo = 500;
_logger.Info($"Id: {id}. Something happened: {foo}");

I really hope its not too late to consider a design tweak here, otherwise we have to live with FormattableString as a class in the System namespace and FormattableString.Invariant helper as-is forever.

I'd be happy to help with any and all changes to roslyn and coreclr (implementation, tests, documentation, etc.) to make something like this happen.

/cc @gafter, @stephentoub, @KrzysztofCwalina, @terrajobst

@gafter
Copy link
Member

gafter commented Feb 15, 2015

The current design was intended to allow specialization of the factories and subclasses of FormattableString, by making a generic versions of the factory accepting 1, 2, and 3 arguments but without the conceptual surface area of additional API elements of your proposal.

@justinvp
Copy link
Author

It's really unfortunate that the current design always comes with the cost of an object allocation, just to minimize the API surface area.

This is difficult to workaround with the current design. Developers would have to include their own custom implementation of System.Runtime.CompilerServices.FormattableStringFactory inside their own codebase (making sure the FormattableStringFactory referenced from mscorlib/System.Runtime isn't used by the compiler). There's no simple way to create an API that accepts interpolated strings without the allocation (such as in the Logger scenario).

With my proposal, you wouldn't have to include the factory at all because it would do overload resolution on the IFormattable type's constructor. It would satisfy the Invariant scenario without the extra allocation (i.e. invariant on par with current culture). It would enable the Logger scenario.

You could choose to only ship the non-generic FormattableString struct built-in.

But are we really that concerned about the added surface area of FormattableString<T>, FormattableString<T1, T2> and FormattableString<T1, T2, T3>? Seems like a very small price to pay in terms of API surface area, especially since these "collapse" into a single type conceptually like Action and Func. These could even be nested types inside the non-generic FormattableString struct.

@gafter gafter added Question Tenet-Performance Regression in measured performance of the product from goals. labels Feb 15, 2015
@gafter
Copy link
Member

gafter commented Feb 15, 2015

@justinvp The shape of the language under your proposal would be very strange. We would have to say that the allowable target types of a conversion from an interpolated string to an instance of the type FormattableString depends on the details of the interpolations... and the caller's code would have to change (that target type) when the interpolations change (i.e. added, deleted, change type, etc). I'd say no, that is not a price we'd be willing to pay in the shape of the language.

I suggest you talk with @AlexGhiondea who drove the tradeoffs resulting in the shape of the API in its current form.

@gafter gafter closed this as completed Feb 15, 2015
@gafter gafter added the Resolution-Answered The question has been answered label Feb 15, 2015
@gafter gafter added this to the 1.0 (stable) milestone Feb 15, 2015
@gafter
Copy link
Member

gafter commented Feb 15, 2015

/cc @jaredpar @nguerrera @ericstj

CC'ing to those who signed off on the shape of the API; you may be interested in this discussion.

@omariom
Copy link

omariom commented Apr 11, 2017

It could be:

public struct FormattableString<T1,T2,T3> // Up to 7?
{
    public FormattableString(string messageFormat, (T1,T2,T3) args)
    {
        MessageFormat = messageFormat;
        Args = args;
    }

    public string MessageFormat { get; }

    public (T1, T2, T3) Args { get; }
}

@ericstj
Copy link
Member

ericstj commented Apr 11, 2017

API was approved in 2014 and its already shipped in numerous frameworks, including .NET Desktop 4.6. I don't think you can change it compatibly. You could add a new API and have the compiler use that if present.

@omariom
Copy link

omariom commented Apr 11, 2017

So sad.. :)

@ericstj
Copy link
Member

ericstj commented Apr 11, 2017

I'm not saying something can't be done. I'm just saying that compatibility has to be a design point in any changes made given that this is an already released API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Language Design Language-C# Language-VB Question Resolution-Answered The question has been answered Tenet-Performance Regression in measured performance of the product from goals. Verification Not Required
Projects
None yet
Development

No branches or pull requests

6 participants