System.Text.Formatting

Tore Lervik edited this page Mar 30, 2017 · 5 revisions

System.Text.Formatting APIs are similar to the existing StringBuilder and TextWriter APIs: they format values into text streams and build complex strings. Unlike those APIs, they are optimized for creating text for the Web. They format with fewer GC heap allocations (as little as 1/6 of the allocations in some scenarios) and can format directly to UTF8 streams. This can result in significant performance wins for software that does a lot of text manipulation.

Examples

Hello World!

var formatter = new StringFormatter();
formatter.Append(100); // Int32.ToString() is not called here, or ever
string text = formatter.ToString();

Hello Web!

Stream stream = new MemoryStream(256);
var writer = new StreamFormatter(stream, FormattingData.InvariantUtf8);
writer.Append(100); // this writes UTF8 to the stream without creating UTF16 first
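
For contrast, here is a minimal sketch of the conventional route that the comment above refers to. It uses only the standard System.Text.Encoding class, not the prototype's API, and the class name EncodingComparison is made up for illustration. The value is first turned into a UTF16 string and then transcoded to UTF8, which costs two allocations for three bytes of output.

```csharp
using System;
using System.Text;

public static class EncodingComparison
{
    // Conventional route: Int32.ToString() allocates a UTF16 string,
    // which is then transcoded into a newly allocated UTF8 byte array.
    public static byte[] ViaString(int value)
    {
        string utf16 = value.ToString();
        return Encoding.UTF8.GetBytes(utf16);
    }

    // Size of the same text if it stayed in UTF16 (two bytes per ASCII digit).
    public static int Utf16ByteCount(int value)
    {
        return Encoding.Unicode.GetByteCount(value.ToString());
    }
}
```

For the value 100, ViaString returns the three bytes 0x31 0x30 0x30, while the intermediate UTF16 text occupies six bytes.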

This allocates 2MB

int numbersToWrite = 100000;

var sb = new StringBuilder(numbersToWrite);
for (int i = 0; i < numbersToWrite; i++) {
    sb.Append(i % 10);
}
var text = sb.ToString();

This allocates 400KB

int numbersToWrite = 100000;

var sb = new StringFormatter(numbersToWrite);
for (int i = 0; i < numbersToWrite; i++) {
    sb.Append(i % 10);
}
var text = sb.ToString();

This allocates 100KB

int numbersToWrite = 100000;
Stream stream = new MemoryStream(numbersToWrite); // this does the 100KB allocation
var sb = new StreamFormatter(stream, FormattingData.InvariantUtf8);
for (int i = 0; i < numbersToWrite; i++) {
    sb.Append(i % 10);
}
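
The allocation figures above can be checked on your own machine. The following is a rough sketch rather than a rigorous benchmark: it assumes a runtime that provides GC.GetAllocatedBytesForCurrentThread, and the names AllocationProbe, Measure, and BuildWithStringBuilder are invented for this example. The numbers it reports may differ from the ones quoted on this page, because newer runtimes have changed how StringBuilder formats numbers.

```csharp
using System;
using System.Text;

public static class AllocationProbe
{
    // Returns the number of bytes allocated on the current thread while action runs.
    public static long Measure(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // The StringBuilder loop from the first allocation example.
    public static string BuildWithStringBuilder(int numbersToWrite)
    {
        var sb = new StringBuilder(numbersToWrite);
        for (int i = 0; i < numbersToWrite; i++)
        {
            sb.Append(i % 10);
        }
        return sb.ToString();
    }
}
```

A call such as AllocationProbe.Measure(() => AllocationProbe.BuildWithStringBuilder(100000)) returns the bytes allocated by that loop, including the final string.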

How is that achieved?

In current .NET formatting, StringBuilder (and TextWriter) call value.ToString() on the argument passed to Append, which allocates a string. The characters of this newly allocated string are then copied to the internal buffer of the StringBuilder. StringFormatter instead formats the value directly into its own buffer, using a method similar to the following:

public interface IBufferFormattable {
    /// <summary>
    /// This interface should be implemented by types that want to support allocation-free formatting.
    /// </summary>
    /// <param name="buffer">The buffer to format the value into</param>
    /// <param name="format">This is a pre-parsed representation of the formatting string.</param>
    /// <param name="formattingData">Provides bytes representing digits and symbols.</param>
    /// <param name="written">Return the number of bytes that were written to the buffer</param>
    /// <returns>False if the buffer was too small, otherwise true.</returns>
    bool TryFormat(Span<byte> buffer, Format.Parsed format, FormattingData formattingData, out int written);
}
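
To illustrate what an implementation of such a method might look like, here is a simplified sketch for Int32. It is not the library's code: the class name Utf8IntFormatter is hypothetical, and the sketch drops the Format.Parsed and FormattingData parameters, assuming plain decimal output with ASCII digits. The point is that the digits go straight into the caller's buffer and no string is created.

```csharp
using System;

public static class Utf8IntFormatter
{
    // Writes the decimal digits of value into buffer as UTF8 (ASCII) bytes.
    // Returns false, with written set to 0, when the buffer is too small.
    public static bool TryFormat(int value, Span<byte> buffer, out int written)
    {
        written = 0;
        bool negative = value < 0;
        // Work on the magnitude as uint so that int.MinValue does not overflow.
        uint magnitude = negative ? (uint)(-(long)value) : (uint)value;

        // Count the digits first so we can fail before touching the buffer.
        int digits = 1;
        for (uint rest = magnitude; rest >= 10; rest /= 10) digits++;

        int length = digits + (negative ? 1 : 0);
        if (buffer.Length < length) return false;

        // Fill in the digits from right to left.
        for (int i = length - 1; digits > 0; i--, digits--)
        {
            buffer[i] = (byte)('0' + (magnitude % 10));
            magnitude /= 10;
        }
        if (negative) buffer[0] = (byte)'-';

        written = length;
        return true;
    }
}
```

Returning false instead of throwing lets the caller decide how to obtain a larger buffer and try again.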

And then such methods (for all formattable types) would be called by StringFormatter as follows:

public void Append<T>(T value, Format.Parsed format) where T : IBufferFormattable
{
    int bytesWritten;
    // Retry until the value fits into the free portion of the buffer.
    while (!value.TryFormat(this.FreeBuffer, format, this.FormattingData, out bytesWritten)) {
        this.ResizeBuffer();
        bytesWritten = 0;
    }
    this.CommitBytes(bytesWritten);
}
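
The same try, resize, and commit pattern can be tried out without the prototype. The sketch below is an assumption-laden stand-in, not StringFormatter itself: GrowableUtf8Buffer is a made-up class, and it relies on Utf8Formatter.TryFormat from System.Buffers.Text (available in newer .NET versions) to do the actual formatting.

```csharp
using System;
using System.Buffers.Text;

public sealed class GrowableUtf8Buffer
{
    private byte[] _buffer;
    private int _count;

    public GrowableUtf8Buffer(int capacity)
    {
        _buffer = new byte[Math.Max(capacity, 1)];
    }

    // Number of bytes committed so far.
    public int Count => _count;

    // The committed bytes, exposed without copying.
    public ReadOnlySpan<byte> Written => new ReadOnlySpan<byte>(_buffer, 0, _count);

    public void Append(int value)
    {
        int bytesWritten;
        // Try to format into the free tail; double the buffer and retry while it does not fit.
        while (!Utf8Formatter.TryFormat(value, _buffer.AsSpan(_count), out bytesWritten))
        {
            Array.Resize(ref _buffer, _buffer.Length * 2);
        }
        // Commit only after a successful format.
        _count += bytesWritten;
    }
}
```

Doubling the buffer keeps the number of resizes logarithmic in the final size; a real formatter might rent buffers from a pool instead of allocating new arrays.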

What's the point?

Today's web traffic is largely text, and such text payloads are more often than not UTF8. Web server applications parse and format a lot of such text, and this library is an experiment in making those text operations faster and cheaper. See https://github.com/dotnet/corefxlab/blob/master/src/System.Text.Formatting/tests/Non-AllocatingJson.cs for a glimpse of scenarios we think this library could make more efficient.

Caveats

This library is a very early prototype. It is incomplete and has bugs and performance problems that need to be fixed. Please don't try to use it in real-world software.
