Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make string.FastAllocateString public #36989

Closed
vanbukin opened this issue May 25, 2020 · 20 comments
Closed

Make string.FastAllocateString public #36989

vanbukin opened this issue May 25, 2020 · 20 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime

Comments

@vanbukin
Copy link
Contributor

Background and Motivation

Provide ability for developers to write libraries that able to format something to string using pre-allocated pinned char* pointer withoput any overhead for delegate call.
It's also possible now to "export" that function using .ilproj like this, but it works only with undocumented IgnoresAccessChecksToAttribute

Proposed API

namespace System
{
-    internal static extern string FastAllocateString(int length);
+    public static extern string FastAllocateString(int length);
}

Usage Examples

Preallocate known-size string, get pinned char* pointer and overwrite string contents like here or here

public string ToString(string? format, IFormatProvider? formatProvider)
{
   // ...
   public string ToString(string? format, IFormatProvider? formatProvider)
   {
       // ...
        var uuidString = string.FastAllocateString(32);
        fixed (char* uuidChars = &uuidString.GetPinnableReference())
        {
            FormatN(uuidChars);
        }
        return uuidString;
       // ...
   }
   // ...
}

Alternative Designs

Provide separate external library (do not ship it with System.Private.CoreLib), that expose that method with public modifier or add string.Create overload without delegate that can only allocate and return string.

Risks

That API should be supported in a future versions of .NET

@vanbukin vanbukin added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label May 25, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Runtime untriaged New issue has not been triaged by the area owner labels May 25, 2020
@stephentoub
Copy link
Member

stephentoub commented May 25, 2020

Making this public would encourage mutating an immutable type. It's internal for a good reason. I would not want to make this public. string.Create was exposed an alternative.

@vanbukin
Copy link
Contributor Author

@stephentoub Maybe provide that ability via string.Create without delegate?

@svick
Copy link
Contributor

svick commented May 25, 2020

@vanbukin How would that work?

@tannergooding
Copy link
Member

There could be an unsafe version which takes a function pointer, which would allow the delegate like scenario without forcing a delegate allocation

@GrabYourPitchforks
Copy link
Member

@vanbukin
Per Steve's comment, we're not going to expose an API which encourages mutating string instances. In fact, in .NET Core 3.0 we started obsoleting APIs which encouraged this practice.

Adding overloads to the string.Create factory method may work. What do you have in mind?

@vanbukin
Copy link
Contributor Author

vanbukin commented May 25, 2020

@svick It could be like this:
I call string.Create(32), pin it using fixed statement, get char* pointer and then change string content using that pointer inside fixed statement. Like System.Guid.ToString.

Here and now it’s only possible inside System.Private.CoreLib, because FastAllocateString is internal. Prefer way to do that now is to call string.Create, provide delegate, fake “state” and get span pointer inside delegate. But I don’t want to get overhead on delegate creation and call, that’s why I ask to make FastAllocateString public.

@vanbukin
Copy link
Contributor Author

@tannergooding sounds like a good idea

@stephentoub
Copy link
Member

There could be an unsafe version which takes a function pointer, which would allow the delegate like scenario without forcing a delegate allocation

I seem to remember @jkotas having an opinion about that, but I don't remember what it was.

But I don’t want to get overhead on delegate creation and call

Can you share more details about in what situation this matters? A delegate can be cached (and often is by the C# compiler), so the concern is about the cost of delegate invocation? A string is being allocated and populated; the delegate invocation shows up as being negatively impactful?

@svick
Copy link
Contributor

svick commented May 25, 2020

@tannergooding

There could be an unsafe version which takes a function pointer, which would allow the delegate like scenario without forcing a delegate allocation

Since string.Create has a state parameter, it should be just one delegate allocation per callsite for the whole application runtime. Is that worth optimizing?

@vanbukin

I call string.Create(32), pin it using fixed statement, get char* pointer and then change string content using that pointer inside fixed statement.

Oh, I assumed you meant something else, not just renaming string.FastAllocateString to string.Create.

Prefer way to do that now is to call string.Create, provide delegate, fake “state” and get span pointer inside delegate. But I don’t want to get overhead on delegate creation and call, that’s why I ask to make FastAllocateString public.

The delegate allocation overhead is tiny, since it's just one allocation for the application runtime, if you use string.Create correctly. And the overhead of delegate call in the case of Guid formatting seems to be ~10 %.

@vanbukin
Copy link
Contributor Author

vanbukin commented May 25, 2020

Since string.Create has a state parameter, it should be just one delegate allocation per callsite for the whole application runtime. Is that worth optimizing?

I think yes.

Oh, I assumed you meant something else, not just renaming string.FastAllocateString to string.Create.

Not renaming. Make FastAllocateString public or provide additional string.Create overload that does not take any state or delegrate. Only string size. To avoid any allocations.

The delegate allocation overhead is tiny

Yes, but that might be helpful in performance-sensitive places. Like custom structures string formatting or database providers strings materialization. And that story is not only about possible delegate allocation, but also about delegate calling. It's not free.

@vanbukin
Copy link
Contributor Author

@svick Here is my results. Looks like 30%.

@vanbukin
Copy link
Contributor Author

@svick I've updated results. Add benchmarks for .NET 5.0 preview 4. Same difference.

@GrabYourPitchforks
Copy link
Member

Per earlier comments, string instances are contractually immutable. The team is largely against adding APIs which encourage developers to mutate these instances. Such code is not guaranteed to work correctly across all versions of the runtime. I don't think benchmarks are going to do much to sway the majority position here.

There might be opportunity here to improve the performance of the existing string.Create method. Or, per Tanner's earlier suggestion, offer unsafe overloads which use function pointers + calli rather than delegate dispatch. Or any overloads which might be useful for your scenario.

But to repeat, any API which promotes the idea of returning a string instance to the caller and saying "sure, go ahead and pin it and mutate its contents" is anathema to our team.

@vanbukin
Copy link
Contributor Author

@GrabYourPitchforks Ok. Tanner's suggestion looks good. Let's continue our discussion in a way to add string.Create overload that takes a pointer instead of opening FastAllocateString. Can we continue our discussion here or I need to create a new issue?

@GrabYourPitchforks
Copy link
Member

Feel free to explore that idea here.

@jkotas
Copy link
Member

jkotas commented May 25, 2020

There could be an unsafe version which takes a function pointer, which would allow the delegate like scenario without forcing a delegate allocation

I seem to remember @jkotas having an opinion about that, but I don't remember what it was.

The difference between delegate call and function pointer calli call is a few instruction. It is unlikely to make significant difference to add an overload that takes function pointer.

I believe that most of the overhead of passing in delegate comes from extra call(s): setting up the frame, passing the arguments around, saving callee saved registers, ... . This overhead needs to be eliminated to get on par with calling FastAllocateString directly. Teaching the JIT to inline the delegates would be the most natural way to do it. I know @AndyAyersMS looked into it, but it is not easy.

@SingleAccretion
Copy link
Contributor

SingleAccretion commented May 26, 2020

add string.Create overload that takes a pointer instead of opening FastAllocateString

I investigated the possibility of adding a new API that would utilize the old "generic struct" trick. The conclusion is that there is room for improvement, but not a whole lot.

The setup

The goal was to see if the following API could achieve performance of the fully "unsafe" method that uses FastAllocateString directly:

static string GenericStringCreate<T>(int length, T initializer) where T : IStringInitialzer

interface IStringInitializer
{
    void Initialize(Span<char> span);
}

The testing consisted of measuring the time it would take to pass parameters to a dummy function with this signature:

[MethodImpl(MethodImplOptions.NoInlining)]
static unsafe void Payload(char* dest, Uuid* uuidPtr) { }

unsafe struct Uuid { fixed byte Bytes[16]; }

Tested "means of delivery" (optimized for optimal performance):

public const int Length = 1; //To exacerbate the differences

[MethodImpl(MethodImplOptions.AggressiveInlining)]
static unsafe string StringifyWithUnsafe(Uuid uuid)
{
    var result = CoreLib.FastAllocateString(Length);
    fixed (char* ptr = &result.GetPinnableReference())
    {
        Payload(ptr, &uuid);
    }

    return result;
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
static unsafe string StringifyWithGenericStringCreate(Uuid uuid)
{
    return StringHelpers.GenericStringCreate(Length, new UuuidStringInitialzer(&uuid));
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
static unsafe string StringifyWithStringCreate(Uuid uuid)
{
    var uuidString = string.Create(Length, uuid, (s, u) =>
    {
        fixed (char* ptr = &MemoryMarshal.GetReference(s))
        {
            Payload(ptr, &u);
        }
    });

    return uuidString;
}

readonly unsafe struct UuuidStringInitialzer : IStringInitialzer
{
    private readonly Uuid* _uuidPtr;

    public UuuidStringInitialzer(Uuid* uuidPtr) => _uuidPtr = uuidPtr;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public void Initialize(Span<char> span)
    {
        fixed (char* ptr = &MemoryMarshal.GetReference(span))
        {
            Payload(ptr, _uuidPtr);
        }
    }
}

Where StringHelpers.GenericStringCreate is the implementation mirroring current string.Create's one, but with an initializer structure instead of a delegate.

static unsafe string GenericStringCreate<T>(int length, T initializer) where T : IStringInitialzer
{
    if (length <= 0)
    {
        if (length == 0)
        {
            return string.Empty;
        }

        throw new ArgumentOutOfRangeException(nameof(length));
    }
    
    string result = CoreLib.FastAllocateString(length);
    initializer.Initialize(MemoryMarshal.CreateSpan(ref CoreLib.GetRawStringData(result), length));

    return result;
}

The benchmarks

public class Tests
{
    public static readonly Uuid DefaultUuid = default;

    [Benchmark(Baseline = true)]
    [MethodImpl(MethodImplOptions.NoInlining)]
    public unsafe string UnsafeOverhead()
    {
        return UuidManipulation.StringifyWithUnsafe(DefaultUuid);
    }

    [Benchmark]
    [MethodImpl(MethodImplOptions.NoInlining)]
    public unsafe string StringGenericCreateOverhead()
    {
        return UuidManipulation.StringifyWithGenericStringCreate(DefaultUuid);
    }

    [Benchmark]
    [MethodImpl(MethodImplOptions.NoInlining)]
    public unsafe string StringCreateOverhead()
    {
        return UuidManipulation.StringifyWithStringCreate(DefaultUuid);
    }
}

Typical results

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.836 (1909/November2018Update/19H2)
Intel Core i7-4820K CPU 3.70GHz (Haswell), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.300
[Host] : .NET Core 3.1.4 (CoreCLR 4.700.20.20201, CoreFX 4.700.20.22101), X64 RyuJIT
DefaultJob : .NET Core 3.1.4 (CoreCLR 4.700.20.20201, CoreFX 4.700.20.22101), X64 RyuJIT

Method Mean Error StdDev Ratio RatioSD
UnsafeOverhead 7.635 ns 0.1612 ns 0.1429 ns 1.00 0.00
StringGenericCreateOverhead 10.183 ns 0.2914 ns 0.4086 ns 1.30 0.04
StringCreateOverhead 15.050 ns 0.3796 ns 0.3728 ns 1.98 0.05

I rerun them multiple times with random statics added to see if layout could influence the timings, but the results do seem to be consistent (or at least not deviating by more than 1 ns with each run). On average, the unsafe version was ahead by ~2 ns, and vanilla string.Create was behind by ~4.5 ns. While this is a somewhat significant improvement, I personally do not think it warrants adding a new API, especially since it still cannot achieve the performance of "the best" code. It is very much possible I missed something but after ~8 hours of trying different things out this is the outcome.

@TYoung86
Copy link

TYoung86 commented May 31, 2020

Try new string('\0', n) instead of CoreLib.FastAllocateString(Length)?

Also, will the gc move allocated strings if they're not pinned? e.g.

    public static string ToHexString(this ReadOnlySpan<byte> bytes) {
      var newStr = new string('\0', bytes.Length * 2);
      ref var firstCh = ref Unsafe.AsRef(newStr.GetPinnableReference());
      for (var i = 0; i < bytes.Length; ++i) {
        var b = bytes[i];
        var nib1 = b >> 4;
        var isDig1 = (nib1 - 10) >> 31;
        var ch1 = 55 + nib1 + (isDig1 & -7);
        var nib2 = b & 0xF;
        var isDig2 = (nib2 - 10) >> 31;
        var ch2 = 55 + nib2 + (isDig2 & -7);
        Unsafe.As<char, int>(ref Unsafe.Add(ref firstCh, i * 2))
          = (ch2 << 16) | ch1;
      }

      return newStr;
    }

Is there a chance for the string to move during this on a large run? (Given string.Create doesn't appear to pin)

Given this string ctor is basically just the result of FastAllocateString handed out almost directly, and GetPinnableReference is essentially readonly GetRawStringData;

        private string Ctor(char c, int count)
        {
            if (count <= 0)
            {
                if (count == 0)
                    return Empty;
                throw new ArgumentOutOfRangeException(nameof(count), SR.ArgumentOutOfRange_NegativeCount);
            }

            string result = FastAllocateString(count);

            if (c != '\0') // Fast path null char string
            {
                // ... snipped ...
            }
            return result;
        }

        public ref readonly char GetPinnableReference() => ref _firstChar;

        internal ref char GetRawStringData() => ref _firstChar;

@SingleAccretion
Copy link
Contributor

Try new string('\0', n) instead of CoreLib.FastAllocateString(Length)?

Using the constructor seems to be slower:

[Benchmark(Baseline = true)]
public string CreateWithFastAllocate() => CoreLib.FastAllocateString(32);

[Benchmark]
public string CreateWithNullCharacters() => new string('\0', 32);
Method Mean Error StdDev Ratio RatioSD
CreateWithFastAllocate 6.838 ns 0.2152 ns 0.2013 ns 1.00 0.00
CreateWithNullCharacters 11.600 ns 0.3240 ns 0.3857 ns 1.70 0.06

Also, will the gc move allocated strings if they're not pinned?

I believe no, since refs are "managed" pointers, so the GC is aware of their existence. However, in your example, you are modifying the reference variable, thus changing its identity, and I do not know if that makes it not "tracked" anymore. It would be reasonable to expect that that answer is "yes, it is not safe to do this", but I am not knowledgeable enough to definitively say so.

The reason that I went with the pointer-based Payload and not refs is that there is no way to do ref fields, thus I will always have to copy the Uuid parameter at least once. That said, this decision might be worth revisiting.

@GrabYourPitchforks GrabYourPitchforks removed the untriaged New issue has not been triaged by the area owner label May 31, 2020
@GrabYourPitchforks
Copy link
Member

I'm closing this issue because the API questions have been answered and because the discussion has veered into encouraging dangerous coding practices. But once more, just to make this painfully clear:

It is never safe or supported to mutate the contents of a returned string instance.

Doesn't matter whether the API is new string('\0', ...) or string.FastAllocateString. As soon as the caller gets an object of type string, the contents of that object must remain immutable. I see no world in which the .NET team will ever change its stance on this matter.

If you mutate a string instance within your own library or application, you are entering unsupported territory. A future framework update could break you. Or - more likely - you'll encounter memory corruption that will be very painful for you or your customers to diagnose.

@dotnet dotnet locked as resolved and limited conversation to collaborators Dec 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime
Projects
None yet
Development

No branches or pull requests

9 participants