Productize Utf8Parser and Utf8Formatter #25078

ghost · 2017-11-06T15:55:07Z

Fixes https://github.com/dotnet/corefx/issues/24607

Remaining debt (cut for time):

Parsing Integers with the "N" format

https://github.com/dotnet/corefx/issues/24986

Some questions to be resolved as to whether to be compatible
(BCL doesn't care where you put the commas) or correct.

Format of floating point is still a wrapper hack

https://github.com/dotnet/corefx/issues/25077

The portable DoubleToNumber() code was never ported
to C# (though the big block comment advertising it
was).

KrzysztofCwalina · 2017-11-06T17:12:58Z

src/System.Memory/ref/System.Memory.cs

@@ -281,6 +281,66 @@ public static class BinaryPrimitives

 namespace System.Buffers.Text
 {
+    public struct StandardFormat : IEquatable<StandardFormat>


I think we talked about this type being in System.Buffers namespace. I know that SignalR team wanted to use it as a format specifier for non-textual output formats.

KrzysztofCwalina · 2017-11-06T17:14:28Z

src/System.Memory/ref/System.Memory.cs

+    }
+}
+
+namespace System.Buffers.Text


Why open the same namespace as above?

Merge artifact.

justinvp · 2017-11-06T17:15:45Z

src/System.Memory/src/System/Buffers/Text/StandardFormat.cs

+    /// Represents a standard formatting string without using an actual String. A StandardFormat consists of a character (such as 'G', 'D' or 'X')
+    /// and an optional precision ranging from 0..99, or the special value NoPrecision.
+    /// </summary>
+    public struct StandardFormat : IEquatable<StandardFormat>


Applicable structs were recently marked readonly throughout the repo (#24997). Looks like this could be readonly as well.

KrzysztofCwalina · 2017-11-06T17:16:23Z

src/System.Memory/src/System/Buffers/Text/StandardFormat.cs

+        /// <summary>
+        /// Precision values for format that don't use a precision, or for when the precision is to be unspecified.
+        /// </summary>
+        public const byte MaxPrecision = 99;


Do we really want this to be a const? What if we want to change that in the future? i.e. should it simply be a readonly field?

Maybe it doesn't need to exist at all? We usually publish argument range limits in documentation rather than as IL metadata.

KrzysztofCwalina · 2017-11-06T17:16:46Z

src/System.Memory/src/System/Buffers/Text/StandardFormat.cs

+    /// Represents a standard formatting string without using an actual String. A StandardFormat consists of a character (such as 'G', 'D' or 'X')
+    /// and an optional precision ranging from 0..99, or the special value NoPrecision.
+    /// </summary>
+    public struct StandardFormat : IEquatable<StandardFormat>


Might be worth making it a readonly struct

justinvp · 2017-11-06T17:16:52Z

src/System.Memory/src/System/Buffers/Text/Utf8Constants.cs

+        public const byte Space = (byte)' ';
+        public const byte Hyphen = (byte)'-';
+
+        public const byte Seperator = (byte)',';


Seperator => Separator

KrzysztofCwalina · 2017-11-06T17:19:30Z

src/System.Memory/src/System/Buffers/Text/StandardFormat.cs

+        /// </summary>
+        public static StandardFormat Parse(ReadOnlySpan<char> format)
+        {
+            return Parse(new string(format.ToArray()));


Hmm, this seems backwards. I think we should implement this method without allocations and then call it from the Parse(string) overload.

Oh yeah, that was some leftover debt too - it was using Utf16Parser which we weren't productizing and uint.TryParse(ReadOnlySpan) isn't available in NetStandard 1.1.

KrzysztofCwalina · 2017-11-06T17:29:53Z

src/System.Memory/src/System/Buffers/Text/Utf8Formatter/FormattingHelpers.cs

+        /// We don't have access to Math.DivRem, so this is a copy of the implementation.
+        /// </summary>
+        [MethodImpl(MethodImplOptions.AggressiveInlining)]
+        public static long DivMod(long numerator, long denominator, out long modulo)


last time I looked, our JIT was generating two idiv instructions for this. @AndyAyersMS, it would be great if we could fix it.

cc @AndyAyersMS

Agree we should fix; but there are challenges. See dotnet/coreclr#757 for some notes.

cc: @tannergooding, @CarolEidt

KrzysztofCwalina · 2017-11-06T17:31:47Z

src/System.Memory/src/System/Buffers/Text/Utf8Formatter/FormattingHelpers.cs

+        [MethodImpl(MethodImplOptions.AggressiveInlining)]
+        public static int CountDigits(long n)
+        {
+            if (n == 0)


Should this be if (n < 10)

It was probably meant to handle the special case of "0" - probably ok to let it cover the "1"-"9" case too.

Actually, "n" is signed so n < 10 isn't the right test.

Ah, the double check (n > -10 && n < 10) is probably not worth it?

why not unchecked((ulong)n) < 10, which should be the equivalent of (n >= 0 && n < 10), but only a single comparison

It's not the equivalent - it would need to be Abs(n) for it to do the check we need.

ah, I didn't realize the dual check was n > -10 && n < 10 (which would be: Abs(n) < 10), not n >= 0 && n < 10 (which would be: unchecked((ulong)n) < 10)
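To make the distinction in this exchange concrete, here is a minimal sketch of the three checks being compared. The class and method names are hypothetical and are not taken from FormattingHelpers.cs; the biased variant is one possible way to get the symmetric range in a single comparison, not something proposed in the thread.

```csharp
using System;

// Illustrative only: these names are hypothetical, not from FormattingHelpers.cs.
public static class SingleDigitChecks
{
    // Two comparisons: true for -9..9, i.e. Abs(n) < 10.
    public static bool IsSingleDigit(long n) => n > -10 && n < 10;

    // One comparison, but only equivalent to (n >= 0 && n < 10):
    // negative values reinterpret as very large unsigned values.
    public static bool IsNonNegativeSingleDigit(long n) => unchecked((ulong)n) < 10;

    // One comparison covering -9..9: shift the range to 0..18 first.
    // The addition wraps in an unchecked context, so values near
    // long.MaxValue end up far above 18 after the cast.
    public static bool IsSingleDigitBiased(long n) => unchecked((ulong)(n + 9)) < 19;
}
```

The cast-only form rejects -5, which is why it is not a drop-in replacement for the dual check.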

KrzysztofCwalina · 2017-11-06T17:36:21Z

src/System.Memory/src/System/Buffers/Text/Utf8Formatter/Utf8Formatter.Date.G.cs

+        //    05/25/2017 10:30:15 -08:00
+        //
+        private static bool TryFormatDateTimeG(DateTime value, TimeSpan offset, Span<byte> buffer, out int bytesWritten)
+        {


It might be worth Debug.Asserting that the format is indeed 'G'

We don't have the format here...

Ah, the github source viewer long line clipping bites again :-)

KrzysztofCwalina · 2017-11-06T17:53:49Z

src/System.Memory/src/System/Buffers/Text/Utf8Formatter/Utf8Formatter.Float.cs

+            if (format.Precision == StandardFormat.NoPrecision)
+                formatString = format.Symbol.ToString();
+            else
+                formatString = format.Symbol.ToString() + format.Precision;


Can we allocate string of length 1 + numberOfDigits, pin it, set pStr[0] = symbol, and then format the precision into it? Also, could we optimize for the case of the default format?

Didn't think it was kosher to mutate a string outside of CoreLib.

I think it's fine as long as we implement it in one place: StandardFormat.ToString(). I think it will be quite common for people to want to do the same thing, i.e. convert StandardFormat to its string representation.

Really? I thought conversion between format strings and StandardFormat wouldn't be done by anyone who actually cares about performance. I'd see ToString() as a debugging aid.

CI is on the floor right now but I'm pretty sure String.Copy() was only introduced in NetStandard 2.0 and I've been bitten before by the "there's no way the tooling will intern this" only to have it happen.

When building for .NET Core, we can do the more efficient thing, using the new String.Create method, etc. But even without that, this is only a few chars in length, right? Why not just use temporary stack space to build up the chars and then create a string from that?
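The "temporary stack space" suggestion could look roughly like the sketch below. It assumes a current .NET target where stackalloc can be assigned to Span<char> in safe code; the helper name is hypothetical, and the NoPrecision constant mirrors the byte.MaxValue value shown in the StandardFormat diff.

```csharp
using System;

// Hypothetical helper, not the PR's StandardFormat.ToString() implementation.
public static class FormatStringBuilder
{
    // Mirrors StandardFormat.NoPrecision (byte.MaxValue) from the diff.
    public const byte NoPrecision = byte.MaxValue;

    public static string ToFormatString(char symbol, byte precision)
    {
        // One symbol plus at most two precision digits (0..99).
        Span<char> chars = stackalloc char[3];
        int length = 0;
        chars[length++] = symbol;

        if (precision != NoPrecision)
        {
            if (precision > 99)
                throw new ArgumentOutOfRangeException(nameof(precision));
            if (precision >= 10)
                chars[length++] = (char)('0' + precision / 10);
            chars[length++] = (char)('0' + precision % 10);
        }

        // Span<char>.ToString() creates a string from the characters,
        // so the only heap allocation is the result itself.
        return chars.Slice(0, length).ToString();
    }
}
```

This avoids both mutating a string in place and the intermediate concatenation in the original code.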

KrzysztofCwalina · 2017-11-06T17:56:09Z

src/System.Memory/src/System/Buffers/Text/Utf8Formatter/Utf8Formatter.Float.cs

+
+            for (int i = 0; i < length; i++)
+            {
+                buffer[i] = (byte)(utf16Text[i]);


Would it make sense to Debug.Assert that the chars are ASCII?
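A sketch of what such an assert might look like in the narrowing loop. The method name and signature are assumptions for illustration; only the byte-per-char copy comes from the diff.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper showing the narrowing loop with an ASCII assert.
public static class AsciiNarrowing
{
    public static bool TryNarrow(string utf16Text, Span<byte> buffer, out int bytesWritten)
    {
        if (utf16Text.Length > buffer.Length)
        {
            bytesWritten = 0;
            return false;
        }

        for (int i = 0; i < utf16Text.Length; i++)
        {
            char c = utf16Text[i];
            // Casting a non-ASCII char to byte would silently drop bits,
            // so flag it in debug builds.
            Debug.Assert(c < 0x80, "Expected ASCII-only text from invariant-culture formatting.");
            buffer[i] = (byte)c;
        }

        bytesWritten = utf16Text.Length;
        return true;
    }
}
```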

KrzysztofCwalina · 2017-11-06T18:06:31Z

src/System.Memory/src/System/Buffers/Text/Utf8Parser/Utf8Parser.Date.R.cs

+                uint dowString = (dow0 << 24) | (dow1 << 16) | (dow2 << 8) | comma;
+                switch (dowString)
+                {
+                    case 0x53554E2c: dayOfWeek = DayOfWeek.Sunday; break;


Might be worth adding a comment explaining what these magic numbers are.
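For readers wondering what the constant encodes: it is four ASCII bytes packed big-endian into a uint, matching the shift pattern in the diff. A minimal sketch, with a hypothetical helper name:

```csharp
// Hypothetical helper that reproduces the packing used for dowString.
public static class DayOfWeekPacking
{
    public static uint Pack(byte b0, byte b1, byte b2, byte b3)
        => ((uint)b0 << 24) | ((uint)b1 << 16) | ((uint)b2 << 8) | b3;
}
```

With this, Pack((byte)'S', (byte)'U', (byte)'N', (byte)',') yields 0x53554E2C, since 'S' is 0x53, 'U' is 0x55, 'N' is 0x4E and ',' is 0x2C.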

KrzysztofCwalina · 2017-11-06T18:07:50Z

src/System.Memory/src/System/Buffers/Text/Utf8Parser/Utf8Parser.Date.cs

+            {
+                case 'R':
+                case 'l':
+                    return TryParseDateTimeOffsetR(text, out value, out bytesConsumed);


Are we ok with passing in 'R' and then successfully parsing lower cased date time?

We've generally been "case-insensitive" for parsing but given that this is a RFC wire format, maybe an exception is warranted. Waiting to see what others have to say on this.

I've made them 'strict case' for now.

KrzysztofCwalina · 2017-11-06T18:17:35Z

src/System.Memory/tests/Performance/Perf.Utf8Formatter.cs

+        [InlineData(12837467L)] // standard format
+        [InlineData(-9223372036854775808L)] // min value
+        [InlineData(9223372036854775807L)] // max value
+        private static void ParserInt64(long value)


How do the results compare to UTF16 routines in mscorlib?

Improvement

44% // 12837467L
-26% // long.MinValue
13% // long.MaxValue

("long.MinValue" has a special path just for it in FormatInt64D() that delegates to the unsigned formatter - looks like this is an outlier case. 'Course, if we really think people will benchmark that a lot, it is easy to write a super-fast path when it's only for one specific value...)
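As background on why long.MinValue tends to need its own path: its magnitude does not fit in a long, so plain negation cannot produce it and Math.Abs(long.MinValue) throws OverflowException. The sketch below shows one way to get the magnitude as a ulong; it is an illustration with a hypothetical name, not the code in FormatInt64D().

```csharp
using System;

// Hypothetical illustration, not the PR's FormatInt64D() code.
public static class Int64Magnitude
{
    // Returns |value| as a ulong. For long.MinValue the unchecked negation
    // wraps back to long.MinValue, whose bit pattern read as a ulong is 2^63,
    // which is exactly the magnitude needed.
    public static ulong Magnitude(long value)
        => value < 0 ? unchecked((ulong)(-value)) : (ulong)value;
}
```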

ahsonkhan · 2017-11-06T20:32:52Z

src/System.Memory/Common/src/System/MutableDecimal.cs

+            set { Flags = (Flags & ~ScaleMask) | ((uint)value << ScaleShift); }
+        }
+
+        // Sign mask for the flags field. A value of zero in this bit indicates a


A value of zero in this bit indicates

Which bit exactly?

0x80000000. That's the assigned value of SignMask.
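A small sketch of how the sign bit and scale sit in the flags word. The SignMask value comes from the reply above; ScaleMask and ScaleShift are assumed here to match the documented System.Decimal layout (scale in bits 16 to 23), which decimal.GetBits exposes in its fourth element. The class name is hypothetical.

```csharp
using System;

// Hypothetical illustration; MutableDecimal's actual constants are assumed
// to follow the System.Decimal flags layout.
public static class DecimalFlags
{
    private const uint SignMask = 0x80000000;   // bit 31: 0 = positive, 1 = negative
    private const uint ScaleMask = 0x00FF0000;  // assumption: bits 16..23 hold the scale
    private const int ScaleShift = 16;          // assumption

    public static uint GetFlags(decimal d) => unchecked((uint)decimal.GetBits(d)[3]);

    public static bool IsNegative(uint flags) => (flags & SignMask) != 0;

    public static int GetScale(uint flags) => (int)((flags & ScaleMask) >> ScaleShift);

    // Same shape as the Scale setter in the diff.
    public static uint WithScale(uint flags, int scale)
        => (flags & ~ScaleMask) | ((uint)scale << ScaleShift);
}
```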

ahsonkhan · 2017-11-06T20:42:55Z

src/System.Memory/src/System.Memory.csproj

+  <ItemGroup>
+    <!-- Common or Common-branched source files -->
+    <Compile Include="$(CommonPath)\System\NotImplemented.cs">
+      <Link>Common\System\NotImplemented.cs</Link>


Why can't we specify the complete path here rather than using Link property? What does it do?
i.e. Why not just Compile Include="..\Common\src\System\NotImplemented.cs" />?

$(CommonPath) is how it's done in all the other .csproj files that include this file. Personally, my care factor here is low - this was copy-paste.

ahsonkhan · 2017-11-06T20:44:06Z

src/System.Memory/src/System/Buffers/StandardFormat.cs

+        public const byte NoPrecision = byte.MaxValue;
+
+        /// <summary>
+        /// Precision values for format that don't use a precision, or for when the precision is to be unspecified.


The xml comments for MaxPrecision and NoPrecision are mismatched.

ahsonkhan · 2017-11-06T20:45:18Z

src/System.Memory/src/System/Buffers/StandardFormat.cs

+        /// <summary>
+        /// Create a StandardFormat.
+        /// </summary>
+        /// <param name="symbol">A type-specific formatting characeter such as 'G', 'D' or 'X'</param>


characeter spelling

ahsonkhan · 2017-11-06T20:50:06Z

src/System.Memory/src/System/Buffers/StandardFormat.cs

+        public static implicit operator StandardFormat(char symbol) => new StandardFormat(symbol);
+
+        /// <summary>
+        /// Converts a classic .NET format string into a StandardFormat


Update comment to reflect ROS<char> as input.

The term "string" is used generically here, not specifically referring to System.String. This is just an overload of the one that takes String - I don't think a different XML comment is needed.

Fair point. Although, it is qualified as a "classic .NET format string".

ahsonkhan · 2017-11-06T22:41:52Z

src/System.Memory/src/System/Buffers/Text/Utf8Parser/Utf8Parser.Boolean.cs

+            if (!(standardFormat == default(char) || standardFormat == 'G' || standardFormat == 'l'))
+                throw new FormatException(SR.Argument_BadFormatSpecifier);
+
+            if (text.Length >= 4)


This implementation doesn't produce different results based on the standardFormat.
Is any upper/lower case combination of the characters within true considered valid for both 'G' and 'l'?

ahsonkhan · 2017-11-06T22:46:59Z

src/System.Memory/src/System/Buffers/Text/Utf8Parser/Utf8Parser.Date.cs

+    public static partial class Utf8Parser
+    {
+        /// <summary>
+        /// Parses a Byte at the start of a Utf8 string.


update xml comment

ahsonkhan · 2017-11-06T22:48:04Z

src/System.Memory/src/System/Buffers/Text/Utf8Parser/Utf8Parser.Date.cs

+        }
+
+        /// <summary>
+        /// Parses a SByte at the start of a Utf8 string.


Also here, update the comment

ahsonkhan · 2017-11-06T22:50:34Z

src/System.Memory/src/System/Buffers/Text/Utf8Parser/Utf8Parser.Integer.Signed.cs

+                    return TryParseUInt16X(text, out Unsafe.As<short, ushort>(ref value), out bytesConsumed);
+
+                default:
+                    throw new FormatException(SR.Argument_BadFormatSpecifier);


use throw helper to help with inlining.

ahsonkhan · 2017-11-06T22:53:12Z

src/System.Memory/src/System/Buffers/Text/Utf8Parser/Utf8Parser.Integer.Unsigned.D.cs

+            return true;
+        }
+
+        private static bool TryParseUInt32D(ReadOnlySpan<byte> text, out uint value, out int bytesConsumed)


Consider applying the same loop unrolling optimization as TryParseInt32D (https://github.com/dotnet/corefx/pull/25078/files#diff-5e90497df430f6541664f0a780acac46R199).

Same with TryParseUInt16D and TryParseByteD.

Not in this PR. We need to get this in before I leave for vacation and this sort of thing is too much code churn for a commit this size.

Agreed :)
Already pushing the bar for what can be reviewed within the browser.

ahsonkhan · 2017-11-06T22:57:48Z

src/System.Memory/src/System/Buffers/Text/Utf8Parser/Utf8Parser.Number.cs

+
+        private static bool TryParseNumber(ReadOnlySpan<byte> text, ref NumberBuffer number, out int bytesConsumed, ParseNumberOptions options, out bool textUsedExponentNotation)
+        {
+            Debug.Assert(number.Digits[0] == 0 && number.Scale == 0 && !number.IsNegative, "Number not initialized to default(NumberBuffer)");


Is the Debug.Assert condition and comment inverted here?

I know the condition isn't and the MSDN example (and usability) says the text should describe what went wrong.

I see. That makes sense.

if number == default, it will be Debug.Assert(true,""), and hence all is good.
if number != default, it will be Debug.Assert(false,""), and we see that message.

ahsonkhan · 2017-11-06T22:59:25Z

src/System.Memory/tests/ParsersAndFormatters/Formatter/FormatterTestData.cs

+
+using Xunit;
+
+namespace System.Buffers.Text.Tests


Can you please post the code coverage numbers for the Formatters and Parsers?

Utf8Formatter: 100%

FormattingHelper: 100%

Utf8Parser: 98.8% (the 1.2% are NotImplementedException and Debug.Assert paths.)

ParserHelper: 100%

MutableDecimal: 100%

System.Number: 98.2% (the 1.8% are paths that are unreachable in System.Memory but may be reachable in ProjectN where this code came from. I did not bother to analyze the much larger Project N codebase and I don't want to introduce unnecessary diffs here.)

System.NumberBuffer: 45.4% (the remaining 54.6% is the ToString() method which only exists so that NumberBuffer displays nicely in the debugger.)

StandardFormat: 100%

ahsonkhan · 2017-11-06T23:01:25Z

src/System.Memory/tests/ParsersAndFormatters/Parser/ParserTests.2gbOverflow.cs

+            IntPtr pMemory;
+            try
+            {
+                pMemory = Marshal.AllocHGlobal(int.MaxValue);


Use AllocationHelper? https://github.com/dotnet/corefx/blob/master/src/System.Memory/tests/AllocationHelper.cs

ahsonkhan · 2017-11-06T23:02:18Z

src/System.Memory/tests/ParsersAndFormatters/Parser/ParserTests.2gbOverflow.cs

+
+        [Fact]
+        [OuterLoop]
+        public static void TestParser2GiBOverflow()


Only run such tests on Windows/OSX. See https://github.com/dotnet/corefx/blob/master/src/System.Memory/tests/Span/Clear.cs#L236

As it stands right now, it's not specific to Windows...

I was suggesting excluding running this test on Linux. We have seen an issue in the past with trying to use OOM exception to skip allocation on Linux:

On Linux, the allocation can succeed even if there is not enough memory but then the test may get killed by the OOM killer at the time the memory is accessed which triggers the full memory allocation.
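One way to express "skip on Linux, and skip if the allocation fails elsewhere" is sketched below. This is a hypothetical helper written for illustration; it is not the AllocationHelper referenced above and its name and shape are assumptions.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper; not the repo's AllocationHelper.
public static class LargeAllocation
{
    // Returns false when the test should be skipped.
    public static bool TryAllocate(int byteCount, out IntPtr memory)
    {
        memory = IntPtr.Zero;

        // On Linux the allocation may succeed without memory being available,
        // and the process can be killed later when the pages are touched,
        // so an OutOfMemoryException is not a reliable skip signal there.
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux))
            return false;

        try
        {
            memory = Marshal.AllocHGlobal(byteCount);
            return true;
        }
        catch (OutOfMemoryException)
        {
            return false;
        }
    }
}
```

A caller would return early from the test when TryAllocate reports false and release the block with Marshal.FreeHGlobal otherwise.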

ahsonkhan · 2017-11-06T23:06:47Z

src/System.Memory/tests/ParsersAndFormatters/TestException.cs

+// Intentionally placed in the global namespace so that we don't have to see an annoying long type name
+// in the xunit output.
+//
+internal sealed class TestException : Exception


This is general purpose and can potentially be used in other tests too. Move this up to src/System.Memory/tests.

ahsonkhan · 2017-11-06T23:10:04Z

src/System.Memory/tests/Performance/Perf.Utf8Parser.cs

+        [InlineData("12837467")] // standard parse
+        [InlineData("-2147483648")] // min value
+        [InlineData("2147483647")] // max value
+        private static void ParserInt32(string text)


What about adding baseline to compare performance against int.TryParse?

Also, can we port over the variable length performance tests like https://github.com/dotnet/corefxlab/blob/master/tests/System.Text.Primitives.Tests/Parsing/PrimitiveParserInt32PerfTests.cs#L121?

I wasn't sure how much usage the perf tests in CoreFx got so I only added the most critical scenarios and mostly so we'd have a set place to add more. We can beef this up for sure - but it might be better to do it on an as-needed basis. There's also a lot of stuff in that test assembly.

ahsonkhan · 2017-11-06T23:13:22Z

src/System.Memory/tests/System.Memory.Tests.csproj

+    <Compile Include="..\Common\src\System\MutableDecimal.cs" />
+  </ItemGroup>
+  <ItemGroup>
+    <Compile Include="ParsersAndFormatters\Formatter\FormatterTestData.cs" />


Since we will likely have Encoders, it might make more sense to just have the following directories at the root, rather than nested within "ParsersAndFormatters":

Formatter

Parser

Encoder

I'm not sure - the Formatter and Parser test-bed has a lot of shared infrastructure that wouldn't be useful for Encoders.

ahsonkhan · 2017-11-06T23:15:17Z

src/System.Memory/tests/ParsersAndFormatters/SupportedFormats.cs

+        public char ParseSynonymFor { get; set; } = default;
+    }
+
+    internal static partial class TestData


Split into separate file

I think it's useful for the SupportedFormats-related metadata to be in the same file as SupportedFormats struct.

jkotas · 2017-11-07T01:12:02Z

src/System.Memory/src/System/Buffers/Text/Utf8Formatter/Utf8Formatter.Date.L.cs

+
+            byte[] dayAbbrev = DayAbbreviationsLowercase[(int)value.DayOfWeek];
+            Unsafe.Add(ref utf8Bytes, 0) = dayAbbrev[0];
+            Unsafe.Add(ref utf8Bytes, 1) = dayAbbrev[1];


Is there a good reason why we are using Unsafe code here? The JIT should be able to eliminate the bounds checks here, because it should be able to see that the Span is at least 29 bytes given the precondition above. If the JIT does not do this optimization, we should get it fixed.

It was that way from the start when it was added to CoreFxLab. It was likely a combination of following the pattern used by the variable-length formatters and the fact that we needed the pinnable reference anyway to call the WriteDigits helpers. @shiftylogic - if you're still following this, was there a JIT-related reason for using Unsafe.Add here?

Switching to the Span indexer on this does seem to slow it down 1-2% (fast Span.)

stephentoub · 2017-11-07T05:02:51Z

src/System.Memory/src/System/Buffers/StandardFormat.cs

+        /// </summary>
+        public static StandardFormat Parse(ReadOnlySpan<char> format)
+        {
+            return Parse(new string(format.ToArray()));


When building System.Memory for .NET Core, we should avoid the extra allocations here, e.g.

return Parse(new string(format));

Even better, invert these so that the string-based overload delegates to the ReadOnlySpan<char>-based overload, and then even the string allocation wouldn't be necessary.
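The inversion being suggested might look like the sketch below: the span-based overload does the work with a small inline digit loop, and the string overload forwards to it. SimpleFormat is a hypothetical stand-in for StandardFormat, string.AsSpan() is the current API name (the PR-era code used AsReadOnlySpan), and the exception messages are placeholders rather than the resource strings added in the PR.

```csharp
using System;

// Hypothetical stand-in for StandardFormat, for illustration only.
public readonly struct SimpleFormat
{
    public const byte NoPrecision = byte.MaxValue;
    public const byte MaxPrecision = 99;

    public char Symbol { get; }
    public byte Precision { get; }

    public SimpleFormat(char symbol, byte precision)
    {
        Symbol = symbol;
        Precision = precision;
    }

    // The string overload delegates; no intermediate array or string is created.
    public static SimpleFormat Parse(string format)
        => format == null ? default : Parse(format.AsSpan());

    public static SimpleFormat Parse(ReadOnlySpan<char> format)
    {
        if (format.Length == 0)
            return default;

        char symbol = format[0];
        if (format.Length == 1)
            return new SimpleFormat(symbol, NoPrecision);

        uint precision = 0;
        for (int i = 1; i < format.Length; i++)
        {
            // Non-digits wrap to a large value and fail the range test.
            uint digit = unchecked((uint)(format[i] - '0'));
            if (digit > 9)
                throw new FormatException("Precision must consist of decimal digits.");

            precision = precision * 10 + digit;
            if (precision > MaxPrecision)
                throw new FormatException("Precision must not exceed 99.");
        }

        return new SimpleFormat(symbol, (byte)precision);
    }
}
```

Note that a loop of this shape accepts leading zeros, so "D0009" parses with a precision of 9.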

stephentoub · 2017-11-07T05:03:27Z

src/System.Memory/src/System/Buffers/StandardFormat.cs

+
+            if (format.Length > 1)
+            {
+                if (!byte.TryParse(format.Substring(1), out precision))


On .NET Core this can use format.AsReadOnlySpan().Slice(1) and the new byte.TryParse that accepts a span.

The Parse() method doesn't really merit an #if-bifurcated implementation - it was easier just to throw in the simple inline parser.

stephentoub · 2017-11-07T05:03:39Z

src/System.Memory/src/System/Buffers/StandardFormat.cs

+            if (format.Length > 1)
+            {
+                if (!byte.TryParse(format.Substring(1), out precision))
+                    throw new FormatException("format");


nameof(format)... but why is the message just "format"? Shouldn't there be some resource string outlining the problem, or else this would be an ArgumentException of some kind?

Added resource string.

stephentoub · 2017-11-07T05:04:50Z

src/System.Memory/src/System/Buffers/StandardFormat.cs

+                    throw new FormatException("format");
+
+                if (precision > MaxPrecision)
+                    throw new FormatException("precision");


Same questions as I had above.

Same answer.

stephentoub · 2017-11-07T05:29:17Z

src/System.Memory/src/System/Buffers/Text/Utf8Formatter/FormattingHelpers.cs

+        {
+            long div = numerator / denominator;
+            modulo = numerator - (div * denominator);
+            return div;


This is already the implementation of Math.DivRem in .NET Core:
https://github.com/dotnet/coreclr/blob/master/src/mscorlib/shared/System/Math.cs#L82-L90

long div = a / b; result = a - (div * b); return div;
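As a quick worked check of that formula, the sketch below wraps it in a hypothetical class and can be compared against the % operator. Because C# integer division truncates toward zero, the computed remainder takes the sign of the numerator, the same as %.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical wrapper around the formula quoted above.
public static class DivModSketch
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static long DivMod(long numerator, long denominator, out long modulo)
    {
        long div = numerator / denominator;
        // One division plus a multiply and subtract, instead of a second division for %.
        modulo = numerator - (div * denominator);
        return div;
    }
}
```

For example, DivMod(17, 5, out m) returns 3 with m equal to 2, and DivMod(-17, 5, out m) returns -3 with m equal to -2.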

stephentoub · 2017-11-07T05:42:43Z

src/System.Memory/src/System/Buffers/Text/Utf8Formatter/Utf8Formatter.Guid.cs

+            if (bookEnds && format.Symbol == 'B')
+                Unsafe.Add(ref utf8Bytes, idx++) = CloseBrace;
+            else if (bookEnds && format.Symbol == 'P')
+                Unsafe.Add(ref utf8Bytes, idx++) = CloseParen;


    if (bookEnds)
    {
        if (format.Symbol == 'B')
        {
            Unsafe.Add(ref utf8Bytes, idx++) = CloseBrace;
        }
        else if (format.Symbol == 'P')
        {
            Unsafe.Add(ref utf8Bytes, idx++) = CloseParen;
        }
    }

- Move StandardFormat to System.Buffers
- Mark it readonly
- Fix spelling: "Seperator"
- Deduplicate namespace in ref .cs

- Assert that a culture-invariant ToString() on double produced ASCII characters only
- Add magic literal comments for the 4-byte compares in DateTimeOffset parsing.
- More "seperator" vs. "separator"
- Rename formatter benchmarks to be you know, "formatter"-like.

- Improve perf of long.MinValue path in TryFormat(long)
- Make TryParseDateTimeOffset compare exact casing for RFC 1123 formats ('R' and 'l')

- Lots of small items in the last round of feedback.

Removed the extra allocations from this path (though I still can't imagine anyone who cares enough about perf to use this parser wanting to use these apis) and addressed the outstanding feedback surrounding these. Removed the 1.2% false positive noise from Utf8Parser code coverage. It's now at 100%.

ahsonkhan · 2017-11-07T21:21:37Z

src/System.Memory/src/System/Buffers/Text/Utf8Formatter/Utf8Formatter.Date.cs

@@ -121,7 +121,7 @@ public static bool TryFormat(DateTimeOffset value, Span<byte> buffer, out int by
                    return TryFormatDateTimeG(value.DateTime, offset, buffer, out bytesWritten);

                default:
-                    throw new FormatException(SR.Argument_BadFormatSpecifier);
+                    return ThrowHelper.TryFormatThrowFormatException(out bytesWritten);


nit: assign bytesWritten to default outside the switch statement so that we don't have to have an out parameter on the ThrowHelper. It looks strange that it is returning an out parameter.

We'd be writing bytesWritten twice and we'd still need a strange looking return type on the helper to avoid ruining the 100% code coverage on Utf8Formatter. ThrowHelper is always going to look strange unless you're one of the people deep into JIT-fu. It still looks strange to me.
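For readers unfamiliar with the pattern under discussion, here is a minimal sketch of a throw helper that returns bool and takes the out parameter, so the default case stays a single return statement. Only the helper's method name comes from the diff; the class names, the message text and the toy formatter are assumptions for illustration.

```csharp
using System;
using System.Runtime.CompilerServices;

// Sketch of the throw-helper pattern; not the PR's ThrowHelper class.
internal static class ThrowHelperSketch
{
    // Kept out of line so the throw site adds little code to the caller,
    // which makes the caller a better inlining candidate.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool TryFormatThrowFormatException(out int bytesWritten)
    {
        bytesWritten = 0;
        throw new FormatException("Format specifier was invalid.");
    }
}

// Hypothetical toy formatter that uses the helper in its default case.
public static class TinyFormatter
{
    private static readonly byte[] s_true = { (byte)'T', (byte)'r', (byte)'u', (byte)'e' };
    private static readonly byte[] s_false = { (byte)'F', (byte)'a', (byte)'l', (byte)'s', (byte)'e' };

    public static bool TryFormat(bool value, Span<byte> buffer, out int bytesWritten, char symbol)
    {
        switch (symbol)
        {
            case 'G':
                byte[] text = value ? s_true : s_false;
                if (buffer.Length < text.Length)
                {
                    bytesWritten = 0;
                    return false;
                }
                text.AsSpan().CopyTo(buffer);
                bytesWritten = text.Length;
                return true;

            default:
                // The helper never returns; "return" satisfies definite assignment
                // and the compiler's reachability rules in one statement.
                return ThrowHelperSketch.TryFormatThrowFormatException(out bytesWritten);
        }
    }
}
```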

ahsonkhan · 2017-11-07T21:32:35Z

src/System.Memory/src/System/Buffers/Text/Utf8Formatter/Utf8Formatter.Integer.Signed.cs

-                // Officially, the default is "G" but "G without a precision is equivalent to "D" so that's why we're calling the "D" helper.
-                return TryFormatInt64D(value, format.Precision, buffer, out bytesWritten);
+                // Officially, the default is "G" but "G without a precision is equivalent to "D" and so that's why we're using "D" (eliminates an unnecessary HasPrecision check)
+                format = 'D';


Overriding the entire struct and calling the constructor here increases the method assembly size.

What is wrong with the following?

    char formatChar = format.Symbol;
    if (format.IsDefault)
    {
        formatChar = 'D';
    }
    switch (formatChar)
    {
        ...

What's wrong is that you'll pass the wrong precision down to the subformatter.

Ah yes, if format.IsDefault is true, the precision == 0. But we want NoPrecision, not 0. Got it.

ahsonkhan · 2017-11-07T21:40:12Z

src/System.Memory/src/System/Buffers/StandardFormat.cs

        }

        /// <summary>
+        /// Converts a classic .NET format string into a StandardFormat
+        /// </summary>
+        public static StandardFormat Parse(string format) => format == null ? default : Parse(SpanExtensions.AsReadOnlySpan(format));  //@todo: Change back to extension syntax once the ambiguous reference with CoreLib is eliminated.


Can you please explain why this is causing an ambiguous reference?

This:

https://github.com/dotnet/coreclr/blob/6a5b0e5bd45b987005d0947cec35dce9eb2c51f2/src/mscorlib/shared/System/Span.NonGeneric.cs#L121

paired with this:

corefx/src/System.Memory/src/System.Memory.csproj

Line 69 in ea39ed0

<ReferenceFromRuntime Include="System.Private.CoreLib" />

I do not understand the issue. If there is an ambiguous reference, how will it be eliminated?

We reference CoreLib and then the extension methods forward to them.
https://github.com/dotnet/corefx/blob/master/src/System.Memory/src/System/SpanExtensions.Fast.cs#L48

One extension method is defined on SpanExtensions, the other on Span. Presumably, the fix is to fix CoreLib - either by making its own copy of the extension method internal or by moving it to the correct type. It's out of the scope of this PR.

Or perhaps the CoreLib AsReadOnlySpan shouldn't be an extension method.

ahsonkhan · 2017-11-07T21:44:59Z

src/System.Memory/src/System/Buffers/StandardFormat.cs

+                    byte precision = Precision;
+                    if (precision != NoPrecision)
+                    {
+                        if (precision >= 100)


Is this considered valid if MaxPrecision is 99? I thought MaxLength would be 3, not 4.

ToString() cannot throw exceptions and there's no way to prevent a mischievous app from hand-crafting a struct with illegal values in it. So yes, it has to be robust in this situation.

I'm not understanding this argument. If you corrupt the data structure with unsafe code or the equivalent, then yeah, this might give you an invalid answer, e.g. 23 instead of 123. That's fine. It won't corrupt anything, it'll just give an invalid answer for invalid input.

ahsonkhan · 2017-11-07T21:55:21Z

src/System.Memory/src/System/Buffers/StandardFormat.cs

+            else
+            {
+                uint parsedPrecision = 0;
+                for (int srcIndex = 1; srcIndex < format.Length; srcIndex++)


This allows for parsing of strings like "D000...0009". Is that desired behavior? Should we strictly only accept [char][1-9][0-9]?

Why not - the regular BCL parsing routines allow leading zeros in their format string.

ahsonkhan · 2017-11-07T23:24:09Z

src/System.Memory/src/System/Buffers/Text/Utf8Formatter/Utf8Formatter.Boolean.cs

@@ -10,7 +10,7 @@ public static partial class Utf8Formatter
        /// Formats a Boolean as a UTF8 string.
        /// </summary>
        /// <param name="value">Value to format</param>
-        /// <param name="buffer">Buffer to receive UTF8 string</param>
+        /// <param name="buffer">Buffer to write the UTF8-formatted value to</param>
        /// <param name="bytesWritten">Receives the length of the formatted text in bytes</param>


The same nit about using "receive" in the xml comments applies to other parameters, like bytesWritten.

ahsonkhan · 2017-11-07T23:24:21Z

src/System.Memory/src/System/Buffers/Text/Utf8Formatter/Utf8Formatter.Date.cs

@@ -78,7 +78,7 @@ public static partial class Utf8Formatter
        /// Formats a DateTimeOffset as a UTF8 string.
        /// </summary>
        /// <param name="value">Value to format</param>
-        /// <param name="buffer">Buffer to receive UTF8 string</param>
+        /// <param name="buffer">Buffer to write the UTF8-formatted value to</param>
        /// <param name="bytesWritten">Receives the length of the formatted text in bytes</param>


Also here and elsewhere.

Won't block on this: feel free to submit a PR for the XML docs - probably faster if you do it with the wording you feel works best.

ahsonkhan · 2017-11-10T22:15:56Z

src/System.Memory/Common/src/System/MutableDecimal.cs

@@ -0,0 +1,54 @@
+// Licensed to the .NET Foundation under one or more agreements.


Should this be in corefx/src/Common/ instead of System.Memory/Common/...?

I think the concept is a bit too icky to advertise broadly.

@atsushikan, would you be ok with me moving this file within the src directory?

So, from System.Memory/Common/src/System/MutableDecimal.cs to System.Memory/src/System/MutableDecimal.cs? It seems strange to put this in a "Commons" directory and yet it is only accessed within System.Memory. At the very least, I think we should move it inside System.Memory/src, if we want to keep it in Common. Right now, System.Memory is the only one with a folder other than pkg/ref/src/tests at the root.

It's also weird for a test project to reach into a sibling src directory. If the current setup is really objectionable, then go ahead and move it to the global Common directory. We have other narrowly-targeted stuff up there so this isn't that different.

ahsonkhan · 2017-11-10T22:17:00Z

src/System.Memory/src/System/Number/Number.NumberBuffer.cs

+using System.Runtime.InteropServices;
+using System.Runtime.CompilerServices;
+
+namespace System


Do these internal Number structs have to be in the System namespace or can they be moved to System.Buffers.Text?

It's internal so the namespace doesn't strictly matter but I'd also prefer not to introduce unnecessary diffs between these and the originals in CoreRT.

* Produce Utf8Parser and Utf8Formatter

  Fixes https://github.com/dotnet/corefx/issues/24607

  Remaining debt (cut for time):

  Parsing Integers with the "N" format
  https://github.com/dotnet/corefx/issues/24986
  Some questions to be resolved as to whether to be compatible (BCL doesn't care where you put the commas) or correct.

  Format of floating point is still a wrapper hack
  https://github.com/dotnet/corefx/issues/25077
  The portable DoubleToNumber() code was never ported to C# (though the big block comment advertising it was).

* PR feedback.
  - Move StandardFormat to System.Buffers
  - Mark it readonly
  - Fix spelling: "Seperator"
  - Deduplicate namespace in ref .cs

* PR feedback.
  - Assert that a culture-invariant ToString() on double produced ASCII characters only
  - Add magic literal comments for the 4-byte compares in DateTimeOffset parsing.
  - More "seperator" vs. "separator"
  - Rename formatter benchmarks to be you know, "formatter"-like.

* PR feedback.
  - Improve perf of long.MinValue path in TryFormat(long)
  - Make TryParseDateTimeOffset compare exact casing for RFC 1123 formats ('R' and 'l')

* PR feedback.
  - Lots of small items in the last round of feedback.

* PR feedback (ThrowHelper, AllocHelper, random easy stuff)

* PR feedback (StandardFormat.Parse/ToString())

  Removed the extra allocations from this path (though I still can't imagine anyone who cares enough about perf to use this parser wanting to use these apis) and addressed the outstanding feedback surrounding these. Removed the 1.2% false positive noise from Utf8Parser code coverage. It's now at 100%.

* Replace 'buffer' text in XML docs

Commit migrated from dotnet/corefx@d9924a5

ghost added the area-System.Memory label Nov 6, 2017

ghost added this to the 2.1.0 milestone Nov 6, 2017

ghost self-assigned this Nov 6, 2017

ghost requested review from ahsonkhan and KrzysztofCwalina November 6, 2017 15:55

ghost changed the title ~~Produce Utf8Parser and Utf8Formatter~~ Productize Utf8Parser and Utf8Formatter Nov 6, 2017

KrzysztofCwalina reviewed Nov 6, 2017

justinvp reviewed Nov 6, 2017

KrzysztofCwalina reviewed Nov 6, 2017

justinvp reviewed Nov 6, 2017

KrzysztofCwalina reviewed Nov 6, 2017

ahsonkhan suggested changes Nov 6, 2017

ahsonkhan reviewed Nov 6, 2017

jkotas reviewed Nov 7, 2017

stephentoub reviewed Nov 7, 2017

atsushikan added 6 commits November 7, 2017 05:03

PR feedback.

de9a5a7

- Move StandardFormat to System.Buffers
- Mark it readonly
- Fix spelling: "Seperator"
- Deduplicate namespace in ref .cs

PR feedback.

3c036bf

- Assert that a culture-invariant ToString() on double produced ASCII characters only
- Add magic literal comments for the 4-byte compares in DateTimeOffset parsing.
- More "seperator" vs. "separator"
- Rename formatter benchmarks to be, you know, "formatter"-like.

PR feedback.

c213345

- Improve perf of long.MinValue path in TryFormat(long)
- Make TryParseDateTimeOffset compare exact casing for RFC1123 formats ('R' and 'l')

PR feedback.

b6784f8

- Lots of small items in the last round of feedback.

PR feedback (ThrowHelper, AllocHelper, random easy stuff)

d7fd0b6

ahsonkhan reviewed Nov 7, 2017

Replace 'buffer' text in XML docs

41a19cc

ahsonkhan reviewed Nov 7, 2017

ahsonkhan approved these changes Nov 7, 2017

ghost merged commit d9924a5 into dotnet:master Nov 7, 2017

ghost deleted the stp branch November 7, 2017 23:40

ahsonkhan reviewed Nov 10, 2017

ahsonkhan mentioned this pull request Mar 9, 2018

Move MutableDecimal from System.Memory to Common #27917

Merged

ahsonkhan mentioned this pull request May 31, 2018

Support Uri parsing in InvariantParser. dotnet/corefxlab#749

Closed

This pull request was closed.
