-
Notifications
You must be signed in to change notification settings - Fork 345
Primitive formatting enhancements for Time and Uuid #1462
Conversation
shiftylogic
commented
Apr 11, 2017
- Optimized for Invariant UTF-8 and UTF-16.
- Formats are matched with existing ToString(InvariantCulture).
- Test coverage.
- Performance improvements (2-7x depending on scenario).
…versions; code structure cleanup
public static int WriteFractionDigits(long value, int digitCount, ref byte buffer, int index) | ||
{ | ||
for (var i = FractionDigits; i > digitCount; i--) | ||
value = DivMod(value, 10, out long m); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't use out long m
anywhere, should we have an overload that doesn't return an out parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In most cases, that out param is used since that is the purpose of this function. However, this particular usage is actually leftover from when I was using a custom implementation of DivMod10 that was faster but obscure and only worked on positive integers up to 1 billion or so. I ended up reverting to the Math.DivRem implementation for easier maintenance.
I will switch this to a straight division since my optimized DivMod10 is gone now.
public static long DivMod(long numerator, long denominator, out long modulo) | ||
{ | ||
long div = numerator / denominator; | ||
modulo = numerator - (div * denominator); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a perf difference between this and just using the % operator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The % operator is really expensive. The compiler should do this for us (like the C++ compiler does), but it doesn't. This trick is actually coded into the Math library as a DivRem method, but I don't have access to it here. My testing shows this to be significantly faster.
#region Character counting helper methods | ||
|
||
[MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
public static int CountDigits(long n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this implementation and it is faster (esp for n < 10000000):
n = Math.Abs(n);
int length = 1;
while ((n /= 10) >= 1)
{
length++;
}
return length;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw different results on my dev machine. I might re-visit this. Though, if I change it, I will likely just unroll the loop completely.
// code must have already done all the necessary validation. | ||
internal static class FormattingHelpers | ||
{ | ||
private const int FractionDigits = 7; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this 7? Can you explain what FractionDigits represents?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spec for fractions on time formats is fixed at exactly 7 characters. I have no idea why this is, but it is part of most time specs. I'll add a comment to this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
Unsafe.Add(ref utf16Bytes, 13) = Colon; | ||
|
||
FormattingHelpers.WriteDigits(value.Minute, 2, ref utf16Bytes, 14); | ||
Unsafe.Add(ref utf16Bytes, 16) = Colon; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we get 16 from adding 2 and 14 from the call to WriteDigits? If so, should this be a calculation stored in a variable?
digitCount = 2;
index = 14;
Unsafe.Add(ref utf16Bytes, digitCount + index) = Colon;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra variables means more IL instructions. These are constants and therefore can be optimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright. Maybe add a comment to make the relationship explicit then and explain what the constants are? I am just thinking to make it easier in terms of maintainability or code changes/refactoring.
|
||
namespace System.Text | ||
{ | ||
internal static class InvariantUtf8TimeFormatter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to extract out the duplicate code between InvariantUtf16TimeFormatter and InvariantUtf8TimeFormatter into helpers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are fundamentally different enough that you can't merge them without using generics or some level of indirection since they operate using different sized characters and types. The point of separating them was to avoid indirection and abstraction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an example, for TryFormatG, aren't these parts of the methods exactly the same (aside from the difference that one takes a ref char while the other takes ref byte)? Could we extract that out into helpers?
Aside from that, the other difference is:
bytesWritten = charsNeeded * sizeof(char);
and
Span<char> dst = buffer.NonPortableCast<byte, char>();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One writes char, the other byte. Without generics, how would you make this common? If they were each operating on the same sized span and writing the same data, it would be trivial.
case 'G': | ||
return TryFormatDateTimeFormatG(value.DateTime, buffer, out bytesWritten, encoder); | ||
return TryFormatDateTimeFormatG(value.DateTime, offset, buffer, out bytesWritten, encoder); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If format.Symbol == 'G' and format.IsDefault == false, offset = NullOffset. Do we want that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tests for the "default" case and this is passing based on what DateTimeOffset.ToString returns. For some reason, the system DateTime.ToString "default" (though it is documented as 'G') actually does something slightly different than explicitly using 'G'. I don't claim to understand why, but these APIs now follow exactly what the system does in all the cases we support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems odd, but I suppose we should match the DateTime.ToString behaviour.
I wonder if the source can provide some insight (I haven't looked deeply into this, but for reference):
http://referencesource.microsoft.com/#mscorlib/system/globalization/datetimeformat.cs,908
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My implementations are derived from that code. That is how I found the difference. For some reason, they treat the default (i.e. no format specifier) as a modified 'G'.
Assert.Equal(expected, actual); | ||
} | ||
|
||
static readonly Random Rnd = new Random(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a seed value for consistency between runs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that, but I don't know if I like it. This will have better coverage over time, and in failure cases, the logs will have enough information to replicate the failure if we end up hitting a bad case.
I could be swayed, if this is something we MUST do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to keep test runs consistent. If the test does catch a bug, we want it to be deterministic and always fail rather than seeing intermittent failures which could cause confusion (scenario: seeing a red in CI, re-test assuming some CI infra issue, and it goes green). If we want higher coverage, should we be explicit in our test cases and brute force a large input set?
better coverage over time
I suppose that would be OK but imo being explicit in the edge cases/etc would be better.
|
||
namespace System.Text | ||
{ | ||
internal static class InvariantUtf16UuidFormatter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename as GUID instead of UUID?
Otherwise, LGTM. |
|
||
namespace System.Text.Primitives.Tests | ||
{ | ||
public class UuidTests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename as guid? Along with filename.
else if (encoder.IsInvariantUtf16) | ||
return InvariantUtf16GuidFormatter.TryFormat(value, buffer, out bytesWritten, format); | ||
else | ||
throw new NotImplementedException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to have issues filed (or some other tracking mechanism) for all the NotImplementedExceptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should already be covered by issue #57. There are variants of that issue for parsing and encoding as well.