Skip to content
This repository has been archived by the owner on Aug 2, 2023. It is now read-only.

Primitive formatting enhancements for Time and Uuid #1462

Merged
merged 19 commits into from
Apr 12, 2017
Merged

Primitive formatting enhancements for Time and Uuid #1462

merged 19 commits into from
Apr 12, 2017

Conversation

shiftylogic
Copy link
Contributor

  • Optimized for Invariant UTF-8 and UTF-16.
  • Formats are matched with existing ToString(InvariantCulture).
  • Test coverage.
  • Performance improvements (2-7x depending on scenario).

public static int WriteFractionDigits(long value, int digitCount, ref byte buffer, int index)
{
for (var i = FractionDigits; i > digitCount; i--)
value = DivMod(value, 10, out long m);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use out long m anywhere, should we have an overload that doesn't return an out parameter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In most cases, that out param is used since that is the purpose of this function. However, this particular usage is actually leftover from when I was using a custom implementation of DivMod10 that was faster but obscure and only worked on positive integers up to 1 billion or so. I ended up reverting to the Math.DivRem implementation for easier maintenance.

I will switch this to a straight division since my optimized DivMod10 is gone now.

public static long DivMod(long numerator, long denominator, out long modulo)
{
long div = numerator / denominator;
modulo = numerator - (div * denominator);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a perf difference between this and just using the % operator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The % operator is really expensive. The compiler should do this for us (like the C++ compiler does), but it doesn't. This trick is actually coded into the Math library as a DivRem method, but I don't have access to it here. My testing shows this to be significantly faster.

#region Character counting helper methods

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int CountDigits(long n)
Copy link
Member

@ahsonkhan ahsonkhan Apr 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this implementation and it is faster (esp for n < 10000000):

n = Math.Abs(n);
int length = 1;
while ((n /= 10) >= 1)
{
    length++;
}
return length;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw different results on my dev machine. I might re-visit this. Though, if I change it, I will likely just unroll the loop completely.

// code must have already done all the necessary validation.
internal static class FormattingHelpers
{
private const int FractionDigits = 7;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this 7? Can you explain what FractionDigits represents?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec for fractions on time formats is fixed at exactly 7 characters. I have no idea why this is, but it is part of most time specs. I'll add a comment to this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

Unsafe.Add(ref utf16Bytes, 13) = Colon;

FormattingHelpers.WriteDigits(value.Minute, 2, ref utf16Bytes, 14);
Unsafe.Add(ref utf16Bytes, 16) = Colon;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we get 16 from adding 2 and 14 from the call to WriteDigits? If so, should this be a calculation stored in a variable?

digitCount = 2;
index = 14;
Unsafe.Add(ref utf16Bytes, digitCount + index) = Colon;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra variables means more IL instructions. These are constants and therefore can be optimized.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. Maybe add a comment to make the relationship explicit then and explain what the constants are? I am just thinking to make it easier in terms of maintainability or code changes/refactoring.


namespace System.Text
{
internal static class InvariantUtf8TimeFormatter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to extract out the duplicate code between InvariantUtf16TimeFormatter and InvariantUtf8TimeFormatter into helpers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are fundamentally different enough that you can't merge them without using generics or some level of indirection since they operate using different sized characters and types. The point of separating them was to avoid indirection and abstraction.

Copy link
Member

@ahsonkhan ahsonkhan Apr 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an example, for TryFormatG, aren't these parts of the methods exactly the same (aside from the difference that one takes a ref char while the other takes ref byte)? Could we extract that out into helpers?

https://github.com/shiftylogic/corefxlab/blob/9b46b6344f2feae9e4ffdff1fc074ec42203dad3/src/System.Text.Primitives/System/Text/Formatting/Internal/TimeFormatter.Utf8.cs#L76-L113

https://github.com/shiftylogic/corefxlab/blob/9b46b6344f2feae9e4ffdff1fc074ec42203dad3/src/System.Text.Primitives/System/Text/Formatting/Internal/TimeFormatter.Utf16.cs#L78-L115

Aside from that, the other difference is:
bytesWritten = charsNeeded * sizeof(char);
and
Span<char> dst = buffer.NonPortableCast<byte, char>();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One writes char, the other byte. Without generics, how would you make this common? If they were each operating on the same sized span and writing the same data, it would be trivial.

case 'G':
return TryFormatDateTimeFormatG(value.DateTime, buffer, out bytesWritten, encoder);
return TryFormatDateTimeFormatG(value.DateTime, offset, buffer, out bytesWritten, encoder);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If format.Symbol == 'G' and format.IsDefault == false, offset = NullOffset. Do we want that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tests for the "default" case and this is passing based on what DateTimeOffset.ToString returns. For some reason, the system DateTime.ToString "default" (though it is documented as 'G') actually does something slightly different than explicitly using 'G'. I don't claim to understand why, but these APIs now follow exactly what the system does in all the cases we support.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems odd, but I suppose we should match the DateTime.ToString behaviour.
I wonder if the source can provide some insight (I haven't looked deeply into this, but for reference):
http://referencesource.microsoft.com/#mscorlib/system/globalization/datetimeformat.cs,908

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My implementations are derived from that code. That is how I found the difference. For some reason, they treat the default (i.e. no format specifier) as a modified 'G'.

Assert.Equal(expected, actual);
}

static readonly Random Rnd = new Random();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a seed value for consistency between runs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, but I don't know if I like it. This will have better coverage over time, and in failure cases, the logs will have enough information to replicate the failure if we end up hitting a bad case.

I could be swayed, if this is something we MUST do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to keep test runs consistent. If the test does catch a bug, we want it to be deterministic and always fail rather than seeing intermittent failures which could cause confusion (scenario: seeing a red in CI, re-test assuming some CI infra issue, and it goes green). If we want higher coverage, should we be explicit in our test cases and brute force a large input set?

better coverage over time

I suppose that would be OK but imo being explicit in the edge cases/etc would be better.


namespace System.Text
{
internal static class InvariantUtf16UuidFormatter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename as GUID instead of UUID?

@ahsonkhan
Copy link
Member

Otherwise, LGTM.


namespace System.Text.Primitives.Tests
{
public class UuidTests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename as guid? Along with filename.

@shiftylogic shiftylogic merged commit 156e818 into dotnet:master Apr 12, 2017
else if (encoder.IsInvariantUtf16)
return InvariantUtf16GuidFormatter.TryFormat(value, buffer, out bytesWritten, format);
else
throw new NotImplementedException();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to have issues filed (or some other tracking mechanism) for all the NotImplementedExceptions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should already be covered by issue #57. There are variants of that issue for parsing and encoding as well.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants