Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modernisation/Performance boost POC #39

Merged
merged 18 commits into from Aug 18, 2023
Merged

Modernisation/Performance boost POC #39

merged 18 commits into from Aug 18, 2023

Conversation

TimothyMakkison
Copy link
Contributor

@TimothyMakkison TimothyMakkison commented Dec 21, 2022

Partially resolving #38 , #37 #35 . Modernised and implemented interfaces to be more like Guid, optimized code using intrinsics and bit hacks.

  • Implemented IComparable.
  • Implemented ISpanParsable and ISpanParsable<Ulid>.
  • Optimised get Time by 30%.
  • Vectorised and optimised public Ulid(Guid) for a 80% and 20% speed up.
  • Added vector equality checks and modern unsafe code for equality checks for a 20-30% speedup.
  • Modernised GetHashCode to use modern unsafe code.
  • Vectorised ToGuid and added bit hack optimization for a 80-90% and 20% speed up.
  • Added .Net 7.0
  • Benchmarks used can be found here.

I've updated Ulid, PerfBenchmarks and Ulid,Tests to support .Net 7.0, this is to add ISpanParsable and modern Vector methods. All changes should be back compatible with older versions of dotnet - modern code is added with the #if pre processor directives.

A lot of intrinsics code appears to be repeated for .Net 3+ and for .Net 7, this is deliberate as the cross platform Vector128.Shuffle was only added in .Net 7. This way Ssse compatible devices will get the benefit of vectorisation in most modern versions of .Net.

Because of JIT related issues (I think) if .Net 3 vector code doesn't constant vectors outside of method bodies for performance reasons. Likewise .Net 7 vector code should only use constant vectors placed inside its method body, again for performance reasons. I'm not sure why this happens (might be similar to this or this), all I know is that when I use a shuffle indices vector the wrong way the performance takes a hit. I'd be curious to see if other people experience the same performance improvements I've had.

Note that my benchmarks may not be accurate, the times are in the nano seconds and vary quite a bit. For some benchmarks I've added a loop to get more reliable numbers, but due to the magic of software these numbers are probably not representative. I've labeled these results accordingly. Some of that changes may not be significant and It's possible that I've added performance regressions for other versions of .Net or devices.

Finally please note that I'm still unexperienced with unsafe code and vectorisation. It's likely that I've made a mistake or overlooked key points. ie memory alignment, pointer safety, architecture compatibility, Unsafe.As vs Unsafe.As(Unsafe.AsRef), accidental mutations.

Implemented

IComparable

Implemented IComparable, returns 1 if null, raises an exception if not an Ulid, otherwise the default CompareTo method is used. I also added a comparison unit test.

ISpanFormattable & ISpanParsable

Implemented both interfaces for their respective versions with additional unit tests. I'm not familiar with these interfaces (can't find any docs/articles) and treated the methods as wrappers for TryWriteStringify/ToString and Parse/TryParse, ignoring IFormatProvider and ReadOnlySpan<char> format.
Looking at the Guid internals they ignored IFormatProvider but would change the serialization method depending on values inside ReadOnlySpan<char> format. I assumed that it isn't mandatory for implementors to use these parameters. Perhaps you could consider custom formatting options here.

Changed

Time

Updated the Time property to use Unsafe code to reverse the first 6 bytes. I kept the original method for big endian devices.
Benchmarking shows a 30% speedup with times equal to or faster than vectorisation so I opted to only use byte shuffling.

Method Runtime Mean Error StdDev Gen0 Allocated
UlidUpdated_ .NET Core 3.1 18.22 ns 0.289 ns 0.075 ns - -
UlidVector_ .NET Core 3.1 17.64 ns 0.088 ns 0.023 ns - -
UlidOriginal_ .NET Core 3.1 23.48 ns 0.132 ns 0.034 ns - -
NUlid_ .NET Core 3.1 34.48 ns 0.525 ns 0.081 ns 0.0408 64 B
UlidUpdated_ .NET 5.0 17.35 ns 0.342 ns 0.089 ns - -
UlidVector_ .NET 5.0 17.33 ns 0.079 ns 0.012 ns - -
UlidOriginal_ .NET 5.0 23.51 ns 0.094 ns 0.024 ns - -
NUlid_ .NET 5.0 32.94 ns 0.310 ns 0.048 ns 0.0408 64 B
UlidUpdated_ .NET 7.0 12.94 ns 0.179 ns 0.028 ns - -
UlidVector_ .NET 7.0 13.24 ns 0.163 ns 0.042 ns - -
UlidOriginal_ .NET 7.0 18.71 ns 0.189 ns 0.029 ns - -
NUlid_ .NET 7.0 29.44 ns 0.179 ns 0.046 ns 0.0408 64 B

Looped benchmark (100 iterations)

Method Runtime Mean Error StdDev Gen0 Allocated
UlidUpdated_ .NET Core 3.1 1,970.66 ns 12.498 ns 1.934 ns - -
UlidVector_ .NET Core 3.1 1,910.51 ns 20.386 ns 5.294 ns - -
UlidOriginal_ .NET Core 3.1 2,531.84 ns 86.987 ns 13.461 ns - -
NUlid_ .NET Core 3.1 33.57 ns 0.466 ns 0.072 ns 0.0408 64 B
UlidUpdated_ .NET 5.0 1,872.80 ns 11.559 ns 3.002 ns - -
UlidVector_ .NET 5.0 1,908.42 ns 11.305 ns 1.749 ns - -
UlidOriginal_ .NET 5.0 2,499.66 ns 11.249 ns 2.921 ns - -
NUlid_ .NET 5.0 32.65 ns 0.585 ns 0.091 ns 0.0408 64 B
UlidUpdated_ .NET 7.0 1,478.76 ns 13.499 ns 3.506 ns - -
UlidVector_ .NET 7.0 1,476.74 ns 10.306 ns 1.595 ns - -
UlidOriginal_ .NET 7.0 2,105.35 ns 12.327 ns 3.201 ns - -
NUlid_ .NET 7.0 29.10 ns 0.501 ns 0.130 ns 0.0408 64 B

Constructor public Ulid Guid

Added/updated 3 methods for public Ulid(Guid guid). For .Net 7+ or .Net 3+ I used shuffle intrinsics to shuffle the structure of the guid and cast it into a Ulid. For little endian intrinsic incompatible devices I used Unsafe and BinaryPrimitives.ReverseEndianness to alter the Guid structure to Ulid.

Method Runtime Mean Error StdDev Gen0 Allocated
UlidUpdatedBitwise_ .NET Core 3.1 9.794 ns 0.1290 ns 0.0200 ns - -
UlidUpdatedVec_ .NET Core 3.1 2.747 ns 0.0699 ns 0.0182 ns - -
UlidOriginal_ .NET Core 3.1 12.401 ns 0.0551 ns 0.0143 ns - -
NUlid_ .NET Core 3.1 19.591 ns 0.2255 ns 0.0586 ns 0.0255 40 B
UlidUpdatedBitwise_ .NET 5.0 9.548 ns 0.4433 ns 0.0686 ns - -
UlidUpdatedVec_ .NET 5.0 2.348 ns 0.0683 ns 0.0106 ns - -
UlidOriginal_ .NET 5.0 12.083 ns 0.0419 ns 0.0109 ns - -
NUlid_ .NET 5.0 18.995 ns 0.3381 ns 0.0878 ns 0.0255 40 B
UlidUpdatedBitwise_ .NET 7.0 9.015 ns 0.1410 ns 0.0366 ns - -
UlidUpdatedVec_ .NET 7.0 2.094 ns 0.1952 ns 0.0507 ns - -
UlidOriginal_ .NET 7.0 11.003 ns 0.3082 ns 0.0801 ns - -
NUlid_ .NET 7.0 17.700 ns 0.2000 ns 0.0519 ns 0.0255 40 B

Equality

Added a vectorized equality check using either .Net 7+ Vector128 extensions, a manual Ssse3 comparison for .Net 3.0-.Net 6.0 or a refactored unsafe equality check.

Method Runtime Mean Error StdDev Ratio RatioSD Gen0 Allocated Alloc Ratio
Guid_ .NET Core 3.1 2.0128 ns 0.0352 ns 0.0091 ns 1.00 0.00 - - NA
UlidUnsafe_ .NET Core 3.1 1.8601 ns 0.0629 ns 0.0097 ns 0.93 0.01 - - NA
UlidVector_ .NET Core 3.1 2.0095 ns 0.1870 ns 0.0486 ns 1.00 0.03 - - NA
UlidOriginal_ .NET Core 3.1 2.5127 ns 0.0408 ns 0.0106 ns 1.25 0.01 - - NA
NUlid_ .NET Core 3.1 36.7950 ns 2.3330 ns 0.6059 ns 18.28 0.36 0.0510 80 B NA
Guid_ .NET 5.0 1.9346 ns 0.0226 ns 0.0035 ns 1.00 0.00 - - NA
UlidUnsafe_ .NET 5.0 2.0288 ns 0.2671 ns 0.0694 ns 1.04 0.03 - - NA
UlidVector_ .NET 5.0 1.6987 ns 0.0409 ns 0.0063 ns 0.88 0.00 - - NA
UlidOriginal_ .NET 5.0 2.3973 ns 0.3028 ns 0.0786 ns 1.23 0.05 - - NA
NUlid_ .NET 5.0 37.4773 ns 4.3714 ns 1.1352 ns 19.23 0.60 0.0510 80 B NA
Guid_ .NET 7.0 1.6386 ns 0.1061 ns 0.0276 ns 1.00 0.00 - - NA
UlidUnsafe_ .NET 7.0 0.4664 ns 0.0366 ns 0.0057 ns 0.29 0.00 - - NA
UlidVector_ .NET 7.0 0.3541 ns 0.0080 ns 0.0021 ns 0.22 0.00 - - NA
UlidOriginal_ .NET 7.0 1.1777 ns 0.6814 ns 0.1769 ns 0.72 0.12 - - NA
NUlid_ .NET 7.0 36.0873 ns 19.4051 ns 5.0394 ns 22.04 3.26 0.0510 80 B NA

Looped benchmark (100 iterations)

Method Runtime Mean Error StdDev Ratio RatioSD Gen0 Allocated Alloc Ratio
Guid_ .NET Core 3.1 2.034 ns 0.0409 ns 0.0106 ns 1.00 0.00 - - NA
UlidUnsafe_ .NET Core 3.1 242.943 ns 18.8697 ns 4.9004 ns 119.46 2.47 - - NA
UlidVector_ .NET Core 3.1 206.211 ns 1.3328 ns 0.3461 ns 101.40 0.41 - - NA
UlidOriginal_ .NET Core 3.1 337.801 ns 4.4441 ns 1.1541 ns 166.10 0.55 - - NA
NUlid_ .NET Core 3.1 34.217 ns 0.9053 ns 0.1401 ns 16.83 0.11 0.0510 80 B NA
Guid_ .NET 5.0 1.951 ns 0.1234 ns 0.0191 ns 1.00 0.00 - - NA
UlidUnsafe_ .NET 5.0 297.125 ns 98.1249 ns 25.4827 ns 152.33 16.48 - - NA
UlidVector_ .NET 5.0 206.018 ns 2.8097 ns 0.4348 ns 105.62 0.83 - - NA
UlidOriginal_ .NET 5.0 305.997 ns 1.9817 ns 0.5146 ns 156.87 1.70 - - NA
NUlid_ .NET 5.0 34.712 ns 0.6706 ns 0.1741 ns 17.81 0.18 0.0510 80 B NA
Guid_ .NET 7.0 1.636 ns 0.0610 ns 0.0094 ns 1.00 0.00 - - NA
UlidUnsafe_ .NET 7.0 62.873 ns 24.1450 ns 6.2704 ns 38.12 4.36 - - NA
UlidVector_ .NET 7.0 42.068 ns 0.6935 ns 0.1073 ns 25.71 0.18 - - NA
UlidOriginal_ .NET 7.0 144.580 ns 14.0945 ns 3.6603 ns 87.89 2.37 - - NA
NUlid_ .NET 7.0 32.020 ns 0.7219 ns 0.1875 ns 19.59 0.22 0.0510 80 B NA

GetHashCode

Refactored the previous unsafe scoped implementation to use modern Unsafe code.

Method Runtime Mean Error StdDev Median Ratio RatioSD Gen0 Allocated Alloc Ratio
Guid_ .NET Core 3.1 1.4569 ns 1.5604 ns 0.4052 ns 1.3952 ns 1.00 0.00 - - NA
UlidUpdate_ .NET Core 3.1 0.4813 ns 1.0117 ns 0.2627 ns 0.3474 ns 0.37 0.24 - - NA
UlidOriginal_ .NET Core 3.1 1.0379 ns 0.5577 ns 0.0863 ns 0.9981 ns 0.77 0.17 - - NA
NUlid_ .NET Core 3.1 22.5968 ns 0.6576 ns 0.1708 ns 22.6603 ns 16.47 4.40 0.0255 40 B NA
Guid_ .NET 7.0 0.0059 ns 0.0178 ns 0.0046 ns 0.0041 ns 1.00 0.00 - - NA
UlidUpdate_ .NET 7.0 0.0055 ns 0.0053 ns 0.0008 ns 0.0053 ns 3.73 5.33 - - NA
UlidOriginal_ .NET 7.0 0.0679 ns 0.0232 ns 0.0036 ns 0.0693 ns 46.73 65.15 - - NA
NUlid_ .NET 7.0 20.6823 ns 0.5543 ns 0.1440 ns 20.6432 ns 11,398.66 17,269.11 0.0255 40 B NA

ToString

Used string.Create this lets TryWriteStringify directly mutate the string during constructions, before it is made immutable.

ToGuid

Added/updated 3 methods for public Ulid(Guid guid). For .Net 7+ or .Net 3+ I used shuffle intrinsics to shuffle the structure of the Ulid and cast it into a Guid. For little endian intrinsic incompatible devices I used Unsafe and BinaryPrimitives.ReverseEndianness to alter the Ulid structure to Guid.

Method Runtime Mean Error StdDev Gen0 Allocated
UlidUpdatedBitwise_ .NET Core 3.1 7.5734 ns 0.0181 ns 0.0028 ns - -
UlidUpdatedVec_ .NET Core 3.1 1.2751 ns 0.0302 ns 0.0047 ns - -
UlidOriginal_ .NET Core 3.1 9.8598 ns 0.0609 ns 0.0158 ns - -
NUlid_ .NET Core 3.1 17.2219 ns 0.3480 ns 0.0539 ns 0.0255 40 B
UlidUpdatedBitwise_ .NET 5.0 7.1705 ns 0.0137 ns 0.0021 ns - -
UlidUpdatedVec_ .NET 5.0 1.0241 ns 0.0296 ns 0.0077 ns - -
UlidOriginal_ .NET 5.0 9.7610 ns 0.0179 ns 0.0028 ns - -
NUlid_ .NET 5.0 17.2024 ns 0.2036 ns 0.0529 ns 0.0255 40 B
UlidUpdatedBitwise_ .NET 7.0 6.6813 ns 0.0260 ns 0.0067 ns - -
UlidUpdatedVec_ .NET 7.0 0.6562 ns 0.0389 ns 0.0060 ns - -
UlidOriginal_ .NET 7.0 8.6386 ns 0.0742 ns 0.0115 ns - -
NUlid_ .NET 7.0 18.0501 ns 0.2538 ns 0.0393 ns 0.0255 40 B
Method EnvironmentVariables Runtime Mean Error StdDev Median Ratio RatioSD Gen0 Allocated Alloc Ratio
Guid_ COMPlus_EnableSSE2=0 .NET 6.0 44.831 ns 0.8923 ns 0.8346 ns 44.760 ns 1.00 0.00 - - NA
Ulid_ COMPlus_EnableSSE2=0 .NET 6.0 6.925 ns 0.0585 ns 0.0547 ns 6.914 ns 0.15 0.00 - - NA
NUlid_ COMPlus_EnableSSE2=0 .NET 6.0 19.132 ns 0.1224 ns 0.1145 ns 19.136 ns 0.43 0.01 0.0255 40 B NA
Guid_ COMPlus_EnableSSE2=0 .NET 7.0 52.820 ns 0.3402 ns 0.2841 ns 52.692 ns 1.00 0.00 - - NA
Ulid_ COMPlus_EnableSSE2=0 .NET 7.0 6.583 ns 0.0428 ns 0.0379 ns 6.585 ns 0.12 0.00 - - NA
NUlid_ COMPlus_EnableSSE2=0 .NET 7.0 20.839 ns 0.4598 ns 0.7683 ns 20.684 ns 0.39 0.01 0.0255 40 B NA
Guid_ Empty .NET 6.0 46.470 ns 0.9681 ns 0.9056 ns 46.405 ns 1.00 0.00 - - NA
Ulid_ Empty .NET 6.0 1.766 ns 0.0562 ns 0.0469 ns 1.765 ns 0.04 0.00 - - NA
NUlid_ Empty .NET 6.0 19.523 ns 0.2070 ns 0.1835 ns 19.518 ns 0.42 0.01 0.0255 40 B NA
Guid_ Empty .NET 7.0 53.839 ns 0.9657 ns 0.8064 ns 53.778 ns 1.00 0.00 - - NA
Ulid_ Empty .NET 7.0 1.513 ns 0.0778 ns 0.1039 ns 1.496 ns 0.03 0.00 - - NA
NUlid_ Empty .NET 7.0 18.989 ns 0.4115 ns 0.8769 ns 18.490 ns 0.37 0.02 0.0255 40 B NA

Testing

To support vectorised/non vectorised code a way of testing with and without hardware intrinsics enabled should be added. I'm not familiar with github actions but perhaps the following could be added to build-debug.yml.

- run: dotnet test -c Debug  --environment "COMPlus_EnableSSE2=0"
- run: dotnet test -c Debug  --environment "COMPlus_EnableHWIntrinsic=0;COMPlus_EnableSSE2=0"

Future changes/ Thoughts

  • Modernise the code to use C# 11 - use file scoped namespaces, simplify slices.
  • Enable nullable across the project and update relevant methods.
  • Consider adding comparison operators <, <=, >, >=.
  • Add 'NotNullWhen' attributes to methods such as 'TryParse'.

public string ToBase64 uses ArrayPool, stackalloc could instead be used here. The SkipLocalsInit could be used to remove the overhead of clearing the span.

Would the logic used in the original get Time and internal Ulid(long timestampMilliseconds... work on big endian machines?

Little endian
TimestampMilliseconds -> Ulid first 6 bytes
|A|B|C|D|E|F|0|0|              -> |F|E|D|C|B|A|

Big endian
TimestampMilliseconds -> Ulid first 6 bytes
|0|0|F|E|D|C|B|A|              -> |F|E|D|C|0|0|

I have a working vectorized implementation of CrockfordBase32 that is compatible with the Ulid/Nulid implementation (Accepts inputs of 0-122, treats lowercase as uppercase, invalid characters become 31). It needs more work, decoding is 30% faster than the array version but encoding is still slower.

Experimental change to `internal Ulid(long,..)`

Internal Ulid long

Experimented with using Unsafe and BinaryPrimitives.ReverseEndianness to speed up internal constructors. Benchmarks showed little improvement.

Benchmark

Method Runtime Mean Error StdDev Ratio RatioSD Gen0 Allocated Alloc Ratio
Guid_ .NET Core 3.1 89.71 ns 4.371 ns 0.676 ns 1.00 0.00 - - NA
UlidUpdated_ .NET Core 3.1 138.93 ns 56.850 ns 14.764 ns 1.58 0.18 - - NA
Ulid_ .NET Core 3.1 135.14 ns 1.101 ns 0.286 ns 1.51 0.01 - - NA
UlidOrginal_ .NET Core 3.1 130.13 ns 2.616 ns 0.679 ns 1.45 0.01 - - NA
NUlid_ .NET Core 3.1 272.83 ns 3.776 ns 0.584 ns 3.04 0.02 0.0663 104 B NA
Guid_ .NET 5.0 87.75 ns 0.983 ns 0.255 ns 1.00 0.00 - - NA
UlidUpdated_ .NET 5.0 120.98 ns 0.612 ns 0.095 ns 1.38 0.00 - - NA
Ulid_ .NET 5.0 128.48 ns 10.173 ns 1.574 ns 1.47 0.02 - - NA
UlidOrginal_ .NET 5.0 122.75 ns 1.357 ns 0.352 ns 1.40 0.01 - - NA
NUlid_ .NET 5.0 264.45 ns 12.622 ns 3.278 ns 3.01 0.04 0.0663 104 B NA
Guid_ .NET 7.0 86.09 ns 1.055 ns 0.163 ns 1.00 0.00 - - NA
UlidUpdated_ .NET 7.0 52.90 ns 0.735 ns 0.191 ns 0.61 0.00 - - NA
Ulid_ .NET 7.0 52.97 ns 0.182 ns 0.047 ns 0.62 0.00 - - NA
UlidOrginal_ .NET 7.0 54.26 ns 0.209 ns 0.032 ns 0.63 0.00 - - NA
NUlid_ .NET 7.0 188.91 ns 0.492 ns 0.076 ns 2.19 0.00 0.0663 104 B NA

Looped benchmark (100 iterations)

Method Runtime Mean Error StdDev Ratio RatioSD Gen0 Allocated Alloc Ratio
Guid_ .NET Core 3.1 88.18 ns 1.145 ns 0.297 ns 1.00 0.00 - - NA
UlidUpdated_ .NET Core 3.1 13,310.85 ns 2,060.225 ns 318.822 ns 150.88 3.98 - - NA
UlidOrginal_ .NET Core 3.1 13,086.56 ns 109.310 ns 28.387 ns 148.42 0.70 - - NA
NUlid_ .NET Core 3.1 276.40 ns 6.419 ns 1.667 ns 3.13 0.02 0.0663 104 B NA
Guid_ .NET 5.0 88.72 ns 3.598 ns 0.934 ns 1.00 0.00 - - NA
UlidUpdated_ .NET 5.0 12,813.03 ns 430.020 ns 111.675 ns 144.42 1.14 - - NA
UlidOrginal_ .NET 5.0 12,600.82 ns 131.549 ns 20.357 ns 141.81 1.68 - - NA
NUlid_ .NET 5.0 264.17 ns 1.700 ns 0.263 ns 2.97 0.04 0.0663 104 B NA
Guid_ .NET 7.0 85.85 ns 1.576 ns 0.244 ns 1.00 0.00 - - NA
UlidUpdated_ .NET 7.0 5,504.00 ns 15.282 ns 3.969 ns 64.13 0.15 - - NA
UlidOrginal_ .NET 7.0 5,644.55 ns 68.978 ns 10.674 ns 65.75 0.20 - - NA
NUlid_ .NET 7.0 191.18 ns 14.316 ns 2.215 ns 2.23 0.02 0.0663 104 B NA
Method Runtime Mean Error StdDev Ratio RatioSD Gen0 Allocated Alloc Ratio
Guid_ .NET Core 3.1 89.83 ns 8.693 ns 2.257 ns 1.00 0.00 - - NA
UlidUpdated_ .NET Core 3.1 13,079.05 ns 229.835 ns 59.687 ns 145.66 3.54 - - NA
UlidOrginal_ .NET Core 3.1 13,201.82 ns 38.433 ns 9.981 ns 147.03 3.70 - - NA
NUlid_ .NET Core 3.1 273.02 ns 3.563 ns 0.925 ns 3.04 0.08 0.0663 104 B NA
Guid_ .NET 5.0 88.77 ns 3.623 ns 0.941 ns 1.00 0.00 - - NA
UlidUpdated_ .NET 5.0 12,436.28 ns 65.738 ns 17.072 ns 140.11 1.63 - - NA
UlidOrginal_ .NET 5.0 12,628.54 ns 224.992 ns 58.430 ns 142.27 1.86 - - NA
NUlid_ .NET 5.0 264.49 ns 1.955 ns 0.508 ns 2.98 0.03 0.0663 104 B NA
Guid_ .NET 7.0 85.78 ns 1.533 ns 0.237 ns 1.00 0.00 - - NA
UlidUpdated_ .NET 7.0 5,489.08 ns 36.487 ns 9.475 ns 64.00 0.14 - - NA
UlidOrginal_ .NET 7.0 5,744.08 ns 113.194 ns 17.517 ns 66.97 0.36 - - NA
NUlid_ .NET 7.0 191.33 ns 37.974 ns 9.862 ns 2.25 0.11 0.0663 104 B NA

Copy link

@gfoidl gfoidl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's quite some code duplication.
Art the officially unsupported targets (pre .NET 6) worth to vectorized them?
Can the both Ulid.cs-implementations be unified? Maybe with the help of common helper-methods if not doable otherwise.

@@ -155,21 +183,58 @@ internal Ulid(ReadOnlySpan<char> base32)
randomness8 = (byte)((CharToBase32[base32[22]] << 7) | (CharToBase32[base32[23]] << 2) | (CharToBase32[base32[24]] >> 3));
}

#if NETCOREAPP3_0_OR_GREATER
private static readonly Vector128<byte> guidVecShuffle = Vector128.Create((byte)3, 2, 1, 0, 5, 4, 7, 6, 8, 9, 10, 11, 12, 13, 14, 15);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For newer targets (.NET 6.0 onwards) this constant should be given "inline" for allowing the JIT to emit better code. As at the moment only .NET 6+ is officially supported I wouldn't bother with .NET Core 3.1 optimizations anymore. So the below #if def can be dropped. For the Shuffle you could create a private method that handles this depending on the target.

@TimothyMakkison
Copy link
Contributor Author

Art the officially unsupported targets (pre .NET 6) worth to vectorized them? As at the moment only .NET 6+ is officially supported I wouldn't bother with .NET Core 3.1 optimizations anymore.
There's quite some code duplication.

Good point, it probably isn't worth supporting them, I'll remove them unless the owner want to keep them. Should remove some bloat. I'll see if I can create one or two helper functions.

Could I ask you a question? From what I gathered from testing .Net 3, 3.1, 5 should have constants declared outside the method and .Net 7 should have constants delcared inline. Finally .Net 6 uniquely runs the same with constants declared outside or inline. Is this correct?

@gfoidl
Copy link

gfoidl commented Jan 2, 2023

.NET 6 onwards should use the vector constants inline. The JIT has then the ability to optimize the usage. It streamlines the C# code also.
Before .NET 6 it was better to have the vector constants declared outside.
See dotnet/runtime#54827 (and the linked issue) for more info.

@michelbieleveld
Copy link

Hi, seems a lot of great progress has been made. Any chance this might get merged and a new nuget version produced ?

@hadashiA
Copy link
Member

Sorry for the late reply and thanks for the great PR.
We have confirmed the changes and would be happy to merge them.
We would also like to adopt your suggestion about adding a test environment.
Thanks a lot :)

@hadashiA hadashiA merged commit 4449a82 into Cysharp:master Aug 18, 2023
@hadashiA hadashiA mentioned this pull request Aug 18, 2023
@hadashiA
Copy link
Member

There seems to be some discussion left unanswered.

Modernise the code to use C# 11 - use file scoped namespaces, simplify slices.
Enable nullable across the project and update relevant methods.

Currently, we want the same code to work in Unity 2019.3 so we may not change it until we change the supported version.

Consider adding comparison operators <, <=, >, >=.
Add 'NotNullWhen' attributes to methods such as 'TryParse'.
public string ToBase64 uses ArrayPool, stackalloc could instead be used here. The SkipLocalsInit could be used to remove the overhead of clearing the span.

This is probably a reasonable change.
We would like to consider this.

Would the logic used in the original get Time and internal Ulid(long timestampMilliseconds... work on big endian machines?

Hmm, it looks like you are right. I would like to check with the original author.

Experimental change to internal Ulid(long,..)

Although we do not want to add too much complexity to the value generation for the sake of speed, BinaryPrimitives.ReverseEndianness may be reasonable.

Thanks for the suggestions. further pr is welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants