Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MemoryExtensions.SequenceCompare #20525

Closed
AronParker opened this issue Mar 8, 2017 · 22 comments
Closed

MemoryExtensions.SequenceCompare #20525

AronParker opened this issue Mar 8, 2017 · 22 comments
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Memory
Milestone

Comments

@AronParker
Copy link

Latest Proposal

See https://github.com/dotnet/corefx/issues/16878#issuecomment-349414287

Problem:

As of right now there is no function to efficiently compare memory. The result is that such comparisons are usually done on a byte by byte basis or even worse Enumerable.SequenceEqual, which performs really slow.

Suggested APIs:

class static SpanExtensions
{
    static int SequenceCompareTo(this ReadOnlySpan<byte> first, ReadOnlySpan<byte> second);
}

Design

It would fit along nicely with other memory functions defined in Buffer such as memcpy or memmove. The arrays would have to be primitives and can be efficiently compared.

Implementation

Different implementations can be used for different lengths. Here are some examples:

  • Jump tables like this for lengths <= 16
  • comparisons in long*'s on x64, int*'s on x86 for shorter arrays
  • using SIMD-Enabled Vector Types for medium/longer arrays
  • CLR memcmp InternalCall for longer arrays.

The individual thresholds would have to be profiled of course.

Original Proposal

class static SpanExtensions
{
    static int SequenceCompare(this ReadOnlySpan<byte> first, ReadOnlySpan<byte> second);
}

[EDIT] Add C# syntax highlight for code by @karelz

@mellinoe
Copy link
Contributor

mellinoe commented Mar 9, 2017

This may be a helpful addition, but I don't think that using Array in new API's is a good idea, generally speaking. More likely we would want overloads taking T[], and perhaps void*, length ?

@karelz
Copy link
Member

karelz commented Mar 9, 2017

Might be a good idea for Span?

@AronParker
Copy link
Author

AronParker commented Mar 9, 2017

This may be a helpful addition, but I don't think that using Array in new API's is a good idea, generally speaking. More likely we would want overloads taking T[], and perhaps void*, length ?

True, I've chosen Array in order to make it analog to Buffer.BlockCopy(Array,int,Array,int,int) but T[] would probably be even more elegant, void* & length also definitely. The issue with T[] is that the efficient comparison effectively only works for primitives and structs that don't redefine Equals. If you have a reference type or a value type that redefines Equals, it'd have to call a virtual method for each element, rendering the efficiency of memcmp useless (a for loop would have been fine here for clarity).

Alternatively one could just redefine the contract to do "binary comparison" only, where for reference types it only compares its references, for struct types it only compares the entire structure regardless whether Equals is redefined. However I don't think it'd be intuitive or useful to users, so limiting it to primitives might still be useful.

Might be a good idea for Span?

Yes, Span would greatly benefit from it aswell, had it in my mind aswell. Just added it for a start.

@jamesqo
Copy link
Contributor

jamesqo commented Mar 9, 2017

Only skimmed the proposal, but I think System.Runtime.CompilerServices.Unsafe may be a better choice for proposing new APIs than Buffer as the former is not tied to the .NET Framework. There have been lots of new Unsafe APIs recently (e.g. https://github.com/dotnet/corefx/issues/16479) so you may have more success getting it in there.

@jkotas
Copy link
Member

jkotas commented Mar 9, 2017

All raw memory APIs should go to Span going forward. There is static bool SequenceEqual(this ReadOnlySpan<byte> first, ReadOnlySpan<byte> second) already. I can imagine we can have static int SequenceCompare(this ReadOnlySpan<byte> first, ReadOnlySpan<byte> second) too.

@AronParker
Copy link
Author

All raw memory APIs should go to Span going forward. There is static bool SequenceEqual(this ReadOnlySpan first, ReadOnlySpan second) already. I can imagine we can have static int SequenceCompare(this ReadOnlySpan first, ReadOnlySpan second) too.

Fair enough, however I firmly believe that it is important to have comparison functions even outside of Span's, as there are many variants of memcpy but not a single one of memcmp. Nearly all modern programming languages offer a way to compare primitive arrays in a fast, efficient way without too much overhead.

While Span is amazing and it should be there aswell, I do believe there should be a way to access that function without the overhead of a Span. While Span is a great API for working with memory efficiently, I don't think something as specific as Span should be forced onto the user for something general like comparing memory.

@jkotas
Copy link
Member

jkotas commented Mar 9, 2017

without the overhead of a Span

The overhead of Span is neglible, and it is often less than the overhead of alternatives (passing indices manually, etc.). You can watch a case study on this topic in dotnet/corefxlab#1278 .

@AronParker
Copy link
Author

The overhead of Span is neglible, and it is often less than the overhead of alternatives (passing indices manually, etc.). You can watch a case study on this topic in dotnet/corefxlab#1278 .

Alright, I didn't know that. Thanks for the quick reply! I've adjusted the API.

@ahsonkhan
Copy link
Member

I don't think something as specific as Span should be forced onto the user for something general like comparing memory.

@AronParker, does this thinking still hold?

Also, can you explain what SequenceCompare should return and in what cases? Essentially, can you provide code sample showing usage of this API?

@AronParker
Copy link
Author

@ahsonkhan No. I was actually mainly looking for a way to compare equality, not relation so SequenceEqual is actually sufficient.

@Bobris
Copy link

Bobris commented Nov 18, 2017

It is pity that this didn't fit Span/ReadOnlySpan. memcmp is critical function for embedded databases.

@jkotas
Copy link
Member

jkotas commented Nov 19, 2017

@Bobris Span APIs are still work in progress. It was mentioned above that static int SequenceCompare(this ReadOnlySpan<byte> first, ReadOnlySpan<byte> second) may be a reasonable thing to add. We just did not see a good motivation to add it so far - SequenceEqual was sufficient for @AronParker who opened this issue originally.

If you believe that SequenceCompare is an important API to have in additional to SequenceEqual, could you please provide some real examples what you would use it for, and how you would like to see it behave e.g. for spans of different lengths?

@Bobris
Copy link

Bobris commented Nov 19, 2017

@jkotas Thanks for looking at this. I need lexicographical order in my KeyValue DB. My current pour man implementation is https://github.com/Bobris/BTDB/blob/58c1bf9bbf555e98d836c6de094f431b5dfdd8f9/BTDB/Buffer/BitArrayManipulation.cs#L171. This operation is basic for any B-Tree implementation when key is just sequence of bytes.
You probably already saw this https://ayende.com/blog/169825/excerpts-from-the-ravendb-performance-team-report-optimizing-memory-compare-copy-costs ... Would like to not need to use unsafe code.

@jkotas jkotas reopened this Nov 19, 2017
@KrzysztofCwalina
Copy link
Member

KrzysztofCwalina commented Nov 28, 2017

I have seen several people asking for this API. I think it would be good to discuss this at the API review. @karelz, I will change the tag to ready for API review. It seems like the API proposal is now for Span APIs, which is what you asked for when you changed the tag to "needs more work".

@KrzysztofCwalina KrzysztofCwalina changed the title Add a way to efficiently compare memory MemoryExtensions.SequenceCompare Nov 28, 2017
@terrajobst
Copy link
Member

terrajobst commented Dec 5, 2017

Video

We should change the name to SequenceCompareTo (notice the To at the end).

class static SpanExtensions
{
    static int SequenceCompareTo(this ReadOnlySpan<byte> first, ReadOnlySpan<byte> second);
}

@karelz
Copy link
Member

karelz commented Dec 5, 2017

FYI: The API review discussion was recorded - see https://youtu.be/BI3iXFT8H7E?t=4684 (5 min duration)

@karelz
Copy link
Member

karelz commented Dec 5, 2017

Top post updated per API name change in API review: https://github.com/dotnet/corefx/issues/16878#issuecomment-349414287

@Bobris
Copy link

Bobris commented Dec 6, 2017

I looked at implementation mentioned in video in https://github.com/dotnet/corefxlab/blob/master/src/System.Buffers.Primitives/System/Buffers/BufferExtensions.cs and it is correct, but "slow", so hopefully it is planned to have it with optimizations from beginning, please pretty please.

@karelz
Copy link
Member

karelz commented Dec 6, 2017

@Bobris what else would you propose? Do you want to submit a PR?

@Bobris
Copy link

Bobris commented Dec 6, 2017

Not whole PR, but even "hand inlining" CompareTo to simple subtract (lengths are only positive so it is safe from wrapping) is improvement (inputs are identical):

BenchmarkDotNet=v0.10.11, OS=Windows 10 Redstone 2 [1703, Creators Update] (10.0.15063.726)
Processor=Intel Core i7-6500U CPU 2.50GHz (Skylake), ProcessorCount=4
Frequency=2531250 Hz, Resolution=395.0617 ns, Timer=TSC
.NET Core SDK=2.1.2
[Host] : .NET Core 2.0.3 (Framework 4.6.25815.02), 64bit RyuJIT
DefaultJob : .NET Core 2.0.3 (Framework 4.6.25815.02), 64bit RyuJIT

Method Length Mean Error StdDev
ByteByByte 1 2.593 ns 0.0764 ns 0.0715 ns
ByteByByteInline 1 1.727 ns 0.0564 ns 0.0528 ns
ByteByByte 16 21.093 ns 0.4430 ns 0.3699 ns
ByteByByteInline 16 19.231 ns 0.3909 ns 0.3656 ns
ByteByByte 200 298.712 ns 5.9729 ns 9.1213 ns
ByteByByteInline 200 293.655 ns 6.0369 ns 6.9521 ns

@ahsonkhan
Copy link
Member

Should this method be a generic method just like SequenceEqual?
i.e.

public static int SequenceCompareTo<T>(this ReadOnlySpan<T> first, ReadOnlySpan<T> second) where T : IEquatable<T> { throw null; }

@jkotas
Copy link
Member

jkotas commented Dec 7, 2017

It would need to be where T : IComparable<T>.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 2.1.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 25, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Memory
Projects
None yet
Development

No branches or pull requests

10 participants