Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider adding more IndexOf variations #75175

Closed
stephentoub opened this issue Sep 7, 2022 · 8 comments
Closed

Consider adding more IndexOf variations #75175

stephentoub opened this issue Sep 7, 2022 · 8 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Memory tenet-performance Performance related issue
Milestone

Comments

@stephentoub
Copy link
Member

stephentoub commented Sep 7, 2022

Searching is a key aspect of many workloads. Historically, we provided just the variations that everyone needed, e.g. IndexOf. Then we expanded to IndexOfAny with a few arguments. We just recently expanded to IndexOfAnyExcept. But beyond these, our general stance has been "if you need something any more customized, it's easy to just write it yourself". And historically, it has been easy. Let's say I wanted a static int IndexOfRange(ReadOnlySpan<char> span, char lowInclusive, highInclusive); it's straightforward to implement that "in just a few lines":

static int IndexOfRange(ReadOnlySpan<char> span, char lowInclusive, char highInclusive)
{
    uint range = (uint)(highInclusive - lowInclusive);
    for (int i = 0; i < span.Length; i++)
        if ((uint)(span[i] - lowInclusive) <= range)
            return i;

    return -1;
}

Not a big deal. But now, the expectation is such operations will take advantage of vectorization, and that's no longer "just a few lines". That ends up being something more like this (untested):

static int IndexOfRange(ReadOnlySpan<char> span, char lowInclusive, char highInclusive)
{
    if (Vector128.IsHardwareAccelerated && span.Length >= Vector128<ushort>.Count)
    {
        if (Vector256.IsHardwareAccelerated && span.Length >= Vector256<ushort>.Count)
        {
            ref ushort searchSpace = ref Unsafe.As<char, ushort>(ref MemoryMarshal.GetReference(span));
            ref ushort current = ref searchSpace;

            Vector256<ushort> lowVector = Vector256.Create((ushort)lowInclusive);
            Vector256<ushort> rangeVector = Vector256.Create((ushort)highInclusive) - lowVector;
            Vector256<ushort> lessThanOrEqual;

            ref ushort endMinusOneVector = ref Unsafe.Add(ref current, span.Length - Vector256<ushort>.Count);
            do
            {
                lessThanOrEqual = Vector256.LessThanOrEqual(Vector256.LoadUnsafe(ref current) - lowVector, rangeVector);
                if (lessThanOrEqual != Vector256<ushort>.Zero)
                {
                    return
                        BitOperations.TrailingZeroCount(lessThanOrEqual.ExtractMostSignificantBits()) +
                        (int)(Unsafe.ByteOffset(ref searchSpace, ref current) / Unsafe.SizeOf<ushort>());
                }

                current = ref Unsafe.Add(ref current, Vector256<ushort>.Count);
            }
            while (Unsafe.IsAddressLessThan(ref current, ref endMinusOneVector));

            lessThanOrEqual = Vector256.LessThanOrEqual(Vector256.LoadUnsafe(ref endMinusOneVector) - lowVector, rangeVector);
            if (lessThanOrEqual != Vector256<ushort>.Zero)
            {
                return
                    BitOperations.TrailingZeroCount(lessThanOrEqual.ExtractMostSignificantBits()) +
                    (int)(Unsafe.ByteOffset(ref searchSpace, ref endMinusOneVector) / Unsafe.SizeOf<ushort>());
            }
        }
        else
        {
            ref ushort searchSpace = ref Unsafe.As<char, ushort>(ref MemoryMarshal.GetReference(span));
            ref ushort current = ref searchSpace;

            Vector128<ushort> lowVector = Vector128.Create((ushort)lowInclusive);
            Vector128<ushort> rangeVector = Vector128.Create((ushort)highInclusive) - lowVector;
            Vector128<ushort> lessThanOrEqual;

            ref ushort endMinusOneVector = ref Unsafe.Add(ref current, span.Length - Vector128<ushort>.Count);
            do
            {
                lessThanOrEqual = Vector128.LessThanOrEqual(Vector128.LoadUnsafe(ref current) - lowVector, rangeVector);
                if (lessThanOrEqual != Vector128<ushort>.Zero)
                {
                    return
                        BitOperations.TrailingZeroCount(lessThanOrEqual.ExtractMostSignificantBits()) +
                        (int)(Unsafe.ByteOffset(ref searchSpace, ref current) / Unsafe.SizeOf<ushort>());
                }

                current = ref Unsafe.Add(ref current, Vector128<ushort>.Count);
            }
            while (Unsafe.IsAddressLessThan(ref current, ref endMinusOneVector));

            lessThanOrEqual = Vector128.LessThanOrEqual(Vector128.LoadUnsafe(ref endMinusOneVector) - lowVector, rangeVector);
            if (lessThanOrEqual != Vector128<ushort>.Zero)
            {
                return
                    BitOperations.TrailingZeroCount(lessThanOrEqual.ExtractMostSignificantBits()) +
                    (int)(Unsafe.ByteOffset(ref searchSpace, ref endMinusOneVector) / Unsafe.SizeOf<ushort>());
            }
        }
    }
    else
    {
        uint range = (uint)(highInclusive - lowInclusive);
        for (int i = 0; i < span.Length; i++)
        {
            if ((uint)(span[i] - lowInclusive) <= range)
            {
                return i;
            }
        }
    }

    return -1;
}

which is closer to ~80 lines of non-trivial code, not something someone just whips up.

In addition to finding ways to reduce the development cost of this (e.g. could we make it easier to share 128/256 code paths via having those types implement an interface and employing generic specialization?), it seems like we should consider lowering our bar a bit for what we consider worthy of inclusion, such that we can provide more such operations with good vectorized implementations.

This could include:

  • IndexOfRange and IndexOfAnyExceptRange (plus "Last" variations)
  • IndexOfAsciiLetter, IndexOfAsciiDigit, etc.
  • IndexOfAny and IndexOfAnyExcept where the set of items is provided via a bitmap (ala Vectorize IndexOfAny on more than 5 chars #68328)
  • ... and probably other variations.

We should:

  • Audit dotnet/runtime and dotnet/aspnetcore to see where such implementations could be used. For example, System.Text.Json implements its own vectorized IndexOfQuoteOrAnyControlOrBackSlash (which is implemented more generally as searching for one of two specific characters or anything less than a third). And HttpValidationHelpers.cs searches for characters less than something or greater than something else (so basically two ranges instead of one).
  • Audit the regex corpus to see what patterns we're currently unable to vectorize a starting search for and see what would enable us to fix that.
  • Audit grep.app, GitHub, various 3rd party appropriately-licensed repos, etc. for the same.
  • Solicit feedback from everyone reading this about variations that would be useful to them.
  • Consider which subset should be added and do so.

Also, would it be possible to expose a generalized helper for this, where someone just needs to supply the kernel of the thing being searched for and plug that into the overall algorithm? e.g. an interface with a method accepting the vector to analyze and a scaffolded IndexOf search that uses generic specialization to make it fast to plug in the caller's code.

This concept also extends beyond IndexOf, of course. There are multiple places where we've already vectorized implementations that could be generalized to be used by others, e.g.

  • BitArray implements vectorized and, or, xor, and not algorithms. Could those be exposed for operating on arbitrary spans of bytes, used by BitArray but then also available for others to consume?
  • WebSocket implements a vectorized xor with a mask. Would anyone else benefit from that being public?
  • Enumerable has a vectorized Min/Max/Sum/Average. Could those implementations be exposed on MemoryExtensions to be usable with any span and not just an array/list we can fish out of an enumerable?
  • String.Replace implements a vectorized replace. Should that be exposed on MemoryExtensions as a generalized Replace on a span?
@stephentoub stephentoub added this to the 8.0.0 milestone Sep 7, 2022
@ghost
Copy link

ghost commented Sep 7, 2022

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

Issue Details

Searching is a key aspect of many workloads. Historically, we provided just the variations that everyone needed, e.g. IndexOf. Then we expanded to IndexOfAny with a few arguments. We just recently expanded to IndexOfAnyExcept. But beyond these, our general stance has been "if you need something any more customized, it's easy to just write it yourself". And historically, it has been easy. Let's say I wanted a static int IndexOfRange(ReadOnlySpan<char> span, char lowInclusive, highInclusive); it's straightforward to implement that "in just a few lines":

static int IndexOfRange(ReadOnlySpan<char> span, char lowInclusive, char highInclusive)
{
    uint range = (uint)(highInclusive - lowInclusive);
    for (int i = 0; i < span.Length; i++)
        if ((uint)(span[i] - lowInclusive) <= range)
            return i;

    return -1;
}

Not a big deal. But now, the expectation is such operations will take advantage of vectorization, and that's no longer "just a few lines". That ends up being something more like this (untested):

static int IndexOfRange(ReadOnlySpan<char> span, char lowInclusive, char highInclusive)
{
    if (Vector128.IsHardwareAccelerated && span.Length >= Vector128<ushort>.Count)
    {
        if (Vector256.IsHardwareAccelerated && span.Length >= Vector256<ushort>.Count)
        {
            ref ushort searchSpace = ref Unsafe.As<char, ushort>(ref MemoryMarshal.GetReference(span));
            ref ushort current = ref searchSpace;

            Vector256<ushort> lowVector = Vector256.Create((ushort)lowInclusive);
            Vector256<ushort> rangeVector = Vector256.Create((ushort)highInclusive) - lowVector;
            Vector256<ushort> lessThanOrEqual;

            ref ushort endMinusOneVector = ref Unsafe.Add(ref current, span.Length - Vector256<ushort>.Count);
            do
            {
                lessThanOrEqual = Vector256.LessThanOrEqual(Vector256.LoadUnsafe(ref current) - lowVector, rangeVector);
                if (lessThanOrEqual != Vector256<ushort>.Zero)
                {
                    return
                        BitOperations.TrailingZeroCount(lessThanOrEqual.ExtractMostSignificantBits()) +
                        (int)(Unsafe.ByteOffset(ref searchSpace, ref current) / Unsafe.SizeOf<ushort>());
                }

                current = ref Unsafe.Add(ref current, Vector256<ushort>.Count);
            }
            while (Unsafe.IsAddressLessThan(ref current, ref endMinusOneVector));

            lessThanOrEqual = Vector256.LessThanOrEqual(Vector256.LoadUnsafe(ref endMinusOneVector) - lowVector, rangeVector);
            if (lessThanOrEqual != Vector256<ushort>.Zero)
            {
                return
                    BitOperations.TrailingZeroCount(lessThanOrEqual.ExtractMostSignificantBits()) +
                    (int)(Unsafe.ByteOffset(ref searchSpace, ref endMinusOneVector) / Unsafe.SizeOf<ushort>());
            }
        }
        else
        {
            ref ushort searchSpace = ref Unsafe.As<char, ushort>(ref MemoryMarshal.GetReference(span));
            ref ushort current = ref searchSpace;

            Vector128<ushort> lowVector = Vector128.Create((ushort)lowInclusive);
            Vector128<ushort> rangeVector = Vector128.Create((ushort)highInclusive) - lowVector;
            Vector128<ushort> lessThanOrEqual;

            ref ushort endMinusOneVector = ref Unsafe.Add(ref current, span.Length - Vector128<ushort>.Count);
            do
            {
                lessThanOrEqual = Vector128.LessThanOrEqual(Vector128.LoadUnsafe(ref current) - lowVector, rangeVector);
                if (lessThanOrEqual != Vector128<ushort>.Zero)
                {
                    return
                        BitOperations.TrailingZeroCount(lessThanOrEqual.ExtractMostSignificantBits()) +
                        (int)(Unsafe.ByteOffset(ref searchSpace, ref current) / Unsafe.SizeOf<ushort>());
                }

                current = ref Unsafe.Add(ref current, Vector128<ushort>.Count);
            }
            while (Unsafe.IsAddressLessThan(ref current, ref endMinusOneVector));

            lessThanOrEqual = Vector128.LessThanOrEqual(Vector128.LoadUnsafe(ref endMinusOneVector) - lowVector, rangeVector);
            if (lessThanOrEqual != Vector128<ushort>.Zero)
            {
                return
                    BitOperations.TrailingZeroCount(lessThanOrEqual.ExtractMostSignificantBits()) +
                    (int)(Unsafe.ByteOffset(ref searchSpace, ref endMinusOneVector) / Unsafe.SizeOf<ushort>());
            }
        }
    }
    else
    {
        uint range = (uint)(highInclusive - lowInclusive);
        for (int i = 0; i < span.Length; i++)
        {
            if ((uint)(span[i] - lowInclusive) <= range)
            {
                return i;
            }
        }
    }

    return -1;
}

which is closer to ~80 lines of non-trivial code, not something someone just whips up.

In addition to finding ways to reduce the development cost of this (e.g. could we make it easier to share 128/256 code paths via having those types implement an interface and employing generic specialization?), it seems like we should consider lowering our bar a bit for what we consider worthy of inclusion, such that we can provide more such operations with good vectorized implementations.

This could include:

  • IndexOfRange and IndexOfAnyExceptRange (plus "Last" variations)
  • IndexOfAsciiLetter, IndexOfAsciiDigit, etc.
  • IndexOfAny and IndexOfAnyExcept where the set of items is provided via a bitmap (ala Vectorize IndexOfAny on more than 5 chars #68328)
  • ... and probably other variations.

We should:

  • Audit dotnet/runtime and dotnet/aspnetcore to see where such implementations could be used. For example, System.Text.Json implements its own vectorized IndexOfQuoteOrAnyControlOrBackSlash (which is implemented more generally as searching for one of two specific characters or anything less than a third).
  • Audit the regex corpus to see what patterns we're currently unable to vectorize a starting search for and see what would enable us to fix that.
  • Solicit feedback from everyone reading this about variations that would be useful to them.
  • Consider which subset should be added and do so.

This concept also extends beyond IndexOf, of course. There are multiple places where we've already vectorized implementations that could be generalized to be used by others, e.g.

  • BitArray implements vectorized and, or, xor, and not algorithms. Could those be exposed for operating on arbitrary spans of bytes, used by BitArray but then also available for others to consume?
  • WebSocket implements a vectorized xor with a mask. Would anyone else benefit from that being public?
  • Enumerable has a vectorized Min/Max/Sum/Average. Could those implementations be exposed on MemoryExtensions to be usable with any span and not just an array/list we can fish out of an enumerable?
  • String.Replace implements a vectorized replace. Should that be exposed on MemoryExtensions as a generalized Replace on a span?
Author: stephentoub
Assignees: -
Labels:

area-System.Memory, tenet-performance

Milestone: 8.0.0

@jkotas
Copy link
Member

jkotas commented Sep 7, 2022

An alternative to providing number of specialized APIs may be to create a source generator or something like that that produces the vectorized version from the simple form.

@jkotas
Copy link
Member

jkotas commented Sep 7, 2022

@stephentoub
Copy link
Member Author

An alternative to providing number of specialized APIs may be to create a source generator or something like that that produces the vectorized version from the simple form

It's an option; Tanner had a prototype of something similar.

But if it's possible to do that with perfect fidelity, it should be possible to do the same in the compiler/JIT with an auto-vectorizing implementation and save the dev the trouble. And if it's not possible with perfect fidelity and the generated code would be more of a starting point the dev would need to copy and mutate, it's not really addressing the core problem this aims to.

I think pursuing both approaches is reasonable, with built-in implementations for the things we ourselves use and others could as well plus the next layer of variation beyond what we already have, and then a source generator to help with the next layer still. As an example, I'd want to use some of these from the regex source generator and compiler, and it'd be nice to avoid injecting a lot of such unsafe code into the assembly that would also likely be duplicated across many assemblies.

@MihaZupan
Copy link
Member

IndexOfAsciiLetter, IndexOfAsciiDigit, etc.

Just curious, would these just delegate to #68328, or did you have a different approach in mind?

@gfoidl
Copy link
Member

gfoidl commented Oct 23, 2022

Also #76106 can be used (e.g. for IndexOfAsciiDigit).

@stephentoub
Copy link
Member Author

IndexOfAsciiLetter, IndexOfAsciiDigit, etc.

Just curious, would these just delegate to #68328, or did you have a different approach in mind?

If there were to be an IndexOfAsciiDigit, my assumption is it would just delegate to IndexOfAnyInRange. If there were to be an IndexOfAsciiLetter, my assumption is it could have a more efficient implementation than delegating to the implementation that supports any combination of ASCII chars, though that would need to be measured, e.g. using an implementation that enabled efficiently searching for two ranges.

But, warranting new APIs for these in particular would necessitate both a) showing significant usage potential and b) significantly better performance potential than just using one of the other "existing" APIs.

@tannergooding tannergooding added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Aug 14, 2023
@tannergooding tannergooding modified the milestones: 8.0.0, Future Aug 14, 2023
@stephentoub
Copy link
Member Author

The scope of work covered by this issue is done. We can always add more IndexOf helpers on an as-needed basis.

@github-actions github-actions bot locked and limited conversation to collaborators Jan 26, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Memory tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

5 participants