New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Span IndexOf that takes an index and count #1278

Closed
wants to merge 9 commits into
base: master
from

Conversation

Projects
None yet
7 participants
@ahsonkhan
Member

ahsonkhan commented Mar 7, 2017

  • Also fixing StartsWith impl (calling SequenceEqual)

Approximately 5% performance gain calling Span IndexOf that takes an index and count versus Slice.

 Test Name                                                                                 | Metric                | Iterations |   AVERAGE |   STDEV.S |       MIN |        MAX
:----------------------------------------------------------------------------------------- |:--------------------- |:----------:| ---------:| ---------:| ---------:| ----------:
 System.Slices.Tests.PerformanceTests.SpanIndexOfWithIndexAndCountComparison(length: 1000) | Duration              |    101     |    90.687 |    13.232 |    77.279 |    139.205
 System.Slices.Tests.PerformanceTests.SpanIndexOfWithIndexAndCountComparison(length: 1000) | Branch Mispredictions |    101     | 71984.158 | 39412.573 | 16384.000 | 2.499E+005
 System.Slices.Tests.PerformanceTests.SpanIndexOfWithIndexAndCountComparison(length: 1000) | Cache Misses          |    101     |  7705.347 |  5190.365 |     0.000 |  20480.000
 System.Slices.Tests.PerformanceTests.SpanIndexOfWithSliceComparison(length: 1000)         | Duration              |    101     |    98.103 |     8.345 |    88.652 |    127.297
 System.Slices.Tests.PerformanceTests.SpanIndexOfWithSliceComparison(length: 1000)         | Branch Mispredictions |    101     | 64724.911 | 29497.001 | 20480.000 | 1.884E+005
 System.Slices.Tests.PerformanceTests.SpanIndexOfWithSliceComparison(length: 1000)         | Cache Misses          |    101     |  8678.653 |  5771.935 |     0.000 |  28672.000

For Span.StartsWith using SequenceEqual, the break even point is around Span of size 3-4. Using SequenceEqual for span.Length > 3 gives 30%+ perf gains.

image

@ahsonkhan

This comment has been minimized.

Show comment
Hide comment
@ahsonkhan
Member

ahsonkhan commented Mar 7, 2017

@KrzysztofCwalina

This comment has been minimized.

Show comment
Hide comment
@KrzysztofCwalina

KrzysztofCwalina Mar 7, 2017

Member

So what is the break even point for IndexOf? i.e. at what point slicing and then regular indexof is fast enough such that the new IndexOf does not pay off?

Member

KrzysztofCwalina commented Mar 7, 2017

So what is the break even point for IndexOf? i.e. at what point slicing and then regular indexof is fast enough such that the new IndexOf does not pay off?

@ahsonkhan

This comment has been minimized.

Show comment
Hide comment
@ahsonkhan

ahsonkhan Mar 7, 2017

Member

So what is the break even point for IndexOf? i.e. at what point slicing and then regular indexof is fast enough such that the new IndexOf does not pay off?

Essentially, it is roughly the same for < 10 (+/- 10%) and you get performance gains for > 10 (the gains become significant in all cases by around 500). The results imply it is almost always better to use this new IndexOf rather than Slice and Index, especially if the lookup value is close to the beginning of the span.

image

Member

ahsonkhan commented Mar 7, 2017

So what is the break even point for IndexOf? i.e. at what point slicing and then regular indexof is fast enough such that the new IndexOf does not pay off?

Essentially, it is roughly the same for < 10 (+/- 10%) and you get performance gains for > 10 (the gains become significant in all cases by around 500). The results imply it is almost always better to use this new IndexOf rather than Slice and Index, especially if the lookup value is close to the beginning of the span.

image

@ahsonkhan

This comment has been minimized.

Show comment
Hide comment
@ahsonkhan

ahsonkhan Mar 7, 2017

Member

cc @KrzysztofCwalina Please take a look at the implementation and performance test results.

Member

ahsonkhan commented Mar 7, 2017

cc @KrzysztofCwalina Please take a look at the implementation and performance test results.

@KrzysztofCwalina

This comment has been minimized.

Show comment
Hide comment
@KrzysztofCwalina

KrzysztofCwalina Mar 7, 2017

Member

hmm, this is very surprising. I would think that the longer the search, the less benefit there would be as the slicing operation gets amortized, i.e. Slice is O(1) and search is O(n), so as N gets larger the sum approaches O(N).

cc: @vancem, @jkotas, @benaadams

Member

KrzysztofCwalina commented Mar 7, 2017

hmm, this is very surprising. I would think that the longer the search, the less benefit there would be as the slicing operation gets amortized, i.e. Slice is O(1) and search is O(n), so as N gets larger the sum approaches O(N).

cc: @vancem, @jkotas, @benaadams

@ahsonkhan

This comment has been minimized.

Show comment
Hide comment
@ahsonkhan

ahsonkhan Mar 7, 2017

Member

hmm, this is very surprising. I would think that the longer the search, the less benefit there would be as the slicing operation gets amortized, i.e. Slice is O(1) and search is O(n), so as N gets larger the sum approaches O(N).

Well, yes, of course Slice is O(1), but if you can slice closer to the index where the value might be found, you can pass that index to the new IndexOf overload too. In any case, if you keep the information you have between the two consistent, the results make sense.

For example, see below:
image

In any case the O(n) search should be the same. Either you index through 500 bytes or 5 bytes and that doesn't really change whether you slice first or not. With slicing, you have that extra Slice operation that you wouldn't have for IndexOf.

Am I missing something?

Member

ahsonkhan commented Mar 7, 2017

hmm, this is very surprising. I would think that the longer the search, the less benefit there would be as the slicing operation gets amortized, i.e. Slice is O(1) and search is O(n), so as N gets larger the sum approaches O(N).

Well, yes, of course Slice is O(1), but if you can slice closer to the index where the value might be found, you can pass that index to the new IndexOf overload too. In any case, if you keep the information you have between the two consistent, the results make sense.

For example, see below:
image

In any case the O(n) search should be the same. Either you index through 500 bytes or 5 bytes and that doesn't really change whether you slice first or not. With slicing, you have that extra Slice operation that you wouldn't have for IndexOf.

Am I missing something?

@KrzysztofCwalina

This comment has been minimized.

Show comment
Hide comment
@KrzysztofCwalina

KrzysztofCwalina Mar 8, 2017

Member

Yeah, it's not surprising that slicing makes the operation slower. What's surprising is that the ratio get more and more in favor of the overloads that take index as the size of the buffer (and so the index of the found byte) gets larger.

Member

KrzysztofCwalina commented Mar 8, 2017

Yeah, it's not surprising that slicing makes the operation slower. What's surprising is that the ratio get more and more in favor of the overloads that take index as the size of the buffer (and so the index of the found byte) gets larger.

@jkotas

This comment has been minimized.

Show comment
Hide comment
@jkotas

jkotas Mar 8, 2017

Member

What's surprising is that the ratio get more and more in favor of the overloads that take index as the size of the buffer get larger.

This suggests that something else going on, and the overload is not actually what's improving the performance.

Member

jkotas commented Mar 8, 2017

What's surprising is that the ratio get more and more in favor of the overloads that take index as the size of the buffer get larger.

This suggests that something else going on, and the overload is not actually what's improving the performance.

@ahsonkhan

This comment has been minimized.

Show comment
Hide comment
@ahsonkhan

ahsonkhan Mar 8, 2017

Member

What's surprising is that the ratio get more and more in favor of the overloads that take index as the size of the buffer get larger.

This suggests that something else going on, and the overload is not actually what's improving the performance.

This seems to be a strange artefact of our benchmark testing.

When I add the InlineData(1000) attribute, I get different results compared to when I have no InlineData attritubte.

        [Benchmark(InnerIterationCount = 1000000)]
        [InlineData(1000)]
        public int SpanIndexOfWithIndexAndCountComparisonValueAtEnd(int length)
        {
            const byte lookupVal = 99;
            var a = new byte[length];
            a[length-1] = lookupVal;

            var bytes = new Span<byte>(a);
            int startIndex = length/2;
            int count = length/2;
            var result = -1;

            foreach (var iteration in Benchmark.Iterations)
            {
                using (iteration.StartMeasurement())
                {
                    for (int i = 0; i < Benchmark.InnerIterationCount; i++)
                    {
                        result = bytes.IndexOf(lookupVal, startIndex, count);
                    }
                }
            }
            Console.WriteLine(result);
            return result;
        }

Runtime: 217.668 (much faster than Slice)

        [Benchmark(InnerIterationCount = 1000000)]
        public int SpanIndexOfWithIndexAndCountComparisonValueAtEnd()
        {
            const byte lookupVal = 99;
            var a = new byte[1000];
            a[999] = lookupVal;

            var bytes = new Span<byte>(a);
            const int startIndex = 500;
            const int count = 500;
            var result = -1;

            foreach (var iteration in Benchmark.Iterations)
            {
                using (iteration.StartMeasurement())
                {
                    for (int i = 0; i < Benchmark.InnerIterationCount; i++)
                    {
                        result = bytes.IndexOf(lookupVal, startIndex, count);
                    }
                }
            }
            Console.WriteLine(result);
            return result;
        }

Runtime: 394.825 (similar to Slice)

For some reason the results remain consistent for SpanIndexOfWithSliceComparisonValueAtEnd
InlineData Attribute: 393.290
No InlineData Attribute: 393.135

cc @jorive, why is this discrepancy occurring?

image

Member

ahsonkhan commented Mar 8, 2017

What's surprising is that the ratio get more and more in favor of the overloads that take index as the size of the buffer get larger.

This suggests that something else going on, and the overload is not actually what's improving the performance.

This seems to be a strange artefact of our benchmark testing.

When I add the InlineData(1000) attribute, I get different results compared to when I have no InlineData attritubte.

        [Benchmark(InnerIterationCount = 1000000)]
        [InlineData(1000)]
        public int SpanIndexOfWithIndexAndCountComparisonValueAtEnd(int length)
        {
            const byte lookupVal = 99;
            var a = new byte[length];
            a[length-1] = lookupVal;

            var bytes = new Span<byte>(a);
            int startIndex = length/2;
            int count = length/2;
            var result = -1;

            foreach (var iteration in Benchmark.Iterations)
            {
                using (iteration.StartMeasurement())
                {
                    for (int i = 0; i < Benchmark.InnerIterationCount; i++)
                    {
                        result = bytes.IndexOf(lookupVal, startIndex, count);
                    }
                }
            }
            Console.WriteLine(result);
            return result;
        }

Runtime: 217.668 (much faster than Slice)

        [Benchmark(InnerIterationCount = 1000000)]
        public int SpanIndexOfWithIndexAndCountComparisonValueAtEnd()
        {
            const byte lookupVal = 99;
            var a = new byte[1000];
            a[999] = lookupVal;

            var bytes = new Span<byte>(a);
            const int startIndex = 500;
            const int count = 500;
            var result = -1;

            foreach (var iteration in Benchmark.Iterations)
            {
                using (iteration.StartMeasurement())
                {
                    for (int i = 0; i < Benchmark.InnerIterationCount; i++)
                    {
                        result = bytes.IndexOf(lookupVal, startIndex, count);
                    }
                }
            }
            Console.WriteLine(result);
            return result;
        }

Runtime: 394.825 (similar to Slice)

For some reason the results remain consistent for SpanIndexOfWithSliceComparisonValueAtEnd
InlineData Attribute: 393.290
No InlineData Attribute: 393.135

cc @jorive, why is this discrepancy occurring?

image

@ahsonkhan

This comment has been minimized.

Show comment
Hide comment
@ahsonkhan

ahsonkhan Mar 8, 2017

Member

I talked to @jorive offline and provided the details of the test runs to him for investigation.
Running the tests multiple times shows the above mentioned inconsistency in the results.

Going back to:

So what is the break even point for IndexOf? i.e. at what point slicing and then regular indexof is fast enough such that the new IndexOf does not pay off?

I am not sure if this question is relevant.
We have

_sResult += bytes.IndexOf(lookupVal, startIndex, count);

VS

_sResult += bytes.Slice(startIndex, count).IndexOf(lookupVal);

Both of these loop through the same amount of bytes to find the lookupVal, but the second one has the Slice overhead. They should be on par with each other except for the slice overhead, seen below:
image

Member

ahsonkhan commented Mar 8, 2017

I talked to @jorive offline and provided the details of the test runs to him for investigation.
Running the tests multiple times shows the above mentioned inconsistency in the results.

Going back to:

So what is the break even point for IndexOf? i.e. at what point slicing and then regular indexof is fast enough such that the new IndexOf does not pay off?

I am not sure if this question is relevant.
We have

_sResult += bytes.IndexOf(lookupVal, startIndex, count);

VS

_sResult += bytes.Slice(startIndex, count).IndexOf(lookupVal);

Both of these loop through the same amount of bytes to find the lookupVal, but the second one has the Slice overhead. They should be on par with each other except for the slice overhead, seen below:
image

@jkotas

This comment has been minimized.

Show comment
Hide comment
@jkotas

jkotas Mar 8, 2017

Member

I have run a few experiments myself since these numbers do not make sense to me.

As far as I can tell, for fast Span - the Slice+IndexOf is always same or better than the IndexOf that you are adding - when everything else is equal.

I think that the reason why you are seeing improvements from the IndexOf that takes index are some secondary effects (different register allocations). We need to rather understand and address these secondary effects than to add this IndexOf overload.

Here is the disassembly for the simplest one of my experiments, I have also tried more complex cases with varying indices. The results were always same or better for Slice+IndexOf.

byte[] a = new byte[100];
Span<byte> span = new Span<byte>(a);
for (int i = 0; i < 2000000000; i++)
    span.Slice(1, 10).IndexOf(0) vs. span.IndexOf(0, 1, 10)
Span.Slice(1, 10).IndexOf(0)
2G iterations = 5030ms, 19 instructions per iteration

00007ffb`319304ac 488b4c2430      mov     rcx,qword ptr [rsp+30h] ss:0000004b`e137de20=000001e58d1ca520
00007ffb`319304b1 48ffc1          inc     rcx
00007ffb`319304b4 48894c2428      mov     qword ptr [rsp+28h],rcx
00007ffb`319304b9 488b4c2428      mov     rcx,qword ptr [rsp+28h]
00007ffb`319304be 41b80a000000    mov     r8d,0Ah
00007ffb`319304c4 33d2            xor     edx,edx
00007ffb`319304c6 e8c5fbffff      call    IndexOf(Byte ByRef, Byte, Int32) (00007ffb`31930090)

00007ffb`31930520 41b9ffffffff    mov     r9d,0FFFFFFFFh
00007ffb`31930526 4183f808        cmp     r8d,8
00007ffb`3193052a 0f8cca000000    jl      test!Extensions.IndexOf(Byte ByRef, Byte, Int32)+0xda (00007ffb`319305fa)
00007ffb`31930530 440fb6d2        movzx   r10d,dl
00007ffb`31930534 41ffc1          inc     r9d
00007ffb`31930537 4963c1          movsxd  rax,r9d
00007ffb`3193053a 0fb60401        movzx   eax,byte ptr [rcx+rax]
00007ffb`3193053e 413bc2          cmp     eax,r10d
00007ffb`31930541 7508            jne     test!Extensions.IndexOf(Byte ByRef, Byte, Int32)+0x2b (00007ffb`3193054b)
00007ffb`31930543 418bc1          mov     eax,r9d
00007ffb`31930546 e93d010000      jmp     test!Extensions.IndexOf(Byte ByRef, Byte, Int32)+0x168 (00007ffb`31930688)

00007ffb`31930688 c3              ret
span.IndexOf(0, 1,10)
2G iterations = 6343ms, 27 instructions per iteration

00007ffb`319404ac 488b4c2430      mov     rcx,qword ptr [rsp+30h]
00007ffb`319404b1 c744242064000000 mov     dword ptr [rsp+20h],64h
00007ffb`319404b9 ba01000000      mov     edx,1
00007ffb`319404be 41b80a000000    mov     r8d,0Ah
00007ffb`319404c4 4533c9          xor     r9d,r9d
00007ffb`319404c7 e8e4fbffff      call    IndexOfHelper(Byte ByRef, Int32, Int32, Byte, Int32) (00007ffb`319400b0)

IndexOfHelper(Byte ByRef, Int32, Int32, Byte, Int32)
00007ffb`31940520 8b442428        mov     eax,dword ptr [rsp+28h] ss:00000093`51b7e380=00000064
00007ffb`31940524 3bd0            cmp     edx,eax
00007ffb`31940526 7d05            jge     test!IndexOfHelper(Byte ByRef, Int32, Int32, Byte, Int32)+0xd (00007ffb`3194052d)
00007ffb`31940528 4585c0          test    r8d,r8d
00007ffb`3194052b 750a            jne     test!IndexOfHelper(Byte ByRef, Int32, Int32, Byte, Int32)+0x17 (00007ffb`31940537)
..
00007ffb`31940537 2bc2            sub     eax,edx
00007ffb`31940539 443bc0          cmp     r8d,eax
00007ffb`3194053c 7e02            jle     test!IndexOfHelper(Byte ByRef, Int32, Int32, Byte, Int32)+0x20 (00007ffb`31940540)
..
00007ffb`31940540 418bc0          mov     eax,r8d
00007ffb`31940543 ffca            dec     edx
00007ffb`31940545 83f808          cmp     eax,8
00007ffb`31940548 0f8cc0000000    jl      test!IndexOfHelper(Byte ByRef, Int32, Int32, Byte, Int32)+0xee (00007ffb`3194060e)
00007ffb`3194054e 450fb6c1        movzx   r8d,r9b
00007ffb`31940552 ffc2            inc     edx
00007ffb`31940554 4c63d2          movsxd  r10,edx
00007ffb`31940557 460fb61411      movzx   r10d,byte ptr [rcx+r10]
00007ffb`3194055c 453bd0          cmp     r10d,r8d
00007ffb`3194055f 7507            jne     test!IndexOfHelper(Byte ByRef, Int32, Int32, Byte, Int32)+0x48 (00007ffb`31940568)
00007ffb`31940561 8bc2            mov     eax,edx
00007ffb`31940563 e929010000      jmp     test!IndexOfHelper(Byte ByRef, Int32, Int32, Byte, Int32)+0x171 (00007ffb`31940691)

00007ffb`31940691 c3              ret
Member

jkotas commented Mar 8, 2017

I have run a few experiments myself since these numbers do not make sense to me.

As far as I can tell, for fast Span - the Slice+IndexOf is always same or better than the IndexOf that you are adding - when everything else is equal.

I think that the reason why you are seeing improvements from the IndexOf that takes index are some secondary effects (different register allocations). We need to rather understand and address these secondary effects than to add this IndexOf overload.

Here is the disassembly for the simplest one of my experiments, I have also tried more complex cases with varying indices. The results were always same or better for Slice+IndexOf.

byte[] a = new byte[100];
Span<byte> span = new Span<byte>(a);
for (int i = 0; i < 2000000000; i++)
    span.Slice(1, 10).IndexOf(0) vs. span.IndexOf(0, 1, 10)
Span.Slice(1, 10).IndexOf(0)
2G iterations = 5030ms, 19 instructions per iteration

00007ffb`319304ac 488b4c2430      mov     rcx,qword ptr [rsp+30h] ss:0000004b`e137de20=000001e58d1ca520
00007ffb`319304b1 48ffc1          inc     rcx
00007ffb`319304b4 48894c2428      mov     qword ptr [rsp+28h],rcx
00007ffb`319304b9 488b4c2428      mov     rcx,qword ptr [rsp+28h]
00007ffb`319304be 41b80a000000    mov     r8d,0Ah
00007ffb`319304c4 33d2            xor     edx,edx
00007ffb`319304c6 e8c5fbffff      call    IndexOf(Byte ByRef, Byte, Int32) (00007ffb`31930090)

00007ffb`31930520 41b9ffffffff    mov     r9d,0FFFFFFFFh
00007ffb`31930526 4183f808        cmp     r8d,8
00007ffb`3193052a 0f8cca000000    jl      test!Extensions.IndexOf(Byte ByRef, Byte, Int32)+0xda (00007ffb`319305fa)
00007ffb`31930530 440fb6d2        movzx   r10d,dl
00007ffb`31930534 41ffc1          inc     r9d
00007ffb`31930537 4963c1          movsxd  rax,r9d
00007ffb`3193053a 0fb60401        movzx   eax,byte ptr [rcx+rax]
00007ffb`3193053e 413bc2          cmp     eax,r10d
00007ffb`31930541 7508            jne     test!Extensions.IndexOf(Byte ByRef, Byte, Int32)+0x2b (00007ffb`3193054b)
00007ffb`31930543 418bc1          mov     eax,r9d
00007ffb`31930546 e93d010000      jmp     test!Extensions.IndexOf(Byte ByRef, Byte, Int32)+0x168 (00007ffb`31930688)

00007ffb`31930688 c3              ret
span.IndexOf(0, 1,10)
2G iterations = 6343ms, 27 instructions per iteration

00007ffb`319404ac 488b4c2430      mov     rcx,qword ptr [rsp+30h]
00007ffb`319404b1 c744242064000000 mov     dword ptr [rsp+20h],64h
00007ffb`319404b9 ba01000000      mov     edx,1
00007ffb`319404be 41b80a000000    mov     r8d,0Ah
00007ffb`319404c4 4533c9          xor     r9d,r9d
00007ffb`319404c7 e8e4fbffff      call    IndexOfHelper(Byte ByRef, Int32, Int32, Byte, Int32) (00007ffb`319400b0)

IndexOfHelper(Byte ByRef, Int32, Int32, Byte, Int32)
00007ffb`31940520 8b442428        mov     eax,dword ptr [rsp+28h] ss:00000093`51b7e380=00000064
00007ffb`31940524 3bd0            cmp     edx,eax
00007ffb`31940526 7d05            jge     test!IndexOfHelper(Byte ByRef, Int32, Int32, Byte, Int32)+0xd (00007ffb`3194052d)
00007ffb`31940528 4585c0          test    r8d,r8d
00007ffb`3194052b 750a            jne     test!IndexOfHelper(Byte ByRef, Int32, Int32, Byte, Int32)+0x17 (00007ffb`31940537)
..
00007ffb`31940537 2bc2            sub     eax,edx
00007ffb`31940539 443bc0          cmp     r8d,eax
00007ffb`3194053c 7e02            jle     test!IndexOfHelper(Byte ByRef, Int32, Int32, Byte, Int32)+0x20 (00007ffb`31940540)
..
00007ffb`31940540 418bc0          mov     eax,r8d
00007ffb`31940543 ffca            dec     edx
00007ffb`31940545 83f808          cmp     eax,8
00007ffb`31940548 0f8cc0000000    jl      test!IndexOfHelper(Byte ByRef, Int32, Int32, Byte, Int32)+0xee (00007ffb`3194060e)
00007ffb`3194054e 450fb6c1        movzx   r8d,r9b
00007ffb`31940552 ffc2            inc     edx
00007ffb`31940554 4c63d2          movsxd  r10,edx
00007ffb`31940557 460fb61411      movzx   r10d,byte ptr [rcx+r10]
00007ffb`3194055c 453bd0          cmp     r10d,r8d
00007ffb`3194055f 7507            jne     test!IndexOfHelper(Byte ByRef, Int32, Int32, Byte, Int32)+0x48 (00007ffb`31940568)
00007ffb`31940561 8bc2            mov     eax,edx
00007ffb`31940563 e929010000      jmp     test!IndexOfHelper(Byte ByRef, Int32, Int32, Byte, Int32)+0x171 (00007ffb`31940691)

00007ffb`31940691 c3              ret
@jkotas

This comment has been minimized.

Show comment
Hide comment
@jkotas

jkotas Mar 8, 2017

Member

Also, the inlined return index; calls within the loop are bad for performance because of the known deficiency in the JIT when dealing with these.

Instead of adding this new IndexOf overload, you should look into changing the IndexOf code in corefx to do goto Done; + with just one return label at the end Done: return index;. It should make the IndexOf quite a bit faster.

Member

jkotas commented Mar 8, 2017

Also, the inlined return index; calls within the loop are bad for performance because of the known deficiency in the JIT when dealing with these.

Instead of adding this new IndexOf overload, you should look into changing the IndexOf code in corefx to do goto Done; + with just one return label at the end Done: return index;. It should make the IndexOf quite a bit faster.

@ahsonkhan

This comment has been minimized.

Show comment
Hide comment
@ahsonkhan

ahsonkhan Mar 8, 2017

Member

As far as I can tell, for fast Span - the Slice+IndexOf is always same or better than the IndexOf that you are adding - when everything else is equal.

I ran tests using the stopwatch and agree with the results that you see @jkotas

I ran tests using the stopwatch and disagree with the results that you see @jkotas

            Stopwatch stopWatch = new Stopwatch();

            byte[] a = new byte[100];
            Span<byte> span = new Span<byte>(a);
            stopWatch.Restart();
            for (int i = 0; i < 2000000000; i++)
                span.Slice(1, 10).IndexOf(0);
            Output(stopWatch.Elapsed);

            a = new byte[100];
            span = new Span<byte>(a);
            stopWatch.Restart();
            for (int i = 0; i < 2000000000; i++)
                span.IndexOf(0, 1, 10);
            Output(stopWatch.Elapsed);

RunTime 00:00:08.91 // Slice
RunTime 00:00:08.93 // IndexOf

There is some discrepancy within the results that come from benchmarks.

One thing that was recommended by @jorive to get more consistent results while benchmarking was to run the perfharness with high priority with affinity to a specific core.
Example, run:

START "title" /B /WAIT /HIGH /AFFINITY 0x2 dotnet.exe run -c Release -- --assembly System.Slices.Tests --perf:typenames System.Slices.Tests.PerformanceTests

This does give more consistent results.

The next step would be to figure out why the results through xunit.perf show IndexOf as almost 2x faster in some cases.

            const int length = 9;

            GetDataValueAtEnd(length, out Span<byte> bytes, out int startIndex, out int count);
            stopWatch.Restart();
            for (int i = 0; i < 200000000; i++)
                TestBenchIndexOf(bytes, startIndex, count);
            Output(stopWatch.Elapsed);

            GetDataValueAtEnd(length, out bytes, out startIndex, out count);
            stopWatch.Restart();
            for (int i = 0; i < 200000000; i++)
                TestBenchSlice(bytes, startIndex, count);
            Output(stopWatch.Elapsed);

RunTime 00:00:04.93 // Slice
RunTime 00:00:03.27 // IndexOf

Member

ahsonkhan commented Mar 8, 2017

As far as I can tell, for fast Span - the Slice+IndexOf is always same or better than the IndexOf that you are adding - when everything else is equal.

I ran tests using the stopwatch and agree with the results that you see @jkotas

I ran tests using the stopwatch and disagree with the results that you see @jkotas

            Stopwatch stopWatch = new Stopwatch();

            byte[] a = new byte[100];
            Span<byte> span = new Span<byte>(a);
            stopWatch.Restart();
            for (int i = 0; i < 2000000000; i++)
                span.Slice(1, 10).IndexOf(0);
            Output(stopWatch.Elapsed);

            a = new byte[100];
            span = new Span<byte>(a);
            stopWatch.Restart();
            for (int i = 0; i < 2000000000; i++)
                span.IndexOf(0, 1, 10);
            Output(stopWatch.Elapsed);

RunTime 00:00:08.91 // Slice
RunTime 00:00:08.93 // IndexOf

There is some discrepancy within the results that come from benchmarks.

One thing that was recommended by @jorive to get more consistent results while benchmarking was to run the perfharness with high priority with affinity to a specific core.
Example, run:

START "title" /B /WAIT /HIGH /AFFINITY 0x2 dotnet.exe run -c Release -- --assembly System.Slices.Tests --perf:typenames System.Slices.Tests.PerformanceTests

This does give more consistent results.

The next step would be to figure out why the results through xunit.perf show IndexOf as almost 2x faster in some cases.

            const int length = 9;

            GetDataValueAtEnd(length, out Span<byte> bytes, out int startIndex, out int count);
            stopWatch.Restart();
            for (int i = 0; i < 200000000; i++)
                TestBenchIndexOf(bytes, startIndex, count);
            Output(stopWatch.Elapsed);

            GetDataValueAtEnd(length, out bytes, out startIndex, out count);
            stopWatch.Restart();
            for (int i = 0; i < 200000000; i++)
                TestBenchSlice(bytes, startIndex, count);
            Output(stopWatch.Elapsed);

RunTime 00:00:04.93 // Slice
RunTime 00:00:03.27 // IndexOf

@ahsonkhan

This comment has been minimized.

Show comment
Hide comment
@ahsonkhan

ahsonkhan Mar 8, 2017

Member

We need to rather understand and address these secondary effects than to add this IndexOf overload.
Here is the disassembly for the simplest one of my experiments,

Is there any insight from looking at the disassembly? If we trust our results, the IndexOf overload isn't faster in any scenario, so there doesn't seem to be any secondary effects that is improving performance.

Member

ahsonkhan commented Mar 8, 2017

We need to rather understand and address these secondary effects than to add this IndexOf overload.
Here is the disassembly for the simplest one of my experiments,

Is there any insight from looking at the disassembly? If we trust our results, the IndexOf overload isn't faster in any scenario, so there doesn't seem to be any secondary effects that is improving performance.

@jkotas

This comment has been minimized.

Show comment
Hide comment
@jkotas

jkotas Mar 8, 2017

Member

Right, there are not any unexpected secondary effects in the piece of disassembly that I have shared.

My understanding was that we have seen the improvement from it somewhere (correct me if I am wrong) ... it is what to drill into if it is the case.

Member

jkotas commented Mar 8, 2017

Right, there are not any unexpected secondary effects in the piece of disassembly that I have shared.

My understanding was that we have seen the improvement from it somewhere (correct me if I am wrong) ... it is what to drill into if it is the case.

@adamsitnik

This comment has been minimized.

Show comment
Hide comment
@adamsitnik

adamsitnik Mar 8, 2017

Member

@ahsonkhan @KrzysztofCwalina @jkotas have you guys tried BenchmarkDotNet? We have spent hundreds (or rather thousands) of hours to build a reliable library for benchmarking .NET code.

how the first of your benchmark would look like with BenchmarkDotNet:

    public class PerformanceTests
    {
        ReadOnlySpan<byte> bytes;
        ReadOnlySpan<byte> slice;
        
        [Params(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50, 100, 500, 1000, 5000)]
        public int Length { get; set; }
        
        [Setup]
        public void Setup()
        {
            byte[] a = new byte[Length];
            byte[] b = {1, 2, 3, 4, 5};

            for (int i = 0; i < a.Length; i++)
            {
                a[i] = (byte)(i + 1);
            }

            bytes = new ReadOnlySpan<byte>(a);
            slice = new ReadOnlySpan<byte>(b, 0, Math.Min(length, b.Length));
        }
        
        [Benchmark]
        public bool SpanStartsWith() => bytes.StartsWith(slice);
}

and then to run it:

    class Program
    {
        static void Main(string[] args)
        {
            BenchmarkRunner.Run<PerformanceTests>();
        }
    }
Member

adamsitnik commented Mar 8, 2017

@ahsonkhan @KrzysztofCwalina @jkotas have you guys tried BenchmarkDotNet? We have spent hundreds (or rather thousands) of hours to build a reliable library for benchmarking .NET code.

how the first of your benchmark would look like with BenchmarkDotNet:

    public class PerformanceTests
    {
        ReadOnlySpan<byte> bytes;
        ReadOnlySpan<byte> slice;
        
        [Params(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50, 100, 500, 1000, 5000)]
        public int Length { get; set; }
        
        [Setup]
        public void Setup()
        {
            byte[] a = new byte[Length];
            byte[] b = {1, 2, 3, 4, 5};

            for (int i = 0; i < a.Length; i++)
            {
                a[i] = (byte)(i + 1);
            }

            bytes = new ReadOnlySpan<byte>(a);
            slice = new ReadOnlySpan<byte>(b, 0, Math.Min(length, b.Length));
        }
        
        [Benchmark]
        public bool SpanStartsWith() => bytes.StartsWith(slice);
}

and then to run it:

    class Program
    {
        static void Main(string[] args)
        {
            BenchmarkRunner.Run<PerformanceTests>();
        }
    }
@KrzysztofCwalina

This comment has been minimized.

Show comment
Hide comment
@KrzysztofCwalina

KrzysztofCwalina Mar 8, 2017

Member

Yeah, we have. In general I love BenchmarkDotNet and use it in many cases. It does have better front end reporting capabilities and is more mature from reliability and statistical analysis perspective. But it seems like BenchmarkDotNet does not support measuring instructions retired, cache misses, and branch mispredictions that we often find very useful. Would be great to combine the efforts/features :-)

cc: @cmckinsey

Member

KrzysztofCwalina commented Mar 8, 2017

Yeah, we have. In general I love BenchmarkDotNet and use it in many cases. It does have better front end reporting capabilities and is more mature from reliability and statistical analysis perspective. But it seems like BenchmarkDotNet does not support measuring instructions retired, cache misses, and branch mispredictions that we often find very useful. Would be great to combine the efforts/features :-)

cc: @cmckinsey

@tannergooding

This comment has been minimized.

Show comment
Hide comment
@tannergooding

tannergooding Mar 8, 2017

Member

@KrzysztofCwalina, for Instructions Retired, you need [assembly: MeasureInstructionsRetired].

I also know cache misses and branch mispredictions are also supported, as they are displayed on the internal benchmark comparison site (I'm not sure how to enable those ones though).

Member

tannergooding commented Mar 8, 2017

@KrzysztofCwalina, for Instructions Retired, you need [assembly: MeasureInstructionsRetired].

I also know cache misses and branch mispredictions are also supported, as they are displayed on the internal benchmark comparison site (I'm not sure how to enable those ones though).

@ahsonkhan

This comment has been minimized.

Show comment
Hide comment
@ahsonkhan

ahsonkhan Mar 8, 2017

Member

I also know cache misses and branch mispredictions are also supported, as they are displayed on the internal benchmark comparison site (I'm not sure how to enable those ones though).

Are you referring to BenchmarkDotNet?

Member

ahsonkhan commented Mar 8, 2017

I also know cache misses and branch mispredictions are also supported, as they are displayed on the internal benchmark comparison site (I'm not sure how to enable those ones though).

Are you referring to BenchmarkDotNet?

@tannergooding

This comment has been minimized.

Show comment
Hide comment
@tannergooding

tannergooding Mar 8, 2017

Member

Ah, no, I mixed up https://github.com/dotnet/BenchmarkDotNet with https://github.com/Microsoft/xunit-performance (we use the latter for most of the official perf tests in CoreCLR/CoreFX/Roslyn/etc)

Member

tannergooding commented Mar 8, 2017

Ah, no, I mixed up https://github.com/dotnet/BenchmarkDotNet with https://github.com/Microsoft/xunit-performance (we use the latter for most of the official perf tests in CoreCLR/CoreFX/Roslyn/etc)

@KrzysztofCwalina

This comment has been minimized.

Show comment
Hide comment
@KrzysztofCwalina

KrzysztofCwalina Mar 8, 2017

Member

@jorive, you might want to look into this thread too. It would be great to do some statistical analysis on xunit.perf results, e.g. throw away outliers.

cc: @vancem

Member

KrzysztofCwalina commented Mar 8, 2017

@jorive, you might want to look into this thread too. It would be great to do some statistical analysis on xunit.perf results, e.g. throw away outliers.

cc: @vancem

@jorive

This comment has been minimized.

Show comment
Hide comment
@jorive

jorive Mar 8, 2017

Member

@KrzysztofCwalina I've been looking at this thread and working with @ahsonkhan. There are quite few interesting points here, and I'm compiling my results as I'm typing this :)

cc: @valenis, @brianrob

Member

jorive commented Mar 8, 2017

@KrzysztofCwalina I've been looking at this thread and working with @ahsonkhan. There are quite few interesting points here, and I'm compiling my results as I'm typing this :)

cc: @valenis, @brianrob

ahsonkhan added some commits Mar 8, 2017

@jorive

This comment has been minimized.

Show comment
Hide comment
@jorive

jorive Mar 9, 2017

Member

@KrzysztofCwalina I talked to Valentin, and he provided me with more information about the benchmarks are trying to achieve, and it seemed that the body of the loop was intentionally made such that it is loop invariant. Based on that, I did few runs and collected some statistics.

First, I modified the benchmarks and now they look like the code below. Essentially, I removed the whole setup part of the benchmark to be passed as a parameter to the benchmark when it is invoked:

[Benchmark(InnerIterationCount = 100000)]
[MemberData(nameof(InputData))]
public static int IndexOf_(byte[] array, int startIndex, int count)
{
    var bytes = new Span<byte>(array);
    var result = -1;

    foreach (var iteration in Benchmark.Iterations)
    {
        using (iteration.StartMeasurement())
        {
            for (int i = 0; i < Benchmark.InnerIterationCount; ++i)
            {
                result = bytes.IndexOf(LookupVal, startIndex, count);
            }
        }
    }

    return result;
}

[Benchmark(InnerIterationCount = 100000)]
[MemberData(nameof(InputData))]
public static int Slice_(byte[] array, int startIndex, int count)
{
    var bytes = new Span<byte>(array);
    var result = -1;

    foreach (var iteration in Benchmark.Iterations)
    {
        using (iteration.StartMeasurement())
        {
            for (int i = 0; i < Benchmark.InnerIterationCount; ++i)
            {
                result = bytes.Slice(startIndex, count).IndexOf(LookupVal);
            }
        }
    }

    return result;
}

public static IEnumerable<object[]> InputData()
{
    int[] lengths = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50, 100, 500, 1000, 5000 };

    foreach (var length in lengths)
    {
        var array = new byte[length];
        array[length - 1] = LookupVal;
        int startIndex = length / 2;
        int count = length - startIndex;

        // NOTE: Currently the types in the array below need to be serializable.
        yield return new object[] { array, startIndex, count };
    }
}

private const byte LookupVal = 99;

Second, I did some ran the benchmarks ten times and collected some results:

Name IndexOf(avg) IndexOf(StdDev) IndexOf(stddev.s/avg) Slice(avg) Slice(StdDev) Slice(stddev.s/avg) Ratio(IndexOf/Slice) IsReal("REAL" IF abs(Ratio(IndexOf/Slice)) >= 2 * MAX(IndexOf(stddev.s/avg), Slice(stddev.s/avg)) ELSE "NOT REAL")
(array: [0, 0, 0, 0, 0, ...], startIndex: 25, count: 25) 2.19731223 0.222132801 10.11% 2.282528068 0.126808596 5.56% 0.96 NOT REAL
(array: [0, 0, 0, 0, 0, ...], startIndex: 250, count: 250) 12.46010155 0.384011658 3.08% 18.75715528 1.47832642 7.88% 0.66 REAL
(array: [0, 0, 0, 0, 0, ...], startIndex: 2500, count: 2500) 104.9715245 2.994452229 2.85% 157.5402025 6.236411082 3.96% 0.67 REAL
(array: [0, 0, 0, 0, 0, ...], startIndex: 3, count: 3) 0.904014435 0.027864577 3.08% 1.062126855 0.033305119 3.14% 0.85 REAL
(array: [0, 0, 0, 0, 0, ...], startIndex: 3, count: 4) 0.7846813 0.041208626 5.25% 0.859285643 0.062551198 7.28% 0.91 NOT REAL
(array: [0, 0, 0, 0, 0, ...], startIndex: 4, count: 4) 0.781107262 0.03945544 5.05% 0.962885996 0.215744747 22.41% 0.81 NOT REAL
(array: [0, 0, 0, 0, 0, ...], startIndex: 4, count: 5) 0.802975723 0.039544351 4.92% 0.964492546 0.036538671 3.79% 0.83 REAL
(array: [0, 0, 0, 0, 0, ...], startIndex: 5, count: 5) 0.830431613 0.077495771 9.33% 0.972801272 0.059066601 6.07% 0.85 NOT REAL
(array: [0, 0, 0, 0, 0, ...], startIndex: 50, count: 50) 4.112970229 0.092136674 2.24% 3.991588352 0.266916848 6.69% 1.03 NOT REAL
(array: [0, 0, 0, 0, 0, ...], startIndex: 500, count: 500) 22.85716477 0.684116196 2.99% 31.69440661 0.49522187 1.56% 0.72 REAL
(array: [0, 0, 0, 0, 99], startIndex: 2, count: 3) 0.911585888 0.030334049 3.33% 1.074646253 0.026142364 2.43% 0.85 REAL
(array: [0, 0, 0, 99], startIndex: 2, count: 2) 0.768047026 0.023549149 3.07% 0.962447794 0.098430024 10.23% 0.80 NOT REAL
(array: [0, 0, 99], startIndex: 1, count: 2) 0.799875239 0.070670774 8.84% 0.91428837 0.027582951 3.02% 0.87 NOT REAL
(array: [0, 99], startIndex: 1, count: 1) 0.657399807 0.045238182 6.88% 0.79014699 0.102281104 12.94% 0.83 NOT REAL
(array: [99], startIndex: 0, count: 1) 0.652204394 0.032368207 4.96% 0.757082289 0.026939294 3.56% 0.86 REAL

The table above compares the IndexOf vs. the Slice implementations. Then, I compute the ratio of their averages [base/diff == IndexOf/Slice] and added a last column "IsReal". This column determines if a result if to be trusted by comparing the ratio against two times the maximum of the computed sampled standard deviation for both base and diff.

In general, the results are very noisy (this is even after ignoring the first iteration of a run for every test, a fix I will add to my next pr).

I want to mention I have run the benchmark this way:

REM Running the process as high priority on core #2
START "SIMULATOR_PIPE" /B /WAIT /HIGH /AFFINITY 0x2 dotnet.exe run -c release

@KrzysztofCwalina @jkotas Please let me know if you have any questions or concerns regarding the performance API, writing benchmarks, or on how I have collected this data (I can share ETL traces generated by the API).

Regards,
-José

Member

jorive commented Mar 9, 2017

@KrzysztofCwalina I talked to Valentin, and he provided me with more information about the benchmarks are trying to achieve, and it seemed that the body of the loop was intentionally made such that it is loop invariant. Based on that, I did few runs and collected some statistics.

First, I modified the benchmarks and now they look like the code below. Essentially, I removed the whole setup part of the benchmark to be passed as a parameter to the benchmark when it is invoked:

[Benchmark(InnerIterationCount = 100000)]
[MemberData(nameof(InputData))]
public static int IndexOf_(byte[] array, int startIndex, int count)
{
    var bytes = new Span<byte>(array);
    var result = -1;

    foreach (var iteration in Benchmark.Iterations)
    {
        using (iteration.StartMeasurement())
        {
            for (int i = 0; i < Benchmark.InnerIterationCount; ++i)
            {
                result = bytes.IndexOf(LookupVal, startIndex, count);
            }
        }
    }

    return result;
}

[Benchmark(InnerIterationCount = 100000)]
[MemberData(nameof(InputData))]
public static int Slice_(byte[] array, int startIndex, int count)
{
    var bytes = new Span<byte>(array);
    var result = -1;

    foreach (var iteration in Benchmark.Iterations)
    {
        using (iteration.StartMeasurement())
        {
            for (int i = 0; i < Benchmark.InnerIterationCount; ++i)
            {
                result = bytes.Slice(startIndex, count).IndexOf(LookupVal);
            }
        }
    }

    return result;
}

public static IEnumerable<object[]> InputData()
{
    int[] lengths = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50, 100, 500, 1000, 5000 };

    foreach (var length in lengths)
    {
        var array = new byte[length];
        array[length - 1] = LookupVal;
        int startIndex = length / 2;
        int count = length - startIndex;

        // NOTE: Currently the types in the array below need to be serializable.
        yield return new object[] { array, startIndex, count };
    }
}

private const byte LookupVal = 99;

Second, I did some ran the benchmarks ten times and collected some results:

Name IndexOf(avg) IndexOf(StdDev) IndexOf(stddev.s/avg) Slice(avg) Slice(StdDev) Slice(stddev.s/avg) Ratio(IndexOf/Slice) IsReal("REAL" IF abs(Ratio(IndexOf/Slice)) >= 2 * MAX(IndexOf(stddev.s/avg), Slice(stddev.s/avg)) ELSE "NOT REAL")
(array: [0, 0, 0, 0, 0, ...], startIndex: 25, count: 25) 2.19731223 0.222132801 10.11% 2.282528068 0.126808596 5.56% 0.96 NOT REAL
(array: [0, 0, 0, 0, 0, ...], startIndex: 250, count: 250) 12.46010155 0.384011658 3.08% 18.75715528 1.47832642 7.88% 0.66 REAL
(array: [0, 0, 0, 0, 0, ...], startIndex: 2500, count: 2500) 104.9715245 2.994452229 2.85% 157.5402025 6.236411082 3.96% 0.67 REAL
(array: [0, 0, 0, 0, 0, ...], startIndex: 3, count: 3) 0.904014435 0.027864577 3.08% 1.062126855 0.033305119 3.14% 0.85 REAL
(array: [0, 0, 0, 0, 0, ...], startIndex: 3, count: 4) 0.7846813 0.041208626 5.25% 0.859285643 0.062551198 7.28% 0.91 NOT REAL
(array: [0, 0, 0, 0, 0, ...], startIndex: 4, count: 4) 0.781107262 0.03945544 5.05% 0.962885996 0.215744747 22.41% 0.81 NOT REAL
(array: [0, 0, 0, 0, 0, ...], startIndex: 4, count: 5) 0.802975723 0.039544351 4.92% 0.964492546 0.036538671 3.79% 0.83 REAL
(array: [0, 0, 0, 0, 0, ...], startIndex: 5, count: 5) 0.830431613 0.077495771 9.33% 0.972801272 0.059066601 6.07% 0.85 NOT REAL
(array: [0, 0, 0, 0, 0, ...], startIndex: 50, count: 50) 4.112970229 0.092136674 2.24% 3.991588352 0.266916848 6.69% 1.03 NOT REAL
(array: [0, 0, 0, 0, 0, ...], startIndex: 500, count: 500) 22.85716477 0.684116196 2.99% 31.69440661 0.49522187 1.56% 0.72 REAL
(array: [0, 0, 0, 0, 99], startIndex: 2, count: 3) 0.911585888 0.030334049 3.33% 1.074646253 0.026142364 2.43% 0.85 REAL
(array: [0, 0, 0, 99], startIndex: 2, count: 2) 0.768047026 0.023549149 3.07% 0.962447794 0.098430024 10.23% 0.80 NOT REAL
(array: [0, 0, 99], startIndex: 1, count: 2) 0.799875239 0.070670774 8.84% 0.91428837 0.027582951 3.02% 0.87 NOT REAL
(array: [0, 99], startIndex: 1, count: 1) 0.657399807 0.045238182 6.88% 0.79014699 0.102281104 12.94% 0.83 NOT REAL
(array: [99], startIndex: 0, count: 1) 0.652204394 0.032368207 4.96% 0.757082289 0.026939294 3.56% 0.86 REAL

The table above compares the IndexOf vs. the Slice implementations. Then, I compute the ratio of their averages [base/diff == IndexOf/Slice] and added a last column "IsReal". This column determines if a result if to be trusted by comparing the ratio against two times the maximum of the computed sampled standard deviation for both base and diff.

In general, the results are very noisy (this is even after ignoring the first iteration of a run for every test, a fix I will add to my next pr).

I want to mention I have run the benchmark this way:

REM Running the process as high priority on core #2
START "SIMULATOR_PIPE" /B /WAIT /HIGH /AFFINITY 0x2 dotnet.exe run -c release

@KrzysztofCwalina @jkotas Please let me know if you have any questions or concerns regarding the performance API, writing benchmarks, or on how I have collected this data (I can share ETL traces generated by the API).

Regards,
-José

@ahsonkhan

This comment has been minimized.

Show comment
Hide comment
@ahsonkhan

ahsonkhan Mar 9, 2017

Member

Here are the results that I found from running tests within my stand alone application.
In both cases (fast and slow span), for span length < 1000, it is better to do IndexOf rather than Slice. For >= 1000, the two are on par.

Given these results, we should push this PR through. @jkotas, thoughts?

image

image

Member

ahsonkhan commented Mar 9, 2017

Here are the results that I found from running tests within my stand alone application.
In both cases (fast and slow span), for span length < 1000, it is better to do IndexOf rather than Slice. For >= 1000, the two are on par.

Given these results, we should push this PR through. @jkotas, thoughts?

image

image

@ahsonkhan

This comment has been minimized.

Show comment
Hide comment
@ahsonkhan

ahsonkhan Mar 9, 2017

Member

@dotnet-bot test this please

Member

ahsonkhan commented Mar 9, 2017

@dotnet-bot test this please

@jkotas

This comment has been minimized.

Show comment
Hide comment
@jkotas

jkotas Mar 9, 2017

Member

As discussed over email, your results are against more than one month old CoreCLR that did not have any of the recent work on Span code quality. Rerun the tests against current CoreCLR.

Member

jkotas commented Mar 9, 2017

As discussed over email, your results are against more than one month old CoreCLR that did not have any of the recent work on Span code quality. Rerun the tests against current CoreCLR.

@ahsonkhan

This comment has been minimized.

Show comment
Hide comment
@ahsonkhan

ahsonkhan Mar 9, 2017

Member

Re-ran using latest available package (2.0.0-beta-001689-00).

There are improvements for SpanLength <= 100 using the new package for fast span which close the gap a bit. However, IndexOf overload still prevails for SpanLength <= 100.

image

Member

ahsonkhan commented Mar 9, 2017

Re-ran using latest available package (2.0.0-beta-001689-00).

There are improvements for SpanLength <= 100 using the new package for fast span which close the gap a bit. However, IndexOf overload still prevails for SpanLength <= 100.

image

@ahsonkhan

This comment has been minimized.

Show comment
Hide comment
@ahsonkhan

ahsonkhan Mar 10, 2017

Member

Essentially, for Span specifically, calling Slice then IndexOf is just as good (if not better) than calling the IndexOf overload which removes the need to Slice.

https://gist.github.com/ahsonkhan/e56281d9d0a0df6cb6e7837602ffeab3

Member

ahsonkhan commented Mar 10, 2017

Essentially, for Span specifically, calling Slice then IndexOf is just as good (if not better) than calling the IndexOf overload which removes the need to Slice.

https://gist.github.com/ahsonkhan/e56281d9d0a0df6cb6e7837602ffeab3

@KrzysztofCwalina

This comment has been minimized.

Show comment
Hide comment
@KrzysztofCwalina

KrzysztofCwalina Mar 10, 2017

Member

@ahsonkhan, so should we close this?

Member

KrzysztofCwalina commented Mar 10, 2017

@ahsonkhan, so should we close this?

@ahsonkhan

This comment has been minimized.

Show comment
Hide comment
@ahsonkhan

ahsonkhan Mar 10, 2017

Member

@KrzysztofCwalina, I believe we should. However, I think some of the tests and possibly benchmarks I wrote are useful along with some of the minor startswith impl changes.

I can clean up the changes to only keep the relevant parts.

Member

ahsonkhan commented Mar 10, 2017

@KrzysztofCwalina, I believe we should. However, I think some of the tests and possibly benchmarks I wrote are useful along with some of the minor startswith impl changes.

I can clean up the changes to only keep the relevant parts.

@KrzysztofCwalina

This comment has been minimized.

Show comment
Hide comment
@KrzysztofCwalina

KrzysztofCwalina Mar 17, 2017

Member

@ahsonkhan, should we close it?

Member

KrzysztofCwalina commented Mar 17, 2017

@ahsonkhan, should we close it?

@ahsonkhan

This comment has been minimized.

Show comment
Hide comment
@ahsonkhan

ahsonkhan Mar 17, 2017

Member

@KrzysztofCwalina, I haven't gotten a chance to refactor the tests and clean up the useful parts. Would it make more sense to close this PR and start a new one if necessary?

Member

ahsonkhan commented Mar 17, 2017

@KrzysztofCwalina, I haven't gotten a chance to refactor the tests and clean up the useful parts. Would it make more sense to close this PR and start a new one if necessary?

@KrzysztofCwalina

This comment has been minimized.

Show comment
Hide comment
@KrzysztofCwalina

KrzysztofCwalina Mar 17, 2017

Member

Your call.

Member

KrzysztofCwalina commented Mar 17, 2017

Your call.

@ahsonkhan

This comment has been minimized.

Show comment
Hide comment
@ahsonkhan

ahsonkhan Mar 27, 2017

Member

Your call.

I will take these tests and move them to a PR in corefx with StartsWith once the API review is in. I think this can be closed.

Member

ahsonkhan commented Mar 27, 2017

Your call.

I will take these tests and move them to a PR in corefx with StartsWith once the API review is in. I think this can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment