Use SpanHelpers.SequenceCompareTo instead of CompareOrdinalHelper #402


@benaadams (Member) commented Nov 30, 2019

By string length (1-64), difference in the last position (lower is better):

[chart not preserved]

By position of the difference (0-88) in a long string (lower is better):

[chart not preserved]

By position of the difference (0-1023) in a long string (lower is better):

[chart not preserved]

        Method | Toolchain |  Length |    Mean   |    Error  |         Op/s  | Ratio | Op/s vs base |
---------------|-----------|---------|-----------|-----------|---------------|-------|--------------|
CompareOrdinal |      base |       1 |  3.312 ns | 0.0116 ns | 301,900,985.4 |  1    |      |
CompareOrdinal |      diff |       1 |  3.495 ns | 0.0155 ns | 286,163,741.3 |  1.05 |  95% |
CompareOrdinal |      base |       2 |  4.763 ns | 0.0118 ns | 209,946,368.4 |  1    |      |
CompareOrdinal |      diff |       2 |  6.114 ns | 0.0560 ns | 163,555,899.2 |  1.28 |  78% |
CompareOrdinal |      base |       3 |  6.009 ns | 0.0139 ns | 166,404,826.9 |  1    |      | 
CompareOrdinal |      diff |       3 |  5.277 ns | 0.0188 ns | 189,499,820.2 |  0.88 | 114% |
CompareOrdinal |      base |       4 |  6.251 ns | 0.0125 ns | 159,966,995.5 |  1    |      |
CompareOrdinal |      diff |       4 |  5.027 ns | 0.0231 ns | 198,923,661.0 |  0.8  | 125% |
CompareOrdinal |      base |       5 |  6.255 ns | 0.0142 ns | 159,872,536.1 |  1    |      |
CompareOrdinal |      diff |       5 |  5.402 ns | 0.0172 ns | 185,125,753.1 |  0.86 | 116% |
CompareOrdinal |      base |       6 |  6.772 ns | 0.0132 ns | 147,664,221.0 |  1    |      |
CompareOrdinal |      diff |       6 |  6.142 ns | 0.1198 ns | 162,807,596.6 |  0.9  | 111% |
CompareOrdinal |      base |       7 |  7.166 ns | 0.0251 ns | 139,554,658.6 |  1    |      |
CompareOrdinal |      diff |       7 |  5.304 ns | 0.0277 ns | 188,526,101.1 |  0.74 | 135% |
CompareOrdinal |      base |       8 |  7.524 ns | 0.0223 ns | 132,912,885.0 |  1    |      |
CompareOrdinal |      diff |       8 |  6.214 ns | 0.0167 ns | 160,939,642.2 |  0.83 | 120% |
CompareOrdinal |      base |       9 |  7.408 ns | 0.0635 ns | 134,990,065.6 |  1    |      |
CompareOrdinal |      diff |       9 |  6.437 ns | 0.0060 ns | 155,355,707.9 |  0.87 | 115% |
CompareOrdinal |      base |      10 |  7.657 ns | 0.0389 ns | 130,606,710.9 |  1    |      |
CompareOrdinal |      diff |      10 |  6.435 ns | 0.0092 ns | 155,409,282.3 |  0.84 | 119% |
CompareOrdinal |      base |      11 |  8.040 ns | 0.0725 ns | 124,376,453.7 |  1    |      |
CompareOrdinal |      diff |      11 |  6.436 ns | 0.0105 ns | 155,378,710.7 |  0.8  | 125% |
CompareOrdinal |      base |      12 |  8.037 ns | 0.0348 ns | 124,422,050.9 |  1    |      |
CompareOrdinal |      diff |      12 |  6.439 ns | 0.0162 ns | 155,304,474.2 |  0.8  | 125% |
CompareOrdinal |      base |      13 |  8.456 ns | 0.0243 ns | 118,266,085.0 |  1    |      |
CompareOrdinal |      diff |      13 |  6.443 ns | 0.0157 ns | 155,212,797.2 |  0.76 | 132% |
CompareOrdinal |      base |      14 |  6.282 ns | 0.0187 ns | 159,176,779.5 |  1    |      |
CompareOrdinal |      diff |      14 |  6.437 ns | 0.0149 ns | 155,362,814.2 |  1.02 |  98% |
CompareOrdinal |      base |      15 |  6.268 ns | 0.0358 ns | 159,540,160.0 |  1    |      |
CompareOrdinal |      diff |      15 |  6.434 ns | 0.0138 ns | 155,425,241.0 |  1.03 |  97% |
CompareOrdinal |      base |      16 |  5.502 ns | 0.0780 ns | 181,759,720.4 |  1    |      |
CompareOrdinal |      diff |      16 |  6.249 ns | 0.0231 ns | 160,017,474.2 |  1.14 |  88% |
CompareOrdinal |      base |      17 |  6.615 ns | 0.0190 ns | 151,168,076.2 |  1    |      |
CompareOrdinal |      diff |      17 |  6.246 ns | 0.0127 ns | 160,093,226.6 |  0.94 | 106% |
CompareOrdinal |      base |      18 |  7.262 ns | 0.0194 ns | 137,710,614.1 |  1    |      |
CompareOrdinal |      diff |      18 |  6.253 ns | 0.0233 ns | 159,927,722.3 |  0.86 | 116% |
CompareOrdinal |      base |      19 |  7.395 ns | 0.0198 ns | 135,220,294.5 |  1    |      |
CompareOrdinal |      diff |      19 |  6.246 ns | 0.0120 ns | 160,101,708.5 |  0.84 | 119% |
CompareOrdinal |      base |      20 |  7.644 ns | 0.0213 ns | 130,827,428.6 |  1    |      |
CompareOrdinal |      diff |      20 |  6.244 ns | 0.0119 ns | 160,157,822.4 |  0.82 | 122% |
CompareOrdinal |      base |      21 |  7.711 ns | 0.0248 ns | 129,678,130.3 |  1    |      |
CompareOrdinal |      diff |      21 |  6.254 ns | 0.0280 ns | 159,904,313.4 |  0.81 | 123% |
CompareOrdinal |      base |      22 |  8.209 ns | 0.0229 ns | 121,815,140.6 |  1    |      |
CompareOrdinal |      diff |      22 |  6.253 ns | 0.0208 ns | 159,928,685.8 |  0.76 | 132% |
CompareOrdinal |      base |      23 |  8.437 ns | 0.0276 ns | 118,523,585.3 |  1    |      |
CompareOrdinal |      diff |      23 |  6.235 ns | 0.0332 ns | 160,381,594.4 |  0.74 | 135% |
CompareOrdinal |      base |      24 |  8.850 ns | 0.0482 ns | 112,992,528.4 |  1    |      |
CompareOrdinal |      diff |      24 |  6.252 ns | 0.0331 ns | 159,955,247.3 |  0.71 | 141% |
CompareOrdinal |      base |      25 |  8.666 ns | 0.0285 ns | 115,397,471.3 |  1    |      |
CompareOrdinal |      diff |      25 |  6.248 ns | 0.0276 ns | 160,044,673.9 |  0.72 | 139% |
CompareOrdinal |      base |      26 |  6.468 ns | 0.0328 ns | 154,606,877.3 |  1    |      |
CompareOrdinal |      diff |      26 |  6.317 ns | 0.0225 ns | 158,306,290.3 |  0.98 | 102% |
CompareOrdinal |      base |      27 |  7.804 ns | 0.0459 ns | 128,139,353.8 |  1    |      |
CompareOrdinal |      diff |      27 |  6.216 ns | 0.0250 ns | 160,865,918.0 |  0.8  | 125% |
CompareOrdinal |      base |      28 |  8.024 ns | 0.0549 ns | 124,628,196.5 |  1    |      |
CompareOrdinal |      diff |      28 |  6.217 ns | 0.0149 ns | 160,861,035.8 |  0.77 | 130% |
CompareOrdinal |      base |      29 |  7.678 ns | 0.0213 ns | 130,238,615.3 |  1    |      |
CompareOrdinal |      diff |      29 |  6.235 ns | 0.0351 ns | 160,384,748.7 |  0.81 | 123% |
CompareOrdinal |      base |      30 |  8.092 ns | 0.0348 ns | 123,578,752.7 |  1    |      |
CompareOrdinal |      diff |      30 |  6.250 ns | 0.0306 ns | 159,999,814.9 |  0.77 | 130% |
CompareOrdinal |      base |      31 |  8.227 ns | 0.0454 ns | 121,556,695.3 |  1    |      |
CompareOrdinal |      diff |      31 |  6.238 ns | 0.0491 ns | 160,304,019.0 |  0.76 | 132% |
CompareOrdinal |      base |      32 |  8.688 ns | 0.0732 ns | 115,096,742.4 |  1    |      |
CompareOrdinal |      diff |      32 |  6.222 ns | 0.0533 ns | 160,713,868.3 |  0.72 | 139% |
CompareOrdinal |      base |      33 |  8.484 ns | 0.0128 ns | 117,870,166.0 |  1    |      |
CompareOrdinal |      diff |      33 |  6.592 ns | 0.0313 ns | 151,693,637.7 |  0.78 | 128% |
CompareOrdinal |      base |      34 |  8.717 ns | 0.0287 ns | 114,722,837.7 |  1    |      |
CompareOrdinal |      diff |      34 |  6.624 ns | 0.0525 ns | 150,972,499.8 |  0.76 | 132% |
CompareOrdinal |      base |      35 |  8.885 ns | 0.0558 ns | 112,555,187.4 |  1    |      |
CompareOrdinal |      diff |      35 |  6.585 ns | 0.0340 ns | 151,859,163.7 |  0.74 | 135% |
CompareOrdinal |      base |      37 |  9.368 ns | 0.0286 ns | 106,743,397.5 |  1    |      |
CompareOrdinal |      diff |      37 |  6.506 ns | 0.0737 ns | 153,699,328.9 |  0.69 | 145% |
CompareOrdinal |      base |      38 |  7.894 ns | 0.0313 ns | 126,674,121.3 |  1    |      |
CompareOrdinal |      diff |      38 |  6.398 ns | 0.0539 ns | 156,310,293.0 |  0.81 | 123% |
CompareOrdinal |      base |      39 |  7.642 ns | 0.0299 ns | 130,863,753.5 |  1    |      |
CompareOrdinal |      diff |      39 |  6.409 ns | 0.0808 ns | 156,036,825.9 |  0.84 | 119% |
CompareOrdinal |      base |      40 |  8.036 ns | 0.0337 ns | 124,437,137.6 |  1    |      |
CompareOrdinal |      diff |      40 |  6.422 ns | 0.0550 ns | 155,716,826.9 |  0.8  | 125% |
CompareOrdinal |      base |      41 |  7.243 ns | 0.0500 ns | 138,065,007.0 |  1    |      |
CompareOrdinal |      diff |      41 |  6.397 ns | 0.0623 ns | 156,331,361.9 |  0.88 | 114% |
CompareOrdinal |      base |      42 |  8.312 ns | 0.0180 ns | 120,308,418.1 |  1    |      |
CompareOrdinal |      diff |      42 |  6.425 ns | 0.0536 ns | 155,641,772.3 |  0.77 | 130% |
CompareOrdinal |      base |      43 |  8.400 ns | 0.0130 ns | 119,040,855.7 |  1    |      |
CompareOrdinal |      diff |      43 |  6.449 ns | 0.0580 ns | 155,071,546.1 |  0.77 | 130% |
CompareOrdinal |      base |      44 |  8.938 ns | 0.0231 ns | 111,877,493.2 |  1    |      |
CompareOrdinal |      diff |      44 |  6.440 ns | 0.0435 ns | 155,284,991.9 |  0.72 | 139% |
CompareOrdinal |      base |      45 |  8.925 ns | 0.0339 ns | 112,040,562.5 |  1    |      |
CompareOrdinal |      diff |      45 |  6.572 ns | 0.0421 ns | 152,153,450.1 |  0.74 | 135% |
CompareOrdinal |      base |      46 |  9.430 ns | 0.0310 ns | 106,040,924.9 |  1    |      |
CompareOrdinal |      diff |      46 |  6.562 ns | 0.0265 ns | 152,383,295.4 |  0.7  | 143% |
CompareOrdinal |      base |      47 |  9.447 ns | 0.0193 ns | 105,848,695.4 |  1    |      |
CompareOrdinal |      diff |      47 |  6.593 ns | 0.0385 ns | 151,666,514.8 |  0.7  | 143% |
CompareOrdinal |      base |      48 |  9.695 ns | 0.0478 ns | 103,145,515.4 |  1    |      |
CompareOrdinal |      diff |      48 |  6.421 ns | 0.0763 ns | 155,747,630.9 |  0.66 | 152% |
CompareOrdinal |      base |      49 |  9.891 ns | 0.0330 ns | 101,103,394.2 |  1    |      |
CompareOrdinal |      diff |      49 |  7.075 ns | 0.0324 ns | 141,341,240.6 |  0.72 | 139% |
CompareOrdinal |      base |      50 |  8.753 ns | 0.0143 ns | 114,247,034.3 |  1    |      |
CompareOrdinal |      diff |      50 |  7.094 ns | 0.0274 ns | 140,956,633.9 |  0.81 | 123% |
CompareOrdinal |      base |      51 |  7.489 ns | 0.0294 ns | 133,530,478.3 |  1    |      |
CompareOrdinal |      diff |      51 |  7.104 ns | 0.0377 ns | 140,769,956.1 |  0.95 | 105% |
CompareOrdinal |      base |      52 |  8.964 ns | 0.0104 ns | 111,558,760.4 |  1    |      |
CompareOrdinal |      diff |      52 |  7.089 ns | 0.0358 ns | 141,058,316.1 |  0.79 | 127% |
CompareOrdinal |      base |      53 |  8.543 ns | 0.0213 ns | 117,060,863.3 |  1    |      |
CompareOrdinal |      diff |      53 |  7.085 ns | 0.0356 ns | 141,141,750.9 |  0.83 | 120% |
CompareOrdinal |      base |      54 |  7.812 ns | 0.0527 ns | 128,007,289.5 |  1    |      |
CompareOrdinal |      diff |      54 |  7.094 ns | 0.0340 ns | 140,961,837.0 |  0.91 | 110% |
CompareOrdinal |      base |      55 |  9.610 ns | 0.0889 ns | 104,058,536.2 |  1    |      |
CompareOrdinal |      diff |      55 |  7.260 ns | 0.0300 ns | 137,732,735.0 |  0.76 | 132% |
CompareOrdinal |      base |      56 |  9.944 ns | 0.0589 ns | 100,566,043.8 |  1    |      |
CompareOrdinal |      diff |      56 |  7.262 ns | 0.0405 ns | 137,700,869.7 |  0.73 | 137% |
CompareOrdinal |      base |      57 |  9.928 ns | 0.0338 ns | 100,720,650.4 |  1    |      |
CompareOrdinal |      diff |      57 |  7.237 ns | 0.0308 ns | 138,188,219.8 |  0.73 | 137% |
CompareOrdinal |      base |      58 | 10.416 ns | 0.0973 ns |  96,001,992.2 |  1    |      |
CompareOrdinal |      diff |      58 |  7.219 ns | 0.0292 ns | 138,522,886.3 |  0.69 | 145% |
CompareOrdinal |      base |      59 | 10.308 ns | 0.0192 ns |  97,012,227.6 |  1    |      |
CompareOrdinal |      diff |      59 |  7.208 ns | 0.0217 ns | 138,726,321.6 |  0.7  | 143% |
CompareOrdinal |      base |      60 | 10.676 ns | 0.0529 ns |  93,665,185.6 |  1    |      |
CompareOrdinal |      diff |      60 |  7.264 ns | 0.0396 ns | 137,666,794.4 |  0.68 | 147% |
CompareOrdinal |      base |      61 | 10.764 ns | 0.0165 ns |  92,898,125.6 |  1    |      |
CompareOrdinal |      diff |      61 |  7.215 ns | 0.0352 ns | 138,594,708.9 |  0.67 | 149% |
CompareOrdinal |      base |      62 |  8.420 ns | 0.0542 ns | 118,770,530.4 |  1    |      |
CompareOrdinal |      diff |      62 |  7.235 ns | 0.0427 ns | 138,224,679.8 |  0.86 | 116% |
CompareOrdinal |      base |      63 |  9.165 ns | 0.0116 ns | 109,116,464.3 |  1    |      |
CompareOrdinal |      diff |      63 |  7.194 ns | 0.0312 ns | 139,007,668.5 |  0.78 | 128% |
CompareOrdinal |      base |      64 |  9.707 ns | 0.0061 ns | 103,018,766.5 |  1    |      |
CompareOrdinal |      diff |      64 |  7.200 ns | 0.0206 ns | 138,883,359.7 |  0.74 | 135% |
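
Reading the rows, Ratio is the `diff` mean divided by the `base` mean, and the last (percentage) column appears to be `diff` throughput relative to `base`, i.e. the inverse of Ratio. This is inferred from the numbers, not stated in the thread. Working through the Length = 2 rows:

```math
\text{Ratio} = \frac{\bar{t}_{\text{diff}}}{\bar{t}_{\text{base}}} = \frac{6.114\,\text{ns}}{4.763\,\text{ns}} \approx 1.28,
\qquad
\frac{\text{Op/s}_{\text{diff}}}{\text{Op/s}_{\text{base}}} = \frac{\bar{t}_{\text{base}}}{\bar{t}_{\text{diff}}} = \frac{4.763}{6.114} \approx 0.78 = 78\%
```

Op/s is roughly $10^9 / \bar{t}$ with $\bar{t}$ in ns (e.g. $10^9 / 4.763 \approx 2.0995 \times 10^8$); small differences from the table come from the rounded means.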

CoreCLR PR: dotnet/coreclr#22479
Resolves: https://github.com/dotnet/coreclr/issues/22763

@benaadams (Member Author)

Test failure in `System.Net.Security.Tests.SslClientAuthenticationOptionsTest.ClientOptions_ServerOptions_NotMutatedDuringAuthentication`:

System.TimeoutException : VirtualNetwork: Timeout reading the next frame.

Raised issue #404.

@drieseng (Contributor)

I'm just an outsider, but, as @jkotas asked in dotnet/coreclr#22479: "Could you please share the up to date perf numbers?"

@stephentoub reopened this Jan 15, 2020

@stephentoub (Member)

@benaadams, did you have perf numbers here?

@benaadams force-pushed the Use-CompareOrdinalHelper-for-SpanHelpers.SequenceCompareTo- branch from e6ef143 to 7d6304c January 15, 2020 17:39
@benaadams (Member Author)

Ah, this additionally needs `SequenceCompareTo` to be intrinsicified (rewritten with hardware intrinsics); otherwise, with only the `Vector<T>` path, vectorization kicks in very late on a machine that supports AVX.

I have the additional change; just testing it.
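
The length tiers that the intrinsics version ends up with can be read off the branches in the Jan 26 disassembly in this thread (`cmp r10, 8` and `lea rax, [r10-16]`): a 128-bit path covers 8 to 15 chars and a 256-bit path takes over at 16. A `Vector<T>`-only loop presumably needs at least one full `Vector<ushort>` (16 chars when `Vector<T>` is 256 bits wide) before it starts, which is the "cuts in very late" problem. The C sketch below only encodes those thresholds; `select_tier` and the enum names are hypothetical, not code from the PR.

```c
/* Hypothetical sketch: which comparison tier the Jan 26 codegen uses for a
   given common length in chars. Thresholds come from the disassembly
   (cmp r10, 8 and lea rax, [r10-16]); names are illustrative only. */
typedef enum {
    TIER_SCALAR,     /* 64-bit, then 32-bit, then char-by-char loops */
    TIER_VECTOR128,  /* 8 chars per xmm compare */
    TIER_VECTOR256   /* 16 chars per ymm compare */
} compare_tier;

static compare_tier select_tier(long min_length)
{
    if (min_length >= 16)
        return TIER_VECTOR256;
    if (min_length >= 8)
        return TIER_VECTOR128;
    return TIER_SCALAR;
}
```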

@benaadams changed the title from "Use CompareOrdinalHelper for SpanHelpers.SequenceCompareTo" to "Use SpanHelpers.SequenceCompareTo instead of CompareOrdinalHelper" Jan 16, 2020
@benaadams force-pushed the Use-CompareOrdinalHelper-for-SpanHelpers.SequenceCompareTo- branch 2 times, most recently from 29b89a7 to 385d550 January 26, 2020 06:56
@benaadams (Member Author) commented Jan 26, 2020

    G_M29673_IG01:
        push     rsi
        vzeroupper 
                            
    G_M29673_IG02:
        cmp      rcx, r8
+-<     je       SHORT G_M29673_IG10             ; Equal
|                           
|   G_M29673_IG03:
|       cmp      edx, r9d
|       jle      SHORT G_M29673_IG04
|       mov      eax, r9d
|       jmp      SHORT G_M29673_IG05
|                           
|   G_M29673_IG04:
|       mov      eax, edx
|                           
|   G_M29673_IG05:
|       movsxd   r10, eax
|       xor      r11, r11
|       cmp      r10, 8
|       jge      G_M29673_IG15                     ; IntrinsicsCompare ------------>+
|       cmp      r10, 4                                                             |
|       jl       SHORT G_M29673_IG07                                                |
|                                                                                   |
|   G_M29673_IG06:                         <-----+ (long)                           |
|       mov      rax, qword ptr [rcx+2*r11]      |                                  |
|       mov      rsi, qword ptr [r8+2*r11]       |                                  |
|       xor      rsi, rax                        L                                  |
|       test     rsi, rsi                        O                                  |
|       jne      SHORT G_M29673_IG12             O ; LongDifference ------->+       |
|       add      r11, 4                          P                          |       |
|       lea      rax, [r11+4]                    |                          |       |
|       cmp      r10, rax                        |                          |       |
|       jge      SHORT G_M29673_IG06       ------+                          |       |
|                                                                           |       |
|   G_M29673_IG07:                                                          |       |
|       lea      rax, [r11+2]                                               |       |
|       cmp      r10, rax                                                   |       |
|       jl       SHORT G_M29673_IG08                                        |       |
|       mov      eax, dword ptr [rcx+2*r11]                                 |       |
|       cmp      dword ptr [r8+2*r11], eax                                  |       |
|       jne      SHORT G_M29673_IG08                                        |       |
|       add      r11, 2                                                     |       |
|                                                                           |       |
|   G_M29673_IG08:                                                          |       |
|       cmp      r11, r10                                                   |       |
+-<     jge      SHORT G_M29673_IG10             ; Equal                    |       |
|                                                                           |       |
|   G_M29673_IG09:                        <-----+ (char)                    |       |
|       lea      rax, bword ptr [rcx+2*r11]     |                           |       |
|       movzx    rsi, word  ptr [r8+2*r11]      |                           |       |
|       movzx    rax, word  ptr [rax]           L                           |       |
|       sub      eax, esi                       O                           |       |
|       test     eax, eax                       O                           |       |
|       jne      SHORT G_M29673_IG14            P ; ResultDifference --->+  |       |
|       inc      r11                            |                        |  |       |
|       cmp      r11, r10                       |                        |  |       |
|       jl       SHORT G_M29673_IG09      ------+                        |  |       |
\                                                                        |  |       |
 -> G_M29673_IG10:        ; <--- Equal                                   |  |       |
/       mov      eax, edx                                                |  |       |
|       sub      eax, r9d                                                |  |       |
|                                                                        |  |       |
|   G_M29673_IG11:                                                       |  |       |
|       vzeroupper                                                       |  |       |
|       pop      rsi                                                     |  |       |
|       ret                                                              |  |       |
|                                                                        |  |       |
|   G_M29673_IG12:                         ; <-- LongDifference  -----------+       |
|       xor      eax, eax                                                |          |
|       tzcnt    rax, rsi                                                |          |
|       sar      eax, 4                                                  |          |
|       movsxd   rax, eax                                                |          |
|       add      r11, rax                                                |          |
|                                                                        |          |
|   G_M29673_IG13:                         ; <-- OffsetDifference -------|------+   |
|       lea      rax, bword ptr [rcx+2*r11]                              |      |   |
|       movzx    r10, word  ptr [r8+2*r11]                               |      |   |
|       movzx    rax, word  ptr [rax]                                    |      |   |
|       sub      eax, r10d                                               |      |   |
|                                                                        |      |   |
|   G_M29673_IG14:                         ; <-- ResultDifference--------+      |   |
|       vzeroupper                                                              |   |
|       pop      rsi                                                            |   |
|       ret                                                                     |   |
|                                                                               |   |
|   G_M29673_IG15:                          ; <-- IntrinsicsCompare ----------------+
|       lea      rax, [r10-16]                                                  |
|       test     rax, rax                                                       |
|       jl       SHORT G_M29673_IG18                                            |
|       test     rax, rax                                                       |
|       jle      SHORT G_M29673_IG17                                            |
|                                                                               |
|   G_M29673_IG16:                        <-----+ (Vector256)                   |
|       vmovupd  ymm0, ymmword ptr[rcx+2*r11]   |                               |
|       vmovupd  ymm1, ymmword ptr[r8+2*r11]    |                               |
|       vpcmpeqw ymm0, ymm0, ymm1               L                               |
|       vpmovmskb esi, ymm0                     O                               |
|       cmp      esi, -1                        O                               |
|       jne      G_M29673_IG21                  P ; IntrinsicsDifference --->+  |
|       add      r11, 16                        |                            |  |
|       cmp      rax, r11                       |                            |  |
|       jg       SHORT G_M29673_IG16      ------+                            |  |
|                                                                            |  |
|   G_M29673_IG17:                                                           |  |
|       mov      r11, rax                                                    |  |
|       vmovupd  ymm0, ymmword ptr[rcx+2*r11]                                |  |
|       vmovupd  ymm1, ymmword ptr[r8+2*r11]                                 |  |
|       vpcmpeqw ymm0, ymm0, ymm1                                            |  |
|       vpmovmskb esi, ymm0                                                  |  |
|       cmp      esi, -1                                                     |  |
|       jne      SHORT G_M29673_IG21              ; IntrinsicsDifference --->+  |
+-<     jmp      SHORT G_M29673_IG10              ; Equal                    |  |
|                                                                            |  |
|   G_M29673_IG18:                                                           |  |
|       lea      rax, [r10-8]                                                |  |
|       test     rax, rax                                                    |  |
|       jle      SHORT G_M29673_IG20                                         |  |
|                                                                            |  |
|   G_M29673_IG19:                        <-----+ (Vector128)                |  |
|       vmovupd  xmm0, xmmword ptr [rcx+2*r11]  |                            |  |
|       vmovupd  xmm1, xmmword ptr [r8+2*r11]   |                            |  |
|       vpcmpeqw xmm0, xmm0, xmm1               L                            |  |
|       vpmovmskb esi, xmm0                     O                            |  |
|       cmp      esi, 0xFFFF                    O                            |  |
|       jne      SHORT G_M29673_IG21            P ; IntrinsicsDifference --->+  |
|       add      r11, 8                         |                            |  |
|       cmp      rax, r11                       |                            |  |
|       jg       SHORT G_M29673_IG19      ------+                            |  |
|                                                                            |  |
|   G_M29673_IG20:                                                           |  |
|       mov      r11, rax                                                    |  |
|       vmovupd  xmm0, xmmword ptr [rcx+2*r11]                               |  |
|       vmovupd  xmm1, xmmword ptr [r8+2*r11]                                |  |
|       vpcmpeqw xmm0, xmm0, xmm1                                            |  |
|       vpmovmskb esi, xmm0                                                  |  |
|       cmp      esi, 0xFFFF                                                 |  |
+-<     je       G_M29673_IG10                     ; Equal                   |  |
                                                                             |  |
    G_M29673_IG21:         ; <--------------------- IntrinsicsDifference ----+  |
        mov      eax, esi                                                       |
        not      eax                                                            |
        tzcnt    eax, eax                                                       |
        sar      eax, 1                                                         |
        movsxd   rax, eax                                                       |
        add      r11, rax                                                       |
        jmp      G_M29673_IG13                 ; OffsetDifference ------------->+
                            
    
; Total bytes of code 355, prolog size 4, PerfScore 222.98, for method Program:SequenceCompareTo(byref,int,byref,int):int
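
A portable C sketch of the control flow above, to make the `tzcnt`/`sar` arithmetic in IG12 and IG21 easy to check by hand. It is not the PR's C# code and assumes a little-endian target and GCC/Clang (`__builtin_ctz`, `__builtin_ctzll`). The Vector256 tier is omitted, and the Vector128 tier is emulated by `equal_byte_mask8`, a hypothetical helper that builds the same byte mask `vpcmpeqw` + `vpmovmskb` would produce.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: scalar stand-in for vpcmpeqw + vpmovmskb on 8 UTF-16
   chars (one 128-bit vector). Each equal char sets its two byte bits. */
static uint32_t equal_byte_mask8(const uint16_t *a, const uint16_t *b)
{
    uint32_t mask = 0;
    for (int j = 0; j < 8; j++)
        if (a[j] == b[j])
            mask |= 3u << (2 * j);
    return mask;
}

/* Ordinal compare of two UTF-16 sequences: returns <0, 0 or >0. */
static int sequence_compare_to(const uint16_t *first, int first_len,
                               const uint16_t *second, int second_len)
{
    /* IG02: same reference, so only the lengths can differ. */
    if (first == second)
        return first_len - second_len;

    long min_len = first_len < second_len ? first_len : second_len;
    long i = 0;

    if (min_len >= 8) {
        /* IG18-IG20: whole 8-char blocks, then one block ending exactly at
           min_len (it may overlap chars that were already compared). */
        long last = min_len - 8;
        for (;;) {
            uint32_t mask = equal_byte_mask8(first + i, second + i);
            if (mask != 0xFFFFu) {
                /* IG21: not + tzcnt finds the first unequal byte; >> 1 gives the char. */
                i += __builtin_ctz(~mask) >> 1;
                return (int)first[i] - (int)second[i];
            }
            if (i == last)
                return first_len - second_len;
            i = (i + 8 < last) ? i + 8 : last;
        }
    }

    /* IG06: 4 chars at a time as one 64-bit XOR. */
    while (i + 4 <= min_len) {
        uint64_t a, b;
        memcpy(&a, first + i, sizeof a);
        memcpy(&b, second + i, sizeof b);
        uint64_t diff = a ^ b;
        if (diff != 0) {
            /* IG12: lowest set bit / 16 = first differing char (little-endian). */
            i += __builtin_ctzll(diff) >> 4;
            return (int)first[i] - (int)second[i];
        }
        i += 4;
    }

    /* IG07: one 32-bit step, taken only if both chars are equal. */
    if (i + 2 <= min_len) {
        uint32_t a, b;
        memcpy(&a, first + i, sizeof a);
        memcpy(&b, second + i, sizeof b);
        if (a == b)
            i += 2;
    }

    /* IG09: remaining chars one at a time. */
    for (; i < min_len; i++) {
        int result = (int)first[i] - (int)second[i];
        if (result != 0)
            return result;
    }

    /* IG10: the common prefix is equal, so the shorter sequence sorts first. */
    return first_len - second_len;
}
```

The overlapping final block (IG17/IG20 in the listing) avoids a scalar tail loop when the length is not a multiple of the vector width, at the cost of re-comparing a few chars that are already known to be equal.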

@benaadams force-pushed the Use-CompareOrdinalHelper-for-SpanHelpers.SequenceCompareTo- branch from 385d550 to 9431416 January 26, 2020 07:31
@benaadams (Member Author) commented Jan 27, 2020

By string length (1-64), difference in the last position (lower is better):

[chart not preserved]

By position of the difference (0-88) in a long string (lower is better):

[chart not preserved]

By position of the difference (0-1023) in a long string (lower is better):

[chart not preserved]

@benaadams (Member Author)

        Method | Toolchain |  Length |    Mean   |    Error  |         Op/s  | Ratio | Op/s vs base |
---------------|-----------|---------|-----------|-----------|---------------|-------|--------------|
CompareOrdinal |      base |       1 |  3.312 ns | 0.0116 ns | 301,900,985.4 |  1    |      |
CompareOrdinal |      diff |       1 |  3.495 ns | 0.0155 ns | 286,163,741.3 |  1.05 |  95% |
CompareOrdinal |      base |       2 |  4.763 ns | 0.0118 ns | 209,946,368.4 |  1    |      |
CompareOrdinal |      diff |       2 |  6.114 ns | 0.0560 ns | 163,555,899.2 |  1.28 |  78% |
CompareOrdinal |      base |       3 |  6.009 ns | 0.0139 ns | 166,404,826.9 |  1    |      | 
CompareOrdinal |      diff |       3 |  5.277 ns | 0.0188 ns | 189,499,820.2 |  0.88 | 114% |
CompareOrdinal |      base |       4 |  6.251 ns | 0.0125 ns | 159,966,995.5 |  1    |      |
CompareOrdinal |      diff |       4 |  5.027 ns | 0.0231 ns | 198,923,661.0 |  0.8  | 125% |
CompareOrdinal |      base |       5 |  6.255 ns | 0.0142 ns | 159,872,536.1 |  1    |      |
CompareOrdinal |      diff |       5 |  5.402 ns | 0.0172 ns | 185,125,753.1 |  0.86 | 116% |
CompareOrdinal |      base |       6 |  6.772 ns | 0.0132 ns | 147,664,221.0 |  1    |      |
CompareOrdinal |      diff |       6 |  6.142 ns | 0.1198 ns | 162,807,596.6 |  0.9  | 111% |
CompareOrdinal |      base |       7 |  7.166 ns | 0.0251 ns | 139,554,658.6 |  1    |      |
CompareOrdinal |      diff |       7 |  5.304 ns | 0.0277 ns | 188,526,101.1 |  0.74 | 135% |
CompareOrdinal |      base |       8 |  7.524 ns | 0.0223 ns | 132,912,885.0 |  1    |      |
CompareOrdinal |      diff |       8 |  6.214 ns | 0.0167 ns | 160,939,642.2 |  0.83 | 120% |
CompareOrdinal |      base |       9 |  7.408 ns | 0.0635 ns | 134,990,065.6 |  1    |      |
CompareOrdinal |      diff |       9 |  6.437 ns | 0.0060 ns | 155,355,707.9 |  0.87 | 115% |
CompareOrdinal |      base |      10 |  7.657 ns | 0.0389 ns | 130,606,710.9 |  1    |      |
CompareOrdinal |      diff |      10 |  6.435 ns | 0.0092 ns | 155,409,282.3 |  0.84 | 119% |
CompareOrdinal |      base |      11 |  8.040 ns | 0.0725 ns | 124,376,453.7 |  1    |      |
CompareOrdinal |      diff |      11 |  6.436 ns | 0.0105 ns | 155,378,710.7 |  0.8  | 125% |
CompareOrdinal |      base |      12 |  8.037 ns | 0.0348 ns | 124,422,050.9 |  1    |      |
CompareOrdinal |      diff |      12 |  6.439 ns | 0.0162 ns | 155,304,474.2 |  0.8  | 125% |
CompareOrdinal |      base |      13 |  8.456 ns | 0.0243 ns | 118,266,085.0 |  1    |      |
CompareOrdinal |      diff |      13 |  6.443 ns | 0.0157 ns | 155,212,797.2 |  0.76 | 132% |
CompareOrdinal |      base |      14 |  6.282 ns | 0.0187 ns | 159,176,779.5 |  1    |      |
CompareOrdinal |      diff |      14 |  6.437 ns | 0.0149 ns | 155,362,814.2 |  1.02 |  98% |
CompareOrdinal |      base |      15 |  6.268 ns | 0.0358 ns | 159,540,160.0 |  1    |      |
CompareOrdinal |      diff |      15 |  6.434 ns | 0.0138 ns | 155,425,241.0 |  1.03 |  97% |
CompareOrdinal |      base |      16 |  5.502 ns | 0.0780 ns | 181,759,720.4 |  1    |      |
CompareOrdinal |      diff |      16 |  6.249 ns | 0.0231 ns | 160,017,474.2 |  1.14 |  88% |
CompareOrdinal |      base |      17 |  6.615 ns | 0.0190 ns | 151,168,076.2 |  1    |      |
CompareOrdinal |      diff |      17 |  6.246 ns | 0.0127 ns | 160,093,226.6 |  0.94 | 106% |
CompareOrdinal |      base |      18 |  7.262 ns | 0.0194 ns | 137,710,614.1 |  1    |      |
CompareOrdinal |      diff |      18 |  6.253 ns | 0.0233 ns | 159,927,722.3 |  0.86 | 116% |
CompareOrdinal |      base |      19 |  7.395 ns | 0.0198 ns | 135,220,294.5 |  1    |      |
CompareOrdinal |      diff |      19 |  6.246 ns | 0.0120 ns | 160,101,708.5 |  0.84 | 119% |
CompareOrdinal |      base |      20 |  7.644 ns | 0.0213 ns | 130,827,428.6 |  1    |      |
CompareOrdinal |      diff |      20 |  6.244 ns | 0.0119 ns | 160,157,822.4 |  0.82 | 122% |
CompareOrdinal |      base |      21 |  7.711 ns | 0.0248 ns | 129,678,130.3 |  1    |      |
CompareOrdinal |      diff |      21 |  6.254 ns | 0.0280 ns | 159,904,313.4 |  0.81 | 123% |
CompareOrdinal |      base |      22 |  8.209 ns | 0.0229 ns | 121,815,140.6 |  1    |      |
CompareOrdinal |      diff |      22 |  6.253 ns | 0.0208 ns | 159,928,685.8 |  0.76 | 132% |
CompareOrdinal |      base |      23 |  8.437 ns | 0.0276 ns | 118,523,585.3 |  1    |      |
CompareOrdinal |      diff |      23 |  6.235 ns | 0.0332 ns | 160,381,594.4 |  0.74 | 135% |
CompareOrdinal |      base |      24 |  8.850 ns | 0.0482 ns | 112,992,528.4 |  1    |      |
CompareOrdinal |      diff |      24 |  6.252 ns | 0.0331 ns | 159,955,247.3 |  0.71 | 141% |
CompareOrdinal |      base |      25 |  8.666 ns | 0.0285 ns | 115,397,471.3 |  1    |      |
CompareOrdinal |      diff |      25 |  6.248 ns | 0.0276 ns | 160,044,673.9 |  0.72 | 139% |
CompareOrdinal |      base |      26 |  6.468 ns | 0.0328 ns | 154,606,877.3 |  1    |      |
CompareOrdinal |      diff |      26 |  6.317 ns | 0.0225 ns | 158,306,290.3 |  0.98 | 102% |
CompareOrdinal |      base |      27 |  7.804 ns | 0.0459 ns | 128,139,353.8 |  1    |      |
CompareOrdinal |      diff |      27 |  6.216 ns | 0.0250 ns | 160,865,918.0 |  0.8  | 125% |
CompareOrdinal |      base |      28 |  8.024 ns | 0.0549 ns | 124,628,196.5 |  1    |      |
CompareOrdinal |      diff |      28 |  6.217 ns | 0.0149 ns | 160,861,035.8 |  0.77 | 130% |
CompareOrdinal |      base |      29 |  7.678 ns | 0.0213 ns | 130,238,615.3 |  1    |      |
CompareOrdinal |      diff |      29 |  6.235 ns | 0.0351 ns | 160,384,748.7 |  0.81 | 123% |
CompareOrdinal |      base |      30 |  8.092 ns | 0.0348 ns | 123,578,752.7 |  1    |      |
CompareOrdinal |      diff |      30 |  6.250 ns | 0.0306 ns | 159,999,814.9 |  0.77 | 130% |
CompareOrdinal |      base |      31 |  8.227 ns | 0.0454 ns | 121,556,695.3 |  1    |      |
CompareOrdinal |      diff |      31 |  6.238 ns | 0.0491 ns | 160,304,019.0 |  0.76 | 132% |
CompareOrdinal |      base |      32 |  8.688 ns | 0.0732 ns | 115,096,742.4 |  1    |      |
CompareOrdinal |      diff |      32 |  6.222 ns | 0.0533 ns | 160,713,868.3 |  0.72 | 139% |
CompareOrdinal |      base |      33 |  8.484 ns | 0.0128 ns | 117,870,166.0 |  1    |      |
CompareOrdinal |      diff |      33 |  6.592 ns | 0.0313 ns | 151,693,637.7 |  0.78 | 128% |
CompareOrdinal |      base |      34 |  8.717 ns | 0.0287 ns | 114,722,837.7 |  1    |      |
CompareOrdinal |      diff |      34 |  6.624 ns | 0.0525 ns | 150,972,499.8 |  0.76 | 132% |
CompareOrdinal |      base |      35 |  8.885 ns | 0.0558 ns | 112,555,187.4 |  1    |      |
CompareOrdinal |      diff |      35 |  6.585 ns | 0.0340 ns | 151,859,163.7 |  0.74 | 135% |
CompareOrdinal |      base |      37 |  9.368 ns | 0.0286 ns | 106,743,397.5 |  1    |      |
CompareOrdinal |      diff |      37 |  6.506 ns | 0.0737 ns | 153,699,328.9 |  0.69 | 145% |
CompareOrdinal |      base |      38 |  7.894 ns | 0.0313 ns | 126,674,121.3 |  1    |      |
CompareOrdinal |      diff |      38 |  6.398 ns | 0.0539 ns | 156,310,293.0 |  0.81 | 123% |
CompareOrdinal |      base |      39 |  7.642 ns | 0.0299 ns | 130,863,753.5 |  1    |      |
CompareOrdinal |      diff |      39 |  6.409 ns | 0.0808 ns | 156,036,825.9 |  0.84 | 119% |
CompareOrdinal |      base |      40 |  8.036 ns | 0.0337 ns | 124,437,137.6 |  1    |      |
CompareOrdinal |      diff |      40 |  6.422 ns | 0.0550 ns | 155,716,826.9 |  0.8  | 125% |
CompareOrdinal |      base |      41 |  7.243 ns | 0.0500 ns | 138,065,007.0 |  1    |      |
CompareOrdinal |      diff |      41 |  6.397 ns | 0.0623 ns | 156,331,361.9 |  0.88 | 114% |
CompareOrdinal |      base |      42 |  8.312 ns | 0.0180 ns | 120,308,418.1 |  1    |      |
CompareOrdinal |      diff |      42 |  6.425 ns | 0.0536 ns | 155,641,772.3 |  0.77 | 130% |
CompareOrdinal |      base |      43 |  8.400 ns | 0.0130 ns | 119,040,855.7 |  1    |      |
CompareOrdinal |      diff |      43 |  6.449 ns | 0.0580 ns | 155,071,546.1 |  0.77 | 130% |
CompareOrdinal |      base |      44 |  8.938 ns | 0.0231 ns | 111,877,493.2 |  1    |      |
CompareOrdinal |      diff |      44 |  6.440 ns | 0.0435 ns | 155,284,991.9 |  0.72 | 139% |
CompareOrdinal |      base |      45 |  8.925 ns | 0.0339 ns | 112,040,562.5 |  1    |      |
CompareOrdinal |      diff |      45 |  6.572 ns | 0.0421 ns | 152,153,450.1 |  0.74 | 135% |
CompareOrdinal |      base |      46 |  9.430 ns | 0.0310 ns | 106,040,924.9 |  1    |      |
CompareOrdinal |      diff |      46 |  6.562 ns | 0.0265 ns | 152,383,295.4 |  0.7  | 143% |
CompareOrdinal |      base |      47 |  9.447 ns | 0.0193 ns | 105,848,695.4 |  1    |      |
CompareOrdinal |      diff |      47 |  6.593 ns | 0.0385 ns | 151,666,514.8 |  0.7  | 143% |
CompareOrdinal |      base |      48 |  9.695 ns | 0.0478 ns | 103,145,515.4 |  1    |      |
CompareOrdinal |      diff |      48 |  6.421 ns | 0.0763 ns | 155,747,630.9 |  0.66 | 152% |
CompareOrdinal |      base |      49 |  9.891 ns | 0.0330 ns | 101,103,394.2 |  1    |      |
CompareOrdinal |      diff |      49 |  7.075 ns | 0.0324 ns | 141,341,240.6 |  0.72 | 139% |
CompareOrdinal |      base |      50 |  8.753 ns | 0.0143 ns | 114,247,034.3 |  1    |      |
CompareOrdinal |      diff |      50 |  7.094 ns | 0.0274 ns | 140,956,633.9 |  0.81 | 123% |
CompareOrdinal |      base |      51 |  7.489 ns | 0.0294 ns | 133,530,478.3 |  1    |      |
CompareOrdinal |      diff |      51 |  7.104 ns | 0.0377 ns | 140,769,956.1 |  0.95 | 105% |
CompareOrdinal |      base |      52 |  8.964 ns | 0.0104 ns | 111,558,760.4 |  1    |      |
CompareOrdinal |      diff |      52 |  7.089 ns | 0.0358 ns | 141,058,316.1 |  0.79 | 127% |
CompareOrdinal |      base |      53 |  8.543 ns | 0.0213 ns | 117,060,863.3 |  1    |      |
CompareOrdinal |      diff |      53 |  7.085 ns | 0.0356 ns | 141,141,750.9 |  0.83 | 120% |
CompareOrdinal |      base |      54 |  7.812 ns | 0.0527 ns | 128,007,289.5 |  1    |      |
CompareOrdinal |      diff |      54 |  7.094 ns | 0.0340 ns | 140,961,837.0 |  0.91 | 110% |
CompareOrdinal |      base |      55 |  9.610 ns | 0.0889 ns | 104,058,536.2 |  1    |      |
CompareOrdinal |      diff |      55 |  7.260 ns | 0.0300 ns | 137,732,735.0 |  0.76 | 132% |
CompareOrdinal |      base |      56 |  9.944 ns | 0.0589 ns | 100,566,043.8 |  1    |      |
CompareOrdinal |      diff |      56 |  7.262 ns | 0.0405 ns | 137,700,869.7 |  0.73 | 137% |
CompareOrdinal |      base |      57 |  9.928 ns | 0.0338 ns | 100,720,650.4 |  1    |      |
CompareOrdinal |      diff |      57 |  7.237 ns | 0.0308 ns | 138,188,219.8 |  0.73 | 137% |
CompareOrdinal |      base |      58 | 10.416 ns | 0.0973 ns |  96,001,992.2 |  1    |      |
CompareOrdinal |      diff |      58 |  7.219 ns | 0.0292 ns | 138,522,886.3 |  0.69 | 145% |
CompareOrdinal |      base |      59 | 10.308 ns | 0.0192 ns |  97,012,227.6 |  1    |      |
CompareOrdinal |      diff |      59 |  7.208 ns | 0.0217 ns | 138,726,321.6 |  0.7  | 143% |
CompareOrdinal |      base |      60 | 10.676 ns | 0.0529 ns |  93,665,185.6 |  1    |      |
CompareOrdinal |      diff |      60 |  7.264 ns | 0.0396 ns | 137,666,794.4 |  0.68 | 147% |
CompareOrdinal |      base |      61 | 10.764 ns | 0.0165 ns |  92,898,125.6 |  1    |      |
CompareOrdinal |      diff |      61 |  7.215 ns | 0.0352 ns | 138,594,708.9 |  0.67 | 149% |
CompareOrdinal |      base |      62 |  8.420 ns | 0.0542 ns | 118,770,530.4 |  1    |      |
CompareOrdinal |      diff |      62 |  7.235 ns | 0.0427 ns | 138,224,679.8 |  0.86 | 116% |
CompareOrdinal |      base |      63 |  9.165 ns | 0.0116 ns | 109,116,464.3 |  1    |      |
CompareOrdinal |      diff |      63 |  7.194 ns | 0.0312 ns | 139,007,668.5 |  0.78 | 128% |
CompareOrdinal |      base |      64 |  9.707 ns | 0.0061 ns | 103,018,766.5 |  1    |      |
CompareOrdinal |      diff |      64 |  7.200 ns | 0.0206 ns | 138,883,359.7 |  0.74 | 135% |

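The last column of the table has no header. Assuming that `Ratio` is the diff mean divided by the base mean, and that the unlabeled column is the inverse of that, expressed as a percentage, the length-21 rows work out as follows. The same relationship also matches the other rows checked (for example, length 64 gives 0.74 and 135%).

$$
\text{Ratio} = \frac{\bar t_{\text{diff}}}{\bar t_{\text{base}}} = \frac{6.254\ \text{ns}}{7.711\ \text{ns}} \approx 0.81,
\qquad
\frac{\bar t_{\text{base}}}{\bar t_{\text{diff}}} = \frac{\text{Op/s}_{\text{diff}}}{\text{Op/s}_{\text{base}}} = \frac{159{,}904{,}313.4}{129{,}678{,}130.3} \approx 1.23 \;(123\%)
$$
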
@benaadams force-pushed the Use-CompareOrdinalHelper-for-SpanHelpers.SequenceCompareTo- branch from 9431416 to e278276 on January 27, 2020 18:59
@benaadams
Member Author

@stephentoub ready to go

@danmoseley
Member

Rerunning failed jobs so hopefully we can merge this.

@danmoseley
Member

@jkotas you signed off on this a while back. It’s green now. Do you believe this needs further review?

@jkotas
Member

@jkotas commented Aug 12, 2020

I signed off on a much simpler version of this change. This should be reviewed by somebody with a PhD in hardware intrinsics.

@danmoseley
Member

@tannergooding knows such a person. He is a contended resource at the moment, though.

@adamsitnik adamsitnik added the tenet-performance Performance related issue label Aug 14, 2020
@adamsitnik adamsitnik added this to the 5.0.0 milestone Aug 14, 2020
@GrabYourPitchforks GrabYourPitchforks added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Aug 14, 2020
@GrabYourPitchforks
Member

I've marked this NO-MERGE for now since the latest iteration hasn't gone through review. Once there's a review on the latest iteration, feel free to remove this label.

@tannergooding
Member

@benaadams, if you could resolve the merge conflicts then I can give this a review 😄

@benaadams
Member Author

@tannergooding #41097 is good to go, while I resolve these conflicts 😉

```csharp
{
if (Vector.IsHardwareAccelerated && minLength >= (nuint)Vector<ushort>.Count)
// Calucate lengthToExamine here for test, rather than just testing as it used later, rather than doing it twice.
```
Contributor

s/Calucate/Calculate

```csharp
}
else if (Vector.IsHardwareAccelerated)
{
// Calucate lengthToExamine here for test, rather than just testing as it used later, rather than doing it twice.
```
Contributor

s/Calucate/Calculate
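
A minimal sketch of the pattern these two comments refer to, assuming .NET Core 3.0 or later (for the `Vector<T>(ReadOnlySpan<T>)` constructor). It computes `lengthToExamine` once and uses it as the loop bound, compares one `Vector<ushort>` block at a time, and then falls back to a scalar loop to find the first differing `char`. The class `OrdinalCompareSketch` and its `Compare` method are hypothetical. This is not the PR's actual `SpanHelpers.SequenceCompareTo` code. It only mirrors the `Vector.IsHardwareAccelerated` / `minLength` check quoted above and returns a value with the same sign as `string.CompareOrdinal`.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

// Hypothetical illustration only; not the implementation from this PR.
public static class OrdinalCompareSketch
{
    // Returns <0, 0 or >0, matching the sign convention of string.CompareOrdinal.
    public static int Compare(ReadOnlySpan<char> first, ReadOnlySpan<char> second)
    {
        int minLength = Math.Min(first.Length, second.Length);
        int i = 0;

        if (Vector.IsHardwareAccelerated && minLength >= Vector<ushort>.Count)
        {
            // Reinterpret the UTF-16 code units as ushort so they fit Vector<ushort>.
            ReadOnlySpan<ushort> a = MemoryMarshal.Cast<char, ushort>(first);
            ReadOnlySpan<ushort> b = MemoryMarshal.Cast<char, ushort>(second);

            // Calculate lengthToExamine once: the last start index at which a
            // full vector still fits inside minLength.
            int lengthToExamine = minLength - Vector<ushort>.Count;
            while (i <= lengthToExamine)
            {
                if (!Vector.EqualsAll(new Vector<ushort>(a.Slice(i)), new Vector<ushort>(b.Slice(i))))
                {
                    break; // a mismatch lies somewhere in this block
                }
                i += Vector<ushort>.Count;
            }
        }

        // Scalar loop: locates the exact mismatch, or finishes the remaining tail.
        for (; i < minLength; i++)
        {
            if (first[i] != second[i])
            {
                return first[i] - second[i];
            }
        }

        // Common prefix is equal, so the shorter sequence sorts first.
        return first.Length - second.Length;
    }
}
```

Because the vector loop breaks out on the first unequal block, the scalar loop rescans at most `Vector<ushort>.Count` chars to find the differing position.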

@danmoseley
Member

@benaadams, do you think you'll be able to resolve those conflicts? I'm keeping an eye on this because it's one of our oldest PRs 🙂

Base automatically changed from master to main March 1, 2021 09:06
@carlossanlop
Member

Ping @benaadams: can you please address the latest comments?

@danmoseley
Member

Thanks for the PR, @benaadams. I'm going to close this; feel free to reopen if you plan to pick it up again.

@danmoseley danmoseley closed this Mar 19, 2021
MichalStrehovsky pushed a commit to MichalStrehovsky/runtime that referenced this pull request Mar 25, 2021
@ghost locked as resolved and limited conversation to collaborators Apr 18, 2021
radical pushed a commit to radical/runtime that referenced this pull request Jul 7, 2022
- `UninstallApp()` wasn't triggering for devices
- mlaunch failures when running app didn't get detected

Resolves dotnet#402
Labels
area-System.Memory NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) tenet-performance Performance related issue