Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
We compare the new x86-64-v2 kernel with the generic C kernel (pre), the new scalar kernel, and a variant x86-64-v2 kernel based on the Muła/Langdale approach that performs a lot worse (likely due to higher setup cost). glibc is competitive, but ultimately slower except for the 0/1 character cases, where glibc likely has an AVX or AVX-512 implementation. os: FreeBSD arch: amd64 cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz │ strcspn.x86-64-v2.out │ strcspn.pre.out │ strcspn.scalar.out │ strcspn.mula.out │ │ sec/op │ sec/op vs base │ sec/op vs base │ sec/op vs base │ Short0 51.98µ ± 1% 141.85µ ± 0% +172.89% (p=0.000 n=20) 75.92µ ± 0% +46.05% (p=0.000 n=20) 52.12µ ± 1% ~ (p=0.925 n=20) Mid0 11.10µ ± 0% 78.30µ ± 0% +605.49% (p=0.000 n=20) 20.81µ ± 1% +87.50% (p=0.000 n=20) 11.11µ ± 0% ~ (p=0.718 n=20) Long0 2.227µ ± 0% 48.457µ ± 0% +2075.56% (p=0.000 n=20) 7.944µ ± 0% +256.66% (p=0.000 n=20) 2.228µ ± 0% ~ (p=0.355 n=20) Short1 63.47µ ± 0% 143.43µ ± 0% +125.97% (p=0.000 n=20) 73.99µ ± 0% +16.57% (p=0.000 n=20) 62.53µ ± 0% -1.49% (p=0.000 n=20) Mid1 13.17µ ± 0% 80.17µ ± 0% +508.70% (p=0.000 n=20) 20.63µ ± 0% +56.63% (p=0.000 n=20) 13.21µ ± 0% +0.32% (p=0.000 n=20) Long1 3.295µ ± 0% 48.440µ ± 0% +1369.95% (p=0.000 n=20) 11.034µ ± 0% +234.84% (p=0.000 n=20) 3.293µ ± 0% ~ (p=0.231 n=20) Short5 65.70µ ± 0% 165.04µ ± 0% +151.20% (p=0.000 n=20) 121.95µ ± 0% +85.61% (p=0.000 n=20) 139.56µ ± 1% +112.42% (p=0.000 n=20) Mid5 14.52µ ± 0% 85.18µ ± 0% +486.50% (p=0.000 n=20) 54.45µ ± 0% +274.95% (p=0.000 n=20) 29.62µ ± 1% +103.98% (p=0.000 n=20) Long5 6.029µ ± 0% 50.073µ ± 2% +730.53% (p=0.000 n=20) 35.285µ ± 0% +485.24% (p=0.000 n=20) 7.032µ ± 0% +16.64% (p=0.000 n=20) Short20 66.58µ ± 0% 319.29µ ± 0% +379.52% (p=0.000 n=20) 155.40µ ± 0% +133.39% (p=0.000 n=20) 240.59µ ± 0% +261.33% (p=0.000 n=20) Mid20 19.94µ ± 0% 131.60µ ± 0% +559.84% (p=0.000 n=20) 65.12µ ± 0% +226.51% (p=0.000 n=20) 67.42µ ± 0% +238.06% (p=0.000 n=20) Long20 12.039µ ± 0% 63.676µ ± 0% +428.91% (p=0.000 n=20) 35.195µ ± 0% +192.34% (p=0.000 n=20) 6.801µ ± 0% -43.51% (p=0.000 n=20) Short40 178.3µ ± 2% 565.0µ ± 0% +216.80% (p=0.000 n=20) 196.2µ ± 1% +10.02% (p=0.000 n=20) 409.9µ ± 0% +129.83% (p=0.000 n=20) Mid40 72.20µ ± 1% 198.46µ ± 0% +174.87% (p=0.000 n=20) 74.61µ ± 0% +3.33% (p=0.000 n=20) 115.74µ ± 0% +60.30% (p=0.000 n=20) Long40 35.186µ ± 0% 53.497µ ± 0% +52.04% (p=0.000 n=20) 35.302µ ± 0% +0.33% (p=0.000 n=20) 7.066µ ± 0% -79.92% (p=0.000 n=20) geomean 22.11µ 108.5µ +390.60% 46.13µ +108.64% 27.44µ +24.12% │ strcspn.x86-64-v2.out │ strcspn.pre.out │ strcspn.scalar.out │ strcspn.mula.out │ │ B/s │ B/s vs base │ B/s vs base │ B/s vs base │ Short0 2293.3Mi ± 1% 840.4Mi ± 0% -63.36% (p=0.000 n=20) 1570.2Mi ± 0% -31.53% (p=0.000 n=20) 2287.2Mi ± 1% ~ (p=0.925 n=20) Mid0 10.489Gi ± 0% 1.487Gi ± 0% -85.83% (p=0.000 n=20) 5.594Gi ± 1% -46.67% (p=0.000 n=20) 10.481Gi ± 0% ~ (p=0.718 n=20) Long0 52.266Gi ± 0% 2.402Gi ± 0% -95.40% (p=0.000 n=20) 14.654Gi ± 0% -71.96% (p=0.000 n=20) 52.260Gi ± 0% ~ (p=0.355 n=20) Short1 1878.1Mi ± 0% 831.1Mi ± 0% -55.75% (p=0.000 n=20) 1611.2Mi ± 0% -14.21% (p=0.000 n=20) 1906.5Mi ± 0% +1.51% (p=0.000 n=20) Mid1 8.839Gi ± 0% 1.452Gi ± 0% -83.57% (p=0.000 n=20) 5.643Gi ± 0% -36.16% (p=0.000 n=20) 8.811Gi ± 0% -0.31% (p=0.000 n=20) Long1 35.327Gi ± 0% 2.403Gi ± 0% -93.20% (p=0.000 n=20) 10.551Gi ± 0% -70.13% (p=0.000 n=20) 35.351Gi ± 0% ~ (p=0.231 n=20) Short5 1814.4Mi ± 0% 722.3Mi ± 0% -60.19% (p=0.000 n=20) 977.5Mi ± 0% -46.12% (p=0.000 n=20) 854.2Mi ± 1% -52.92% (p=0.000 n=20) Mid5 8.016Gi ± 0% 1.367Gi ± 0% -82.95% (p=0.000 n=20) 2.138Gi ± 0% -73.33% (p=0.000 n=20) 3.930Gi ± 1% -50.98% (p=0.000 n=20) Long5 19.309Gi ± 0% 2.325Gi ± 2% -87.96% (p=0.000 n=20) 3.299Gi ± 0% -82.91% (p=0.000 n=20) 16.555Gi ± 0% -14.26% (p=0.000 n=20) Short20 1790.3Mi ± 0% 373.4Mi ± 0% -79.15% (p=0.000 n=20) 767.1Mi ± 0% -57.15% (p=0.000 n=20) 495.5Mi ± 0% -72.32% (p=0.000 n=20) Mid20 5977.2Mi ± 0% 905.9Mi ± 0% -84.84% (p=0.000 n=20) 1830.6Mi ± 0% -69.37% (p=0.000 n=20) 1768.1Mi ± 0% -70.42% (p=0.000 n=20) Long20 9.670Gi ± 0% 1.828Gi ± 0% -81.09% (p=0.000 n=20) 3.308Gi ± 0% -65.79% (p=0.000 n=20) 17.117Gi ± 0% +77.01% (p=0.000 n=20) Short40 668.5Mi ± 2% 211.0Mi ± 0% -68.43% (p=0.000 n=20) 607.6Mi ± 1% -9.11% (p=0.000 n=20) 290.9Mi ± 0% -56.49% (p=0.000 n=20) Mid40 1651.1Mi ± 1% 600.7Mi ± 0% -63.62% (p=0.000 n=20) 1597.8Mi ± 0% -3.23% (p=0.000 n=20) 1030.0Mi ± 0% -37.62% (p=0.000 n=20) Long40 3.309Gi ± 0% 2.176Gi ± 0% -34.23% (p=0.000 n=20) 3.298Gi ± 0% -0.33% (p=0.000 n=20) 16.474Gi ± 0% +397.92% (p=0.000 n=20) geomean 5.265Gi 1.073Gi -79.62% 2.524Gi -52.07% 4.242Gi -19.44% os: Linux arch: x86_64 cpu: │ strcspn.glibc.out │ │ sec/op │ Short0 49.97µ ± 0% Mid0 13.48µ ± 0% Long0 1.033µ ± 0% Short1 70.64µ ± 0% Mid1 21.18µ ± 0% Long1 7.920µ ± 0% Short5 70.53µ ± 0% Mid5 21.18µ ± 0% Long5 7.920µ ± 0% Short20 201.2µ ± 0% Mid20 78.00µ ± 0% Long20 35.65µ ± 0% Short40 253.6µ ± 0% Mid40 92.19µ ± 0% Long40 35.68µ ± 0% geomean 32.40µ │ strcspn.glibc.out │ │ B/s │ Short0 2.330Gi ± 0% Mid0 8.639Gi ± 0% Long0 112.7Gi ± 0% Short1 1.648Gi ± 0% Mid1 5.496Gi ± 0% Long1 14.70Gi ± 0% Short5 1.651Gi ± 0% Mid5 5.497Gi ± 0% Long5 14.70Gi ± 0% Short20 592.6Mi ± 0% Mid20 1.493Gi ± 0% Long20 3.266Gi ± 0% Short40 470.0Mi ± 0% Mid40 1.263Gi ± 0% Long40 3.262Gi ± 0% geomean 3.593Gi
- Loading branch information