While implementing `cuda::std::is_partitioned`, I found `thrust::is_partitioned` to be surprisingly slow compared to the implementation the PSTL uses.
We should change `thrust::is_partitioned` to use the same implementation as the PSTL.
Benchmark comparison (Ref: `thrust_is_partitioned.json`, Cmp: `pstl_is_partitioned.json`):
# base
## [1] NVIDIA RTX A6000
| T | Elements | MismatchAt | Ref Time | Ref Noise | Cmp Time | Cmp Noise | Diff | %Diff | Status |
|------|------------|--------------|------------|-------------|------------|-------------|------------|---------|----------|
| I8 | 2^16 | 1 | 27.546 us | 17.21% | 28.499 us | 3.26% | 0.953 us | 3.46% | SLOW |
| I8 | 2^20 | 1 | 40.005 us | 4.63% | 28.600 us | 5.05% | -11.405 us | -28.51% | FAST |
| I8 | 2^24 | 1 | 44.150 us | 3.21% | 29.748 us | 2.22% | -14.402 us | -32.62% | FAST |
| I8 | 2^28 | 1 | 44.043 us | 2.97% | 29.753 us | 3.56% | -14.290 us | -32.44% | FAST |
| I8 | 2^16 | 0.5 | 27.401 us | 16.20% | 28.531 us | 2.44% | 1.130 us | 4.12% | SLOW |
| I8 | 2^20 | 0.5 | 39.896 us | 7.46% | 28.999 us | 2.56% | -10.897 us | -27.31% | FAST |
| I8 | 2^24 | 0.5 | 44.811 us | 5.06% | 29.756 us | 5.36% | -15.055 us | -33.60% | FAST |
| I8 | 2^28 | 0.5 | 44.868 us | 3.90% | 29.911 us | 5.08% | -14.957 us | -33.34% | FAST |
| I8 | 2^16 | 0.01 | 23.143 us | 7.23% | 28.647 us | 4.68% | 5.505 us | 23.79% | SLOW |
| I8 | 2^20 | 0.01 | 36.946 us | 11.22% | 28.920 us | 5.03% | -8.026 us | -21.72% | FAST |
| I8 | 2^24 | 0.01 | 44.058 us | 2.89% | 29.548 us | 2.72% | -14.510 us | -32.93% | FAST |
| I8 | 2^28 | 0.01 | 44.072 us | 3.32% | 29.509 us | 3.36% | -14.563 us | -33.04% | FAST |
| I16 | 2^16 | 1 | 22.837 us | 9.70% | 20.372 us | 7.05% | -2.465 us | -10.79% | FAST |
| I16 | 2^20 | 1 | 40.235 us | 8.01% | 21.322 us | 3.39% | -18.912 us | -47.00% | FAST |
| I16 | 2^24 | 1 | 57.594 us | 5.41% | 29.454 us | 4.29% | -28.140 us | -48.86% | FAST |
| I16 | 2^28 | 1 | 57.373 us | 5.21% | 29.337 us | 4.99% | -28.036 us | -48.87% | FAST |
| I16 | 2^16 | 0.5 | 23.358 us | 14.01% | 20.446 us | 6.87% | -2.912 us | -12.47% | FAST |
| I16 | 2^20 | 0.5 | 41.257 us | 6.04% | 21.547 us | 4.95% | -19.709 us | -47.77% | FAST |
| I16 | 2^24 | 0.5 | 56.872 us | 5.06% | 29.374 us | 5.70% | -27.498 us | -48.35% | FAST |
| I16 | 2^28 | 0.5 | 56.533 us | 5.41% | 29.575 us | 5.09% | -26.958 us | -47.69% | FAST |
| I16 | 2^16 | 0.01 | 31.402 us | 8.20% | 20.423 us | 6.35% | -10.979 us | -34.96% | FAST |
| I16 | 2^20 | 0.01 | 41.088 us | 5.79% | 21.274 us | 4.43% | -19.814 us | -48.22% | FAST |
| I16 | 2^24 | 0.01 | 57.190 us | 5.28% | 29.204 us | 3.89% | -27.986 us | -48.94% | FAST |
| I16 | 2^28 | 0.01 | 57.313 us | 4.85% | 29.569 us | 3.47% | -27.743 us | -48.41% | FAST |
| I32 | 2^16 | 1 | 26.414 us | 15.98% | 20.829 us | 6.10% | -5.585 us | -21.14% | FAST |
| I32 | 2^20 | 1 | 37.743 us | 10.71% | 27.640 us | 11.59% | -10.103 us | -26.77% | FAST |
| I32 | 2^24 | 1 | 127.694 us | 3.02% | 120.244 us | 2.42% | -7.450 us | -5.83% | FAST |
| I32 | 2^28 | 1 | 1.542 ms | 0.26% | 1.558 ms | 0.21% | 16.491 us | 1.07% | SLOW |
| I32 | 2^16 | 0.5 | 27.423 us | 17.77% | 20.860 us | 7.99% | -6.563 us | -23.93% | FAST |
| I32 | 2^20 | 0.5 | 42.098 us | 8.72% | 27.065 us | 12.18% | -15.033 us | -35.71% | FAST |
| I32 | 2^24 | 0.5 | 132.868 us | 1.94% | 120.489 us | 2.26% | -12.379 us | -9.32% | FAST |
| I32 | 2^28 | 0.5 | 1.544 ms | 0.20% | 1.558 ms | 0.21% | 14.199 us | 0.92% | SLOW |
| I32 | 2^16 | 0.01 | 33.136 us | 10.97% | 20.949 us | 3.64% | -12.188 us | -36.78% | FAST |
| I32 | 2^20 | 0.01 | 45.060 us | 3.52% | 27.374 us | 11.52% | -17.686 us | -39.25% | FAST |
| I32 | 2^24 | 0.01 | 140.128 us | 1.73% | 120.500 us | 2.23% | -19.628 us | -14.01% | FAST |
| I32 | 2^28 | 0.01 | 1.575 ms | 0.23% | 1.558 ms | 0.20% | -17.192 us | -1.09% | FAST |
| I64 | 2^16 | 1 | 28.096 us | 15.99% | 20.761 us | 10.21% | -7.335 us | -26.11% | FAST |
| I64 | 2^20 | 1 | 43.207 us | 8.04% | 32.163 us | 4.09% | -11.044 us | -25.56% | FAST |
| I64 | 2^24 | 1 | 221.420 us | 1.57% | 213.216 us | 0.93% | -8.204 us | -3.71% | FAST |
| I64 | 2^28 | 1 | 3.045 ms | 0.13% | 3.053 ms | 0.10% | 8.694 us | 0.29% | SLOW |
| I64 | 2^16 | 0.5 | 29.339 us | 17.37% | 20.227 us | 13.33% | -9.112 us | -31.06% | FAST |
| I64 | 2^20 | 0.5 | 49.119 us | 5.23% | 31.783 us | 3.12% | -17.336 us | -35.29% | FAST |
| I64 | 2^24 | 0.5 | 225.832 us | 1.30% | 212.691 us | 1.01% | -13.141 us | -5.82% | FAST |
| I64 | 2^28 | 0.5 | 3.051 ms | 0.12% | 3.054 ms | 0.10% | 2.910 us | 0.10% | SAME |
| I64 | 2^16 | 0.01 | 33.130 us | 15.74% | 20.375 us | 12.36% | -12.755 us | -38.50% | FAST |
| I64 | 2^20 | 0.01 | 59.184 us | 3.94% | 31.743 us | 3.41% | -27.442 us | -46.37% | FAST |
| I64 | 2^24 | 0.01 | 232.287 us | 1.17% | 212.891 us | 0.99% | -19.396 us | -8.35% | FAST |
| I64 | 2^28 | 0.01 | 3.067 ms | 0.17% | 3.054 ms | 0.10% | -13.476 us | -0.44% | FAST |
| I128 | 2^16 | 1 | 31.740 us | 14.75% | 19.700 us | 13.59% | -12.041 us | -37.93% | FAST |
| I128 | 2^20 | 1 | 57.898 us | 5.94% | 45.876 us | 2.75% | -12.022 us | -20.76% | FAST |
| I128 | 2^24 | 1 | 410.712 us | 0.84% | 399.823 us | 0.50% | -10.889 us | -2.65% | FAST |
| I128 | 2^28 | 1 | 6.063 ms | 0.07% | 6.068 ms | 0.06% | 5.202 us | 0.09% | SLOW |
| I128 | 2^16 | 0.5 | 32.399 us | 15.57% | 19.633 us | 13.80% | -12.765 us | -39.40% | FAST |
| I128 | 2^20 | 0.5 | 63.398 us | 4.80% | 45.724 us | 3.30% | -17.673 us | -27.88% | FAST |
| I128 | 2^24 | 0.5 | 415.465 us | 0.64% | 399.304 us | 0.56% | -16.160 us | -3.89% | FAST |
| I128 | 2^28 | 0.5 | 6.069 ms | 0.07% | 6.069 ms | 0.08% | 0.645 us | 0.01% | SAME |
| I128 | 2^16 | 0.01 | 35.523 us | 11.93% | 20.166 us | 10.62% | -15.357 us | -43.23% | FAST |
| I128 | 2^20 | 0.01 | 70.409 us | 4.16% | 46.151 us | 4.18% | -24.258 us | -34.45% | FAST |
| I128 | 2^24 | 0.01 | 419.973 us | 0.70% | 399.424 us | 0.54% | -20.549 us | -4.89% | FAST |
| I128 | 2^28 | 0.01 | 6.064 ms | 0.06% | 6.069 ms | 0.07% | 5.563 us | 0.09% | SLOW |
| F32 | 2^16 | 1 | 28.074 us | 17.49% | 20.866 us | 3.64% | -7.208 us | -25.68% | FAST |
| F32 | 2^20 | 1 | 38.896 us | 10.61% | 27.617 us | 11.05% | -11.279 us | -29.00% | FAST |
| F32 | 2^24 | 1 | 128.001 us | 3.22% | 121.680 us | 3.21% | -6.321 us | -4.94% | FAST |
| F32 | 2^28 | 1 | 1.541 ms | 0.27% | 1.559 ms | 0.21% | 18.025 us | 1.17% | SLOW |
| F32 | 2^16 | 0.5 | 27.228 us | 17.89% | 21.872 us | 7.13% | -5.356 us | -19.67% | FAST |
| F32 | 2^20 | 0.5 | 42.419 us | 7.58% | 27.866 us | 11.23% | -14.553 us | -34.31% | FAST |
| F32 | 2^24 | 0.5 | 133.476 us | 1.56% | 120.766 us | 2.28% | -12.711 us | -9.52% | FAST |
| F32 | 2^28 | 0.5 | 1.547 ms | 0.23% | 1.560 ms | 0.22% | 12.914 us | 0.83% | SLOW |
| F32 | 2^16 | 0.01 | 33.216 us | 11.36% | 21.204 us | 3.59% | -12.012 us | -36.16% | FAST |
| F32 | 2^20 | 0.01 | 44.582 us | 3.13% | 28.483 us | 9.93% | -16.099 us | -36.11% | FAST |
| F32 | 2^24 | 0.01 | 140.023 us | 2.55% | 120.785 us | 2.33% | -19.237 us | -13.74% | FAST |
| F32 | 2^28 | 0.01 | 1.576 ms | 0.23% | 1.560 ms | 0.22% | -16.229 us | -1.03% | FAST |
| F64 | 2^16 | 1 | 29.611 us | 17.77% | 21.021 us | 5.69% | -8.590 us | -29.01% | FAST |
| F64 | 2^20 | 1 | 43.189 us | 8.08% | 32.027 us | 4.00% | -11.162 us | -25.84% | FAST |
| F64 | 2^24 | 1 | 222.566 us | 1.68% | 212.886 us | 1.09% | -9.680 us | -4.35% | FAST |
| F64 | 2^28 | 1 | 3.049 ms | 0.13% | 3.064 ms | 0.13% | 15.418 us | 0.51% | SLOW |
| F64 | 2^16 | 0.5 | 29.851 us | 17.14% | 20.992 us | 8.56% | -8.858 us | -29.68% | FAST |
| F64 | 2^20 | 0.5 | 49.574 us | 3.53% | 32.210 us | 4.72% | -17.363 us | -35.03% | FAST |
| F64 | 2^24 | 0.5 | 227.709 us | 1.17% | 212.638 us | 0.93% | -15.071 us | -6.62% | FAST |
| F64 | 2^28 | 0.5 | 3.056 ms | 0.11% | 3.064 ms | 0.14% | 8.216 us | 0.27% | SLOW |
| F64 | 2^16 | 0.01 | 34.506 us | 11.54% | 20.874 us | 6.80% | -13.632 us | -39.51% | FAST |
| F64 | 2^20 | 0.01 | 59.260 us | 3.68% | 31.868 us | 4.18% | -27.392 us | -46.22% | FAST |
| F64 | 2^24 | 0.01 | 232.807 us | 1.38% | 212.515 us | 0.91% | -20.291 us | -8.72% | FAST |
| F64 | 2^28 | 0.01 | 3.070 ms | 0.13% | 3.064 ms | 0.13% | -5.700 us | -0.19% | FAST |