Adds new virtual shared memory facility to DeviceSelect::UniqueByKey #1197

Merged on Dec 15, 2023 (8 commits)

@elstehle (Collaborator) commented Dec 8, 2023

Description

Closes #159.

This PR adds the new virtual shared memory facility to DeviceSelect::UniqueByKey and adds tests for vsmem and fallback policies.
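
For readers less familiar with the algorithm, here is a minimal sequential sketch of the unique-by-key semantics: for each run of consecutive equal keys, the first key and the value at the same index are kept. It is plain host C++ for illustration only. `unique_by_key_reference` is a hypothetical helper, not part of CUB, and it says nothing about how the device implementation or the vsmem facility works internally.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sequential reference for unique-by-key (hypothetical helper, not CUB code):
// for every run of consecutive equal keys, keep the first key together with
// the value stored at the same index.
template <typename KeyT, typename ValueT>
std::pair<std::vector<KeyT>, std::vector<ValueT>>
unique_by_key_reference(const std::vector<KeyT>& keys, const std::vector<ValueT>& values)
{
  // Keys and values are paired by index, so both inputs must have equal length.
  assert(keys.size() == values.size());

  std::vector<KeyT> out_keys;
  std::vector<ValueT> out_values;
  for (std::size_t i = 0; i < keys.size(); ++i)
  {
    // An item starts a new run if it is the first item or differs from its predecessor.
    if (i == 0 || !(keys[i] == keys[i - 1]))
    {
      out_keys.push_back(keys[i]);
      out_values.push_back(values[i]);
    }
  }
  return {out_keys, out_values};
}
```

A reference like this can serve as the expected result when checking a device run on small inputs.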

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
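
As a rough illustration of what "vsmem and fallback policies" in the description could mean for dispatch, the sketch below picks between three options based on how much temporary storage a thread block needs. Every name here, the 48 KiB budget, and the three-way decision itself are assumptions made for this example; they are not taken from the code in this PR.

```cpp
#include <cassert>
#include <cstddef>

// Assumed shared memory budget per thread block (hypothetical constant).
const std::size_t max_smem_per_block = 48 * 1024;

// Hypothetical outcomes of the storage decision.
enum class temp_storage_choice
{
  default_policy_in_smem,  // the default tuning fits into shared memory
  fallback_policy_in_smem, // only the smaller fallback tuning fits
  fallback_policy_in_vsmem // neither fits: back the storage with global memory
};

// Decides where a block's temporary storage lives, given the storage needed
// by the default tuning and by the fallback tuning (both in bytes).
temp_storage_choice choose_temp_storage(std::size_t default_bytes, std::size_t fallback_bytes)
{
  // Assumption: the fallback tuning never needs more storage than the default.
  assert(fallback_bytes <= default_bytes);

  if (default_bytes <= max_smem_per_block)
  {
    return temp_storage_choice::default_policy_in_smem;
  }
  if (fallback_bytes <= max_smem_per_block)
  {
    return temp_storage_choice::fallback_policy_in_smem;
  }
  return temp_storage_choice::fallback_policy_in_vsmem;
}

// Global memory to reserve for a launch of num_blocks thread blocks.
// Returns zero whenever the storage fits into shared memory.
std::size_t total_vsmem_bytes(std::size_t default_bytes, std::size_t fallback_bytes, std::size_t num_blocks)
{
  const temp_storage_choice choice = choose_temp_storage(default_bytes, fallback_bytes);
  return choice == temp_storage_choice::fallback_policy_in_vsmem ? fallback_bytes * num_blocks : 0;
}
```

Under these assumptions, small key and value types never touch global memory for temporary storage, which would be consistent with a change that leaves the non-vsmem code path untouched.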

@elstehle requested review from a team as code owners on December 8, 2023 17:08
@elstehle force-pushed the enh/unique-by-key-vsmem branch 3 times, most recently from cfc077f to 7e302bc, on December 9, 2023 21:03

@gevtushenko (Collaborator) left a comment

Waiting for benchmarking results before approving this PR. @elstehle are you going to transition the Thrust implementation as part of this PR or in a follow-up PR?

cub/test/test_device_merge_sort.cuh (outdated, resolved)
cub/test/c2h/huge_type.cuh (outdated, resolved)
cub/cub/device/dispatch/dispatch_unique_by_key.cuh (outdated, resolved)

@elstehle (Collaborator, Author) commented

The SASS for SM 70 compares equal on all our unique_by_key benchmarks (non-virtual shared memory), except for the kernel signature, which has an extra NS0_6detail7vsmem_tE parameter, as expected. As a result, the benchmarks also compare equal within the range of noise.
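
To read the mangled fragment quoted above, the following sketch splits an Itanium-ABI nested name into its components. It is a deliberately small parser written for this comment, not a general demangler: `split_nested_name` is a hypothetical helper that only understands length-prefixed identifiers and `S<id>_` substitutions. The `S0_` substitution refers to something mangled earlier in the full symbol (presumably an enclosing namespace, but that cannot be confirmed from the fragment alone).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Splits a nested name of the form N <component>... E into its components.
// Supported components: length-prefixed identifiers such as "6detail" and
// substitutions such as "S_" or "S0_", which are returned verbatim because
// resolving them needs the full symbol. Abbreviations without a trailing
// underscore (for example "St") are outside this subset.
// Returns an empty vector when the input does not match the subset.
std::vector<std::string> split_nested_name(const std::string& mangled)
{
  if (mangled.size() < 2 || mangled.front() != 'N' || mangled.back() != 'E')
  {
    return {};
  }

  std::vector<std::string> parts;
  const std::size_t end = mangled.size() - 1; // index of the closing 'E'
  std::size_t pos = 1;                        // skip the opening 'N'
  while (pos < end)
  {
    if (mangled[pos] == 'S')
    {
      // Substitution: 'S', an optional sequence id, then '_'.
      const std::size_t underscore = mangled.find('_', pos);
      if (underscore == std::string::npos || underscore >= end)
      {
        return {};
      }
      parts.push_back(mangled.substr(pos, underscore - pos + 1));
      pos = underscore + 1;
    }
    else if (std::isdigit(static_cast<unsigned char>(mangled[pos])))
    {
      // Identifier: a decimal length followed by that many characters.
      std::size_t length = 0;
      while (pos < end && std::isdigit(static_cast<unsigned char>(mangled[pos])))
      {
        length = length * 10 + static_cast<std::size_t>(mangled[pos] - '0');
        ++pos;
      }
      if (length == 0 || pos + length > end)
      {
        return {};
      }
      parts.push_back(mangled.substr(pos, length));
      pos += length;
    }
    else
    {
      return {};
    }
  }
  assert(pos == end); // every accepted component ends at or before the closing 'E'
  return parts;
}
```

Applied to `NS0_6detail7vsmem_tE`, this yields the components `S0_`, `detail`, and `vsmem_t`, that is, a type named `vsmem_t` in a `detail` namespace nested inside whatever `S0_` stands for.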

sass_old_new_sm70.zip
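
To make "within the range of noise" concrete, here is a small sketch of how the Diff, %Diff, and Status columns in the table below could be derived from the reference and comparison timings. The pass rule used here (the absolute relative difference must not exceed the smaller of the two noise values) is an assumption about the comparison tool, and `compare_timings` is a hypothetical helper. Because the table shows rounded values, recomputed numbers can differ in the last digit.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct comparison
{
  double diff;         // cmp_time - ref_time, in the unit of the inputs
  double percent_diff; // diff relative to ref_time, in percent
  bool within_noise;   // true when |percent_diff| <= the smaller noise value
};

// Compares one benchmark configuration. Times share one unit (for example
// microseconds); noise values are given in percent, as in the table.
comparison compare_timings(double ref_time, double ref_noise_percent,
                           double cmp_time, double cmp_noise_percent)
{
  assert(ref_time > 0.0); // the relative difference is undefined otherwise

  const double diff         = cmp_time - ref_time;
  const double percent_diff = 100.0 * diff / ref_time;
  // Assumed rule: use the less noisy of the two measurements as the threshold.
  const double threshold    = std::min(ref_noise_percent, cmp_noise_percent);
  return {diff, percent_diff, std::fabs(percent_diff) <= threshold};
}
```

For the first row (10.070 us at 6.33% noise versus 10.186 us at 5.82% noise), this gives a difference of about 0.116 us, or roughly 1.15%, which is well below the 5.82% threshold.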

# cub.bench.select.unique_by_key.base

## [0] Tesla V100-SXM2-32GB

KeyT{ct} ValueT{ct} OffsetT{ct} Elements{io} MaxSegSize Ref Time Ref Noise Cmp Time Cmp Noise Diff %Diff Status
I8 I8 I32 2^16 2^1 10.070 us 6.33% 10.186 us 5.82% 0.116 us 1.15% PASS
I8 I8 I32 2^20 2^1 18.005 us 3.04% 18.142 us 2.95% 0.137 us 0.76% PASS
I8 I8 I32 2^24 2^1 157.141 us 1.02% 157.194 us 1.04% 0.053 us 0.03% PASS
I8 I8 I32 2^28 2^1 2.364 ms 0.50% 2.364 ms 0.50% -0.531 us -0.02% PASS
I8 I8 I32 2^16 2^4 9.572 us 5.11% 9.656 us 5.27% 0.083 us 0.87% PASS
I8 I8 I32 2^20 2^4 17.283 us 2.88% 17.380 us 2.74% 0.097 us 0.56% PASS
I8 I8 I32 2^24 2^4 126.718 us 0.70% 126.842 us 0.71% 0.124 us 0.10% PASS
I8 I8 I32 2^28 2^4 1.864 ms 0.58% 1.863 ms 0.58% -0.730 us -0.04% PASS
I8 I8 I32 2^16 2^8 9.514 us 4.92% 9.591 us 5.16% 0.077 us 0.81% PASS
I8 I8 I32 2^20 2^8 17.208 us 2.98% 17.296 us 2.86% 0.088 us 0.51% PASS
I8 I8 I32 2^24 2^8 122.472 us 0.68% 122.415 us 0.68% -0.058 us -0.05% PASS
I8 I8 I32 2^28 2^8 1.772 ms 0.55% 1.772 ms 0.55% -0.050 us -0.00% PASS
I8 I16 I32 2^16 2^1 10.383 us 4.23% 10.386 us 4.63% 0.003 us 0.03% PASS
I8 I16 I32 2^20 2^1 19.660 us 2.85% 19.598 us 2.87% -0.061 us -0.31% PASS
I8 I16 I32 2^24 2^1 171.195 us 1.20% 171.154 us 1.26% -0.041 us -0.02% PASS
I8 I16 I32 2^28 2^1 2.619 ms 0.45% 2.620 ms 0.44% 1.067 us 0.04% PASS
I8 I16 I32 2^16 2^4 10.037 us 5.13% 9.955 us 5.31% -0.082 us -0.82% PASS
I8 I16 I32 2^20 2^4 18.445 us 2.76% 18.365 us 2.82% -0.080 us -0.43% PASS
I8 I16 I32 2^24 2^4 135.371 us 0.92% 135.169 us 0.89% -0.202 us -0.15% PASS
I8 I16 I32 2^28 2^4 1.992 ms 0.66% 1.992 ms 0.67% -0.175 us -0.01% PASS
I8 I16 I32 2^16 2^8 10.023 us 4.95% 9.927 us 5.13% -0.096 us -0.96% PASS
I8 I16 I32 2^20 2^8 18.397 us 2.93% 18.300 us 3.07% -0.097 us -0.53% PASS
I8 I16 I32 2^24 2^8 128.196 us 0.83% 128.070 us 0.85% -0.126 us -0.10% PASS
I8 I16 I32 2^28 2^8 1.848 ms 0.66% 1.848 ms 0.67% -0.636 us -0.03% PASS
I8 I32 I32 2^16 2^1 10.334 us 4.02% 10.119 us 3.77% -0.215 us -2.08% PASS
I8 I32 I32 2^20 2^1 23.834 us 3.83% 23.811 us 3.88% -0.023 us -0.09% PASS
I8 I32 I32 2^24 2^1 214.627 us 1.35% 214.636 us 1.35% 0.008 us 0.00% PASS
I8 I32 I32 2^28 2^1 3.279 ms 1.17% 3.279 ms 1.17% -0.114 us -0.00% PASS
I8 I32 I32 2^16 2^4 9.708 us 5.34% 9.808 us 5.25% 0.100 us 1.03% PASS
I8 I32 I32 2^20 2^4 22.007 us 3.19% 22.103 us 3.17% 0.096 us 0.44% PASS
I8 I32 I32 2^24 2^4 165.137 us 1.08% 165.089 us 1.09% -0.047 us -0.03% PASS
I8 I32 I32 2^28 2^4 2.436 ms 0.84% 2.435 ms 0.85% -0.086 us -0.00% PASS
I8 I32 I32 2^16 2^8 9.698 us 5.33% 9.798 us 5.27% 0.099 us 1.02% PASS
I8 I32 I32 2^20 2^8 21.812 us 3.01% 21.888 us 3.00% 0.076 us 0.35% PASS
I8 I32 I32 2^24 2^8 145.354 us 1.18% 145.420 us 1.15% 0.066 us 0.05% PASS
I8 I32 I32 2^28 2^8 2.074 ms 1.20% 2.074 ms 1.20% 0.102 us 0.00% PASS
I8 I64 I32 2^16 2^1 11.167 us 4.36% 11.104 us 3.93% -0.064 us -0.57% PASS
I8 I64 I32 2^20 2^1 31.923 us 3.41% 31.979 us 3.39% 0.056 us 0.18% PASS
I8 I64 I32 2^24 2^1 349.280 us 1.00% 349.323 us 1.00% 0.042 us 0.01% PASS
I8 I64 I32 2^28 2^1 5.424 ms 1.03% 5.424 ms 1.04% 0.276 us 0.01% PASS
I8 I64 I32 2^16 2^4 10.496 us 4.33% 10.603 us 4.71% 0.107 us 1.02% PASS
I8 I64 I32 2^20 2^4 29.662 us 3.07% 29.751 us 3.05% 0.089 us 0.30% PASS
I8 I64 I32 2^24 2^4 257.070 us 0.93% 257.047 us 0.93% -0.023 us -0.01% PASS
I8 I64 I32 2^28 2^4 3.895 ms 0.69% 3.895 ms 0.69% 0.193 us 0.00% PASS
I8 I64 I32 2^16 2^8 10.538 us 4.53% 10.514 us 4.41% -0.024 us -0.23% PASS
I8 I64 I32 2^20 2^8 29.856 us 3.08% 29.817 us 3.04% -0.039 us -0.13% PASS
I8 I64 I32 2^24 2^8 217.487 us 1.10% 217.413 us 1.09% -0.075 us -0.03% PASS
I8 I64 I32 2^28 2^8 3.187 ms 1.16% 3.187 ms 1.16% 0.098 us 0.00% PASS
I8 I128 I32 2^16 2^1 12.025 us 4.34% 12.025 us 4.24% 0.000 us 0.00% PASS
I8 I128 I32 2^20 2^1 49.648 us 2.05% 49.595 us 2.00% -0.053 us -0.11% PASS
I8 I128 I32 2^24 2^1 623.626 us 0.72% 624.586 us 0.72% 0.959 us 0.15% PASS
I8 I128 I32 2^28 2^1 9.813 ms 0.92% 9.813 ms 0.92% 0.200 us 0.00% PASS
I8 I128 I32 2^16 2^4 11.644 us 4.37% 11.528 us 4.02% -0.116 us -1.00% PASS
I8 I128 I32 2^20 2^4 42.288 us 1.95% 42.233 us 1.97% -0.055 us -0.13% PASS
I8 I128 I32 2^24 2^4 454.917 us 0.66% 454.865 us 0.67% -0.053 us -0.01% PASS
I8 I128 I32 2^28 2^4 7.062 ms 0.59% 7.061 ms 0.60% -0.248 us -0.00% PASS
I8 I128 I32 2^16 2^8 11.467 us 3.88% 11.461 us 3.86% -0.006 us -0.05% PASS
I8 I128 I32 2^20 2^8 40.902 us 1.92% 40.879 us 1.89% -0.023 us -0.06% PASS
I8 I128 I32 2^24 2^8 375.495 us 0.86% 375.966 us 0.84% 0.471 us 0.13% PASS
I8 I128 I32 2^28 2^8 5.727 ms 0.98% 5.727 ms 0.98% 0.046 us 0.00% PASS
I8 F32 I32 2^16 2^1 10.155 us 4.14% 10.178 us 3.95% 0.024 us 0.24% PASS
I8 F32 I32 2^20 2^1 23.852 us 3.82% 23.867 us 3.83% 0.015 us 0.06% PASS
I8 F32 I32 2^24 2^1 214.609 us 1.32% 214.590 us 1.33% -0.019 us -0.01% PASS
I8 F32 I32 2^28 2^1 3.279 ms 1.17% 3.279 ms 1.17% -0.046 us -0.00% PASS
I8 F32 I32 2^16 2^4 9.725 us 5.33% 9.719 us 5.34% -0.006 us -0.06% PASS
I8 F32 I32 2^20 2^4 22.106 us 3.13% 22.123 us 3.13% 0.017 us 0.08% PASS
I8 F32 I32 2^24 2^4 165.237 us 1.08% 165.114 us 1.05% -0.123 us -0.07% PASS
I8 F32 I32 2^28 2^4 2.435 ms 0.84% 2.435 ms 0.84% -0.082 us -0.00% PASS
I8 F32 I32 2^16 2^8 9.690 us 5.32% 9.673 us 5.32% -0.017 us -0.18% PASS
I8 F32 I32 2^20 2^8 21.847 us 2.99% 21.794 us 2.98% -0.053 us -0.24% PASS
I8 F32 I32 2^24 2^8 145.506 us 1.19% 145.410 us 1.16% -0.095 us -0.07% PASS
I8 F32 I32 2^28 2^8 2.074 ms 1.20% 2.074 ms 1.19% 0.080 us 0.00% PASS
I8 F64 I32 2^16 2^1 11.119 us 4.48% 11.116 us 4.33% -0.003 us -0.03% PASS
I8 F64 I32 2^20 2^1 31.947 us 3.28% 32.360 us 3.41% 0.413 us 1.29% PASS
I8 F64 I32 2^24 2^1 349.427 us 1.01% 349.288 us 1.00% -0.139 us -0.04% PASS
I8 F64 I32 2^28 2^1 5.424 ms 1.04% 5.424 ms 1.04% 0.006 us 0.00% PASS
I8 F64 I32 2^16 2^4 10.691 us 4.89% 10.601 us 4.71% -0.090 us -0.84% PASS
I8 F64 I32 2^20 2^4 29.760 us 3.08% 29.747 us 3.06% -0.014 us -0.05% PASS
I8 F64 I32 2^24 2^4 257.087 us 0.92% 256.942 us 0.93% -0.145 us -0.06% PASS
I8 F64 I32 2^28 2^4 3.895 ms 0.69% 3.895 ms 0.69% 0.192 us 0.00% PASS
I8 F64 I32 2^16 2^8 10.558 us 4.57% 10.541 us 4.54% -0.016 us -0.15% PASS
I8 F64 I32 2^20 2^8 29.890 us 2.98% 29.850 us 3.01% -0.040 us -0.13% PASS
I8 F64 I32 2^24 2^8 217.426 us 1.10% 217.551 us 1.06% 0.126 us 0.06% PASS
I8 F64 I32 2^28 2^8 3.187 ms 1.16% 3.187 ms 1.17% 0.086 us 0.00% PASS
I8 C64 I32 2^16 2^1 11.099 us 4.28% 11.018 us 4.33% -0.081 us -0.73% PASS
I8 C64 I32 2^20 2^1 31.924 us 3.34% 31.973 us 3.28% 0.049 us 0.15% PASS
I8 C64 I32 2^24 2^1 348.835 us 1.00% 348.831 us 1.02% -0.004 us -0.00% PASS
I8 C64 I32 2^28 2^1 5.426 ms 1.04% 5.426 ms 1.04% 0.236 us 0.00% PASS
I8 C64 I32 2^16 2^4 10.500 us 9.52% 10.528 us 4.49% 0.028 us 0.27% PASS
I8 C64 I32 2^20 2^4 29.708 us 4.48% 29.686 us 3.03% -0.022 us -0.07% PASS
I8 C64 I32 2^24 2^4 257.585 us 0.91% 257.418 us 0.89% -0.167 us -0.06% PASS
I8 C64 I32 2^28 2^4 3.898 ms 0.69% 3.898 ms 0.69% -0.223 us -0.01% PASS
I8 C64 I32 2^16 2^8 10.565 us 9.24% 10.478 us 4.30% -0.087 us -0.82% PASS
I8 C64 I32 2^20 2^8 29.883 us 4.30% 29.733 us 3.03% -0.150 us -0.50% PASS
I8 C64 I32 2^24 2^8 216.410 us 1.20% 216.212 us 1.08% -0.198 us -0.09% PASS
I8 C64 I32 2^28 2^8 3.187 ms 1.17% 3.186 ms 1.17% -0.088 us -0.00% PASS
I16 I8 I32 2^16 2^1 10.417 us 10.19% 10.291 us 4.22% -0.126 us -1.21% PASS
I16 I8 I32 2^20 2^1 19.924 us 5.15% 19.794 us 2.85% -0.130 us -0.65% PASS
I16 I8 I32 2^24 2^1 173.375 us 1.29% 173.097 us 1.16% -0.279 us -0.16% PASS
I16 I8 I32 2^28 2^1 2.643 ms 0.38% 2.642 ms 0.36% -0.744 us -0.03% PASS
I16 I8 I32 2^16 2^4 10.114 us 9.95% 10.004 us 4.79% -0.110 us -1.09% PASS
I16 I8 I32 2^20 2^4 18.588 us 3.84% 18.504 us 2.67% -0.084 us -0.45% PASS
I16 I8 I32 2^24 2^4 137.845 us 0.88% 137.797 us 0.85% -0.048 us -0.03% PASS
I16 I8 I32 2^28 2^4 2.018 ms 0.64% 2.018 ms 0.64% -0.185 us -0.01% PASS
I16 I8 I32 2^16 2^8 10.072 us 9.91% 9.964 us 4.92% -0.108 us -1.07% PASS
I16 I8 I32 2^20 2^8 18.628 us 5.85% 18.502 us 2.82% -0.126 us -0.68% PASS
I16 I8 I32 2^24 2^8 129.966 us 1.03% 129.999 us 0.83% 0.033 us 0.03% PASS
I16 I8 I32 2^28 2^8 1.876 ms 0.63% 1.876 ms 0.63% -0.108 us -0.01% PASS
I16 I16 I32 2^16 2^1 10.118 us 10.07% 10.109 us 3.97% -0.009 us -0.09% PASS
I16 I16 I32 2^20 2^1 21.861 us 5.05% 21.989 us 3.37% 0.128 us 0.59% PASS
I16 I16 I32 2^24 2^1 195.482 us 1.62% 195.740 us 1.59% 0.258 us 0.13% PASS
I16 I16 I32 2^28 2^1 2.870 ms 0.94% 2.870 ms 0.95% 0.205 us 0.01% PASS
I16 I16 I32 2^16 2^4 9.812 us 9.97% 9.822 us 5.19% 0.010 us 0.10% PASS
I16 I16 I32 2^20 2^4 20.453 us 5.09% 20.468 us 2.87% 0.015 us 0.07% PASS
I16 I16 I32 2^24 2^4 150.766 us 1.20% 150.646 us 1.03% -0.120 us -0.08% PASS
I16 I16 I32 2^28 2^4 2.171 ms 0.82% 2.171 ms 0.81% -0.019 us -0.00% PASS
I16 I16 I32 2^16 2^8 9.855 us 10.75% 9.693 us 5.32% -0.162 us -1.64% PASS
I16 I16 I32 2^20 2^8 20.494 us 5.31% 20.366 us 2.87% -0.127 us -0.62% PASS
I16 I16 I32 2^24 2^8 139.899 us 1.10% 139.711 us 0.92% -0.188 us -0.13% PASS
I16 I16 I32 2^28 2^8 1.942 ms 0.81% 1.942 ms 0.81% -0.051 us -0.00% PASS
I16 I32 I32 2^16 2^1 10.488 us 9.50% 10.344 us 3.47% -0.144 us -1.37% PASS
I16 I32 I32 2^20 2^1 25.805 us 4.41% 25.663 us 3.11% -0.142 us -0.55% PASS
I16 I32 I32 2^24 2^1 250.532 us 1.23% 250.553 us 1.24% 0.021 us 0.01% PASS
I16 I32 I32 2^28 2^1 3.853 ms 1.17% 3.853 ms 1.17% 0.062 us 0.00% PASS
I16 I32 I32 2^16 2^4 10.027 us 11.08% 10.029 us 4.56% 0.002 us 0.02% PASS
I16 I32 I32 2^20 2^4 23.862 us 5.14% 23.814 us 2.73% -0.049 us -0.20% PASS
I16 I32 I32 2^24 2^4 187.112 us 1.12% 187.082 us 1.07% -0.030 us -0.02% PASS
I16 I32 I32 2^28 2^4 2.773 ms 0.84% 2.773 ms 0.84% -0.128 us -0.00% PASS
I16 I32 I32 2^16 2^8 9.992 us 10.46% 9.975 us 4.79% -0.017 us -0.17% PASS
I16 I32 I32 2^20 2^8 23.912 us 3.00% 23.953 us 2.69% 0.042 us 0.17% PASS
I16 I32 I32 2^24 2^8 161.127 us 1.15% 161.135 us 1.15% 0.008 us 0.01% PASS
I16 I32 I32 2^28 2^8 2.306 ms 1.29% 2.306 ms 1.28% 0.011 us 0.00% PASS
I16 I64 I32 2^16 2^1 11.153 us 9.40% 11.072 us 4.09% -0.080 us -0.72% PASS
I16 I64 I32 2^20 2^1 34.549 us 4.12% 35.060 us 3.42% 0.510 us 1.48% PASS
I16 I64 I32 2^24 2^1 387.618 us 1.02% 387.677 us 0.96% 0.059 us 0.02% PASS
I16 I64 I32 2^28 2^1 6.031 ms 1.05% 6.031 ms 1.05% 0.051 us 0.00% PASS
I16 I64 I32 2^16 2^4 10.839 us 8.30% 10.833 us 4.80% -0.007 us -0.06% PASS
I16 I64 I32 2^20 2^4 32.108 us 3.62% 32.103 us 3.06% -0.005 us -0.02% PASS
I16 I64 I32 2^24 2^4 282.612 us 0.91% 282.285 us 0.86% -0.328 us -0.12% PASS
I16 I64 I32 2^28 2^4 4.290 ms 0.65% 4.290 ms 0.65% -0.009 us -0.00% PASS
I16 I64 I32 2^16 2^8 10.814 us 9.82% 10.647 us 4.80% -0.168 us -1.55% PASS
I16 I64 I32 2^20 2^8 32.556 us 4.05% 32.338 us 2.85% -0.218 us -0.67% PASS
I16 I64 I32 2^24 2^8 235.706 us 1.16% 236.209 us 1.05% 0.503 us 0.21% PASS
I16 I64 I32 2^28 2^8 3.478 ms 1.14% 3.478 ms 1.14% 0.106 us 0.00% PASS
I16 I128 I32 2^16 2^1 12.313 us 7.68% 12.168 us 3.81% -0.146 us -1.18% PASS
I16 I128 I32 2^20 2^1 53.072 us 2.57% 52.856 us 2.11% -0.216 us -0.41% PASS
I16 I128 I32 2^24 2^1 661.450 us 0.74% 661.215 us 0.71% -0.235 us -0.04% PASS
I16 I128 I32 2^28 2^1 10.410 ms 0.94% 10.410 ms 0.94% -0.342 us -0.00% PASS
I16 I128 I32 2^16 2^4 11.741 us 9.33% 11.587 us 4.24% -0.154 us -1.31% PASS
I16 I128 I32 2^20 2^4 44.364 us 2.96% 44.182 us 2.04% -0.183 us -0.41% PASS
I16 I128 I32 2^24 2^4 483.902 us 0.72% 483.752 us 0.68% -0.150 us -0.03% PASS
I16 I128 I32 2^28 2^4 7.512 ms 0.61% 7.512 ms 0.61% -0.301 us -0.00% PASS
I16 I128 I32 2^16 2^8 11.715 us 9.18% 11.569 us 4.21% -0.146 us -1.25% PASS
I16 I128 I32 2^20 2^8 42.905 us 2.81% 42.773 us 1.92% -0.132 us -0.31% PASS
I16 I128 I32 2^24 2^8 397.101 us 0.88% 396.824 us 0.84% -0.277 us -0.07% PASS
I16 I128 I32 2^28 2^8 6.060 ms 0.98% 6.060 ms 0.98% 0.059 us 0.00% PASS
I16 F32 I32 2^16 2^1 10.347 us 7.60% 10.520 us 4.55% 0.173 us 1.67% PASS
I16 F32 I32 2^20 2^1 25.677 us 3.74% 25.744 us 3.12% 0.067 us 0.26% PASS
I16 F32 I32 2^24 2^1 250.583 us 1.27% 250.570 us 1.25% -0.013 us -0.01% PASS
I16 F32 I32 2^28 2^1 3.853 ms 1.17% 3.853 ms 1.17% -0.092 us -0.00% PASS
I16 F32 I32 2^16 2^4 9.998 us 10.23% 10.062 us 4.31% 0.064 us 0.64% PASS
I16 F32 I32 2^20 2^4 23.823 us 4.83% 23.863 us 2.74% 0.041 us 0.17% PASS
I16 F32 I32 2^24 2^4 187.264 us 1.29% 187.111 us 1.05% -0.153 us -0.08% PASS
I16 F32 I32 2^28 2^4 2.773 ms 0.84% 2.773 ms 0.84% 0.243 us 0.01% PASS
I16 F32 I32 2^16 2^8 9.941 us 7.52% 10.012 us 4.58% 0.072 us 0.72% PASS
I16 F32 I32 2^20 2^8 23.972 us 4.62% 24.014 us 2.72% 0.042 us 0.17% PASS
I16 F32 I32 2^24 2^8 161.284 us 1.24% 161.214 us 1.12% -0.070 us -0.04% PASS
I16 F32 I32 2^28 2^8 2.306 ms 1.29% 2.306 ms 1.29% 0.065 us 0.00% PASS
I16 F64 I32 2^16 2^1 11.147 us 9.19% 11.180 us 3.70% 0.034 us 0.30% PASS
I16 F64 I32 2^20 2^1 35.106 us 4.16% 35.205 us 3.43% 0.100 us 0.28% PASS
I16 F64 I32 2^24 2^1 387.684 us 1.01% 387.684 us 0.98% -0.001 us -0.00% PASS
I16 F64 I32 2^28 2^1 6.031 ms 1.05% 6.031 ms 1.05% -0.121 us -0.00% PASS
I16 F64 I32 2^16 2^4 10.952 us 9.21% 10.804 us 4.83% -0.148 us -1.35% PASS
I16 F64 I32 2^20 2^4 32.238 us 4.10% 32.056 us 2.94% -0.183 us -0.57% PASS
I16 F64 I32 2^24 2^4 282.666 us 0.91% 282.475 us 0.84% -0.191 us -0.07% PASS
I16 F64 I32 2^28 2^4 4.290 ms 0.65% 4.289 ms 0.66% -0.202 us -0.00% PASS
I16 F64 I32 2^16 2^8 10.838 us 9.89% 10.682 us 4.85% -0.156 us -1.44% PASS
I16 F64 I32 2^20 2^8 32.538 us 4.08% 32.400 us 2.88% -0.138 us -0.42% PASS
I16 F64 I32 2^24 2^8 236.367 us 1.19% 236.329 us 1.06% -0.038 us -0.02% PASS
I16 F64 I32 2^28 2^8 3.478 ms 1.14% 3.478 ms 1.14% 0.063 us 0.00% PASS
I16 C64 I32 2^16 2^1 11.239 us 8.76% 11.215 us 3.44% -0.024 us -0.21% PASS
I16 C64 I32 2^20 2^1 34.990 us 4.30% 34.999 us 3.37% 0.009 us 0.02% PASS
I16 C64 I32 2^24 2^1 387.730 us 1.03% 387.767 us 0.94% 0.037 us 0.01% PASS
I16 C64 I32 2^28 2^1 6.033 ms 1.05% 6.033 ms 1.04% -0.048 us -0.00% PASS
I16 C64 I32 2^16 2^4 10.892 us 9.89% 10.894 us 4.69% 0.002 us 0.02% PASS
I16 C64 I32 2^20 2^4 32.073 us 4.11% 32.057 us 2.93% -0.016 us -0.05% PASS
I16 C64 I32 2^24 2^4 282.628 us 0.90% 282.568 us 0.87% -0.059 us -0.02% PASS
I16 C64 I32 2^28 2^4 4.293 ms 0.65% 4.293 ms 0.65% 0.099 us 0.00% PASS
I16 C64 I32 2^16 2^8 10.813 us 9.85% 10.679 us 4.84% -0.134 us -1.24% PASS
I16 C64 I32 2^20 2^8 32.499 us 3.82% 32.315 us 2.73% -0.184 us -0.57% PASS
I16 C64 I32 2^24 2^8 235.721 us 1.11% 235.658 us 1.05% -0.062 us -0.03% PASS
I16 C64 I32 2^28 2^8 3.478 ms 1.14% 3.478 ms 1.14% 0.002 us 0.00% PASS
I32 I8 I32 2^16 2^1 10.216 us 7.86% 10.068 us 4.32% -0.148 us -1.45% PASS
I32 I8 I32 2^20 2^1 24.180 us 4.45% 24.036 us 3.72% -0.145 us -0.60% PASS
I32 I8 I32 2^24 2^1 217.028 us 1.39% 216.785 us 1.32% -0.244 us -0.11% PASS
I32 I8 I32 2^28 2^1 3.296 ms 1.13% 3.296 ms 1.13% -0.173 us -0.01% PASS
I32 I8 I32 2^16 2^4 9.840 us 10.67% 9.686 us 5.33% -0.154 us -1.57% PASS
I32 I8 I32 2^20 2^4 22.195 us 4.96% 22.055 us 2.79% -0.140 us -0.63% PASS
I32 I8 I32 2^24 2^4 167.009 us 1.28% 167.011 us 1.14% 0.002 us 0.00% PASS
I32 I8 I32 2^28 2^4 2.460 ms 0.86% 2.460 ms 0.87% -0.242 us -0.01% PASS
I32 I8 I32 2^16 2^8 9.819 us 10.85% 9.666 us 5.31% -0.153 us -1.56% PASS
I32 I8 I32 2^20 2^8 22.021 us 3.37% 21.917 us 2.75% -0.104 us -0.47% PASS
I32 I8 I32 2^24 2^8 146.900 us 1.17% 146.929 us 1.19% 0.029 us 0.02% PASS
I32 I8 I32 2^28 2^8 2.079 ms 1.27% 2.079 ms 1.27% 0.092 us 0.00% PASS
I32 I16 I32 2^16 2^1 10.300 us 9.44% 10.267 us 3.39% -0.034 us -0.33% PASS
I32 I16 I32 2^20 2^1 25.953 us 4.67% 25.902 us 3.31% -0.051 us -0.20% PASS
I32 I16 I32 2^24 2^1 252.758 us 1.28% 252.764 us 1.20% 0.006 us 0.00% PASS
I32 I16 I32 2^28 2^1 3.863 ms 1.17% 3.863 ms 1.15% 0.153 us 0.00% PASS
I32 I16 I32 2^16 2^4 9.977 us 10.84% 9.978 us 4.83% 0.001 us 0.01% PASS
I32 I16 I32 2^20 2^4 23.878 us 4.45% 23.865 us 2.82% -0.013 us -0.05% PASS
I32 I16 I32 2^24 2^4 187.966 us 1.03% 187.947 us 1.07% -0.018 us -0.01% PASS
I32 I16 I32 2^28 2^4 2.793 ms 0.83% 2.793 ms 0.84% -0.110 us -0.00% PASS
I32 I16 I32 2^16 2^8 10.036 us 9.87% 9.908 us 5.01% -0.128 us -1.27% PASS
I32 I16 I32 2^20 2^8 24.118 us 4.88% 23.978 us 2.79% -0.140 us -0.58% PASS
I32 I16 I32 2^24 2^8 161.517 us 1.22% 161.576 us 1.17% 0.059 us 0.04% PASS
I32 I16 I32 2^28 2^8 2.309 ms 1.35% 2.309 ms 1.34% -0.082 us -0.00% PASS
I32 I32 I32 2^16 2^1 10.504 us 9.19% 10.370 us 3.74% -0.135 us -1.28% PASS
I32 I32 I32 2^20 2^1 30.141 us 4.17% 29.983 us 2.80% -0.158 us -0.52% PASS
I32 I32 I32 2^24 2^1 322.604 us 1.14% 322.492 us 1.13% -0.112 us -0.03% PASS
I32 I32 I32 2^28 2^1 5.049 ms 1.20% 5.049 ms 1.20% -0.032 us -0.00% PASS
I32 I32 I32 2^16 2^4 10.101 us 10.09% 9.984 us 4.82% -0.116 us -1.15% PASS
I32 I32 I32 2^20 2^4 27.153 us 4.21% 27.040 us 2.75% -0.113 us -0.41% PASS
I32 I32 I32 2^24 2^4 238.726 us 0.93% 238.507 us 0.84% -0.219 us -0.09% PASS
I32 I32 I32 2^28 2^4 3.663 ms 0.55% 3.663 ms 0.55% -0.083 us -0.00% PASS
I32 I32 I32 2^16 2^8 10.064 us 9.38% 9.952 us 4.94% -0.112 us -1.11% PASS
I32 I32 I32 2^20 2^8 27.067 us 2.84% 27.031 us 2.42% -0.036 us -0.13% PASS
I32 I32 I32 2^24 2^8 194.756 us 1.13% 194.682 us 1.08% -0.074 us -0.04% PASS
I32 I32 I32 2^28 2^8 2.840 ms 1.14% 2.840 ms 1.15% -0.105 us -0.00% PASS
I32 I64 I32 2^16 2^1 11.293 us 8.70% 11.181 us 3.60% -0.112 us -0.99% PASS
I32 I64 I32 2^20 2^1 39.306 us 3.63% 39.245 us 2.77% -0.061 us -0.16% PASS
I32 I64 I32 2^24 2^1 460.303 us 0.96% 460.183 us 0.89% -0.120 us -0.03% PASS
I32 I64 I32 2^28 2^1 7.187 ms 1.08% 7.187 ms 1.08% -0.197 us -0.00% PASS
I32 I64 I32 2^16 2^4 10.795 us 8.43% 10.696 us 4.92% -0.099 us -0.92% PASS
I32 I64 I32 2^20 2^4 34.358 us 3.43% 34.264 us 2.58% -0.094 us -0.27% PASS
I32 I64 I32 2^24 2^4 328.193 us 0.83% 328.183 us 0.76% -0.009 us -0.00% PASS
I32 I64 I32 2^28 2^4 5.027 ms 0.63% 5.027 ms 0.63% 0.201 us 0.00% PASS
I32 I64 I32 2^16 2^8 10.887 us 9.73% 10.760 us 4.94% -0.127 us -1.16% PASS
I32 I64 I32 2^20 2^8 34.731 us 3.72% 34.612 us 2.47% -0.119 us -0.34% PASS
I32 I64 I32 2^24 2^8 270.752 us 1.03% 270.641 us 0.96% -0.111 us -0.04% PASS
I32 I64 I32 2^28 2^8 4.049 ms 1.06% 4.049 ms 1.06% -0.011 us -0.00% PASS
I32 I128 I32 2^16 2^1 12.441 us 8.87% 12.274 us 3.42% -0.167 us -1.34% PASS
I32 I128 I32 2^20 2^1 57.595 us 2.40% 57.473 us 1.76% -0.122 us -0.21% PASS
I32 I128 I32 2^24 2^1 731.580 us 0.69% 731.328 us 0.66% -0.251 us -0.03% PASS
I32 I128 I32 2^28 2^1 11.535 ms 0.98% 11.534 ms 0.98% -0.667 us -0.01% PASS
I32 I128 I32 2^16 2^4 11.778 us 9.29% 11.697 us 4.44% -0.081 us -0.69% PASS
I32 I128 I32 2^20 2^4 47.116 us 2.34% 46.982 us 1.88% -0.134 us -0.28% PASS
I32 I128 I32 2^24 2^4 528.768 us 0.62% 528.610 us 0.62% -0.158 us -0.03% PASS
I32 I128 I32 2^28 2^4 8.247 ms 0.58% 8.247 ms 0.59% -0.011 us -0.00% PASS
I32 I128 I32 2^16 2^8 11.730 us 9.01% 11.620 us 4.30% -0.111 us -0.94% PASS
I32 I128 I32 2^20 2^8 45.295 us 2.62% 45.128 us 1.87% -0.167 us -0.37% PASS
I32 I128 I32 2^24 2^8 433.197 us 0.77% 433.143 us 0.76% -0.054 us -0.01% PASS
I32 I128 I32 2^28 2^8 6.648 ms 0.96% 6.648 ms 0.96% -0.167 us -0.00% PASS
I32 F32 I32 2^16 2^1 10.515 us 9.48% 10.454 us 4.41% -0.062 us -0.59% PASS
I32 F32 I32 2^20 2^1 29.950 us 3.96% 29.941 us 2.83% -0.008 us -0.03% PASS
I32 F32 I32 2^24 2^1 322.503 us 1.18% 322.623 us 1.10% 0.119 us 0.04% PASS
I32 F32 I32 2^28 2^1 5.049 ms 1.20% 5.049 ms 1.20% 0.145 us 0.00% PASS
I32 F32 I32 2^16 2^4 10.043 us 10.84% 10.030 us 4.56% -0.013 us -0.13% PASS
I32 F32 I32 2^20 2^4 27.108 us 4.19% 27.093 us 2.74% -0.015 us -0.06% PASS
I32 F32 I32 2^24 2^4 238.686 us 0.89% 238.706 us 0.85% 0.021 us 0.01% PASS
I32 F32 I32 2^28 2^4 3.663 ms 0.55% 3.663 ms 0.54% 0.007 us 0.00% PASS
I32 F32 I32 2^16 2^8 9.984 us 10.67% 9.991 us 4.79% 0.006 us 0.06% PASS
I32 F32 I32 2^20 2^8 27.066 us 4.21% 27.078 us 2.38% 0.012 us 0.04% PASS
I32 F32 I32 2^24 2^8 194.785 us 1.21% 194.775 us 1.08% -0.011 us -0.01% PASS
I32 F32 I32 2^28 2^8 2.840 ms 1.14% 2.840 ms 1.14% -0.088 us -0.00% PASS
I32 F64 I32 2^16 2^1 11.253 us 8.89% 11.303 us 3.46% 0.050 us 0.44% PASS
I32 F64 I32 2^20 2^1 39.274 us 3.46% 39.305 us 2.76% 0.032 us 0.08% PASS
I32 F64 I32 2^24 2^1 460.140 us 0.91% 460.070 us 0.90% -0.070 us -0.02% PASS
I32 F64 I32 2^28 2^1 7.187 ms 1.08% 7.187 ms 1.08% -0.162 us -0.00% PASS
I32 F64 I32 2^16 2^4 10.871 us 9.41% 10.739 us 4.95% -0.132 us -1.22% PASS
I32 F64 I32 2^20 2^4 34.453 us 3.64% 34.329 us 2.51% -0.124 us -0.36% PASS
I32 F64 I32 2^24 2^4 328.356 us 0.80% 327.937 us 0.79% -0.419 us -0.13% PASS
I32 F64 I32 2^28 2^4 5.027 ms 0.63% 5.027 ms 0.63% -0.154 us -0.00% PASS
I32 F64 I32 2^16 2^8 10.828 us 9.90% 10.703 us 4.93% -0.125 us -1.16% PASS
I32 F64 I32 2^20 2^8 34.626 us 3.62% 34.498 us 2.45% -0.129 us -0.37% PASS
I32 F64 I32 2^24 2^8 270.642 us 1.01% 270.658 us 0.95% 0.016 us 0.01% PASS
I32 F64 I32 2^28 2^8 4.049 ms 1.06% 4.049 ms 1.06% -0.284 us -0.01% PASS
I32 C64 I32 2^16 2^1 11.287 us 8.88% 11.208 us 3.68% -0.079 us -0.70% PASS
I32 C64 I32 2^20 2^1 39.173 us 3.43% 39.100 us 2.70% -0.073 us -0.19% PASS
I32 C64 I32 2^24 2^1 460.417 us 0.91% 460.291 us 0.89% -0.126 us -0.03% PASS
I32 C64 I32 2^28 2^1 7.188 ms 1.09% 7.188 ms 1.09% -0.094 us -0.00% PASS
I32 C64 I32 2^16 2^4 10.886 us 9.82% 10.779 us 4.94% -0.108 us -0.99% PASS
I32 C64 I32 2^20 2^4 34.455 us 3.46% 34.322 us 2.57% -0.132 us -0.38% PASS
I32 C64 I32 2^24 2^4 328.380 us 0.80% 328.474 us 0.75% 0.094 us 0.03% PASS
I32 C64 I32 2^28 2^4 5.028 ms 0.63% 5.028 ms 0.63% -0.161 us -0.00% PASS
I32 C64 I32 2^16 2^8 10.717 us 7.15% 10.786 us 4.94% 0.070 us 0.65% PASS
I32 C64 I32 2^20 2^8 34.567 us 3.43% 34.630 us 2.41% 0.063 us 0.18% PASS
I32 C64 I32 2^24 2^8 270.641 us 1.06% 270.639 us 0.94% -0.003 us -0.00% PASS
I32 C64 I32 2^28 2^8 4.049 ms 1.06% 4.049 ms 1.06% -0.044 us -0.00% PASS
I64 I8 I32 2^16 2^1 11.726 us 8.56% 11.879 us 4.45% 0.153 us 1.30% PASS
I64 I8 I32 2^20 2^1 33.845 us 3.96% 33.894 us 2.92% 0.049 us 0.14% PASS
I64 I8 I32 2^24 2^1 378.014 us 1.28% 378.025 us 1.27% 0.011 us 0.00% PASS
I64 I8 I32 2^28 2^1 5.864 ms 0.79% 5.864 ms 0.79% 0.754 us 0.01% PASS
I64 I8 I32 2^16 2^4 11.500 us 8.48% 11.644 us 4.34% 0.144 us 1.25% PASS
I64 I8 I32 2^20 2^4 31.833 us 3.82% 31.780 us 2.19% -0.054 us -0.17% PASS
I64 I8 I32 2^24 2^4 285.777 us 0.92% 285.816 us 0.80% 0.039 us 0.01% PASS
I64 I8 I32 2^28 2^4 4.363 ms 0.54% 4.363 ms 0.54% 0.034 us 0.00% PASS
I64 I8 I32 2^16 2^8 11.472 us 8.41% 11.647 us 4.34% 0.175 us 1.53% PASS
I64 I8 I32 2^20 2^8 31.914 us 3.52% 31.888 us 2.03% -0.026 us -0.08% PASS
I64 I8 I32 2^24 2^8 254.414 us 0.76% 254.160 us 0.73% -0.254 us -0.10% PASS
I64 I8 I32 2^28 2^8 3.827 ms 0.65% 3.827 ms 0.66% 0.102 us 0.00% PASS
I64 I16 I32 2^16 2^1 11.842 us 8.80% 11.792 us 4.45% -0.050 us -0.42% PASS
I64 I16 I32 2^20 2^1 35.641 us 3.59% 35.745 us 2.41% 0.104 us 0.29% PASS
I64 I16 I32 2^24 2^1 408.040 us 1.05% 408.153 us 0.99% 0.114 us 0.03% PASS
I64 I16 I32 2^28 2^1 6.357 ms 0.91% 6.358 ms 0.90% 0.506 us 0.01% PASS
I64 I16 I32 2^16 2^4 11.566 us 7.91% 11.502 us 3.94% -0.063 us -0.55% PASS
I64 I16 I32 2^20 2^4 32.950 us 2.46% 32.965 us 2.19% 0.015 us 0.05% PASS
I64 I16 I32 2^24 2^4 305.845 us 0.91% 305.747 us 0.88% -0.098 us -0.03% PASS
I64 I16 I32 2^28 2^4 4.664 ms 0.55% 4.664 ms 0.55% 0.388 us 0.01% PASS
I64 I16 I32 2^16 2^8 11.451 us 8.37% 11.427 us 3.67% -0.024 us -0.21% PASS
I64 I16 I32 2^20 2^8 33.212 us 3.34% 33.245 us 2.11% 0.033 us 0.10% PASS
I64 I16 I32 2^24 2^8 266.854 us 0.93% 266.805 us 0.87% -0.049 us -0.02% PASS
I64 I16 I32 2^28 2^8 4.003 ms 0.73% 4.003 ms 0.74% -0.115 us -0.00% PASS
I64 I32 I32 2^16 2^1 12.066 us 8.29% 12.005 us 4.13% -0.061 us -0.51% PASS
I64 I32 I32 2^20 2^1 39.126 us 2.96% 39.066 us 1.85% -0.060 us -0.15% PASS
I64 I32 I32 2^24 2^1 472.637 us 0.89% 472.645 us 0.87% 0.007 us 0.00% PASS
I64 I32 I32 2^28 2^1 7.409 ms 0.98% 7.409 ms 0.98% 0.102 us 0.00% PASS
I64 I32 I32 2^16 2^4 11.939 us 9.10% 11.807 us 4.52% -0.131 us -1.10% PASS
I64 I32 I32 2^20 2^4 35.460 us 3.17% 35.480 us 1.78% 0.020 us 0.06% PASS
I64 I32 I32 2^24 2^4 339.933 us 0.85% 339.982 us 0.81% 0.049 us 0.01% PASS
I64 I32 I32 2^28 2^4 5.222 ms 0.56% 5.222 ms 0.56% 0.448 us 0.01% PASS
I64 I32 I32 2^16 2^8 11.928 us 8.82% 11.791 us 4.52% -0.137 us -1.15% PASS
I64 I32 I32 2^20 2^8 35.592 us 3.06% 35.552 us 1.70% -0.039 us -0.11% PASS
I64 I32 I32 2^24 2^8 289.294 us 0.96% 289.184 us 0.85% -0.110 us -0.04% PASS
I64 I32 I32 2^28 2^8 4.356 ms 0.84% 4.357 ms 0.84% 0.239 us 0.01% PASS
I64 I64 I32 2^16 2^1 12.593 us 8.01% 12.370 us 3.40% -0.223 us -1.77% PASS
I64 I64 I32 2^20 2^1 47.079 us 1.98% 47.051 us 1.55% -0.027 us -0.06% PASS
I64 I64 I32 2^24 2^1 603.788 us 0.76% 603.447 us 0.73% -0.340 us -0.06% PASS
I64 I64 I32 2^28 2^1 9.572 ms 1.02% 9.572 ms 1.01% -0.127 us -0.00% PASS
I64 I64 I32 2^16 2^4 12.305 us 8.19% 12.146 us 3.79% -0.159 us -1.29% PASS
I64 I64 I32 2^20 2^4 41.041 us 2.96% 40.933 us 1.64% -0.108 us -0.26% PASS
I64 I64 I32 2^24 2^4 430.291 us 0.70% 429.936 us 0.67% -0.355 us -0.08% PASS
I64 I64 I32 2^28 2^4 7.010 ms 0.50% 7.010 ms 0.50% -0.547 us -0.01% PASS
I64 I64 I32 2^16 2^8 12.217 us 8.10% 12.007 us 4.19% -0.210 us -1.72% PASS
I64 I64 I32 2^20 2^8 40.725 us 2.70% 40.599 us 1.59% -0.126 us -0.31% PASS
I64 I64 I32 2^24 2^8 360.955 us 0.86% 360.608 us 0.80% -0.347 us -0.10% PASS
I64 I64 I32 2^28 2^8 5.480 ms 0.94% 5.480 ms 0.94% -0.357 us -0.01% PASS
I64 I128 I32 2^16 2^1 13.588 us 7.20% 13.537 us 3.45% -0.050 us -0.37% PASS
I64 I128 I32 2^20 2^1 65.710 us 1.98% 65.592 us 1.54% -0.118 us -0.18% PASS
I64 I128 I32 2^24 2^1 871.973 us 0.58% 871.843 us 0.57% -0.130 us -0.01% PASS
I64 I128 I32 2^28 2^1 13.797 ms 0.98% 13.797 ms 0.99% 0.160 us 0.00% PASS
I64 I128 I32 2^16 2^4 13.296 us 5.17% 13.275 us 3.45% -0.021 us -0.16% PASS
I64 I128 I32 2^20 2^4 53.156 us 2.04% 53.060 us 1.79% -0.096 us -0.18% PASS
I64 I128 I32 2^24 2^4 634.515 us 0.51% 634.411 us 0.50% -0.104 us -0.02% PASS
I64 I128 I32 2^28 2^4 9.937 ms 0.57% 9.938 ms 0.57% 0.295 us 0.00% PASS
I64 I128 I32 2^16 2^8 13.232 us 5.48% 13.199 us 3.57% -0.033 us -0.25% PASS
I64 I128 I32 2^20 2^8 50.870 us 2.10% 50.770 us 1.78% -0.101 us -0.20% PASS
I64 I128 I32 2^24 2^8 520.137 us 0.66% 519.995 us 0.62% -0.142 us -0.03% PASS
I64 I128 I32 2^28 2^8 8.034 ms 0.91% 8.034 ms 0.91% -0.119 us -0.00% PASS
I64 F32 I32 2^16 2^1 12.295 us 8.11% 12.091 us 4.16% -0.204 us -1.66% PASS
I64 F32 I32 2^20 2^1 39.262 us 3.11% 39.103 us 1.83% -0.159 us -0.40% PASS
I64 F32 I32 2^24 2^1 472.882 us 0.90% 472.685 us 0.87% -0.197 us -0.04% PASS
I64 F32 I32 2^28 2^1 7.409 ms 0.97% 7.409 ms 0.97% 0.483 us 0.01% PASS
I64 F32 I32 2^16 2^4 12.021 us 8.58% 11.764 us 4.55% -0.257 us -2.14% PASS
I64 F32 I32 2^20 2^4 35.508 us 2.86% 35.427 us 1.77% -0.081 us -0.23% PASS
I64 F32 I32 2^24 2^4 339.982 us 0.84% 339.854 us 0.81% -0.128 us -0.04% PASS
I64 F32 I32 2^28 2^4 5.222 ms 0.56% 5.222 ms 0.56% -0.038 us -0.00% PASS
I64 F32 I32 2^16 2^8 11.963 us 7.53% 11.769 us 4.51% -0.194 us -1.62% PASS
I64 F32 I32 2^20 2^8 35.632 us 3.09% 35.551 us 1.70% -0.080 us -0.22% PASS
I64 F32 I32 2^24 2^8 289.253 us 0.91% 289.193 us 0.86% -0.059 us -0.02% PASS
I64 F32 I32 2^28 2^8 4.357 ms 0.83% 4.357 ms 0.83% -0.049 us -0.00% PASS
I64 F64 I32 2^16 2^1 12.649 us 8.47% 12.399 us 3.43% -0.250 us -1.98% PASS
I64 F64 I32 2^20 2^1 47.215 us 2.42% 46.982 us 1.54% -0.233 us -0.49% PASS
I64 F64 I32 2^24 2^1 603.693 us 0.77% 603.261 us 0.73% -0.431 us -0.07% PASS
I64 F64 I32 2^28 2^1 9.572 ms 1.01% 9.572 ms 1.02% -0.256 us -0.00% PASS
I64 F64 I32 2^16 2^4 12.272 us 8.64% 12.035 us 4.10% -0.238 us -1.94% PASS
I64 F64 I32 2^20 2^4 41.012 us 2.66% 40.839 us 1.66% -0.173 us -0.42% PASS
I64 F64 I32 2^24 2^4 430.192 us 0.67% 429.984 us 0.66% -0.208 us -0.05% PASS
I64 F64 I32 2^28 2^4 7.010 ms 0.50% 7.010 ms 0.50% -0.309 us -0.00% PASS
I64 F64 I32 2^16 2^8 12.242 us 5.09% 12.031 us 4.13% -0.212 us -1.73% PASS
I64 F64 I32 2^20 2^8 40.736 us 1.54% 40.554 us 1.58% -0.182 us -0.45% PASS
I64 F64 I32 2^24 2^8 360.983 us 0.87% 360.792 us 0.82% -0.191 us -0.05% PASS
I64 F64 I32 2^28 2^8 5.480 ms 0.94% 5.480 ms 0.95% -0.278 us -0.01% PASS
I64 C64 I32 2^16 2^1 12.413 us 5.44% 12.369 us 3.43% -0.043 us -0.35% PASS
I64 C64 I32 2^20 2^1 46.976 us 1.98% 47.022 us 1.55% 0.046 us 0.10% PASS
I64 C64 I32 2^24 2^1 603.250 us 0.73% 603.134 us 0.74% -0.116 us -0.02% PASS
I64 C64 I32 2^28 2^1 9.566 ms 1.02% 9.567 ms 1.02% 0.141 us 0.00% PASS
I64 C64 I32 2^16 2^4 12.112 us 5.64% 12.104 us 3.92% -0.008 us -0.06% PASS
I64 C64 I32 2^20 2^4 40.950 us 2.40% 40.843 us 1.64% -0.107 us -0.26% PASS
I64 C64 I32 2^24 2^4 430.001 us 0.68% 429.649 us 0.67% -0.352 us -0.08% PASS
I64 C64 I32 2^28 2^4 7.009 ms 0.50% 7.008 ms 0.50% -0.657 us -0.01% PASS
I64 C64 I32 2^16 2^8 12.230 us 7.47% 12.002 us 4.20% -0.228 us -1.86% PASS
I64 C64 I32 2^20 2^8 40.810 us 2.73% 40.561 us 1.56% -0.249 us -0.61% PASS
I64 C64 I32 2^24 2^8 360.732 us 0.85% 360.442 us 0.81% -0.290 us -0.08% PASS
I64 C64 I32 2^28 2^8 5.477 ms 0.95% 5.476 ms 0.95% -0.507 us -0.01% PASS
I128 I8 I32 2^16 2^1 13.633 us 6.62% 13.392 us 4.29% -0.241 us -1.77% PASS
I128 I8 I32 2^20 2^1 54.564 us 2.09% 54.259 us 1.75% -0.304 us -0.56% PASS
I128 I8 I32 2^24 2^1 705.091 us 0.75% 704.797 us 0.75% -0.294 us -0.04% PASS
I128 I8 I32 2^28 2^1 11.127 ms 0.63% 11.126 ms 0.63% -1.520 us -0.01% PASS
I128 I8 I32 2^16 2^4 13.460 us 6.81% 13.225 us 3.90% -0.235 us -1.75% PASS
I128 I8 I32 2^20 2^4 49.384 us 2.16% 49.174 us 1.44% -0.210 us -0.43% PASS
I128 I8 I32 2^24 2^4 572.106 us 0.50% 571.978 us 0.50% -0.128 us -0.02% PASS
I128 I8 I32 2^28 2^4 8.940 ms 0.50% 8.939 ms 0.50% -0.583 us -0.01% PASS
I128 I8 I32 2^16 2^8 13.394 us 5.63% 13.211 us 3.81% -0.183 us -1.37% PASS
I128 I8 I32 2^20 2^8 47.898 us 1.74% 47.719 us 1.33% -0.180 us -0.37% PASS
I128 I8 I32 2^24 2^8 529.110 us 0.50% 529.063 us 0.49% -0.047 us -0.01% PASS
I128 I8 I32 2^28 2^8 8.230 ms 0.50% 8.230 ms 0.50% -0.131 us -0.00% PASS
I128 I16 I32 2^16 2^1 13.392 us 5.18% 13.405 us 3.73% 0.013 us 0.10% PASS
I128 I16 I32 2^20 2^1 58.631 us 1.96% 58.568 us 1.75% -0.064 us -0.11% PASS
I128 I16 I32 2^24 2^1 774.709 us 0.64% 774.489 us 0.62% -0.219 us -0.03% PASS
I128 I16 I32 2^28 2^1 12.228 ms 0.51% 12.228 ms 0.50% -0.139 us -0.00% PASS
I128 I16 I32 2^16 2^4 13.323 us 5.37% 13.299 us 3.55% -0.024 us -0.18% PASS
I128 I16 I32 2^20 2^4 52.889 us 1.87% 52.817 us 1.52% -0.071 us -0.13% PASS
I128 I16 I32 2^24 2^4 633.794 us 0.50% 633.789 us 0.49% -0.006 us -0.00% PASS
I128 I16 I32 2^28 2^4 9.950 ms 0.50% 9.949 ms 0.50% -0.911 us -0.01% PASS
I128 I16 I32 2^16 2^8 13.261 us 5.28% 13.282 us 3.72% 0.022 us 0.16% PASS
I128 I16 I32 2^20 2^8 51.106 us 1.73% 51.061 us 1.41% -0.045 us -0.09% PASS
I128 I16 I32 2^24 2^8 586.935 us 0.50% 586.822 us 0.50% -0.113 us -0.02% PASS
I128 I16 I32 2^28 2^8 9.177 ms 0.50% 9.177 ms 0.50% 0.722 us 0.01% PASS
I128 I32 I32 2^16 2^1 13.541 us 5.19% 13.587 us 3.67% 0.046 us 0.34% PASS
I128 I32 I32 2^20 2^1 59.674 us 1.81% 59.702 us 1.70% 0.029 us 0.05% PASS
I128 I32 I32 2^24 2^1 792.652 us 0.79% 792.422 us 0.79% -0.230 us -0.03% PASS
I128 I32 I32 2^28 2^1 12.515 ms 0.89% 12.515 ms 0.89% -0.262 us -0.00% PASS
I128 I32 I32 2^16 2^4 13.544 us 4.59% 13.460 us 3.36% -0.083 us -0.61% PASS
I128 I32 I32 2^20 2^4 52.798 us 1.94% 52.713 us 1.46% -0.085 us -0.16% PASS
I128 I32 I32 2^24 2^4 614.256 us 0.52% 614.061 us 0.51% -0.195 us -0.03% PASS
I128 I32 I32 2^28 2^4 9.628 ms 0.50% 9.627 ms 0.50% -0.732 us -0.01% PASS
I128 I32 I32 2^16 2^8 13.566 us 5.57% 13.472 us 3.30% -0.094 us -0.70% PASS
I128 I32 I32 2^20 2^8 50.969 us 2.10% 50.848 us 1.33% -0.121 us -0.24% PASS
I128 I32 I32 2^24 2^8 551.864 us 0.50% 551.633 us 0.50% -0.230 us -0.04% PASS
I128 I32 I32 2^28 2^8 8.595 ms 0.50% 8.595 ms 0.50% 0.108 us 0.00% PASS
I128 I64 I32 2^16 2^1 14.385 us 6.99% 14.198 us 4.25% -0.188 us -1.30% PASS
I128 I64 I32 2^20 2^1 66.787 us 2.01% 66.656 us 1.39% -0.131 us -0.20% PASS
I128 I64 I32 2^24 2^1 912.408 us 0.69% 912.340 us 0.67% -0.068 us -0.01% PASS
I128 I64 I32 2^28 2^1 14.465 ms 0.97% 14.464 ms 0.97% -0.966 us -0.01% PASS
I128 I64 I32 2^16 2^4 14.257 us 7.55% 14.116 us 3.92% -0.142 us -0.99% PASS
I128 I64 I32 2^20 2^4 57.418 us 2.09% 57.320 us 1.33% -0.098 us -0.17% PASS
I128 I64 I32 2^24 2^4 683.147 us 0.55% 683.120 us 0.54% -0.028 us -0.00% PASS
I128 I64 I32 2^28 2^4 10.705 ms 0.50% 10.705 ms 0.50% 0.507 us 0.00% PASS
I128 I64 I32 2^16 2^8 14.201 us 7.76% 14.177 us 3.70% -0.024 us -0.17% PASS
I128 I64 I32 2^20 2^8 54.708 us 1.96% 54.724 us 1.26% 0.016 us 0.03% PASS
I128 I64 I32 2^24 2^8 592.811 us 0.52% 592.880 us 0.50% 0.069 us 0.01% PASS
I128 I64 I32 2^28 2^8 9.244 ms 0.59% 9.244 ms 0.59% -0.332 us -0.00% PASS
I128 I128 I32 2^16 2^1 15.448 us 7.01% 15.415 us 4.09% -0.033 us -0.21% PASS
I128 I128 I32 2^20 2^1 102.569 us 1.15% 102.661 us 0.89% 0.091 us 0.09% PASS
I128 I128 I32 2^24 2^1 1.520 ms 0.59% 1.519 ms 0.58% -0.309 us -0.02% PASS
I128 I128 I32 2^28 2^1 24.245 ms 1.32% 24.245 ms 1.31% -0.073 us -0.00% PASS
I128 I128 I32 2^16 2^4 15.300 us 7.47% 15.103 us 4.15% -0.197 us -1.29% PASS
I128 I128 I32 2^20 2^4 78.355 us 1.58% 78.204 us 1.22% -0.150 us -0.19% PASS
I128 I128 I32 2^24 2^4 1.066 ms 0.43% 1.066 ms 0.45% -0.059 us -0.01% PASS
I128 I128 I32 2^28 2^4 16.865 ms 0.50% 16.863 ms 0.50% -2.140 us -0.01% PASS
I128 I128 I32 2^16 2^8 15.221 us 6.23% 15.090 us 4.05% -0.130 us -0.86% PASS
I128 I128 I32 2^20 2^8 75.527 us 1.26% 75.475 us 1.13% -0.052 us -0.07% PASS
I128 I128 I32 2^24 2^8 996.000 us 0.50% 995.998 us 0.50% -0.002 us -0.00% PASS
I128 I128 I32 2^28 2^8 15.718 ms 0.50% 15.715 ms 0.50% -2.617 us -0.02% PASS
I128 F32 I32 2^16 2^1 13.666 us 7.69% 13.687 us 3.96% 0.021 us 0.16% PASS
I128 F32 I32 2^20 2^1 59.721 us 2.20% 59.737 us 1.71% 0.016 us 0.03% PASS
I128 F32 I32 2^24 2^1 792.703 us 0.81% 792.627 us 0.78% -0.076 us -0.01% PASS
I128 F32 I32 2^28 2^1 12.515 ms 0.89% 12.515 ms 0.89% 0.266 us 0.00% PASS
I128 F32 I32 2^16 2^4 13.544 us 6.39% 13.579 us 3.53% 0.035 us 0.26% PASS
I128 F32 I32 2^20 2^4 52.762 us 2.12% 52.804 us 1.47% 0.042 us 0.08% PASS
I128 F32 I32 2^24 2^4 614.157 us 0.52% 614.215 us 0.51% 0.058 us 0.01% PASS
I128 F32 I32 2^28 2^4 9.628 ms 0.50% 9.628 ms 0.50% -0.259 us -0.00% PASS
I128 F32 I32 2^16 2^8 13.535 us 7.43% 13.624 us 7.63% 0.090 us 0.66% PASS
I128 F32 I32 2^20 2^8 50.886 us 1.98% 51.003 us 2.15% 0.117 us 0.23% PASS
I128 F32 I32 2^24 2^8 551.769 us 0.50% 551.931 us 0.50% 0.162 us 0.03% PASS
I128 F32 I32 2^28 2^8 8.594 ms 0.50% 8.595 ms 0.50% 0.344 us 0.00% PASS
I128 F64 I32 2^16 2^1 14.322 us 7.77% 14.373 us 7.53% 0.051 us 0.36% PASS
I128 F64 I32 2^20 2^1 66.735 us 1.94% 66.822 us 2.05% 0.087 us 0.13% PASS
I128 F64 I32 2^24 2^1 912.663 us 0.68% 912.298 us 0.68% -0.364 us -0.04% PASS
I128 F64 I32 2^28 2^1 14.466 ms 0.98% 14.465 ms 0.97% -0.915 us -0.01% PASS
I128 F64 I32 2^16 2^4 14.264 us 5.70% 14.214 us 7.74% -0.050 us -0.35% PASS
I128 F64 I32 2^20 2^4 57.487 us 1.73% 57.401 us 1.94% -0.086 us -0.15% PASS
I128 F64 I32 2^24 2^4 683.412 us 0.56% 683.067 us 0.53% -0.345 us -0.05% PASS
I128 F64 I32 2^28 2^4 10.705 ms 0.50% 10.706 ms 0.50% 0.359 us 0.00% PASS
I128 F64 I32 2^16 2^8 14.288 us 7.56% 14.186 us 7.33% -0.102 us -0.71% PASS
I128 F64 I32 2^20 2^8 54.785 us 1.90% 54.687 us 1.94% -0.098 us -0.18% PASS
I128 F64 I32 2^24 2^8 592.863 us 0.50% 592.805 us 0.50% -0.058 us -0.01% PASS
I128 F64 I32 2^28 2^8 9.244 ms 0.59% 9.244 ms 0.59% -0.206 us -0.00% PASS
I128 C64 I32 2^16 2^1 14.256 us 6.11% 14.238 us 7.63% -0.018 us -0.13% PASS
I128 C64 I32 2^20 2^1 66.694 us 1.98% 66.645 us 2.12% -0.049 us -0.07% PASS
I128 C64 I32 2^24 2^1 912.091 us 0.69% 911.854 us 0.69% -0.238 us -0.03% PASS
I128 C64 I32 2^28 2^1 14.460 ms 0.97% 14.460 ms 0.97% -0.071 us -0.00% PASS
I128 C64 I32 2^16 2^4 14.180 us 7.23% 14.139 us 7.59% -0.041 us -0.29% PASS
I128 C64 I32 2^20 2^4 57.231 us 2.06% 57.212 us 1.96% -0.019 us -0.03% PASS
I128 C64 I32 2^24 2^4 681.216 us 0.56% 681.339 us 0.55% 0.123 us 0.02% PASS
I128 C64 I32 2^28 2^4 10.672 ms 0.50% 10.672 ms 0.50% 0.823 us 0.01% PASS
I128 C64 I32 2^16 2^8 14.098 us 7.92% 14.205 us 7.31% 0.107 us 0.76% PASS
I128 C64 I32 2^20 2^8 54.396 us 1.97% 54.518 us 2.08% 0.122 us 0.22% PASS
I128 C64 I32 2^24 2^8 589.224 us 0.51% 589.914 us 0.50% 0.689 us 0.12% PASS
I128 C64 I32 2^28 2^8 9.189 ms 0.60% 9.189 ms 0.60% -0.023 us -0.00% PASS
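For readers checking individual rows, here is a minimal sketch of how the last three columns appear to relate to the timings. It assumes the numeric columns are, in order: reference time, reference noise, compared time, compared noise, absolute difference, relative difference, status. The pass rule used here (relative difference no larger than the smaller of the two noise values) is an assumption that is consistent with the rows above, not a statement of what the comparison tool actually implements. The function name `compare_row` is hypothetical.

```python
def compare_row(ref_time, cmp_time, ref_noise_pct, cmp_noise_pct):
    """Recompute the difference columns for one benchmark row.

    Times must share a unit (e.g. both in microseconds); noise values are
    percentages. The PASS/FAIL threshold is an assumed rule, see the note above.
    """
    diff = cmp_time - ref_time                 # negative means the new build is faster
    pct_diff = 100.0 * diff / ref_time         # relative change versus the reference
    threshold = min(ref_noise_pct, cmp_noise_pct)
    status = "PASS" if abs(pct_diff) <= threshold else "FAIL"
    return diff, pct_diff, status


# Row "I64 C64 I32 2^16 2^8": 12.230 us (7.47%) vs 12.002 us (4.20%)
diff, pct_diff, status = compare_row(12.230, 12.002, 7.47, 4.20)
print(f"{diff:.3f} us  {pct_diff:.2f}%  {status}")
```

The printed times in the table are rounded, so recomputed values can differ from the listed ones in the last digit.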

cub/test/c2h/custom_type.cuh Outdated Show resolved Hide resolved
@elstehle elstehle merged commit 6ba3291 into NVIDIA:main Dec 15, 2023
537 checks passed

Successfully merging this pull request may close these issues.

[BUG]: Unique by key doesn't use allocated vsmem