Skip to content

Conversation

@sleeepyjack
Copy link
Collaborator

Fixes #532

@sleeepyjack sleeepyjack added type: bug Something isn't working P0: Must have Critical feature or bug fix Needs Review Awaiting reviews before merging labels Oct 30, 2024
@sleeepyjack sleeepyjack added P1: Should have Necessary but not critical and removed P0: Must have Critical feature or bug fix labels Oct 30, 2024
@sleeepyjack
Copy link
Collaborator Author

sleeepyjack commented Oct 30, 2024

With this PR, benchmarks with uniform distribution and a multiplicity of 1 are slightly faster:

['dev.json', 'pr.json']
# static_set_insert_uniform_multiplicity

## [0] NVIDIA L40S

|  Key  |  Distribution  |  NumInputs  |  Occupancy  |  Multiplicity  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |         Diff |   %Diff |  Status  |
|-------|----------------|-------------|-------------|----------------|------------|-------------|------------|-------------|--------------|---------|----------|
|  I32  |    UNIFORM     |  100000000  |     0.5     |       1        |  44.668 ms |       0.09% |  38.967 ms |       0.18% | -5701.014 us | -12.76% |   FAIL   |
|  I32  |    UNIFORM     |  100000000  |     0.5     |       2        |  32.507 ms |       0.14% |  32.398 ms |       0.07% |  -109.818 us |  -0.34% |   FAIL   |
|  I32  |    UNIFORM     |  100000000  |     0.5     |       4        |  24.575 ms |       0.23% |  24.149 ms |       0.30% |  -426.423 us |  -1.74% |   FAIL   |
|  I32  |    UNIFORM     |  100000000  |     0.5     |       8        |  20.208 ms |       0.37% |  19.801 ms |       0.12% |  -407.334 us |  -2.02% |   FAIL   |
|  I32  |    UNIFORM     |  100000000  |     0.5     |       16       |  17.302 ms |       0.14% |  17.279 ms |       0.12% |   -22.946 us |  -0.13% |   FAIL   |
|  I64  |    UNIFORM     |  100000000  |     0.5     |       1        |  47.367 ms |       0.09% |  41.614 ms |       0.16% | -5752.903 us | -12.15% |   FAIL   |
|  I64  |    UNIFORM     |  100000000  |     0.5     |       2        |  34.789 ms |       0.21% |  34.482 ms |       0.24% |  -307.202 us |  -0.88% |   FAIL   |
|  I64  |    UNIFORM     |  100000000  |     0.5     |       4        |  26.511 ms |       0.42% |  25.955 ms |       0.37% |  -555.892 us |  -2.10% |   FAIL   |
|  I64  |    UNIFORM     |  100000000  |     0.5     |       8        |  21.807 ms |       0.47% |  21.381 ms |       0.54% |  -426.110 us |  -1.95% |   FAIL   |
|  I64  |    UNIFORM     |  100000000  |     0.5     |       16       |  18.584 ms |       1.05% |  18.637 ms |       1.17% |    52.149 us |   0.28% |   PASS   |

@sleeepyjack
Copy link
Collaborator Author

Can confirm that this PR actually fixes the issues described in #532:

N=10
  M_EXPECTED=1
  M_MEASURED=1
  CONSECUTIVE_SAMPLES=[ 6 0 4 8 3 7 1 5 9 3 ]

N=10
  M_EXPECTED=5
  M_MEASURED=5
  CONSECUTIVE_SAMPLES=[ 1 0 1 0 1 1 0 1 0 1 ]

N=10
  M_EXPECTED=10
  M_MEASURED=10
  CONSECUTIVE_SAMPLES=[ 0 0 0 0 0 0 0 0 0 0 ]

N=100
  M_EXPECTED=1
  M_MEASURED=1
  CONSECUTIVE_SAMPLES=[ 14 56 97 39 80 22 63 5 46 88 ]

N=100
  M_EXPECTED=5
  M_MEASURED=5
  CONSECUTIVE_SAMPLES=[ 10 18 7 15 3 12 0 8 16 5 ]

N=100
  M_EXPECTED=10
  M_MEASURED=10
  CONSECUTIVE_SAMPLES=[ 6 0 4 8 2 7 1 5 9 3 ]

N=1000
  M_EXPECTED=1
  M_MEASURED=1
  CONSECUTIVE_SAMPLES=[ 444 859 274 689 104 519 934 349 764 179 ]

N=1000
  M_EXPECTED=5
  M_MEASURED=5
  CONSECUTIVE_SAMPLES=[ 131 14 97 180 64 147 30 113 196 79 ]

N=1000
  M_EXPECTED=10
  M_MEASURED=10
  CONSECUTIVE_SAMPLES=[ 70 11 53 95 36 78 19 61 2 44 ]

N=10000
  M_EXPECTED=1
  M_MEASURED=1
  CONSECUTIVE_SAMPLES=[ 5987 138 4289 8440 2591 6742 893 5044 9194 3345 ]

N=10000
  M_EXPECTED=5
  M_MEASURED=5
  CONSECUTIVE_SAMPLES=[ 1583 413 1243 74 904 1734 564 1394 224 1055 ]

N=10000
  M_EXPECTED=10
  M_MEASURED=10
  CONSECUTIVE_SAMPLES=[ 449 864 279 694 109 524 940 355 770 185 ]

N=100000
  M_EXPECTED=1
  M_MEASURED=1
  CONSECUTIVE_SAMPLES=[ 68463 9972 51480 92989 34498 76006 17515 59024 532 42041 ]

N=100000
  M_EXPECTED=5
  M_MEASURED=5
  CONSECUTIVE_SAMPLES=[ 6140 14442 2744 11045 19347 7649 15950 4252 12554 856 ]

N=100000
  M_EXPECTED=10
  M_MEASURED=10
  CONSECUTIVE_SAMPLES=[ 7722 1873 6023 174 4325 8476 2627 6778 929 5079 ]

N=1000000
  M_EXPECTED=1
  M_MEASURED=1
  CONSECUTIVE_SAMPLES=[ 86976 502062 917148 332235 747321 162408 577494 992581 407667 822753 ]

N=1000000
  M_EXPECTED=5
  M_MEASURED=5
  CONSECUTIVE_SAMPLES=[ 122884 5902 88919 171936 54954 137971 20988 104006 187023 70040 ]

N=1000000
  M_EXPECTED=10
  M_MEASURED=10
  CONSECUTIVE_SAMPLES=[ 10231 51740 93248 34757 76265 17774 59283 791 42300 83809 ]

auto num_unique_keys = cuda::std::max<size_t>(
1ull,
static_cast<size_t>(
cuda::std::ceil(static_cast<double>(num_) / static_cast<double>(dist_.value))));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can probably remove

__host__ __device__ constexpr T int_div_ceil(T dividend, U divisor) noexcept

and use cuda::std::ceil instead

@sleeepyjack sleeepyjack merged commit 69817e2 into NVIDIA:dev Oct 31, 2024
@sleeepyjack sleeepyjack deleted the fix/uniform-keygen branch October 31, 2024 00:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Needs Review Awaiting reviews before merging P1: Should have Necessary but not critical type: bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG]: Random key generator does not produce keys according to input parameters

2 participants