Use the uniform distribution by default #329
Hi,
The behavior of […]
It is also easy to visualize different distributions using the histogram API. I used the following Lua script:
function thread_init()
  h = sysbench.histogram.new(1000, 1, 10)
end
function event()
  h:update(sysbench.rand.default(1, 10))
end
function thread_done()
  h:print()
end
These are the results that I got:
$ sysbench /tmp/random.lua --time=1 --rand-type=zipfian --verbosity=0 run
value ------------- distribution ------------- count
1.000 |**************************************** 690507
2.001 |*********************** 396307
3.002 |***************** 286235
3.996 |************* 228276
4.997 |*********** 190612
5.995 |********** 164206
6.996 |******** 145487
7.997 |******** 130939
8.994 |******* 118629
10.000 |****** 109564
$ sysbench /tmp/random.lua --time=1 --rand-type=pareto --verbosity=0 run
value ------------- distribution ------------- count
1.000 |**************************************** 2027752
2.001 |**** 204580
3.002 |*** 128894
3.996 |** 95608
4.997 |** 76857
5.995 |* 65191
6.996 |* 55967
7.997 |* 49721
8.994 |* 44620
10.000 |* 40663
$ sysbench /tmp/random.lua --time=1 --rand-type=special --verbosity=0 run
value ------------- distribution ------------- count
1.000 | 1
2.001 | 186
3.002 | 16526
3.996 |**** 173084
4.997 |*************************** 1106586
5.995 |**************************************** 1658067
6.996 | 16528
7.997 | 165
$ sysbench /tmp/random.lua --time=1 --rand-type=zipfian --rand-zipfian-exp=0 --verbosity=0 run
value ------------- distribution ------------- count
1.000 |**************************************** 239561
2.001 |**************************************** 239582
3.002 |**************************************** 239817
3.996 |**************************************** 239263
4.997 |**************************************** 239868
5.995 |**************************************** 239665
6.996 |**************************************** 239625
7.997 |**************************************** 239775
8.994 |**************************************** 239745
10.000 |**************************************** 239116
With all that in mind, I'm a little confused: what exactly is being requested in this issue?
Hi, thanks for the quick answer.
@theTibi Well, saying that a non-uniform random number is not "truly random" would be wrong and misleading, in my opinion. The term "probability distribution" is scientifically correct, and that's precisely what I use in the sysbench docs. Which distribution people actually expect by default is another question. I asked that question explicitly in the previously mentioned blog post and got only one response, saying that uniform is preferable. I'm fine with leaving this issue as a feature request to make the uniform distribution the new default in the next major release. But I'm going to change the title to make the request more explicit and less confusing in the changelog.
A simple change to the default is enough to switch from SPECIAL to UNIFORM and provide a uniform random distribution by default:
index 0539148..a9a43f4 100644
--- a/src/sb_rand.c
+++ b/src/sb_rand.c
@@ -67,7 +67,7 @@ static sb_arg_t rand_args[] =
{
SB_OPT("rand-type",
"random numbers distribution {uniform, gaussian, special, pareto, "
- "zipfian} to use by default", "special", STRING),
+ "zipfian} to use by default", "uniform", STRING),
SB_OPT("rand-seed",
"seed for random number generator. When 0, the current time is "
"used as an RNG seed.", "0", INT),
Hi,
I was running a series of tests and noticed that the get_id() function is not really random. It should generate numbers between 1 and the table size; in my test the table size was 1000, so it should generate random numbers between 1 and 1000.
To keep it simple, I used only one function, execute_index_updates, and just printed the ids:
print(id)
I logged the output to a file:
13106+12901+12879+12726+12636+12632+12604=89484
There are 142777 lines in the file, and only 8 numbers are responsible for 89484 of them, which is more than 60% of all the lines. So basically, when I run MySQL benchmarks, sysbench creates hotspots in the workload.
By digging into the code a bit:
I retested using --rand-type=uniform and was able to generate real random numbers:
I also noticed that sometimes the get_id() function does not produce any number, and sometimes it produces numbers bigger than 1000, which is very weird.
In the log file I could see lines like this:
So it looks like something is going wrong under the hood.
If this is a feature for testing hotspots, it should be clearly documented, but I would recommend changing get_id from default to uniform, because I think most people do not realize that default will generate hotspots in their tests, and this could make many tests give misleading results.