weighted sampling #18

gzt · 2019-03-12T01:05:45Z

I ran into this thread (http://r.789695.n4.nabble.com/Bias-in-R-s-random-integers-td4752563.html) and saw your discussion of the sample feature. I have a suggestion if you're still thinking of implementing it: you could use stochastic acceptance to perform weighted sampling in this framework. See here for an example: https://jbn.github.io/fast_proportional_selection/ It has the overhead of having to find the maximum weight and burns at least one more call to runif(), but you might provide an optional max-weight argument to spare the need for searching. This can also be done so that the weights don't have to add up to 1 (i.e., sparing the user the need to sum the weights and divide by the total). When I have time I might be able to submit a pull request, if you are interested and if I don't forget, as I have a use for something similar.
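For readers unfamiliar with the technique, the suggestion above can be sketched roughly as follows. This is a minimal Python illustration, not dqrng code: the function name `stochastic_acceptance` and the `max_weight` and `rng` parameters are made up for this example. It assumes non-negative weights that need not sum to 1.

```python
import random

def stochastic_acceptance(weights, max_weight=None, rng=random):
    """Return index i with probability weights[i] / sum(weights).

    Hypothetical helper for illustration only. `max_weight` is optional:
    if the caller already knows the maximum, the O(n) scan is skipped.
    """
    n = len(weights)
    if n == 0:
        raise ValueError("weights must not be empty")
    if max_weight is None:
        max_weight = max(weights)  # the search the optional argument avoids
    if max_weight <= 0:
        raise ValueError("at least one weight must be positive")
    while True:
        i = rng.randrange(n)  # uniform candidate index
        # Accept candidate i with probability weights[i] / max_weight;
        # this is the extra uniform draw mentioned above.
        if rng.random() * max_weight < weights[i]:
            return i

rng = random.Random(1)
draws = [stochastic_acceptance([1.0, 0.0, 3.0], rng=rng) for _ in range(10)]
```

A supplied `max_weight` larger than the true maximum only lowers the acceptance rate; a value smaller than the true maximum would bias the result, so the caller has to get it right.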

rstub · 2019-03-12T11:10:48Z

Thanks for the suggestion! I am still interested in adding sample functionality to dqrng. However, I have no immediate need for it, so I cannot give a time frame for it. A PR would be welcome.

Some general comments:

Additional calls to the RNG shouldn't be too bad, since the RNGs used in dqrng are pretty fast. In addition, there already is code for generating an int-float pair with a single RNG call, which could be made more general. Of course, that works only up to 11 bits of integer precision.
The Python example has the same bias that started that thread on R-devel: i = int(n * random.random()). One should use random.randint(0, n - 1) instead. BTW, the bias in R has been fixed (as well as possible) for the upcoming R 3.6.0.
I think it would be best to have at least BisectionSearch and StochasticAcceptance available at the C++ level. I am not sure yet how to design the R interface. Maybe give the user the ability to select between the different algorithms.
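To make the bias in the second comment concrete, here is a small deterministic sketch in Python. It shrinks the problem to a toy RNG that returns only `2**bits` equally likely values and counts where `floor(n * u)` lands; `bucket_counts` is a hypothetical helper written for this illustration.

```python
from collections import Counter

def bucket_counts(n, bits):
    """Count how often floor(n * u) hits each index when u takes each of the
    2**bits equally likely values k / 2**bits exactly once."""
    total = 2 ** bits
    # (n * k) // total is floor(n * (k / total)) in exact integer arithmetic.
    return Counter((n * k) // total for k in range(total))

counts = bucket_counts(n=3, bits=3)
print(sorted(counts.items()))  # [(0, 3), (1, 3), (2, 2)]
```

With 8 possible inputs and 3 buckets, one bucket necessarily gets fewer hits. The same pigeonhole argument applies to a 53-bit float whenever n is not a power of two, although the imbalance is then tiny for small n. Integer routines such as `random.randrange(n)` avoid this by rejecting and redrawing instead of scaling a float.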

rstub · 2019-03-12T14:57:27Z

BTW, in the original paper the timings look much better for the StochasticAcceptance algorithm, which is faster than BisectionSearch even without the (IMO) artificial change in weights. I guess that for the Python examples the bisection algorithm is implemented in compiled code, while the other two are implemented in Python.
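For comparison, a bisection-search sampler can be sketched in a few lines of Python. This is an illustration under the assumption of non-negative weights, not the code from the blog post or the paper; `make_bisection_sampler` is a made-up name. It also hints at why the Python timings may be skewed: the search itself runs inside the compiled `bisect` module.

```python
import bisect
import itertools
import random

def make_bisection_sampler(weights, rng=random):
    """Return a function that draws index i with probability
    weights[i] / sum(weights), using binary search on cumulative sums."""
    cumulative = list(itertools.accumulate(weights))  # O(n) setup, done once
    if not cumulative or cumulative[-1] <= 0:
        raise ValueError("weights must contain a positive entry")
    total = cumulative[-1]
    last = len(cumulative) - 1

    def sample():
        u = rng.random() * total  # uniform in [0, total)
        # First index whose cumulative sum exceeds u: O(log n) per draw.
        # min() is a guard against floating-point edge cases.
        return min(bisect.bisect_right(cumulative, u), last)

    return sample

sample = make_bisection_sampler([1.0, 0.0, 3.0], rng=random.Random(1))
draws = [sample() for _ in range(10)]
```

Each draw costs one RNG call plus a logarithmic search, whereas stochastic acceptance needs no setup beyond the maximum weight but may reject many candidates when a few weights dominate.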

gzt · 2019-03-12T17:57:24Z

All right, if I get around to doing this (not likely to be soon), I'll see about putting together a PR - I have something I'm working on that could use something similar (in C) and it might be easy to get it into a form compatible with this project.

I'll drop a note to the author of that post - it does the right thing in several other places.

rstub · 2022-12-29T22:47:52Z

I am finally working on this: https://stubner.me/2022/12/roulette-wheel-selection-for-dqrng/. It looks promising.

rstub · 2023-08-30T10:54:59Z

As noted in #52, there are some more things I need to consider w.r.t. weighted sampling. I will have to back out that code for now in order to release the other changes that have accumulated.

rstub added the enhancement New feature or request label Mar 13, 2019

rstub mentioned this issue Mar 14, 2019

unweighted sampling #21

Closed

rstub mentioned this issue Dec 29, 2022

'prob' argument for dqrng::dqsample #45

Open

rstub mentioned this issue Dec 30, 2022

Feature/weighted sampling #47

Merged

rstub closed this as completed in #47 Jul 30, 2023

rstub mentioned this issue Aug 4, 2023

dqsample.int with arg prob sinks performance #52

Open

rstub reopened this Aug 30, 2023

rstub linked a pull request Oct 7, 2023 that will close this issue

Implement weighted sampling #72

Open

5 tasks
