A mail I originally wrote to Maciej Bendkowski, who asked about a note on this that I had written on my personal homepage. I wanted to take some time to tidy it up and expand it, but for now I find it okay to publish it here, at least for whoever is interested.
I believe that Boltzmann samplers perform too many rejections, so that even though their complexity is linear (FEAT isn't quite linear, I think), the constant factor of that complexity is too large.
The cost of computing the oracle shouldn't factor into it, because it is done only once (though if it did, it would only make Boltzmann generators look worse). For the record, I have only implemented a naive Newton's algorithm.
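To make "the oracle" concrete, here is a toy version of that computation for the binary trees of the benchmark below, assuming a size notion that counts every constructor, so that the generating function satisfies B(z) = z + z * B(z)^2. This is only an illustrative sketch (the name oracle is mine), not the code used in the benchmark:

```haskell
-- Toy oracle: solve the fixed-point equation B(z) = z + z * B(z)^2 for a
-- parameter z at or below the singularity, by Newton iteration on
-- f(y) = y - z - z*y^2.  This happens once, before any generation.
oracle :: Double -> Double
oracle z = go 0
  where
    f  y = y - z - z * y * y
    f' y = 1 - 2 * z * y
    go y
      | abs (f y) < 1e-12 = y                    -- converged
      | otherwise         = go (y - f y / f' y)  -- Newton step
```

With this size convention the singularity is at z = 1/2, where B(1/2) = 1; that is what makes each constructor of the singular sampler a single fair coin flip.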
My benchmark is simple: generate binary trees.
https://github.com/Lysxia/generic-random/blob/dev/bench/binaryTree.hs
The interesting entries are "handwritten1" (gen1), a handwritten Boltzmann sampler (singular ceiled rejection sampler), and "feat" (genFeat). The "handwritten1" generator exploits the fact that binary trees have a particularly convenient Boltzmann sampler, needing a single random bit for each constructor; handwriting it gets rid of any overhead that a generic implementation may incur. For completeness, generators obtained by generic-random are also included.
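For reference, here is a minimal sketch of a sampler in that style (using System.Random for simplicity; this is not the actual gen1 from the linked benchmark, and it uses the same size convention as above, every constructor counting 1): each constructor is decided by one fair coin flip, an attempt is aborted as soon as the running size exceeds the ceiling (1+e)*n, and attempts are retried until the size also reaches the floor (1-e)*n.

```haskell
import System.Random (randomIO)

data Tree = Leaf | Node Tree Tree
  deriving Show

-- One attempt with a ceiling: every constructor costs a single fair coin
-- flip, and we give up (Nothing) as soon as the running size would exceed
-- the ceiling.
attempt :: Int -> Int -> IO (Maybe (Tree, Int))
attempt cap size
  | size >= cap = pure Nothing
  | otherwise = do
      isLeaf <- randomIO :: IO Bool
      if isLeaf
        then pure (Just (Leaf, size + 1))
        else do
          mleft <- attempt cap (size + 1)
          case mleft of
            Nothing -> pure Nothing
            Just (l, size') -> do
              mright <- attempt cap size'
              case mright of
                Nothing -> pure Nothing
                Just (r, size'') -> pure (Just (Node l r, size''))

-- Ceiled rejection: retry until the size lands in [(1-e)n, (1+e)n].
sampleTree :: Double -> Int -> IO Tree
sampleTree e n = loop
  where
    lo = floor   (fromIntegral n * (1 - e)) :: Int
    hi = ceiling (fromIntegral n * (1 + e)) :: Int
    loop = do
      res <- attempt hi 0
      case res of
        Just (t, size) | size >= lo -> pure t
        _ -> loop
```

Usage: sampleTree 0.1 4096 asks for a tree whose size is within 10% of 4096.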
I run this for n = powers of 4, up to 4096.
Even though I am in a way cheating (by comparing against a handwritten sampler), FEAT remains 20 to 30 times faster at sizes below 1000; "handwritten1" only starts to get competitive around size 4000.
I must admit I am not 100% confident about those results and my methodology. I am also unsure about what happens with more complex types. But there is also a theoretical argument that Boltzmann
samplers start with a handicap.
Indeed, the original paper (Duchon, Flajolet, Louchard, Schaeffer) gives a precise estimate of the cost of rejection: by Theorem 7.3, the cumulated size of the objects rejected by the singular rejection sampler is asymptotically equivalent to n * (sqrt(1-e) + sqrt(1+e)) / e, where the tolerance parameter e means that we want to generate an object whose size falls in the interval [(1-e)n, (1+e)n]. For e = 1/10 (which is the parameter I used), we get a total rejected size of around 20 * n, i.e., the average work of a Boltzmann generator is multiplied by about 20 compared with an ideal generator.
Using a very loose tolerance helps, e = 1/2 for instance, and arguably this might be sufficient for random testing purposes. But a generic library still has quite some overhead of its own on top of that, so there seems to be a lot of work left before it even starts to compete with FEAT, which, by the way, achieves e = 0 (it draws objects of exactly the requested size).
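For concreteness, here is the overhead factor from that formula at the two tolerances discussed above (a quick check of the arithmetic, not part of the benchmark; the name overhead is mine):

```haskell
-- Cumulated size of rejected objects, divided by n (Theorem 7.3).
overhead :: Double -> Double
overhead e = (sqrt (1 - e) + sqrt (1 + e)) / e

-- ghci> overhead 0.1
-- 19.97...
-- ghci> overhead 0.5
-- 3.86...
```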