generic-random is slower than testing-feat #6

Closed
Lysxia opened this Issue Dec 23, 2016 · 1 comment

Lysxia commented Dec 23, 2016

This is an email I originally wrote to Maciej Bendkowski, who asked about a note on this topic on my personal homepage. I wanted to take some time to tidy it up and expand it, but I now find it worth publishing here at least, for whoever is interested.


I believe that Boltzmann samplers perform too many rejections, so that even though their complexity is linear (FEAT's isn't quite linear, I think), the constant factor of that complexity is too large.

The cost of computing the oracle shouldn't factor into this, because it is done only once (and if it did, it would only make Boltzmann generators look worse). For the record, I have only implemented a naive Newton iteration.
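
For illustration, here is a minimal sketch of what I mean by a naive Newton iteration, specialized to the binary-tree oracle B(x) = 1 + x * B(x)^2 (this is just a sketch, not the library's actual code):

```haskell
-- Sketch of a naive Newton iteration for the binary-tree oracle:
-- solve B = 1 + x * B^2 for B at a fixed parameter 0 <= x <= 1/4,
-- i.e. find the smallest root of f(B) = x * B^2 - B + 1.
oracle :: Double -> Double
oracle x = go 1
  where
    f  b = x * b * b - b + 1
    f' b = 2 * x * b - 1
    go b
      | abs (f b) < 1e-12 = b
      | otherwise         = go (b - f b / f' b)

-- At the singularity x = 1/4 this converges to B(1/4) = 2.
```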

My benchmark is simple: generate binary trees.

https://github.com/Lysxia/generic-random/blob/dev/bench/binaryTree.hs

The interesting entries are "handwritten1" (gen1), a handwritten Boltzmann sampler (a singular ceiled rejection sampler), and "feat" (genFeat). The "handwritten1" generator exploits the fact that binary trees have a particularly convenient Boltzmann sampler, needing only a single random bit per constructor; handwriting it removes any overhead that a generic implementation might incur. For completeness, the generators obtained from generic-random are also included.
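
Roughly, the handwritten generator works like the following sketch (simplified, not the exact benchmark code): at the singular parameter x = 1/4, a leaf and a binary node each have probability 1/2, so one random bit decides each constructor; generation aborts as soon as the size exceeds the ceiling, and we retry until the size lands in the target interval.

```haskell
import System.Random (randomIO)

data Tree = Leaf | Node Tree Tree
  deriving Show

-- Grow a tree at the singular parameter x = 1/4 of B(x) = 1 + x * B(x)^2,
-- where Leaf and Node each have probability 1/2 (one random bit per
-- constructor).  The size (number of Node constructors) is threaded through,
-- and generation is aborted as soon as it exceeds the ceiling 'cap'.
grow :: Int -> Int -> IO (Maybe (Tree, Int))
grow size cap
  | size > cap = pure Nothing
  | otherwise = do
      isLeaf <- randomIO :: IO Bool
      if isLeaf
        then pure (Just (Leaf, size))
        else do
          left <- grow (size + 1) cap
          case left of
            Nothing -> pure Nothing
            Just (l, size') -> do
              right <- grow size' cap
              case right of
                Nothing -> pure Nothing
                Just (r, size'') -> pure (Just (Node l r, size''))

-- Retry until the size lands in [minSize, maxSize], e.g. [(1-e)n, (1+e)n].
genTree :: Int -> Int -> IO Tree
genTree minSize maxSize = do
  result <- grow 0 maxSize
  case result of
    Just (t, n) | n >= minSize -> pure t
    _ -> genTree minSize maxSize
```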

I ran this for sizes n equal to powers of 4, up to 4096.

Even though I am cheating in a way (in favor of the handwritten generator), FEAT remains 20 to 30 times faster at sizes below 1000; "handwritten1" only starts to become competitive around size 4000.

I must admit I am not 100% confident in these results and my methodology, and I am also unsure about what happens with more complex types. But there is also a theoretical argument that Boltzmann samplers start with a handicap.

Indeed, the original paper gives a precise estimate of the cost of rejection: by Theorem 7.3, the cumulative size of the objects rejected by the singular rejection sampler is asymptotically equivalent to n * (sqrt(1-e) + sqrt(1+e))/e, where the tolerance parameter e means that we want to generate an object whose size falls in the interval [(1-e)n, (1+e)n]. For e = 1/10 (the parameter I used), this gives a total rejected size of around 20 * n, i.e., the average work of a Boltzmann generator is multiplied by about 20 compared with an ideal generator.

Using a very loose tolerance, e.g. e = 1/2, helps, and arguably that might be sufficient for random-testing purposes. But a generic library still has quite a bit of overhead of its own to overcome; it seems to take a lot of work to even begin to compete with FEAT, which, by the way, manages e = 0.
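
For concreteness, the constant from Theorem 7.3 is easy to check numerically for both tolerances (my own quick arithmetic):

```haskell
-- Cumulated rejection overhead per generated object, per the formula above:
-- (sqrt(1-e) + sqrt(1+e)) / e
overhead :: Double -> Double
overhead e = (sqrt (1 - e) + sqrt (1 + e)) / e

-- overhead 0.1 ≈ 19.97   -- the ~20x figure for e = 1/10
-- overhead 0.5 ≈  3.86   -- much smaller with the loose tolerance e = 1/2
```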

Lysxia changed the title from "`generic-random` slower than `testing-feat`" to "generic-random is slower than testing-feat" on Dec 23, 2016

Lysxia commented Mar 5, 2017 (comment minimized)

Lysxia closed this Mar 5, 2017
