bookworm-samples

Extension to Bookworm that splits it up into a bunch of (roughly) equal-sized chunks for testing.

Purpose

This is an easy drop-in for a Bookworm that gives some sense of whether effects are driven by random sampling noise or by real factors.

Install it according to the instructions for Bookworm Extensions in the extensions folder; switch into the folder, and run make.

You'll now have two new variables in your metadata: randomsetA and randomsetB. By default, randomsetA has 5 levels and randomsetB has 24. This means that you can:

Split any query up by the random variables to see if (for instance) five separate random samples of Canadian writers differ from five separate random samples of American writers*
Pull a random subset of data in really processor-intensive queries (like getting wordcounts). You can compare a 20% sample of American writers to a 20% sample of Canadian writers, say.

*Although nb: most error won't come from this sort of random variation.
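As a rough illustration of the first use, the sketch below compares the gap between two groups with the spread across their random samples. The per-sample numbers are invented placeholders (one value per randomsetA level), the function name is hypothetical, and the "twice the spread" threshold is an arbitrary rule of thumb rather than anything Bookworm provides.

```python
from statistics import mean, stdev


def sample_noise_summary(group_a, group_b):
    """Compare the gap between two groups with the spread across random samples."""
    gap = abs(mean(group_a) - mean(group_b))
    # Use the larger of the two within-group spreads as the noise estimate.
    noise = max(stdev(group_a), stdev(group_b))
    # Assumption: a gap more than twice the noise is treated as "probably real".
    return {"gap": gap, "noise": noise, "gap_exceeds_noise": gap > 2 * noise}


# Hypothetical per-sample values, one per randomsetA level (not real data).
canadian = [10.0, 11.0, 9.0, 10.5, 9.5]
american = [20.0, 21.0, 19.0, 20.5, 19.5]

summary = sample_noise_summary(canadian, american)
print(summary)
```

If the five samples within each group disagree with each other about as much as the two groups disagree, the apparent effect is likely sampling noise.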

Why two random variables?

There are two random variables so that you can combine them with each other. That's why one has 5 levels and the other has 24 (a product of several twos and a three). Between them, it should be possible to create most reasonable samples you might want.
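To see which sample sizes the default level counts allow, here is a small sketch that enumerates every fraction reachable by keeping some levels of each variable. It assumes the two variables are assigned independently and that levels are roughly equal in size; the function name is made up for illustration.

```python
from fractions import Fraction

LEVELS_A = 5   # default number of levels in randomsetA
LEVELS_B = 24  # default number of levels in randomsetB


def achievable_fractions(levels_a=LEVELS_A, levels_b=LEVELS_B):
    """Fractions reachable by keeping a levels of one variable and b of the other."""
    return {
        Fraction(a * b, levels_a * levels_b)
        for a in range(1, levels_a + 1)
        for b in range(1, levels_b + 1)
    }


available = achievable_fractions()
for target in (Fraction(1, 2), Fraction(1, 3), Fraction(1, 10), Fraction(1, 12), Fraction(1, 7)):
    print(target, target in available)
```

Under these assumptions, halves, thirds, tenths and twelfths are all reachable, while a fraction such as one in seven is not, since 7 does not divide 5 × 24 = 120.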

To get a 10% sample, for example, you could put in "search_limits":{"randomsetA":[1],"randomsetB":{"$lte":12}}: one of the five levels of the first variable, and half of the second.

To get one in twelve, put in "search_limits":{"randomsetB":[1,2]}.

And so forth.
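The arithmetic behind such limits can be checked with a short sketch like the one below. It is not part of the extension: the helper names are hypothetical, it assumes levels are numbered from 1, that the two variables are independent, and it handles only plain lists and the "$lte" operator.

```python
from fractions import Fraction

# Default level counts from this README; numbering from 1 is an assumption.
LEVELS = {"randomsetA": 5, "randomsetB": 24}


def matching_levels(limit, n_levels):
    """Return the levels (1..n_levels) selected by a list or a {"$lte": k} limit."""
    levels = range(1, n_levels + 1)
    if isinstance(limit, dict):
        if set(limit) != {"$lte"}:
            raise ValueError("this sketch only understands the $lte operator")
        return [x for x in levels if x <= limit["$lte"]]
    return [x for x in levels if x in limit]


def sample_fraction(search_limits):
    """Expected share of the corpus kept by the random-variable limits."""
    fraction = Fraction(1)
    for name, limit in search_limits.items():
        if name in LEVELS:  # other search limits are ignored here
            kept = len(matching_levels(limit, LEVELS[name]))
            fraction *= Fraction(kept, LEVELS[name])
    return fraction


ten_percent = sample_fraction({"randomsetA": [1], "randomsetB": {"$lte": 12}})
one_in_twelve = sample_fraction({"randomsetB": [1, 2]})
two_fifths = sample_fraction({"randomsetA": [1, 2]})
print(ten_percent, one_in_twelve, two_fifths)
```

Two of the 24 randomsetB levels give one in twelve, whereas two of the five randomsetA levels give 40%.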
