bmschmidt/bookworm-samples


bookworm-samples

An extension to Bookworm that splits a collection into a number of (roughly) equal-sized random chunks for testing.

Purpose

This is an easy drop-in for a Bookworm that helps you get some sense of whether effects are driven by random sampling noise or by real factors.

Install it according to the instructions for Bookworm extensions in the extensions folder: switch into its folder and run make.

You'll now have two new variables in your metadata: randomsetA and randomsetB. By default, randomsetA has 5 levels and randomsetB has 24. This means that you can:

  1. Split any query up by the random variables to see if (for instance) five separate random samples of Canadian writers differ from five separate random samples of American writers*
  2. Pull a random subset of data for very processor-intensive queries (like getting word counts). You can compare a 20% sample of American writers to a 20% sample of Canadian writers, say.

*Note, though, that most error won't come from this sort of random variation.
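This README doesn't describe how the extension actually assigns levels. As a rough mental model only, the sketch below assumes each text independently receives a uniformly random level for each variable; the function name assign_random_sets and the seeding are hypothetical, not the extension's code.

```python
import random
from collections import Counter

# Level counts described above: randomsetA has 5 levels, randomsetB has 24.
LEVELS_A = 5
LEVELS_B = 24


def assign_random_sets(doc_ids, seed=0):
    """Hypothetical sketch: give each document an independent, uniform draw
    for randomsetA (1..5) and randomsetB (1..24). Not the extension's code."""
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    return {
        doc_id: (rng.randint(1, LEVELS_A), rng.randint(1, LEVELS_B))
        for doc_id in doc_ids
    }


assignments = assign_random_sets(range(10_000), seed=42)

# Chunk sizes are only roughly equal: expect about 2,000 texts per randomsetA level.
sizes_a = Counter(a for a, _ in assignments.values())
print(sorted(sizes_a.items()))
```

Under this assumption the two draws are independent, so each combination of one randomsetA level and one randomsetB level holds roughly 1/120 of the texts.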

Why two random variables?

There are two random variables so that they can be combined with each other. That's why one has five levels and the other has 24 (2 × 2 × 2 × 3, a product of twos and threes). Between them, it should be possible to create most reasonable sample sizes you might want.
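Taking k of randomsetA's 5 levels and m of randomsetB's 24 levels selects k·m/120 of the texts, assuming the two variables are independent and evenly distributed. The hypothetical helper below (find_limits is an illustrative name, not part of Bookworm) searches for a (k, m) pair that hits a target fraction exactly.

```python
from fractions import Fraction

LEVELS_A = 5   # levels in randomsetA
LEVELS_B = 24  # levels in randomsetB


def find_limits(target):
    """Return (k, m) such that taking k levels of randomsetA and m levels of
    randomsetB selects exactly `target` of the texts, or None if no pair does.
    Assumes the two variables are independent and uniformly distributed."""
    target = Fraction(target)
    for k in range(1, LEVELS_A + 1):
        for m in range(1, LEVELS_B + 1):
            if Fraction(k, LEVELS_A) * Fraction(m, LEVELS_B) == target:
                return k, m
    return None


print(find_limits(Fraction(1, 10)))  # (1, 12): one level of A, half of B
print(find_limits(Fraction(1, 7)))   # None: 7 does not divide 5 * 24 = 120
```

A target whose reduced denominator doesn't divide 120, such as 1/7, can never be hit exactly.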

To get a 10% sample, for example, you could put in "search_limits":{"randomsetA":[1],"randomsetB":{"$lte":12}}: one of the five levels of randomsetA (20%) and half of the 24 levels of randomsetB (50%), which gives 20% × 50% = 10%.

To get one in twelve, put in "search_limits":{"randomsetB":[1,2]}: two of the 24 levels of randomsetB.

And so forth.
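To double-check the arithmetic behind limits like these, the sketch below estimates what share of texts a search_limits dict selects. It is a hypothetical helper, not part of Bookworm: it assumes levels are numbered from 1, that the two variables are independent and uniform, and it handles only lists of levels and the $lte/$lt/$gte/$gt comparisons. Bookworm itself decides how limits are actually applied.

```python
import operator
from fractions import Fraction

# Level counts from the description above.
LEVELS = {"randomsetA": 5, "randomsetB": 24}

# Only these comparison operators are handled in this sketch.
OPS = {"$lte": operator.le, "$lt": operator.lt, "$gte": operator.ge, "$gt": operator.gt}


def selected_levels(condition, n_levels):
    """Return the set of levels (1..n_levels) matched by one limit, which is
    either a list of levels or a dict of comparison operators."""
    levels = set(range(1, n_levels + 1))
    if isinstance(condition, list):
        return levels & set(condition)
    for op, bound in condition.items():
        levels = {x for x in levels if OPS[op](x, bound)}
    return levels


def sample_fraction(search_limits):
    """Expected share of texts selected by the random-set limits.
    Limits on any other metadata fields are ignored."""
    fraction = Fraction(1)
    for name, n_levels in LEVELS.items():
        if name in search_limits:
            matched = selected_levels(search_limits[name], n_levels)
            fraction *= Fraction(len(matched), n_levels)
    return fraction


print(sample_fraction({"randomsetA": [1], "randomsetB": {"$lte": 12}}))  # 1/10
print(sample_fraction({"randomsetB": [1, 2]}))                           # 1/12
```

Other metadata limits are skipped because, under the independence assumption, they change which texts are in the pool but not the random share drawn from it.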
