<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">

<title>Hypothesis internals — Hypothesis 0.7 documentation</title>

<link href='https://fonts.googleapis.com/css?family=Lato:400,700,400italic,700italic|Roboto+Slab:400,700|Inconsolata:400,700&subset=latin,cyrillic' rel='stylesheet' type='text/css'>

<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />

<link rel="top" title="Hypothesis 0.7 documentation" href="index.html"/>

<script src="https://cdnjs.cloudflare.com/ajax/libs/modernizr/2.6.2/modernizr.min.js"></script>

</head>

<body class="wy-body-for-nav" role="document">

<div class="wy-grid-for-nav">

<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-nav-search">
<a href="index.html" class="fa fa-home"> Hypothesis</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>

<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<!-- Local TOC -->
<div class="local-toc"><ul>
<li><a class="reference internal" href="#">Hypothesis internals</a><ul>
<li><a class="reference internal" href="#templating">Templating</a></li>
<li><a class="reference internal" href="#parametrization">Parametrization</a></li>
<li><a class="reference internal" href="#the-database">The database</a></li>
<li><a class="reference internal" href="#example-tracking">Example tracking</a></li>
</ul>
</li>
</ul>
</div>
</div>

</nav>

<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">

<nav class="wy-nav-top" role="navigation" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="index.html">Hypothesis</a>
</nav>

<div class="wy-nav-content">
<div class="rst-content">
<div role="main" class="document">

<div class="section" id="hypothesis-internals">
<h1>Hypothesis internals<a class="headerlink" href="#hypothesis-internals" title="Permalink to this headline">¶</a></h1>
<p>This document is a guide to Hypothesis internals, written mostly with the goal of
helping other QuickCheck implementations adopt some of its more
unusual or interesting ideas.</p>
<p>Nothing here is stable public API, and any of it may change between
minor releases. The purpose of this document is to share the ideas, not to
specify the behaviour.</p>
<p>The sections are ordered roughly from most interesting to least interesting.</p>
<div class="section" id="templating">
<h2>Templating<a class="headerlink" href="#templating" title="Permalink to this headline">¶</a></h2>
<p>Templating is the single most important innovation in Hypothesis. If you’re
going to take any ideas out of Hypothesis, take this one.</p>
<p>The idea is as follows: rather than generating data of the required type
directly, value generation is split into two parts. We first generate a <em>template</em>,
and we then have a function which can reify that template, turning it into a
value of the desired type. Importantly, simplification happens on templates and
not on reified data.</p>
<p>This has several major advantages:</p>
<ol class="arabic simple">
<li>The templates can be of a much more restricted type than the desired output:
you can require them to be immutable, serializable, hashable, etc. without in
any way restricting the range of data that you can generate.</li>
<li>Seamless support for mutable data: because the mutable object you produce is
the result of reifying the template, any mutation done by the function you call
does not affect the underlying template.</li>
<li>Generation strategies can be made functorial (and indeed applicative). You can
sort of make them monadic, but the resulting templates are a bit fiddly and can’t
really be of the desired restricted type, so it’s probably not worth it.</li>
</ol>
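<p>A minimal sketch of the first two advantages, in plain Python rather than Hypothesis’s real classes (ListOfIntsStrategy, draw_template and reify are hypothetical names): the template is an immutable, hashable tuple, and every call to reify builds a fresh mutable list, so a test that mutates its argument cannot corrupt the template.</p>

```python
import random
from typing import List, Tuple

# Hypothetical names throughout; this is not Hypothesis's real API.
Template = Tuple[int, ...]  # immutable and hashable


class ListOfIntsStrategy:
    """Templates are tuples of ints; reified values are mutable lists."""

    def draw_template(self, rng: random.Random) -> Template:
        length = rng.randint(0, 5)
        return tuple(rng.randint(-100, 100) for _ in range(length))

    def reify(self, template: Template) -> List[int]:
        # Build a brand-new list on every call, so nothing shares state with the template.
        return list(template)


def mutating_test(xs: List[int]) -> None:
    # Code under test is free to mutate its argument.
    xs.append(0)
    xs.sort()


strategy = ListOfIntsStrategy()
template = strategy.draw_template(random.Random(0))
value = strategy.reify(template)
mutating_test(value)  # value is now sorted and has an extra 0 on it
# The template is untouched, so reifying it again reproduces the original value.
print(template, value, strategy.reify(template))
```

<p>Because templates are tuples, they can also be stored in sets or dictionaries directly, which the later sections rely on.</p>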
<p>The last of these is worth elaborating on. Hypothesis’s SearchStrategy has a map
method which lets you write, e.g., strategy(int).map(lambda x: Decimal(x) / 100). This gives
you a new strategy for decimals which still supports minimization. The usual
obstacle here is that you can’t minimize the result, because you’d need a way to
map back to the original data type. That isn’t an issue here: you
can keep using the previous template type, minimize that, and only convert
to the new data type at the point of reification.</p>
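<p>A minimal sketch of why this works, using hypothetical IntStrategy, MappedStrategy, simplify and minimize definitions rather than Hypothesis’s real classes: the mapped strategy reuses the base strategy’s integer templates and simplifier, and applies the function only inside reify, so no inverse of the function is ever needed.</p>

```python
from decimal import Decimal


class IntStrategy:
    """Hypothetical strategy whose templates are plain non-negative ints."""

    def reify(self, template):
        return template

    def simplify(self, template):
        # Candidate simpler templates: zero, half the value, then one step smaller.
        if template > 0:
            yield 0
            yield template // 2
            yield template - 1


class MappedStrategy:
    """What map could look like: same templates, same simplifier, new reify."""

    def __init__(self, base, f):
        self.base = base
        self.f = f

    def reify(self, template):
        return self.f(self.base.reify(template))

    def simplify(self, template):
        # No inverse of f is needed: we simplify the base strategy's templates.
        return self.base.simplify(template)


def minimize(strategy, template, fails):
    """Greedily move to any simpler template whose reified value still fails."""
    improved = True
    while improved:
        improved = False
        for candidate in strategy.simplify(template):
            if fails(strategy.reify(candidate)):
                template = candidate
                improved = True
                break
    return template


decimals = MappedStrategy(IntStrategy(), lambda x: Decimal(x) / 100)
# Suppose a test asserting "d < Decimal('1.5')" failed on template 1000, i.e. Decimal('10').
smallest = minimize(decimals, 1000, lambda d: d >= Decimal("1.5"))
print(smallest, decimals.reify(smallest))  # 150 1.5
```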
</div>
<div class="section" id="parametrization">
<h2>Parametrization<a class="headerlink" href="#parametrization" title="Permalink to this headline">¶</a></h2>
<p>Template generation is also less direct than you might expect. Each strategy
has two distributions: a parameter distribution, and a conditional template
distribution given a parameter value. Generating a template means first drawing a
parameter, then drawing a template conditional on that parameter.</p>
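<p>A minimal sketch of the two-stage draw for integers. ParametrizedIntStrategy, IntParameter, the two fixed mean sizes and the exponential magnitude distribution are all illustrative assumptions, not Hypothesis’s actual distributions.</p>

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class IntParameter:
    mean_size: float  # typical magnitude of the ints this parameter produces


class ParametrizedIntStrategy:
    """Hypothetical strategy: a parameter distribution plus a conditional template distribution."""

    def draw_parameter(self, rng: random.Random) -> IntParameter:
        # Sometimes "small ints on average", sometimes "big ints on average".
        return IntParameter(mean_size=rng.choice([5.0, 1e6]))

    def draw_template(self, rng: random.Random, param: IntParameter) -> int:
        # Conditional on the parameter: exponentially distributed magnitude, random sign.
        magnitude = int(rng.expovariate(1.0 / param.mean_size))
        return magnitude if rng.random() < 0.5 else -magnitude


rng = random.Random(42)
strategy = ParametrizedIntStrategy()
for _ in range(3):
    param = strategy.draw_parameter(rng)  # stage 1: draw a parameter
    templates = [strategy.draw_template(rng, param) for _ in range(5)]  # stage 2
    print(param.mean_size, templates)
```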
<p>There are two reasons for this. The first is simply that this gives a
“clumpier” distribution. This is based on a paper called <a class="reference external" href="http://www.cs.utah.edu/~regehr/papers/swarm12.pdf">Swarm Testing</a>,
with some extensions to the idea. The essential concept is that a distribution
which is too flat is likely to spend too much time exploring uninteresting
interactions.</p>
<p>As a trivial example of why this matters, consider tests
parametrized by lists of integers. How do you find a distribution which can trigger each of the following?</p>
<ol class="arabic simple">
<li>A bug that only occurs when the list has duplicates</li>
<li>A bug that only occurs when the list contains very large integers</li>
<li>A bug that only occurs when the list is long and contains only small integers</li>
</ol>
<p>The answer in Hypothesis is that the parameter for a list draws a single
parameter for its elements, and every element template is drawn conditional on that one parameter.
For integers, the parameter sometimes produces small ints on average and sometimes
big ints on average, so any one list tends to consist either of small values (where
duplicates are likely) or mostly of large ones.</p>
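<p>A minimal sketch of that scheme, checking how often it hits each of the three bugs above. The helper names, the parameter values and the exponential distributions are hypothetical choices for illustration (and the ints are non-negative for brevity); only the idea of one shared element parameter per list comes from the text.</p>

```python
import random
from typing import List, Tuple


def draw_int_parameter(rng: random.Random) -> float:
    # Hypothetical element parameter: the typical magnitude of the ints.
    return rng.choice([3.0, 1e9])


def draw_int(rng: random.Random, mean_size: float) -> int:
    return int(rng.expovariate(1.0 / mean_size))


def draw_list_parameter(rng: random.Random) -> Tuple[float, float]:
    # One element parameter for the whole list, plus an average length.
    return draw_int_parameter(rng), rng.choice([5.0, 50.0])


def draw_list(rng: random.Random, param: Tuple[float, float]) -> List[int]:
    element_param, mean_length = param
    length = int(rng.expovariate(1.0 / mean_length))
    # Every element is drawn conditional on the same element parameter.
    return [draw_int(rng, element_param) for _ in range(length)]


rng = random.Random(1)
lists = [draw_list(rng, draw_list_parameter(rng)) for _ in range(1000)]
has_duplicates = sum(len(set(xs)) < len(xs) for xs in lists)
has_huge = sum(any(x > 10**6 for x in xs) for xs in lists)
long_and_small = sum(len(xs) >= 30 and max(xs) < 20 for xs in lists)
print(has_duplicates, has_huge, long_and_small)
```

<p>A single flat distribution over all ints would almost never produce duplicates, and one tuned for duplicates would rarely produce huge values; sharing the parameter lets each kind of list show up regularly.</p>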
<p>(It would instead be possible to draw a small number of parameters, sometimes more
than one, and draw elements from a mixture of those. I haven’t investigated this yet.)</p>
<p>The second important benefit of the parameter system is that you can use it to
guide the search. This lets you use preconditions in your tests that would
otherwise be quite hard to satisfy.</p>
<p>This works by storing all the parameters we use and tending to reuse each
one multiple times. Parameters which tend to produce “bad” results (that is,
test cases in which assume() is called with a falsey value) are chosen less
often than parameters which don’t. Parameters which produce templates we’ve
already seen are also penalized, in order to guide the search towards novelty.</p>
<p>Hypothesis implements this with an infinitely-many-armed bandit algorithm
based on Thompson sampling, plus some ad hoc hacks. I don’t strongly recommend
following the specific algorithm, though it seems to work well in practice.</p>
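<p>The sketch below shows plain Thompson sampling over a growing pool of parameters, as a simplified stand-in rather than Hypothesis’s actual algorithm (whose ad hoc hacks are not described here). ParameterBandit, Arm and the toy acceptance model are hypothetical. Each stored parameter keeps a Beta posterior over how often it yields usable test cases, an assume() rejection or a duplicate template counts as a failure, and an untried parameter competes with a uniform Beta(1, 1) prior, which is one simple way to allow infinitely many arms.</p>

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Arm:
    parameter: float  # toy model: the chance this parameter's test cases pass assume()
    good: int = 0     # usable, novel test cases
    bad: int = 0      # assume() rejections and duplicate templates


class ParameterBandit:
    """Hypothetical Thompson-sampling sketch over an unbounded pool of parameters."""

    def __init__(self, draw_parameter: Callable[[random.Random], float], rng: random.Random):
        self.draw_parameter = draw_parameter
        self.rng = rng
        self.arms: List[Arm] = []

    def choose(self) -> Arm:
        best: Optional[Arm] = None
        best_score = -1.0
        for arm in self.arms:
            # Sample a plausible success rate from the arm's Beta posterior.
            score = self.rng.betavariate(arm.good + 1, arm.bad + 1)
            if score > best_score:
                best, best_score = arm, score
        # A fresh parameter has a uniform Beta(1, 1) prior, so its sample is just random().
        if best is None or self.rng.random() > best_score:
            best = Arm(self.draw_parameter(self.rng))
            self.arms.append(best)
        return best

    def record(self, arm: Arm, usable: bool) -> None:
        if usable:
            arm.good += 1
        else:
            arm.bad += 1


rng = random.Random(0)
bandit = ParameterBandit(lambda r: r.random(), rng)
for _ in range(2000):
    arm = bandit.choose()
    bandit.record(arm, usable=rng.random() < arm.parameter)
most_used = max(bandit.arms, key=lambda a: a.good + a.bad)
print(len(bandit.arms), round(most_used.parameter, 2))
```

<p>In this toy model, parameters whose test cases are mostly rejected stop being chosen after a few tries, while ones that usually satisfy assume() get reused.</p>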
</div>
<div class="section" id="the-database">
<h2>The database<a class="headerlink" href="#the-database" title="Permalink to this headline">¶</a></h2>
<p>There’s not much to say here except “why isn’t everyone doing this?” (though
in fairness this is made much easier by the template system).</p>
<p>When Hypothesis finds a minimal failing example, it saves the template for it in
a database (by default a local SQLite database, though it could be anything).
On later runs, Hypothesis first checks whether there are any saved examples for
the test and tries those first. If any of them fail the test, it skips straight
to the minimization stage without bothering with data generation. This is
particularly useful for tests with a low probability of failure: if Hypothesis
has a one in 1000 chance of generating a failing example, it will probably take about 5 runs of
the test suite before the test fails, but after that it will consistently fail
until you fix the bug.</p>
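<p>The “about 5 runs” figure depends on how many examples each run tries, which this page does not say. Assuming, purely for illustration, 200 independent examples per run, each with a 1/1000 chance of failing, the estimate works out as follows:</p>

```python
p_example = 1 / 1000    # chance that one generated example fails the test
examples_per_run = 200  # ASSUMPTION for illustration; not stated on this page

# Chance that at least one example in a run fails, and the expected number of
# runs until the first failing run (the mean of a geometric distribution).
p_run = 1 - (1 - p_example) ** examples_per_run
expected_runs = 1 / p_run
print(round(p_run, 3), round(expected_runs, 1))  # 0.181 5.5
```

<p>Once the minimal failing template is in the database, every later run tries it first, so the failure reappears on the very next run.</p>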
<p>Hypothesis keys saved examples by the type signature of the test, but that
hasn’t proven terribly useful. You could key them by the test’s name equally well
without losing much.</p>
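<p>A minimal sketch of the save-and-replay loop using Python’s sqlite3 module, keyed by test name. ExampleDatabase, run_test, the table layout and the JSON encoding of templates are hypothetical choices for illustration, not Hypothesis’s real storage format; the brute-force minimize is a stand-in for real minimization.</p>

```python
import json
import sqlite3
from typing import Any, List, Optional


class ExampleDatabase:
    """Hypothetical store of minimal failing templates, keyed by test name."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS examples "
            "(key TEXT, template TEXT, UNIQUE (key, template))"
        )

    def save(self, key: str, template: Any) -> None:
        # Templates are assumed to be JSON-serializable, so storing them is trivial.
        self.conn.execute(
            "INSERT OR IGNORE INTO examples VALUES (?, ?)", (key, json.dumps(template))
        )
        self.conn.commit()

    def fetch(self, key: str) -> List[Any]:
        rows = self.conn.execute("SELECT template FROM examples WHERE key = ?", (key,))
        return [json.loads(row[0]) for row in rows]


def run_test(db, key, fails, generate, minimize) -> Optional[Any]:
    # Replay saved templates first; only generate fresh data if none of them fail.
    failing = next((t for t in db.fetch(key) if fails(t)), None)
    if failing is None:
        failing = next((t for t in generate() if fails(t)), None)
    if failing is None:
        return None
    smallest = minimize(failing)
    db.save(key, smallest)  # later runs will try this template first
    return smallest


db = ExampleDatabase()
generation_runs = []


def generate():
    generation_runs.append(1)  # count fallbacks to random generation
    return iter(range(1000))


def fails(n: int) -> bool:
    return n >= 700


def minimize(n: int) -> int:
    return min(m for m in range(n + 1) if fails(m))  # brute-force stand-in


print(run_test(db, "test_foo", fails, generate, minimize))  # 700 (found by generation)
print(run_test(db, "test_foo", fails, generate, minimize))  # 700 (replayed from the database)
print(len(generation_runs))  # 1
```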
<p>I experimented with disassembling and reassembling examples for reuse
in other tests, but in the end these experiments didn’t prove very useful and were hard to
support after some other changes to the system, so I took them out.</p>
</div>
<div class="section" id="example-tracking">
<h2>Example tracking<a class="headerlink" href="#example-tracking" title="Permalink to this headline">¶</a></h2>
<p>The idea here is simply that we don’t want to call a test function with the
same example twice. I think most property-based testing systems don’t do this
because they assume that checking a property is cheaper than testing
whether an example has been seen before, especially given a low duplication rate.</p>
<p>Because Hypothesis is designed around the assumption that you’re going to use
it on things that look more like unit tests (and also because Python is quite
slow), it’s more important that we don’t duplicate effort. So we track which
templates have previously been run and don’t bother to reify and test them
again if they come up. As mentioned in the section on parametrization, we also
penalize the parameter that produced them.</p>
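<p>A sketch of that bookkeeping, with hypothetical names (run_examples and the toy “narrow” and “wide” parameters are not Hypothesis API): seen templates are kept in a set, a repeat is skipped before reification, and each repeat is counted against the parameter that produced it.</p>

```python
import random
from collections import Counter
from typing import Hashable, Set


def run_examples(draw_parameter, draw_template, reify, test, n_examples, rng):
    """Hypothetical test loop that never reifies or runs the same template twice."""
    seen: Set[Hashable] = set()
    penalties: Counter = Counter()  # duplicate templates produced, per parameter
    runs = 0
    for _ in range(n_examples):
        param = draw_parameter(rng)
        template = draw_template(rng, param)
        if template in seen:
            # Skip reification and the (possibly slow) test; penalize the parameter.
            penalties[param] += 1
            continue
        seen.add(template)
        test(reify(template))
        runs += 1
    return runs, penalties


rng = random.Random(0)
calls = []
runs, penalties = run_examples(
    draw_parameter=lambda r: r.choice(["narrow", "wide"]),
    draw_template=lambda r, p: r.randint(0, 2) if p == "narrow" else r.randint(0, 10**6),
    reify=lambda t: [t],  # a fresh mutable value per run
    test=calls.append,
    n_examples=200,
    rng=rng,
)
print(runs, dict(penalties))
```

<p>The “narrow” parameter can only produce three distinct templates, so almost all of its later draws are skipped and counted against it.</p>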
<p>This is also useful for minimization: Hypothesis doesn’t mind if you have
cycles in your simplification graph (e.g. if x simplifies to y and y simplifies to x),
because it can just use the example tracking system to break loops.</p>
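<p>A small sketch of cycle-tolerant minimization, using a hypothetical minimize function and a toy simplification graph in which "x" simplifies to "y" and "y" simplifies back to "x": remembering every template already tried is enough to make the greedy loop terminate.</p>

```python
def minimize(template, simplify, fails):
    """Hypothetical greedy minimizer that tolerates cycles in the simplification graph."""
    seen = {template}  # every template we have already tested (or started from)
    improved = True
    while improved:
        improved = False
        for candidate in simplify(template):
            if candidate in seen:
                continue  # never test the same template twice; this breaks x -> y -> x loops
            seen.add(candidate)
            if fails(candidate):
                template = candidate
                improved = True
                break
    return template


# Toy graph: "x" simplifies to "y", and "y" simplifies back to "x" (or to "z").
graph = {"x": ["y"], "y": ["x", "z"], "z": []}
result = minimize("x", lambda t: graph[t], fails=lambda t: t in {"x", "y"})
print(result)  # y
```

<p>Without the seen set, the loop would bounce between "x" and "y" forever, since both fail.</p>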
<p>There’s a trick to this: examples might be quite large, and we don’t want
to keep them around in memory if we don’t have to. Because templates are
restricted, we can insist that they all belong to a set of types that have a
stable serialization format. So rather than storing whole examples and comparing
them for equality, we serialize them and, if the serialized string is at least
20 bytes long, take its SHA-1 hash (a SHA-1 digest is itself 20 bytes, so hashing
anything shorter would save nothing). We then keep just these hashes (and the short
strings) around, and treat an example as seen if we’ve seen its hash before.</p>
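<p>A sketch of that trick, assuming JSON with sorted keys as the stable serialization format (Hypothesis’s own format is not described here) and hypothetical tracking_key and SeenTemplates names. Short serializations are always under 20 bytes and digests never are, so the two kinds of key cannot collide with each other.</p>

```python
import hashlib
import json
from typing import Any, Set

SHA1_DIGEST_SIZE = 20  # bytes


def tracking_key(template: Any) -> bytes:
    """Serialize a template stably; replace long serializations by their SHA-1 digest."""
    # sort_keys and fixed separators make equal templates serialize identically.
    data = json.dumps(template, sort_keys=True, separators=(",", ":")).encode("utf-8")
    if len(data) >= SHA1_DIGEST_SIZE:
        return hashlib.sha1(data).digest()
    return data


class SeenTemplates:
    """Hypothetical seen-set that stores only compact keys, never whole templates."""

    def __init__(self) -> None:
        self._keys: Set[bytes] = set()

    def check_and_add(self, template: Any) -> bool:
        """Return True if an equal template was seen before; otherwise record it."""
        key = tracking_key(template)
        if key in self._keys:
            return True
        self._keys.add(key)
        return False


seen = SeenTemplates()
print(seen.check_and_add([1, 2, 3]))  # False: short, stored as raw bytes
print(seen.check_and_add([1, 2, 3]))  # True
big = list(range(1000))
print(seen.check_and_add(big), len(tracking_key(big)))  # False 20
print(seen.check_and_add(list(range(1000))))  # True
```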
</div>
</div>

</div>
<footer>

<hr/>

<div role="contentinfo">
<p>
© Copyright 2015, David R. MacIver.
</p>
</div>

Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.

</footer>
</div>
</div>

</section>

</div>

<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT:'./',
VERSION:'0.7',
COLLAPSE_INDEX:false,
FILE_SUFFIX:'.html',
HAS_SOURCE: true
};
</script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>

<script type="text/javascript" src="_static/js/theme.js"></script>

<script type="text/javascript">
jQuery(function () {
SphinxRtdTheme.StickyNav.enable();
});
</script>

</body>
</html>