Additional documentation of internals + some extras
DRMacIver committed Mar 20, 2015
1 parent 8ef19a3 commit 70e3fd5
282 changes: 282 additions & 0 deletions docs/_build/html/internals.html


<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">

<title>Hypothesis internals &mdash; Hypothesis 0.7 documentation</title>






<link href='https://fonts.googleapis.com/css?family=Lato:400,700,400italic,700italic|Roboto+Slab:400,700|Inconsolata:400,700&subset=latin,cyrillic' rel='stylesheet' type='text/css'>









<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />





<link rel="top" title="Hypothesis 0.7 documentation" href="index.html"/>


<script src="https://cdnjs.cloudflare.com/ajax/libs/modernizr/2.6.2/modernizr.min.js"></script>

</head>

<body class="wy-body-for-nav" role="document">

<div class="wy-grid-for-nav">


<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-nav-search">

<a href="index.html" class="fa fa-home"> Hypothesis</a>


<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>

<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">



<!-- Local TOC -->
<div class="local-toc"><ul>
<li><a class="reference internal" href="#">Hypothesis internals</a><ul>
<li><a class="reference internal" href="#templating">Templating</a></li>
<li><a class="reference internal" href="#parametrization">Parametrization</a></li>
<li><a class="reference internal" href="#the-database">The database</a></li>
<li><a class="reference internal" href="#example-tracking">Example tracking</a></li>
</ul>
</li>
</ul>
</div>


</div>
&nbsp;
</nav>

<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">


<nav class="wy-nav-top" role="navigation" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="index.html">Hypothesis</a>
</nav>



<div class="wy-nav-content">
<div class="rst-content">
<div role="main" class="document">

<div class="section" id="hypothesis-internals">
<h1>Hypothesis internals<a class="headerlink" href="#hypothesis-internals" title="Permalink to this headline"></a></h1>
<p>This document is a guide to Hypothesis internals, written mostly to help
other QuickCheck implementations that want to borrow some of its more
unusual or interesting ideas.</p>
<p>Nothing here is stable public API, and all of it may change between
minor releases. The purpose of this document is to share the ideas, not to
specify the behaviour.</p>
<p>The sections are ordered roughly from most interesting to least interesting.</p>
<div class="section" id="templating">
<h2>Templating<a class="headerlink" href="#templating" title="Permalink to this headline"></a></h2>
<p>Templating is the single most important innovation in Hypothesis. If you&#8217;re
going to take any ideas out of Hypothesis you should take this one.</p>
<p>The idea is as follows: Rather than generating data of the required type
directly, value generation is split into two parts. We first generate a <em>template</em>
and we then have a function which can reify that template, turning it into a
value of the desired type. Importantly, simplification happens on templates and
not on reified data.</p>
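<p>As a rough illustration, here is a minimal Python sketch of the template/reify split for lists of integers. The names <code class="docutils literal">IntListStrategy</code>, <code class="docutils literal">produce_template</code>, <code class="docutils literal">reify</code>, <code class="docutils literal">simplify</code> and <code class="docutils literal">shrink</code> are hypothetical, written for this page rather than taken from Hypothesis itself. Templates are immutable tuples, reification builds a fresh list every time, and shrinking only ever looks at templates.</p>

```python
import random
from typing import Callable, Iterator

Template = tuple[int, ...]  # immutable, hashable, trivially serializable


class IntListStrategy:
    """Hypothetical strategy for lists of ints (not Hypothesis's real API)."""

    def produce_template(self, rnd: random.Random) -> Template:
        return tuple(rnd.randint(-100, 100) for _ in range(rnd.randint(0, 10)))

    def reify(self, template: Template) -> list[int]:
        # Every call builds a fresh mutable list, so tests can mutate it freely.
        return list(template)

    def simplify(self, template: Template) -> Iterator[Template]:
        # Simplification works on templates, never on reified values.
        for i in range(len(template)):
            yield template[:i] + template[i + 1:]  # drop one element
        for i, x in enumerate(template):
            if x != 0:
                # Move one element toward zero (int() truncates toward zero).
                yield template[:i] + (int(x / 2),) + template[i + 1:]


def shrink(strategy: IntListStrategy, template: Template,
           fails: Callable[[list[int]], bool]) -> Template:
    # Greedily move to the first simpler template that still fails.
    improved = True
    while improved:
        improved = False
        for candidate in strategy.simplify(template):
            if fails(strategy.reify(candidate)):
                template, improved = candidate, True
                break
    return template


def property_fails(xs: list[int]) -> bool:
    xs.sort()  # the test mutates its argument...
    return sum(xs) > 10  # ...and "fails" whenever the sum exceeds 10


strategy = IntListStrategy()
start: Template = (3, 50, -7, 20)
minimal = shrink(strategy, start, property_fails)
print(start, minimal)  # (3, 50, -7, 20) (20,)
```

<p>The test sorts its argument in place, yet the template it came from is untouched, and the shrinker keeps working purely with tuples.</p>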
<p>This has several major advantages:</p>
<ol class="arabic simple">
<li>The templates can be of a much more restricted type than the desired output:
you can require them to be immutable, serializable, hashable, etc., without in
any way restricting the range of data that you can generate.</li>
<li>Seamless support for mutable data: because the mutable object you produce is
the result of reifying the template, any mutation done by the function you call
does not affect the underlying template.</li>
<li>Generation strategies can be made functorial, and indeed applicative. You can
sort of make them monadic, but the resulting templates are a bit fiddly and can&#8217;t
really be of the desired restricted type, so it&#8217;s probably not worth it.</li>
</ol>
<p>The last of these is worth elaborating on: Hypothesis&#8217;s SearchStrategy has a
<code class="docutils literal">map</code> method which lets you write, for example,
<code class="docutils literal">strategy(int).map(lambda x: Decimal(x) / 100)</code>. This gives
you a new strategy for decimals which still supports minimization. The usual
obstacle is that you can&#8217;t minimize the mapped values, because you&#8217;d need a way to
map them back to the original data type. That isn&#8217;t an issue here: you
can keep using the original template type, minimize that, and only convert
to the new data type at the point of reification.</p>
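<p>The same idea can be sketched for <code class="docutils literal">map</code>. In this hypothetical version (again not Hypothesis&#8217;s actual classes), a mapped strategy reuses the base strategy&#8217;s templates and its <code class="docutils literal">simplify</code> unchanged, and applies the function only inside <code class="docutils literal">reify</code>, so no inverse of the function is ever needed.</p>

```python
import random
from decimal import Decimal
from typing import Any, Callable, Iterator


class IntStrategy:
    """Hypothetical base strategy whose templates are plain ints."""

    def produce_template(self, rnd: random.Random) -> int:
        return rnd.randint(-10**6, 10**6)

    def reify(self, template: int) -> int:
        return template

    def simplify(self, template: int) -> Iterator[int]:
        if template != 0:
            yield 0
            yield int(template / 2)  # halve, truncating toward zero

    def map(self, f: Callable[[Any], Any]) -> "MappedStrategy":
        return MappedStrategy(self, f)


class MappedStrategy:
    """Keeps the base template type; f is applied only when reifying."""

    def __init__(self, base: Any, f: Callable[[Any], Any]) -> None:
        self.base = base
        self.f = f

    def produce_template(self, rnd: random.Random) -> Any:
        return self.base.produce_template(rnd)

    def reify(self, template: Any) -> Any:
        return self.f(self.base.reify(template))

    def simplify(self, template: Any) -> Iterator[Any]:
        # Minimize in template space: no inverse of f is required.
        return self.base.simplify(template)

    def map(self, g: Callable[[Any], Any]) -> "MappedStrategy":
        return MappedStrategy(self, g)


decimals = IntStrategy().map(lambda x: Decimal(x) / 100)
template = 12345
print(decimals.reify(template))  # 123.45
print([decimals.reify(t) for t in decimals.simplify(template)])
```

<p>Because <code class="docutils literal">map</code> on a mapped strategy just wraps it again, mappings compose, and every layer shares the same int templates.</p>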
</div>
<div class="section" id="parametrization">
<h2>Parametrization<a class="headerlink" href="#parametrization" title="Permalink to this headline"></a></h2>
<p>Template generation is also less direct than you might expect. Each strategy
has two distributions: a parameter distribution, and a distribution over templates
conditional on a parameter value.</p>
<p>There are two reasons for this. The first is simply that it gives a
&#8220;clumpier&#8221; distribution. This is based on a paper called <a class="reference external" href="http://www.cs.utah.edu/~regehr/papers/swarm12.pdf">Swarm Testing</a>,
but with some extensions to the idea. The essential concept is that a distribution
which is too flat is likely to spend too much time exploring uninteresting
interactions.</p>
<p>As a simple example of why this matters, consider tests parametrized by lists
of integers. How do you find a single distribution which can trigger each of the following?</p>
<ol class="arabic simple">
<li>A bug that only occurs when the list has duplicates</li>
<li>A bug that only occurs when the list contains very large integers</li>
<li>A bug that only occurs when the list is long and contains only small integers</li>
</ol>
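<p>Hypothesis&#8217;s answer is that the parameter for a list draws a single
parameter for its elements, and every element template in the list is drawn
conditional on that one parameter. For integers, some parameters produce small
ints on average and others produce big ints on average, so an individual list
tends to consist entirely of small ints (where duplicates are likely) or of large
ones, rather than an unremarkable mixture of both.</p>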
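<p>A minimal sketch of this two-stage scheme follows. The specific parameter shapes (three possible integer magnitudes, two possible average lengths, geometrically distributed list lengths) are illustrative assumptions, not Hypothesis&#8217;s real distributions, and all the names are hypothetical. The point it shows is that one element parameter is drawn per list and shared by all of that list&#8217;s elements.</p>

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class IntParam:
    scale: int  # typical magnitude of the ints this parameter produces


@dataclass(frozen=True)
class ListParam:
    average_length: float
    element: IntParam  # a single element parameter shared by the whole list


def draw_int_param(rnd: random.Random) -> IntParam:
    # Some parameters favour small ints, others very large ones.
    return IntParam(scale=rnd.choice([3, 1000, 2**62]))


def draw_list_param(rnd: random.Random) -> ListParam:
    return ListParam(average_length=rnd.choice([2.0, 50.0]),
                     element=draw_int_param(rnd))


def draw_int(rnd: random.Random, param: IntParam) -> int:
    return rnd.randint(-param.scale, param.scale)


def draw_list(rnd: random.Random, param: ListParam) -> tuple[int, ...]:
    # Geometric length: continue with probability L / (L + 1), giving mean L.
    stop = 1.0 / (param.average_length + 1.0)
    xs: list[int] = []
    while rnd.random() >= stop:
        xs.append(draw_int(rnd, param.element))
    return tuple(xs)


def draw_template(rnd: random.Random) -> tuple[int, ...]:
    # Stage 1: draw a parameter. Stage 2: draw a template conditional on it.
    return draw_list(rnd, draw_list_param(rnd))


rnd = random.Random(0)
samples = [draw_template(rnd) for _ in range(200)]
print(sum(len(set(xs)) < len(xs) for xs in samples), "lists with duplicates")
print(sum(any(abs(x) > 2**40 for x in xs) for xs in samples), "lists with huge ints")
```

<p>With a magnitude of 3 and an average length of 50, duplicates are almost guaranteed (any list of eight or more elements must repeat, since only seven values are possible), while the 2<sup>62</sup> magnitude fills a list with very large integers. A flat distribution that drew every element independently would rarely produce either extreme.</p>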
<p>(It would also be possible to draw a small number of element parameters, sometimes
more than one, and draw elements from a mixture of them. I haven&#8217;t investigated this yet.)</p>
<p>The second important benefit of the parameter system is that you can use it to
guide the search. This is useful because it lets you use preconditions in your
tests that would otherwise be quite hard to satisfy.</p>
<p>This works by storing all the parameters we use and tending to reuse each
parameter multiple times. Parameters which tend to produce &#8220;bad&#8221;
results (that is, examples for which the test calls <code class="docutils literal">assume()</code> with a falsey
value) will be chosen less often than parameters which don&#8217;t. Parameters
which produce templates we&#8217;ve already seen are also penalized, in order to guide
the search towards novelty.</p>
<p>Hypothesis implements this with an infinitely-many-armed bandit algorithm
based on Thompson Sampling plus some ad hoc hacks. I don&#8217;t strongly recommend
following the specific algorithm, though it seems to work well in practice.</p>
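<p>Because the exact algorithm is deliberately left open, what follows is only a generic Thompson sampling sketch under stated assumptions. It is not Hypothesis&#8217;s algorithm. Each stored parameter keeps a Beta posterior over the probability that it yields a usable, novel example. A not-yet-drawn parameter is represented by one sample from the uniform Beta(1, 1) prior, and if that sample wins, a fresh parameter is drawn. This is one simple way to cope with infinitely many arms. The class <code class="docutils literal">ParameterBandit</code> and its methods are hypothetical.</p>

```python
import random
from typing import Callable, Generic, List, Tuple, TypeVar

P = TypeVar("P")


class ParameterBandit(Generic[P]):
    """Thompson sampling over a growing pool of parameters (a sketch)."""

    def __init__(self, draw_parameter: Callable[[random.Random], P],
                 rnd: random.Random) -> None:
        self.draw_parameter = draw_parameter
        self.rnd = rnd
        self.params: List[P] = []
        self.good: List[int] = []  # usable, novel examples per parameter
        self.bad: List[int] = []   # rejected or duplicate examples per parameter

    def choose(self) -> Tuple[int, P]:
        # An undrawn parameter competes with a sample from the Beta(1, 1) prior.
        best_index, best_score = -1, self.rnd.random()
        for i in range(len(self.params)):
            score = self.rnd.betavariate(self.good[i] + 1, self.bad[i] + 1)
            if score > best_score:
                best_index, best_score = i, score
        if best_index == -1:
            # The prior sample won: open a new arm with a freshly drawn parameter.
            self.params.append(self.draw_parameter(self.rnd))
            self.good.append(0)
            self.bad.append(0)
            best_index = len(self.params) - 1
        return best_index, self.params[best_index]

    def update(self, index: int, success: bool) -> None:
        if success:
            self.good[index] += 1
        else:
            self.bad[index] += 1


# Toy simulation: a parameter is the chance that assume() accepts an example.
rnd = random.Random(42)
bandit = ParameterBandit(lambda r: r.random(), rnd)
chosen = []
for _ in range(2000):
    index, acceptance = bandit.choose()
    bandit.update(index, rnd.random() < acceptance)
    chosen.append(acceptance)
print(len(bandit.params), "parameters drawn")
print(sum(chosen[-1000:]) / 1000, "mean acceptance of recent choices")
```

<p>In the simulation the bandit drifts towards parameters whose examples usually pass <code class="docutils literal">assume()</code>. A duplicate template would be reported to <code class="docutils literal">update</code> as a failure in exactly the same way.</p>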
</div>
<div class="section" id="the-database">
<h2>The database<a class="headerlink" href="#the-database" title="Permalink to this headline"></a></h2>
<p>There&#8217;s not much to say here except &#8220;why isn&#8217;t everyone doing this?&#8221; (though
in fairness this is made much easier by the template system).</p>
<p>When Hypothesis finds a minimal failing example, it saves the template for it in
a database (by default a local SQLite database, though it could be anything).
On future runs, Hypothesis first checks whether there are any saved examples for
the test and tries those first. If any of them fail the test, it skips straight
to the minimization stage without bothering with data generation. This is
particularly useful for tests with a low probability of failure: if each example
Hypothesis generates has a one in 1000 chance of failing, it will typically take
around 5 runs of the test suite before the test fails, but after that it will
consistently fail until you fix the bug.</p>
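<p>The figure of around five runs follows from a short calculation. Suppose each generated example fails independently with probability p = 1/1000, and assume (the text does not say) that one run of the test suite tries n = 200 examples. Then the chance q that a single run finds the bug, and the expected number of runs until it first does, are:</p>

```latex
q = 1 - (1 - p)^{n} = 1 - 0.999^{200} \approx 1 - e^{-0.2} \approx 0.18,
\qquad
\mathbb{E}[\text{runs until the first failure}] = \frac{1}{q} \approx 5.5
```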
<p>Hypothesis keys the saved examples on the type signature of the test, but that
hasn&#8217;t proven terribly useful; you could use the name of the test equally well
without losing much.</p>
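<p>Here is a minimal sketch of the replay-then-generate flow using Python&#8217;s <code class="docutils literal">sqlite3</code> module. Several details are assumptions of this sketch: the table layout, the JSON serialization of templates, and keying by test name (which the paragraph above suggests works about as well as the type signature). The helper names <code class="docutils literal">ExampleDatabase</code>, <code class="docutils literal">find_failure</code> and <code class="docutils literal">shrink_int</code> are also hypothetical.</p>

```python
import json
import random
import sqlite3
from typing import Any, Callable, List, Optional


class ExampleDatabase:
    """SQLite-backed store of serialized templates, keyed by test name."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS examples ("
            "key TEXT, template TEXT, PRIMARY KEY (key, template))")

    def save(self, key: str, template: Any) -> None:
        self.conn.execute("INSERT OR IGNORE INTO examples VALUES (?, ?)",
                          (key, json.dumps(template)))
        self.conn.commit()

    def fetch(self, key: str) -> List[Any]:
        rows = self.conn.execute(
            "SELECT template FROM examples WHERE key = ?", (key,))
        return [json.loads(text) for (text,) in rows]


def find_failure(key: str, db: ExampleDatabase, generate: Callable[[], Any],
                 fails: Callable[[Any], bool], shrink: Callable[[Any], Any],
                 max_examples: int = 200) -> Optional[Any]:
    # 1. Replay saved templates first; a known failure skips generation.
    failing = next((t for t in db.fetch(key) if fails(t)), None)
    if failing is None:
        # 2. Otherwise fall back to ordinary random generation.
        candidates = (generate() for _ in range(max_examples))
        failing = next((t for t in candidates if fails(t)), None)
    if failing is None:
        return None
    minimal = shrink(failing)
    db.save(key, minimal)  # remembered for future runs
    return minimal


def shrink_int(t: int, fails: Callable[[int], bool]) -> int:
    # Binary search for the smallest failing int, assuming fails(t) is True
    # and that failure is monotone in t.
    lo, hi = 0, t
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid
        else:
            lo = mid + 1
    return hi


def fails(t: int) -> bool:
    return t >= 1000


def never_generate() -> int:
    raise AssertionError("generation should have been skipped")


db = ExampleDatabase()
rnd = random.Random(0)
first = find_failure("test_small_values", db, lambda: rnd.randint(0, 1200),
                     fails, lambda t: shrink_int(t, fails))
second = find_failure("test_small_values", db, never_generate,
                      fails, lambda t: shrink_int(t, fails))
print(first, second, db.fetch("test_small_values"))
```

<p>On the second call the saved template fails immediately, so the generator is never invoked and the minimal example is reported straight away.</p>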
<p>I experimented with disassembling and reassembling examples for reuse
in other tests, but in the end this didn&#8217;t prove very useful and was hard to
support after some other changes to the system, so I took it out.</p>
</div>
<div class="section" id="example-tracking">
<h2>Example tracking<a class="headerlink" href="#example-tracking" title="Permalink to this headline"></a></h2>
<p>The idea here is simply that we don&#8217;t want to call a test function with the
same example twice. I think most property-based testing systems don&#8217;t do this
because they assume that checking a property is cheaper than checking
whether an example has been seen before, especially given a low duplication rate.</p>
<p>Because Hypothesis is designed around the assumption that you&#8217;re going to use
it on things that look more like unit tests (and also because Python is quite
slow) it&#8217;s more important that we don&#8217;t duplicate effort, so we track which
templates have previously been run and don&#8217;t bother to reify and test them
again if they come up. As mentioned in the section on parametrization, we also
penalize the parameter that produced them.</p>
<p>This is also useful for minimization: Hypothesis doesn&#8217;t mind if you have
cycles in your simplification graph (e.g. if x simplifies to y and y simplifies to x),
because it can just use the example tracking system to break loops.</p>
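<p>A small sketch of how a seen-set breaks cycles during greedy minimization. The function <code class="docutils literal">minimize</code> and the toy simplification graph with string-labelled templates are hypothetical. In the graph, <code class="docutils literal">x</code> simplifies to <code class="docutils literal">y</code>, and <code class="docutils literal">y</code> simplifies back to <code class="docutils literal">x</code> as well as to <code class="docutils literal">z</code>.</p>

```python
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

T = TypeVar("T", bound=Hashable)


def minimize(start: T, simplify: Callable[[T], Iterable[T]],
             fails: Callable[[T], bool]) -> T:
    """Greedy minimization that tolerates cycles in the simplification graph."""
    seen = {start}  # every template already tested (start is assumed failing)
    current = start
    improved = True
    while improved:
        improved = False
        for candidate in simplify(current):
            if candidate in seen:
                continue  # tested before: never reify or run it again
            seen.add(candidate)
            if fails(candidate):
                current, improved = candidate, True
                break
    return current


# Toy graph with a cycle: x -> y, y -> x, and y -> z.
graph: Dict[str, List[str]] = {"x": ["y"], "y": ["x", "z"], "z": []}
calls: List[str] = []


def fails(template: str) -> bool:
    calls.append(template)
    return template in {"x", "y"}


result = minimize("x", graph.__getitem__, fails)
print(result, calls)  # y ['y', 'z']
```

<p>Without the <code class="docutils literal">seen</code> check, the greedy loop would bounce between <code class="docutils literal">x</code> and <code class="docutils literal">y</code> forever, since both fail; with it, each template is tested at most once and the search stops at <code class="docutils literal">y</code>.</p>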
<p>There&#8217;s a trick to this: examples might be quite large, and we don&#8217;t
want to keep them around in memory if we don&#8217;t have to. Because of the restricted
templates, we can insist that all examples belong to a set of types that have a
stable serialization format. So rather than storing whole examples and comparing
them for equality, we serialize them and, if the serialized string is at least
20 bytes long, take its SHA-1 hash. We then just keep these hashes (or, for shorter
strings, the strings themselves) around, and if we&#8217;ve seen one before we treat
the example as seen.</p>
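<p>A sketch of the hashed seen-set, using Python&#8217;s <code class="docutils literal">hashlib</code>. JSON stands in here for Hypothesis&#8217;s own stable serialization format, and the class name <code class="docutils literal">SeenTemplates</code> is hypothetical. Serializations shorter than 20 bytes are stored verbatim. Longer ones are replaced by their SHA-1 digest.</p>

```python
import hashlib
import json
from typing import Any, Set


class SeenTemplates:
    """Remembers templates by a short key instead of keeping them in memory."""

    def __init__(self) -> None:
        self._keys: Set[bytes] = set()

    @staticmethod
    def key(template: Any) -> bytes:
        # Stable serialization: sorted keys, no incidental whitespace.
        data = json.dumps(template, sort_keys=True,
                          separators=(",", ":")).encode("utf-8")
        if len(data) < 20:
            return data  # short: store the serialization itself
        return hashlib.sha1(data).digest()  # long: store a 20-byte digest

    def check_and_add(self, template: Any) -> bool:
        """Return True if this template was seen before; record it either way."""
        k = self.key(template)
        if k in self._keys:
            return True
        self._keys.add(k)
        return False


seen = SeenTemplates()
print(seen.check_and_add([1, 2]))            # False: first time
print(seen.check_and_add([1, 2]))            # True: short key stored verbatim
print(seen.check_and_add(list(range(100))))  # False
print(seen.check_and_add(list(range(100))))  # True: matched via its SHA-1 digest
```

<p>A SHA-1 digest is itself 20 bytes long, so hashing shorter strings would save no memory. That is presumably the reason for the threshold. It also means a verbatim key can never coincide with a digest.</p>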
</div>
</div>


</div>
<footer>


<hr/>

<div role="contentinfo">
<p>
&copy; Copyright 2015, David R. MacIver.
</p>
</div>

Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.

</footer>
</div>
</div>

</section>

</div>





<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT:'./',
VERSION:'0.7',
COLLAPSE_INDEX:false,
FILE_SUFFIX:'.html',
HAS_SOURCE: true
};
</script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>





<script type="text/javascript" src="_static/js/theme.js"></script>




<script type="text/javascript">
jQuery(function () {
SphinxRtdTheme.StickyNav.enable();
});
</script>


</body>
</html>
