Additional documentation of internals + some extras

HypothesisWorks · Mar 20, 2015 · 70e3fd5
1 parent 8ef19a3
commit 70e3fd5
Showing 1 changed file with 282 additions and 0 deletions.
diff --git a/docs/_build/html/internals.html b/docs/_build/html/internals.html
@@ -0,0 +1,282 @@
+
+
+<!DOCTYPE html>
+<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
+<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
+<head>
+  <meta charset="utf-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+
+  <title>Hypothesis internals &mdash; Hypothesis 0.7 documentation</title>
+
+
+
+
+
+
+  <link href='https://fonts.googleapis.com/css?family=Lato:400,700,400italic,700italic|Roboto+Slab:400,700|Inconsolata:400,700&subset=latin,cyrillic' rel='stylesheet' type='text/css'>
+
+
+
+
+
+
+
+
+
+    <link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
+
+
+
+
+
+    <link rel="top" title="Hypothesis 0.7 documentation" href="index.html"/> 
+
+
+  <script src="https://cdnjs.cloudflare.com/ajax/libs/modernizr/2.6.2/modernizr.min.js"></script>
+
+</head>
+
+<body class="wy-body-for-nav" role="document">
+
+  <div class="wy-grid-for-nav">
+
+
+    <nav data-toggle="wy-nav-shift" class="wy-nav-side">
+      <div class="wy-side-nav-search">
+
+          <a href="index.html" class="fa fa-home"> Hypothesis</a>
+
+
+<div role="search">
+  <form id="rtd-search-form" class="wy-form" action="search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" />
+    <input type="hidden" name="check_keywords" value="yes" />
+    <input type="hidden" name="area" value="default" />
+  </form>
+</div>
+      </div>
+
+      <div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+
+
+
+              <!-- Local TOC -->
+              <div class="local-toc"><ul>
+<li><a class="reference internal" href="#">Hypothesis internals</a><ul>
+<li><a class="reference internal" href="#templating">Templating</a></li>
+<li><a class="reference internal" href="#parametrization">Parametrization</a></li>
+<li><a class="reference internal" href="#the-database">The database</a></li>
+<li><a class="reference internal" href="#example-tracking">Example tracking</a></li>
+</ul>
+</li>
+</ul>
+</div>
+
+
+      </div>
+      &nbsp;
+    </nav>
+
+    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
+
+
+      <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
+        <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
+        <a href="index.html">Hypothesis</a>
+      </nav>
+
+
+
+      <div class="wy-nav-content">
+        <div class="rst-content">
+          <div role="navigation" aria-label="breadcrumbs navigation">
+  <ul class="wy-breadcrumbs">
+    <li><a href="index.html">Docs</a> &raquo;</li>
+
+    <li>Hypothesis internals</li>
+      <li class="wy-breadcrumbs-aside">
+
+          <a href="_sources/internals.txt" rel="nofollow"> View page source</a>
+
+      </li>
+  </ul>
+  <hr/>
+</div>
+          <div role="main" class="document">
+
+  <div class="section" id="hypothesis-internals">
+<h1>Hypothesis internals<a class="headerlink" href="#hypothesis-internals" title="Permalink to this headline">¶</a></h1>
+<p>This document is a guide to Hypothesis internals, written mostly to help
+authors of other Quickcheck implementations port some of its more
+unusual/interesting ideas.</p>
+<p>Nothing here is stable public API, and all of it may change between
+minor releases. The purpose of this document is to share the ideas, not to
+specify the behaviour.</p>
+<p>This is sorted roughly in order of most interesting to least interesting.</p>
+<div class="section" id="templating">
+<h2>Templating<a class="headerlink" href="#templating" title="Permalink to this headline">¶</a></h2>
+<p>Templating is the single most important innovation in Hypothesis. If you&#8217;re
+going to take any ideas out of Hypothesis you should take this one.</p>
+<p>The idea is as follows: Rather than generating data of the required type
+directly, value generation is split into two parts. We first generate a <em>template</em>
+and we then have a function which can reify that template, turning it into a
+value of the desired type. Importantly, simplification happens on templates and
+not on reified data.</p>
+<p>This has several major advantages:</p>
+<ol class="arabic simple">
+<li>The templates can be of a much more restricted type than the desired output:
+you can require them to be immutable, serializable, hashable, etc. without in
+any way restricting the range of data that you can generate.</li>
+<li>Seamless support for mutable data: because the mutable object you produce is
+the result of reifying the template, any mutation done by the function you call
+does not affect the underlying template.</li>
+<li>Generation strategies can be made functorial (and indeed applicative). You can
+sort of make them monadic, but the resulting templates are a bit fiddly and can&#8217;t
+really be of the desired restricted type, so it&#8217;s probably not worth it.</li>
+</ol>
+<p>The last point is worth elaborating on: Hypothesis&#8217;s SearchStrategy has a method map
+which lets you do e.g. strategy(int).map(lambda x: Decimal(x) / 100). This gives
+you a new strategy for decimals, which still supports minimization. The normal
+obstacle here is that you can&#8217;t minimize the result because you&#8217;d need a way to
+map back to the original data type, but that isn&#8217;t an issue here because you
+can just keep using the previous template type, minimize that, and only convert
+to the new data type at the point of reification.</p>
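+<p>A minimal sketch of the template/reify split and of map, under stated
+assumptions: the class names, the draw_template/simplify/reify methods and the
+minimize helper below are hypothetical and are not the real Hypothesis API. The
+point it illustrates is that minimization only ever touches the int templates,
+while the failing condition only ever sees reified Decimal values.</p>

```python
from decimal import Decimal


class IntStrategy:
    """Hypothetical strategy whose templates are plain ints."""

    def draw_template(self, rnd):
        return rnd.randint(-1000, 1000)

    def simplify(self, template):
        # Candidate simpler templates, all strictly closer to zero.
        if template == 0:
            return
        sign = 1 if template > 0 else -1
        magnitude = abs(template)
        yield 0
        yield sign * (magnitude // 2)
        yield sign * (magnitude - 1)

    def reify(self, template):
        return template

    def map(self, f):
        return MappedStrategy(self, f)


class MappedStrategy:
    """Reuses the base strategy's templates; only reification changes."""

    def __init__(self, base, f):
        self.base = base
        self.f = f

    def draw_template(self, rnd):
        return self.base.draw_template(rnd)

    def simplify(self, template):
        return self.base.simplify(template)

    def reify(self, template):
        return self.f(self.base.reify(template))


def minimize(strategy, template, fails):
    """Greedily shrink a failing template; `fails` sees reified values only."""
    improved = True
    while improved:
        improved = False
        for candidate in strategy.simplify(template):
            if fails(strategy.reify(candidate)):
                template = candidate
                improved = True
                break
    return template


decimals = IntStrategy().map(lambda x: Decimal(x) / 100)
template = minimize(decimals, 900, lambda d: d >= Decimal("1.5"))
print(template, decimals.reify(template))
```

+<p>No inverse of the mapped function is ever needed: the mapped strategy simply
+delegates simplification to the strategy it wraps.</p>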
+</div>
+<div class="section" id="parametrization">
+<h2>Parametrization<a class="headerlink" href="#parametrization" title="Permalink to this headline">¶</a></h2>
+<p>Template generation is also less direct than you might expect. Each strategy
+has two distributions: A parameter distribution, and a conditional template
+distribution given a parameter value.</p>
+<p>There are two reasons for this: The first is simply that this gives a
+&#8220;clumpier&#8221; distribution. This is based on a paper called <a class="reference external" href="http://www.cs.utah.edu/~regehr/papers/swarm12.pdf">Swarm Testing</a>
+but with some extensions to the idea. The essential concept is that a distribution
+which is too flat is likely to spend too much time exploring uninteresting
+interactions.</p>
+<p>For a trivial example of why this matters, consider tests parametrized by
+lists of integers. How do you find a distribution which can trigger all of the following?</p>
+<ol class="arabic simple">
+<li>A bug that only occurs when the list has duplicates</li>
+<li>A bug that only occurs when the list contains very large integers</li>
+<li>A bug that only occurs when the list is long and contains only small integers</li>
+</ol>
+<p>The answer in Hypothesis is that the parameter for a list of values contains a single
+parameter for its elements, and every element template is drawn conditional on that
+parameter. For integers, sometimes the parameter produces on average small ints, sometimes
+it produces on average big ints.</p>
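+<p>The two-stage draw can be sketched roughly as follows. Everything here is an
+assumption made for illustration (the method names, the particular magnitudes
+and mean lengths); the real distributions in Hypothesis are different. What it
+shows is that one element parameter is shared by the whole list, so a single
+list is either all small ints or all potentially huge ints.</p>

```python
import random


class IntStrategy:
    def draw_parameter(self, rnd):
        # Hypothetical parameter: the typical magnitude for this draw.
        return rnd.choice([10, 1000, 10 ** 9])

    def draw_template(self, rnd, parameter):
        return rnd.randint(-parameter, parameter)


class ListStrategy:
    def __init__(self, elements):
        self.elements = elements

    def draw_parameter(self, rnd):
        # A mean length plus ONE element parameter shared by the whole list.
        return (rnd.choice([1, 5, 50]), self.elements.draw_parameter(rnd))

    def draw_template(self, rnd, parameter):
        mean_length, element_parameter = parameter
        length = rnd.randint(0, 2 * mean_length)
        return tuple(
            self.elements.draw_template(rnd, element_parameter)
            for _ in range(length)
        )


rnd = random.Random(0)
lists = ListStrategy(IntStrategy())
parameter = lists.draw_parameter(rnd)
print(parameter, lists.draw_template(rnd, parameter))
```

+<p>A small magnitude combined with a large mean length gives long lists of small
+ints, which are also the lists most likely to contain duplicates.</p>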
+<p>(It would be possible to have it instead draw a small but sometimes &gt; 1 number
+of parameters and draw from a mixture of those. I haven&#8217;t investigated this yet)</p>
+<p>The second important benefit of the parameter system is that you can use it to
+guide the search space. This is useful because it allows you to use otherwise
+quite hard to satisfy preconditions in your tests.</p>
+<p>The way this works is that we store all the parameters we use, and will tend to
+use each parameter multiple times. Parameters which tend to produce &#8220;bad&#8221;
+results (that is, produce a test such that assume() is called with a falsy
+value) will be chosen less often than parameters which don&#8217;t. Parameters
+which produce templates we&#8217;ve already seen are also penalized in order to guide
+the search towards novelty.</p>
+<p>Hypothesis implements this with an infinitely-many-armed bandit algorithm
+based on Thompson Sampling and some ad hoc hacks. I don&#8217;t strongly recommend
+following the specific algorithm, though it seems to work well in practice.</p>
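+<p>For readers unfamiliar with Thompson Sampling, here is a deliberately
+simplified sketch of the general idea. It is not the algorithm Hypothesis uses:
+the ParameterBandit class, the Beta(1, 1) prior and the rule for creating new
+arms are all assumptions chosen to keep the example short. Each stored
+parameter keeps counts of good and bad outcomes, a score is sampled from the
+corresponding Beta posterior, and a fresh parameter is drawn whenever a sample
+from the prior beats every existing arm.</p>

```python
import random


class ParameterBandit:
    def __init__(self, draw_parameter, rnd):
        self.draw_parameter = draw_parameter
        self.rnd = rnd
        self.arms = []  # each arm is [parameter, good_count, bad_count]

    def choose(self):
        """Return the index of the arm to use, creating a new arm if needed."""
        # Score for "try a fresh parameter": a sample from the Beta(1, 1) prior.
        best_index, best_score = None, self.rnd.betavariate(1, 1)
        for index, (_, good, bad) in enumerate(self.arms):
            score = self.rnd.betavariate(1 + good, 1 + bad)
            if score > best_score:
                best_index, best_score = index, score
        if best_index is None:
            self.arms.append([self.draw_parameter(self.rnd), 0, 0])
            best_index = len(self.arms) - 1
        return best_index

    def parameter(self, index):
        return self.arms[index][0]

    def record(self, index, good):
        # "Bad" would cover both a failed assume() and an already-seen template.
        self.arms[index][1 if good else 2] += 1


rnd = random.Random(0)
bandit = ParameterBandit(lambda r: r.random(), rnd)
for _ in range(500):
    index = bandit.choose()
    # Toy outcome: pretend parameters above 0.5 always satisfy assume().
    bandit.record(index, bandit.parameter(index) > 0.5)
```

+<p>Arms that keep producing bad results get low posterior samples and are chosen
+less and less often, while there is always some chance of exploring a new
+parameter.</p>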
+</div>
+<div class="section" id="the-database">
+<h2>The database<a class="headerlink" href="#the-database" title="Permalink to this headline">¶</a></h2>
+<p>There&#8217;s not much to say here except &#8220;why isn&#8217;t everyone doing this?&#8221; (though
+in fairness this is made much easier by the template system).</p>
+<p>When Hypothesis finds a minimal failing example it saves the template for it in
+a database (by default a local sqlite database, though it could be anything).
+When run in future, Hypothesis first checks if there are any saved examples for
+the test and tries those first. If any of them fail the test, it skips straight
+to the minimization stage without bothering with data generation. This is
+particularly useful for tests with a low probability of failure - if Hypothesis
+has a one in 1000 chance of finding an example it will probably take 5 runs of
+the test suite before the test fails, but after that it will consistently fail
+until you fix the bug.</p>
+<p>The key Hypothesis uses for this is the type signature of the test, but that
+hasn&#8217;t proven terribly useful. You could use the name of the test equally well
+without losing much.</p>
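+<p>The save-and-replay flow might look roughly like this. The table layout, the
+use of JSON as the serialization format and the run_test helper are assumptions
+for illustration only, not the schema or code Hypothesis uses. Saved templates
+for the key are tried first, and fresh generation only happens when none of
+them still fails.</p>

```python
import json
import sqlite3


class ExampleDatabase:
    def __init__(self, path=":memory:"):
        self.connection = sqlite3.connect(path)
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS examples ("
            "key TEXT, value TEXT, UNIQUE (key, value))"
        )

    def save(self, key, template):
        self.connection.execute(
            "INSERT OR IGNORE INTO examples VALUES (?, ?)",
            (key, json.dumps(template)),
        )
        self.connection.commit()

    def fetch(self, key):
        rows = self.connection.execute(
            "SELECT value FROM examples WHERE key = ?", (key,)
        )
        return [json.loads(value) for (value,) in rows]


def run_test(database, key, fails, generate, minimize):
    """Return a minimal failing template for the test, or None."""
    failing = next((t for t in database.fetch(key) if fails(t)), None)
    if failing is None:
        # No saved template still fails, so fall back to fresh generation.
        failing = next((t for t in generate() if fails(t)), None)
    if failing is None:
        return None
    minimal = minimize(failing)
    database.save(key, minimal)
    return minimal
```

+<p>Here key could be the test&#8217;s type signature or simply its name, as discussed
+above.</p>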
+<p>I had some experiments with disassembling and reassembling examples for reuse
+in other tests, but in the end these didn&#8217;t prove very useful and were hard to
+support after some other changes to the system, so I took them out.</p>
+</div>
+<div class="section" id="example-tracking">
+<h2>Example tracking<a class="headerlink" href="#example-tracking" title="Permalink to this headline">¶</a></h2>
+<p>The idea of this is simply that we don&#8217;t want to call a test function with the
+same example twice. I think normal property based testing systems don&#8217;t do this
+because they assume that checking a property is faster than testing whether the
+example has been seen before, especially given a low duplication rate.</p>
+<p>Because Hypothesis is designed around the assumption that you&#8217;re going to use
+it on things that look more like unit tests (and also because Python is quite
+slow) it&#8217;s more important that we don&#8217;t duplicate effort, so we track which
+templates have previously been run and don&#8217;t bother to reify and test them
+again if they come up. As mentioned in the previous section we also then
+penalize the parameter that produced them.</p>
+<p>This is also useful for minimization: Hypothesis doesn&#8217;t mind if you have
+cycles in your minimize graph (e.g. if x simplifies to y and y simplifies to x)
+because it can just use the example tracking system to break loops.</p>
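+<p>As a sketch of how tracking breaks loops during minimization (the minimize
+function and its arguments are hypothetical, and a plain set stands in for the
+tracking system), the loop below never revisits a template, so a cyclic
+simplify graph cannot make it run forever:</p>

```python
def minimize(template, simplify, fails):
    """Follow simplifications of a failing template, never revisiting one."""
    seen = {template}
    current = template
    changed = True
    while changed:
        changed = False
        for candidate in simplify(current):
            if candidate in seen:
                continue  # already tried: this is what breaks cycles
            seen.add(candidate)
            if fails(candidate):
                current = candidate
                changed = True
                break
    return current


# x simplifies to y and y simplifies back to x.
cyclic = {"x": ["y"], "y": ["x"]}
result = minimize("x", lambda t: cyclic[t], lambda t: True)
```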
+<p>There&#8217;s a trick to this: Examples might be quite large and we don&#8217;t actually
+want to keep them around in memory if we don&#8217;t have to. Because of the restricted
+templates, we can insist that all examples belong to a set of types that have a
+stable serialization format. So rather than storing and testing the whole
+examples for equality we simply serialize them and (if the serialized string is
+at least 20 bytes) we take the sha1 hash of it. We then just keep these hashes
+around and if we&#8217;ve seen the hash before we treat the example as seen.</p>
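+<p>A minimal sketch of that trick, assuming JSON as the stable serialization
+format (the ExampleTracker class and its method names are made up for this
+example). Serialized forms shorter than 20 bytes are stored directly, and
+anything longer is replaced by its 20-byte SHA-1 digest, so no stored key is
+ever longer than 20 bytes.</p>

```python
import hashlib
import json


class ExampleTracker:
    def __init__(self):
        self.seen = set()

    def key(self, template):
        data = json.dumps(template, sort_keys=True).encode("utf-8")
        # Short strings are kept as-is; longer ones are replaced by a
        # fixed-size 20-byte SHA-1 digest.
        if len(data) >= 20:
            return hashlib.sha1(data).digest()
        return data

    def track(self, template):
        """Record the template; return True if it had not been seen before."""
        key = self.key(template)
        if key in self.seen:
            return False
        self.seen.add(key)
        return True


tracker = ExampleTracker()
for template in ([1, 2, 3], [1, 2, 3], list(range(100))):
    print(tracker.track(template))
```

+<p>A raw key is always shorter than 20 bytes and a digest is always exactly 20,
+so the two kinds of key cannot be confused with each other.</p>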
+</div>
+</div>
+
+
+          </div>
+          <footer>
+
+
+  <hr/>
+
+  <div role="contentinfo">
+    <p>
+        &copy; Copyright 2015, David R. MacIver.
+    </p>
+  </div>
+
+  Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
+
+</footer>
+        </div>
+      </div>
+
+    </section>
+
+  </div>
+
+
+
+
+
+    <script type="text/javascript">
+        var DOCUMENTATION_OPTIONS = {
+            URL_ROOT:'./',
+            VERSION:'0.7',
+            COLLAPSE_INDEX:false,
+            FILE_SUFFIX:'.html',
+            HAS_SOURCE:  true
+        };
+    </script>
+      <script type="text/javascript" src="_static/jquery.js"></script>
+      <script type="text/javascript" src="_static/underscore.js"></script>
+      <script type="text/javascript" src="_static/doctools.js"></script>
+
+
+
+
+
+    <script type="text/javascript" src="_static/js/theme.js"></script>
+
+
+
+
+  <script type="text/javascript">
+      jQuery(function () {
+          SphinxRtdTheme.StickyNav.enable();
+      });
+  </script>
+
+
+</body>
+</html>