Use marshal + sha1 to track seen objects
Does this look like an ugly commit? You're quite wrong. This is a
beautiful commit.

This commit:

  1. Vastly speeds up Hypothesis
  2. Vastly reduces Hypothesis's memory usage
  3. Paves the way for throwing away some utterly terrible code.

Tracker previously used HashItAnyway, which uses extmethods for
equality and hashing to let us put arbitrary objects into a dict.
This did a lot of hashing and equality logic in pure python and
thus was deathly slow, even when the hashing logic was good (which
it often wasn't).
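
To make that cost concrete, here is a minimal sketch of the general wrapper pattern described above. It is not the real HashItAnyway, whose extmethod-based dispatch is not shown in this commit; the names `slow_hash` and `HashableWrapper` are hypothetical and exist only for illustration.

```python
def slow_hash(value):
    """Structurally hash nested containers in pure Python (illustrative only)."""
    if isinstance(value, (list, tuple)):
        return hash((type(value).__name__,) + tuple(slow_hash(v) for v in value))
    if isinstance(value, dict):
        return hash(frozenset((slow_hash(k), slow_hash(v)) for k, v in value.items()))
    if isinstance(value, (set, frozenset)):
        return hash(frozenset(slow_hash(v) for v in value))
    return hash(value)


class HashableWrapper(object):
    """Hypothetical stand-in for a wrapper that makes any object a dict key."""

    def __init__(self, wrapped):
        self.wrapped = wrapped
        # The whole structure is walked in Python just to build the key.
        self.h = slow_hash(wrapped)

    def __hash__(self):
        return self.h

    def __eq__(self, other):
        # Equality also runs in Python and keeps both originals alive.
        return (
            isinstance(other, HashableWrapper)
            and self.h == other.h
            and type(self.wrapped) is type(other.wrapped)
            and self.wrapped == other.wrapped
        )

    def __ne__(self, other):
        return not self.__eq__(other)


seen = {}
key = HashableWrapper([1, [2, 3], {"a": 4}])
seen[key] = seen.get(key, 0) + 1
```

Every lookup pays for a Python-level traversal, and the dict holds a reference to each wrapped object for as long as it is tracked.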

Additionally, this meant we were keeping around a bunch of complex
Python objects in memory which takes up a large amount of space.

Because we only need to track templates, which both typically are
and can easily be enforced to be simple data types, this takes
advantage of the stability of the marshal format for tracking. We
then sha1 the results (I'm not concerned about collisions because
this is not an adversarial situation) and get to instead track a
digest that uses less memory than a simple python object, let alone
the complex data we were previously tracking.
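
The idea in the paragraph above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the commit: the function name `tracking_key` is hypothetical, and the explicit marshal format version `2` is a precaution added here because newer marshal versions can encode equal values differently depending on how objects are shared or interned.

```python
import hashlib
import marshal


def tracking_key(template):
    """Serialize simple data with marshal, then hash long payloads with SHA-1."""
    # Version 2 is an assumption of this sketch, not something the commit sets.
    payload = marshal.dumps(template, 2)
    # Payloads shorter than a SHA-1 digest are already compact enough.
    if len(payload) < 20:
        return payload
    return hashlib.sha1(payload).digest()


big_template = (list(range(1000)), "some text", (1.5, None, True))
equal_copy = (list(range(1000)), "some text", (1.5, None, True))
key = tracking_key(big_template)

# Count sightings by key instead of storing the templates themselves.
seen = {}
for candidate in (big_template, equal_copy):
    k = tracking_key(candidate)
    seen[k] = seen.get(k, 0) + 1
```

The dict ends up holding one 20-byte digest rather than two thousand-element structures, which is where the memory saving comes from.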

In short, this commit is pretty great. Give it a hug.
DRMacIver committed Mar 18, 2015
1 parent 2d5474e commit a4921b0
Showing 3 changed files with 31 additions and 4 deletions.
2 changes: 1 addition & 1 deletion src/hypothesis/database/__init__.py
@@ -40,7 +40,7 @@ def save(self, value):
         tracker = Tracker()

         def do_save(d, v):
-            if tracker.track((d, v)) > 1:
+            if tracker.track((repr(d), v)) > 1:
                 return
             s = self.database.storage_for(d)
             converted = s.strategy.to_basic(v)
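
The commit does not say why the descriptor is now passed through `repr`. One plausible reason, stated here as an assumption, is that descriptors can contain objects such as classes, which marshal refuses to serialize, whereas their `repr` is a plain string. The `descriptor` value below is hypothetical.

```python
import marshal

# Hypothetical descriptor containing type objects.
descriptor = (int, str)

try:
    marshal.dumps(descriptor)
    marshalled_directly = True
except ValueError:
    # marshal supports only a fixed set of simple built-in value types.
    marshalled_directly = False

# The repr is an ordinary string, so the (repr, value) pair marshals fine.
key_bytes = marshal.dumps((repr(descriptor), 42))
```

The trade-off is that two distinct descriptors with the same `repr` would share a tracking key.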
27 changes: 25 additions & 2 deletions src/hypothesis/internal/tracker.py
@@ -13,7 +13,30 @@
 from __future__ import division, print_function, absolute_import, \
     unicode_literals

-from hypothesis.internal.hashitanyway import HashItAnyway
+import marshal
+import collections
+from hypothesis.internal.compat import text_type, binary_type
+import hashlib
+
+
+def flatten(x):
+    if isinstance(x, (text_type, binary_type)):
+        return x
+    if isinstance(x, collections.Iterable):
+        return (type(x).__name__, tuple(map(flatten, x)))
+    return x
+
+
+def object_to_tracking_key(o):
+    try:
+        k = marshal.dumps(flatten(o))
+    except ValueError:
+        raise ValueError("Unmarshallable object %r" % (o,))
+
+    if len(k) < 20:
+        return k
+    else:
+        return hashlib.sha1(k).digest()


 class Tracker(object):
@@ -25,7 +48,7 @@ def __len__(self):
         return len(self.contents)

     def track(self, x):
-        k = HashItAnyway(x)
+        k = object_to_tracking_key(x)
         n = self.contents.get(k, 0) + 1
         self.contents[k] = n
         return n
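
For readers who want to try the tracker outside the Hypothesis codebase, here is a standalone adaptation for current Python 3. It departs from the diff in ways that are assumptions of this sketch: `collections.abc.Iterable` replaces `collections.Iterable` (that alias was removed in Python 3.10), `str` and `bytes` stand in for the compat names, marshal format version 2 is passed explicitly as a stability precaution, and `Tracker.__init__` is guessed because the diff does not show it.

```python
import hashlib
import marshal
from collections.abc import Iterable


def flatten(x):
    # Strings are iterable, so return them first to avoid endless recursion.
    if isinstance(x, (str, bytes)):
        return x
    if isinstance(x, Iterable):
        # Tag with the type name so a list and a tuple get different keys.
        return (type(x).__name__, tuple(map(flatten, x)))
    return x


def object_to_tracking_key(o):
    try:
        k = marshal.dumps(flatten(o), 2)  # version 2 is this sketch's choice
    except ValueError:
        raise ValueError("Unmarshallable object %r" % (o,))
    return k if len(k) < 20 else hashlib.sha1(k).digest()


class Tracker(object):
    def __init__(self):
        # Assumed initial state; not visible in the diff.
        self.contents = {}

    def __len__(self):
        return len(self.contents)

    def track(self, x):
        k = object_to_tracking_key(x)
        n = self.contents.get(k, 0) + 1
        self.contents[k] = n
        return n


tracker = Tracker()
first = tracker.track([1, 2, 3])
second = tracker.track([1, 2, 3])
as_tuple = tracker.track((1, 2, 3))
```

`track` returns how many times an equal value has been seen, which is why the caller in `database/__init__.py` skips the save when the result is greater than 1.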
6 changes: 5 additions & 1 deletion src/hypothesis/searchstrategy/misc.py
@@ -66,6 +66,10 @@ def produce_parameter(self, random):
return None

def produce_template(self, context, pv):
return None

def reify(self, template):
assert template is None
return self.descriptor.value

def to_basic(self, template):
@@ -74,7 +78,7 @@ def to_basic(self, template):
     def from_basic(self, data):
         if data is not None:
             raise BadData('Expected None but got %s' % (nice_string(data,)))
-        return self.descriptor.value
+        return None


 class RandomStrategy(SearchStrategy):
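
Read together with the tracker change, the misc.py edit appears to keep the template for a constant-value strategy as `None` and to produce the real value only in `reify`. One reading, offered as an assumption, is that this keeps templates marshallable even when the constant itself is not. The class `JustSketch` below is hypothetical and is not Hypothesis's strategy class; `ValueError` stands in for `BadData`.

```python
import marshal


class JustSketch(object):
    """Hypothetical strategy that always yields one fixed value."""

    def __init__(self, value):
        self.value = value

    def produce_template(self):
        # The template carries no information, so it is trivially simple data.
        return None

    def reify(self, template):
        assert template is None
        return self.value

    def from_basic(self, data):
        if data is not None:
            raise ValueError("Expected None but got %r" % (data,))
        return None


constant = object()  # an arbitrary value that marshal cannot serialize
strategy = JustSketch(constant)
template = strategy.produce_template()

template_bytes = marshal.dumps(template)  # None marshals without trouble
value = strategy.reify(template)
```

With this split the tracker only ever sees `None` for such a strategy, while callers still receive the original object.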

2 comments on commit a4921b0

@doismellburning
Member

🏆

@doismellburning
Member

There's no :hug: 😞
