Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix loading of charmap from cache file #2166

Merged
merged 6 commits into from
Nov 2, 2019

Conversation

robertknight
Copy link
Contributor

Loading the cached charmap from disk always failed because the
GzipFile was passed to json.loads, which expects a string. Use
json.load instead, which accepts a file.

This fixes an issue where the first test in a test run that used the
charmap would be unexpectedly slow.

For context, see #2108 (comment).

Loading the cached charmap from disk always failed because the
`GzipFile` was passed to `json.loads`, which expects a string. Use
`json.load` instead, which accepts a file.

This fixes an issue where the first test in a test run that used the
charmap would be unexpectedly slow.
@Zac-HD Zac-HD added the bug something is clearly wrong here label Nov 1, 2019
@Zac-HD
Copy link
Member

Zac-HD commented Nov 1, 2019

Wow, that's a pretty serious bug! Thanks heaps for spotting it, and the PR! (with tests 😍)

You'll need to add a release file (with thank-you note) and I'd encourage you to add yourself to the list of authors too. See #2162 for a good example of what this looks like.

robertknight added a commit to robertknight/hypothesis that referenced this pull request Nov 1, 2019
@robertknight
Copy link
Contributor Author

You'll need to add a release file (with thank-you note) and I'd encourage you to add yourself to the list of authors too. See #2162 for a good example of what this looks like.

Done. Thanks for the help :)

Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again Robert 🎉

@Zac-HD
Copy link
Member

Zac-HD commented Nov 1, 2019

 - Verify that the `os.utime` call to set the mtime succeeded
 - Only check the mtime of the file, since that is all we care about and
   the platform may or may not update the atime.
 - Only require second-level accuracy of mtime. Not all file
   systems/OSes support the same level of accuracy as `time.time()`
   returns.
@robertknight
Copy link
Contributor Author

I'm looking into the test failure on Travis. I have a related question though about whether we could potentially just bundle a precompiled version of the charmap in the Hypothesis package, along with a script to regenerate it? The only time it changes AFAIK is when new Unicode versions are released.

On CI systems, each new branch (or sometimes every build) typically starts out with a fresh build directory, so the first test that triggers a call to charmap() will result in a call to this function. The caching therefore doesn't help many builds by default. The flaky test failure in our app's test suite that brought the issue to light was:

E   hypothesis.errors.FailedHealthCheck: Data generation is extremely slow: Only produced 6 valid examples in 1.35 seconds (0 invalid ones and 2 exceeded maximum size). Try decreasing size of the data you're generating (with e.g.max_size or max_leaves parameters).
E   See https://hypothesis.readthedocs.io/en/latest/healthchecks.html for more information about this. If you want to disable just this health check, add HealthCheck.too_slow to the suppress_health_check settings for this test.

@Zac-HD
Copy link
Member

Zac-HD commented Nov 1, 2019

We have considered shipping this as a static data file, but the way that e.g. unicode version can vary by platform makes it fairly tricky to be sure that everything stays in sync (not that the status quo is perfect, either...). Given that we want to address the 'long delay on first run' for other reasons too, we're likely to go with a more general solution.

@robertknight
Copy link
Contributor Author

Hmm, the linux-py35 tests are failing. I'll look into that.

Given that we want to address the 'long delay on first run' for other reasons too, we're likely to go with a more general solution.

OK. I had a look at optimizing the existing cache generation and was able to find wins of ~2x by reducing the number of hash lookups: https://gist.github.com/robertknight/055d19c61c5dbe2c1169a54e1da7f238.

Reading from the `GzipFile` returns bytes, but `json.load` only supports
string input in Python <= 3.5.
Co-Authored-By: robertknight <robertknight@gmail.com>
@Zac-HD Zac-HD merged commit 8ec093c into HypothesisWorks:master Nov 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug something is clearly wrong here
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants