
Hypothesis-aware contracts? #1474

Closed
asmodehn opened this issue Aug 6, 2018 · 3 comments
Labels
new-feature (entirely novel capabilities or strategies), question (not sure it's a bug? questions welcome)

Comments

asmodehn commented Aug 6, 2018

I am looking for a better way to integrate code and tests.

Currently we have the hypothesis.infer function, which determines the strategy from the type hint.

Pushing this logic further, how about having contracts, enforced on the normal application code, that provide information to Hypothesis when they run:

  • similar to hypothesis.assume for a precondition: during tests, Hypothesis would not generate values that fail the precondition, and the precondition would be enforced during actual execution.
  • similar to hypothesis.find for a postcondition: Hypothesis could gather data for a mock during actual execution, and during tests generate mock data with a strategy built from the gathered data (sampling from it, maybe).

Any reasons why / why not to attempt this? Any advice?
Thanks for any input.

I'll see if I can come up with some proof-of-concept code...
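
For instance, a first very rough sketch of the precondition half could look like this (the precondition decorator and the UNDER_TEST switch are names I am making up here; it assumes calling hypothesis.assume from inside the wrapped code is acceptable while a test is running):

import functools
import os

from hypothesis import assume

# Hypothetical switch: the test runner would set this so that contracts
# report violations to Hypothesis instead of only raising as in production.
UNDER_TEST = bool(os.environ.get("RUNNING_HYPOTHESIS_TESTS"))

def precondition(check):
    """Enforce `check` on every real call; under test, translate a violation
    into hypothesis.assume(False) so the generated example is discarded."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not check(*args, **kwargs):
                if UNDER_TEST:
                    assume(False)  # discard this generated example
                raise ValueError(f"precondition of {func.__name__} violated")
            return func(*args, **kwargs)
        return wrapper
    return decorator

@precondition(lambda x: x > 0)
def reciprocal(x):
    return 1 / x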

@Zac-HD added the "question (not sure it's a bug? questions welcome)" label on Aug 7, 2018
@Zac-HD added the "new-feature (entirely novel capabilities or strategies)" label on Aug 7, 2018

Zac-HD commented Aug 7, 2018

I assume you've seen @hwayne's blog post and talk Beyond Unit Tests? That's the best general introduction I know of to using Hypothesis with dpcontracts.

I've opened an issue (link above) on the dpcontracts repo to distinguish violations of the various contract types. Once that's implemented, the function below will be usable:

import functools

import hypothesis
from dpcontracts import PreconditionError  # assuming this exists once the issue above is implemented


def fulfill(contract_func):
    """Make Hypothesis retry the test case if an @require(...) contract
    is violated.  Can be used as `builds(fulfill(func), ...)` or in the
    body of the test e.g. `assert fulfill(func)(*args)`.
    """
    @functools.wraps(contract_func)
    def inner(*args, **kwargs):
        try:
            return contract_func(*args, **kwargs)
        except PreconditionError:
            hypothesis.assume(False)
    return inner

Per CONTRIBUTING.rst, this could go either in an external package (suggested name hypothesis-dpcontracts), or as a new module hypothesis.extra.dpcontracts. Usually we encourage external packages these days, but if the above is all you need then I'd be happy with a small new extra.
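
For reference, a minimal usage sketch for the above (the reciprocal function and its @require contract are invented for illustration, and PreconditionError is assumed to exist once the dpcontracts issue is resolved):

from dpcontracts import require
from hypothesis import given, strategies as st

@require("x must be positive", lambda args: args.x > 0)
def reciprocal(x):
    return 1 / x

@given(st.integers())
def test_reciprocal_is_positive(x):
    # Generated inputs that violate the @require contract are rejected
    # via assume(False) inside fulfill(), instead of failing the test.
    assert fulfill(reciprocal)(x) > 0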


Unfortunately the dpcontracts library does not store enough information to determine valid input strategies at runtime; it can only filter out invalid inputs. Note also that hypothesis.infer is not a function, just a special value that tells st.builds(...) to infer a strategy from the type hint instead of using the default value for an argument.
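
To illustrate that last point (greet is just a made-up example function):

from hypothesis import given, infer, strategies as st

def greet(name: str, punctuation: str = "!") -> str:
    return f"Hello, {name}{punctuation}"

# `infer` is a marker value: builds() derives strategies from the type hints
# instead of using the default value for `punctuation`.
greetings = st.builds(greet, name=infer, punctuation=infer)

@given(greetings)
def test_greeting_says_hello(message):
    assert message.startswith("Hello, ")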

Inferring a strategy from example outputs is both difficult and unreliable - it's basically impossible not to have many false positives (flaky tests), false negatives (weak tests), or both - so we would not include it in Hypothesis. You could make an external package, but even if you did I would still recommend just generating real data by calling the function in question when you need it.
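
That is, something along these lines, with normalise standing in for whatever real function you would otherwise try to mock:

from hypothesis import given, strategies as st

def normalise(name: str) -> str:
    # Stand-in for the real function whose outputs you might be tempted to mock.
    return name.strip().lower()

# Rather than inferring a strategy from observed outputs, call the real
# function on generated inputs to get real data for downstream tests.
real_outputs = st.builds(normalise, st.text())

@given(real_outputs)
def test_downstream_consumer(value):
    assert value == value.strip()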


asmodehn commented Aug 8, 2018

I saw the talk before, and just saw the blog post recently. Let's say I am attempting to go one step further, to see what is doable and under what conditions... Let me try to explain where this comes from.

My background is in distributed systems and software, and I am wondering to what extent I can apply the "testing in production" paradigm here. Python is a dynamic, portable language after all, so I want to test on the actual platform in production, not on another, potentially different, one. That means running Hypothesis on a program while that program is running in production, with contracts enabled.

On the other hand, from experience, upgrading distributed software is a pain: at the boundary between two "nodes/agents", even a small change in the interface (syntax or semantics) can wreak havoc, and to counter that a developer often has to synchronize updates or spend a lot of time providing backward compatibility. Neither really scales up.

I am wondering what is required for "testing in production" to be a solution to the synchronized distributed update problem. Detecting when two nodes are not working together as they should is the first step; providing a fallback using an old version (or a mock of an old version) is the second.

The precondition part seems like it would work here: running the Hypothesis tests in the background would detect a defect in the contracts, and action could then be taken to mitigate the impact.
This could also easily be used to check bad-but-trusted network input (not taking security into account here).

Regarding the postcondition part, socket-mocking frameworks can already implement some "real" request behaviour as part of the mock (see https://github.com/mindflayer/python-mocket for an example if you are interested), forwarding and storing the response. This is currently quite limited, though.
What I have in mind is to:

  • run Hypothesis tests in a "production" setting,
  • gather requests/responses to "external" components,
  • provide mocks so that tests run fast enough and do not impact the "external" components,
  • have multiple levels of what "external" means (unit testing, integration testing, and so on).

So you mention that we cannot get rid of:

  • false positives: a test that succeeds where it shouldn't.
  • false negatives: a test that fails where it shouldn't.

It seems to me that this could only be because:

  • the code we call is not purely functional / idempotent / deterministic: the same input can produce different results, and we might pass the test where a real call would have failed;
  • the network might fail, and mocking it means we succeed when we shouldn't.

Let's assume a basic postcondition strategy for starters: storing samples. It seems it would work fine iff the called code is assumed pure/idempotent, and it could be used as a mock. Also, if we assume the "external" network is not part of what we are testing, having it succeed even where the network would fail seems reasonable; catching that network failure would be part of other, higher-level tests.
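
Something like the following sketch is what I mean by "storing samples" (all names here are hypothetical, and it is only sound under the purity assumption above):

import functools

from hypothesis import strategies as st

# Hypothetical sample store: outputs recorded from real calls in production.
_recorded = {}

def record_samples(func):
    """Record every real return value so it can later be replayed as mock data."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        _recorded.setdefault(func.__name__, []).append(result)
        return result
    return wrapper

def mock_strategy(func):
    """A strategy drawing from the recorded outputs of `func` - only sound if
    `func` is pure/idempotent, and assuming some calls were already recorded."""
    return st.sampled_from(_recorded[func.__name__])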

I am wondering because, in a distributed setting, idempotent interactions are what you want anyway if you plan to be resilient and to scale, and what we usually do is test a "set of nodes" while mocking its "edges". So with these two assumptions it would seem a reasonable thing to do, and they are already the assumptions we usually make when testing distributed code...

I hope this makes more sense, and can provide some insight into the reasoning.
Let me know what you think.
Anyway I might just go and give it a try eventually...


Zac-HD commented Aug 8, 2018

OK, there's a lot of stuff to discuss here - even if you didn't mean dpcontracts integration!

  • For "testing in production", I would describe it more precisely as "writing a client or a node that uses Hypothesis". This is definitely possible - for example the swagger-conformance package acts as a client for a web API.
    The trick is basically that if it interacts via your usual interface, you can run the same test code against a production instance or against various mock versions for faster testing or to reproduce a bug. This also nicely separates the problem of "how do I write the tests" from "what do I run them against"!

  • Hypothesis cannot solve distributed upgrade problems. It can help you find incompatible parts, but you still need to either think about compatibility or replace everything at once (which isn't very distributed!).

  • Tests can also pass or fail when they should not because the input domain was mis-specified. For example, you might test with text(min_size=1) when empty messages are both possible and mishandled, or you might test with integers() when negative numbers are impossible and cause subtle errors (e.g. unsigned overflow). Nondeterminism and incorrect mocks can also cause problems, of course.
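
To make the first failure mode concrete (parse_command is invented for illustration):

from hypothesis import given, strategies as st

def parse_command(msg: str) -> str:
    # Mishandles the empty message: raises IndexError rather than failing cleanly.
    return msg[0].upper() + msg[1:]

# Mis-specified domain: min_size=1 never generates "", so the bug above is
# never exercised and the test quietly passes (a weak test).
@given(st.text(min_size=1))
def test_parse_command(msg):
    assert parse_command(msg)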

Overall though, I think my advice is just "try it and see how it goes"! Feel free to stay in touch too; I'm closing this issue as it's not something we can fix within Hypothesis itself, but I would love to hear about how you're using it 😄
