clarify docs on Bundles vs. instance variables #3511

Closed
jab opened this issue Nov 19, 2022 · 10 comments
Labels
docs documentation could *always* be better

Comments

@jab
Contributor

jab commented Nov 19, 2022

I read through the https://hypothesis.readthedocs.io/en/latest/stateful.html docs several times, and am still having trouble understanding when you would use Bundles vs. instance variables. The docs say:

> If you can replace use of Bundles with instance attributes of the class that is often simpler, but often Bundles are strictly more powerful.

But I'm not seeing what you can do with Bundles that you can't do with instance variables because they're not powerful enough.

For example, in https://github.com/jab/lfu-cache/blob/main/test_lfu.py I was able to use only instance variables to test several properties of an LFU cache, and I feel like plain old instance variables alone are getting me really far. Is there anything I'm missing out on by not using Bundles?

I noticed the docs also add:

> Note that currently preconditions can’t access bundles; if you need to use preconditions, you should store relevant data on the instance instead.

Given that limitation, it's even less clear to me when it's worth it to use Bundles.
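
For reference, the workaround that note describes might look like the following minimal sketch. The stack model and the names `StackMachine`, `push`, and `pop` are made up for illustration; only the `hypothesis.stateful` decorators come from the library.

```python
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, precondition, rule


class StackMachine(RuleBasedStateMachine):
    """Hypothetical toy model: a Python list used as a stack."""

    def __init__(self):
        super().__init__()
        # Plain instance attribute; preconditions can read it.
        self.stack = []

    @rule(value=st.integers())
    def push(self, value):
        self.stack.append(value)

    # The precondition is called with the machine instance, so it can
    # inspect instance attributes (it has no access to Bundles).
    @precondition(lambda self: len(self.stack) > 0)
    @rule()
    def pop(self):
        before = len(self.stack)
        self.stack.pop()
        assert len(self.stack) == before - 1


# Standard way to expose a state machine to pytest/unittest.
TestStack = StackMachine.TestCase
```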


I also had a similar question on when to use a plain old __init__ method (as in the docs' DatabaseComparison example) vs. @initialize, which I might as well mention here too. The answer I would have guessed is that @initialize allows you to take in some strategy-generated input as an argument, while __init__ does not. But the docs don't say that, and I'm not sure it's actually true. Perhaps a section on "when to use __init__ vs. @initialize" would help clarify that too.

Thanks!

@Zac-HD Zac-HD added the docs documentation could *always* be better label Nov 19, 2022
@td-anne
Contributor

td-anne commented Feb 9, 2024

I agree that docs clarifying this would be really helpful!

My understanding so far:

The situation in which Bundles are strictly more powerful is that Bundles can be used in place of strategies to generate inputs to rules. Thus, for instance, one can use a rule to build up a pool of examples in a Bundle, and then other rules can draw from that pool. For dictionaries, for example, the Bundles of keys/values serve as a universe to draw from, so that rules looking things up have a decent chance of selecting things that have already been inserted. While Hypothesis's generators like to produce simple examples that do tend to repeat, it could be valuable to have a "universe" to draw from that isn't any bigger than necessary to show the behaviour.

I'm not sure how you can do this with instance variables: the strategies that generate inputs to rules have no access to the current state of the state machine. This might be resolvable by adding a function to Hypothesis (something like flatmap that calls a user-provided function that takes as its argument the current machine state), or it might be a fundamental limit designed into the example generation/simplification process. My guess is that Bundles are an attempt to provide this ability.

I do have at least one question about how bundles work: the documentation states that a rule can write to multiple Bundles by specifying a tuple in targets. I had thought that this meant you could write different data to the bundles by returning a tuple, but the runtime behaviour suggests that what happens is that you can write the same data to multiple bundles (why?). For example, I might generate a polygon and a point in it, and want the polygon to go into a Bundle of polygons and the point to go into a Bundle of points; the Bundle of points could also receive points generated from a strategy directly, that were not necessarily in one of the polygons in the bundle. It doesn't seem that this is possible.

@Zac-HD
Member

Zac-HD commented Feb 10, 2024

> I'm not sure how you can do this with instance variables: the strategies that generate inputs to rules have no access to the current state of the state machine. ... My guess is that Bundles are an attempt to provide this ability.

As in @jab's example, you can use st.data() to draw from a strategy constructed inside the rule method, where you have access to (the current value of) instance variables.

Bundles are "just" an interface which allows you to express this pattern somewhat more compactly, and ideally clearly, especially once you get to something like st.lists(consumes(some_bundle)). Behind the scenes, RuleBasedStateMachine itself is syntactic sugar over @given(st.data()) too - "arbitrary mixtures of system behavior and constructing-and-drawing-from strategies" is a very powerful primitive!
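
To make the comparison concrete, here is a minimal sketch of the same dictionary-style test written both ways. The class names `WithBundle` and `WithInstanceVariable` and the `model` attribute are hypothetical; this is one possible arrangement, not the form the docs prescribe.

```python
from hypothesis import strategies as st
from hypothesis.stateful import Bundle, RuleBasedStateMachine, precondition, rule


class WithBundle(RuleBasedStateMachine):
    keys = Bundle("keys")

    def __init__(self):
        super().__init__()
        self.model = {}

    @rule(target=keys, k=st.text(), v=st.integers())
    def insert(self, k, v):
        self.model[k] = v
        return k  # the return value is added to the `keys` bundle

    @rule(k=keys)  # draws a key that an earlier `insert` step returned
    def lookup(self, k):
        assert k in self.model


class WithInstanceVariable(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.model = {}

    @rule(k=st.text(), v=st.integers())
    def insert(self, k, v):
        self.model[k] = v

    # sampled_from needs a non-empty sequence, hence the precondition.
    @precondition(lambda self: self.model)
    @rule(data=st.data())
    def lookup(self, data):
        # The strategy is built inside the rule, where `self` is available.
        k = data.draw(st.sampled_from(sorted(self.model)))
        assert k in self.model
```

Both machines exercise the same behaviour; the Bundle version keeps the "draw something inserted earlier" logic in the decorator, while the instance-variable version spells it out with st.data().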


> I do have at least one question about how bundles work: the documentation states that a rule can write to multiple Bundles by specifying a tuple in targets. I had thought that this meant you could write different data to the bundles by returning a tuple, but the runtime behaviour suggests that what happens is that you can write the same data to multiple bundles (why?).

That design choice actually predates my involvement in the project by a few years (!!), but it does seem that we're missing a primitive here - you can easily combine elements from two bundles into one, but not vice-versa. Perhaps something we should consider adding, like we added multiple()?
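
A small sketch of the behaviour under discussion, assuming a toy polygon/point model (the class and rule names are invented for illustration): with `targets`, a returned tuple is stored whole in every listed bundle, whereas `multiple()` stores several values in the one target bundle.

```python
from hypothesis import strategies as st
from hypothesis.stateful import Bundle, RuleBasedStateMachine, multiple, rule


class TargetsDemo(RuleBasedStateMachine):
    polygons = Bundle("polygons")
    points = Bundle("points")
    numbers = Bundle("numbers")

    # targets=(...) appends the SAME return value to every listed bundle,
    # so both bundles receive the whole (polygon, point) tuple.
    @rule(targets=(polygons, points), size=st.integers(1, 5))
    def make_polygon_and_point(self, size):
        polygon = [(0, 0), (size, 0), (0, size)]  # a right triangle
        point = (0, 0)  # one of its vertices
        return (polygon, point)

    @rule(a=polygons, b=points)
    def both_bundles_hold_pairs(self, a, b):
        for drawn in (a, b):
            polygon, point = drawn  # each entry is the full tuple
            assert point in polygon

    # multiple() adds each of its arguments as a separate entry,
    # but all of them go to the single target bundle.
    @rule(target=numbers, a=st.integers(), b=st.integers())
    def add_two_numbers(self, a, b):
        return multiple(a, b)

    @rule(n=numbers)
    def entries_are_unpacked(self, n):
        assert isinstance(n, int)
```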

@td-anne
Contributor

td-anne commented Feb 12, 2024

Is it then untrue that Bundles are strictly more powerful?

> If you can replace use of Bundles with instance attributes of the class that is often simpler, but often Bundles are strictly more powerful.

Would a PR clarifying the docs be welcome? Even if the PR needed review to be sure I had properly understood?

@Zac-HD
Member

Zac-HD commented Feb 12, 2024

Oops, yes, those docs are just plain wrong! (I think they were correct when written, before Hypothesis 3.0 added good support for interactive data)

PRs definitely welcome 😍 If interested, you can read a bit about our PR review philosophy here.

@td-anne
Contributor

td-anne commented Feb 13, 2024

> I also had a similar question on when to use a plain old __init__ method (as in the docs' DatabaseComparison example) vs. @initialize, which I might as well mention here too. The answer I would have guessed is that @initialize allows you to take in some strategy-generated input as an argument, while __init__ does not. But the docs don't say that, and I'm not sure it's actually true. Perhaps a section on "when to use __init__ vs. @initialize" would help clarify that too.

Trying to address this, I find the following in the documentation:

> Initializes are a special case of rules that are guaranteed to be run at most
> once at the beginning of a run (i.e. before any normal rule is called).
> Note if multiple initialize rules are defined, they may be called in any order,
> and that order will vary from run to run.

The phrases "at most once" and "in any order" would appear to be in conflict with each other. They also conflict with the function's own documentation:

> An initialize decorator behaves like a rule, but all @initialize() decorated methods will be called before any @rule() decorated methods, in an arbitrary order. Each @initialize() method will be called exactly once per run, unless one raises an exception - after which only the .teardown() method will be run.

The code (in run_state_machine) appears to support the function's documentation.

My understanding is:

  • If you need some kind of initialization of the system under test, you can generally do this in __init__.
  • If you want the initialization to depend on values pulled from a strategy, you will probably want to use @initialize.
  • If you want the initialization to populate bundles, you will want to use @initialize as it's a rule and can do that (you can't populate a bundle except by returning a value from a rule).
  • If you want to select between several different kinds of initialization (__init__, .from_json(), ...) using the same sort of selection that chooses rules to apply, you have to either cram all the options into one @initialize rule with a strategy parameter that picks which kind of initialization to run, or you have to use an instance variable that includes a "not initialized" option and write rules with preconditions (some require the variable not to have been initialized - the actual initializers - and some require it to have been initialized - the actual rules).
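
The first three points above can be sketched as follows. This is a minimal illustration under assumed names (`CounterMachine`, `seen`, `total` are hypothetical), not a statement of how the docs recommend structuring every machine.

```python
from hypothesis import strategies as st
from hypothesis.stateful import Bundle, RuleBasedStateMachine, initialize, rule


class CounterMachine(RuleBasedStateMachine):
    seen = Bundle("seen")

    def __init__(self):
        # Fixed setup that needs no generated data belongs in __init__.
        super().__init__()
        self.total = 0
        self.initialized = False

    # Called before any @rule. Unlike __init__, it can take
    # strategy-drawn arguments and populate a bundle via its return value.
    @initialize(target=seen, start=st.integers(min_value=0, max_value=10))
    def set_start(self, start):
        self.total = start
        self.initialized = True
        return start

    @rule(target=seen, n=st.integers(min_value=0, max_value=10))
    def add(self, n):
        assert self.initialized  # the @initialize method has already run
        self.total += n
        return self.total

    @rule(value=seen)
    def values_are_non_negative(self, value):
        assert value >= 0
```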

Is this about right?

For now I'll make the rest of the docs match the implementation.

@td-anne
Contributor

td-anne commented Feb 13, 2024

> I'm not sure how you can do this with instance variables: the strategies that generate inputs to rules have no access to the current state of the state machine. ... My guess is that Bundles are an attempt to provide this ability.
>
> As in @jab's example, you can use st.data() to draw from a strategy constructed inside the rule method, where you have access to (the current value of) instance variables.
>
> Bundles are "just" an interface which allows you to express this pattern somewhat more compactly, and ideally clearly, especially once you get to something like st.lists(consumes(some_bundle)). Behind the scenes, RuleBasedStateMachine itself is syntactic sugar over @given(st.data()) too - "arbitrary mixtures of system behavior and constructing-and-drawing-from strategies" is a very powerful primitive!

I realise that more primitives are not a good thing, but it seems like a clean functional way to avoid the need to use data directly would be to provide a strategy with_state that takes as its argument a function taking the current instance and returning a strategy?

@rule(x=with_state(lambda self: st.sampled_from(self.not_a_bundle)))
def f(self, x):
    ...

This would seem to allow the same operations as bundles, with minimal extra ceremony, and without requiring users to learn a new data type/interface. But I don't know whether it would allow hypothesis to reason about/shrink/manipulate examples as effectively - with Bundles the dependence on state is more explicit, in that hypothesis knows exactly which rules may change the contents of a bundle (usually few).

If Bundles are container classes, do they have any properties that should be described? Can a Bundle contain the same value twice? (relevant with consumes) Are they effectively lists "under the hood"? Is it possible to type-annotate Bundle objects to show what they contain?

@Zac-HD
Member

Zac-HD commented Feb 13, 2024

st.runner().flatmap(lambda self: st.sampled_from(self.not_a_bundle)) would actually work, I think, but also gets into enough advanced detail that I'm not sure it's an improvement. The internals don't know that Bundles exist though, so they'd be just as effective either way.
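
A minimal sketch of that suggestion in a complete machine, assuming (as the comment above says, tentatively) that st.runner() resolves to the running state machine inside a rule. The class name `RunnerMachine` is hypothetical; the precondition is an addition to keep sampled_from away from an empty list.

```python
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, precondition, rule


class RunnerMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.not_a_bundle = []

    @rule(x=st.integers())
    def add(self, x):
        self.not_a_bundle.append(x)

    # st.runner() yields the machine instance at draw time; flatmap then
    # builds a strategy from its current state. sampled_from rejects an
    # empty sequence, so the rule is guarded by a precondition.
    @precondition(lambda self: self.not_a_bundle)
    @rule(x=st.runner().flatmap(lambda self: st.sampled_from(self.not_a_bundle)))
    def use(self, x):
        assert x in self.not_a_bundle
```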

Bundles can contain duplicate values, and are essentially lists/sampled_from under the hood. Since Bundle is a subtype of st.SearchStrategy, it's a generic type and you can indeed parametrize them by whatever they generate. (we even claim to the typechecker they're covariant, which is usually approximately true)
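
As a rough illustration of those properties (names such as `TicketMachine` are invented, and the annotation is written as a string so it is only read by the type checker):

```python
from hypothesis import strategies as st
from hypothesis.stateful import Bundle, RuleBasedStateMachine, consumes, rule


class TicketMachine(RuleBasedStateMachine):
    # Parametrising the annotation tells a type checker what the bundle yields.
    tickets: "Bundle[int]" = Bundle("tickets")

    def __init__(self):
        super().__init__()
        self.outstanding = 0

    @rule(target=tickets, n=st.integers(min_value=0, max_value=3))
    def issue(self, n):
        # The narrow range makes repeated values likely; every returned
        # value becomes its own bundle entry.
        self.outstanding += 1
        return n

    @rule(n=consumes(tickets))
    def redeem(self, n):
        # consumes() removes the drawn entry from the bundle, so there is
        # never more than one redemption per issued ticket.
        self.outstanding -= 1
        assert self.outstanding >= 0
```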

@td-anne
Contributor

td-anne commented Feb 14, 2024

> st.runner().flatmap(lambda self: st.sampled_from(self.not_a_bundle)) would actually work, I think, but also gets into enough advanced detail that I'm not sure it's an improvement. The internals don't know that Bundles exist though, so they'd be just as effective either way.

Actually, that's pretty much what I was hoping for. Will this support explicit example generation (the way data doesn't)?

@td-anne
Contributor

td-anne commented Feb 15, 2024

@jab PR #3881 is now merged and should help with this. Could you take a look and see if you feel it answers your questions?

@jab
Contributor Author

jab commented Feb 16, 2024

@td-anne, your changes look great, thanks so much!

@jab jab closed this as completed Feb 16, 2024

3 participants