Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

register_type_strategy doesn't seem to work on parameterized generic types #3750

Closed
nickcollins opened this issue Sep 22, 2023 · 16 comments · Fixed by #3785
Closed

register_type_strategy doesn't seem to work on parameterized generic types #3750

nickcollins opened this issue Sep 22, 2023 · 16 comments · Fixed by #3785
Labels
legibility make errors helpful and Hypothesis grokable

Comments

@nickcollins
Copy link
Contributor

nickcollins commented Sep 22, 2023

The docs for register_type_strategy say that instead of registering tuple[int], I can register tuple with a function that takes the type and return the strategy. I may be misunderstanding, but I take this to mean that I can register
st.register_type_strategy(tuple, lambda typ: foo)
and then foo should be the default strategy for any param of type tuple, or tuple[int], or tuple[....], etc.

However, the code below results in a test failure, and from my investigation it is clear that the registration is not effective, and that test is getting any old tuple[int] rather than using the strategy that was registered.

from hypothesis import given, strategies as st

# Note that I'm not actually using the typ param here, but I do use it in my actual code
st.register_type_strategy(tuple, lambda typ: st.tuples(st.integers(min_value=0)))

@given(x=...)
def test(x: tuple[int]):
    assert x[0] >= 0

test()

I'm not sure if I'm using this correctly or if I understand the correct behavior. Is it simply impossible to register a default strategy for tuples that will work on parameterized tuples? If so, the documentation seems to suggest otherwise to me. Also, this seems like a useful feature to have. Or, is it that I am doing something wrong?

Thanks!

@Zac-HD Zac-HD added the legibility make errors helpful and Hypothesis grokable label Sep 25, 2023
@Zac-HD
Copy link
Member

Zac-HD commented Sep 25, 2023

Yep, turns out that we have hard-coded logic for tuple, because it's sufficiently different from other generics:

# We start with special-case support for Tuple, which isn't actually a generic
# type; then Final, Literal, and Annotated since they don't support `isinstance`.
#
# We then explicitly error on non-Generic types, which don't carry enough
# information to sensibly resolve to strategies at runtime.
# Finally, we run a variation of the subclass lookup in `st.from_type`
# among generic types in the lookup.
if get_origin(thing) == tuple or isinstance(
thing, getattr(typing, "TupleMeta", ())
):
elem_types = getattr(thing, "__tuple_params__", None) or ()
elem_types += getattr(thing, "__args__", None) or ()
if (
getattr(thing, "__tuple_use_ellipsis__", False)
or len(elem_types) == 2
and elem_types[-1] is Ellipsis
):
return st.lists(st.from_type(elem_types[0])).map(tuple)
elif len(elem_types) == 1 and elem_types[0] == ():
return st.tuples() # Empty tuple; see issue #1583
return st.tuples(*map(st.from_type, elem_types))

We should add a further bit of logic to use the registered strategy for tuple if there is one. Want to open another PR?

@nickcollins
Copy link
Contributor Author

nickcollins commented Sep 25, 2023

Yea, I think I could do another PR, as long as I understand the right approach here.
Is it something along the lines of this?

     ...
     elem_types += getattr(thing, "__args__", None) or () 
     tuple_reg_strat = get_registered_strat(tuple)
     if tuple_reg_strat is not None:
         if is_callable(tuple_reg_strat):
             return tuple_reg_strat(thing)
         else:
             return tuple_reg_strat
     elif ( 
         getattr(thing, "__tuple_use_ellipsis__", False) 
     ...

@Zac-HD
Copy link
Member

Zac-HD commented Sep 25, 2023

Yep, that looks about right to me 😁

@nickcollins
Copy link
Contributor Author

nickcollins commented Sep 25, 2023

I'm looking at hypothesis/strategies/_internal/core.py now, and it appears that the registration check in _global_type_lookup occurs before types.from_typing_type is invoked:

...
1308     # Now that we know `thing` is a type, the first step is to check for an
1309     # explicitly registered strategy. This is the best (and hopefully most
1310     # common) way to resolve a type to a strategy.  Note that the value in the
1311     # lookup may be a strategy or a function from type -> strategy; and we      
1312     # convert empty results into an explicit error.                                
1313     try:
1314         if thing in types._global_type_lookup:    
1315             return as_strategy(types._global_type_lookup[thing], thing) 
...
1359     # If there's no explicitly registered strategy, maybe a subtype of thing
1360     # is registered - if so, we can resolve it to the subclass strategy.
1361     # We'll start by checking if thing is from from the typing module,
1362     # because there are several special cases that don't play well with
1363     # subclass and instance checks.
1364     if isinstance(thing, types.typing_root_type) or (
...
1368     ):
1369         return types.from_typing_type(thing)
...

So, it seems to me that the _global_type_lookup is already failing somehow, even before from_typing_type is invoked?
And from_typing_type doesn't appear to be doing any registration lookup, so it feels odd to add registration lookup there.

Kinda unrelated:
In tests, what's cover and nocover?
Also, am I going to need to worry about different python versions? Like test_lookup_py3*.py, etc? What is "drypython"?
Just wanted to know what this stuff is as I'm looking through the tests so I know what is and isn't important to this task.
Thanks!

@nickcollins
Copy link
Contributor Author

nickcollins commented Sep 28, 2023

After looking into the code further, and in line with my previous comment, I'm now thinking that the solution looks something more like (in .../_internal/core.py:

1313     try:
1314         if get_origin(thing) in types._global_type_lookup: 
1315             return as_strategy(types._global_type_lookup[thing], thing)

as opposed to just if thing in types._global_type_lookup: ...

This presumably addresses other generic type cases as well, besides just tuple

Of course, making this change naively causes a lot of other test cases to fail, so I'm sure that more needs to be done to get it to use origin only when appropriate. I'm not sure the exact rules for type registration that determine whether a generic should be looked up on the basis of its origin or not.

@Zac-HD
Copy link
Member

Zac-HD commented Sep 28, 2023

That makes sense to me. It's been a while since I was working on the type resolution code, so the tests are probably a more reliable guide than me, but I think we probably usually want to use the registered thing if there is one 🤔. On the other hand, we might want to go with a minimal change that doesn't break existing code...

In tests, what's cover and nocover? Also, am I going to need to worry about different python versions? Like test_lookup_py3*.py, etc? What is "drypython"?

Missed this at the time, sorry.

  • cover and nocover are fairly arbitrary divisions of our tests which don't require optional dependencies, such that cover is reasonably fast and gets 100% code coverage - but we're honestly pretty sloppy about what goes where.
  • I don't think you'll need to worry about different Python versions; most of the typing-internals churn settled down by 3.8 and that's the oldest Python we still support
  • https://github.com/dry-python/returns is a library full of complicated types, which we'd mutually like to work with Hypothesis.

@nickcollins
Copy link
Contributor Author

Oh yea, I definitely don't intend to push something that breaks any tests.
Nor should I just do whatever gets all the tests to pass, cuz I think that may lead to wrong behavior that maybe isn't covered by some test.
So I'm wondering what about adding get_origin there that is breaking the tests.
Like, my instinct would be that since registration of generics isn't allowed, then any de-parameterized type should be the represented registration for all generics that boil down to it. But clearly there seems to be something wrong with this way of thinking. I haven't explored the test failures much, maybe that'll help answer the question, but in the meantime, do you know someone else who knows the ins and outs of the type resolution stuff? Or is there a spec somewhere that clearly lays out these rules? Cuz a lot of the code seems pretty well-spec'd, but the question of how to register non-generic types (or get resolution for a parameterized type whose origin has been registered) seems a bit unclear to me, when looking at the specs that i've found so far.

@nickcollins
Copy link
Contributor Author

Is there a place or person I can ask general questions about Hypothesis?
Both stuff like how this code works so that I can better understand the right approach to this problem,
but also miscellaneous stuff like how to narrow down my input to get through health checks that say not enough inputs satisfy assumptions?
The docs link to https://groups.google.com/g/hypothesis-users but it doesn't seem to be active.
Wondering if you have any tips on that? Thanks!

@Zac-HD
Copy link
Member

Zac-HD commented Oct 3, 2023

Best place for user questions is https://stackoverflow.com/questions/tagged/python-hypothesis ; for developer questions ask in an issue or PR comments 🙂

Sorry I've been slow to respond lately, it's just a v busy period at work. Often it's easier for me to comment if you open a PR with failing tests, so I can see exactly what code is having what effect.

To answer the previous question about get_origin and generic types, I don't think there's much of a spec beyond "how the code works" and to a lesser extent the tests. Unfortunately Hypothesis' type resolution logic predates Python having a get_origin() function, so there are a few places where we do some janky stuff to keep supporting things we might have just ruled out of scope if we started more recently. I probably know that code about as well as anyone else alive - and wrote a lot of it - but would still need to reload that context before working on it again. That's a good incentive for tests, comments, and clear factoring at least!

@nickcollins
Copy link
Contributor Author

nickcollins commented Oct 4, 2023

Ok, i made this pull request #3762 of the oversimplistic get_origin change so that we can see the test failures in CI.
My naive intuition is that something like this should work, but I guess the actual logic is a little too complicated?

Another option is to only use get_origin for tuples, or other special types. I don't know if generic type registration works or not for non-tuples (i.e. classes and stuff), so if it does maybe we only need to make a fix for things like tuples, and that would hopefully cut down on the potential areas where things could break. Or, if the generic type registration is not working for all types, this would wind up being only a partial fix. But that might still be better than nothing while still limiting the scope of possible test failures.

@nickcollins
Copy link
Contributor Author

Ok, sorry for all the churn in the other pull request #3762 , I see now that we only need to change logic in from_typing_type as you had originally suggested. Having looked at that a bit, it seems that tuples are already pre-registered:

So if we just check whether they are registered, and go with whatever is registered, prior to the normal tuple logic (which does stuff like st.tuples(*map(st.from_type, elem_types))), then we will always skip the normal tuple logic in favor of what will usually just be st.builds(tuple) which I think equates to () every time.

  • Maybe we could delete the pre-registration? I'm not sure why the pre-registration is there, but I assume it's there for some important reason.
  • Maybe we try to distinguish between the pre-registration and a user-defined registration? An ugly way to do this is to just look at what is registered and if it looks like st.builds(tuple) then we don't use it and go along with the normal tuple special case logic. But that seems like an error prone way to handle it. But I can't think of a reasonable way to distinguish - fundamentally, it seems like registration involves the user overriding whatever logic would normally be used to handle a type, which makes it fundamentally different from whatever the pre-registration is doing, because pre-registration should probably defer to the special-case handling (which is what seems to be happening in the current codebase). Perhaps the pre-registration could be separated from user registration, somehow, but the scope of that change seems pretty large compared to this particular issue.

@Zac-HD
Copy link
Member

Zac-HD commented Oct 24, 2023

Oops! Yeah, I think we should either delete that registration, or change it to instead register a function which does all the tuple-resolving logic.

@nickcollins
Copy link
Contributor Author

I think in another recent issue i saw there was some way to have a registered strategy return a "don't know what to do" sentinel in cases where the registered strategy should be ignored, and some fallback should be used instead. Maybe that would be useful here? If no strategy is registered, it falls back to the current special casing logic.

@nickcollins
Copy link
Contributor Author

Ok I found it - d94243b

I will have it so if NotImplemented is returned by the user-defined custom strategy, then it will fall back to the special casing logic.

@nickcollins
Copy link
Contributor Author

Ok i created a pull request #3782 . I think it is mostly the right kind of thing, though I'll need to review the push guidelines before pushing.

@nickcollins
Copy link
Contributor Author

Ok, thanks for fixing this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
legibility make errors helpful and Hypothesis grokable
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants