Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

`Supports...` Protocol strategies bugs & suggested improvements. #2292

Closed
leaprovenzano opened this issue Dec 21, 2019 · 8 comments
Closed

`Supports...` Protocol strategies bugs & suggested improvements. #2292

leaprovenzano opened this issue Dec 21, 2019 · 8 comments
Labels

Comments

@leaprovenzano
Copy link
Contributor

@leaprovenzano leaprovenzano commented Dec 21, 2019

Supports... Protocol strategies bugs & suggested improvements.


hypothesis: 4.56.1
python : 3.7.4


The following are suggested improvements for the typing.Supports... protocol types. The general expectation is that if i do from_type on a one of these protocol types i will get at minimum all strategies which generate values who's types support the protocol. At the moment many of them are looking pretty bare and so wouldn't be generating the range of values a user would expect.

For each protocol below I've provided the underlying current strategy drawn by from_type and a suggested strategy.
I gathered the suggested strategies with a little help from hypothesis.find and have also tried to optimize the order of the statigies with a little stupid heruristic funtion to support good shrinking. I then did some basic tests with the example function to make sure they hold.
There are also a couple which contain bugs. I will try and clarify where that is the case in each example:


typing.SupportsInt

current strategy:

floats()

example:

import typing
from hypothesis.strategies import *
from hypothesis import given


@given(from_type(typing.SupportsInt))
def test_parse_int(inp):
    int(inp)

BUGS: The above currently fails it's not protected from infinites and nans.

suggested strategy:

supportsint = (booleans() |
               integers() |
               floats(allow_infinity=False, allow_nan=False) |
               uuids() |
               decimals(allow_infinity=False, allow_nan=False) 
               )

really really nice to have:

another nice touch would be to add a stringified version of int particularly since casting from string is such a common use case

supportsint |= integers().map(str)

typing.SupportsAbs

current strategy:

complex_numbers()

example:

import typing
from hypothesis.strategies import *
from hypothesis import given


@given(from_type(typing.SupportsAbs))
def test_abs(inp):
    abs(inp)

BUGS:

we encounter an overflow error on on an huge complex number

inp = (1.2711610061536462e+308+1.2711610061536464e+308j)

    @given(from_type(typing.SupportsAbs))
    def test_parse_int(inp):
>       abs(inp)
E       OverflowError: absolute value too large

suggested strategy:

we need to limit the mangitude on complex numbers to avoid overflow. The threshold below is just a guess...

# this is just a guess...
MAX_MAGNITUDE_FOR_ABS = 308

supportsabs = (
    booleans() |
    integers() |
    floats(allow_nan=False, allow_infinity=False) |
    complex_numbers(max_magnitude=MAX_MAGNITUDE_FOR_ABS) |
    fractions() |
    decimals() |
    timedeltas()
)

nice to have:

support for numpy.extra where applicable since most numeric arrays support abs.


typing.SupportsRound

current strategy:

complex_numbers()

example:

import typing
from hypothesis.strategies import *
from hypothesis import given


@given(from_type(typing.SupportsRound))
def test_round(inp):
    round(inp)

BUGS:

We quickly get a TypeError since complex numbers don't define __round__ method !!!

suggested strategy:

supports_round = (
    booleans() |
    integers() |
    floats(allow_nan=False, allow_infinity=False) |
    decimals(allow_nan=False, allow_infinity=False) |
    fractions()
)

typing.SupportsFloat

current strategy:

floats()

example:

import typing
from hypothesis.strategies import *
from hypothesis import given


@given(from_type(typing.SupportsFloat))
def test_cast_to_float(inp):
    float(inp)

The current strategy doesn't have any overt bugs it's just too permissive.

suggested strategy:

supports_float = (
    booleans() |
    floats() |
    integers() |
    decimals(allow_nan=False) |
    fractions()
)

really really nice to have:

as with SupportsInt would be really nice to include stringified floats in the strategy since it's a common use case.

supportsint |= floats().map(str)

I haven't covered SupportsBytes and SupportsComplex ( sorry i got a bit knackered of writing..) but they also look too narrow. So maybe worth having a look into them as well.


I'll file another issue for improvements to abc type protocols when in a bit. If this format is overwhelming let me know and i'll do issue per strategy instead of grouping together.

@leaprovenzano

This comment has been minimized.

Copy link
Contributor Author

@leaprovenzano leaprovenzano commented Dec 21, 2019

also most of these protocols are direct mappings in the _global_type_lookup and so would not involve too much faffing in types .

I'd be happy to open a PR for the changes if maintainers don't see any problems with these suggestions.

@Zac-HD

This comment has been minimized.

Copy link
Member

@Zac-HD Zac-HD commented Dec 22, 2019

Ooh, thanks for looking into this! Short on time now, but I'll share more thoughts tomorrow about what would be helpful in a PR 😁

@Zac-HD Zac-HD added the enhancement label Dec 23, 2019
@Zac-HD

This comment has been minimized.

Copy link
Member

@Zac-HD Zac-HD commented Dec 23, 2019

  • I would be very happy to have a single PR with these changes 🎉
  • I'd actually prefer to keep generating values which can overflow (etc) - the SupportsOp classes indicate that the type of some value supports an operation, not that the operation will succeed on that specific value.
  • integers().map(str) seems fairer than text(), but from_regex(r"-?\d+", fullmatch=True).filter(can_cast_to_int) could be even more fun...
  • Note that for collection types, I don't want to infer a strategy for nested collections - while they would be valid and perhaps find bugs, it's also a significant performance drain.
@leaprovenzano

This comment has been minimized.

Copy link
Contributor Author

@leaprovenzano leaprovenzano commented Dec 23, 2019

  • I would be very happy to have a single PR with these changes 🎉

cool i will have a go at it then!

  • I'd actually prefer to keep generating values which can overflow (etc) - the SupportsOp classes indicate that the type of some value supports an operation, not that the operation will succeed on that specific value.

yeah this makes sense and i don't suppose it's much work for users to filter downstream if this is what they're after... It's just me being lazy and (and possibly wanting SupportOps to be way more powerful than they are/should be).

  • integers().map(str) seems fairer than text(), but from_regex(r"-?\d+", fullmatch=True).filter(can_cast_to_int) could be even more fun...

yeah i was looking at the regex for floats but really that requires lookaheads to do properly and is more complex. int should be simpler... something like :

r'^-?([1-9]\d*)|0$'

and we could avoid filter altogether.

  • Note that for collection types, I don't want to infer a strategy for nested collections - while they would be valid and perhaps find bugs, it's also a significant performance drain.

As to collection stuff I will make another issue so we can discuss if that's cool... for SupportsOp stuff it's pretty much just numeric so we shouldn't be an problem.

@leaprovenzano

This comment has been minimized.

Copy link
Contributor Author

@leaprovenzano leaprovenzano commented Dec 23, 2019

thinking about it another option might be to make a general supports strategy that accepts an abc and returns a union of strategies whos value types implement all of its abstractmethods? ... just a thought...

longterm could extend to stuff like error filtering (ie for overflow error) at users discretion....
signature checking on the abstract methods might even be useful? again... just a potential avenue.

@Zac-HD

This comment has been minimized.

Copy link
Member

@Zac-HD Zac-HD commented Dec 23, 2019

  • I'd stick with the filtering int regex, because the first digit need not be ASCII 😉
  • for floats, we could generate a float and a format spec to apply - there are many!
  • strategies inferred from types aren't that fancy - if you want control of overflow, proper signature checks, etc, I think that belongs in your downstream project. If it's widely useful downstream, we're often happy to have it added upstream too!
@leaprovenzano

This comment has been minimized.

Copy link
Contributor Author

@leaprovenzano leaprovenzano commented Dec 23, 2019

  • I'd stick with the filtering int regex, because the first digit need not be ASCII 😉
  • for floats, we could generate a float and a format spec to apply - there are many!
  • strategies inferred from types aren't that fancy - if you want control of overflow, proper signature checks, etc, I think that belongs in your downstream project. If it's widely useful downstream, we're often happy to have it added upstream too!

fair nuff I may have gotten a bit carried away! I'll do a pr patching to patch the updates into _global_type_lookup in a bit...

leaprovenzano added a commit to leaprovenzano/hypothesis that referenced this issue Dec 24, 2019
complex numbers do not have a __round__ method. see HypothesisWorks#2292 for more details
Zac-HD added a commit that referenced this issue Dec 28, 2019
* test for support ops types

note that this does not currently work in python < 3.8 before changes to make these `typing.Protocol` types. which is a bummer... will have to work out a solution.

* add workaround in test for python < 3.8

* fix strategy for SupportsRound

complex numbers do not have a __round__ method. see #2292 for more details

* update  SupportsBytes

* add special test for supports bytes

* SupportsBytes is a subclass of Hashable... so we do the tuple thing again...

* update tests for support ops

- add a couple helper functions
- split tests into those for which only an interface is expected and those for which  in the absence of an interface we may try and cast a value

* add bytes back into SupportsBytes strat

* add SupportsInt basic strat

* remove nan and inf constraints.

* update SupportsRound strategy and pep8

* add comment to the int string thing

* add SupportsFloat strategy

* loosen error type in cast

just in case

* pep8

* update SupportsComplex strategy

* add testcase for SupportsComplex test

* add RELEASE.rst

* reformat stuff using reformat script

* test that the value generated by the strategy is not a dummy instance of the protocol itself

* fix name error causing all sorts of chaos

* silence flake8 on bytes for SupportsBytes test case

* fix lint err

* pep8 again

* pep8

* use one_of

* change to minor version

* fix link refs in RELEASE.rst

* correct regex for SupportsInt strings

* remove usless comment in supports_casting

* inline assertion and fix comment

* simplify supports_protocol

* change wording in release notes

* SupportOps strategies to multiarg form of one_of

* black formatting

* Fix build for Pandas 0.19

* Minor refactor

Co-authored-by: Zac Hatfield-Dodds <Zac-HD@users.noreply.github.com>
@Zac-HD

This comment has been minimized.

Copy link
Member

@Zac-HD Zac-HD commented Dec 28, 2019

Closed by #2297 🚀

@Zac-HD Zac-HD closed this Dec 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.