
Popular value constraint #32

Closed
Saphyel opened this issue Mar 14, 2021 · 11 comments

Comments

@Saphyel

Saphyel commented Mar 14, 2021

Hello! I'm creating this issue to see how people feel about choosing the most popular constraints, so we can agree which ones are more important/urgent. Feel free to comment and improve the list if you think I'm missing something.
I'm gonna create some categories as well for visibility.

String Constraints

  • Email
  • Uuid
  • Choice
  • Language
  • Locale
  • Country
  • Currency

Comparison Constraints

  • EqualTo
  • NotEqualTo
  • IdenticalTo
  • NotIdenticalTo
  • LessThan
  • GreaterThan
  • Range
  • DivisibleBy

The implementation I think should be something like:

    passports: Annotated[List[str], Country[alpha3=True]],
    age: Annotated[date, Range["1901-12-12", "2001-12-12"]],
    pets: Annotated[int, GreaterThan[-1]],
    bio: Annotated[Optional[str], Range[50, 500]],
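For context, PEP 593 already makes metadata like this introspectable at runtime, which is what makes such proposals feasible. A minimal sketch (the `GreaterThan` class here is purely hypothetical, not part of any existing package):

```python
from typing import Annotated, get_args, get_type_hints

# Hypothetical constraint object carried as PEP 593 metadata.
class GreaterThan:
    def __init__(self, bound: int) -> None:
        self.bound = bound

def count_pets(pets: Annotated[int, GreaterThan(-1)]) -> int:
    return pets

# Retrieve hints *with* their PEP 593 metadata preserved.
hints = get_type_hints(count_pets, include_extras=True)
base_type, *metadata = get_args(hints['pets'])
print(base_type)          # <class 'int'>
print(metadata[0].bound)  # -1
```

Any runtime checker can then pull the constraint object back out of the hint and decide what to do with it.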
@leycec
Member

leycec commented Mar 15, 2021

Thanks so much for the detailed theory-crafting, Carlos! Indeed, I love it. This means it will happen.

One practical alternative to adopting, embracing, and extending someone else's third-party constraints package would be to develop our own in-house beartype solution. I'm leaning "our way or the highway," because there don't actually appear to be any PEP-compliant third-party constraints packages. All existing constraints packages precede PEP 593 -- Flexible function and variable annotations, which means none of them were designed to embed in typing.Annotated type hints.

In either case, they're all bad news for certain definitions of "bad news" and just waste everyone's time. Instead...

Introducing Bearnote

Since we don't trust anyone else to get this right, let's just do this ourselves.

Specifically, let's publish a new beartype/{insert_punny_package_name_here} package under the @beartype organization, where prospective names for {insert_punny_package_name_here} might be:

  • bearnote!
  • bearspray!
  • bearspittle!

Very well, I have no idea. Something prefixed by bear. Beyond that, I'm all furry ears. And here's how we design that package:

  • In a public bearnote.abc submodule, we declare a BearNoteConstraintABC abstract base class (ABC) resembling:
from abc import (
    ABCMeta as _ABCMeta,
    abstractmethod as _abstractmethod,
)
from types import GenericAlias as _GenericAlias
from typing import Any as _Any

class BearNoteConstraintABC(object, metaclass=_ABCMeta):
    '''
    Abstract base class (ABC) of all :mod:`bearnote` 
    value constraint subclasses suitable for use in
    `PEP 593`_-compliant type hints.

    Instances of this ABC encapsulate a specific constraint on
    arbitrary values of a specific type (e.g., minimum and/or
    maximum character length for strings, minimum and/or
    maximum value for numbers).

    .. _PEP 593:
       https://www.python.org/dev/peps/pep-0593
    '''

    #FIXME: Unsure whether this works as intended. Let's find out!
    @classmethod
    @_abstractmethod
    def __class_getitem__(cls, *args) -> _GenericAlias:
        '''
        `PEP 560`_-compliant class method returning a generic alias
        of this subclass subscripted by the passed arguments.
        '''

        pass


    @_abstractmethod
    def is_valid(self, obj: _Any) -> bool:
        '''
        ``True`` only if the passed arbitrary object satisfies this constraint.
        '''

        pass
  • In a public bearnote.text submodule, we declare one BearNoteConstraintABC subclass for each string-specific value constraint. For example, a trivial constraint on string length might resemble:
from bearnote.abc import BearNoteConstraintABC as _BearNoteConstraintABC
from bearnote.roar import BearNoteException as _BearNoteException
from beartype import beartype as _beartype
from types import GenericAlias as _GenericAlias
from typing import Any as _Any

class StrLenMax(_BearNoteConstraintABC):
    '''
    :mod:`bearnote` **maximum string length constraint** (i.e., matching
    only strings no longer than a user-specified maximum character
    length), suitable for use in `PEP 593`_-compliant type hints.

    .. _PEP 593:
       https://www.python.org/dev/peps/pep-0593
    '''

    #FIXME: Obviously, this needs work. What you gonna do?
    @_beartype
    def __class_getitem__(cls, max_len: int) -> _GenericAlias:
        if max_len < 0:
            raise _BearNoteException(f'Maximum string length {max_len} < 0.')

        return _GenericAlias(StrLenMax, (max_len,))

    #FIXME: Obviously, this needs work too. What you still gonna do?
    def is_valid(self, obj: _Any) -> bool:
        return isinstance(obj, str) and len(obj) <= self.__args__[0]

End users then annotate @beartype-decorated callables with instances of those subclasses like so:

from bearnote.text import StrLenMax
from beartype import beartype
from typing import Annotated

@beartype
def strip_text(text: Annotated[str, StrLenMax[5]]) -> (
    Annotated[str, StrLenMax[3]]):
    return text[2:-1] if len(text) >= 3 else text

beartype then checks the arguments of PEP 593-compliant typing.Annotated type hints to decide whether any of them are bearnote-specific value constraints with a simple O(1) test resembling:

from bearnote.abc import BearNoteConstraintABC 
from types import GenericAlias
...

# If this "typing.Annotated" argument is a "bearnote" value constraint...
if (
    isinstance(annotated_arg, GenericAlias) and
    issubclass(annotated_arg.__origin__, BearNoteConstraintABC)
):
    #FIXME: Obviously, this is simply pseudo-code. Don't crucify me!
    # Then generate code calling this constraint's is_valid() method.
    return 'annotated_arg.__origin__.is_valid(obj)'

That's the gist, anyway.

@harens: I know you're absolutely full-up with being an over-achieving Londoner of studious success, but I dimly recall you being interested in Python data validation. Is that something you'd still be interested in? If so, I'd be delighted to have you either co-lead or just casually contribute to a project like the one outlined above.

Of course, a non-committal shrug is also an acceptable and expected response. 😜

@harens
Contributor

harens commented Mar 16, 2021

I dimly recall you being interested in Python data validation.

Those dodgy nootropics seem to be doing you well @leycec! 💊 Funnily enough, that project was just part of a school course back when I was learning Python...so I wouldn't really call myself a data validation connoisseur. Having said that, I'd still love to help where I can. 👍

Introducing Bearnote

This looks great! A mypy-compliant constraints package would certainly be amazing, and a sprinkle of bear puns makes it even better :) 🐻 It's definitely something that I would love to use.

Specifically, let's publish a new beartype/{insert_punny_package_name_here} package under the @beartype organization

If this is going to be a separate project (which it seems like it will be), I have a few suggestions from our experience with the beartype project which you might find interesting.

  • Start building the docs from the beginning. It doesn't have to be amazing, but I think just breaking down the content into different files should hopefully make things easier in the long run.
  • mypy as we go. Getting ~400 mypy errors just before a release is never fun (thanks again for fixing PEP 561 compliance #25). It might be easier and simpler if, similar to beartype, zero mypy errors are required for tests to pass.
  • Hypermodern Python setup. 🎸 This one's less necessary, but there are loads of hypermodern python templates/guides online which you might find fascinating. In particular, definitely check out poetry if you haven't already. Both @Saphyel and I use it! Having said that, there might also be benefits to sticking with the current layout since then a lot of things can be copied across easily. I'll leave that one up to you. 😉

Either way, whatever happens, I'll still be here to package the project when you need me. 📦 Thanks for all your work on the beartype project @leycec! You've done an amazing job.

Also hi 👋 @Saphyel. It's always great to see a fellow Londoner! 🇬🇧 I hope the chilly nightfall is treating you well.

@Heliotrop3
Contributor

Note that beartype will need to perform some value checking in order to incorporate PEP 586 -- Literal Type.

However, that PEP then states that adding the semantics for value checking as outlined above is currently outside the scope of the Literal type hint. Paraphrasing:

A full-fledged dependent type system that lets users predicate types based on their values in arbitrary ways, while certainly useful, is out of scope for PEP 586. Such a type system would require substantially more work with respect to implementation, discussion, and research than this PEP provides
....
This PEP should be seen as a stepping stone towards this goal, rather than an attempt at providing a comprehensive solution.

Admittedly, I'm not crystal clear on the interaction between PEP 593 -- Flexible Function and Variable Annotations and PEP 586 -- Literal Type. It seems, however, that the Literal type hint should be included in this conversation about data validation.

@leycec
Member

leycec commented Mar 18, 2021

@Heliotrop3 with the deep take, as always.

PEP 586 -- Literal Type is indeed a crude form of value constraint – the crudest! typing.Literal is literally (see wat i did there) just the == object equality operator encapsulated as a type hint: e.g.,

>>> CREEPY = 'I have a special plan for this world.'
>>> NOT_CREEPY = 'In her eyes tonight, there’s a glow tonight.'

# This is how normal code checks object equality.
>>> CREEPY == 'I have a special plan for this world.'
True
>>> NOT_CREEPY == 'I have a special plan for this world.'
False

# ...but this is how type checkers check object equality!
>>> @beartype
... def ligotti(creepy: Literal[
...     'I have a special plan for this world.',
...     'Imagine, he said, all the flesh that is eaten.',
...     'Now take away that flesh, he said.',
... ]) -> str:
...     return creepy[:16]

>>> ligotti(CREEPY)
'I have a special'
>>> ligotti(NOT_CREEPY)
beartype.roar.BeartypeCallHintPepParamException: @beartyped ligotti()
parameter creepy='In her eyes tonight, there’s a glow tonight.' violates type
hint typing.Literal[...], as value 'In her eyes tonight, there’s a glow tonight.'
not 'I have a special plan for this world.', 'Imagine, he said, all the flesh that
is eaten.', or 'Now take away that flesh, he said.'.

The takeaway is that PEP 586 doesn't really do much for us here, because PEP 586 only talks about object equality. But we're talking about every sort of object comparison here like integer comparison and regex-based pattern matching and yadda-yadda.
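For the curious, replicating PEP 586's == semantics at runtime takes only a few lines with typing.get_args. A hedged sketch, not beartype's actual implementation:

```python
from typing import Literal, get_args

def matches_literal(value, literal_hint) -> bool:
    """Return True only if ``value`` equals one of the values subscripting
    the passed ``Literal[...]`` hint. PEP 586 also mandates a type check,
    since e.g. ``True == 1`` would otherwise match ``Literal[1]``."""
    return any(
        value == allowed and type(value) is type(allowed)
        for allowed in get_args(literal_hint)
    )

Creepy = Literal['I have a special plan for this world.']
print(matches_literal('I have a special plan for this world.', Creepy))  # True
print(matches_literal('In her eyes tonight...', Creepy))                 # False
```

Which is to say: the entire PEP fits in one `any()` expression. That's how little it buys us.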

Object equality is neat and all... but how often in real-world code have you needed to type-check a callable parameter or return to be strictly equal to only one of n specific values? Those who like playing devil's advocate may now be thinking: "All the time, bro. All the friggin' time." If this is you, read on, because we have bad news.

PEP 586: The PEP That Barely Does Anything and Does It Badly

@beartype technically doesn't support PEP 586 yet – but it wouldn't take much to get us there. We mostly just lack motivation, because PEP 586 is mostly useless.

Why? Because PEP 586 only supports five possible types excluding None, which PEP 484 already explicitly supported five years ago:

Literal may be parameterized with literal ints, byte and unicode strings, bools, Enum values and None.

That's it. Like, who even wrote that specification? With constraints that narrow, why'd they even bother? We can't even type-check objects against complex numbers, containers, or instances of user-defined classes with that! That makes PEP 586 frustratingly inapplicable for 99.9999% of use cases.

Even using that to type-check equality against Enum members – which is really the only valid real-world use case here – violates DRY by requiring manual relisting of all Enum members (e.g., Literal[ShapeEnum.square, ShapeEnum.circle, ShapeEnum.ellipse, ShapeEnum.triangle]).

That pains me somewhere sensitive deep inside. What happens when you add a new ShapeEnum.dodecahedron member but forget to append that member to every Literal type hint enumerating ShapeEnum littered throughout your million-line codebase?

Bad stuff, Tyler. Bad stuff happens.

We still intend to support PEP 586, because it's an annotation PEP. That's what we do here. But it's the least useful annotation PEP yet, which means it's dead last on our TODO: list.

Y U do this 2 us, Guido? 😞

@leycec
Member

leycec commented Mar 18, 2021

Let's chat third-party validation packages. There are more than a few. Here at @beartype, we aim to please you while pleasing ourselves. We'd thus be happy to support all reasonably popular well-maintained validation packages with sane APIs, where "sane APIs" is defined here as packages that:

  • Provide a fast mechanism for detecting package-specific validation objects. We need to be able to detect these objects when listed as typing.Annotated arguments and we need that detection to be fast. This usually isn't a problem, because a sane API should root its object hierarchy at a public abstract base class (ABC). Given that class, we can trivially detect all validation objects produced by that package via issubclass(type_hint, ThirdPartyPackageABC). Next!
  • Provide a quasi-fast mechanism for validating arbitrary objects against those package-specific validation objects. This is probably the sticking point. We're not necessarily mandating @beartype-style O(1) behaviour here, but it would be nice if the package in question took pains to avoid unsafe O(n) behaviour. Even if it doesn't, that could still be okay. The inefficiency of other packages isn't necessarily our concern, right? What we do need here is for that package to make it really easy to validate arbitrary objects against those package-specific validation objects. "Really easy" means that validation objects like somepackage.GreaterThan(2):
    • Must define a tester method that returns booleans rather than raising exceptions when passed invalid objects. This tester should be defined as an abstract method by the aforementioned ABC, since that lets us treat validation objects generically.
    • May optionally also define a validation method that raises exceptions rather than returning booleans when passed invalid objects. This is optional, because we can raise exceptions ourselves; we don't need a specific method for that, although having a specific method for that would help us raise human-readable exceptions.

Minimal-length example or it didn't happen, so consider a third-party validation package somepackage with a sane API resembling:

from abc import ABCMeta, abstractmethod
from typing import Any

class ValidationABC(object, metaclass=ABCMeta):
    @abstractmethod
    def is_valid(self, obj: Any) -> bool: pass

class GreaterThan(ValidationABC):
    def __init__(self, number: int) -> None:
        self._number = number
    def is_valid(self, obj: Any) -> bool:
        return isinstance(obj, int) and obj > self._number
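Given that hypothetical API, beartype's side of the contract reduces to detecting instances of the ABC among the Annotated arguments and calling their tester. A sketch under those assumptions (classes redeclared so this snippet runs standalone):

```python
from abc import ABCMeta, abstractmethod
from typing import Annotated, Any, get_args

# Redeclared from the hypothetical "somepackage" API sketched above.
class ValidationABC(object, metaclass=ABCMeta):
    @abstractmethod
    def is_valid(self, obj: Any) -> bool: ...

class GreaterThan(ValidationABC):
    def __init__(self, number: int) -> None:
        self._number = number
    def is_valid(self, obj: Any) -> bool:
        return isinstance(obj, int) and obj > self._number

def check_annotated(hint, obj) -> bool:
    """Validate ``obj`` against every ABC-rooted constraint in ``hint``,
    a ``typing.Annotated[...]`` hint (simplified: assumes the first
    argument is a plain class)."""
    base, *metadata = get_args(hint)
    return isinstance(obj, base) and all(
        constraint.is_valid(obj)
        for constraint in metadata
        # Fast detection: one isinstance() check against the public ABC.
        if isinstance(constraint, ValidationABC)
    )

ok = check_annotated(Annotated[int, GreaterThan(2)], 5)
bad = check_annotated(Annotated[int, GreaterThan(2)], 1)
print(ok, bad)  # True False
```

That's the whole integration story: one isinstance() check to detect, one method call to validate.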

That's it. We have no idea whether the following packages satisfy those requirements, because we are lazy. Nonetheless, popular well-maintained validation packages include (in no particular order):

That... might be it.

Note that most third-party validation packages like Cerberus and Colander are obsessed with schemas and data exchange formats (e.g., JSON, YAML). Those are all sadly irrelevant and useless for our purposes. We need something general-purpose, unstructured, and fast – so, not those. Those are all slow behemoths from a bygone age when web devs didn't have Django or Panel. </sigh>

@leycec
Member

leycec commented Mar 30, 2021

...back from the GitHub gutter, it's that balding @leycec guy! I've just finalized Python 3.10 support in 89bb8d3 and am now dedicating the next several years months weeks days to this. Provisional support for data validation will land in the next stable release, which means beartype 0.7.0 by this Friday, because if I spend any longer on this my wife will seriously beat me up.

Here's how you'll use it:

from beartype import beartype
from beartype.constraint import Constraint
from typing import Annotated

@beartype
def validate_text(text: Annotated[str, Constraint[
    lambda text: 4 <= len(text) <= 14]]):
    ...

The validate_text() function defined above validates the passed value to be a string with length in the range [4, 14]. @beartype will do all that for you. All you do is supply the arbitrary user-defined constraint. I believe in you!

Of course, that's a bit unwieldy when copy-and-pasted across an entire codebase. Instead, everyone wants to define commonly used constraints as PEP 484-compliant type aliases: e.g.,

from beartype import beartype
from beartype.constraint import Constraint
from typing import Annotated

LengthyString = Annotated[str, Constraint[lambda data: 4 <= len(data) <= 14]]
'''
PEP-compliant type hint validating the passed or returned value to be
a string with length in the range ``[4, 14]``.
'''

@beartype
def munge_text(text: LengthyString): ...

@beartype
def plunge_text(text: LengthyString): ...

Everything above is PEP-compliant. That means static type checkers and smarty-pants Python IDEs like PyCharm will implicitly support all of that. That's good.

More importantly, users can define arbitrarily complex constraints satisfying their own stack-specific needs. That's even better. You don't need to wait for me or someone else who resembles me (so, my evil doppelgänger) to write those constraints for you. Instead, you do it and I'll unconditionally support it, whatever it is, no matter what.

Given that basic support for data validation, we can then gradually build out more involved support for specific types of data validation in the beartype.constraint submodule – all reusing the same core beartype.constraint.Constraint API. Higher-level constraints might resemble:

  • beartype.constraint.RegexConstraint, constraining the passed or returned string to match a compiled regex.
  • beartype.constraint.NumberConstraint, constraining the passed or returned integer, floating-point, or complex number to match a numeric operation (e.g., greater-than, less-than).
  • beartype.constraint.CountryCodeConstraint, constraining the passed or returned string to match an ISO 3166-compliant country code (probably internally implemented with beartype.constraint.RegexConstraint and A Really Big Or Ugly Regex (ARBOUR)).
  • ...and so on and so forth, hand-waving all the ugly details away.
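As a taste of what one of those higher-level constraints might look like, here's a hedged sketch of a RegexConstraint-style validator built on re.fullmatch (none of these names are real beartype API):

```python
import re
from typing import Any

class RegexConstraint:
    """Hypothetical constraint matching strings against a compiled regex."""
    def __init__(self, pattern: str) -> None:
        self._regex = re.compile(pattern)
    def is_valid(self, obj: Any) -> bool:
        return isinstance(obj, str) and self._regex.fullmatch(obj) is not None

# An ISO 3166-1 alpha-2-shaped country code check (two uppercase ASCII
# letters -- a shape check only, not validation against the real ISO list).
CountryCodeAlpha2 = RegexConstraint(r'[A-Z]{2}')
print(CountryCodeAlpha2.is_valid('GB'))   # True
print(CountryCodeAlpha2.is_valid('gbr'))  # False
```

A real CountryCodeConstraint would presumably match against the actual ISO code list rather than a bare regex, but the shape is the same.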

None of that except the core beartype.constraint.Constraint API will land in beartype 0.7.0, because time is slipping like a greased pig through my sausage fingers. I'd still love to hear what everyone thinks about that. If you think that sucks, please don't tell because I'm now emotionally invested.

@harens: Also, I can't believe I missed your enthusiastic reply suffused with venerable wisdom. I now feel bad. You are correct about everything. You often are. Thankfully, I have realized that nobody wants me to author 1,001 packages with cute and cuddly names like bearcat, bearspray, bearable, and unbearable. I won't be doing that. Instead, everyone just wants me to make beartype usable and passably documented. I'll try to be doing that instead. 📝

leycec added a commit that referenced this issue Apr 3, 2021
This commit is the first in a commit chain adding support for arbitrary
caller-defined data validation en-route to resolving issue #32, enabling
callers to validate the internal structure of arbitrarily complex
scalars, data structures, and third-party objects. Specifically, this
commit defines a new well-tested private
`beartype._util.func.utilfuncmake.copy_func_shallow()` utility function
shallowly copying pure-Python callables in a robust and efficient manner
*and* a new untested and currently mostly empty public `beartype.must`
subpackage exposing the public API for this validation. (*Circumstantial happenstance!*)
@leycec
Member

leycec commented Apr 3, 2021

It's happening, bearmongers. Commit 036075b begins The Work That Transforms Beartype into a Useable Work Product for People.

beartype.constraint.Constraint seemed overly verbose and anti-fun. Instead, we're now aiming for a terse and pro-fun data validation API situated at either beartype.must.Must or beartype.note.Note. I'm leaning towards the former, because beartype must increase your data consistency.

It looks like:

from beartype import beartype
from beartype.must import Must
from typing import Annotated

@beartype
def get_text_middle(text: Annotated[str, Must[
    lambda text: 4 <= len(text) <= 14]]):
    '''
    Return the substring spanning characters ``[7, 9]`` inclusive
    from the passed string required to have a length in the range
    ``[4, 14]`` inclusive.
    '''

    # "This is guaranteed to work," says beartype.
    return text[7:10]

Tragically, the IRS insists I must file expat taxes... or else. I think they intend to commit depraved and unmentionable acts of perfidy on our cats. To save our cats, I'll be distracted next week with filling out an endless litany of Cold War-era Linux-incompatible bureaucracy that could really have used a reboot three decades ago.

Do not be alarmed if I commit nothing for a week. That's just me trading off my remaining sanity, patience, and hair for my cats.

leycec added a commit that referenced this issue Apr 3, 2021
This commit is the first in a commit chain adding support for arbitrary
caller-defined data validation en-route to resolving issue #32, enabling
callers to validate the internal structure of arbitrarily complex
scalars, data structures, and third-party objects. Specifically, this
commit documents the new `beartype.must.Must` class -- complete with
usage instructions and a working example. (*Hypothetical hypochondriacs!*)
leycec added a commit that referenced this issue Apr 8, 2021
This commit is the first in a commit chain adding support for arbitrary
caller-defined data validation en-route to resolving issue #32, enabling
callers to validate the internal structure of arbitrarily complex
scalars, data structures, and third-party objects. Specifically, this
commit renames the prior `beartype.must` subpackage to `beartype.vale`
and the prior `beartype.must.Must` class to `beartype.vale.Is`,
significantly improves documentation across this class, and begins
implementing the core `beartype.vale.Is.__class_getitem__()` dunder
class method. (*Arbitrary arbitration!*)
leycec added a commit that referenced this issue Apr 9, 2021
This commit is the first in a commit chain adding support for arbitrary
caller-defined data validation en-route to resolving issue #32, enabling
callers to validate the internal structure of arbitrarily complex
scalars, data structures, and third-party objects. Specifically, this
commit significantly improves documentation across the `beartype.vale.Is` class
and defines a new private
`beartype._util.func.utilfuncarg.get_func_args_len_standard()` getter
introspecting the number of standard arguments accepted by the passed
callable, internally called by the core
`beartype.vale.Is.__class_getitem__()` dunder class method. (*Prestidigitation's predestination!*)
leycec added a commit that referenced this issue Apr 10, 2021
This commit is the next in a commit chain adding support for arbitrary
caller-defined data validation en-route to resolving issue #32, enabling
callers to validate the internal structure of arbitrarily complex
scalars, data structures, and third-party objects. Specifically, this
commit (yet again) significantly improves documentation across the
`beartype.vale.Is` class, defines a new public
`beartype.vale.AnnotatedIs` class instantiated by the core
`beartype.vale.Is.__class_getitem__()` dunder class method, defines a
new private `beartype.vale._valeiscore.is_hint_pep593_beartype()` tester
detecting beartype-specific annotated type hints, and exhaustively
exercises *all* `beartype.vale._valeiscore` attributes with unit tests.
(*Insipid agility's tepid fragility!*)
leycec added a commit that referenced this issue Apr 14, 2021
This commit is the next in a commit chain adding support for arbitrary
caller-defined data validation en-route to resolving issue #32, enabling
callers to validate the internal structure of arbitrarily complex
scalars, data structures, and third-party objects. Specifically, this
commit enables the new public `beartype.vale.AnnotatedIs` class
instantiated by the core `beartype.vale.Is.__class_getitem__()` dunder
class method to optionally avoid additional stack frames by generating
executable code and code locals and exhaustively exercises this
functionality with unit tests. (*Pallid limpets primped in pomposity!*)
leycec added a commit that referenced this issue Apr 16, 2021
This commit is the next in a commit chain adding support for arbitrary
caller-defined data validation en-route to resolving issue #32, enabling
callers to validate the internal structure of arbitrarily complex
scalars, data structures, and third-party objects. Specifically, this
commit type-checks the `beartype.vale.AnnotatedIs` class instantiated by
the `beartype.vale.Is.__class_getitem__()` dunder class method, but has
yet to generate human-readable exceptions on type-checking failures.
(*Extraneous extemporaneousness!*)
leycec added a commit that referenced this issue Apr 17, 2021
This commit is the next in a commit chain adding support for arbitrary
caller-defined data validation en-route to resolving issue #32, enabling
callers to validate the internal structure of arbitrarily complex
scalars, data structures, and third-party objects. Specifically, this
commit implements a variety of low-level utility functions required to
generate human-readable exceptions on violations of data validators
supplied as callables, which curiously is substantially more difficult
than simply validating that data. (*Munificent munitions!*)
leycec added a commit that referenced this issue Apr 19, 2021
This commit is the next in a commit chain adding support for arbitrary
caller-defined data validation en-route to resolving issue #32, enabling
callers to validate the internal structure of arbitrarily complex
scalars, data structures, and third-party objects. Specifically, this
commit implements a draft private
beartype._util.func.utilfuncorigin.get_func_lambda_origin_code_or_none()
function introspecting the exact code substring declaring an arbitrary
lambda function as well as superficial support for a new
`beartype.vale.SubscriptedIs._get_repr()` static method enabling dynamic
generation of machine-readable object representations for arbitrarily
nested and complex "Is[...]" subscriptions, required to generate
human-readable exceptions on violations of data validators. Unrelatedly,
this commit also dramatically improves the *See Also* section of our
front-facing `README.rst` documentation with a comparative review of all
known runtime type checkers and new *Runtime Data Validators* subsection
enumerating all known runtime data validator (e.g., contract) packages.
(*Enigmatic intervals!*)
leycec added a commit that referenced this issue Apr 20, 2021
This commit is the next in a commit chain adding support for arbitrary
caller-defined data validation en-route to resolving issue #32, enabling
callers to validate the internal structure of arbitrarily complex
scalars, data structures, and third-party objects. Specifically, it's so
late and it's raining and I no longer have a clear grip on what exactly
was done here but I'm fairly sure it was impressive.

## Features Optimized

* **Non-builtin types.** `@beartype` now checks non-builtin types
  optimally by avoiding an extraneous dictionary lookup in the
  beartypistry singleton previously required to check those types.

(*Outlandish outlanders!*)
leycec added a commit that referenced this issue Apr 22, 2021
This commit is the next in a commit chain adding support for arbitrary
caller-defined data validation en-route to resolving issue #32, enabling
callers to validate the internal structure of arbitrarily complex
scalars, data structures, and third-party objects. Specifically, this
commit implements support for generating deferred memoized
machine-readable representations via the repr() builtin when passed
arbitrarily nested and complex "Is[...]" subscriptions, used by
@beartype to generate human-readable exception messages when a parameter
or return violates a data validator. (*Dissipative reparations!*)
leycec added a commit that referenced this issue Apr 23, 2021
This commit is the next in a commit chain adding support for arbitrary
caller-defined data validation en-route to resolving issue #32, enabling
callers to validate the internal structure of arbitrarily complex
scalars, data structures, and third-party objects. Specifically, this
commit substantially improves the robustness of getter functions defined
by the private `beartype._util.func.utilfunccode`, which internally call
surprisingly fragile `ast`, `inspect`, and `tokenize` functions that
have a bad habit of raising fatal non-human-readable exceptions in
common edge cases. Frankly, those functions are sufficiently buggy that
I have doubts whether anyone actually tested them to any reasonable
degree. (*Destructive derivatives!*)
leycec added a commit that referenced this issue Apr 24, 2021
This commit is the maybe second-to-last in a commit chain adding support
for arbitrary caller-defined data validation en-route to resolving the
issue that is #32, enabling callers to validate the internal structure
of arbitrarily complex scalars, data structures, and third-party
objects. Specifically, this commit finalizes the implementation of both
the core "beartype.vale.Is" class *and* code generated to validate type
hints annotated by subscriptions of that class in `@beartype`-decorated
callables as well as unit tests exercising these facilities. The next
commit will finalize unit tests exercising all edge cases and ideally be
the last commit in this commit chain – finalizing the world's first
PEP-compliant data validation API. (*Almost almighty!*)
@leycec
Member

leycec commented Apr 24, 2021

OMFG. People, the world's first PEP-friendly data validation framework that is also the world's fastest PEP-friendly data validation framework is happening. We will:

  • Finalize the API by tomorrow after dangerously high caffeine consumption and the agonized head-clutching that follows.
  • Release beartype 0.8.0 publishing both that API and Python 3.10 support on Sunday, which hereafter will be referred to as D.V.S. (Data Validation Sunday).

Final sneak preview for the crickets chirping dolefully in the audience: 🦗 🦗 🦗

from beartype import beartype
from beartype.vale import Is
from typing import Annotated

@beartype
def get_text_middle(text: Annotated[str, Is[
    lambda text: 4 <= len(text) <= 14]]):
    '''
    Return the substring spanning characters ``[7, 9]`` inclusive
    from the passed string required to have a length in the range
    ``[4, 14]`` inclusive.
    '''

    # "This is guaranteed to work," says beartype.
    return text[7:10]

</heavy_breathing>

leycec added a commit that referenced this issue Apr 25, 2021
This commit is the last in a commit chain adding support for arbitrary
caller-defined data validation, resolving issue #32. Specifically,
this commit finalizes unit tests exercising all edge cases associated
with this functionality – finalizing the world's first PEP-compliant
data validation API.

## Issues Resolved

* **Data validation.** `@beartype` now supports arbitrary caller-defined
  data validators enabling callers to efficiently validate the internal
  structure of arbitrarily complex scalars, data structures, and
  third-party objects. Specifically, `@beartype`-decorated callables may
  now be annotated by type hints of the form `typing.Annotated[{cls},
  beartype.vale.Is[lambda obj: {test_expr1}], ...,
  beartype.vale.Is[lambda obj: {test_exprN}]]`, where:
  * `{cls}` is any arbitrary class (e.g., `str`, `numpy.ndarray`).
  * `{test_expr1}` is any arbitrary expression evaluating to a boolean
    (e.g., `len(obj) <= 80`, `obj.dtype == np.dtype(np.float64)`).
  * `{test_exprN}` is any arbitrary expression evaluating to a boolean.
  `beartype.vale.Is` may also be subscripted (indexed) by non-lambda
  callables with similar signatures. For convenience, `beartype.vale.Is`
  objects support a rich domain-specific language (DSL) enabling new
  data validators to be synthesized from existing data validators using
  only standard operators:
  * **Negation** with `~beartype.vale.Is[lambda obj: {test_expr}]`,
    equivalent to
    `beartype.vale.Is[lambda obj: not {test_expr}]`.
  * **And-ing** with `beartype.vale.Is[lambda obj: {test_expr1}] &
    beartype.vale.Is[lambda obj: {test_expr2}]`, equivalent to
    `beartype.vale.Is[lambda obj: {test_expr1} and {test_expr2}]`.
  * **Or-ing** with `beartype.vale.Is[lambda obj: {test_expr1}] |
    beartype.vale.Is[lambda obj: {test_expr2}]`, equivalent to
    `beartype.vale.Is[lambda obj: {test_expr1} or {test_expr2}]`.
  This syntax fully complies with [PEP
  593](https://www.python.org/dev/peps/pep-0593) and thus requires
  Python ≥ 3.9. See `help(beartype.vale.Is)` for full usage
  instructions, complete with real-world examples. This resolves issue
  #32, kindly submitted by fashionable top hat-wearing London cat
  #seductress @Saphyel (Carlos Jimenez).
* **Byte strings in errors.** `@beartype` now correctly displays byte
  string values in exception and warning messages. It's amazing! Believe
  what you have never believed before.

(*Unseemly briars sired by seamstress tresses!*)
@leycec
Member

leycec commented Apr 25, 2021

Boom-shaka! 🔥 💥 🤯

@beartype now supports arbitrary caller-defined data validators. Thus ends a year-long journey culminating in the complete loss of all hair from my head, which we take a sober moment to mourn.

Everyone may now efficiently validate the internal structure of arbitrarily complex scalars, data structures, and third-party objects with PEP-compliant type hints that preserve everything you secretly love about fat bears, luscious berries, hot Spring weather, and valid app data.

Syntax

@beartype-decorated callables may now be annotated by type hints of the form typing.Annotated[{cls}, beartype.vale.Is[lambda obj: {test_expr1}], ..., beartype.vale.Is[lambda obj: {test_exprN}]], where:

  • {cls} is any arbitrary class (e.g., str, numpy.ndarray).
  • {test_expr1} is any arbitrary expression evaluating to a boolean (e.g., len(obj) <= 80, obj.dtype == np.dtype(np.float64)).
  • {test_exprN} is any arbitrary expression evaluating to a boolean, too.

Syntax: it's no sin and we don't charge tax.

Example 1: Make It So, Ensign NumPy!

Computational geometry example or it didn't happen, so let's validate a passed object as a two-dimensional NumPy array of floats of arbitrary precision:

from beartype import beartype
from beartype.vale import Is
from numpy import floating, issubdtype, ndarray
from typing import Annotated
import numpy as np

Numpy2DFloatArray = Annotated[ndarray, Is[
    lambda array: array.ndim == 2 and issubdtype(array.dtype, floating)]]
'''
Beartype-specific data validator matching only parameter and return values that
are two-dimensional NumPy arrays of floats of arbitrary precision.
'''

@beartype
def polygon_area(polygon: Numpy2DFloatArray) -> float:
    '''
    Area of a two-dimensional polygon of floats defined as a set of
    counter-clockwise points, calculated via Green's theorem à la a planimeter.

    *Don't ask.*
    '''

    # Calculate and return the desired area. Just pretend we understand this.
    polygon_rolled = np.roll(polygon, -1, axis=0)
    return np.abs(0.5*np.sum(
        polygon[:,0]*polygon_rolled[:,1] - polygon_rolled[:,0]*polygon[:,1]))

DSL: It's Not Just a Telecom Acronym Anymore

beartype.vale.Is also supports a rich domain-specific language (DSL) enabling new validators to be synthesized from existing validators with overloaded set operators, including:

  • Negation via ~beartype.vale.Is[lambda obj: {test_expr}], equivalent to beartype.vale.Is[lambda obj: not {test_expr}].
  • And-ing via beartype.vale.Is[lambda obj: {test_expr1}] & beartype.vale.Is[lambda obj: {test_expr2}], equivalent to beartype.vale.Is[lambda obj: {test_expr1} and {test_expr2}].
  • Or-ing via beartype.vale.Is[lambda obj: {test_expr1}] | beartype.vale.Is[lambda obj: {test_expr2}], equivalent to beartype.vale.Is[lambda obj: {test_expr1} or {test_expr2}].
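Under the hood, a DSL like this reduces to ordinary Python operator overloading. Here's a minimal pure-Python sketch of operator-composable validators — emphatically *not* beartype's actual implementation, just an illustration of the mechanism the operators above rely on:

```python
# Minimal sketch of an operator-composable validator, loosely mirroring the
# "beartype.vale.Is" DSL. This is NOT beartype's real implementation.
class Validator:
    def __init__(self, is_valid):
        self.is_valid = is_valid

    def __and__(self, other):
        # And-ing: both validators must accept the object.
        return Validator(lambda obj: self.is_valid(obj) and other.is_valid(obj))

    def __or__(self, other):
        # Or-ing: either validator may accept the object.
        return Validator(lambda obj: self.is_valid(obj) or other.is_valid(obj))

    def __invert__(self):
        # Negation: accept exactly what the original validator rejects.
        return Validator(lambda obj: not self.is_valid(obj))

IsLengthy  = Validator(lambda text: len(text) > 80)
IsSentence = Validator(lambda text: bool(text) and text[-1] == '.')

# Compose new validators from old ones, exactly as the DSL above does.
IsShortSentence = ~IsLengthy & IsSentence
assert IsShortSentence.is_valid('Short and sweet.')
assert not IsShortSentence.is_valid('No terminal period')
```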

Example 2: Validate My Strings or GTFO

Nonsensical string matching example or it didn't happen, so let's validate a passed object as a string either of at least 80 characters or both quoted and suffixed by a period. Look, it doesn't matter. Just do it already, @beartype!

from beartype import beartype
from beartype.vale import Is
from typing import Annotated

# Beartype-specific data validators defined as lambda functions.
IsLengthy = Is[lambda text: len(text) > 80]
IsSentence = Is[lambda text: text and text[-1] == '.']

# Beartype-specific data validator defined as a non-lambda function.
def _is_quoted(text): return '"' in text or "'" in text
IsQuoted = Is[_is_quoted]

# Combine multiple validators by just listing them sequentially.
@beartype
def desentence_lengthy_quoted_sentence(
    text: Annotated[str, IsLengthy, IsSentence, IsQuoted]) -> str:
    '''
    Strip the suffixing period from a lengthy quoted sentence... *just 'cause.*
    '''

    return text[:-1]  # this is horrible

# Combine multiple validators by just "&"-ing them sequentially. Yes, this is
# exactly identical to the prior function... just 'cause.
@beartype
def desentence_lengthy_quoted_sentence_part_deux(
    text: Annotated[str, IsLengthy & IsSentence & IsQuoted]) -> str:
    '''
    Strip the suffixing period from a lengthy quoted sentence... *just 'cause.*
    '''

    return text[:-1]  # this is still horrible

# Combine multiple validators with as many "&", "|", and "~" operators as you
# can possibly stuff into a file that your coworkers can stomach. They will
# thank you later... possibly much, much later.
@beartype
def strip_lengthy_or_quoted_sentence(
    text: Annotated[str, IsLengthy | (IsSentence & ~IsQuoted)]) -> str:
    '''
    Strip the suffixing character from a string that is lengthy and/or an
    unquoted sentence, because your web app deserves only the best data.
    '''

    return text[:-1]  # this is frankly outrageous

There's No Catch, I Swear and I Cannot Tell a Lie

Everything above fully complies with PEP 593 and thus requires Python ≥ 3.9. See help(beartype.vale.Is) in your favourite Python REPL (...which is, of course, Jupyter Lab, because I see that you are an end user of culture) for full usage instructions, complete with real-world examples.

Thus ends my last hair follicle. 👨‍🦲

@leycec leycec closed this as completed Apr 25, 2021
leycec added a commit that referenced this issue May 25, 2021
This release brings titillating support for
**[beartype validators][beartype validators]**, **Python 3.10**, [**full
PEP 563 – "Postponed Evaluation of Annotations" compliance**][PEP 563],
and [**full PEP 586 – "Literal Types" compliance**][PEP 586]. This release
resolves **4 outstanding issues** and merges **1 pending pull request.**
Changes include:

## Features Added

* **[Beartype validators][beartype validators],** the world's first
  PEP-compliant validation API. Validate anything with two-line type
  hints designed by you, built by the `@beartype` decorator for you. The
  new public `beartype.vale` subpackage enables `beartype` users to
  design their own PEP-compliant type hints enforcing arbitrary runtime
  constraints on the internal structure and contents of parameters and
  returns via user-defined lambda functions and nestable declarative
  expressions leveraging familiar `typing` syntax – all seamlessly
  composable with standard type hints through an expressive
  domain-specific language (DSL). Specifically, `@beartype`-decorated
  callables may now be annotated by type hints of the form
  `typing.Annotated[{cls}, beartype.vale.Is[lambda obj: {test_expr1}],
  ..., beartype.vale.Is[lambda obj: {test_exprN}]]`, where:
  * `{cls}` is any arbitrary class (e.g., `str`, `numpy.ndarray`).
  * `{test_expr1}` and `{test_exprN}` are any arbitrary expressions
    evaluating to booleans (e.g., `len(obj) <= 80`, `obj.dtype ==
    np.dtype(np.float64)`).
  `beartype.vale.Is` may also be subscripted (indexed) by non-lambda
  callables with similar signatures. For convenience, `beartype.vale.Is`
  objects support a rich domain-specific language (DSL) enabling new
  validators to be synthesized from existing validators with Pythonic
  set operators:
  * **Negation** with `~beartype.vale.Is[lambda obj: {test_expr}]`,
    equivalent to
    `beartype.vale.Is[lambda obj: not {test_expr}]`.
  * **And-ing** with `beartype.vale.Is[lambda obj: {test_expr1}] &
    beartype.vale.Is[lambda obj: {test_expr2}]`, equivalent to
    `beartype.vale.Is[lambda obj: {test_expr1} and {test_expr2}]`.
  * **Or-ing** with `beartype.vale.Is[lambda obj: {test_expr1}] |
    beartype.vale.Is[lambda obj: {test_expr2}]`, equivalent to
    `beartype.vale.Is[lambda obj: {test_expr1} or {test_expr2}]`.
  This syntax fully complies with [PEP 593][PEP 593] and thus requires
  Python ≥ 3.9. See [*Beartype validators*][beartype validators] for
  full usage instructions, complete with real-world examples including
  tensors. Rejoice machine learning data scientists! This resolves issue
  #32, kindly submitted by fashionable London steampunk cat pimp
  @Saphyel (Carlos Jimenez).

## Features Optimized

* **Package importation.** The first importation of both the `beartype`
  package and `@beartype` decorator has been significantly optimized,
  now consuming on the order of microseconds rather than milliseconds
  (or even seconds in the worst case). This critical optimization should
  significantly improve runtime performance for short-lived CLI
  applications. Isn't that great, guys? ...guys? *awkward cough*
* **Wrapper function attributes.** The `@beartype` decorator now
  generates unconditionally faster type-checking wrapper functions.
  Previously, attributes accessed in the bodies of those functions were
  indirectly resolved through a common dictionary singleton (the
  "beartypistry") passed to those functions; while trivial, this
  approach had the measurable harm of one dictionary
  lookup for each attribute access in those functions. Now, the same
  attributes are instead directly passed as optional private
  beartype-specific parameters to these functions; while non-trivial,
  this approach has the measurable benefit of avoiding *any* dictionary
  lookups by instead localizing all requisite attributes to the
  signatures of those functions. Of course, this isn't just an
  optimization; this is also a hard prerequisite for supporting both
  ["PEP 586 -- Literal Types"](https://www.python.org/dev/peps/pep-0586)
  and beartype validators. The beartypistry singleton remains used only
  to dynamically resolve forward references to undeclared user types.
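  The trick above can be sketched in two lines of plain Python. This is a
  hypothetical micro-example of the optimization being described — the
  names `_beartypistry` and `__beartype_checker` are illustrative, not
  beartype's actual internals:

  ```python
  # Shared registry of attributes needed by generated wrapper functions.
  _beartypistry = {'is_str': lambda obj: isinstance(obj, str)}

  # Old approach: one dictionary lookup on *every* call.
  def check_old(obj):
      return _beartypistry['is_str'](obj)

  # New approach: the attribute is bound once, at definition time, as the
  # default of a private parameter -- a cheap local variable at call time.
  def check_new(obj, __beartype_checker=_beartypistry['is_str']):
      return __beartype_checker(obj)

  assert check_old('grizzly') and check_new('grizzly')
  assert not check_old(0xBEA2) and not check_new(0xBEA2)
  ```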

## Compatibility Improved

* **Python >= 3.10.0.** `@beartype` now officially supports Python 3.10,
  currently in beta but maybe-soon-to-be-released thanks to
  Python's accelerated release schedule. Python 3.10 significantly broke
  backwards compatibility with runtime introspection of type hints and
  thus runtime type checkers, complicating support for Python 3.10 for
  most runtime type checkers (including us). Specifically, Python 3.10
  unconditionally enables ["PEP 563 -- Postponed Evaluation of
  Annotations"](https://www.python.org/dev/peps/pep-0563) – an abysmal
  standard preferentially improving the efficiency of statically
  type-checked applications by reducing the efficiency of applications
  also checked by runtime type checkers. We can only protest with skinny
  fists lifted like antennas to GitHub. *Praise be to Guido.*
* **[PEP 563 – "Postponed Evaluation of Annotations"][PEP 563].** While
  `beartype 0.1.1` only partially supported [PEP 563][PEP 563],
  `@beartype` now fully supports all edge cases associated with [PEP
  563][PEP 563] – including postponed methods, nested functions,
  closures, and forward references. Forward references merit particular
  mention: they are fundamentally indistinguishable from
  [PEP 563][PEP 563]-postponed type hints, because [PEP 563][PEP 563]
  was never intended to be usable at runtime. Unsurprisingly, it isn't.
  While numerous Python packages
  superficially support [PEP 563][PEP 563] by deferring to the broken
  `typing.get_type_hints()` function, `@beartype` is the first and thus
  far *only* annotation-based Python package to fully support [PEP
  563][PEP 563] and thus Python 3.10.
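  The breakage described above is easy to reproduce. Under PEP 563, every
  annotation is stored as an unevaluated string, behaving like the explicit
  string annotation in this hypothetical example — and the standard
  `typing.get_type_hints()` resolver cannot see types scoped to enclosing
  functions:

  ```python
  import typing

  def make_checker():
      # With PEP 563 enabled, *every* annotation behaves like this
      # explicit string annotation: stored unevaluated until resolved.
      def is_counter(obj: 'Counter') -> bool:
          return isinstance(obj, Counter)
      class Counter: ...
      return is_counter

  checker = make_checker()

  # The annotation survives only as a string...
  assert checker.__annotations__['obj'] == 'Counter'

  # ...and the standard resolver cannot see the locally scoped "Counter"
  # class, so the forward reference fails to resolve at runtime.
  try:
      typing.get_type_hints(checker)
      resolved = True
  except NameError:
      resolved = False
  assert not resolved
  ```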
* **[PEP 586 – "Literal Types"][PEP 586].**  The `@beartype` decorator
  now fully supports the [new `typing.Literal` type hint introduced by
  Python ≥
  3.9](https://docs.python.org/3/library/typing.html#typing.Literal).
  Note, however, that beartype validators offer similar but
  significantly more practical support for type hint-based equality
  comparison in our new `beartype.vale.IsEqual` class.

## Issues Resolved

* **`typing.OrderedDict` under Python 3.7.0 and 3.7.1.** `@beartype` now
  conditionally imports the `typing.OrderedDict` singleton *only* if the
  active Python interpreter targets Python ≥ 3.7.2, the patch release
  that bizarrely changed the [public `typing` API by introducing this
  new public
  attribute](https://docs.python.org/3/library/typing.html#typing.OrderedDict).
  Doing so improves compatibility with both Python
  3.7.0 and 3.7.1 *and* resolves issue #33 – kindly reported by
  @aiporre, the dancing unicorn that radiates sparkles named Ariel.

## Tests Improved

* **Test coverage.** The test suite for `@beartype` now automatically
  generates test coverage metrics – resolving #20:
  * Locally via the third-party `coverage` package (if importable under
    the active Python interpreter). `@beartype` intentionally leverages
    the `coverage` package directly rather than its higher-level
    `pytest-cov` wrapper, as the latter offers no tangible benefits over
    the former while suffering various tangible harms. These include:
    * Insufficient configurability, preventing us from sanely generating
      XML-formatted reports via our existing `tox.ini` configuration.
    * Ambiguous output, preventing us from sanely differentiating
      expected from unexpected behaviours.
    * Argumentative and strongly opinionated developers, which is
      frankly *never* a good look for open-source volunteerism.
  * Remotely via the [third-party Codecov.io coverage
    service](https://about.codecov.io), integrated with the
    [codecov/codecov-action](https://github.com/codecov/codecov-action)
    action now performed on each commit and pull request by our GitHub
    Actions continuous integration (CI) workflow.
* **Python Development Mode (PDM).** The PDM (e.g., `-X dev`,
  `PYTHONDEVMODE`) is now enabled by default under both pytest and tox
  and thus continuous integration (CI), mildly improving the robustness
  of our test suite in edge cases that absolutely should *never* apply
  (e.g., GIL and memory safety violations) but probably will, because
  bad things always happen to good coders. It's, like, a law.
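  For readers wanting the same safety net, here is a minimal `tox.ini`
  fragment enabling the PDM (hypothetical; beartype's actual configuration
  may differ):

  ```ini
  # tox.ini -- fragment enabling the Python Development Mode
  # (equivalent to passing "-X dev") in every test environment.
  [testenv]
  setenv =
      PYTHONDEVMODE = 1
  ```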

## Documentation Revised

* **See Also.** The *See Also* section of our front-facing `README.rst`
  documentation has been significantly expanded with:
  * A comparative review of all known runtime type checkers.
  * A new *Runtime Data Validators* subsection enumerating all known
    runtime validation (e.g., contract, trait) packages.

  [beartype validators]: https://github.com/beartype/beartype#beartype-validators
  [PEP 563]: https://www.python.org/dev/peps/pep-0563
  [PEP 586]: https://www.python.org/dev/peps/pep-0586
  [PEP 593]: https://www.python.org/dev/peps/pep-0593

(*Winsome winners ransom random dendritic endoscopy!*)
@xerz-one

xerz-one commented May 29, 2022

Hey there, any chance we'll get those higher-level beartype.constraints? I was particularly interested in RegexConstraint, as that seems better than compiling a regex somewhere (globally?) and then passing that regex object into a validator lambda without even checking its type first, or else compiling the regex on each validation.

@leycec
Member

leycec commented May 31, 2022

So much "Yes." Thanks for reminding me about my shameful laziness, @xerz-one. The issue of high-level constraints has, indeed, lain dormant for too long.

Most of the constraints originally listed by @Saphyel above ultimately reduce to regular expression-based matching. But as you astutely suggest, that itself raises non-trivial questions relating to space and time efficiency... like:

  • Do we compile the regular expression?
  • If so, where do we cache the compiled regular expression to?

These are fascinating conundrums. Let's open a new feature request for this, @beartype!
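
One plausible answer to both questions is a hypothetical sketch like the following: compile eagerly at hint definition time, and cache the compiled pattern process-wide, keyed on the pattern string. Names like `regex_validator` are illustrative here, not a real beartype API:

```python
import re
from functools import lru_cache

@lru_cache(maxsize=None)
def _compile_cached(pattern: str) -> re.Pattern:
    # Runs exactly once per distinct pattern string; every later
    # validation reuses the same compiled pattern object.
    return re.compile(pattern)

def regex_validator(pattern: str):
    compiled = _compile_cached(pattern)  # compiled once, up front
    # The returned callable is what a hint's "Is[...]" would subscript.
    return lambda text: compiled.fullmatch(text) is not None

is_hex_color = regex_validator(r'#[0-9a-fA-F]{6}')
assert is_hex_color('#ff8800')
assert not is_hex_color('red')
```

Note that `re.compile` already maintains a small internal cache of compiled patterns, but an explicit module-level cache makes the lifetime and eviction policy deliberate rather than incidental.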
