[Documentation] Document @dataclass integration better #56

Closed
JulesGM opened this issue Oct 8, 2021 · 29 comments

Comments

@JulesGM

JulesGM commented Oct 8, 2021

Big fan of the project.
I was thinking that something like

import dataclasses

from beartype import beartype


@beartype
@dataclasses.dataclass
class Whatever:
    something: int
    other_thing: str


if __name__ == "__main__":
    asd = Whatever(123.321, b"Other error")  # Beartype error

Would be really great.

@leycec
Member

leycec commented Oct 8, 2021

YES. Let us quietly reach for the open throat of pydantic under the veiled cover of moonlit darkness, for such is the dominion of... Actually, let's just do this without all the creepy metaphors. (That's why listening to five continuous hours of Norwegian black metal has real-life consequences, people.)

Srsly, tho. This must happen. You even stacked the decorators correctly, layering @beartype onto the standard @dataclasses.dataclass decorator without replacing the latter with our own ad-hoc @beartype.dataclass decorator. This is the Way. You now gratefully receive five golden sunbursts in acknowledgement of your contributions to mankind, Jules. 🔆 ⭐ 🌟 ☀️ 🌞

@leycec Has a Dream

I've always believed that:

  • Pydantic's substitution of the standard @dataclasses.dataclass decorator with its own ad-hoc @pydantic.dataclasses.dataclass decorator is the fundamentally wrong approach. Let the Python standard library do what the Python standard library excels at, which is everything except type-checking, data science, and web dev. Don't reinvent the wheel. The wheel already exists, is well-tested, is well-maintained by a large international consortium of developers who are both better paid and more talented than me (I see a correlation between those two things), and has been proven to behave as expected. Just use that existing wheel, yo!
  • Pydantic's heavyweight BaseModel approach is also the fundamentally wrong approach – because it's heavyweight. As soon as you require user-defined classes to subclass your custom third-party abstract base class (ABC) whose own class is a custom third-party metaclass that conflicts with the only standard metaclass that matters (I speak, of course, of abc.ABCMeta, whose very classname is holy and divine), you invite horrifying metaclass and subclass (e.g., "Diamond Problem") conflicts. This is why pydantic questions on StackOverflow reduce to: "Pydantic class conflict. Help me, O' Nordic Gods!?!?"

Let's not do either of those things. The key to avoiding those things is to dynamically inspect and generate code at decoration time. If you do that, you don't need dark metaclass or superclass magic at variable access time – because you've already front-loaded all the dark magic to decoration time, where dark magic properly belongs.

Of course, dynamically inspecting and generating code at decoration time is already the core of what @beartype does. So that's noice. We just need to generalize what @beartype does to now cover classes as well.
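
To make "front-loading the work to decoration time" concrete, here is a toy sketch. It is not how @beartype is implemented (beartype generates specialised wrapper code); the decorator name check_types is hypothetical, and it only understands hints that are plain classes. The point is merely that the signature and hints are inspected once, when the decorator runs, and never again per call.

```python
import functools
import inspect


def check_types(func):
    """Toy decorator (hypothetical name): shallow isinstance() checks only."""
    # Decoration time: inspect the signature and hints exactly once.
    signature = inspect.signature(func)
    hints = {
        name: hint
        for name, hint in func.__annotations__.items()
        if name != "return" and isinstance(hint, type)
    }

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Call time: only cheap lookups against the precomputed hints.
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            if expected is not None and not isinstance(value, expected):
                raise TypeError(
                    f"{name}={value!r} is not an instance of {expected.__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@check_types
def scale(value: int, factor: int = 2) -> int:
    return value * factor


result = scale(3)
```

Calling scale(1.5) raises TypeError before the function body runs, which is the behaviour the thread is asking for on dataclass constructors.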

Admittedly, that's kinda non-trivial (understatement alert). But non-trivial means fun, right? Right? I also love co-opting competitor ideas and folding them like a rapacious cult of Genestealers into our own genomic codebase.

fear them! fear them all!
pretty sure we're the purple guys in this metaphor

Decorate Them All... And in the Darkness Type-check Them

You may have guessed, but this exact issue was my principal rationale for beartype validators. Everything's been leading up to this, really.

Now that beartype validators support Python ≥ 3.6 (via the typing_extensions.Annotated backport under Python < 3.9 and the official typing.Annotated factory under Python ≥ 3.9), the stage is locked, loaded, and primed for beartype dataclasses.

In theory, generalizing @beartype to decorate classes shouldn't be too excruciating. Here's how @leycec optimistically sees it. Our main task is to inspect the dictionary of the class currently being decorated and, for each class or instance variable declared at class scope, replace that variable with a dynamically generated data descriptor type-checking that variable in O(1) time after each assignment to that variable.

In theory, that sounds trivial. Okay... so it really doesn't. But let's pretend, because it's late and my tenuous grip on reality only has so much wiggle room.

In practice, the devil-in-the-details will be minimizing space costs. We really don't want to dynamically generate one new data descriptor for each class or instance variable. We want to cache data descriptors for subsequent reuse, because it's likely multiple classes will share class or instance variables annotated by the same type hints: e.g.,

import dataclasses
from typing import List

from beartype import beartype


@beartype
@dataclasses.dataclass
class LikeWhatever:
    something: int
    other_thing: List[str]


@beartype
@dataclasses.dataclass
class LikeIKnowRight:
    one_more_thing: List[str]   # <-- reuse LikeWhatever.other_thing's data descriptor
    yet_another_thing: int      # <-- reuse LikeWhatever.something's data descriptor
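
The caching idea above can be sketched with nothing but the standard library. This is an assumption-laden illustration, not beartype's design: make_checker is a hypothetical helper, it caches checker callables rather than full descriptors, and its check is shallow (it never looks inside the list).

```python
import functools
from typing import List, get_origin


@functools.lru_cache(maxsize=None)
def make_checker(hint):
    """Build (once per distinct hint) a shallow isinstance()-based checker."""
    # List[str] -> list; plain classes such as int have no origin.
    runtime_type = get_origin(hint) or hint

    def check(value):
        return isinstance(value, runtime_type)

    return check


# Two fields annotated with the same hint share one cached checker object.
list_checker_a = make_checker(List[str])
list_checker_b = make_checker(List[str])
int_checker = make_checker(int)
```

Because equal type hints hash equally, every class with a List[str] field would receive the same checker object, so the space cost grows with the number of distinct hints rather than the number of fields.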

Phew. I feel even older just thinking about this. Since our first priority for the remainder of 2021 and (let's be honest) probably most of 2022 is deep type-checking for all remaining PEP 484- and 585-compliant type hints, fulfilling this feature request that everyone really wants will have to moulder on the back burner for just a little bit.

Let's choose to believe that one day even type-checking bears can fly.

@antonagestam

I'm currently experimenting with this decorator, which seems to work fine because __init__ gets __annotations__ correctly set by the dataclass decorator.

from dataclasses import is_dataclass
from typing import TypeVar

import beartype

T = TypeVar("T")


def beartyped(cls: type[T]) -> type[T]:
    assert is_dataclass(cls)
    cls.__init__ = beartype.beartype(cls.__init__)  # type: ignore[assignment]
    return cls
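
The claim that the generated __init__ carries usable annotations can be checked with the standard library alone. A minimal sketch (the class mirrors the one from the original request):

```python
import dataclasses


@dataclasses.dataclass
class Whatever:
    something: int
    other_thing: str


# __init__ was written by @dataclass, not by us, yet it carries the field hints
# that a decorator such as beartype.beartype() can read.
init_hints = Whatever.__init__.__annotations__
print(init_hints)
```

This is why wrapping cls.__init__ after @dataclass has run is sufficient for constructor-time checking.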

@JulesGM
Author

JulesGM commented Oct 8, 2021

what @antonagestam suggests is what I personally was looking for, just doing arg type-checking on the __init__ call of dataclasses.

@JulesGM
Author

JulesGM commented Oct 8, 2021

I understand that making class decorators behave this way (meaning, be a shorthand for decorating whatever init the class ends up with) would hamper the development of further class decorators though.

For a regular class, one would just add the beartype decorator to the init call. In the case of dataclasses however... it's automatically generated. Maybe a @beartype.check_init or @beartype.decorate_init decorator that one uses on class declaration could work.

@beartype.decorate_init
@dataclasses.dataclass
class Whatever:
    something: int
    other_thing: str


if __name__ == "__main__":
    asd = Whatever(123.321, b"Other error")  # Beartype error

@leycec
Member

leycec commented Oct 9, 2021

Ah... I finally get it. </wet_forehead_smacking_sound>

We're not talking about type-checking class or instance variables at assignment time, which is a hard but "fun" problem; we're talking about type-checking in-memory callables dynamically generated by an earlier decorator like @dataclasses.dataclass, which is an easier and maybe even funner problem.

Indeed, this bears (see wut i did there) some rumination. It'd be nice if whatever API we gin up in our bathtub cleanly generalizes to all dynamically generated callables – not just __init__() constructors synthesized by @dataclasses.dataclass.

I mean, __init__() constructors synthesized by @dataclasses.dataclass are the obvious use case. Sure. Can't argue there. But something similarly devilish is bound to crop up like a mouldy piece of organic broccoli on our dinner plate at some point.

Everything Should Work with Me Doing Nothing

Fortunately for my RSI-riddled fingers, this should already automagically happen once we generalize @beartype to handle classes. Why? Because @beartype will automagically apply itself to all methods of each decorated class – including in-memory methods dynamically generated by an earlier decorator like @dataclasses.dataclass.

In theory, @beartype shouldn't even be able to distinguish between on-disk methods that you explicitly write and in-memory methods that are implicitly written for you by someone else. As far as @beartype is concerned, they're all just unbound methods in the same class dictionary dir(cls). @beartype will type-check them all! muhaha
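
A quick standard-library sketch of that point: once @dataclass has run, hand-written and synthesized methods sit side by side as plain functions in the class dictionary. The sketch uses vars(cls) (that is, cls.__dict__) rather than dir(cls), since dir() returns only names and also includes inherited ones; that substitution is mine, not the author's.

```python
import dataclasses
import types


@dataclasses.dataclass
class Whatever:
    something: int

    def explicit_method(self) -> int:
        return self.something


# vars(cls) is the class dictionary; dir(cls) would also list inherited names.
methods = {
    name: attr
    for name, attr in vars(Whatever).items()
    if isinstance(attr, types.FunctionType)
}
print(sorted(methods))
```

A class decorator that loops over such a mapping would therefore pick up __init__ and __eq__ exactly as it picks up explicit_method.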

But What Should We Do until You Actually Do Something?

...what @antonagestam suggests. You knew that was coming, didn't you?

I'd probably rename @beartyped to something a bit more unambiguous like @beartype_decorate_init (as you smartly suggest, @JulesGM) or just @beartype_init or something.

Since @beartype will automagically handle all of this for everyone without requiring API changes when we generalize @beartype to decorate classes, I'd rather hold off on hard-coding something like @beartype.decorate_init into the codebase. We'd end up deprecating that anyway, at which point everyone would shake their mottled fists at us and demand all their GitHub stars back.

Does all of that sound reasonable? If not, I have five hours of Swedish melodic death metal that desperately needs listening to... (cues Hypocrisy Destroys Wacken) 🎧

@JakkuSakura

Recently I developed typedmodel for this. Under the hood, it's a metaclass plus beartype. It's not very runtime efficient, but strict. Speed is not anymore the main concern when I choose python

https://github.com/qiujiangkun/typedmodel

@leycec
Member

leycec commented Jan 5, 2022

Speed is not anymore the main concern when I choose python

...truer words were never spoken.

Relatedly, it's abundantly clear that everyone just wants @antonagestam's wonderful @beartyped decorator. It's wonderful because it works now and is literally two joyous lines of code, which is more than I can say for anything I've done about this – which is nothing.

Here's the short-term winning game plan:

  • Shamelessly copy-paste @antonagestam's aforementioned monument to clever one-liners into the @beartype decorator.
  • Release beartype 0.10.0 in a few days with that.
  • Wait several years to several decades.
  • Add real support for decorating classes to @beartype.
  • Release beartype 0.99.0 with that.

Specifically, I'm contemplating this:

from dataclasses import dataclass, is_dataclass
from typing import TypeVar

T = TypeVar("T")

def beartype(thing: T) -> T:
    # Recursion, because exhausting the stack is "fun."
    if isinstance(thing, type):
        if is_dataclass(thing):
            thing.__init__ = beartype(thing.__init__)  # type: ignore[assignment]
            return thing

        raise BeartypeWrappeeException(
            f'{repr(thing)} not a @dataclasses.dataclass, yo. What up?')

    # Else, do everything else we currently do.
    ...

Sound good? Great. Let's do this thing. Perpetual thanks to @antonagestam for the no-nonsense contribution and to everyone else for continually referencing the obvious move I should have made several months ago.

Speak now or forever hold my beer.

leycec added a commit that referenced this issue Jan 6, 2022
This commit is the first in a commit chain adding support for PEP
557-compliant dataclasses (i.e., types decorated by the standard
`@dataclasses.dataclass` decorator) to the `@beartype` decorator,
en-route to resolving issue #56 kindly submitted by @JulesGM
(Jules Gagnon-Marchand) the Big Brain NLP researcher. Specifically, this
commit employs a clever one-liner employed by @antonagestam (Anton
Agestam) the ageless Swede that I stan for, to transparently type-check
the implicit `__init__()` method generated by `@dataclass` for
dataclasses. Naturally, nothing is tested; everything is broke. (*Wondrous dander wands!*)
@leycec
Member

leycec commented Jan 6, 2022

the happening

leycec added a commit that referenced this issue Jan 8, 2022
This commit is the last in a commit chain adding support for PEP
557-compliant dataclasses (i.e., types decorated by the standard
`@dataclasses.dataclass` decorator) to the `@beartype` decorator,
resolving issue #56 kindly submitted by @JulesGM (Jules Gagnon-Marchand)
the Big Brain NLP researcher. Specifically, this commit adds support for
**dataclass-specific initialization-only instance variable type hints**
(i.e., `dataclasses.InitVar[...]` hints) as well as exhaustively testing
all likely edge cases in this support. 2022 lookin' up. (*Scalded tail-wagging scallywag!*)
@leycec
Member

leycec commented Jan 8, 2022

As it was written, so it was done. Thanks to all for continually kicking this into nitrous-injected high-gear and to @antonagestam particularly for the game-changing one-liner. 2022 is looking up already; the evidence is here.

Nitrous injection: engage!

eat dust, sharkbait

@leycec leycec closed this as completed Jan 8, 2022
@JulesGM
Author

JulesGM commented Jan 8, 2022 via email

@JulesGM
Author

JulesGM commented Jan 8, 2022 via email

@leycec
Member

leycec commented Jan 8, 2022

Ah! Spot-on point, as always.

In theory, type-checking __setattr__() for @dataclass-decorated classes enabling frozen=True should be trivial as well – assuming @dataclass synthesizes a __setattr__() dunder method that's just as nicely annotated as __init__(). Is that the case? I... have no idea whatsoever.

Allow me to scratch my chin thoughtfully while top men look into this.

@posita
Collaborator

posita commented Jan 8, 2022

Allow me to scratch my chin thoughtfully while top men look into this.

Apologies in advance. I'm ethically (and—in some jurisdictions—legally) obligated to post this:

Top…men.

@leycec
Member

leycec commented Jan 8, 2022

ohsh—

Literally cannot believe that obliquely referencing "top men" was enough to trigger @posita's 80's neural net. You've finely honed that neocortex on the finest chad-tier media yet produced by humankind; now, you are literally inside my mind and I'm not sure how to feel about that. Kuato would like to have a word.

do it, do it!

@leycec
Member

leycec commented Jan 9, 2022

@JulesGM: Sadly, @dataclass fails to annotate the __setattr__() or __delattr__() methods it synthesizes. Of course, that kinda makes sense; these methods just unconditionally raise dataclasses.FrozenInstanceError exceptions when they're called. Annotating something guaranteed to fail would be sorta pointless, right?
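
For anyone who wants to verify this locally, a small standard-library sketch follows. It only inspects what CPython's dataclasses module produces; nothing here is beartype API.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class Frozen:
    x: int = 0


# The synthesized __setattr__ lives in the class dictionary...
frozen_setattr = vars(Frozen)["__setattr__"]
# ...but carries no parameter or return annotations for a checker to use.
setattr_hints = frozen_setattr.__annotations__

try:
    Frozen().x = 1
    error = None
except dataclasses.FrozenInstanceError as exc:
    error = exc
```

With no annotations to read, wrapping this method in a type-checking decorator would add overhead without checking anything.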

But please do shoot us another feature request if you think up anything else ingenious. An obvious example would be for @beartype to auto-generate type-checking data descriptors for all dataclass fields, which would type-check attempts to reassign those fields. I want this.

Let's make every dream come true in 2022.

@JulesGM
Author

JulesGM commented Jan 9, 2022 via email

@leycec
Member

leycec commented Jan 10, 2022

Well, dataclasses aren't always frozen?

wut u say!?!?

Of course, you are correct. It was my (mis)understanding from the official documentation that @dataclass only synthesizes __setattr__() and __delattr__() methods when passed frozen=True – but perhaps this too is a baldfaced spittle-flecked lie and @dataclass is actually unconditionally injecting __setattr__() and __delattr__() into everything?

Is the method decorated if the dataclass is not frozen?

Nope, according to my limited grasp of physical reality anyway. The official documentation reads:

frozen: If true (the default is False), assigning to fields will generate an exception. This emulates read-only frozen instances. If __setattr__() or __delattr__() is defined in the class, then TypeError is raised.

I take this to mean that @dataclass intentionally avoids defining __setattr__() and __delattr__() methods for you unless it absolutely has to (e.g., when frozen=True). Why? Because doing so would conflict with user-defined __setattr__() and __delattr__() methods, which would be bad.
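
That reading of the documentation can be confirmed with a short standard-library experiment (class names here are illustrative only):

```python
import dataclasses


@dataclasses.dataclass
class Mutable:
    x: int = 0


@dataclasses.dataclass(frozen=True)
class Frozen:
    x: int = 0


# Only the frozen variant gets a synthesized __setattr__ in its class dictionary.
mutable_defines_setattr = "__setattr__" in vars(Mutable)
frozen_defines_setattr = "__setattr__" in vars(Frozen)

# Combining frozen=True with a user-defined __setattr__ is rejected outright.
try:
    @dataclasses.dataclass(frozen=True)
    class Conflicting:
        x: int = 0

        def __setattr__(self, name, value):
            object.__setattr__(self, name, value)

    conflict_error = None
except TypeError as exc:
    conflict_error = exc
```

So a non-frozen dataclass leaves __setattr__ free for the user (or a decorator) to define, while a frozen one does not.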

Can we just decorate in that case?

If by "decorate," you mean "manually define @beartype-decorated __setattr__() and __delattr__() methods," the answer is: "Yes, please. This is the way. Do it for Johnny."

Minimal-length example from our Cheatsheet or it didn't happen, so:

# Import the requisite machinery. Note the "str | None" union below requires
# Python ≥ 3.10.
from beartype import beartype
from dataclasses import dataclass, InitVar

# User-defined dataclass. @beartype currently only type-checks the implicit
# __init__() method generated by @dataclass. Fields are type-checked *ONLY*
# at initialization time and thus *NOT* type-checked when reassigned to.
@beartype
@dataclass
class MyDataclass:
    # Annotate fields with PEP-compliant type hints.
    field_must_satisfy_builtin_type: InitVar[str]
    field_must_satisfy_pep604_union: str | None = None

    # Decorate explicit methods with @beartype as above.
    @beartype
    def __post_init__(self, field_must_satisfy_builtin_type: str) -> None:
        if self.field_must_satisfy_pep604_union is None:
            self.field_must_satisfy_pep604_union = (
                field_must_satisfy_builtin_type)

@dataclass API: why you so confusing? 😬

@JulesGM
Author

JulesGM commented Jan 10, 2022

So, how about @beartype defining a default __setattr__ if there aren't any user-defined ones?
I guess there is a line that must be drawn, where support for most use cases of dataclasses ends and support for regular classes begins. I do feel that maybe only defining type-checking __setattr__ if there aren't user-defined ones keeps us on the good side of the line.
BTW, thanks for all your awesome work and super pleasant energy, as always, Cecil.

@leycec
Member

leycec commented Jan 11, 2022

So, how about @beartype defining a default __setattr__ if there aren't any user-defined ones?

Heh. You're brilliant, so you went there. Because I'm less brilliant, please bear (heh) with me.

I'm pretty sure (but not certain) you're wondering when @beartype will type-check attempts to reassign fields after dataclass instantiation, right? In other words, you want this:

>>> @beartype
... @dataclass
... class DoItDoIt(object):
...     check_it_check_it: int = 0xDEADBABE
...
>>> do_it_do_it = DoItDoIt()
>>> do_it_do_it.check_it_check_it = 'This is an int. I swear.'
beartype.roar.BeartypeException: "This is an int. I swear."
is not actually an int. Your swears mean nothing!

If so, you're in deep luck. There are innumerably many ways to accomplish that lofty goal – and @beartype synthesizing a type-checking __setattr__() for you is (surprisingly!) one of the worst, mostly because of the aforementioned conflict with user-defined __setattr__() methods.

The best approach (from my limited understanding of physical reality) is to improve:

...@beartype to auto-generate type-checking data descriptors for all dataclass fields, which would type-check attempts to reassign those fields. I want this.

Data descriptors are dark and profane Python magic, but the core takeaway here is that:

  • For each field of your dataclass:
    • @beartype will entirely replace that field with a new data descriptor of the same name masquerading as that field, whose:
      • __get__() method just unconditionally returns that field.
      • __set__() method (in order):
        1. Type-checks the passed value.
        2. If that type-check succeeds, sets that field to that value.
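
The scheme outlined in the list above can be sketched with a hand-written data descriptor. This is only an illustration under loud assumptions: TypeCheckedField is a hypothetical name, the check is a shallow isinstance(), and the example uses a plain class rather than a dataclass. It is not what @beartype generates.

```python
class TypeCheckedField:
    """Data descriptor that type-checks every assignment (hypothetical name)."""

    def __init__(self, expected_type):
        self._expected_type = expected_type

    def __set_name__(self, owner, name):
        # Called once at class creation; remember where the real value lives.
        self._name = name
        self._storage = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself
        return getattr(instance, self._storage)

    def __set__(self, instance, value):
        # 1. Type-check the passed value.
        if not isinstance(value, self._expected_type):
            raise TypeError(
                f"{self._name}={value!r} is not an instance of "
                f"{self._expected_type.__name__}"
            )
        # 2. Only then store it.
        setattr(instance, self._storage, value)


class Point:
    x = TypeCheckedField(int)
    y = TypeCheckedField(int)

    def __init__(self, x, y):
        self.x = x  # routed through TypeCheckedField.__set__
        self.y = y


point = Point(1, 2)
point.x = 10
```

Assigning point.y = "nope" raises TypeError and leaves the stored value untouched, at the cost of one extra method call per assignment.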

That's more-or-less how CPython implements the standard @property decorator, for example. I'm genuinely surprised that @dataclass(frozen=True) is implemented with __setattr__() rather than data descriptors, as the latter are always preferable to the former.

Someone who is not me dropped the greasy ball there. 😬

@antonagestam

antonagestam commented Jan 11, 2022

@leycec Wouldn't you need to define __setattr__ anyway to raise when assigning to undefined fields? I'm guessing now, but it seems like having only __setattr__ would result in fewer function calls as opposed to __setattr__ + property.

Anyways, as a "datapoint", I think it's completely reasonable to not support checking types on assignment. But then again, I also think having frozen=False as default was a mistake ;)

@leycec
Member

leycec commented Jan 12, 2022

Wouldn't you need to define __setattr__ anyway to raise when assigning to undefined fields?

You ask me this troubling question, Anton. To better understand my delicate position, let me rephrase your question as:

Should @beartype prevent third parties from monkey-patching your @dataclass?

First, let's quietly admit to ourselves from the filthy confines of our ice-encrusted northerly mancaves that @beartype itself is a third-party monkey-patching your @dataclass to implement its diabolical data descriptor scheme. Hypocrisy, thy name is @beartype.

Second, let's quietly point to the sign on the wall reading @dataclass(slots=True). CPython 3.10 (and sorta prior CPython releases) already ships with built-in support for prohibiting unwanted field assignments, because CPython 3.10 is the Big Boss:

from dataclasses import dataclass

# Do you believe in miracles?
@dataclass(frozen=True, slots=True)
class Point:
    x: int = 0
    y: int = 0

So, the answer to this question and more is: "By the desiccated beard of Odin, this must not be." Thus spake the Bear.

...I think it's completely reasonable to not support checking types on assignment.

...yup. I'm right there with you – by default, anyway.

My main concern with check-on-assignment is the performance hit, because when isn't the performance hit my main concern? Never, that's when it isn't. wut

In this case, each assignment costs you the function overhead of calling a data descriptor's __set__() method. In some contexts, that's fine; in others, the machine learning guys are already parading my head on a pike in downtown Gothenburg if I enable that by default in @beartype.

To appease everyone, I'm currently finalizing a configuration API for (wait-for-it) beartype 0.10.0, to be released in a few more days (I swear this by the frayed beard of Odin). I'm contemplating a hypothetical configuration option is_check_var_assignment=True for the next-next beartype 0.11.0 release cycle enabling users to type-check field reassignment on a @dataclass-by-@dataclass basis: e.g.,

from beartype import beartype, BeartypeConf
from dataclasses import dataclass

# Don't ask why this looks so awkward. Don't.
@beartype(conf=BeartypeConf(is_check_var_assignment=True))
@dataclass
class Point:
    x: int = 0
    y: int = 0

But then again, I also think having frozen=False as default was a mistake ;)

...heh. A fellow frozen-favouring gentleman of culture, I see. I'm right there with you yet again, actually. What is this, Group Hug Time!?!? It may very well be.

The Big Takeaway from functional programming is that a shockingly 😨 large class of bugs can be entirely eliminated by minimizing (or even prohibiting) object mutation. Of course, doing so usually also comes at a minor cost of reducing (read: pillaging and burning) the usability of your API – but I'm willing to sacrifice your API at the golden altar of robustness if you are.

I'm possibly still jelly that Rubyists got Object#freeze and Object#frozen? methods out-of-the-box, while Pythonistas can't even get robust freezing and slotting semantics in 2022 over here. Matz, we summon thee to finally join the Python world.

@antonagestam

Sorry, I was a bit vague, this remark:

Wouldn't you need to define __setattr__ anyway to raise when assigning to undefined fields?

Was in response to:

I'm genuinely surprised that @dataclass(frozen=True) is implemented with __setattr__()

I.e., I was just reasoning as to why frozenness is implemented with __setattr__().

And, it turns out I was misinformed: for some reason I thought the below would yield an error. Since it doesn't, you're most likely right about it being weird that frozen=True isn't implemented with descriptors.

>>> from dataclasses import dataclass
>>> @dataclass(frozen=True)
... class A: ...
... 
>>> A.foo = "bar"

Completely agree about frozenness generally, my mind keeps grasping for a Python with much less mutability!
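
One caveat on the REPL session above, offered as a sketch rather than a ruling: A.foo = "bar" assigns to the class, whereas the __setattr__ synthesized for frozen=True is only consulted for assignments on instances. As far as I can tell from CPython's dataclasses module, the instance-level case does raise, even for an undeclared name:

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class A:
    pass


# Class-level assignment never reaches the instance-level __setattr__.
A.foo = "bar"

# Instance-level assignment does, and is rejected even for undeclared names.
try:
    A().baz = "qux"
    instance_error = None
except FrozenInstanceError as exc:
    instance_error = exc
```

So the original hunch (that a frozen dataclass refuses assignment to undefined fields) appears to hold for instances; it is only the class object that stays open to monkey-patching.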

@leycec
Member

leycec commented Jan 13, 2022

...my mind keeps grasping for a Python with much less mutability!

...heh. So, I have bad news and I have good news.

The bad news is Guido and Alex Martelli intentionally scuttled a working PEP proposal to introduce a freeze protocol via a new freeze() builtin that was on the Standards Track a decade and a half ago. The good news is Guido and Alex Martelli were wrong.

Martelli's caustic rebuttal of PEP 351 makes utterly no sense from the rear view window of 2022. A throbbing varicose vein (blotched with ugly purple mottling, of course!) pops out on my forehead whenever I revisit that rebuttal. Everyone on Python's Steering Committee at the time badly missed the boat, which is just:

You freeze objects to eliminate bugs. That is all.

Instead, Martelli kicks up and attacks all these indefensible straw men like hashability and polymorphism – without ever addressing the fundamental raison d'etre for freezing, which is to stomp bugs repeatedly until they die. 🤕

@GregHilston

@leycec I just wanted to add that your first reply to this thread is why I'll be switching to Beartype over Pydantic. I honestly believe the examples you demonstrated here should be part of the most forward facing documentation. I spent some time trying to compare and contrast the two packages, and your comment here did the best job of providing that information.

I wasn't sure where to provide this feedback, so if it's inappropriate to do so here, I can remove this post.

@leycec
Member

leycec commented Feb 3, 2022

Fist bump! Thanks so much for the gracious feedback, Greg. Love your avatar, too. That's the most serious game-dev-casually-leaning-against-a-tree photo op I've ever seen. Never change that.

I honestly believe the examples you demonstrated here should be part of the most forward facing documentation.

You are unequivocally correct. Our documentation needs a jab in the arm from a qualified technical writer or documentation specialist... yet all we have is me. At the moment, the only public documentation on the subject is this @dataclass subsection in our sprawling Cheatsheet:

# ..................{              CLASSES ~ dataclass   }..................
# Import the requisite machinery. Note the "str | None" union below requires
# Python ≥ 3.10.
from beartype import beartype
from dataclasses import dataclass, InitVar

# User-defined dataclass. @beartype currently only type-checks the implicit
# __init__() method generated by @dataclass. Fields are type-checked *ONLY*
# at initialization time and thus *NOT* type-checked when reassigned to.
@beartype
@dataclass
class MyDataclass:
    # Annotate fields with PEP-compliant type hints.
    field_must_satisfy_builtin_type: InitVar[str]
    field_must_satisfy_pep604_union: str | None = None

    # Decorate explicit methods with @beartype as above.
    @beartype
    def __post_init__(self, field_must_satisfy_builtin_type: str) -> None:
        if self.field_must_satisfy_pep604_union is None:
            self.field_must_satisfy_pep604_union = (
                field_must_satisfy_builtin_type)

That's buried fairly far down, though. A dedicated human-readable section on integrating @dataclass + @beartype would go a long way to alleviating these usability concerns. Let's reopen this issue as a documentation request... so that I actually do this.

Thanks again, Greg – and thanks for digging deep into the wide world of @beartype. 🐻 🌳 🐻

@leycec leycec changed the title from "Feature suggestion: decorator with dataclasses" to "[Documentation] Document @dataclass integration better" Feb 3, 2022
@GregHilston

@leycec It's funny to me that you don't consider yourself a technical writer, as I aspire to write as well as your comments here.

I admittedly have not even tried beartype yet, but once I've done so, I'll look into the pending issues on the project and see if there's any place I can help.

I'm a big fan of python, type hinting, and type checking with mypy, so it seems like data validation with beartype is a logical next step.

Thanks for updating this issue :)

@leycec
Member

leycec commented Feb 4, 2022

I admittedly have not even tried beartype yet...

this cannot be

I'll look into the pending issues on the project and see if there's any place I can help.

That would be fantabulous. Since you name-drop technical writing, how do you feel about... Sphinx? </audience_gasp>

One of our more embarrassing issues that's been open since 0 A.B. (After Beartype) is the gradual refactoring of our inscrutable README.rst file into a cohesive suite of ReadTheDocs (RTD)-hosted documentation that no longer brings shame to @beartype. We've actually configured up a rudimentary Sphinx build complete with official RTD name-squatting. The groundwork's been laid... then no one bothered to build the house. 😓

Could be fun! Of course, that's what I tell all the volunteers.

...data validation with beartype is a logical next step.

@beartype: the final typing frontier.

These are the voyages of the type-checker beartype. Its infinitely long mission: to type-check strange new objects. To seek out new quality assurance and new PEPs. To boldly check what no code has checked before.

Thanks for updating this issue :)

Your faith is unsettling. Thank you for prodding me to do this. I swear this will be done. Someday. I swear.

leycec added a commit that referenced this issue Feb 9, 2022
This release titillates with scintillating support for **[PEP 557 --
Data Classes][PEP 557]**, **[PEP 570 -- Python Positional-Only
Parameters][PEP 570]**, and **[PEP 604 -- Allow writing union types as
X | Y][PEP 604]**.

This release resolves a bone-crushing **30 issues** (mostly shameless
dupes of one another, admittedly) and merges **3 pull requests.**
World-girdling changes include:

## Compatibility Added

* **[PEP 557 -- Data Classes][PEP 557].** `@beartype` now supports
  **dataclasses** (i.e., types decorated by the standard
  `@dataclasses.dataclass` decorator), resolving issue #56 kindly
  submitted by @JulesGM (Jules Gagnon-Marchand) the Big Brain NLP
  researcher. Specifically, `@beartype` now transparently type-checks:
  * **Dataclass-specific initialization-only instance variable type
    hints** (i.e., `dataclasses.InitVar[...]`).
  * The implicit `__init__()` method generated by `@dataclass` for
    dataclasses through a clever one-liner employed by @antonagestam
    (Anton Agestam) the ageless Swede that I stan for.
* **[PEP 570 -- Python Positional-Only Parameters][PEP 570].**
  `@beartype` now supports positional-only arguments and no one cares.
  Given the triviality, the rear view mirror of regret suggests we kinda
  should've implemented this sooner. Better late than never, best
  @beartype friends for life (BBFFL).
* **[PEP 604 -- Allow writing union types as X | Y][PEP 604].**
  `@beartype` now supports new-style set unions (e.g., `int | float`),
  resolving issue #71 kindly submitted by pro typing aficionado Derek
  Wan (@dycw). Thanks to Derek for the helpful heads up that @beartype
  was headed straight for typing disaster under Python ≥ 3.10. Since we
  dodged another bullet there, this must mean we have now activated
  bullet time. Goooooo, slomo!

## Compatibility Improved

* **[PEP 484 -- Type Hints][PEP 484],** including:
  * **`typing.{Binary,Text,}IO[...]` deep type-checking.** `@beartype`
    now deeply type-checks subscripted `typing.{Binary,Text,}IO[...]`
    type hints, resolving issue #75 kindly submitted by Niklas "If I had
    a nickel for every lass..." Rosenstein. Notably:
    * Since the `typing.BinaryIO` protocol and its `typing.IO`
      superclass share the exact same API, the `typing.BinaryIO`
      protocol is lamentably useless for *all* practical purposes. This
      protocol *cannot* be leveraged to detect binary file handles. Can
      binary file handles be detected at runtime then? Yes, we can! A
      binary file handle is any object satisfying the `typing.IO`
      protocol but *not* the `typing.TextIO` protocol. To implement this
      distinction, `@beartype` necessarily invented a novel form of
      type-checking and a new variant of type elision: **anti-structural
      subtyping.** Whereas structural subtyping checks that one class
      matches the API of another class (referred to as a "protocol"),
      anti-structural subtyping checks that one class does *not* match
      the API of another class (referred to as an "anti-protocol").
      `@beartype` publicly exposes this functionality via the new
      `beartype.vale.IsInstance[...]` validator, enabling *anyone* to
      trivially perform anti-structural subtyping. In this case,
      `@beartype` internally reduces all useless `typing.BinaryIO` type
      hints to substantially more useful `typing.Annotated[typing.IO,
      ~beartype.vale.IsInstance[typing.TextIO]]` type hints.
* **Unsubscripted NumPy type hints.** `@beartype` now supports **untyped
  NumPy array type hints** (i.e., the unsubscripted
  `numpy.typing.NDArray` and subscripted
  `numpy.typing.NDArray[typing.Any]` type hints), resolving issue #69
  kindly submitted by @Jasha10, the stylish boy wonder dual-wielding the
  double thumbs-up and coke-bottle glasses that signify elementary
  genius. Specifically, this commit now detects and reduces these hints
  to the equivalent `numpy.ndarray` type.
* **Mypy ≥ 0.920.** `@beartype` now squelches ignorable mypy complaints
  first introduced by mypy 0.920, including:
  * **Explicit reexport errors.** `beartype` now squelches implicit
    reexport complaints from mypy with respect to public attributes
    published by the `beartype.cave` subpackage, resolving issue #57
    kindly reopened by Göteborg melodic death metal protégé and
    brightest academic luminary @antonagestam. This subpackage is now
    compatible with both the `--no-implicit-reexport` mypy CLI option
    and equivalent `no_implicit_reexport = True` configuration setting
    in `.mypy.ini`.
  * **Version-dependent errors.** Previously, mypy permitted imports
    against standard library modules introduced in newer CPython
    versions to be squelched with the usual ``"# type:
    ignore[attr-defined]"`` pragma. Since mypy now ignores these
    pragmas, `@beartype` now silences its complaints through...
    *unconventional* means. A bear do wut a bear gotta do.
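
The "matches one API but not another" logic described for binary file handles can be approximated in plain Python. This sketch is an assumption-laden analogue: it uses nominal `isinstance()` checks against the standard `io` abstract base classes as stand-ins for the `typing.IO` and `typing.TextIO` protocols, and `is_binary_file` is a hypothetical helper, not part of `beartype`.

```python
import io


def is_binary_file(obj: object) -> bool:
    """Return True for file handles that are *not* text file handles."""
    # Positive half: the object must be a file handle at all.
    # Negative half: it must *not* match the text file handle API.
    return isinstance(obj, io.IOBase) and not isinstance(obj, io.TextIOBase)


binary_handle = io.BytesIO(b"bytes")
text_handle = io.StringIO("text")
```

Here `is_binary_file(binary_handle)` is true while `is_binary_file(text_handle)` is false, which is the same shape as `typing.Annotated[typing.IO, ~beartype.vale.IsInstance[typing.TextIO]]`.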

## Features Added

* **Compatibility API.** `beartype` now publishes a new
  `beartype.typing` API as a `typing` compatibility layer improving
  forward compatibility with future Python releases, resolving issue #81
  kindly submitted by the honorable @qiujiangkun (Qiu Jiangkun).
  Consider resolving [PEP 585][PEP 585] deprecations by importing from
  our new `beartype.typing` API rather than the standard `typing` API. A
  battery of new unit tests ensure conformance:
  * Between `beartype.typing` and `typing` across all Python versions.
  * With mypy when importing from `beartype.typing`.
* **Configuration API** (i.e., public attributes of the `beartype`
  package enabling end users to configure the `@beartype` decorator,
  including configuring alternative type-checking strategies *other*
  than constant-time runtime type-checking). Specifically, `beartype`
  now publishes:
  * `beartype.BeartypeStrategy`, an enumeration of all type-checking
    strategies to *eventually* be fully supported by future beartype
    releases – including:
    * `BeartypeStrategy.O0`, disabling type-checking for a callable by
      reducing `@beartype` to the identity decorator for that callable.
      Although currently useless, this strategy will usefully allow end
      users to selectively prevent callables from being type-checked by
      our as-yet-unimplemented import hook. When implemented, that hook
      will type-check *all* callables in a given package by default.
      Some means is needed to prevent that from happening for select
      callables. This is that means.
    * `BeartypeStrategy.O1`, our default `O(1)` constant-time strategy
      type-checking a single randomly selected item of a container that
      you currently enjoy. Since this is the default, this strategy need
      *not* be explicitly configured. Of course, you're going to do that
      anyway, aren't you? `</sigh>`
    * `BeartypeStrategy.Ologn`, a new `O(lgn)` logarithmic strategy
      type-checking a randomly selected number of items `j` of a
      container `obj` such that `j = log(len(obj))`. This strategy is
      **currently unimplemented** (but will be implemented by a future
      beartype release).
    * `BeartypeStrategy.On`, a new `O(n)` linear strategy
      deterministically type-checking *all* items of a container. This
      strategy is **currently unimplemented** (but will be implemented
      by a future beartype release).
  * `beartype.BeartypeConf`, a simple dataclass encapsulating all flags,
    options, settings, and other metadata configuring the current
    decoration of the decorated callable or class. For efficiency, this
    dataclass internally self-caches itself (i.e.,
    `BeartypeConf(*args, **kwargs) is BeartypeConf(*args, **kwargs)`).
    The `__init__()` method of this dataclass currently accepts these
    optional parameters:
    * An `is_debug` boolean instance variable. When enabled, `@beartype`
      emits debugging information for the decorated callable – including
      the code for the wrapper function dynamically generated by
      `@beartype` that type-checks that callable.
    * A `strategy` instance variable whose value must be a
      `BeartypeStrategy` enumeration member. This is how you notify
      `@beartype` of which strategy to apply to each callable.
  * **Wrapper function debuggability.** Enabling the `is_debug`
    parameter to the `BeartypeConf.__init__` method significantly
    improves the debuggability of type-checking wrapper functions
    generated by `@beartype`. This configuration option is entirely
    thanks to @posita the positive Numenorean, who pined longingly for
    debuggable wrapper functions and now receives proportionately.
    Praise be to @posita! He makes bears better. Specifically, enabling
    this option enables developer-friendly logic like:
    * Pretty-printing to stdout (standard output) the definitions of
      those functions, including line number prefixes for readability.
    * Enabling those functions to be debugged. Thanks to a phenomenal
      pull request by the dynamic dual threat that is @posita **+**
      @TeamSpen210, `@beartype` now conditionally caches the bodies of
      type-checking wrapper functions with the standard (albeit poorly
      documented) `linecache` module. Thanks so much! Bear Clan 2022!!!
    * Suffixing the declarations of `@beartype`-specific hidden private
      "special" parameters passed to those functions with comments
      embedding their human-readable representations. Safely generating
      these comments consumes non-trivial wall clock at decoration time
      and is thus conditionally enabled for external callers requesting
      `@beartype` debugging. For example, note the `"# is"`-prefixed
      comments in the following signature of a `@beartype`-generated
      wrapper function for an asynchronous callable with signature
      `async def control_the_car(said_the: Union[str, int],
      biggest_greenest_bat: Union[str, float]) -> Union[str, float]:`

      ``` python
      (line 0001) async def control_the_car(
      (line 0002)     *args,
      (line 0003)     __beartype_func=__beartype_func, # is <function test_decor_async_coroutine.<locals>.control_the_car at 0x7>
      (line 0004)     __beartype_raise_exception=__beartype_raise_exception, # is <function raise_pep_call_exception at 0x7fa13d>
      (line 0005)     __beartype_object_140328307018000=__beartype_object_140328307018000, # is (<class 'int'>, <class 'str'>)
      (line 0006)     __beartype_object_140328306652816=__beartype_object_140328306652816, # is (<class 'float'>, <class 'str'>)
      (line 0007)     **kwargs
      (line 0008) ):
      ```
* **Decorator modality.** `@beartype` now supports two orthogonal modes
  of operation:
  * **Decoration mode** (i.e., the standard mode where `@beartype`
    directly decorates a callable *without* being passed parameters). In
    this mode, `@beartype` reverts to the default configuration of
    constant-time runtime type-checking and *no* debugging behaviour.
  * **Configuration mode** (i.e., the new mode where `@beartype` is
    called as a function passed a `BeartypeConf` object via the
    keyword-only `conf` parameter). In this mode, `@beartype`
    efficiently creates, caches, and returns a memoized decorator
    encapsulating the passed configuration: e.g.,

    ``` python
    from beartype import beartype, BeartypeConf, BeartypeStrategy

    @beartype(conf=BeartypeConf(strategy=BeartypeStrategy.On))
    def muh_func(list_checked_in_linear_time: list[int]) -> int:
        return len(list_checked_in_linear_time)
    ```
  * Specifically, this commit extricates our core
    `@beartype` decorator into a new private `beartype._decor._core`
    submodule in preparation for subsequently memoizing closures
    encapsulating that decorator returned by invocations of the form
    `@beartype.beartype(conf=BeartypeConf(...))`
* **Declarative instance validator.** `beartype` now publishes a new
  `beartype.vale.IsInstance[...]` validator enforcing instancing of one
  or more classes, generalizing **isinstanceable type hints** (i.e.,
  normal pure-Python or C-based classes that can be passed as the second
  parameter to the ``isinstance()`` builtin). Unlike standard
  isinstanceable type hints, `beartype.vale.IsInstance[...]` supports
  various set theoretic operators. Critically, this includes negation.
  Instance validators prefixed by the negation operator `~` match all
  objects that are *not* instances of the classes subscripting those
  validators. Wait. Wait just a hot minute there. Doesn't a
  `typing.Annotated` type hint necessarily match instances of the class
  subscripting that type hint? Yup. This means type hints of the form
  `typing.Annotated[{superclass}, ~IsInstance[{subclass}]]` match all
  instances of a superclass that are *not* also instances of a subclass.
  And... pretty sure we just invented type hint arithmetic right there.
  That sounded intellectual and thus boring. Yet, the disturbing fact that
  Python booleans are integers <sup>yup</sup> while Python strings are
  infinitely recursive sequences of strings <sup>yup</sup> means that
  type hint arithmetic can save your codebase from Guido's younger self.
  Consider this instance validator matching only non-boolean integers,
  which *cannot* be expressed with any isinstanceable type hint (e.g.,
  ``int``) or other combination of standard off-the-shelf type hints
  (e.g., unions): `Annotated[int, ~IsInstance[bool]]`. ← *bruh*
* **Functional API.** `beartype` now publishes a new public
  `beartype.abby` subpackage enabling users to type-check *anything*
  *anytime* against *any* PEP-compliant type hints, resolving feature
  request #79 kindly submitted by (*...wait for it*) typing Kung Fu
  master @qiujiangkun (Qiu Jiangkun). This subpackage is largely thanks
  to @qiujiangkun, whose impeccable code snippets drive our initial
  implementation. This subpackage provides these utility functions:
  * `beartype.abby.is_bearable()`, strictly returning a boolean
    signifying whether the passed arbitrary object satisfies the passed
    type hint or not (e.g., `is_bearable(['the', 'centre', 'cannot',
    'hold;'], list[int]) is False`).
  * `beartype.abby.die_if_unbearable()`, raising the new
    `beartype.roar.BeartypeAbbyHintViolation` exception when the passed
    arbitrary object violates the passed type hint.
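
The self-caching configuration object and the two decorator modes described above can be sketched as follows. Every name here (`Strategy`, `Conf`, `checked`, `_make_decorator`) is a hypothetical analogue invented for illustration; the sketch shows one way to memoize configurations and decorators, not how `beartype` does it internally.

```python
import enum
import functools


class Strategy(enum.Enum):
    """Hypothetical analogue of BeartypeStrategy."""
    O0 = enum.auto()  # no checking: identity decorator
    O1 = enum.auto()  # default strategy


class Conf:
    """Hypothetical analogue of BeartypeConf that caches its own instances."""
    _cache = {}

    def __new__(cls, strategy=Strategy.O1, is_debug=False):
        key = (strategy, is_debug)
        if key not in cls._cache:
            self = super().__new__(cls)
            self.strategy = strategy
            self.is_debug = is_debug
            cls._cache[key] = self
        return cls._cache[key]


@functools.lru_cache(maxsize=None)
def _make_decorator(conf):
    """Create (once per configuration) the decorator for that configuration."""
    def decorator(func):
        if conf.strategy is Strategy.O0:
            return func  # reduce to the identity decorator
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # A real checker would validate arguments and the return value here.
            return func(*args, **kwargs)
        return wrapper
    return decorator


def checked(func=None, *, conf=Conf()):
    """Decoration mode if passed a callable, configuration mode otherwise."""
    if func is None:
        return _make_decorator(conf)
    return _make_decorator(conf)(func)


def double(x):
    return 2 * x


plain = checked(double)                                      # decoration mode
skipped = checked(conf=Conf(strategy=Strategy.O0))(double)   # configuration mode
```

Because equal configurations are the same object, the decorator cache can key on the configuration directly without defining custom equality or hashing.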

## Features Improved

* **Exception message granularity,** including exceptions raised for:
  * **Disordered builtin decorators.** `@beartype` now raises
    instructive exceptions when decorating an uncallable descriptor
    created by a builtin decorator (i.e., `@property`, `@classmethod`,
    `@staticmethod`) due to the caller incorrectly ordering `@beartype`
    above rather than below that decorator, resolving issue #80 kindly
    submitted by typing academician @qiujiangkun (Qiu Jiangkun).
    Specifically, `@beartype` now raises human-readable exceptions
    suffixed by examples instructing callers to reverse decoration
    ordering.
  * **Beartype validators.** `@beartype` now appends a detailed
    pretty-printed diagnosis of how any object either satisfies or fails
    to satisfy any beartype validator to exception messages raised by
    high-level validators synthesized from lower-level validators (e.g.,
    via overloaded set theoretic operators like `|`, `&`, and `~`),
    resolving issue #72 kindly submitted by the unwreckable type-hinting
    guru Derek Wan (@dycw). This diagnostic trivializes validation
    failures in non-trivial use cases involving multiple nested
    conjunctions, disjunctions, and/or negations.
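
The decorator ordering problem called out above is easy to reproduce. In this sketch, `checked` is a hypothetical decorator (not `@beartype` itself) that refuses descriptors created by builtin decorators and tells the caller to reverse the ordering.

```python
import functools


def checked(func):
    """Hypothetical decorator that rejects builtin descriptor objects."""
    if isinstance(func, (property, classmethod, staticmethod)):
        kind = type(func).__name__
        raise TypeError(
            f"{kind} object is not a plain function; "
            f"apply @checked below @{kind}, not above it."
        )

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


class Good:
    @classmethod
    @checked  # correct: wraps the plain function before @classmethod does
    def make(cls):
        return cls()


try:
    class Bad:
        @checked  # wrong: receives the classmethod descriptor
        @classmethod
        def make(cls):
            return cls()
    error = None
except TypeError as exc:
    error = str(exc)
```

Decorators apply bottom-up, so whichever decorator sits lowest sees the plain function; anything above `@classmethod` sees only the descriptor.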

## Features Optimized

* **`@beartype` call-time performance.** `@beartype` now generates
  faster type-checking wrapper functions with a vast and undocumented
  arsenal of absolutely "legal" weaponry, including:
  * **`typing.{Generic,Protocol}` deduplication.** `@beartype` now
    microoptimizes away redundant `isinstance()` checks in wrapper
    functions checking `@beartype`-decorated callables annotated by
    **PEP 484-compliant subgenerics or PEP 544-compliant subprotocols**
    (i.e., user-defined classes subclassing user-defined classes
    subclassing `typing.{Generic, Protocol}`), resolving issue #76
    kindly submitted by @posita the positive numerics QA guru and
    restoring the third-party `numerary` package to its glory. Our
    generics workflow has been refactored from the ground-up to stop
    behaving insane. `@beartype` now performs an inner breadth-first
    search (BFS) across generic pseudo-superclasses in its existing
    outer BFS that generates type-checking code. When you're nesting a
    BFS-in-a-BFS, your code went full-send. There's no going back from
    that.
  * **Worst-case nested data structures.** `@beartype` now resolves a
    performance regression in type-checking wrapper functions passed
    worst-case nested data structures violating PEP-compliant type hints,
    resolving issue #91 kindly submitted by Cuban type-checking
    revolutionary @mvaled (Manuel Vázquez Acosta). Specifically, this
    commit safeguards our low-level `represent_object()` function
    stringifying objects embedded in exception messages describing
    type-checking violations against worst-case behaviour. A new unit
    test shieldwalls against further performance regressions. All our
    gratitude to @mvaled for unveiling the darkness in the bear's heart.
* **`@beartype` decoration-time performance.** The `@beartype` decorator
  has been restored to its prior speed, resolving performance
  regressions present throughout our [0.8.0, 0.10.0) release cycles.
  Significant decoration-time optimizations include:
  * **Code objects.** `@beartype` now directly accesses the code object
    underlying the possibly unwrapped callable being decorated via a
    temporary cache rather than indirectly accessing that code object by
    repeatedly (and expensively) unwrapping that callable, dramatically
    optimizing low-level utility functions operating on code objects.
  * **Exception messages.** `@beartype` now defers calling expensive
    exception handling-specific functions until an exception is raised,
    dramatically restoring our decoration-time performance to the
    pre-0.8.0 era – which isn't that great, honestly. But we'll take
    anything. Substantial optimizations remain, but we are dog-tired.
    Moreover, DQXIS:EofaEA (...that's some catchy name right there)
    ain't gonna play itself – *OR IS IT!?!* Cue creepy AI.
  * **Fixed lists.** `@beartype` now internally relaxes inapplicable
    safety measures previously imposed by our internal `FixedList`
    container type. Notably, this type previously detected erroneous
    attempts to extend the length of a fixed list by subversively
    assigning a slice of that fixed list to a container whose length
    differs from that of that slice. While advisable in theory,
    `@beartype` *never* actually sliced any fixed list -- let alone used
    such a slice as the left-hand side (LHS) of an assignment. Disabling
    this detection measurably improves the efficiency of fixed lists
    across the codebase -- which is, after all, the entire raison d'etre
    for fixed lists in the first place. `</shaking_my_head>`
  * **Parameter introspection.** `@beartype` now introspects callable
    signatures using a homegrown lightweight parameter parsing API.
    `@beartype` previously introspected signatures using the standard
    heavyweight `inspect` module, which proved... *inadvisable.*
    All references to that module have been removed from timing-critical
    code paths. All remaining references reside only in timing-agnostic
    code paths (e.g., raising human-readable exceptions for beartype
    validators defined as anonymous lambda functions).
* **`@beartype` importation-time performance.** The `beartype` package
  now avoids unconditionally importing optional first- and third-party
  subpackages, improving the efficiency of the initial ``from beartype
  import beartype`` statement in particular. `beartype` now
  intentionally defers these imports from global module scope to the
  local callable scope that requires them. A new functional test
  guarantees this to be the case.
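
Deferring an import from global module scope to the callable that needs it looks roughly like this. The sketch uses the standard `statistics` module as a stand-in for an optional dependency, and `mean_of` is a hypothetical helper.

```python
import sys


def mean_of(values):
    """Hypothetical helper needing its dependency only when actually called."""
    # Deferred import: the cost is paid on the first call rather than when the
    # enclosing module is first imported. Later calls hit the sys.modules cache.
    import statistics
    return statistics.mean(values)


result = mean_of([1, 2, 3])
loaded_after_call = "statistics" in sys.modules
```

Callers that never invoke `mean_of()` never pay for the import, which is the whole point of moving it out of module scope.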

## Features Deprecated

* **Badly named exception classes,** to be removed in a future `beartype` release.
  This includes:
  * `beartype.roar.BeartypeCallHintPepException`, deprecated by
    `beartype.roar.BeartypeCallHintViolation`.
  * `beartype.roar.BeartypeCallHintPepParamException`, deprecated by
    `beartype.roar.BeartypeCallHintParamViolation`.
  * `beartype.roar.BeartypeCallHintPepReturnException`, deprecated by
    `beartype.roar.BeartypeCallHintReturnViolation`.

## Documentation Revised

* The *Frequently Asked Questions (FAQ)* section of our front-facing
  `README.rst` documentation now sports a medley of new entries,
  including instructions on:
  * **Boto3 integration,** enabling end users to type-check runtime
    types dynamically fabricated by Boto3 (i.e., the official Amazon Web
    Services (AWS) Software Development Kit (SDK) for Python), resolving
    issue #68 kindly submitted by Paul Hutchings (@paulhutchings) – the
    supremely skilled sloth rockin' big shades and ever bigger
    enthusiasm for well-typed Python web apps. Relatedly, the @beartype
    organization now officially hosts [`bearboto3`, Boto3 @beartype
    bindings by (*wait for it*) @paulhutchings](https://github.com/beartype/bearboto3).
  * **Mock type type-checking,** resolving issue #92 kindly submitted by
    @Masoudas (Masoud Aghamohamadian-Sharbaf – wish I had an awesome
    name like that). Gratuitous shoutouts to @TeamSpen210 for the quick
    save with a ludicrous two-liner solving everything.

  [PEP 484]: https://www.python.org/dev/peps/pep-0484
  [PEP 557]: https://www.python.org/dev/peps/pep-0557
  [PEP 570]: https://www.python.org/dev/peps/pep-0570
  [PEP 585]: https://www.python.org/dev/peps/pep-0585
  [PEP 604]: https://www.python.org/dev/peps/pep-0604

The hype train is now boarding. All aboooooooard! (*Classless masterless masterclass!*)
@avicooper1

To appease everyone, I'm currently finalizing a configuration API for (wait-for-it) beartype 0.10.0 to be released in a few more days I swear this by the frayed beard of Odin. I'm contemplating a hypothetical configuration option is_check_var_assignment=True for the next-next beartype 0.11.0 release cycle enabling users to type-check field reassignment on a @dataclass-by-@dataclass basis: e.g.,

from beartype import beartype, BeartypeConf
from dataclasses import dataclass

# Don't ask why this looks so awkward. Don't.
@beartype(conf=BeartypeConf(is_check_var_assignment=True))
@dataclass
class Point:
    x: int = 0
    y: int = 0

I'm curious if this was ever added – I've looked through the documentation and couldn't seem to find anything, but maybe I missed it. Any update on this would be super appreciated!

@leycec
Member

leycec commented Jun 11, 2024

Hah, hah... ah. You're in luck! Actually, you're not in luck. This still doesn't exist. @leycec was bad and played video games all summer instead. You are now thinking:

"What part of this discussion makes me lucky?"

I was just getting to the lucky part. Rejoice, for these unfulfilling bullet points are better than nothing:

Basically, I failed to realize that everyone wanted me to fight Pydantic by implementing this. Instead, I devoted this summer to ~~playing games~~ going hard on @beartype's core mandate of rolling out deep O(1) type-checking for all of the remaining container types (e.g., collections, iterables, sets). Pretty much there, honestly. Couple more weeks before @beartype deeply type-checks the whole world. Would @leycec lie?

In hindsight, maybe I should've just fought Pydantic by implementing this.
