Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Overly broad isinstanceable type detection raises false negatives #220

Closed
patrick-kidger opened this issue Mar 13, 2023 · 5 comments
Closed

Comments

@patrick-kidger
Copy link
Contributor

patrick-kidger commented Mar 13, 2023

from beartype import beartype

class MetaFoo(type):
    def __instancecheck__(cls, other):
        return g()

class Foo(metaclass=MetaFoo):
    pass

@beartype
def f(x: Foo):
    pass

def g():
    return True

# beartype.roar.BeartypeDecorHintNonpepException: @beartyped __main__.f() parameter "x" <class '__main__.Foo'> uncheckable at runtime (i.e., not passable as second parameter to isinstance(), due to raising "name 'g' is not defined" from metaclass __instancecheck__() method).

It looks like beartype is calling isinstance on every type annotation eagerly, but it may not yet be valid to do so at decoration time.

@leycec
Copy link
Member

leycec commented Mar 14, 2023

Fascinating. You surmise correctly, as always:

It looks like beartype is calling isinstance on every type annotation eagerly...

Indeed, that is exactly what @beartype is doing. @beartype is doing this for a reasonable reason and should thus keep doing this — but more carefully to account for your very valid use case.

But why is @beartype doing this? Malicious types. The standard library is littered with all of these harmful classes that were intentionally designed to be incompatible with runtime type-checkers. They do this by (...waitforit) using a usually private metaclass whose __instancecheck__() method unconditionally raises an exception. We call these "non-isinstanceable types."

To raise human-readable exceptions, @beartype explicitly detects non-isinstanceable types at decoration time. And... we all see where this is going. For each type hint hint that is a class, @beartype detects whether that class is isinstanceable or not by calling isinstance(hint, object). If doing so either raises an exception or returns anything other than True, @beartype treats that class as non-instanceable by raising the exception you received above. </gulp>

Unfortunately, this isn't just an edge case. Since isinstanceable types are so common in the standard library, @beartype really does need to continue detecting them and preventing them from being used as type hints. Critically, this includes the private typing._SpecialForm metaclass that underlies most type hints:

class _SpecialForm(_Final, _NotIterable, _root=True):
    ...

    def __instancecheck__(self, obj):
        raise TypeError(f"{self} cannot be used with isinstance()")

    def __subclasscheck__(self, cls):
        raise TypeError(f"{self} cannot be used with issubclass()")

Unsurprisingly, typing._ProtocolMeta and typing._TypedDictMeta all behave similarly – albeit with differing exception messages. There's no meaningful commonality between those messages. So, substring matching is right out. But other grotesque things best unseen might still be in!

I See a QA Darkness

Yeah. Me three. But I'm pretty sure (but uncertain) that @beartype can have its cake and smear it all over its muzzle, too. @beartype can accommodate both isinstanceable type detection and your vital use case by cautiously reducing the scope of that detection. Specifically, we can:

  • Consider only types whose __instancecheck__() methods raise a TypeError exception as non-isinstanceable. The typing._SpecialForm metaclass does that; yours doesn't. All other non-isinstanceable types would simply be treated as isinstanceable. This has the disadvantage of being:
    • Overly fragile. The typing module often changes in outlandish ways. But... tests will catch any future breakage, which we can then repair. It's not all despair and black eye mascara. 👩‍🎤
    • Overly narrow. This is the larger concern. There are probably other third-party packages also defining non-isinstanceable types. If we do this, @beartype might lose the ability to detect them. It is a conundrum for the ages.
  • Consider all types whose __instancecheck__() methods raise an AttributeError exception as isinstanceable. Yours does that; the typing._SpecialForm metaclass doesn't do that. This has the disadvantage of being:
    • Overly narrow, yet again. There are probably more exception classes than just AttributeError that a metaclass in a partially-initialised module could raise. But... at least this specific issue would be resolved. Surely that means something!
  • Try to detect whether the module defining the metaclass of the type hint being examined is partially-initialised or not. I'm kinda loathe to get into importlib shenanigans, thanks to non-portable implementation details. The Dark Side leads that way. Still... an option we do not want to choose. Gah!

I'm partial to the first option. All non-isinstanceable types declared by the typing module raise TypeError exceptions. That's a rather specific exception class connoting a rather specific type-checking failure. Any third-party non-isinstanceable types raising other classes of exceptions are bad; their badness isn't @beartype's fault, so we can't be held responsible.

tl;dr

Let's reduce the scope of non-isinstanceable types to only those types whose metaclass __instancecheck__() methods raise a TypeError. Raise the fist of accuracy, @beartype! ✊

@leycec leycec changed the title False positive with metaclasses in partially-initialised modules. [Bug] Overly broad isinstanceable type detection raises false negatives Mar 14, 2023
@leycec
Copy link
Member

leycec commented Mar 14, 2023

This is definitely the worst of the cavalcade of issues you recently submitted, @patrick-kidger. Everything else is a feature request in disguise – but this is a genuine issue that's probably beating up somebody else's code, too.

Thankfully, resolving this is trivial. I think. I hope. Oh, Gods! Please let this be trivial! I'll tackle this headlong when vacuous free time permits. 😮‍💨

Thanks again for this tremendous outpouring of detailed issues. You rock the typing scene. @beartype is shuddering spasmodically under the massive weight of its oppressive failings, but we'll get there. Someday. Somehow. (Utopia is still in sight, everyone.)

@patrick-kidger
Copy link
Contributor Author

Hahaha, sorry-not-sorry for the huge number of issues. :)

IIUC, this is about detecting the poorly-behaving typing.Foo hints? What about doing something like:

module = getattr(hint, "__module__", "")
metamodule = getattr(type(hint), "__module__", "")
return len({module, metamodule}.intersection({"types", "typing"})) > 0

The metamodule check handles stuff like list[int], which is either a typing._GenericAlias or a types.GenericAlias depending on Python version.

@leycec
Copy link
Member

leycec commented Mar 16, 2023

Smart. Tragically, some metaclasses defined by the typing and types modules do actually support isinstance() checks in the standard way – if I recall correctly (which I usually don't). Even if that isn't the case now, that certainly could be the case under any future Python release later. @beartype is fragile enough already. We're the bone china of the typing world here. 🍵

...actually, bone china is surprisingly robust relative to non-bone china, which is the whole point of bone china; my metaphor doesn't even make sense, despite sounding cool.

Catching TypeError it is, then! \o/

leycec added a commit that referenced this issue Mar 16, 2023
This commit is the first in a commit chain narrowing our detection of
both **non-isinstanceable classes** (i.e., classes whose metaclasses
define PEP 3119-compliant `__instancecheck__()` dunder methods
unconditionally raising `TypeError` exceptions) and **non-issubclassable
classes** (i.e., classes whose metaclasses define PEP 3119-compliant
`__subclasscheck__()` dunder methods unconditionally raising `TypeError`
exceptions) to *only* accept `TypeError` exceptions as connoting
non-isinstanceability and non-issubclassability, en-route to resolving
issue #220 kindly submitted by *ex*traordinary Google X researcher
@patrick-kidger (Patrick Kidger). Previously, @beartype broadly treated
any class raising any exception whatsoever when passed as the second
parameter to `isinstance()` and `issubclass()` as non-isinstanceable and
non-issubclassable. Sadly, doing so erroneously raised false positives
for isinstanceable and issubclassable metaclasses that have yet to be
fully "initialized" by the early time the `@beartype` decorator performs
this detection at. Specifically, this commit narrows this detection in
*all* utility functions defined by our private
`beartype._util.cls.pep.utilpep3119` submodule. This *probably*
suffices, but let's implement a few additional tests in our next commit
to be absolutely certain. Break nothing! It's the @beartype way.
(*Rapid lapidary aphid!*)
leycec added a commit that referenced this issue Mar 17, 2023
This commit is the last in a commit chain narrowing our detection of
both **non-isinstanceable classes** (i.e., classes whose metaclasses
define PEP 3119-compliant `__instancecheck__()` dunder methods
unconditionally raising `TypeError` exceptions) and **non-issubclassable
classes** (i.e., classes whose metaclasses define PEP 3119-compliant
`__subclasscheck__()` dunder methods unconditionally raising `TypeError`
exceptions) to *only* accept `TypeError` exceptions as connoting
non-isinstanceability and non-issubclassability, resolving issue #220
kindly submitted by *ex*traordinary Google X researcher @patrick-kidger
(Patrick Kidger). Previously, @beartype broadly treated any class
raising any exception whatsoever when passed as the second parameter to
`isinstance()` and `issubclass()` as non-isinstanceable and
non-issubclassable. Sadly, doing so erroneously raised false positives
for isinstanceable and issubclassable metaclasses that have yet to be
fully "initialized" by the early time the `@beartype` decorator performs
this detection at. Specifically, this commit:

* Resolves significant issues in the prior commit's naive implementation
  of this resolution.
* Exhaustively unit tests this resolution. Until you test, you know
  nothing. Nothing, @beartype!

(*Flagrant flag rant!*)
@leycec
Copy link
Member

leycec commented Mar 17, 2023

Resolved by e643b91. Thanks so much for all the vigorous discussion, @patrick-kidger. Typing bros are the best bros. It is objectively true.

Now, let's see if we can knock out TypeGuard support at issue #221 as the last order of business for our upcoming beartype 0.13.0 release. The road to proper QA is fraught with hardship and woe.

@leycec leycec closed this as completed Mar 17, 2023
leycec added a commit that referenced this issue Apr 8, 2023
This minor release delivers pulse-quickening support for **pandera
(pandas) type hints,** **PEP 484,** **PEP 585**, **PEP 591**, **PEP
647**, **PEP 3119**, and **pseudo-callables.**

This minor release resolves **12 issues** and merges **2 pull
requests.** But first, a quiet word from our wondrous sponsors. They are
monocled QA wizards who serve justice while crushing bugs for humanity.
High fives, please!

## Beartype Sponsors

* [**ZeroGuard:** The Modern Threat Hunting
  Platform](https://zeroguard.com). *All the signals, All the time.*

Thunderous applause echoes through the cavernous confines of the Bear
Den. 👏 🐻‍❄️ 👏

And now... the moment we've waited for. A heinous display of plaintext
that assaults all five senses simultaneously.

## Compatibility Added

* **Pandera (pandas) type hints** (i.e., ad-hoc PEP-noncompliant type
  hints validating pandas `DataFrame` objects, produced by subscripting
  factories published by the `pandera.typing` subpackage and validated
  *only* by user-defined callables decorated by the ad-hoc
  PEP-noncompliant `@pandera.check_types` runtime type-checking
  decorator), resolving feature request #227 kindly submitted by
  @ulfaslakprecis (Ulf Aslak) the Big Boss Typer. @beartype now:
  * Transparently supports pandera's PEP-noncompliant
    `@pandera.check_types` decorator for deeply runtime type-checking
    arbitrary pandas objects.
  * *Always* performs a rudimentary `O(1)` `isinstance()`-based
    type-check for each Pandera type hint. Doing so substantially
    improves usability in common use cases, including:
    * Callables annotated by one or more pandera type hints that are
      correctly decorated by @beartype but incorrectly *not* decorated
      by the pandera-specific `@pandera.check_types` decorator.
    * (Data)classes annotated by one or more pandera type hints.
    * Pandera type hints passed as the second argument to
      statement-level @beartype type-checkers – including:
      * `beartype.door.is_bearable()`.
      * `beartype.door.die_if_unbearable()`.
  * Implements a non-trivial trie data structure to efficiently
    detect all type hints produced by subscriptable factories in the
    `pandera.typing` submodule. Let us pretend this never happened,
    @ulfaslakprecis.
* **PEP 484- and 585-compliant generator constraints.** This release
  relaxes prior constraints erroneously imposed by @beartype
  prohibiting both asynchronous and synchronous generator callables from
  being annotated as returning unsubscripted standard abstract base
  classes (ABCs) defined by the `collections.abc` module. Now, @beartype
  permits:
  * Asynchronous generator callables to be annotated as returning the
    unsubscripted `collections.abc.AsyncGenerator` type.
  * Synchronous generator callables to be annotated as returning the
    unsubscripted `collections.abc.Generator` type.
* **PEP 591** (i.e., `typing.Final[...]` type hints), partially
  resolving issue #223 kindly submitted by the acronym known only as
  @JWCS (Jude). @beartype now trivially reduces *all*
  `typing.Final[{hint}]` type hints to merely `{hint}` (e.g.,
  `typing.Final[int]` to `int`). In other words, @beartype no longer
  raises exceptions when confronted with final type hints and instead at
  least tries to do the right thing. This still isn't *quite* what
  everyone wants @beartype to do here; ideally, @beartype should also
  raise exceptions on detecting attempts to redefine instance and class
  variables annotated as `Final[...]`. Doing so is *definitely* feasible
  and exactly what @beartype should *eventually* do – but also
  non-trivial, because whatever @beartype *eventually* does needs to
  preserve compatibility with all implementations of the `@dataclass`
  decorator across all versions of Python now and forever. Cue that
  head-throbbing migraine. It's comin'! Oh, I can feel it!
* **PEP 647** (i.e., `typing.TypeGuard[...] type hints`), resolving
  feature request #221 kindly submitted by Google X researcher
  extraordinaire @patrick-kidger. @beartype now trivially reduces *all*
  `typing.TypeGuard[...]` type hints to the builtin `bool` type.

## Compatibility Improved

* **PEP 3119.** @beartype now detects both
  **non-isinstanceable classes** (i.e., classes whose metaclasses define
  PEP 3119-compliant `__instancecheck__()` dunder methods
  unconditionally raising `TypeError` exceptions) and
  **non-issubclassable classes** (i.e., classes whose metaclasses define
  PEP 3119-compliant `__subclasscheck__()` dunder methods
  unconditionally raising `TypeError` exceptions) more narrowly for
  safety, resolving issue #220 kindly submitted by *ex*traordinary
  Google X researcher @patrick-kidger (Patrick Kidger). Notably,
  @beartype now *only* accepts `TypeError` exceptions as connoting
  non-isinstanceability and non-issubclassability. Previously, @beartype
  broadly treated any class raising any exception whatsoever when passed
  as the second parameter to `isinstance()` and `issubclass()` as
  non-isinstanceable and non-issubclassable. Sadly, doing so erroneously
  raises false positives for isinstanceable and issubclassable
  metaclasses that have yet to be fully "initialized" at the early time
  the `@beartype` decorator performs this detection.

## Features Added

* **Pseudo-callable monkey-patching support.** `@beartype` now supports
  **pseudo-callables** (i.e., otherwise uncallable objects masquerading
  as callable by defining the `__call__()` dunder method), resolving
  feature request #211 kindly submitted by Google X typing guru
  @patrick-kidger (Patrick Kidger). When passed a pseudo-callable whose
  `__call__()` method is annotated by one or more type hints,
  `@beartype` runtime type-checks that method in the standard way.

## Documentation Revised

* **Literally everything,** also known as the release that migrated
  `README.rst` -> [Read the Docs
  (RtD)](https://beartype.readthedocs.io), resolving both issue #203
  kindly submitted by @LittleBigGene (AKA the dynamo of the cell) and
  ancient issue #8 kindly submitted by @felix-hilden (AKA the Finnish
  computer vision art genius that really made all of this possible).
  Readable documentation slowly emerges from the primordial soup of
  @beartype's shameless past for which we cannot be blamed. @leycec was
  young and "spirited" back then. Specifically, this release:
  * Coerces our prior monolithic slab of unreadable `README.rst`
    documentation into a website graciously hosted by Read the Docs
    (RtD) subdividing that prior documentation into well-structured
    pages, resolving issue #203 kindly submitted by @LittleBigGene (AKA
    the dynamo of the cell).
  * Documents *most* previously undocumented public APIs in the
    @beartype codebase. Although a handful of public APIs remain
    undocumented (notably, the `beartype.peps` submodule), these
    undocumented APIs are assumed to either be sufficiently unpopular or
    non-useful to warrant investing additional scarce resources here.
  * Updates our installation instructions to note @beartype's recent
    availability as official packages in the official package
    repositories of various Linux distributions. Truly, this can only be
    the final mark of pride. These include:
    * Gentoo Linux's Portage tree.
    * Arch Linux's Arch User Repository (AUR).
  * Improves the Python code sample embedded in the ["Are We on the
    Worst Timeline?" subsection of our **Beartype Errors**
    chapter](https://beartype.readthedocs.io/en/latest/api_roar/#are-we-on-the-worst-timeline).
    Thanks to @JWCS for their related pull request (PR) #210, which
    strongly inspired this bald-faced improvement to the usability of
    our `beartype.typing` API.
  * Circumvents multiple long-standing upstream issues in the PyData
    Sphinx theme regarding empty left sidebars via the requisite
    `_templates/sidebar-nav-bs.html` template hack shamelessly
    copy-pasted into literally *every* project requiring this theme.
    This includes @beartype, because why not spew boilerplate that
    nobody understands everywhere? Sadly, doing so requires pinning to a
    maximum obsolete version of this theme that will surely die soon.
    And this is why I facepalm. These issues include:
    * pydata/pydata-sphinx-theme#90.
    * pydata/pydata-sphinx-theme#221.
    * pydata/pydata-sphinx-theme#1181.
  * Truncates our `README.rst` documentation to a placeholder stub that
    just directs everyone to RtD instead.
  * Improves `linecache` integration commentary. Specifically, a pull
    request by @faangbait (AKA the little-known third member of Daft
    Punk) improves internal commentary in our private
    `beartype._util.func.utilfuncmake.make_func()` factory function
    responsible for dynamically synthesizing new in-memory functions
    on-the-fly. Our suspicious usage of `None` as the second item of
    tuples added as values to the standard `linecache.cache` global
    dictionary has now been documented. Thanks so much for this
    stupendous contribution, @faangbait!

## Tests Improved

* **Mypy integration.** This release improves our `test_pep561_mypy()`
  integration test to intentionally ignore unhelpful non-fatal warnings
  improperly emitted by mypy (which encourage usage of
  `typing_extensions`, oddly enough).
* **Sphinx integration.** This release resolves multiple intersecting
  issues involving integration testing of Sphinx + @beartype, including:
  * `test_beartype_in_sphinx()` h0tfix is h0t. This release generalizes
    our test-specific `test_beartype_in_sphinx()` integration test to
    support arbitrary versions of Sphinx, resolving issue #209 kindly
    submitted by @danigm the sun-loving Málaga resident who frolics in
    the sea that Canadians everywhere are openly jealous of.
    Specifically, this release fundamentally refactors this integration
    test to fork a new Python interpreter as a subprocess of the current
    `pytest` process running the `sphinx-build` command.
  * A Python 3.7-specific failure in our continuous integration (CI)
    workflow caused by Sphinx attempting to call deprecated
    functionality of the third-party `pkg_resources` package. This
    release simply avoids installing Sphinx entirely under Python 3.7;
    although admittedly crude, it's unclear how else @beartype could
    possibly resolve this. Since Python 3.7 has almost hit its official
    End-Of-Life (EOL) and thus increasingly poses a security concern,
    this is hardly the worst resolution ever. Really! Believe what we're
    saying.

Break nothing! It's the @beartype way. This is why @leycec cries like a
mewling cat with no milk. (*Thrilling chills spill towards an untoward ontology!*)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants