Code generation BFS commentary.
This commit internally comments on an essential refactoring of the
breadth-first search (BFS) underlying our dynamic code generation of
wrapper functions, which is currently implemented in a rather ad-hoc
manner, preventing us from calculating critical tree properties (e.g.,
non-leaf node height) required to subsequently implement sane dictionary
and set type-checking support. (*Randomized dominions of itemized minions!*)
leycec committed Feb 10, 2021
1 parent 647b4c7 commit d17fe6c
Showing 5 changed files with 143 additions and 16 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/python_release.yml
@@ -30,6 +30,10 @@ on:
# error: failed to push some refs to 'git@github.com:beartype/beartype.git'
#See also: https://github.com/actions/create-release/issues/13
# branches: [ master ]
#FIXME: *URGH!* "create-release" is now officially dead, so we need to
#refactor this as quickly as feasible to leverage an alternate action. See
#also this useful @Heliotrop3 issue:
# https://github.com/beartype/beartype/issues/22

# Sequence of glob expressions matched against "refs/tags" pushed to the
# branches above.
5 changes: 3 additions & 2 deletions .github/workflows/python_test.yml
@@ -57,8 +57,9 @@ on:
branches:
- main
pull_request:
#FIXME: Is this actually right?
branches:
-  - main
+  - '**'

# ....................{ MAIN }....................
jobs:
@@ -120,7 +121,7 @@ jobs:

# ..................{ SETTINGS }..................
# Arbitrary human-readable description.
-name: "Python ${{ matrix.python-version }}"
+name: "[${{ matrix.platform }}] Python ${{ matrix.python-version }}"

# Name of the current Docker image to run tests under.
runs-on: ${{ matrix.platform }}
23 changes: 12 additions & 11 deletions README.rst
@@ -436,7 +436,7 @@ Beartype makes type-checking painless, portable, and possibly fun. Just:
Toy Example
-----------

-Let's see what that looks like for a "Hello, Jungle!" toy example. Just:
+Let's see what that looks like for a ``"Hello, Jungle!"`` toy example. Just:

#. Import the ``@beartype.beartype`` decorator:

@@ -542,18 +542,19 @@ list enumerating the various ways this invalid parameter fails to satisfy its
type hint, including the types and indices of the first container item failing
to satisfy the nested ``Sequence[int]`` hint.

-See the `"Decoration" section <Decoration_>`__ for actual code dynamically
-generated by beartype for real-world use cases resembling those above. Fun!
+See a `subsequent section <Implementation_>`__ for actual code dynamically
+generated by ``beartype`` for real-world use cases resembling those above. Fun!

Would You Like to Know More?
----------------------------

-If you know `type hints <PEP 484_>`__, you know beartype. Since beartype is
-driven entirely by `tool-agnostic community standards <PEP 0_>`__, beartype's
-public API is simply the summation of those standards. As the end user, all you
-need to know is that decorated callables magically begin raising human-readable
-exceptions when you pass parameter or return values that violate the
-PEP-compliant type hints annotating those parameter or return values.
+If you know `type hints <PEP 484_>`__, you know ``beartype``. Since
+``beartype`` is driven entirely by `tool-agnostic community standards <PEP
+0_>`__, the public API for ``beartype`` is just the summation of those
+standards. As the user, all you need to know is that decorated callables
+magically begin raising human-readable exceptions when you pass parameters or
+return values that violate the PEP-compliant type hints annotating those
+parameters or return values.

If you don't know `type hints <PEP 484_>`__, this is your moment to go deep on
the hardest hammer in Python's SQA_ toolbox. Here are a few friendly primers to
@@ -1915,8 +1916,8 @@ Let's call that function with good types:
Behold! The terrifying power of the ``typing.Optional`` type hint, resplendent
in its highly over-optimized cache utilization.

-Decoration
-==========
+Implementation
+==============

Let's take a deep dive into the deep end of runtime type checking – the
``beartype`` way. In this subsection, we show code generated by the
116 changes: 114 additions & 2 deletions beartype/_decor/_code/_pep/_pephint.py
@@ -116,6 +116,119 @@
#* For such use, if the decorated callable accepts a "hint" parameter,
# refactor that callable to use @callable_cached_hintable instead.

#FIXME: *WOOPS.* The "LRUDuffleCacheStrong" class designed below assumes that
#calculating the semantic height of a type hint (e.g., 3 for the complex hint
#Optional[int, dict[Union[bool, tuple[int, ...], Sequence[set]], list[str]]]
#is largely trivial. It isn't -- at all. Computing that without a context-free
#recursion-esque algorithm of some sort is literally infeasible. We absolutely
#*MUST* get that height right, since we'll be exponentiating that height to
#estimate space consumption of arbitrary objects. Off-by-one errors are
#unacceptable when the difference between a height of 2 and a height of 3 means
#tens of thousands in additional estimated space consumption.
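The height in question can be computed with exactly the sort of recursion-esque pass worried about above, sketched here as a hypothetical helper (*not* beartype code) over ``typing.get_args()``:

```python
from typing import Dict, List, Optional, get_args

def hint_height(hint) -> int:
    '''
    1-based height of the tree of child hints rooted at this hint, where
    leaf hints have a height of 1. Hypothetical helper, not beartype's API.
    '''
    child_hints = get_args(hint)
    if not child_hints:
        # Leaf hint (e.g., a simple class like "int").
        return 1
    # Non-leaf hint: one more than the height of its tallest child.
    return 1 + max(hint_height(child_hint) for child_hint in child_hints)
```

For example, ``hint_height(Dict[str, List[int]])`` evaluates to 3, since the ``int`` leaf sits two levels below the root ``Dict``.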
#
#So. How do we do this, then? *SIMPLE.* Okay, not simple -- but absolutely
#beneficial for a medley of unrelated pragmatic reasons and thus something we
#need to pursue anyway regardless of the above concerns.
#
#The solution is to make the breadth-first search (BFS) internally performed
#by the pep_code_check_hint() function below more recursion-esque. We will
#*NOT* be refactoring that function to leverage:
#
#* Recursion rather than iteration for all of the obvious reasons.
#* A stack-like depth-first search (DFS) approach. While implementing a DFS
# with iteration can technically be done, doing so imposes non-trivial
# technical constraints because you then need to store interim results (which
# in a proper recursive function would simply be local variables) as you
# iteratively complete each non-leaf node. That's horrifying. So, we'll be
# preserving our breadth-first search (BFS) approach. The reason why a BFS is
# often avoided in the real world is space concerns: a BFS consumes
# significantly more space than a comparable DFS, because:
# * The BFS constructs the entire tree before operating on that tree.
# * The DFS only constructs a vertical slice of the entire tree before
# operating only on that slice.
# In our case, however, space consumption of a BFS versus DFS is irrelevant.
# Why? Because type hints *CANNOT* be deeply nested without raising recursion
# limit errors from deep within the CPython interpreter, as we well know.
# Ergo, a BFS will only consume slightly more temporary space than a DFS. This
# means a "FixedList" of the same size trivially supports both.
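The BFS traversal being preserved is, at its core, just a queue-driven loop. A toy stand-in (using a ``deque`` where beartype uses a ``FixedList``; hypothetical names throughout) might read:

```python
from collections import deque
from typing import Dict, List, get_args

def iter_hints_bfs(root_hint):
    '''
    Yield the passed root hint and every child hint transitively
    subscripting that hint in breadth-first order. Toy sketch only.
    '''
    hints_queue = deque((root_hint,))
    while hints_queue:
        hint = hints_queue.popleft()
        yield hint
        # Enqueue all child hints subscripting this hint, if any.
        hints_queue.extend(get_args(hint))
```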
#
#First, let's recap what we're currently doing:
#
#* In a single "while ...:" loop, we simultaneously construct the BFS tree
# (stored in a "FixedList" of tuples) *AND* output results from that tree as
# we are dynamically constructing it.
#
#The "simultaneously" is the problem there. We're disappointed we didn't
#realize it sooner, but our attempt to do *EVERYTHING* in a single pass is why
#we had such extraordinary difficulties correctly situating code generated by
#child type hints into the code generated for parent type hints. We
#circumvented the issue by repeatedly performing a global search-and-replace on
#the code being generated, which is horrifyingly inefficient *AND* error-prone.
#We should have known then that something was wrong. Sadly, we proceeded.
#
#Fortunately, this is the perfect moment to correct our wrongs -- before we
#proceed any deeper into a harmful path dependency. How? By splitting our
#current monolithic BFS algorithm into two disparate BFS phases -- each
#mirroring the behaviour of a recursive algorithm:
#
#1. In the first phase, a "while ...:" loop constructs the BFS tree by
# beginning at the root hint, iteratively visiting all child hints, and
# inserting metadata describing those hints into our "hints_meta" list as we
# currently do. That's it. That's all. But that's enough. This construction
# then gives us efficient random access over the entire type hinting
# landscape, which then permits us to implement the next phase -- which does
# the bulk of the work. To do so, we'll add additional metadata to our
# current "hint_meta" tuple: e.g.,
# * "_HINT_META_INDEX_CHILD_FIRST_INDEX", the 0-based index into the
# "hints_meta" FixedList of the first child hint of the current hint if any
# *OR* "None" otherwise. Since this is a BFS, that child hint could appear
# at any 0-based index following the current hint; finding that child hint
# during the second phase thus requires persisting the index of that hint.
# Note that the corresponding index of the last child hint of the current
# hint need *NOT* be stored, as adding the length of the argument list of
# the current hint to the index of the first child hint trivially gives the
# index of the last child hint.
# * "_HINT_META_INDEX_CODE", the Python code snippet type-checking the
# current hint to be generated by the second phase.
# * "_HINT_META_INDEX_HEIGHT", the 1-based height of the current hint in this
# BFS tree. Leaf nodes have a height of 1. All non-leaf nodes have a height
# greater than 1. This height *CANNOT* be defined during the first phase
# but *MUST* instead be deferred to the second phase.
# * ...probably loads more stuff, but that's fine.
#* In the second phase, another "while ...:" loop generates a Python code
# snippet type-checking the root hint and all child hints visitable from that
# hint in full by beginning *AT THE LAST CHILD HINT ADDED TO THE* "hints_meta"
# FixedList, generating code type-checking that hint, iteratively visiting all
# hints *IN THE REVERSE DIRECTION BACK UP THE TREE*, and so on.
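A compressed sketch of those two phases, with invented names and placeholder ``isinstance()`` snippets standing in for beartype's real generated code (this is illustrative pseudo-checking, not a semantically correct type-checker):

```python
from typing import Dict, List, get_args, get_origin

def generate_check_code(root_hint) -> str:
    '''
    Two-phase BFS sketch: phase one builds the tree as a flat list of
    [hint, first_child_index, child_count, code] records; phase two walks
    that list in reverse, so every child's snippet already exists when its
    parent's snippet is assembled. Hypothetical, not beartype's generator.
    '''
    # Phase one: construct the BFS tree, recording each hint's children by
    # index to enable efficient random access during phase two.
    hints_meta = [[root_hint, None, 0, None]]
    hint_index = 0
    while hint_index < len(hints_meta):
        hint_meta = hints_meta[hint_index]
        child_hints = get_args(hint_meta[0])
        if child_hints:
            hint_meta[1] = len(hints_meta)   # index of the first child hint
            hint_meta[2] = len(child_hints)  # number of child hints
            hints_meta.extend(
                [child_hint, None, 0, None] for child_hint in child_hints)
        hint_index += 1

    # Phase two: generate code from the last hint added back up to the root.
    for hint_meta in reversed(hints_meta):
        hint, child_first, child_count, _ = hint_meta
        if child_first is None:
            # Leaf hint: emit a placeholder shallow type check.
            hint_meta[3] = f'isinstance(obj, {hint.__name__})'
        else:
            # Non-leaf hint: splice in already-generated child snippets.
            child_code = ' and '.join(
                hints_meta[child_index][3]
                for child_index in range(
                    child_first, child_first + child_count))
            hint_meta[3] = (
                f'(isinstance(obj, {get_origin(hint).__name__}) '
                f'and {child_code})')
    return hints_meta[0][3]
```

The reverse iteration of phase two mirrors the unwinding of a post-order recursion without actually recursing, which is the whole point.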
#
#That's insanely swag. It shames us that we only thought of it now. *sigh*
#FIXME: Now that we actually have an audience (yay!), we *REALLY* need to avoid
#breaking anything. But implementing the above refactoring would absolutely
#break everything for an indeterminate period of time. So how do we do this?
#*SIMPLE*. We leave this submodule as is *UNTIL* our refactoring passes tests.
#In the meanwhile, we safely isolate our refactoring work to the following new
#submodules:
#* "_pephinttree.py", implementing the first phase detailed above.
#* "_pephintgene.py", implementing the second phase detailed above.
#
#To test, we locally change a simple "import" statement in the parent
#"_pepcode" submodule and then revert that import before committing. Rinse
#until tests pass, which will presumably take several weeks at least.
#FIXME: Note that there exists a significant optimization that we *ABSOLUTELY*
#should add to these new modules. Currently, the "hints_meta" data structure is
#represented as a FixedList of size j, each item of which is a k-length tuple.
#If you briefly consider it, however, that structure could equivalently be
#represented as a FixedList of size j * k, where we simply store the items
#previously stored in each k-length tuple directly in that FixedList itself.
#
#Iterating forward and backward by single hints over that FixedList is still
#trivial. Rather than incrementing or decrementing an index by 1, we instead
#increment or decrement an index by k.
#
#The resulting structure is guaranteed to be considerably more space-efficient,
#due to being both contiguous in memory and requiring only a single object
#(and thus object dictionary) to maintain. Cue painless forehead slap.
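The flattened layout reduces per-record access to index arithmetic with a stride of k. A minimal sketch, with an invented stride of 3 and hypothetical names:

```python
# Hypothetical stride: number of metadata slots stored per hint record.
HINT_META_SIZE = 3

def hint_meta_get(hints_flat, hint_index, slot_index):
    '''
    Read one metadata slot of the hint record at "hint_index" from a
    flattened j * k FixedList-style sequence. Sketch, not beartype code.
    '''
    return hints_flat[hint_index * HINT_META_SIZE + slot_index]

def hint_cursor_next(cursor):
    '''Advance a flat cursor by one whole hint record (i.e., by k items).'''
    return cursor + HINT_META_SIZE
```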

#FIXME: Add support for "PEP 586 -- Literal Types". Sadly, doing so will be
#surprisingly non-trivial.
#
@@ -756,7 +869,7 @@
# * "_EAS_MAX", the maximum capacity of this LRU cache in EAS units. Note that
# this capacity should ideally default to something that *DYNAMICALLY SCALES
# WITH THE RAM OF THE LOCAL MACHINE.* Ergo, "_bigo_size_max" should be
# significantly larger in a standard desktop system with 32GB RAM than it is
# on a Raspberry Pi 2 with 1GB RAM: specifically, 32 times larger.
# * "_bigo_size_cur", the current capacity of this LRU cache in EAS units.
# * "_FIXED_LIST_SIZE", the number of additional supplementary objects to
@@ -877,7 +990,6 @@
# value_height: 'Optional[int]' = 1,
# ) -> None:


#FIXME: Here's a reasonably clever idea for perfect O(1) tuple type-checking
#guaranteed to check all n items of an arbitrary tuple in exactly n calls, with
#each subsequent call performing *NO* type-checking by reducing to a noop. How?
11 changes: 10 additions & 1 deletion beartype/_util/func/utilfunc.py
@@ -13,6 +13,15 @@
This private submodule is *not* intended for importation by downstream callers.
'''

# ....................{ TODO }....................
#FIXME: Generalize get_callable_filename_or_placeholder() to support PyPy.
#Although PyPy clearly is internally reducing pure-Python callables into
#C-based callables, it *SHOULD* also be preserving their code objects for
#subsequent inspection. This means that the "if isinstance(func,
#CallableCTypes):" test is insufficient. We should probably instead be more
#generally testing whether the passed callable has a "__code__" attribute
#defined. Oh, PyPy. This is why good things are hard to acquire.
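The more general test floated above might be sketched as follows (with an invented placeholder string; not beartype's actual implementation):

```python
def get_filename_or_placeholder(func) -> str:
    '''
    Return the filename of the module defining this callable if that
    callable exposes a pure-Python code object *OR* a placeholder
    otherwise. Hypothetical sketch of the "__code__"-based test.
    '''
    if hasattr(func, '__code__'):
        # Pure-Python callable (including PyPy-wrapped callables that
        # preserve their code objects): defer to the code object.
        return func.__code__.co_filename
    # Genuinely C-based callable exposing no code object.
    return '<C-based>'
```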

# ....................{ IMPORTS }....................
from collections.abc import Callable
from sys import modules
@@ -64,7 +73,7 @@ def get_callable_filename_or_placeholder(func: Callable) -> str:
# Fully-qualified name of the module declaring this callable if this
# callable was physically declared by an on-disk module *OR* "None" otherwise.
func_module_name = func.__module__

# If this callable was physically declared by an on-disk module, defer to
# the absolute filename of that module.
#
