Skip to content

Commit

Permalink
[ModuleCollectorMixin] adds option to find & call functions to collec…
Browse files Browse the repository at this point in the history
…t_top_level_attributes_in_module (Recidiviz/recidiviz-data#28837)

## Description of the change

adds the ability to collect attributes returned by a callable instead of
requiring all attributes must be defined in the global scope. the
callable must have no arguments and a return type that is either a list
of the specified attribute (or its subclasses) or an attribute itself
(or its subclasses).

in order to do so, this PR adds `explode_callables` and
`callable_prefix` as kwargs to
`ModuleCollectorMixin.collect_top_level_attributes_in_module`. the
approach in this PR:
(1) hopes that functions are accurately typed (ty mypy) but still
backstops against the case in which they are not by validating the
actual return vars of the function
  (2) relies on `typing.get_args` and  `typing.get_origin`

all existing invocations of `collect_top_level_attributes_in_module`
should remain unchanged.

i agree the naming of `explode_callables` feels a little ridiculous, but
is taken from `pandas.explode` which felt relevantly similar and is also
kinda fun ¯\\_(ツ)_/¯

## Type of change

> All pull requests must have at least one of the following labels
applied (otherwise the PR will fail):

| Label | Description |
|-----------------------------
|-----------------------------------------------------------------------------------------------------------
|
| Type: Bug | non-breaking change that fixes an issue |
| Type: Feature | non-breaking change that adds functionality |
| Type: Breaking Change | fix or feature that would cause existing
functionality to not work as expected |
| Type: Non-breaking refactor | change addresses some tech debt item or
prepares for a later change, but does not change functionality |
| Type: Configuration Change | adjusts configuration to achieve some end
related to functionality, development, performance, or security |
| Type: Dependency Upgrade | upgrades a project dependency - these
changes are not included in release notes |

## Related issues

Useful for for Recidiviz/recidiviz-data#28613
Prework for Recidiviz/recidiviz-data#28815, Recidiviz/recidiviz-data#28854

## Checklists

### Development

**This box MUST be checked by the submitter prior to merging**:
- [x] **Double- and triple-checked that there is no Personally
Identifiable Information (PII) being mistakenly added in this pull
request**

These boxes should be checked by the submitter prior to merging:
- [x] Tests have been written to cover the code changed/added as part of
this pull request

### Code review

These boxes should be checked by reviewers prior to merging:

- [x] This pull request has a descriptive title and information useful
to a reviewer
- [x] Potential security implications or infrastructural changes have
been considered, if relevant

GitOrigin-RevId: c4dde0a85d35dec1add27efd7536f5965d3e317f
  • Loading branch information
ethan-oro authored and Helper Bot committed May 1, 2024
1 parent c952cfe commit 4f9adb1
Show file tree
Hide file tree
Showing 11 changed files with 587 additions and 26 deletions.
11 changes: 11 additions & 0 deletions recidiviz/big_query/big_query_view_collector.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@
from recidiviz.common.module_collector_mixin import ModuleCollectorMixin

VIEW_BUILDER_EXPECTED_NAME = "VIEW_BUILDER"
VIEW_BUILDER_CALLABLE_NAME_REGEX = "collect_.*view_builder"


class BigQueryViewCollector(Generic[BigQueryViewBuilderType], ModuleCollectorMixin):
Expand All @@ -47,6 +48,8 @@ def collect_view_builders_in_module(
builder_type: Type[BigQueryViewBuilderType],
view_dir_module: ModuleType,
recurse: bool = False,
collect_builders_from_callables: bool = False,
builder_callable_name_regex: str = VIEW_BUILDER_CALLABLE_NAME_REGEX,
view_file_prefix_filter: Optional[str] = None,
view_builder_attribute_name_regex: str = VIEW_BUILDER_EXPECTED_NAME,
validate_builder_fn: Optional[
Expand All @@ -62,6 +65,12 @@ def collect_view_builders_in_module(
view_dir_module: The module for the directory that contains all the view
definition files.
recurse: If true, look inside submodules of view_dir_module for view files.
collect_builders_from_callables: If True, will also find callables that match
the |builder_callable_regex|, collect all return objects, validate them as
valid |attribute_type|s and include them in the returned list.
builder_callable_regex: If |collect_builders_from_callables| is set, this
regex will be used to identify module attributes to invoke and include
the resulting objects in the returned list.
view_file_prefix_filter: When set, collection filters out any files whose
name does not have this prefix.
validate_builder_fn: When set, this function will be called with each
Expand All @@ -80,6 +89,8 @@ def collect_view_builders_in_module(
attribute_type=builder_type,
dir_module=view_dir_module,
recurse=recurse,
collect_from_callables=collect_builders_from_callables,
callable_name_regex=builder_callable_name_regex,
file_prefix_filter=view_file_prefix_filter,
attribute_name_regex=view_builder_attribute_name_regex,
validate_fn=validate_builder_fn,
Expand Down
141 changes: 115 additions & 26 deletions recidiviz/common/module_collector_mixin.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,8 +20,8 @@
import os
import pkgutil
import re
from types import ModuleType
from typing import Callable, List, Optional, Set, Type
from types import FunctionType, ModuleType
from typing import Any, Callable, List, Optional, Set, Type

from recidiviz.utils.types import T, assert_type

Expand Down Expand Up @@ -73,6 +73,72 @@ def is_module_package(cls, module: ModuleType) -> bool:
file = assert_type(module.__file__, str)
return file.endswith("__init__.py")

@classmethod
def get_module_attribute_as_typed_list(
cls,
module: ModuleType,
attribute_name: str,
attribute: Any,
expected_attribute_type: Type[T],
) -> List[T]:
"""Given an |attribute| and an |attribute_type|, this function will verify that
the |attribute| is either of |attribute_type| or a list of |attribute_type|
"""
if isinstance(attribute, list):
if not attribute:
raise ValueError(
f"Unexpected empty list for attribute [{attribute_name}] "
f"in file [{module.__file__}]."
)
if not all(
isinstance(attribute_elem, expected_attribute_type)
for attribute_elem in attribute
):
raise ValueError(
f"An attribute in List [{attribute_name}] in file [{module.__file__}] "
f"did not match expected type [{expected_attribute_type.__name__}]."
)
return attribute

if not isinstance(attribute, expected_attribute_type):
raise ValueError(
f"Unexpected type [{attribute.__class__.__name__}] for attribute "
f"[{attribute_name}] in file [{module.__file__}]. Expected "
f"type [{expected_attribute_type.__name__}]."
)

return [attribute]

@classmethod
def collect_from_callable(
cls,
module: ModuleType,
attribute_name: str,
expected_attribute_type: Type[T],
) -> List[T]:
"""Given a |callable_name|, invoke the callable and validate the returned
objects as having the correct type.
"""
attribute = getattr(module, attribute_name)

if not isinstance(attribute, FunctionType):
raise ValueError(
f"Unexpected type [{type(attribute)}] for [{attribute_name}] in "
f"[{module.__file__}]. Expected type [Callable]"
)

try:
collected_objs = attribute()
except TypeError as e:
raise ValueError(
f"Unexpected parameters to callable [{attribute_name}] in "
f"[{module.__name__}]. Expected no paramaeters."
) from e

return cls.get_module_attribute_as_typed_list(
module, attribute_name, collected_objs, expected_attribute_type
)

@classmethod
def collect_top_level_attributes_in_module(
cls,
Expand All @@ -81,20 +147,29 @@ def collect_top_level_attributes_in_module(
dir_module: ModuleType,
attribute_name_regex: str,
recurse: bool = False,
collect_from_callables: bool = False,
callable_name_regex: str = "",
file_prefix_filter: Optional[str] = None,
validate_fn: Optional[Callable[[T, ModuleType], None]] = None,
expect_match_in_all_files: bool = True,
) -> List[T]:
"""Collects and returns a list of all attributes with the correct type /
specifications defined in files in a given directory. If the collection
encounters a matching list attribute, it will look inside the list to collect
the attributes of the correct type.
the attributes of the correct type. If the collection encounters a matching
callable without any parameters and a proper return type, it will envoke that
callable and return
Args:
attribute_type: The type of attribute that we expect to find in this subdir
dir_module: The module for the directory that contains all the files
where attributes are defined.
recurse: If true, look inside submodules of dir_module for files.
collect_from_callables: If True, will also find callables that match the
|callable_regex|, collect all return objects, validate them as
valid |attribute_type|s and include them in the returned list.
callable_regex: When |collect_from_callables| is True, this regex will be
used to determine which module attributes are callables.
file_prefix_filter: When set, collection filters returns only attributes
defined in files whose names match the provided prefix.
validate_fn: When set, this function will be called with each
Expand All @@ -107,10 +182,16 @@ def collect_top_level_attributes_in_module(
expect_match_in_all_files: If True, throws if a matching attribute does not
exist in all discovered python files. Otherwise, just skips files with
no matching attribute.
Returns:
List of |attribute_type|
"""
if collect_from_callables and not callable_name_regex:
raise ValueError(
"You must specify a callable_regex if you are trying to collect callabes"
)

dir_modules = [dir_module]
found_attributes: Set[T] = set()
dir_modules = [dir_module]
while dir_modules:
dir_module = dir_modules.pop(0)
child_modules = cls.get_submodules(
Expand All @@ -132,7 +213,19 @@ def collect_top_level_attributes_in_module(
if re.fullmatch(attribute_name_regex, attribute)
]

if expect_match_in_all_files and not attribute_variable_names:
callable_var_names: List[str] = []

if collect_from_callables:
callable_var_names = [
attribute
for attribute in dir(child_module)
if re.fullmatch(callable_name_regex, attribute)
]

if expect_match_in_all_files and not (
attribute_variable_names
or (collect_from_callables and callable_var_names)
):
raise ValueError(
f"File [{child_module.__file__}] has no top-level attribute matching "
f"[{attribute_name_regex}]"
Expand All @@ -141,28 +234,24 @@ def collect_top_level_attributes_in_module(
found_attributes_in_module: List[T] = []
for attribute_name in attribute_variable_names:
attribute = getattr(child_module, attribute_name)
if isinstance(attribute, list):
if not attribute:
raise ValueError(
f"Unexpected empty list for attribute [{attribute_name}]"
f"in file [{child_module.__file__}]."
)
if not isinstance(attribute[0], attribute_type):
raise ValueError(
f"Unexpected type [List[{attribute[0].__class__.__name__}]] for attribute "
f"[{attribute_name}] in file [{child_module.__file__}]. Expected "
f"type [List[{attribute_type.__name__}]]."
)
found_attributes_in_module += attribute
else:
if not isinstance(attribute, attribute_type):
raise ValueError(
f"Unexpected type [{attribute.__class__.__name__}] for attribute "
f"[{attribute_name}] in file [{child_module.__file__}]. Expected "
f"type [{attribute_type.__name__}]."
)
found_attributes_in_module.extend(
cls.get_module_attribute_as_typed_list(
child_module,
attribute_name,
attribute,
attribute_type,
)
)

found_attributes_in_module.append(attribute)
if collect_from_callables:
for attribute_name in callable_var_names:
found_attributes_in_module.extend(
cls.collect_from_callable(
child_module,
attribute_name,
attribute_type,
)
)

for val in found_attributes_in_module:
if validate_fn:
Expand Down
Loading

0 comments on commit 4f9adb1

Please sign in to comment.