Enables the materializers to depend on functions & driver variables #431

skrawcz · 2023-10-03T22:27:24Z

.execute() enables that. The materializers should also be able to handle it.

The complication however is on who knows what and when to do it. So I made the decision to make the driver sanitize the materializers given the context that it has. New objects are returned so we don't modify the underlying materializer object -- that way if someone is reusing a materializer definition they can do so in other driver runs/etc.

Changes

driver.py
materialization.py
common
adapters

How I tested this

locally in a notebook
unit tests

Notes

Ready for review.

Checklist

PR has an informative and human-readable title (this will be pulled into the release notes)
Changes are limited to a single goal (no scope creep) [there is scope creep but it's in each commit]
Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
Any change in functionality is tested
New functions are documented (with a description, list of inputs, and expected output)
Placeholder code is flagged / future TODOs are captured in comments
Project documentation has been updated if adding/changing functionality.

elijahbenizzy

LGTM!

.execute() enables that. The materializers should also be able to handle it. The complication however is on who knows what and when to do it. So I made the decision to make the driver sanitize the materializers given the context that it has. New objects are returned so we don't modify the underlying materializer object -- that way if someone is reusing a materializer definition they can do so in other driver runs/etc.

Given A --> B For this directional edge to be valid we want A to be a subclass of what B takes. For loaders -- we're checking A, the loader, can be fed into B. e.g. is loader_A a subclass of input B can take. i.e. the decision function is (loader, function) For savers -- we're checking A, the function, can be fed into B the saver. e.g. function_A a subclass of input B the saver can take. i.e. the decision function is (function, loader) In the code, though, we are treating the two cases the same, as a decision of (loader, function). So this fixes that by creating two separate functions for a saver and a loader. That way the directional check is correct. Adds tests for it.

We want to enable the case where `Any` can be a super type of any type. So this change makes enables the code to short circuit if we're asking whether something is a subclass of the "Any" type.

Rather than default returning to the last one. We should check for all, and then return the last one, but warn in the case of multiple being present. This I think will result in a better experience. We could make the logging statement a debug one, but we can change that later.

So both the driver and materializer need to handle converting function names and variables into strings. This is a little messy -- but it centralizes logic in common. I didn't bother with another file because I didn't know what to call it. So putting under common seems fine. Otherwise I added tests to ensure that new functionality works, and left the existing tests to ensure nothing broke.

So that one can do the following: ```python to.memory( id="output_df", dependencies=outputs, combine=base.PandasDataFrameResult() ), ``` i.e. reuse any result builder and get that result return in memory. This is in effect helps one to dynamically add/adjust a result builder -- which I think is useful for development.

With the change of applies_to -- it means that single classes are not a good idea. So splitting up the default ones appropriately.

We missed adding it -- along with the nice error message for helping people get to slack. I think what I did makes sense, and mirrors what .execute() does right now.

elijahbenizzy

LGTM!

skrawcz force-pushed the enable_materializer_dependencies_to_depend_on_funcs branch from 6561d35 to 54f0639 Compare October 4, 2023 20:32

elijahbenizzy approved these changes Oct 5, 2023

View reviewed changes

skrawcz added 8 commits October 4, 2023 17:13

Adds support for Any type in subclass check

0b7d5f4

We want to enable the case where `Any` can be a super type of any type. So this change makes enables the code to short circuit if we're asking whether something is a subclass of the "Any" type.

Split up default data adapters

43edce5

With the change of applies_to -- it means that single classes are not a good idea. So splitting up the default ones appropriately.

Adds telemetry to materialize

6095852

We missed adding it -- along with the nice error message for helping people get to slack. I think what I did makes sense, and mirrors what .execute() does right now.

skrawcz force-pushed the enable_materializer_dependencies_to_depend_on_funcs branch from f96fe39 to 6095852 Compare October 5, 2023 00:15

elijahbenizzy approved these changes Oct 5, 2023

View reviewed changes

elijahbenizzy merged commit b3f07ca into main Oct 5, 2023
21 checks passed

elijahbenizzy deleted the enable_materializer_dependencies_to_depend_on_funcs branch October 5, 2023 04:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enables the materializers to depend on functions & driver variables #431

Enables the materializers to depend on functions & driver variables #431

skrawcz commented Oct 3, 2023 •

edited

elijahbenizzy left a comment

elijahbenizzy left a comment

Enables the materializers to depend on functions & driver variables #431

Enables the materializers to depend on functions & driver variables #431

Conversation

skrawcz commented Oct 3, 2023 • edited

Changes

How I tested this

Notes

Checklist

elijahbenizzy left a comment

Choose a reason for hiding this comment

elijahbenizzy left a comment

Choose a reason for hiding this comment

skrawcz commented Oct 3, 2023 •

edited