Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enables the materializers to depend on functions & driver variables #431

Merged

Conversation

skrawcz
Copy link
Collaborator

@skrawcz skrawcz commented Oct 3, 2023

.execute() enables that. The materializers should also be able to handle it.

The complication however is on who knows what and when to do it. So I made the decision to make the driver sanitize the materializers given the context that it has. New objects are returned so we don't modify the underlying materializer object -- that way if someone is reusing a materializer definition they can do so in other driver runs/etc.

Changes

  • driver.py
  • materialization.py
  • common
  • adapters

How I tested this

  • locally in a notebook
  • unit tests

Notes

  • Ready for review.

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep) [there is scope creep but it's in each commit]
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

@skrawcz skrawcz force-pushed the enable_materializer_dependencies_to_depend_on_funcs branch from 6561d35 to 54f0639 Compare October 4, 2023 20:32
Copy link
Collaborator

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

.execute() enables that. The materializers should also be able to handle it.

The complication however is on who knows what and when to do it.
So I made the decision to make the driver sanitize the materializers given
the context that it has. New objects are returned so we don't modify the underlying
materializer object -- that way if someone is reusing a materializer definition they
can do so in other driver runs/etc.
Given A --> B

For this directional edge to be valid we want A to be a subclass of what B takes.

For loaders -- we're checking A, the loader,  can be fed into B.  e.g. is loader_A a subclass of input B can take. i.e. the decision function is (loader, function)
For savers -- we're checking A, the function, can be fed into B the saver. e.g.  function_A a subclass of input B the saver can take. i.e. the decision function is (function, loader)

In the code, though, we are treating the two cases the same, as a decision of (loader, function). So this fixes
that by creating two separate functions for a saver and a loader. That way the directional check is correct.

Adds tests for it.
We want to enable the case where `Any` can be
a super type of any type.

So this change makes enables the code to short circuit if
we're asking whether something is a subclass of the "Any"
type.
Rather than default returning to the last one. We should
check for all, and then return the last one, but warn in the
case of multiple being present. This I think will result in
a better experience. We could make the logging
statement a debug one, but we can change that later.
So both the driver and materializer need to handle converting
function names and variables into strings.

This is a little messy -- but it centralizes logic in common.
I didn't bother with another file because I didn't know what to call it.
So putting under common seems fine.

Otherwise I added tests to ensure that new functionality works,
and left the existing tests to ensure nothing broke.
So that one can do the following:

```python

to.memory(
     id="output_df",
     dependencies=outputs,
     combine=base.PandasDataFrameResult()
    ),
```
i.e. reuse any result builder and get that result return
in memory. This is in effect helps one to dynamically add/adjust
a result builder -- which I think is useful for development.
With the change of applies_to -- it means
that single classes are not a good idea. So splitting
up the default ones appropriately.
We missed adding it -- along with the nice error message
for helping people get to slack.

I think what I did makes sense, and mirrors what .execute()
does right now.
@skrawcz skrawcz force-pushed the enable_materializer_dependencies_to_depend_on_funcs branch from f96fe39 to 6095852 Compare October 5, 2023 00:15
Copy link
Collaborator

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@elijahbenizzy elijahbenizzy merged commit b3f07ca into main Oct 5, 2023
21 checks passed
@elijahbenizzy elijahbenizzy deleted the enable_materializer_dependencies_to_depend_on_funcs branch October 5, 2023 04:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants