Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tags can hold list values #622

Merged
merged 2 commits into from
Jan 12, 2024
Merged

Tags can hold list values #622

merged 2 commits into from
Jan 12, 2024

Conversation

skrawcz
Copy link
Collaborator

@skrawcz skrawcz commented Jan 8, 2024

Implements a version of #620 .

@tag( business_value=["CA", "US"] )
def combo_node():
    ...

@tag( business_value=["US"] )`
def only_us_node():
    ...

@tag( business_value=["CA"], some_other_tag="BAR" )
def only_ca_node():
    ...

then the following queries will return:

# case A: returns only_ca_node() and combo_node()
dr.list_available_variables(tag_filter=dict(business_lines=["CA"]))
# case B: returns only_us_node() and combo_node()
dr.list_available_variables(tag_filter=dict(business_lines="US"))
# case C: returns all 3 nodes
dr.list_available_variables(tag_filter=dict(business_lines=["CA", "US"] ))
# case D: returns all 3 nodes
dr.list_available_variables(tag_filter=dict(business_lines=None ))
# case E: returns 0 nodes
dr.list_available_variables(tag_filter=dict(business_lines="UK" ))
# case F: returns 0 nodes
dr.list_available_variables(tag_filter=dict(business_lines="US", some_other_tag="FOO" ))
# case G: returns 1 nodes
dr.list_available_variables(tag_filter=dict(business_lines="CA", some_other_tag="BAR" ))

Changes

  • @tag now can take in a list of strings
  • driver.list_available_variables() now takes in a tag_filter argument
  • node.py has new function
  • tests

How I tested this

  • locally
  • via unit tests

Notes

  • this does not implement anything fancy with respect to query logic. We should implement more as the use case presents itself.

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

@skrawcz skrawcz linked an issue Jan 8, 2024 that may be closed by this pull request
@kzk2000
Copy link

kzk2000 commented Jan 8, 2024

@skrawcz haven't tried it yet, but just flagging my thoughts:

Do both of these return the same thing nodes?


dr.list_available_variables({"business_line": ["US", "UK"]}) 
dr.list_available_variables({"business_line": [ "UK", "US"]}) 

@skrawcz skrawcz force-pushed the feature/620 branch 2 times, most recently from f022807 to 2cc83e9 Compare January 8, 2024 23:21
@skrawcz
Copy link
Collaborator Author

skrawcz commented Jan 8, 2024

@skrawcz haven't tried it yet, but just flagging my thoughts:

Do both of these return the same thing nodes?


dr.list_available_variables({"business_line": ["US", "UK"]}) 
dr.list_available_variables({"business_line": [ "UK", "US"]}) 

yep it's order invariant.

I even handle the case:

@tag(business_line="US")
def my_function() -> ...:
     ...
# does not match
dr.list_available_variables({"business_line": ["US", "UK"]}) 
# does match
dr.list_available_variables({"business_line": ["US"]}) 

Copy link
Collaborator

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few thoughts -- largely on API

hamilton/driver.py Show resolved Hide resolved
hamilton/driver.py Show resolved Hide resolved
hamilton/driver.py Outdated Show resolved Hide resolved
# gets all
dr.list_available_variables()
# gets exact matching
dr.list_available_variables({"TAG_NAME": "TAG_VALUE"})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API is confusing me quite a bit TBH. TAG_VALUE is a list means that they all have to match? BUt if its an individual it means they have to match any?

We should be opinionated and simple here. If not, we should leave querying up to the user... I was thinking:

  1. Only query for individual tag_values in dict. Dicts imply and, and an individual one means its one of the decorated values
  2. We leave open the option for more kw params later
dr.select_variables(tag_match={"foo" : "bar", "baz": "qux"})

This would match:

  • @tag(foo="bar")
  • @tag(baz="qux")
  • `@tag(foo=["bar", "qux"])

But not:

  • @tag(foo="not_bar")
  • @tag(foo=["not_baz", "not_bar"])
  • @tag(foo="bar", baz="not_qux")
  • @tag(foo=["bar", "qux"], baz=["bar", "not_qux"])

This is being opinionated that matching means any do exist, anything else is up to the user to specify/query independently.

Copy link
Collaborator Author

@skrawcz skrawcz Jan 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A single value, i.e. no list, is an exact match if the tag doesn't have a list, else if there is a list, it just needs to match one of them.

The issue/grey area, is how do you interpret a list of tag values? To me it stands to reason if we can tag values with lists, we can also query with lists.

E.g. Given:

@tag(business_lines=["CA", "TX"])
def my_func() -> ...:
   pass
# e.g. I want to get all nodes applicable to a particular business line (it could be variable in length)
# should this be an exact match if someone passes a single item list? 
dr.list_available_variables(tag_filter=dict(business_lines=["CA"]))

# to me the above is equivalent to this -- because a user could also write this:
dr.list_available_variables(tag_filter=dict(business_lines="CA"))

# Why?
# Well, why would someone put a tag on a function with only a single value in it?
@tag(business_lines=["CA"]) # < -- this wouldn't make sense to me.
def my_func() -> ...:
   pass

# e.g. I want to get all nodes that correspond to Cali AND Texas AND Florida:
# what about this? I think this is an exact match requirement (i.e. AND), not an OR.
dr.list_available_variables(tag_filter=dict(business_lines=["CA", "TX", "FL"]))

# e.g. I want to get all nodes that hit Cali OR Texas OR Florida
# If you want an OR you would have to do multiple calls and union the results.
(
 dr.list_available_variables(tag_filter=dict(business_lines="CA"))  | 
 dr.list_available_variables(tag_filter=dict(business_lines="TX"))  | 
 dr.list_available_variables(tag_filter=dict(business_lines="FL"))  | 
) # you'd have to make them sets for this to work but you get the picture

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kzk2000 any thoughts on lists of values? It would be simpler to not allow it.

Copy link

@kzk2000 kzk2000 Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skrawcz Whatever feels more intuitive is all right by me.

Say

@tag( business_value=["CA", "US"] )`
def combo_node():
    ...

@tag( business_value=["US"] )`
def only_us_node():
    ...

@tag( business_value=["CA"] )`
def only_ca_node():
    ...

so

dr.list_available_variables(tag_filter=dict(business_lines=["CA"])) # case A: returns only_ca_node() and combo_node()
dr.list_available_variables(tag_filter=dict(business_lines="US")) # case B: returns only_us_node() and combo_node()
dr.list_available_variables(tag_filter=dict(business_lines=["CA", "US"] )) # case C: returns all 3 nodes

That is, I would treat the last case as OR operator, if any one of the list entries matches any of the tag's list entries , return the nodes for which this is true.

But I'm also totally fine with making this more explicit as you suggested and do not allow lists in the tag_filter at all, that is, only allow for Case B above. And then do this as you suggested:

(
 dr.list_available_variables(tag_filter=dict(business_lines="CA"))  | 
 dr.list_available_variables(tag_filter=dict(business_lines="TX"))  | 
 dr.list_available_variables(tag_filter=dict(business_lines="FL"))  | 
) # you'd have to make them sets for this to work but you get the picture

Bottom line as long as `dr.list_available_variables()``` returns all nodes, I can easily run any list comprehension filter on this myself, e.g.

[ x for x in dr.list_available_variables() if x.tag.get('business_line", None) in ['US', 'CA']]

or similar

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I think I might switch to the OR interpretation then.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will try to have something by Friday

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elijahbenizzy @kzk2000

For this case:

dr.list_available_variables(tag_filter=dict(business_lines=["CA"], some_other_tag="foo" ))

I'm interpreting it as an AND query, while the values in the list are OR. Thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So yeah, offline we've synced up and decided that tags are or queries. This leaves the door open with:

tag_filter=dict(foo=or_("bar", "baz"))
tag_filter=dict(foo=and_("bar", "baz"))

or something like this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To confirm:

dr.list_available_variables(tag_filter=dict(business_lines=["CA", "US"], some_other_tag="foo" ))

this to me reads:
business_lines=CA OR business_lines=US
AND
some_other_tag=foo

tests/test_hamilton_driver.py Show resolved Hide resolved
hamilton/node.py Outdated Show resolved Hide resolved
hamilton/node.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

With prodding from #620 we decided to extend what you can
pass to tags, by allowing a list of string values to be passed.
We think this is an acceptable extension and allows one to
then have multiple values for the same tag.

See next commit for other part of #620.
This completes #620.

This implements the following behavior:

Behavior is now this. Given
```python
@tag( business_value=["CA", "US"] )`
def combo_node():
    ...

@tag( business_value=["US"] )`
def only_us_node():
    ...

@tag( business_value=["CA"], some_other_tag="BAR" )`
def only_ca_node():
    ...
```
then the following queries will return:
```python
dr.list_available_variables(tag_filter=dict(business_lines=["CA"]))
dr.list_available_variables(tag_filter=dict(business_lines="US"))
dr.list_available_variables(tag_filter=dict(business_lines=["CA", "US"] ))
dr.list_available_variables(tag_filter=dict(business_lines=None ))
dr.list_available_variables(tag_filter=dict(business_lines="UK" ))
dr.list_available_variables(tag_filter=dict(business_lines="US", some_other_tag="FOO" )
dr.list_available_variables(tag_filter=dict(business_lines="CA", some_other_tag="BAR" )
```
So values in lists are read as OR clauses, and multiple
tags in the query dict are read as AND clauses.

We settled on this design since it seemed the most intuitive to read,
while also leaving the door for more complex querying capabilities.

Squashed commits:
[f7a9752] Makes matches_query return True on empty dict
[f93282b] Updates logic for tag querying

This moves the tag filter logic to the node module,
since it's kind of coupled to it.

Behavior is now this. Given
```python
@tag( business_value=["CA", "US"] )`
def combo_node():
    ...

@tag( business_value=["US"] )`
def only_us_node():
    ...

@tag( business_value=["CA"], some_other_tag="BAR" )`
def only_ca_node():
    ...
```
then the following queries will return:
```python
dr.list_available_variables(tag_filter=dict(business_lines=["CA"]))
dr.list_available_variables(tag_filter=dict(business_lines="US"))
dr.list_available_variables(tag_filter=dict(business_lines=["CA", "US"] ))
dr.list_available_variables(tag_filter=dict(business_lines=None ))
dr.list_available_variables(tag_filter=dict(business_lines="UK" ))
dr.list_available_variables(tag_filter=dict(business_lines="US", some_other_tag="FOO" )
dr.list_available_variables(tag_filter=dict(business_lines="CA", some_other_tag="BAR" )
```

[94c58de] Fixing whitespace and type annotation for list_available_variables

These were incorrect.
[2cc83e9] Adds exposing tag_filter open on listing variables #620

This is a quick way to improve the ergonomics of listing
all available variables and filtering by some criteria.

```python
dr.list_available_variables() # gets all
dr.list_available_variables({"TAG_NAME": "TAG_VALUE"})  # gets all matching tag name & value
dr.list_available_variables({"TAG_NAME": ["TAG_VALUE", "TAG_VALUE2"]})  # gets all matching tag name & all values order invariant
dr.list_available_variables({"TAG_NAME": None})  # gets all with matching tag, irrespective of value
```

This completes 620, and handles the filtering
logic in the driver function. I think this is an reasonable
place to put it for now.

Assumptions:
 - if passed a list it's an exact match
 - AND clause across all values passed in.
 - None means just get me anything with that tag.
 - it's keyword only argument to enable us to maintain
backwards compatibility for future changes.

More complex clauses seem out of scope for this change.

But for the future, if we want to change things, e.g. take in
a query object that can express `not in, or, etc`  We can do that.

Another idea I didn't pursue was enabling query syntax:

```python
{"TAG_NAME": "=FOO"},  # equal
{"TAG_NAME": "!FOO"},  # not equal
etc
```
Since with lists of values of strings, this could get quite complex
to parse/understand...
@skrawcz skrawcz merged commit f5079d7 into main Jan 12, 2024
22 checks passed
@skrawcz skrawcz deleted the feature/620 branch January 12, 2024 06:22
@elijahbenizzy elijahbenizzy changed the title Feature/620 Tags can hold list values Jan 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow non-strings in @tag(...)
3 participants