Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix filter parsing bug #9508

Merged
merged 6 commits into from
Feb 20, 2024
Merged

Fix filter parsing bug #9508

merged 6 commits into from
Feb 20, 2024

Conversation

courtneyholcomb
Copy link
Contributor

@courtneyholcomb courtneyholcomb commented Feb 2, 2024

resolves #9507

Problem

Currently, if you pass a string into a filter YAML param, it will be assumed to be a list. This means your semantic manifest ends up with a list of strings for where filters. For example, this YAML:

filter: |
    {{ Dimension('customer__customer_type') }}  = 'new'

would look like this in the semantic manifest:

{"where_filters": [{"where_sql_template": "{"}, {"where_sql_template": "{"}, {"where_sql_template": " "}, {"where_sql_template": "D"}, {"where_sql_template": "i"}, {"where_sql_template": "m"}, {"where_sql_template": "e"}, {"where_sql_template": "n"}, {"where_sql_template": "s"}, {"where_sql_template": "i"}, {"where_sql_template": "o"}, {"where_sql_template": "n"}, {"where_sql_template": "("}, {"where_sql_template": "'"}, {"where_sql_template": "c"}, {"where_sql_template": "u"}, {"where_sql_template": "s"}, {"where_sql_template": "t"}, {"where_sql_template": "o"}, {"where_sql_template": "m"}, {"where_sql_template": "e"}, {"where_sql_template": "r"}, {"where_sql_template": "_"}, {"where_sql_template": "_"}, {"where_sql_template": "c"}, {"where_sql_template": "u"}, {"where_sql_template": "s"}, {"where_sql_template": "t"}, {"where_sql_template": "o"}, {"where_sql_template": "m"}, {"where_sql_template": "e"}, {"where_sql_template": "r"}, {"where_sql_template": "_"}, {"where_sql_template": "t"}, {"where_sql_template": "y"}, {"where_sql_template": "p"}, {"where_sql_template": "e"}, {"where_sql_template": "'"}, {"where_sql_template": ")"}, {"where_sql_template": " "}, {"where_sql_template": "}"}, {"where_sql_template": "}"}, {"where_sql_template": " "}, {"where_sql_template": " "}, {"where_sql_template": "="}, {"where_sql_template": " "}, {"where_sql_template": "'"}, {"where_sql_template": "n"}, {"where_sql_template": "e"}, {"where_sql_template": "w"}, {"where_sql_template": "'"}]}

This is because the dataclass from_dict() method only works for unions if Union is the outermost type annotation. We had it nested in an Optional annotation, so this fixes that.

Solution

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX
  • This PR includes type annotations for new and modified functions

@courtneyholcomb courtneyholcomb requested a review from a team as a code owner February 2, 2024 00:42
@cla-bot cla-bot bot added the cla:yes label Feb 2, 2024
Copy link
Contributor

github-actions bot commented Feb 2, 2024

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

Copy link

codecov bot commented Feb 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (2411f93) 87.97% compared to head (50fc004) 87.93%.
Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9508      +/-   ##
==========================================
- Coverage   87.97%   87.93%   -0.05%     
==========================================
  Files         167      167              
  Lines       22171    22171              
==========================================
- Hits        19506    19496      -10     
- Misses       2665     2675      +10     
Flag Coverage Δ
integration 85.51% <100.00%> (-0.12%) ⬇️
unit 61.90% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@tlento tlento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable to me based on my minuscule knowledge of Mashumaro (where the from_dict method apparently comes from), but I'm not a maintainer here so I'll let someone from the core team do the approving/merging.

@@ -564,7 +564,7 @@ def __bool__(self):
@dataclass
class UnparsedMetricInputMeasure(dbtClassMixin):
name: str
filter: Optional[Union[str, List[str]]] = None
filter: Union[Optional[str], Optional[List[str]]] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to use an explicit None in the type spec instead of putting in an Optional on everything? Union[str, List[str], None] or str | List[str] | None is a little less repetitive, assuming either of those work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure that works too! Just tested to confirm.

@dbeatty10 dbeatty10 added the ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering label Feb 2, 2024
@courtneyholcomb
Copy link
Contributor Author

based on my minuscule knowledge of Mashumaro (where the from_dict method apparently comes from)

@tlento I thought that too, but if you go into the definition in Mashumaro, it's just the type stubs. Took me forever to track this down as a result 😅 but turns out from_dict() is just coming from dataclass.

@tlento
Copy link
Contributor

tlento commented Feb 3, 2024

@tlento I thought that too, but if you go into the definition in Mashumaro, it's just the type stubs. Took me forever to track this down as a result 😅 but turns out from_dict() is just coming from dataclass.

Well I just learned something today. There is no from_dict method on a Python dataclass, which is why I was so confused at first, and that type stub is indeed non-functional as written, but I figured there was something that Mashumaro was doing to populate the body with some type-specific extraction.

Turns out the library adds it on the fly via a dynamic codegen + compile step invoked from the DataclassDictMixin.

Consequently, I have no idea what the resulting from_dict code is actually doing or why this was an issue or why or how this PR fixes it, but I 100% trust that this was the issue because I can easily imagine how it might happen.

Were you able to actually get to the from_dict method body in your investigation? I'd be curious to see what it looks like, since my feeble human brain is not able to assemble the recursively-generated code.

Wild stuff, either way.

@Fatal1ty
Copy link

Fatal1ty commented Feb 3, 2024

Consequently, I have no idea what the resulting from_dict code is actually doing

You can see the generated code by enabling debug in the config parameter as described in the docs.

why this was an issue

Currently, Union types are handled naively by trying to use a handler for each variant type in the loop. Values of typestr are handled as is by default and that can be an issue in Union. The most reliable solution in this case would be to register a custom deserialization method. For example, if str is the first variant on the list, then using a simple isinstance check as in this example can help. Other options may include a custom method for list[str] or the whole union. You can play with debug mode to see which method is the best for you. I understand that this imperfection might be confusing and dangerous in some cases, so I have plans to redo the processing of unions in next versions.

If you still have any questions or need help, I invite you to open an issue for discussion.

Copy link
Contributor

@QMalcolm QMalcolm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should absolutely get this in. Thank you for doing this work ❤️ We should though add a test. Especially as @Fatal1ty mentioned, the typing based parser in Mashumaro will be seeing some changes in the future, so we want to guarantee that the filter continues to be parsed correctly as the things we depend on change how they operate. A good place to add a test would be in test_metrics.py.

@@ -564,7 +564,7 @@ def __bool__(self):
@dataclass
class UnparsedMetricInputMeasure(dbtClassMixin):
name: str
filter: Optional[Union[str, List[str]]] = None
filter: Union[str, List[str], None] = None
Copy link
Contributor

@QMalcolm QMalcolm Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We should probably add a comment about whats going on here. I can imagine a future person like me accidentally changing it back to Optional during a refactor as that is our usual pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, updated!

@courtneyholcomb
Copy link
Contributor Author

Added tests! @QMalcolm thank you for pointing me to where those should be!

Note that in the process of writing tests, I discovered a separate bug. If you try to use a list filter that includes jinja on a metric or an input measure, you'll get an error when you run dbt parse. For example, parsing the following YAML:

metrics:
  - name: collective_tenure_measure_filter_list
    label: "Collective tenure2"
    description: Total number of years of team experience
    type: simple
    type_params:
      measure:
        name: "years_tenure"
        filter:
          - "{{ Dimension('id__loves_dbt') }} is true"

Gets this error:
CompilationError("Could not render {{ Dimension('id__loves_dbt') }} is true: 'Dimension' is undefined")
The exact same YAML works if you include the filter as a string.

I'll be putting up a separate issue for that bug. But that's the reason you don't see any test cases for input measures or metrics with list filters. The same bug does not exist for filters on saved queries, oddly, so I was able to include list filter tests for those.

@courtneyholcomb
Copy link
Contributor Author

I'm not sure why the artifacts check is failing in CI - it looks like the action itself is erroring? But LMK if there's something I need to fix there!

@QMalcolm QMalcolm added the artifact_minor_upgrade To bypass the CI check by confirming that the change is not breaking label Feb 17, 2024
@QMalcolm
Copy link
Contributor

I'm not sure why the artifacts check is failing in CI - it looks like the action itself is erroring? But LMK if there's something I need to fix there!

The failing CI check is one we've added recently. I've added the label artifact_minor_upgrade which essentially skips the check.

Context: we want to strongly control changes to /artifacts to help identify and mitigate breaking changes. To do this we added a CI step which checks for any file changes in /artifacts. Once we someone verifies the change isn't breaking, we add the artifact_minor_upgrade label to acknowledge this. We could implement a smarter check which recognizes what is and isn't a breaking change, and maybe we will in the future, but this gets the job done for now.

Copy link
Contributor

@QMalcolm QMalcolm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thank you for adding the tests ❤️

@QMalcolm QMalcolm merged commit 20f9049 into main Feb 20, 2024
57 of 58 checks passed
@QMalcolm QMalcolm deleted the court/dataclass-union-fix branch February 20, 2024 19:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
artifact_minor_upgrade To bypass the CI check by confirming that the change is not breaking cla:yes ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Semantic Layer where filter strings are parsed into lists
5 participants