
(#4884) Add support for ratio metrics #5027

Merged: 32 commits merged from feature/metric-improvements into main on Jul 6, 2022
Conversation

@drewbanin (Contributor) commented on Apr 11, 2022

resolves #4884

Description

This PR adds support for ratio metrics in dbt. A ratio metric is defined in terms of an expression over other metrics, e.g.:

  - name: arpc
    label: Average Revenue per Customer
    type: expression
    sql: "{{ metric('revenue') }} / {{ metric('customers') }}"

This PR also adds a metric() function that 1) resolves a specified metric and 2) creates an edge in the DAG between the metric node and the node that references it.

Deliverables:

  • Create a metric context method which returns a MetricReference object (similar to the RelationProxy)
    • This object should make it possible to access attributes of the node
    • This object should make it possible to find parent metrics for an expression metric
    • This object should make it possible to access the underlying model node for a concrete (i.e. non-expression) metric
  • The metric() function creates DAG edges between the referencing and referenced nodes
    • The referenced node is added to the depends_on.nodes list for the child node
    • The referenced metric is also added to a (new) metrics list (akin to the refs list)
  • Support metric definitions with an expression type. These are expression metrics which are defined as an expression over multiple other metrics
    • The expression metric type is mutually exclusive with the model config for metrics. A metric must either be derived from a model, or from one or more other metrics
  • Ensure that the metric() function can be called from Snapshots, Models, Tests, Macros, and other Metrics

Example of metrics depending on other metrics in the DAG viz (note: this requires a PR to dbt-docs):

[Screenshot: DAG visualization showing a metric that depends on other metrics]

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have added information about my change to be included in the CHANGELOG.

@cla-bot added the cla:yes label on Apr 11, 2022

@github-actions (bot):

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@drewbanin (Author) left a comment:

Dropped some notes inline in the PR. Excited to discuss!

# Skip if the metric is disabled!
return metric

# TODO: Figure this part out...

@drewbanin (Author):

Do we allow other yaml-only resources (like sources) to be disabled? If so, we should implement this part.

Contributor:

Sources can be disabled, exposures cannot (but we should support this). In the past, it's only been possible to disable a source via the sources: block in dbt_project.yml.

In v1.1, we'll be adding the ability to disable a source inline with its definition (#5008):

sources:
  - name: my_source
    config:
      enabled: False

self.package_name = package_name


class ResolvedMetricReference(MetricReference):

@drewbanin (Author):

This was my take on a class analogous to the Relation object. I figured that this kind of object could be returned from the metric() function. It would help us avoid the need to find the metric in the graph manually like we do here.
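
As a rough illustration of the idea, a reference object along these lines could carry both the resolved metric node and the manifest, so callers can walk to parent metrics or the underlying model. This is a hedged sketch; the attribute and method names below are assumptions, not the final dbt-core API.

class MetricReference:
    """Lightweight handle for a metric referenced via metric() (sketch)."""

    def __init__(self, metric_name, package_name=None):
        self.metric_name = metric_name
        self.package_name = package_name


class ResolvedMetricReference(MetricReference):
    """Adds the resolved metric node and the manifest, so callers can reach
    parent metrics (for expression/ratio metrics) or the underlying model
    (for concrete metrics). Sketch only."""

    def __init__(self, node, manifest):
        super().__init__(node.name, node.package_name)
        self.node = node          # the parsed metric node
        self.manifest = manifest  # needed to look up parents and the model

    def parent_metrics(self):
        # Yield the metric nodes this metric depends on (if any).
        for unique_id in self.node.depends_on.nodes:
            parent = self.manifest.metrics.get(unique_id)
            if parent is not None:
                yield parent

    @property
    def model(self):
        # For a concrete (non-expression) metric, return its underlying model node.
        for unique_id in self.node.depends_on.nodes:
            if unique_id.startswith("model."):
                return self.manifest.nodes.get(unique_id)
        return None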

Contributor:

This looks reasonable to me.

Contributor:

Let's remove .metric_value based on the conversation here.

        return f'{namespace}.{self.metric_name}'

    def __str__(self):
        return self.node.name

@drewbanin (Author):

Should this return the metric name? Or the sql expression? Or something else? Not 100% sure yet...

Contributor:

When would the ResolvedMetricReference get stringified?

@drewbanin (Author) replied on Apr 12, 2022:

I think this would happen when you interpolate the results of the metric() function, so:

select
  {{ metric('revenue') }}
from {{ metric('revenue').model }}

would evaluate to

select
  revenue
from analytics.fct_revenue

So - returning the name of the metric is kind of useless, but I am unsure if we want to return anything more sophisticated than that? I guess we could return self.node.sql instead (which feels reasonable), but this doesn't extend well to the ratio metric type which does not have a .sql value

So, uh, short answer: you probably wouldn't stringify it in practice?

Contributor:

Maybe we shouldn't even have a __str__ method then? With the idea that it's better to fail noisily than fail silently.

    created_at: float = field(default_factory=lambda: time.time())

    def resolve_metric_references(self, context):
        # TODO: Obviously don't do this...

@drewbanin (Author):

This is where I implemented "patching" for the metric node. The big idea is that we have code like:

  - name: arpc
    label: Average Revenue per Customer

    type: ratio
    ratio_terms:
      numerator: metric('revenue')
      denominator: metric('customers')

When we parse the arpc metric, we don't necessarily know anything about revenue or customers yet. I think this is sort of akin to how exposures work, though I wasn't able to find the way that we "patch" exposures when I was looking through the code. Wondering if this is different because metrics can depend on other metrics, whereas exposures can only depend on models/sources/snapshots, so we can't bypass this problem with parsing order.

Feedback/guidance very welcome here!
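
To make the resolution idea concrete, here is a minimal sketch of a post-parse pass over ratio metrics, assuming each term has already been reduced to a bare metric name. The names (manifest.metric_lookup, ratio_terms, depends_on) are illustrative assumptions, not the actual dbt-core internals.

def resolve_ratio_metric_references(manifest):
    """After all metrics are parsed, wire up edges from each ratio metric
    to the metrics named in its ratio_terms (sketch only)."""
    for metric in manifest.metrics.values():
        if metric.type != "ratio":
            continue
        for term in ("numerator", "denominator"):
            parent_name = metric.ratio_terms[term]  # e.g. "revenue"
            parent = manifest.metric_lookup.get(parent_name)
            if parent is None:
                raise RuntimeError(
                    f"Metric '{metric.name}' references unknown metric '{parent_name}'"
                )
            # Record the dependency so the DAG gains a metric -> metric edge.
            metric.depends_on.nodes.append(parent.unique_id)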

Contributor:

Yes, exposures don't refer to themselves, so exposure refs, sources, etc. get resolved in the various parser.manifest 'process_XXX' methods. I wouldn't call what the metrics need a "patch" step, because when we did that we saved a bunch of patch dictionaries (with config, etc.) until the end of parsing and applied them then. So far metrics just need a resolution phase, as you've implemented.

core/dbt/main.py (outdated)
@@ -142,6 +142,8 @@ def main(args=None):
        exit_code = e.code

    except BaseException as e:
        import traceback

@drewbanin (Author):

Just here for debugging - will remove before PRing this.

@@ -66,6 +66,9 @@ def should_render_keypath(self, keypath: Keypath) -> bool:
            return False
        elif self._is_norender_key(keypath[0:]):
            return False
        elif self.key == "metrics":

@drewbanin (Author):

I actually don't think this part is necessary... will remove. This came from an initial idea of implementing ratios via the sql config, e.g.:

sql: "{{ metric('revenue') }} / {{ metric('customers') }}"

Decided against doing it that way (instead: specify numerator and denominator separately) in order to make metric queries leveraging these ratio metrics more tractable to implement.

@@ -1042,8 +1046,42 @@ def parse_metric(self, unparsed: UnparsedMetric) -> ParsedMetric:
            self.schema_parser.manifest,
            package_name,
        )
        model_ref = "{{ " + unparsed.model + " }}"
        get_rendered(model_ref, ctx, parsed, capture_macros=True)
        if unparsed.type == 'ratio':

@drewbanin (Author):

I don't love the idea of teaching the metric parser about metric subtypes. For now, the only special metric is the "ratio" type, though I suppose there could be other special types in the future. Happy to discuss if anyone feels strongly about us doing this part differently.

Contributor:

Heard. If we want the validation to happen in Python code (and it should), I think we'll need to teach some of it to dbt-core (or a dbt-core dependency). I don't mind ratio being a special type for now.

Is there a more general solution that looks like "super"/"sub" metric types? Like super/subclasses for Python objects. Or maybe we don't want to build the semantic layer as OOP :)

@gshank (Contributor):

It feels wrong to me to have this code be in the parser. I'd be more comfortable if we could move some of it into the UnparsedMetric or ParsedMetric or a subclass based on type... Having to render it does complicate things, though. Is it actually necessary to render the numerator and denominator at parse time? Can they actually contain macros? If they can't contain macros, I would think the get_rendered call is a no-op.

@drewbanin (Author):

@gshank I think we do need to render the numerator and denominator at parse time, because we need to capture the metric() call in order to build the edges between the two metrics. In this way, metric() is kind of like ref(), but for metrics :)

I'd be ok with moving this logic into the ParsedMetric (or similar)... is it ok if we call get_rendered in there? I don't want to explode the number of places that we render jinja code, but I definitely hear what you're saying.
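
For context, a hedged sketch of the parse-time capture being discussed, modeled on how the diff above renders unparsed.model and assuming the get_rendered helper from dbt.clients.jinja; the helper name and ratio_terms attributes are illustrative assumptions, not the code that landed.

from dbt.clients.jinja import get_rendered  # rendering helper used elsewhere in the parser


def capture_ratio_term_refs(unparsed, parsed, ctx):
    """Render each ratio term at parse time purely for its side effect:
    the metric() call in the rendering context records the referenced metric
    on the parsed node (much like ref() does for models), and that record
    later becomes a DAG edge. Sketch only."""
    for term in ("numerator", "denominator"):
        raw = getattr(unparsed.ratio_terms, term)  # e.g. "metric('revenue')"
        get_rendered("{{ " + raw + " }}", ctx, parsed, capture_macros=True)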

Contributor:

If it's only the 'metrics' call that we need to capture, this feels like something that we might want to do with static parsing eventually. Wouldn't need to do that in the first pass though.

I think we're already calling get_rendered in ParsedMetric in "resolve_metric_references". And it does seem out of place there too. Hmm... I'm not exactly sure what the best way of factoring this is. Separation of concerns would be nice, but the concerns seem kind of inherently tangled.

@drewbanin changed the title from "Add support for ratio metrics" to "(#4884) Add support for ratio metrics" on Apr 11, 2022

@gshank (Contributor) left a comment:

This looks like a good solid start :)

We use pre-commit to do black, flake8 and mypy, and there are a number of mypy complaints about this. Sometimes in order to make mypy happy you have to move things around a bit, so you might want to pip install pre-commit, and do 'pre-commit run --all-files'.

For partial parsing, will we need to re-parse anything with a changed metric reference? I think we'll have to handle metric references in a handful of places.

core/dbt/parser/manifest.py (conversation resolved)

self.metrics[metric.unique_id] = metric
source_file.metrics.append(metric.unique_id)
if not metric.config.enabled:
    self.add_disabled_nofile(metric)

Contributor:

Not sure if this piece is actually working for a metric with:

metrics:
  - name: some_metric
    config:
      enabled: False

Contributor:

I'll open a separate issue for supporting config.enabled on metrics (and exposures!). In the meantime, I think we should probably just remove the enabled-related code from this PR.

Comment on lines 9 to 11
def __str__(self):
    # TODO: Qualify the metric name as a CTE or field?
    return f"{self.metric_name}.metric_value"

Contributor:

Discussed with @callum-mcdata live: let's just make this return f"{self.metric_name}" instead, since otherwise we're just replacing this with "" in Jinja here.
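
In other words, the proposal is roughly the following (a sketch based on the snippet above, not the exact committed code):

def __str__(self):
    # Return just the metric name; any CTE/field qualification is left to the caller.
    return f"{self.metric_name}"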

Comment on lines +467 to +469
# TODO: Expressions _cannot_ have `model` properties
if data.get("model") is None and data.get("type") != "expression":
    raise ValidationError("Non-expression metrics require a 'model' property")

Contributor:

@callum-mcdata to add validation for the opposite case (this TODO).
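
A minimal sketch of what the symmetric validation might look like (an illustration of the TODO, not necessarily the code that eventually landed):

# ValidationError refers to the same exception used in the snippet above.
if data.get("model") is None and data.get("type") != "expression":
    raise ValidationError("Non-expression metrics require a 'model' property")
if data.get("model") is not None and data.get("type") == "expression":
    raise ValidationError("Expression metrics cannot have a 'model' property")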

resource_type: NodeType = NodeType.Metric
meta: Dict[str, Any] = field(default_factory=dict)
tags: List[str] = field(default_factory=list)
sources: List[List[str]] = field(default_factory=list)
depends_on: DependsOn = field(default_factory=DependsOn)
refs: List[List[str]] = field(default_factory=list)
metrics: List[List[str]] = field(default_factory=list)
config: SourceConfig = field(default_factory=SourceConfig)

Contributor:

I think we should remove this (and related code) for now, and split off support for disabling metrics into a separate unit of work.

@emmyoop marked this pull request as ready for review on July 5, 2022 and requested reviews from gshank, stu-k, and the code-owning teams.
@leahwicz mentioned this pull request on Jul 5, 2022.
leahwicz and others added 3 commits July 5, 2022 15:47
* Bumping manifest version to v6

* Adding manifest file for tests

* Reverting unneeded changes

* Updating v6

* Updating test to add metrics field

* Adding changelog

@gshank (Contributor) left a comment:

This looks really good!

I think there might be a minor issue related to partial parsing. We now have metric references in models. If a model and metric have been parsed and the metric is deleted or renamed, I'm pretty sure we aren't scheduling the model for re-parsing, so the error would show up at execution time rather than parse time. I don't consider this a blocker -- we can fix it later.

* Update test_previous_version_state for v6. Cleanup

* Regenerate, rm breakpoint

* Code checks

* Add assertion that will fail when we bump manifest version

* update tests to automatically tests all previous versions

Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com>
@emmyoop merged commit 064d890 into main on Jul 6, 2022
@emmyoop deleted the feature/metric-improvements branch on July 6, 2022
agoblet pushed a commit to BigDataRepublic/dbt-core that referenced this pull request Sep 16, 2022
* wip

* More support for ratio metrics

* Formatting and linting

* Fix unit tests

* Support disabling metrics

* mypy

* address all TODOs

* make pypy happy

* wip

* checkpoint

* refactor, remove ratio_terms

* flake8 and unit tests

* remove debugger

* quickfix for filters

* Experiment with functional testing for 'expression' metrics

* reformatting slightly

* make file and mypy fix

* remove config from metrics - wip

* add metrics back to context

* adding test changes

* fixing test metrics

* revert name audit

* pre-commit fixes

* add changelog

* Bumping manifest version to v6 (dbt-labs#5430)

* Bumping manifest version to v6

* Adding manifest file for tests

* Reverting unneeded changes

* Updating v6

* Updating test to add metrics field

* Adding changelog

* add v5 to backwards compatibility

* Clean up test_previous_version_state, update for v6 (dbt-labs#5440)

* Update test_previous_version_state for v6. Cleanup

* Regenerate, rm breakpoint

* Code checks

* Add assertion that will fail when we bump manifest version

* update tests to automatically tests all previous versions

Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com>

Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>
Co-authored-by: Callum McCann <cmccann51@gmail.com>
Co-authored-by: Emily Rockman <emily.rockman@dbtlabs.com>
Co-authored-by: leahwicz <60146280+leahwicz@users.noreply.github.com>
Successfully merging this pull request may close: [CT-376] add ability to calculate metrics based off of other metrics (#4884)

6 participants