Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SL-783] Coalesce nulls in MF #759

Closed
marcodamore opened this issue Sep 5, 2023 · 2 comments
Closed

[SL-783] Coalesce nulls in MF #759

marcodamore opened this issue Sep 5, 2023 · 2 comments
Labels
backlog dbt-core 1.7 Created by Linear-GitHub Sync High priority Created by Linear-GitHub Sync

Comments

@marcodamore
Copy link
Collaborator

@bettiolo.damore said:

🟢 Question from community:

dbt metrics had the ability to configure metrics to fill zero when no data was found for a given time grain (rather than exclude that time grain record from results) with treat_null_values_as_zero Was there a reason this was dropped? And are there plans to include similar capabilities in the future?
Transform included this as a post-processor I believe… is this something we can add to MF instead?

From SyncLinear.com | SL-783

@marcodamore marcodamore added dbt-core 1.7 Created by Linear-GitHub Sync High priority Created by Linear-GitHub Sync Metricflow Created by Linear-GitHub Sync labels Sep 5, 2023
@Jstein77
Copy link
Contributor

Jstein77 commented Sep 5, 2023

Why should we build this?

It’s common for an analyst to want to join a metric to a time-spine table and show a zero value instead of null even if there are no values for that time period. This is useful for performing further calculations on the metric or to carry values forward for reporting.

For example, say I have the following MRR data:

customer dim_date_month mrr mrr_last_month mrr_change mrr_change_category
1 2022-01-01 100 100 0 no change
1 2022-01-02 100 100 0 no change
1 2022-01-03 100 100 0 no change
1 2022-01-04 200 100 100 expansion
1 2022-01-05 200 200 0 no change

Who are we building this for?

  • I’m working with a sparse event data set, like MRR data, and i want to record a value for dates when no event occurs
  • I want to shape my data in a specific way for reporting, without having to join to a time spine table in dbt, or in a downstream tool

What are we building?

  • What does the config for the Time Spine look like?

    Users need a way to specify if a metric should fill null values in their configs. There is prior arty to this, and we can reuse the treat_null_values_as_zero: true flag from dbt_metrics.

    An example of how this would look in the yaml is below. Note this would require spec changes so the earliest we could release this in 1.7. Looking ahead, we want to enable users to have entities on the left side of the join, and metric values on the right side. This join pattern will produce a lot of null and we will want to know what to pad null values with.

    measures:
          - name: total_mrr_prev_month
            agg: sum
            expr: mrr_previous_month
            non_additive_dimension:
              name: dim_date_month
              window_choice: max  # not summable over time, take the last value per time unit
              window_groupings: []
          - name: total_mrr_change_retention
            agg: sum
            expr: mrr_change
            non_additive_dimension:
              name: dim_date_month
              window_choice: max  # not summable over time, take the last value per time unit
              window_groupings: []
    metrics:
    	- name: total_mrr_prev_month
    		type: simple
    		type_params:
    			measure:  -- only specified for input measures
    				name: total_mrr_prev_month
    				join_to_timespine: true # This will join the metric to a time spine. Default to false
    				fill_nulls_with: 0 # Value used to fill nulls
    				alias: an_alias
    	- name: total_mrr_prev_month_ratio
    		type: ratio
    		type_params:
    			metrics: 
    				- name: total_mrr_prev_month -- pad numerator
    				- name: denominator
    

    Should this configuration live on a measure, or on a metric?

    We will specify the join_to_timespine and fill_nulls_with attributes as a measure input parameter for metrics. Refer to the spec above for an example

    If you specify a simple metric with join_to_timespine, and then use this as an input to a ratio you can set this value again. Should it override?

    • We should start with just adding join_to_timespine and ``fill_nulls_with`on input measures for metrics. This means you won’t be able to specify this parameter on derived metric or ratio metrics so there won’t be a collision. Failure case we see people creating two versions, one padded and one unpadded.

@tlento
Copy link
Contributor

tlento commented Sep 7, 2023

Quick note after reading through the excellent examples in #764 - if the user selects zero substitution we need to pay careful attention to our join logic, as inner vs outer joins can have radically different behavior.

The time spine is an answer for missing values against metric_time, but broader dimension value joins can still filter out input rows inappropriately if there's an inner join and an expectation of zero substitution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backlog dbt-core 1.7 Created by Linear-GitHub Sync High priority Created by Linear-GitHub Sync
Projects
None yet
Development

No branches or pull requests

3 participants