Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SL-1845] [Bug] Conversion metric where metric_time clause returns not a number, BQ adapter #1072

Open
2 tasks done
Tracked by #1213
KayakinKoder opened this issue Mar 11, 2024 · 3 comments
Open
2 tasks done
Tracked by #1213
Labels
backlog bug Something isn't working linear

Comments

@KayakinKoder
Copy link

KayakinKoder commented Mar 11, 2024

Is this a new bug in metricflow?

  • I believe this is a new bug in metricflow
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When querying conversion metrics, a where with metric_time returns 'nan' for prior time grains:

where_year

Expected Behavior

Data prior to the where is not returned

Steps To Reproduce

  1. Create a conversion metric on BigQuery data
  2. Query it with a where metric_time clause

Relevant log output

`compile=True` gives the following, note I've redacted a few things just to be on the safe side, search for "[...redacted" for those:


SQL
"SELECT 
   metric_time__year 
   , CAST(customers_firstpurchdt_fortrialconv AS FLOAT64) / CAST(NULLIF(customers_firsttrialdt_fortrialconv, 0) AS FLOAT64) AS conv_completetrial_to_purchases 
 FROM ( 
   SELECT 
     COALESCE(subq_4.metric_time__year, subq_14.metric_time__year) AS metric_time__year 
     , MAX(subq_4.customers_firsttrialdt_fortrialconv) AS customers_firsttrialdt_fortrialconv 
     , MAX(subq_14.customers_firstpurchdt_fortrialconv) AS customers_firstpurchdt_fortrialconv 
   FROM ( 
     SELECT 
       metric_time__year 
       , COUNT(DISTINCT customers_firsttrialdt_fortrialconv) AS customers_firsttrialdt_fortrialconv 
     FROM ( 
       SELECT 
         DATE_TRUNC(trial_firstcompleted_endsat, year) AS metric_time__year 
         , case when ([...redacted, basic sql...] and (trial_firstcompleted_endsat < firstpurch_date or firstpurch_date is null)) then customer_id else null end AS customers_firsttrialdt_fortrialconv 
       FROM `[...redacted, bq project]`.`[...redacted, bq schema]`.`us_customers` customers_src_10000 
     ) subq_2 
     WHERE metric_time__year >= '2022-12-04' 
     GROUP BY 
       metric_time__year 
   ) subq_4 
   FULL OUTER JOIN ( 
     SELECT 
       metric_time__year 
       , COUNT(DISTINCT customers_firstpurchdt_fortrialconv) AS customers_firstpurchdt_fortrialconv 
     FROM ( 
       SELECT DISTINCT 
         first_value(subq_7.customers_firsttrialdt_fortrialconv) OVER (PARTITION BY subq_10.customer, subq_10.firstpurch_date__day, subq_10.mf_internal_uuid ORDER BY subq_7.trial_firstcompleted_endsat__day DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS customers_firsttrialdt_fortrialconv 
         , first_value(subq_7.trial_firstcompleted_endsat__day) OVER (PARTITION BY subq_10.customer, subq_10.firstpurch_date__day, subq_10.mf_internal_uuid ORDER BY subq_7.trial_firstcompleted_endsat__day DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS trial_firstcompleted_endsat__day 
         , first_value(subq_7.metric_time__year) OVER (PARTITION BY subq_10.customer, subq_10.firstpurch_date__day, subq_10.mf_internal_uuid ORDER BY subq_7.trial_firstcompleted_endsat__day DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS metric_time__year 
         , first_value(subq_7.customer) OVER (PARTITION BY subq_10.customer, subq_10.firstpurch_date__day, subq_10.mf_internal_uuid ORDER BY subq_7.trial_firstcompleted_endsat__day DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS customer 
         , subq_10.mf_internal_uuid AS mf_internal_uuid 
         , subq_10.customers_firstpurchdt_fortrialconv AS customers_firstpurchdt_fortrialconv 
       FROM ( 
         SELECT 
           DATE_TRUNC(trial_firstcompleted_endsat, day) AS trial_firstcompleted_endsat__day 
           , DATE_TRUNC(trial_firstcompleted_endsat, year) AS metric_time__year 
           , customer_id AS customer 
           , case when ([...redacted, basic sql...] and (trial_firstcompleted_endsat < firstpurch_date or firstpurch_date is null)) then customer_id else null end AS customers_firsttrialdt_fortrialconv 
         FROM `[...redacted, bq project]`.`[...redacted, bq schema]`.`us_customers` customers_src_10000 
       ) subq_7 
       INNER JOIN ( 
         SELECT 
           DATE_TRUNC(firstpurch_date, day) AS firstpurch_date__day 
           , customer_id AS customer 
           , case when ([...redacted, basic sql...] and trial_firstcompleted_endsat < firstpurch_date) then customer_id else null end AS customers_firstpurchdt_fortrialconv 
           , GENERATE_UUID() AS mf_internal_uuid 
         FROM `[...redacted, bq project]`.`[...redacted, bq schema]`.`us_customers` customers_src_10000 
       ) subq_10 
       ON 
         ( 
           subq_7.customer = subq_10.customer 
         ) AND ( 
           ( 
             subq_7.trial_firstcompleted_endsat__day <= subq_10.firstpurch_date__day 
           ) 
         ) 
     ) subq_11 
     GROUP BY 
       metric_time__year 
   ) subq_14 
   ON 
     subq_4.metric_time__year = subq_14.metric_time__year 
   GROUP BY 
     metric_time__year 
 ) subq_15 
 ORDER BY metric_time__year DESC"

Environment

dbt Cloud=1.7.8
Registered adapter: bigquery=1.7.6
metricflow: Latest? We're running this in cloud prod account

Which database are you using?

bigquery

Additional Context

No response

SL-1845

@KayakinKoder KayakinKoder added bug Something isn't working triage Tasks that need to be triaged labels Mar 11, 2024
@KayakinKoder
Copy link
Author

May or may not be relevant, but here are the measures. (x and y are columns in our prod db that I need to redact for security reasons)

The somewhat complicated expr is explained in the description:

measures:
      - name: customers_firsttrialdt_fortrialconv
        description: |
          The distinct count of customers who completed a trial, EXCLUDING users who purchased *before* they completed a trial. This is used for calculating trial conversion rate; we don't want to include users who puchased *before* completing a trial
        agg: count_distinct
        expr: case when (x = 0 and y > 0 and (trial_firstcompleted_endsat < firstpurch_date or firstpurch_date is null)) then customer_id else null end
        agg_time_dimension: trial_firstcompleted_endsat_dt
        
      - name: customers_firstpurchdt_fortrialconv
        description: |
          The distinct count of customers by their first purchase date EXCLUDING users who purchased before they completed a trial. This is used for calculating trial conversion rate; we don't want to include users who puchased *before* completing a trial
        agg: count_distinct
        expr: case when (x = 0 and has_purchased = 1 and trial_firstcompleted_endsat < firstpurch_date) then customer_id else null end
        agg_time_dimension: firstpurch_date_dt

@tlento
Copy link
Contributor

tlento commented Mar 13, 2024

Ok I see what's happening. We are failing to render the filter on the outside of the final join so any dates in the offset dataset are not getting pruned and you're hitting NULL output values as a result of the final join.

We'll get this prioritized and fixed. In the meantime you'll have to manually throw away those rows - you can do this in the JDBC interface by adding a secondary where expression outside of the call to semantic_layer.query.

Thanks for flagging!

@marcodamore marcodamore changed the title [Bug] Conversion metric where metric_time clause returns not a number, BQ adapter [SL-1845] [Bug] Conversion metric where metric_time clause returns not a number, BQ adapter Mar 13, 2024
@tlento tlento removed the triage Tasks that need to be triaged label May 14, 2024
@tlento
Copy link
Contributor

tlento commented May 14, 2024

This is still on my radar but also in the backlog, I'm collecting conversion metric filter issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backlog bug Something isn't working linear
Projects
None yet
Development

No branches or pull requests

3 participants