-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SL-2184] Conversion metrics do not apply filters to input conversion measures #1199
Comments
Worth noting - it's entirely possible that applying filters to just base is adequate because they're applied after the conversion join and before aggregation, so this might simply be a predicate pushdown issue. This seems unlikely since the conversion metric join is applied to the filtered and aggregated output, which means the filters only get applied correctly if the elements (in the example above, the account__fraud_score) are included in constant_properties. |
In working through enabling predicate pushdown, I discovered that conversion metrics only allow filters to be applied to the base measure, and not the conversion measure.
For example, if I have this:
as a user I think I would reasonably expect that both the base and conversion measures would throw away any input rows where the
account__fraud_score
was> 0.2
.What we do in practice is apply the filter to the base measure, but throw it away when configuring the conversion measure, which means we will overstate the rate by some arbitrary amount. Just how much depends on how the filter skews the data.
Since filters like these are often applied specifically to deal with non-representative subsets of data we might be grossly overstating conversions in this case.
This is almost certainly done to avoid applying a pre-aggregation time filter on conversions - in this case, we would need to include the following 3 days of conversion events for every base event, but if the time filter is applied directly to conversion events we may lose those inputs. However, it's a blanket obliteration of the filter predicates from both query input and metric definition, and this can produce incorrect results.
Currently, users can work around this sort of filter issue by making a categorical dimension and marking it as a constant property, but this is, of course, fairly clumsy, and the hidden gotcha of throwing away conversion measure filters seems undesirable.
Where things get tricky, however, is in figuring out what to do about a categorical dimension filter passed down from outside. If a user applies the account__fraud_score filter at query time, where should that be applied? What if they write a single filter expression like
"{{ Dimension('account__fraud_score') }} > 0.2 AND {{ Dimension('metric_time') }} BETWEEN ...."
We'll likely need to do the full base -> conversion input join along all filter elements as constant properties, apply the filter against that output, and then do the final aggregations as appropriate.
SL-2184
The text was updated successfully, but these errors were encountered: