-
Notifications
You must be signed in to change notification settings - Fork 13.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SIP-121] Proposal for Metrics Comparison base on the Temporal Dimension or/and Textual Dimension #27544
Comments
@zhaoyongjie this looks great, and I think it should solve what we're trying to build. Do you have an example of what the resulting sql query would look like from this api schema? |
Hey @zhaoyongjie Thanks for putting this together, it looks really promising. On top of the question from @eschutho I would like to know if we should consider different databases capabilities like CTE and JOINS support when comparing and having a fallback as part of this new processing? Just to handle as mentioned before, some cases where some of those are not supported. Thanks. |
I'll reply the topic tonight, thanks for the mention. |
@eschutho @Antonio-RiveroMartnez In practice, the JOIN in Superset has been discussed many times in the community, but the original technical design doesn't seem to have considered this use case. The data modeling in Superset depends on SQLLab exposing a "virtual dataset" to achieve it. |
@zhaoyongjie can you give more context on why you'd need to compute a time delta? |
This is incredibly useful and something I'm surprised isn't built in to superset's tables already. Thanks for putting this together! |
Background
The period-over-period analysis is a BI technique that compares metrics from a recent period to the same period in the past. We already have this feature in Advanced Analytics and apply it to time-series analytic visualization. This feature utilizes
time_offsets
andpost_processing
in thequery_object
to achieve period-over-period analysis. Practically, this feature, along with the entire Advanced Analytics section, should have supported all visualizations because the legacy Time Section has been removed and the Time Filter has already moved into Adhoc Filter.As an experiment, as long as the time dimension is provided and corresponding time filter populate in Adhoc Filter, any charts should easily incorporate this feature. I've opened a toy pull request to demonstrate that only 5 frontend lines (excluding import statements) are needed to integrate this feature into the current Table chart.
Motivation
Currently, there are some limitations in time_offsets and Advanced Analytics:
Theoretically, Period-over-period analysis is a method of analyzing different data slices base on specific dimensions within the same dataset. Therefore, the comparisons should not only be based on temporal dimensions, but also on textual dimensions.
According to the first perspective, flexable filters should be provided when users compare data slices, not just time filters.
By default, time_offset uses the first occurrence of a time column in
query_object.columns
as the basis for time shifted. It cannot specify which time column to use for generating shifted time range because there can be multiple time filters in current Adhoc Filter.The actual shifted time-range values cannot be displayed on the UI and there isn't an approach to provide it.
Based on the above analysis, I propose to enhance/refactoring the
time_offsets
field to achieve a more generalizedperiod-over-period analysis
.Proposed Changes
To illustrate what needs to change, I will construct a time comparison using the current
time_offsets
field and thecleaned_sales_data
dataset. Thisquery_object
will generate a time comparison based on theorder_date
, comparing data slices from1 week ago
,1 month ago
, and1 year ago
:After the change, the
time_offsets
will replace withcomparison
field:on: specifies which columns will be used to join the data slices.
how: determines how the data slices will be joined, providing options "inner" and "left," with the default value being "left"
suffix : defines suffixes for compared metrics
time_filter_offsets: specifies the time offset delta if performing a time comparison; if performing a textual dimension comparison, it should be ignored
filters: indicates the filters applied on the specific data slice
The following
query_object
payload show that how to generate a textual dimension comparisonNew or Changed Public Interfaces
New dependencies
No new dependencies
Migration Plan and Compatibility
Provide DB migration script
Rejected Alternatives
The text was updated successfully, but these errors were encountered: