[SIP-121] Proposal for Metrics Comparison base on the Temporal Dimension or/and Textual Dimension #27544

zhaoyongjie · 2024-03-17T14:56:05Z

Background

The period-over-period analysis is a BI technique that compares metrics from a recent period to the same period in the past. We already have this feature in Advanced Analytics and apply it to time-series analytic visualization. This feature utilizes time_offsets and post_processing in the query_object to achieve period-over-period analysis. Practically, this feature, along with the entire Advanced Analytics section, should have supported all visualizations because the legacy Time Section has been removed and the Time Filter has already moved into Adhoc Filter.

As an experiment, as long as the time dimension is provided and corresponding time filter populate in Adhoc Filter, any charts should easily incorporate this feature. I've opened a toy pull request to demonstrate that only 5 frontend lines (excluding import statements) are needed to integrate this feature into the current Table chart.

Motivation

Currently, there are some limitations in time_offsets and Advanced Analytics:

Theoretically, Period-over-period analysis is a method of analyzing different data slices base on specific dimensions within the same dataset. Therefore, the comparisons should not only be based on temporal dimensions, but also on textual dimensions.
According to the first perspective, flexable filters should be provided when users compare data slices, not just time filters.
By default, time_offset uses the first occurrence of a time column in query_object.columns as the basis for time shifted. It cannot specify which time column to use for generating shifted time range because there can be multiple time filters in current Adhoc Filter.
The actual shifted time-range values cannot be displayed on the UI and there isn't an approach to provide it.

Based on the above analysis, I propose to enhance/refactoring the time_offsets field to achieve a more generalized period-over-period analysis.

Proposed Changes

To illustrate what needs to change, I will construct a time comparison using the current time_offsets field and the cleaned_sales_data dataset. This query_object will generate a time comparison based on the order_date, comparing data slices from 1 week ago, 1 month ago, and 1 year ago :

{
  "columns": [{
    "timeGrain": "P1D",
    "columnType": "BASE_AXIS",
    "sqlExpression": "order_date",
    "label": "order_date",
    "expressionType": "SQL"
  }],
  "metrics": ["count"],
  "filters": [
    {
      "col": "order_date",
      "op": "TEMPORAL_RANGE",
      "val": "2004-01-01 : 2004-01-31"
    },
    {
      "col": "deal_size",
      "op": "==",
      "val": "Small"
    }
  ],
  "time_offsets": [
    "1 week ago",
    "1 month ago",
    "1 year ago"
  ],
}

After the change, the time_offsets will replace with comparison field:

{
  "columns": [{
    "timeGrain": "P1D",
    "columnType": "BASE_AXIS",
    "sqlExpression": "order_date",
    "label": "order_date",
    "expressionType": "SQL"
  }],
  "metrics": ["count"],
  "filters": [
    {
      "col": "order_date",
      "op": "TEMPORAL_RANGE",
      "val": "2004-01-01 : 2004-01-31"
    },
    {
      "col": "deal_size",
      "op": "==",
      "val": "Small"
    }
  ],
  "comparisons": {
    "on": ["order_date"],
    "how": "left",
    "suffix": ["1 week ago", "1 month ago", "1 year ago"],
    "time_filter_offsets": [
      "1 week ago",
      "1 month ago",
      "1 year ago"
    ],
    "filters": [
      [
        {
          "col": "deal_size",
          "op": "==",
          "val": "Small"
        }
      ],
      [
        {
          "col": "deal_size",
          "op": "==",
          "val": "Small"
        }
      ],
      [
        {
          "col": "deal_size",
          "op": "==",
          "val": "Small"
        }
      ],
    ],
  },
}

on: specifies which columns will be used to join the data slices.
how: determines how the data slices will be joined, providing options "inner" and "left," with the default value being "left"
suffix : defines suffixes for compared metrics
time_filter_offsets: specifies the time offset delta if performing a time comparison; if performing a textual dimension comparison, it should be ignored
filters: indicates the filters applied on the specific data slice

The following query_object payload show that how to generate a textual dimension comparison

{
  "columns": [{
    "timeGrain": "P1D",
    "columnType": "BASE_AXIS",
    "sqlExpression": "order_date",
    "label": "order_date",
    "expressionType": "SQL"
  }],
  "metrics": ["count"],
  "filters": [
    {
      "col": "order_date",
      "op": "TEMPORAL_RANGE",
      "val": "2004-01-01 : 2004-01-31"
    },
    {
      "col": "deal_size",
      "op": "==",
      "val": "Small"
    }
  ],
  "comparisons": {
    "on": ["deal_size"],
    "how": "left",
    "suffix": ["Medium", "Large"],
    "filters": [
      [
        {
          "col": "order_date",
          "op": "TEMPORAL_RANGE",
          "val": "2004-01-01 : 2004-01-31"
        },
        {
          "col": "deal_size",
          "op": "==",
          "val": "Medium"
        }
      ],
      [
        {
          "col": "order_date",
          "op": "TEMPORAL_RANGE",
          "val": "2004-01-01 : 2004-01-31"
        },
        {
          "col": "deal_size",
          "op": "==",
          "val": "Large"
        }
      ],
    ],
  },
}

New or Changed Public Interfaces

New dependencies

No new dependencies

Migration Plan and Compatibility

Provide DB migration script

Rejected Alternatives

The text was updated successfully, but these errors were encountered:

eschutho · 2024-03-21T23:24:38Z

@zhaoyongjie this looks great, and I think it should solve what we're trying to build. Do you have an example of what the resulting sql query would look like from this api schema?

Antonio-RiveroMartnez · 2024-03-25T15:51:55Z

Hey @zhaoyongjie Thanks for putting this together, it looks really promising. On top of the question from @eschutho I would like to know if we should consider different databases capabilities like CTE and JOINS support when comparing and having a fallback as part of this new processing? Just to handle as mentioned before, some cases where some of those are not supported.

Thanks.

zhaoyongjie · 2024-03-25T16:19:44Z

Hey @zhaoyongjie Thanks for putting this together, it looks really promising. On top of the question from @eschutho I would like to know if we should consider different databases capabilities like CTE and JOINS support when comparing and having a fallback as part of this new processing? Just to handle as mentioned before, there are some cases where some of those are not supported.

Also, how datasets that use for example Jinja templates would be compared and processed with the Joins and the offsets? Should we consider cases where we simply don't support comparison and offer the aforementioned fallback methods? or simply ignore them.

Thanks.

I'll reply the topic tonight, thanks for the mention.

zhaoyongjie · 2024-03-26T22:59:58Z

Do you have an example of what the resulting sql query would look like from this api schema?

I would like to know if we should consider different databases capabilities like CTE and JOINS support when comparing and having a fallback as part of this new processing? Just to handle as mentioned before, some cases where some of those are not supported.

@eschutho @Antonio-RiveroMartnez
The SQL will be the same as the time comparison in the previous AA panel, consisting of multiple separate SQL queries. I have also been thinking about how to design a JOIN QueryObject to achieve this use case and replace the entire Pandas Join. However, I am stuck on how to generate an appropriate time expression that contains time delta, as the time delta expression varies in different databases.

In practice, the JOIN in Superset has been discussed many times in the community, but the original technical design doesn't seem to have considered this use case. The data modeling in Superset depends on SQLLab exposing a "virtual dataset" to achieve it.

betodealmeida · 2024-04-01T17:32:25Z

However, I am stuck on how to generate an appropriate time expression that contains time delta, as the time delta expression varies in different databases.

@zhaoyongjie can you give more context on why you'd need to compute a time delta?

zhaoyongjie · 2024-04-01T18:17:25Z

@betodealmeida

If the different data slices would be filtered in datasource, we have to filter the time filter + time delta in the datasource. different database have different time delta expression.

cannmenus-vib · 2024-04-24T14:55:29Z

This is incredibly useful and something I'm surprised isn't built in to superset's tables already. Thanks for putting this together!

zhaoyongjie added the sip Superset Improvement Proposal label Mar 17, 2024

zhaoyongjie changed the title ~~[SIP-WIP] Proposal for Metrics Comparison base on the Temporal Dimension or/and Textual Dimension in all visualization~~ [SIP-WIP] Proposal for Metrics Comparison base on the Temporal Dimension or/and Textual Dimension Mar 17, 2024

zhaoyongjie mentioned this issue Mar 21, 2024

[DO NOT MERGE] demonstrate time comparison in Table chart #27545

Closed

9 tasks

zhaoyongjie changed the title ~~[SIP-WIP] Proposal for Metrics Comparison base on the Temporal Dimension or/and Textual Dimension~~ [SIP-121] Proposal for Metrics Comparison base on the Temporal Dimension or/and Textual Dimension Mar 22, 2024

Antonio-RiveroMartnez mentioned this issue Mar 28, 2024

feat(plugin): color option for table with time comparison #27716

Merged

9 tasks

zhaoyongjie closed this as completed May 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SIP-121] Proposal for Metrics Comparison base on the Temporal Dimension or/and Textual Dimension #27544

[SIP-121] Proposal for Metrics Comparison base on the Temporal Dimension or/and Textual Dimension #27544

zhaoyongjie commented Mar 17, 2024 •

edited

Loading

eschutho commented Mar 21, 2024 •

edited

Loading

Antonio-RiveroMartnez commented Mar 25, 2024 •

edited

Loading

zhaoyongjie commented Mar 25, 2024

zhaoyongjie commented Mar 26, 2024 •

edited

Loading

betodealmeida commented Apr 1, 2024

zhaoyongjie commented Apr 1, 2024

cannmenus-vib commented Apr 24, 2024

[SIP-121] Proposal for Metrics Comparison base on the Temporal Dimension or/and Textual Dimension #27544

[SIP-121] Proposal for Metrics Comparison base on the Temporal Dimension or/and Textual Dimension #27544

Comments

zhaoyongjie commented Mar 17, 2024 • edited Loading

Background

Motivation

Proposed Changes

New or Changed Public Interfaces

New dependencies

Migration Plan and Compatibility

Rejected Alternatives

eschutho commented Mar 21, 2024 • edited Loading

Antonio-RiveroMartnez commented Mar 25, 2024 • edited Loading

zhaoyongjie commented Mar 25, 2024

zhaoyongjie commented Mar 26, 2024 • edited Loading

betodealmeida commented Apr 1, 2024

zhaoyongjie commented Apr 1, 2024

cannmenus-vib commented Apr 24, 2024

zhaoyongjie commented Mar 17, 2024 •

edited

Loading

eschutho commented Mar 21, 2024 •

edited

Loading

Antonio-RiveroMartnez commented Mar 25, 2024 •

edited

Loading

zhaoyongjie commented Mar 26, 2024 •

edited

Loading