Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Impossible to use “Calculated columns” in “Custom metrics”, as they are not expanded. #15060

Closed
2 of 3 tasks
EBoisseauSierra opened this issue Jun 9, 2021 · 4 comments
Labels
enhancement:request Enhancement request submitted by anyone from the community explore:dataset Related to the dataset of Explore

Comments

@EBoisseauSierra
Copy link
Contributor

When creating a “calculated column” foo (defined, e.g. as col1 + col2), I can't reuse it in the “custom metric” bar, as the generated SQL query uses foo literally (rather than expanding into its col1 + col2 definition).

This fails as foo doesn't actually exist as a column in the table I query.

Expected results

  1. I define a calculated column (here, as the difference of 2 columns of my PostgreSQL table):
    Screenshot from 2021-06-09 10-36-40_shadow
  2. I define a custom metrics that notably reuses my previously defined calculated column:
    Screenshot from 2021-06-09 10-42-21_shadow

I want to be able to now build charts using that custom metrics:
Screenshot from 2021-06-09 10-52-04_shadow

This means, running the following query:

SELECT DATE_TRUNC('hour', timestamp) AS __timestamp,
       AVG(ABS(temperature_feelslike - temperature))::numeric(10, 2)/AVG(temperature) AS avg_temp_offset
FROM public.weather_forecast
GROUP BY DATE_TRUNC('hour', timestamp)
LIMIT 50000;

Actual results

However, trying to use the custom metrics in Explore fails:
Screenshot from 2021-06-09 10-59-37_shadow

Indeed, the following query is executed:

SELECT DATE_TRUNC('hour', timestamp) AS __timestamp,
       AVG(ABS(feelslike_offset))::numeric(10, 2)/AVG(temperature) AS avg_temp_offset
FROM public.weather_forecast
GROUP BY DATE_TRUNC('hour', timestamp)
LIMIT 50000;

(I.e. the feelslike_offset calculated column is not expanded to its definition: temperature_feelslike - temperature.)

Environment

(please complete the following information):

  • superset version: 1.1.0
  • python version: 3.8.9
  • node.js version: node -v

Checklist

Make sure to follow these steps before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included it here as text if there are any.
  • I have reproduced the issue with at least the latest released version of superset.
  • I have checked the issue tracker for the same issue and I haven't found one similar.
@EBoisseauSierra EBoisseauSierra added the #bug Bug report label Jun 9, 2021
@villebro
Copy link
Member

villebro commented Jun 9, 2021

Thanks for posting! I'm not sure this is a bug, rather a feature request to expand calculated columns when referenced in other calculated columns. Something to keep in mind: The implementation could get nasty fairly quickly, as it would involve having to parse the expression, make assumptions about which elements are column references (sqlparse might do this fairly well, not sure) and match them against existing physical and calculated columns, not to mention making sure there are no circular dependencies. But definitely worth considering.

@EBoisseauSierra
Copy link
Contributor Author

You're right:

  • I naively posted that as a bug, because I assumed that the default/expected behaviour is for calculated columns to be re-usable elsewhere. However, they aren't, for example, made available in the SQL-lab — cf. the discussion on Table vs. Dataset in SIP-68 ([SIP-68] A better model for Datasets #14909). I'm fine with reframing this as a feature request.
  • I understand that expansion could lead to some possibly tricky situations. I believe it would make sense to wait for post SIP-68 to develop this feature (if ever), as significant changes are likely to happen in the meantime.

Meanwhile, the already implementable workaround is to manually expand the calculated columns in a custom metric definition — in other words, to base the metric on table's native columns only.

@villebro
Copy link
Member

villebro commented Jun 9, 2021

Another viable solution could be to make a virtual table where the expression is precalculated. This way you get to reference it similar to a regular column without that much trouble, as most modern DBs are able to effectively optimize subqueries so the performance hit should be negligible.

@zhaoyongjie
Copy link
Member

currently, Superset's "Calculated columns" is SQL snippet for specific database, instead of defined in BI-side expression (for instance: tableau function, power bi dax).

It is not easy to know whether "Calculated columns" is a reference type, or a primitive type.

@zhaoyongjie zhaoyongjie added enhancement:request Enhancement request submitted by anyone from the community and removed #bug Bug report labels Jun 9, 2021
@junlincc junlincc added the explore:dataset Related to the dataset of Explore label Jun 10, 2021
@apache apache locked and limited conversation to collaborators Feb 2, 2022
@geido geido converted this issue into discussion #18486 Feb 2, 2022

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
enhancement:request Enhancement request submitted by anyone from the community explore:dataset Related to the dataset of Explore
Projects
None yet
Development

No branches or pull requests

4 participants