Make timestamp expression native SQLAlchemy element #7131

villebro · 2019-03-26T17:17:46Z

SUMMARY

The functional part of this PR makes the timestamp expression a native SQLAlchemy element that respects quoting rules for column names (based largely on https://docs.sqlalchemy.org/en/latest/core/compiler.html). This is a continuation of what was started in #6897, which was really just a hack to make Postgres work and didn't address e.g. reserved keywords or similar problems that might arise in other databases. This PR adds a new class, TimestampExpression, that can be used in a SQLAlchemy query and keeps the target column as a native Core element. The column name is only rendered to text once the query is compiled, which ensures that engine-specific quoting rules are respected.
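
To illustrate the quoting problem described above, here is a minimal sketch (not code from this PR) that compiles the same reserved-word column name twice against the PostgreSQL dialect: once as literal SQL text and once as a native Core column element. The column name `order` is an assumption chosen only because it is a reserved word.

```python
from sqlalchemy import column, literal_column
from sqlalchemy.dialects import postgresql

dialect = postgresql.dialect()

# Name burned in as literal SQL text: rendered verbatim, never quoted.
as_text = str(literal_column("order").compile(dialect=dialect))

# Native column element: the dialect decides about quoting at compile time.
as_element = str(column("order").compile(dialect=dialect))

print(as_text)
print(as_element)
```

Only the native element gives the dialect a chance to quote the identifier, which is the property TimestampExpression is meant to preserve for the target column.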

Refactoring:

Added mypy typing where code was changed.
Moved some functional parts of sqla/models/get_timestamp_expression() to db_engine_specs/get_timestamp_expr().
Removed models/core/grains() which was no longer needed.

TEST PLAN

Added unit tests that test the central features and tested timeseries graphs using both column and expression, both with and without time grains.

ADDITIONAL INFORMATION

[ ] Has associated issue:
[ ] Changes UI
[ ] Requires DB Migration. Confirm DB Migration upgrade and downgrade tested.
[ ] Introduces new feature or API
[ ] Removes existing feature or API
[x] Fixes bug
[x] Refactors code
[x] Adds test(s)

REVIEWERS

Comments much appreciated @betodealmeida @john-bodley. Also @agrawaldevesh: I had to change the Pinot logic for this PR. Do you have the opportunity to test if this works on your Pinot deployment (I don't have a Pinot installation handy right now)? I also added a few grains while at it, do they work?

john-bodley · 2019-03-26T17:50:22Z

superset/db_engine_specs.py

@@ -117,16 +138,31 @@ class BaseEngineSpec(object):
    max_column_name_length = None

    @classmethod
-    def get_time_expr(cls, expr, pdf, time_grain, grain):
+    def get_time_expr(cls, col: ColumnClause, pdf: Optional[str],


Shouldn't you also provide a default value if pdf and time_grain are optional? Also, in the docstring the word Optional is redundant, as that's implied by the type hints.

It's my understanding that Optional[str] is synonymous with Union[None, str], not that it is an optional argument. Good point about the docstring.
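
A minimal sketch of the distinction made in the reply above, using hypothetical function names that are not part of the PR: `Optional[str]` only widens the accepted type to include `None`, while a default value is what makes the argument omittable.

```python
from typing import Optional


def without_default(pdf: Optional[str]) -> str:
    # pdf may be None, but the caller must still pass it explicitly.
    return "no format" if pdf is None else pdf


def with_default(pdf: Optional[str] = None) -> str:
    # The default value, not the annotation, makes the argument omittable.
    return "no format" if pdf is None else pdf


def omitting_argument_fails() -> bool:
    # Returns True if calling without_default() with no argument raises.
    try:
        without_default()  # type: ignore[call-arg]
    except TypeError:
        return True
    return False
```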

agrawaldevesh · 2019-03-26T18:05:20Z

Thanks @villebro, Happy to "test this in prod" at Uber and report back to you. The code changes look okay to me and I think they should work.

I have just one question: I am wondering whether TimestampExpression could be a good vehicle to fix the TZ issues plaguing Superset, for example #6768 and #1149.

The problem is that Superset is essentially TZ-unaware and assumes that the stored data is in the local TZ. I was curious whether TimestampExpression could eventually have a ".tz" field, so that for databases that do support TZs, that field is plumbed in when generating the timestamp expression SQL.

mistercrunch · 2019-03-27T00:51:23Z

superset/connectors/sqla/models.py

-    def get_timestamp_expression(self, time_grain):
-        """Getting the time component of the query"""
+    def get_timestamp_expression(self, time_grain: Optional[str]) \
+            -> Union[TimeExpression, Label]:


Type annotations FTW!

codecov-io · 2019-03-27T08:54:27Z

Codecov Report

Merging #7131 into master will increase coverage by 0.03%.
The diff coverage is 94.28%.

@@            Coverage Diff             @@
##           master    #7131      +/-   ##
==========================================
+ Coverage   65.24%   65.27%   +0.03%     
==========================================
  Files         435      435              
  Lines       21503    21502       -1     
  Branches     2379     2379              
==========================================
+ Hits        14030    14036       +6     
+ Misses       7353     7346       -7     
  Partials      120      120

Impacted Files	Coverage Δ
superset/models/core.py	`83.64% <ø> (-0.11%)`	⬇️
superset/connectors/sqla/models.py	`81.98% <100%> (+0.11%)`	⬆️
superset/db_engine_specs.py	`62.73% <92.3%> (+0.83%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fc3b043...e317cde. Read the comment docs.

villebro · 2019-03-30T07:31:57Z

Rebased to fix conflict. @agrawaldevesh thanks for offering to help verify this PR! Regarding the timezone issue, I think it's something that runs slightly deeper in the code, so it would probably require some additional planning (perhaps a SIP). Perhaps something along the lines of a setting in superset_config.py to specify a timezone to which all timestamps with timezone info would be cast (naive timestamps would probably be kept unchanged)? Anyway, a really good idea worth investigating (committer guidance would be helpful).

villebro · 2019-04-14T12:12:37Z

kind reminder @john-bodley @agrawaldevesh for final comments on this PR.

agrawaldevesh · 2019-04-21T16:55:36Z

superset/db_engine_specs.py

+
+@compiles(TimestampExpression)
+def compile_timegrain_expression(element: TimestampExpression, compiler, **kw):
+    return element.name.replace('{col}', compiler.process(element.col, **kw))


What is the "name" of a ColumnClause?

Do we need to pass this further through the compiler? Is this supposed to return a string, or some SQLAlchemy construct, in which case we might have to do:

compiler.process(element.name.replace('{col}', compiler.process(element.col, **kw)))

name is the part containing the column name/expression; think SELECT name FROM table. In this case name will be a string containing a token {col} to indicate where the native element col should be compiled. The reason the element is kept in native form is that at construction time we want to build an engine-agnostic select statement (SqlaTable.get_sqla_query() in connectors/sqla/models.py). By doing it this way we can get a different rendering of name depending on the dialect when we finally compile the whole statement (SqlaTable.get_query_str_extended() in connectors/sqla/models.py). Example:

name = TRUNC(CAST({col} as DATE), 'MI') (string)

col = MyMixedCaseCol (Any Core SQLAlchemy element that can be compiled, but in this example a ColumnClause whose name = MyMixedCaseCol)

When compiling this using different dialects we would get different results:

MSSQL: TRUNC(CAST([MyMixedCaseCol] as DATE), 'MI')

Oracle: TRUNC(CAST("MyMixedCaseCol" as DATE), 'MI')

And so forth. I'm actually working on adding a more generalized version of this to SQLAlchemy, but since it requires fairly big changes to the codebase I don't expect to see it ship in a released version before the end of this year. Hence the simple version above.
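
The example above can be reproduced with a short, self-contained sketch. The compile hook mirrors the function shown in the diff; the class body (a ColumnClause subclass that stores the template in `name` and the native element in `col`) is an assumption about the shape of the class rather than a copy of the PR's code.

```python
from sqlalchemy import column
from sqlalchemy.dialects import mssql, oracle
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import ColumnClause


class TimestampExpression(ColumnClause):
    """Template string lives in `name`, native column element in `col`."""

    def __init__(self, expr, col, **kwargs):
        super().__init__(expr, **kwargs)
        self.col = col


@compiles(TimestampExpression)
def compile_timegrain_expression(element, compiler, **kw):
    # The column is rendered (and quoted) by the active dialect's compiler,
    # then substituted for the {col} token in the template.
    return element.name.replace("{col}", compiler.process(element.col, **kw))


expr = TimestampExpression(
    "TRUNC(CAST({col} as DATE), 'MI')", column("MyMixedCaseCol")
)
mssql_sql = str(expr.compile(dialect=mssql.dialect()))
oracle_sql = str(expr.compile(dialect=oracle.dialect()))

print(mssql_sql)
print(oracle_sql)
```

The hook returns a plain string, which is what a `@compiles` function is expected to produce, so no second `compiler.process()` call around the result is needed.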

agrawaldevesh · 2019-04-21T17:17:11Z

superset/db_engine_specs.py

@@ -1485,20 +1503,21 @@ class PinotEngineSpec(BaseEngineSpec):
    inner_joins = False
    supports_column_aliases = False

-    _time_grain_to_datetimeconvert = {
+    # Pinot does its own conversion below
+    time_grain_functions: Dict[Optional[str], str] = {


My concern with renaming this to time_grain_functions is that the values don't really behave like "functions" (as they do in the rest of the db engine specs).

For example, if someone (other than the Pinot-specific get_timestamp_expr) starts using them, they wouldn't be getting a function. So I would be wary of changing this.

If anything, we can make time_grain_functions = None (in case they are set in the BaseEngineSpec), so that no one can use them.

And switch this back to its internal name that the below pinot specific get_timestamp_expr can use.

The way I see it all values in time_grain_functions are merely technical strings that are used by the spec to construct the final TimestampExpression. The fact that Pinot does this slightly differently I think is ok; all interaction with them should always be done via the get_timestamp_expr() function anyway, be it Pinot or Postgres. But I agree that the naming is slightly confusing. I would personally stick with my proposal, but I will change it back if you feel it is more intuitive.

Okay. This was a nit anyway :D.

Should we make time_grain_functions private then? _time_grain_functions?

The name is a bit of a misnomer now: the values in this dictionary are no longer pure functions.

Or we can just punt on this and perhaps add the comment above ("time_grain_functions are merely technical strings ....") as a code comment.

It's fine either way.

I agree, it should be made private as it is. However, I think I'll let this be as it is and keep this in mind the next time the time grains are refactored (originally they were a tuple of tuples, which is currently emulated via BaseEngineSpec.get_time_grains()). Perhaps in the next iteration they'll become more final.
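
The naming concern in this thread can be sketched as follows. Both classes and all dictionary values are hypothetical stand-ins, not the PR's engine specs, and the DATETIMECONVERT arguments are illustrative assumptions. The point is only that one dictionary holds complete SQL templates while the other holds fragments that are meaningful solely inside its own get_timestamp_expr().

```python
from typing import Dict, Optional


class SketchBaseSpec:
    # Values are complete SQL templates containing a {col} token.
    time_grain_functions: Dict[Optional[str], str] = {
        None: "{col}",
        "PT1H": "DATE_TRUNC('hour', {col})",
    }

    @classmethod
    def get_timestamp_expr(cls, time_grain: Optional[str]) -> str:
        return cls.time_grain_functions[time_grain]


class SketchPinotLikeSpec(SketchBaseSpec):
    # Values are only granularity tokens; the override below builds the
    # final template, so the dictionary is not usable on its own.
    time_grain_functions: Dict[Optional[str], str] = {"PT1H": "1:HOURS"}

    @classmethod
    def get_timestamp_expr(cls, time_grain: Optional[str]) -> str:
        granularity = cls.time_grain_functions[time_grain]
        fmt = "1:SECONDS:EPOCH"
        return f"DATETIMECONVERT({{col}}, '{fmt}', '{fmt}', '{granularity}')"


base_expr = SketchBaseSpec.get_timestamp_expr("PT1H")
pinot_like_expr = SketchPinotLikeSpec.get_timestamp_expr("PT1H")
```

Callers that always go through get_timestamp_expr() receive a usable {col} template from either class, which is the argument made above for keeping the shared attribute name.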

agrawaldevesh · 2019-04-21T17:17:56Z

Hi @villebro , I left some comments inline. LGTM to me otherwise.

villebro · 2019-04-22T05:07:28Z

> Hi @villebro , I left some comments inline. LGTM to me otherwise.

Thanks @agrawaldevesh , great comments as always! Were you able to get this working? And was the unit test ok?

agrawaldevesh · 2019-04-22T19:42:41Z

> > Hi @villebro , I left some comments inline. LGTM to me otherwise.

> Thanks @agrawaldevesh , great comments as always! Were you able to get this working? And was the unit test ok?

Hi Ville ... Everything works. The unit test is proper and also I was able to merge this branch and test it out in my prod environment and superset had no problems. All charts and everything rendered fine. The generated queries remained the same.

So this PR is totally fine from Pinot perspective.

villebro · 2019-04-24T18:08:16Z

@mistercrunch @john-bodley this one should be ok to ship.

mistercrunch

I did a pass on the PR and it LGTM. @agrawaldevesh, has this been running in production on your side? Which engines are you using there? Presto + Pinot, I'm assuming?

agrawaldevesh · 2019-04-25T07:08:14Z

Hi Max ... I tested this with Pinot natively and it works. Presto+pinot wouldn't really test the pinot side of superset (since superset would just speak presto-sql then). Having tested it, I am confident that this PR would work with Pinot.

mistercrunch · 2019-04-30T00:20:39Z

superset/db_engine_specs.py

        # if epoch, translate to DATE using db specific conf
        if pdf == 'epoch_s':
-            expr = cls.epoch_to_dttm().format(col=expr)
+            time_expr = time_expr.replace('{col}', cls.epoch_to_dttm())


.format does much more than just replacing strings. There's a whole mini language behind it that people might have used (even though it's unlikely)

Previously the code conditionally "burned" the column name/expression into the epoch expression (which later gets conditionally "burned" into the timegrain expression), which is what this PR tries to avoid. In this proposal the epoch expression is instead conditionally burned into the timestamp expression, making it possible to keep the column object in its native form until compilation time. I can change the .replace logic to .format, but in this context I don't think it will make a difference.
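
A small sketch of the two points in this exchange, using hypothetical templates rather than any real engine spec: nesting the epoch conversion inside the grain template keeps the {col} token alive with either method, but .replace and .format diverge as soon as a template contains other braces or format-spec syntax.

```python
# Hypothetical templates for illustration; real engine specs define their own.
grain_template = "DATE_TRUNC('hour', {col})"
epoch_template = "to_timestamp({col})"

# Nest the epoch conversion inside the grain template. The inner {col}
# token survives, so the column can still be compiled natively later.
via_replace = grain_template.replace("{col}", epoch_template)
via_format = grain_template.format(col=epoch_template)

# Other braces: str.replace leaves them alone, while str.format treats
# them as replacement fields and fails on the unknown name.
braced_template = "JSON_VALUE({col}, '$.{key}')"
replaced = braced_template.replace("{col}", epoch_template)
try:
    braced_template.format(col=epoch_template)
    format_error = None
except KeyError as exc:
    format_error = exc

# Mini-language syntax: str.format understands {col!s}, str.replace does not.
conversion_template = "CAST({col!s} AS DATE)"
untouched = conversion_template.replace("{col}", epoch_template)
formatted = conversion_template.format(col=epoch_template)
```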

mistercrunch · 2019-04-30T00:21:03Z

superset/db_engine_specs.py

        elif pdf == 'epoch_ms':
-            expr = cls.epoch_ms_to_dttm().format(col=expr)
+            time_expr = time_expr.replace('{col}', cls.epoch_ms_to_dttm())


gogitub · 2021-01-04T18:35:36Z

Hi,
I need help with a similar requirement.
We have many timestamp with time zone fields in an Oracle database, which is the source for our Superset (version 0.36) reports.
We need to show the data in EST, PST, or some other time zone, depending on customer needs.

Ex:
Data in Oracle database table is like '2021-01-04 21:48:56.66 +05:30'
If we want to display in EST, report should show it like : 2021-01-04T11:18:56.660000
If we want to display in PST, report should show it like : 2021-01-04T08:18:56.660000
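
The expected values above can be checked with a short standard-library sketch. It assumes fixed UTC offsets for EST (-05:00) and PST (-08:00), ignores daylight saving time, and is unrelated to any Superset configuration option; `display_in` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Parse the stored value, including its +05:30 offset.
stored = datetime.strptime(
    "2021-01-04 21:48:56.66 +05:30", "%Y-%m-%d %H:%M:%S.%f %z"
)

# Fixed UTC offsets for standard time; these ignore daylight saving.
EST = timezone(timedelta(hours=-5))
PST = timezone(timedelta(hours=-8))


def display_in(value: datetime, tz: timezone) -> str:
    # Convert to the target zone, then drop the offset for a naive rendering.
    return value.astimezone(tz).replace(tzinfo=None).isoformat()


est_value = display_in(stored, EST)
pst_value = display_in(stored, PST)
```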

So, is there any configuration parameter that we can modify to display the data in desired time zone (like EST/PST or some other)?

Please advise. Thanks in advance.

gogitub · 2021-01-05T07:25:47Z

Also, if there is a way, we need to display the data in the time zone set in each user's profile or preferences.

gogitub · 2021-01-05T17:26:01Z

We have observed that Superset automatically converts the timestamp fields to EST when displaying them in SQL Lab or in a chart.
Does anyone know where exactly we can change this setting to show a different time zone, or how to make it configurable through a config parameter?

john-bodley reviewed Mar 26, 2019

mistercrunch reviewed Mar 27, 2019

villebro changed the title ~~Make time expression native SQLAlchemy element~~ Make timestamp expression native SQLAlchemy element Mar 27, 2019

kristw added the enhancement:request and review labels Mar 27, 2019

john-bodley reviewed Apr 14, 2019

villebro mentioned this pull request Apr 19, 2019

Can't select wb_health_population columns using Postgres #6009

Closed

3 tasks

agrawaldevesh reviewed Apr 21, 2019

mistercrunch approved these changes Apr 25, 2019

mistercrunch requested changes Apr 30, 2019

mistercrunch approved these changes May 2, 2019

villebro added 9 commits May 30, 2019 08:07

Add native sqla component for time expressions

f1d337a

Add unit tests and remove old tests

c1cf05b

Remove redundant _grains_dict method

b133c71

Clarify time_grain logic

2080652

Add docstrings and typing

ea14fe5

Fix flake8 errors

8123005

Add missing typings

2c66cf7

Rename to TimestampExpression

a73bf26

Remove redundant tests

d8aa292

Fix broken reference to db.database_name due to refactor

e317cde

villebro merged commit 34407e8 into apache:master May 30, 2019

mistercrunch added the 🏷️ bot and 🚢 0.34.0 labels Feb 28, 2024
