Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SIP-15] Fixing datetime conversion and SQL literal #8464

Merged

Conversation

john-bodley
Copy link
Member

@john-bodley john-bodley commented Oct 29, 2019

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

This PR updates a number of aspects related to the datetime SQL literal logic which is used for the RHS of the time filter comparisons. Specifically:

  1. Make DbEngineSpec.convert_dttm optionally return None if a type conversion does not exist.
  2. Updates the logic in dttm_sql_literal to first try to convert a datetime object to a SQL expression (1) prior to converting either a string on numerical type based on the python-date-format. The reason for this change is we should handle known types before falling back to using the python-date-format, as in theory these could be defined incorrectly for non-string/float types.
  3. Updates the convert_dttm logic for a number of database engines. There were a handful of examples where this was simply returning either i) a string representation of the datetime object rather than the necessary SQL expression for casting from a datetime object to the native temporal type, or ii) incorrect datetime formatting or casting logic. Note I'm not familiar with a number of these database engines and thus the updated conversions were the result of Googling. This logic should probably be validated.
  4. Adds unit tests for the database engine specific convert_dttm function.
  5. Uses f-strings rather than str.format(...).
  6. Replaced dttm.isoformat()[:10] with dttm.date().isoformat() in improve readability.

Regarding timestamp precision some databases only seem to support seconds whereas others support microseconds. Where possible the datetime object is formatted to the highest level precision available.

Note for context I sense this PoC is probably the end state for correctly handling the necessary casting of types (#7682).

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TEST PLAN

CI and additional unit tests.

ADDITIONAL INFORMATION

REVIEWERS

to: @betodealmeida @dpgaspar @etr2460 @michellethomas @mistercrunch @villebro

return "'{}'".format(dttm.strftime("%Y-%m-%d"))
return "'{}'".format(dttm.strftime("%Y-%m-%d %H:%M:%S"))
return f"CAST('{dttm.isoformat()[:10]}' AS DATE)"
if tt == "TIMESTAMP":
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@betodealmeida do you know why the TIMESTAMP type wasn't defined for BigQuery?

return f"'{dttm.strftime('%Y-%m-%d %H:%M:%S')}'"
def convert_dttm(cls, target_type: str, dttm: datetime) -> Optional[str]:
if target_type.upper() in ("DATE", "DATETIME"):
return f"CAST('{dttm.isoformat()}' AS TIMESTAMP)"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dpgaspar I know you recently added this and I was hoping you could validate the logic.

@john-bodley john-bodley force-pushed the john-bodley--fix-convert-dttm branch 2 times, most recently from b65aa7a to 3122919 Compare October 29, 2019 00:19
if tt == "TIMESTAMP":
return "from_iso8601_timestamp('{}')".format(dttm.isoformat())
return "CAST ('{}' AS TIMESTAMP)".format(dttm.strftime("%Y-%m-%d %H:%M:%S"))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't be casting other types to TIMESTAMP. The reason for this function is to ensure that the LHS and RHS for the time filter comparison are equivalent types.

"""
return "'{}'".format(dttm.strftime("%Y-%m-%d %H:%M:%S"))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now handled in dttm_sql_literal as we need to determine if the python-date-format logic needs to be invoked.

@@ -50,8 +50,10 @@ def epoch_to_dttm(cls):
return "dateadd(S, {col}, '1970-01-01')"

@classmethod
def convert_dttm(cls, target_type: str, dttm: datetime) -> str:
return "CONVERT(DATETIME, '{}', 126)".format(dttm.isoformat())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't be casting all types to a DATETIME. The reason for this function is to ensure that the LHS and RHS for the time filter comparison are equivalent types.

)
def convert_dttm(cls, target_type: str, dttm: datetime) -> Optional[str]:
tt = target_type.upper()
if tt == "DATE":
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note there was no previous logic for handling the DATE type.

@john-bodley john-bodley changed the title [sql] Fixing datetime SQL literal [sql] Fixing datetime conversion and SQL literal Oct 29, 2019
@john-bodley john-bodley force-pushed the john-bodley--fix-convert-dttm branch 2 times, most recently from 827ab4c to 4f1543a Compare October 29, 2019 04:31
@john-bodley john-bodley changed the title [sql] Fixing datetime conversion and SQL literal [SIP-15] Fixing datetime conversion and SQL literal Oct 29, 2019
Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A very quick first pass, will look in closer detail later.

superset/db_engine_specs/mssql.py Outdated Show resolved Hide resolved
Comment on lines 76 to 80
for target_type in ("DATE", "DATETIME", "SMALLDATETIME", "TIMESTAMP"):
self.assertEqual(
MssqlEngineSpec.convert_dttm(target_type, dttm),
"CONVERT(DATETIME, '2019-01-02T03:04:05.678900', 126)",
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

..and the test here.

@john-bodley john-bodley force-pushed the john-bodley--fix-convert-dttm branch 2 times, most recently from 3ea4968 to fa2d521 Compare October 29, 2019 18:29
@codecov-io
Copy link

codecov-io commented Oct 29, 2019

Codecov Report

Merging #8464 into master will decrease coverage by 0.04%.
The diff coverage is 42.55%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8464      +/-   ##
==========================================
- Coverage   66.57%   66.52%   -0.05%     
==========================================
  Files         449      449              
  Lines       22567    22595      +28     
  Branches     2367     2367              
==========================================
+ Hits        15023    15032       +9     
- Misses       7406     7425      +19     
  Partials      138      138
Impacted Files Coverage Δ
superset/db_engine_specs/db2.py 90% <ø> (+4.28%) ⬆️
superset/db_engine_specs/mssql.py 48.83% <11.11%> (-9.5%) ⬇️
superset/db_engine_specs/oracle.py 66.66% <25%> (-25%) ⬇️
superset/db_engine_specs/bigquery.py 42.66% <25%> (-2.41%) ⬇️
superset/db_engine_specs/presto.py 22.43% <25%> (ø) ⬆️
superset/db_engine_specs/hive.py 27.85% <25%> (ø) ⬆️
superset/db_engine_specs/impala.py 59.09% <33.33%> (-5.91%) ⬇️
superset/db_engine_specs/kylin.py 57.14% <40%> (+3.29%) ⬆️
superset/db_engine_specs/clickhouse.py 62.5% <40%> (+2.5%) ⬆️
superset/db_engine_specs/elasticsearch.py 81.25% <40%> (ø) ⬆️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5fb0bcb...52ae511. Read the comment docs.

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second pass

superset/db_engine_specs/oracle.py Outdated Show resolved Hide resolved
superset/db_engine_specs/bigquery.py Outdated Show resolved Hide resolved
Comment on lines -52 to +59
def convert_dttm(cls, target_type: str, dttm: datetime) -> str:
def convert_dttm(cls, target_type: str, dttm: datetime) -> Optional[str]:
tt = target_type.upper()
if tt == "DATE":
return "CAST('{}' AS DATE)".format(dttm.isoformat()[:10])
return f"TO_DATE('{dttm.date().isoformat()}', 'yyyy-MM-dd')"
elif tt == "TIMESTAMP":
return "CAST('{}' AS TIMESTAMP)".format(dttm.strftime("%Y-%m-%d %H:%M:%S"))
return "'{}'".format(dttm.strftime("%Y-%m-%d %H:%M:%S"))
return f"""TO_TIMESTAMP('{dttm.isoformat(sep=" ", timespec="seconds")}', 'yyyy-MM-dd HH:mm:ss')""" # pylint: disable=line-too-long
return None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cgivre can you check if these are valid and if there's anything missing? A quick googling didn't turn up any DATETIME type for Drill, just DATE, TIME and TIMESTAMP.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

Comment on lines 186 to 188
elif tt == "TIMESTAMP":
return "CAST('{}' AS TIMESTAMP)".format(dttm.strftime("%Y-%m-%d %H:%M:%S"))
return "'{}'".format(dttm.strftime("%Y-%m-%d %H:%M:%S"))
return f"""CAST('{dttm.isoformat(sep=" ", timespec="seconds")}' AS TIMESTAMP)""" # pylint: disable=line-too-long
return None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

superset/db_engine_specs/impala.py Outdated Show resolved Hide resolved
def convert_dttm(cls, target_type: str, dttm: datetime) -> Optional[str]:
if target_type.upper() == "TEXT":
return f"""'{dttm.isoformat(sep=" ", timespec="microseconds")}'"""
return None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQLite also supports time stored in REAL and INTEGER columns according to the docs:

    TEXT as ISO8601 strings ("YYYY-MM-DD HH:MM:SS.SSS").
    REAL as Julian day numbers, the number of days since noon in Greenwich on November 24, 4714 B.C. according to the proleptic Gregorian calendar.
    INTEGER as Unix Time, the number of seconds since 1970-01-01 00:00:00 UTC.

Should we support the REAL and INTEGER conversions here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@willbarrett epoch is handled by specifying epoch_s in the python-date-format.

Also a REAL or INTEGER column could encode temporal information in a different form, i.e., one could be using REAL to store a UNIX timestamp with floating point precision though and thus we can’t blindly handle these types.

Note per the referenced PoC PR the future end goal is to provide an engine specific graph which maps between various SQLAlchemy and datetime types. Much of the time grain bucketing is potentially wrong for string columns.

@john-bodley
Copy link
Member Author

@villebro @willbarrett thanks for the feedback. I've hopefully addressed all your comments (either via code changes or replies to your comments).

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, and big thanks for the big effort with the unit tests. I expect something will break as a consequence of this big change, but these are easy to fix later on, so I don't see any reason not to merge this.

@john-bodley
Copy link
Member Author

Thanks for the review @villebro. I agree there could be a regression, but i) there were clearly some ill-defined mappings for certain dialects, and ii) with the addition of unit tests after any fixes future regressions should be preventable.

@villebro
Copy link
Member

Absolutely @john-bodley ; even in the light of possible regressions I expect this to fix more bugs than it introduces. I propose merging this asap.

@john-bodley john-bodley merged commit 0a3b121 into apache:master Oct 30, 2019
@john-bodley john-bodley deleted the john-bodley--fix-convert-dttm branch October 30, 2019 06:26
def convert_dttm(cls, target_type: str, dttm: datetime) -> Optional[str]:
if target_type.upper() == "DATETIME":
return f"""CAST('{dttm.isoformat(timespec="seconds")}' AS DATETIME)"""
return None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@john-bodley thanks for pointing out to this, the logic looks good and I've retested it and found no problems

@dpgaspar dpgaspar added v0.35 and removed v0.35 labels Dec 20, 2019
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.36.0 labels Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/XL 🚢 0.36.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants