Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(elasticsearch): time_zone setting does not work for cast datetime expressions #17048

Merged
merged 13 commits into from
Nov 25, 2021

Conversation

aniaan
Copy link
Contributor

@aniaan aniaan commented Oct 9, 2021

SUMMARY

The elasticsearch CAST function does not take effect for the time zone setting. In ES7.8 and above, we can use the DATETIME_PARSE function to solve this problem.

In the current master code, the extra dict in the database contains the version key. We can use version to solve this problem. According to the instructions, version is set for presto at the beginning, and ES should also reuse this field.

TESTING INSTRUCTIONS

  1. Install the latest elasticsearch-dbapi version, 0.2.6, which supports setting the time_zone parameter pip install elasticsearch-dbapi==0.2.6
  2. Follow the steps in Elasticsearch time_zone setting does not work for cast datetime expressions #16726 to create an ES data source, create a data source, set the time_zone parameter and version

截屏2021-10-09 下午4 47 33

  1. Follow the operation steps in Elasticsearch time_zone setting does not work for cast datetime expressions #16726 to analyze, and you can see that the DATETIME_PARSE function and time_zone parameter play a correct role

截屏2021-10-09 下午4 50 14

ADDITIONAL INFORMATION

@codecov
Copy link

codecov bot commented Oct 9, 2021

Codecov Report

Merging #17048 (aa600a6) into master (3ee9e11) will increase coverage by 0.08%.
The diff coverage is 95.58%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17048      +/-   ##
==========================================
+ Coverage   76.86%   76.95%   +0.08%     
==========================================
  Files        1042     1042              
  Lines       56254    56271      +17     
  Branches     7785     7785              
==========================================
+ Hits        43242    43301      +59     
+ Misses      12754    12712      -42     
  Partials      258      258              
Flag Coverage Δ
hive 81.54% <94.11%> (+0.01%) ⬆️
mysql 81.96% <94.11%> (+0.01%) ⬆️
postgres 81.97% <94.11%> (+0.01%) ⬆️
presto 81.83% <95.58%> (?)
python 82.47% <95.58%> (+0.15%) ⬆️
sqlite 81.65% <94.11%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/db_engine_specs/base.py 88.54% <ø> (-0.05%) ⬇️
superset/models/sql_types/base.py 52.63% <50.00%> (ø)
superset/db_engine_specs/databricks.py 90.00% <66.66%> (ø)
superset/db_engine_specs/mssql.py 95.83% <66.66%> (ø)
superset/connectors/sqla/models.py 88.42% <100.00%> (+1.40%) ⬆️
superset/connectors/sqla/utils.py 92.15% <100.00%> (ø)
superset/db_engine_specs/athena.py 89.28% <100.00%> (ø)
superset/db_engine_specs/bigquery.py 84.84% <100.00%> (ø)
superset/db_engine_specs/clickhouse.py 66.03% <100.00%> (ø)
superset/db_engine_specs/crate.py 92.30% <100.00%> (ø)
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3ee9e11...aa600a6. Read the comment docs.

cls, target_type: str, dttm: datetime, **kwargs: Any
) -> Optional[str]:

if target_type.upper() != utils.TemporalType.DATETIME:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the previous logic of not short circuiting is clearer, i.e., if DATETIME then .


if es_version and StrictVersion(es_version) >= StrictVersion("7.8"):
datetime_formatted = dttm.isoformat(sep=" ", timespec="seconds")
return f"""DATETIME_PARSE('{datetime_formatted}', 'yyyy-MM-dd HH:mm:ss')"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the f-string as previously?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was first written according to the way of not short-circuiting, so there will be nested if. If the f-string is written in one line, then the length of this line will be too long, so the expression is extracted. This way of writing refers to drill .py#convert_dttm method

superset/db_engine_specs/elasticsearch.py Outdated Show resolved Hide resolved
Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! First pass observation: one concern regarding the changed signature of convert_dttm, other than that the changes here make sense. Curious to hear your thoughts on the signature, happy to take a second pass + do some testing later.

@@ -686,13 +685,14 @@ def df_to_sql(

@classmethod
def convert_dttm( # pylint: disable=unused-argument
cls, target_type: str, dttm: datetime,
cls, target_type: str, dttm: datetime, **kwargs: Any,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason this is **kwargs: Any and not db_extra: Optional[Dict[str, Any]] like has been done in get_column_spec? Unpacking into kwargs will make more difficult to add new parameters to this method going forward

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, the writing here is not strict enough, for db_extra, it is better to show the declaration type, I think, the method signature can be like this, def convert_dttm(cls, target_type: str, dttm: datetime, db_extra: Optional[Dict[str, Any]], **kwargs: Any) also continue to retain kwargs, so that later if a data source needs non-db_extra information, this way is also compatible, what do you think

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aniaan I'd prefer not to add **kwargs unless it's currently needed (it's easy enough to add later when the need comes up). And thinking more closely, I'd personally prefer to keep all arguments named so that all parameters in the method are explicit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can, then we will not add it for now, and we will consider it later if we have this situation.
I'll correct the PR later

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@villebro I have updated it, you can review it when you have time

@aniaan aniaan requested a review from villebro November 11, 2021 11:39
Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some additional comments, but in general LGTM

cls, target_type: str, dttm: datetime, db_extra: Optional[Dict[str, Any]] = None
) -> Optional[str]:

db_extra = db_extra if db_extra else {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: db_extra = db_extra or {} should do the job, too

Comment on lines 73 to 77
if es_version and StrictVersion(es_version) >= StrictVersion("7.8"):
datetime_formatted = dttm.isoformat(sep=" ", timespec="seconds")
return (
f"""DATETIME_PARSE('{datetime_formatted}', 'yyyy-MM-dd HH:mm:ss')"""
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should have a fallback/more clear error message if the version isn't parseable by StrictVersion (Since this is to be populated by the user, we can expect to bump into unparseable values here):

>>> from distutils.version import StrictVersion
>>> StrictVersion("7.8.0.0")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ville/.pyenv/versions/3.8-dev/lib/python3.8/distutils/version.py", line 40, in __init__
    self.parse(vstring)
  File "/Users/ville/.pyenv/versions/3.8-dev/lib/python3.8/distutils/version.py", line 137, in parse
    raise ValueError("invalid version number '%s'" % vstring)
ValueError: invalid version number '7.8.0.0'

Same for non-string values (I expect someone may try to enter it as a number, not string):

>>> StrictVersion(7.8)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ville/.pyenv/versions/3.8-dev/lib/python3.8/distutils/version.py", line 40, in __init__
    self.parse(vstring)
  File "/Users/ville/.pyenv/versions/3.8-dev/lib/python3.8/distutils/version.py", line 135, in parse
    match = self.version_re.match(vstring)
TypeError: expected string or bytes-like object

Just in case, perhaps we could do something as simple as

supports_dttm_parse = False
try:
    if es_version:
        supports_dttm_parse = StrictVersion(es_version) >= StrictVersion("7.8")
    except ValueError:
        ...

Copy link
Contributor Author

@aniaan aniaan Nov 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@villebro I have updated it, you can review it when you have time

@aniaan aniaan requested a review from villebro November 16, 2021 15:01
Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for your patience with all the iterations!

@villebro villebro merged commit 5a1c681 into apache:master Nov 25, 2021
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.5.0 labels Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 1.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants