Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(trino): Handful of updates for the Trino engine #20152

Merged

Conversation

john-bodley
Copy link
Member

SUMMARY

This PR provides a number of updates for the Trino engine including:

  1. Making the TrinoEngineSpec derive from the PrestoEngineSpec rather than the BaseEngineSpec. I'm not really sure as to why the later was used as it violates the DRY principle and also means that many of the Presto customizations—which Trino is forked from—were previously not defined.
  2. Renames trinonative to trino—which requires a PyHive bump—thanks to @betodealmeida's work chore: rename Trino entry point dropbox/PyHive#428. I thought there was merit going forward using trino rather than trinonative everywhere.
  3. Added a TrinoTemplateProcessor so macros like {{ trino.latest_partition(...) }} etc. can be used. Note made sure the processor also supported the presto key for backwards compatibility given that it is likely people will be migrating from Presto and the presto key is likely embedded in saved queries, virtual datasets, etc.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

CI.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@codecov
Copy link

codecov bot commented May 21, 2022

Codecov Report

Merging #20152 (ec12b57) into master (8097403) will decrease coverage by 0.14%.
The diff coverage is 100.00%.

❗ Current head ec12b57 differs from pull request most recent head d7f2ed4. Consider uploading reports for the commit d7f2ed4 to get more accurate results

@@            Coverage Diff             @@
##           master   #20152      +/-   ##
==========================================
- Coverage   66.52%   66.38%   -0.15%     
==========================================
  Files        1726     1726              
  Lines       64887    64845      -42     
  Branches     6832     6832              
==========================================
- Hits        43169    43046     -123     
- Misses      19986    20067      +81     
  Partials     1732     1732              
Flag Coverage Δ
hive ?
mysql 82.28% <100.00%> (+0.06%) ⬆️
postgres 82.35% <100.00%> (+0.06%) ⬆️
presto ?
python 82.43% <100.00%> (-0.29%) ⬇️
sqlite 82.08% <100.00%> (+0.06%) ⬆️
unit ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/db_engine_specs/trino.py 91.93% <100.00%> (+20.50%) ⬆️
superset/jinja_context.py 91.15% <100.00%> (+0.32%) ⬆️
superset/db_engines/hive.py 0.00% <0.00%> (-85.19%) ⬇️
superset/db_engine_specs/hive.py 70.22% <0.00%> (-15.65%) ⬇️
superset/db_engine_specs/presto.py 83.36% <0.00%> (-5.34%) ⬇️
superset/connectors/sqla/models.py 89.05% <0.00%> (-1.19%) ⬇️
superset/initialization/__init__.py 91.13% <0.00%> (-0.36%) ⬇️
superset/db_engine_specs/base.py 88.08% <0.00%> (-0.34%) ⬇️
superset/models/core.py 88.91% <0.00%> (-0.25%) ⬇️
superset/utils/core.py 90.03% <0.00%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8097403...d7f2ed4. Read the comment docs.

"P1M": "date_trunc('month', CAST({col} AS TIMESTAMP))",
"P3M": "date_trunc('quarter', CAST({col} AS TIMESTAMP))",
"P1Y": "date_trunc('year', CAST({col} AS TIMESTAMP))",
# "1969-12-28T00:00:00Z/P1W", # Week starting Sunday
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really not sure why these were commented out. This has always been the case since the first version of this file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trino was forked from Presto, I think comments this is safe. thanks for keeping clear up.

@john-bodley john-bodley force-pushed the john-bodley--trino-db-engine-spec branch 2 times, most recently from 5925872 to 3aee525 Compare May 21, 2022 04:25
@john-bodley john-bodley requested a review from etr2460 June 1, 2022 23:50
Copy link
Member

@zhaoyongjie zhaoyongjie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@john-bodley john-bodley force-pushed the john-bodley--trino-db-engine-spec branch from 3aee525 to d7f2ed4 Compare June 3, 2022 21:56
@@ -18,7 +18,6 @@
#
-r base.in
-e .[cors,druid,hive,mysql,postgres,thumbnails]
flask-cors>=2.0.0
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate logic per line #20.

@@ -151,7 +150,7 @@ def get_git_sha() -> str:
"oracle": ["cx-Oracle>8.0.0, <8.1"],
"pinot": ["pinotdb>=0.3.3, <0.4"],
"postgres": ["psycopg2-binary==2.9.1"],
"presto": ["pyhive[presto]>=0.4.0"],
"presto": ["pyhive[presto]>=0.6.5"],
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bumping to ensure that PyHive doesn't specify the trino dialect.

@john-bodley john-bodley merged commit b08e21e into apache:master Jun 3, 2022
@john-bodley john-bodley deleted the john-bodley--trino-db-engine-spec branch June 3, 2022 22:27
@kevinwengh
Copy link

Looks like this commit broke the trino integration. Error "trino error 'Cursor' object has no attribute 'poll'" was thrown with latest docker image. There is no error just if I use docker image just before this commit.

@dungdm93
Copy link
Contributor

@john-bodley, @zhaoyongjie guys. Trino engine using sqlalchemy-trino and now official trino-python-client driver. Unlike Presto and Hive which are based on PyHive.
I'm not sure make Trino inherit from PrestoEngineSpec is good or should we leave it going its own path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 2.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants