Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: improve get_db_engine_spec_for_backend #21171

Merged

Conversation

betodealmeida
Copy link
Member

@betodealmeida betodealmeida commented Aug 23, 2022

SUMMARY

How does Superset map from a SQLAlchemy URI — eg, postgres+psycopg2://user:password@host/db — to a DB engine spec? Currently the mapping is done using only the backend part of the URI; in this example postgres matches the PostgresEngineSpec engine spec since it's listed in the engine_aliases class attribute:

class PostgresEngineSpec(PostgresBaseEngineSpec, BasicParametersMixin):
engine = "postgresql"
engine_aliases = {"postgres"}

This matching has worked perfectly until the introduction of new DB engine specs that share the backend but have different drivers. Eg, for Databricks we currently have 3 DB engine specs:

class DatabricksHiveEngineSpec(HiveEngineSpec):
engine = "databricks"
engine_name = "Databricks Interactive Cluster"
driver = "pyhive"

class DatabricksODBCEngineSpec(BaseEngineSpec):
engine = "databricks"
engine_name = "Databricks SQL Endpoint"
driver = "pyodbc"

class DatabricksNativeEngineSpec(DatabricksODBCEngineSpec):
engine = "databricks"
engine_name = "Databricks Native Connector"
driver = "connector"

Whenever we encounter a database configured with a databricks backend — be it databricks+pyhive://, databricks+pyodbc:// or databricsk+connector:// we discard the driver part, and perform a match using only the backend. The end result is that one of the 3 DB engine specs is chosen at random every time!

To fix the problem, I extended the base DB engine spec with 2 new class attributes:

  • drivers: Dict[str, str] = {}, a dictionary with driver names ("psycopg2") and their descriptions
  • default_driver: Optional[str] = None, the recommended driver to be used when one is not selected (used only when the SQLAlchemy URI is not entered directly)

With these new attributes, instead of simply matching the backend of a URI (databricks) we can use the driver as well (pyhive) to find the associated DB engine spec.

A few things of note: first, if the backend matches but no driver matches the new flow will still return a random DB engine specs from the ones that support the backend. This preserves the old behavior and allows people to use drivers that Superset is not aware of. Eg, someone could configure a database with the postgres+newfancydriver:// URI and Superset would still use PostgresEngineSpec as the engine spec.

Second, if no backend matches we also preserve the old behavior of returning the BaseEngineSpec, since this allows users to explore databases that Superset is not aware of, though at limited functionality (no time grains, for example).

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TESTING INSTRUCTIONS

WIP

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

superset/models/core.py Outdated Show resolved Hide resolved
@betodealmeida betodealmeida force-pushed the fix_get_db_engine_spec_for_backend branch from b20b711 to 094fe68 Compare August 25, 2022 17:48
@pull-request-size pull-request-size bot added size/L and removed size/S labels Aug 25, 2022
@betodealmeida betodealmeida force-pushed the fix_get_db_engine_spec_for_backend branch from 094fe68 to a705479 Compare August 25, 2022 17:54
@betodealmeida betodealmeida force-pushed the fix_get_db_engine_spec_for_backend branch from a705479 to b2e8c66 Compare August 25, 2022 17:59
@betodealmeida betodealmeida force-pushed the fix_get_db_engine_spec_for_backend branch from 072f395 to f77246c Compare August 25, 2022 21:12
@betodealmeida betodealmeida changed the title [WIP] fix: improve get_db_engine_spec_for_backend fix: improve get_db_engine_spec_for_backend Aug 28, 2022
@codecov
Copy link

codecov bot commented Aug 28, 2022

Codecov Report

Merging #21171 (bfb87d7) into master (6a0b7e5) will increase coverage by 0.06%.
The diff coverage is 93.75%.

@@            Coverage Diff             @@
##           master   #21171      +/-   ##
==========================================
+ Coverage   66.37%   66.43%   +0.06%     
==========================================
  Files        1781     1782       +1     
  Lines       67871    68018     +147     
  Branches     7239     7239              
==========================================
+ Hits        45047    45190     +143     
- Misses      20966    20970       +4     
  Partials     1858     1858              
Flag Coverage Δ
hive 53.13% <70.31%> (-0.05%) ⬇️
mysql 81.01% <93.75%> (+0.02%) ⬆️
postgres 81.06% <93.75%> (+0.02%) ⬆️
presto 53.03% <70.31%> (-0.05%) ⬇️
python 81.53% <93.75%> (+0.07%) ⬆️
sqlite 79.57% <93.75%> (?)
unit 50.74% <82.81%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/config.py 91.61% <ø> (+0.02%) ⬆️
superset/db_engine_specs/trino.py 74.76% <0.00%> (ø)
superset/db_engine_specs/base.py 88.21% <80.00%> (+0.10%) ⬆️
superset/databases/api.py 95.49% <100.00%> (ø)
superset/databases/commands/validate.py 74.54% <100.00%> (+0.40%) ⬆️
superset/databases/schemas.py 98.23% <100.00%> (-0.37%) ⬇️
superset/db_engine_specs/__init__.py 84.81% <100.00%> (+0.19%) ⬆️
superset/db_engine_specs/databricks.py 77.27% <100.00%> (+1.66%) ⬆️
superset/db_engine_specs/shillelagh.py 100.00% <100.00%> (ø)
superset/models/core.py 88.88% <100.00%> (+0.13%) ⬆️
... and 19 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@betodealmeida betodealmeida force-pushed the fix_get_db_engine_spec_for_backend branch from e364210 to 6f7876b Compare August 28, 2022 18:15
@betodealmeida betodealmeida force-pushed the fix_get_db_engine_spec_for_backend branch from 6f7876b to bfb87d7 Compare August 28, 2022 18:39
@@ -78,20 +85,31 @@ def load_engine_specs() -> List[Type[BaseEngineSpec]]:
return engine_specs


def get_engine_specs() -> Dict[str, Type[BaseEngineSpec]]:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@john-bodley does AirBnB use this function outside of the Superset code base?

Copy link
Member

@geido geido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@betodealmeida betodealmeida merged commit 8772e2c into apache:master Aug 29, 2022
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 2.1.0 and removed 🚢 2.1.3 labels Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 2.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants