fix(spark): register Spark SQLAlchemy dialect so spark:// URIs resolve to SparkEngineSpec #38299

Khrol wants to merge 1 commit into apache:master
Conversation
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##           master   #38299       +/-   ##
===========================================
+ Coverage    64.10%   64.87%    +0.77%
===========================================
  Files         1810     2483      +673
  Lines        71288   123136    +51848
  Branches     22694    28567     +5873
===========================================
+ Hits         45696    79882    +34186
- Misses       25592    41857    +16265
- Partials         0     1397     +1397
Code Review Agent Run #e30f65

Actionable Suggestions - 1
- tests/unit_tests/sql/test_spark_dialect.py - 1
  - Parametrize argument type incorrect · Line 46-48

Review Details
- Files reviewed - 2 · Commit Range: 2f6cc89..805fd18
  - superset/db_engine_specs/spark.py
  - tests/unit_tests/sql/test_spark_dialect.py
- Files skipped - 0
- Tools
  - Whispers (Secret Scanner) - ✔︎ Successful
  - Detect-secrets (Secret Scanner) - ✔︎ Successful
  - MyPy (Static Code Analysis) - ✔︎ Successful
  - Astral Ruff (Static Code Analysis) - ✔︎ Successful
…e to SparkEngineSpec
- Set `engine = "spark"` on `SparkEngineSpec` so `get_engine_spec` correctly maps `spark://` URIs
- Register `HiveDialect` under the `"spark"` name via `sqlalchemy.dialects.registry`
- Preserve Spark-native SQL functions like `BOOL_OR` instead of rewriting them to `LOGICAL_OR` via the Hive dialect
- Update the example connection string to use the `spark://` scheme
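The resolution step the first bullet relies on can be sketched as follows. This is a simplified, stdlib-only stand-in for Superset's real `get_engine_spec` (which does considerably more); the toy spec classes and `ENGINE_SPECS` mapping are illustrative, not Superset's API:

```python
from urllib.parse import urlparse

# Simplified stand-ins for Superset engine specs; the real classes live in
# superset/db_engine_specs and carry much more behavior.
class HiveEngineSpec:
    engine = "hive"

class SparkEngineSpec:
    engine = "spark"  # the key this PR changes from "hive" to "spark"

# Resolution matches the URI's backend name against each spec's `engine` key.
ENGINE_SPECS = {spec.engine: spec for spec in (HiveEngineSpec, SparkEngineSpec)}

def get_engine_spec(sqlalchemy_uri: str):
    # "spark://host:10000/db" -> backend "spark"; "hive+https://..." -> "hive"
    backend = urlparse(sqlalchemy_uri).scheme.split("+")[0]
    return ENGINE_SPECS[backend]
```

With both specs keyed as `"hive"` (the pre-PR state), the dict comprehension would silently keep only one of them, which is why `spark://` URIs could never reach `SparkEngineSpec` before this change.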
Force-pushed from d901aab to 4c83948
Code Review Agent Run #593878

Actionable Suggestions - 0
Pull request overview
This PR makes Spark connections resolvable via spark:// SQLAlchemy URIs by mapping them to SparkEngineSpec, ensuring Superset uses the sqlglot Spark dialect (so Spark-native functions like BOOL_OR are preserved instead of being rewritten by the Hive dialect).
Changes:
- Set `SparkEngineSpec.engine = "spark"` and register a SQLAlchemy dialect handler under the `spark` scheme.
- Update Spark engine spec metadata to use a `spark://...` connection string template.
- Add unit tests validating `spark://` resolution and confirming `BOOL_OR` preservation in Spark (and Hive contrast behavior).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| superset/db_engine_specs/spark.py | Switches the Spark engine key to spark, registers a spark SQLAlchemy dialect, and updates the connection string template. |
| tests/unit_tests/sql/test_spark_dialect.py | Adds unit tests covering engine spec resolution and sqlglot formatting behavior for Spark vs Hive. |
engine = "spark"
registry.register("spark", "pyhive.sqlalchemy_hive", "HiveDialect")
Changing SparkEngineSpec.engine to "spark" means this engine spec will only be considered “available” if get_available_engine_specs() can detect an installed SQLAlchemy dialect backend named spark. Today that detection only scans SQLAlchemy built-in dialects plus installed entry points; a runtime registry.register("spark", ...) doesn’t feed into that scan, so drivers["spark"] will likely stay empty and /api/v1/database/available will filter Spark out entirely (it skips engine specs with no drivers). Consider updating driver discovery to also check sqlalchemy.dialects.registry.load("spark") (or similar) and populate drivers["spark"], or add an alias/fallback mechanism that doesn’t cause get_engine_spec("hive") to resolve to SparkEngineSpec.
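The fallback the reviewer suggests could look roughly like this. It is a hypothetical, stdlib-only sketch: `entry_point_dialects` stands in for the installed-entry-point scan and `runtime_registry` for the accumulated `registry.register(...)` calls; neither name is Superset or SQLAlchemy API:

```python
from collections import defaultdict

# What the installed-package/entry-point scan finds today (illustrative data).
entry_point_dialects = {"hive": {"pyhive"}}
# Backends registered at runtime via sqlalchemy.dialects.registry.register(...).
runtime_registry = {"spark": "pyhive.sqlalchemy_hive:HiveDialect"}

def discover_drivers() -> dict[str, set[str]]:
    drivers: dict[str, set[str]] = defaultdict(set)
    for backend, names in entry_point_dialects.items():
        drivers[backend].update(names)
    # Fallback: a backend registered at runtime also counts as having a driver,
    # so /api/v1/database/available would not filter its engine spec out.
    for backend, target in runtime_registry.items():
        drivers[backend].add(target.split(":")[-1])
    return dict(drivers)
```

The point of the sketch is only that `drivers["spark"]` ends up non-empty; how Superset would actually merge the two sources is up to the maintainers.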
Suggested change:
- engine = "spark"
- registry.register("spark", "pyhive.sqlalchemy_hive", "HiveDialect")
+ engine = "hive"
This suggestion would effectively revert the change.
https://github.com/apache/superset/blob/master/superset/db_engine_specs/ascend.py#L27-L28 - the same pattern is used here.
SUMMARY

`SparkEngineSpec` wasn't even usable before, because `hive://...` URLs were resolved to `HiveEngineSpec`, and `spark://` connection strings were not resolving to `SparkEngineSpec` because the Spark dialect was not registered with SQLAlchemy. This PR:
- Sets `engine = "spark"` on `SparkEngineSpec` so `get_engine_spec` correctly maps `spark://` URIs
- Registers `HiveDialect` under the `"spark"` name via `sqlalchemy.dialects.registry`
- Preserves Spark-native SQL functions like `BOOL_OR` instead of rewriting them to `LOGICAL_OR` via the Hive dialect

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
Before:



After:


TESTING INSTRUCTIONS

- `spark://` URIs resolve to `SparkEngineSpec`
- `SparkEngineSpec.engine` is `"spark"`
- `BOOL_OR` is preserved (not rewritten to `LOGICAL_OR`) when using the Spark engine
- `BOOL_OR` is preserved after applying a `LIMIT` (the SQL Lab flow)
- The Hive dialect still rewrites `BOOL_OR` to `LOGICAL_OR` (contrast test)

ADDITIONAL INFORMATION
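The `BOOL_OR` behavior the tests above pin down can be illustrated with a toy model. This is NOT sqlglot; it is a minimal, stdlib-only stand-in showing why the engine-to-dialect mapping matters, with the rewrite table reflecting the behavior described in this PR (Hive rewrites `BOOL_OR` to `LOGICAL_OR`, Spark keeps it):

```python
# Per-dialect aggregate-function rewrites, as described in this PR's summary.
FUNCTION_REWRITES = {
    "hive": {"BOOL_OR": "LOGICAL_OR"},  # the Hive dialect rewrites BOOL_OR
    "spark": {},                        # Spark keeps BOOL_OR as-is
}

def render(sql: str, dialect: str) -> str:
    """Apply the dialect's function-name rewrites (toy model of transpiling)."""
    for src, dst in FUNCTION_REWRITES[dialect].items():
        sql = sql.replace(src, dst)
    return sql

query = "SELECT BOOL_OR(flag) FROM t LIMIT 100"
assert render(query, "spark") == "SELECT BOOL_OR(flag) FROM t LIMIT 100"
assert render(query, "hive") == "SELECT LOGICAL_OR(flag) FROM t LIMIT 100"
```

Before this PR, a `spark://` database fell through to the Hive path (the first case above never applied), which is exactly the contrast the unit tests check.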