fix(spark): register Spark SQLAlchemy dialect so spark:// URIs resolve to SparkEngineSpec#38299

Open
Khrol wants to merge 1 commit into apache:master from Khrol:spark-dialect-fix

Conversation


@Khrol Khrol commented Feb 27, 2026

SUMMARY

SparkEngineSpec was effectively unusable before: hive://... URLs resolved to HiveEngineSpec, and spark:// connection strings did not resolve to SparkEngineSpec because the Spark dialect was never registered with SQLAlchemy. This PR:

  • Sets engine = "spark" on SparkEngineSpec so get_engine_spec correctly maps spark:// URIs
  • Registers the HiveDialect under the "spark" name via sqlalchemy.dialects.registry
  • Preserves Spark-native SQL functions like BOOL_OR instead of rewriting them to LOGICAL_OR via the Hive dialect
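The first two bullets amount to a two-line change (the real code lives in superset/db_engine_specs/spark.py; the HiveEngineSpec class below is a stand-in for Superset's, shown so the sketch runs standalone):

```python
from sqlalchemy.dialects import registry

# Register PyHive's HiveDialect under the "spark" scheme. register() only
# records the import path, so this line works even before pyhive is
# installed; the dialect module is imported lazily when a spark:// engine
# is actually created.
registry.register("spark", "pyhive.sqlalchemy_hive", "HiveDialect")


class HiveEngineSpec:  # stand-in for superset.db_engine_specs.hive.HiveEngineSpec
    engine = "hive"


class SparkEngineSpec(HiveEngineSpec):
    # With engine = "spark", Superset's get_engine_spec maps spark:// URIs
    # to this spec instead of falling back to HiveEngineSpec.
    engine = "spark"
```

Reusing HiveDialect is deliberate: Spark Thrift Server speaks the Hive protocol on the wire, so only the SQL-generation side needs to be Spark-aware.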

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Before:
Screenshot 2026-02-27 at 10 48 47

Screenshot 2026-02-27 at 10 49 15

After:
Screenshot 2026-02-27 at 10 49 41

Screenshot 2026-02-27 at 10 47 44

TESTING INSTRUCTIONS

  1. Run the new unit tests:
    pytest tests/unit_tests/sql/test_spark_dialect.py -v
  2. Verify all tests pass, confirming:
    • spark:// URIs resolve to SparkEngineSpec
    • SparkEngineSpec.engine is "spark"
    • BOOL_OR is preserved (not rewritten to LOGICAL_OR) when using the Spark engine
    • BOOL_OR is preserved after applying a LIMIT (the SQLLab flow)
    • The Hive dialect still rewrites BOOL_OR to LOGICAL_OR (contrast test)

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@Khrol Khrol marked this pull request as draft February 27, 2026 08:46
@dosubot dosubot bot added the data:connect Namespace | Anything related to db connections / integrations label Feb 27, 2026

codecov bot commented Feb 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.87%. Comparing base (761cee2) to head (4c83948).

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #38299      +/-   ##
==========================================
+ Coverage   64.10%   64.87%   +0.77%     
==========================================
  Files        1810     2483     +673     
  Lines       71288   123136   +51848     
  Branches    22694    28567    +5873     
==========================================
+ Hits        45696    79882   +34186     
- Misses      25592    41857   +16265     
- Partials        0     1397    +1397     
Flag     | Coverage          | Δ
hive     | 41.11% <100.00%>  | (?)
mysql    | 64.05% <100.00%>  | (?)
postgres | 64.13% <100.00%>  | (?)
presto   | 41.13% <100.00%>  | (?)
python   | 65.90% <100.00%>  | (?)
sqlite   | 63.72% <100.00%>  | (?)
unit     | 100.00% <ø>       | (?)

Flags with carried forward coverage won't be shown.

@pull-request-size pull-request-size bot added size/M and removed size/L labels Feb 27, 2026
@Khrol Khrol marked this pull request as ready for review February 27, 2026 09:55
@pull-request-size pull-request-size bot added size/L and removed size/M labels Feb 27, 2026

@bito-code-review bito-code-review bot left a comment


Code Review Agent Run #e30f65

Actionable Suggestions - 1
  • tests/unit_tests/sql/test_spark_dialect.py - 1
    • Parametrize argument type incorrect · Line 46-48
Review Details
  • Files reviewed - 2 · Commit Range: 2f6cc89..805fd18
    • superset/db_engine_specs/spark.py
    • tests/unit_tests/sql/test_spark_dialect.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type one of the following commands in a pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset. You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito

…e to SparkEngineSpec

- Set engine = "spark" on SparkEngineSpec so get_engine_spec correctly
  maps spark:// URIs
- Register HiveDialect under the "spark" name via sqlalchemy.dialects.registry
- Preserve Spark-native SQL functions like BOOL_OR instead of rewriting
  them to LOGICAL_OR via the Hive dialect
- Update example connection string to use spark:// scheme
@Khrol Khrol force-pushed the spark-dialect-fix branch from d901aab to 4c83948 Compare February 27, 2026 11:49

bito-code-review bot commented Feb 27, 2026

Code Review Agent Run #593878

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: 4c83948..4c83948
    • superset/db_engine_specs/spark.py
    • tests/unit_tests/sql/test_spark_dialect.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful



Copilot AI left a comment


Pull request overview

This PR makes Spark connections resolvable via spark:// SQLAlchemy URIs by mapping them to SparkEngineSpec, ensuring Superset uses the sqlglot Spark dialect (so Spark-native functions like BOOL_OR are preserved instead of being rewritten by the Hive dialect).

Changes:

  • Set SparkEngineSpec.engine = "spark" and register a SQLAlchemy dialect handler under the spark scheme.
  • Update Spark engine spec metadata to use a spark://... connection string template.
  • Add unit tests validating spark:// resolution and confirming BOOL_OR preservation in Spark (and Hive contrast behavior).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File | Description
superset/db_engine_specs/spark.py | Switches the Spark engine key to spark, registers a spark SQLAlchemy dialect, and updates the connection string template.
tests/unit_tests/sql/test_spark_dialect.py | Adds unit tests covering engine spec resolution and sqlglot formatting behavior for Spark vs Hive.

Comment on lines +45 to +46
engine = "spark"
registry.register("spark", "pyhive.sqlalchemy_hive", "HiveDialect")

Copilot AI Mar 4, 2026


Changing SparkEngineSpec.engine to "spark" means this engine spec will only be considered “available” if get_available_engine_specs() can detect an installed SQLAlchemy dialect backend named spark. Today that detection only scans SQLAlchemy built-in dialects plus installed entry points; a runtime registry.register("spark", ...) doesn’t feed into that scan, so drivers["spark"] will likely stay empty and /api/v1/database/available will filter Spark out entirely (it skips engine specs with no drivers). Consider updating driver discovery to also check sqlalchemy.dialects.registry.load("spark") (or similar) and populate drivers["spark"], or add an alias/fallback mechanism that doesn’t cause get_engine_spec("hive") to resolve to SparkEngineSpec.

Suggested change:
- engine = "spark"
- registry.register("spark", "pyhive.sqlalchemy_hive", "HiveDialect")
+ engine = "hive"

Copilot uses AI. Check for mistakes.
Contributor Author


This suggestion would simply revert the change. The same pattern is already used in https://github.com/apache/superset/blob/master/superset/db_engine_specs/ascend.py#L27-L28.


Labels

data:connect Namespace | Anything related to db connections / integrations size/L
