Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add StarRocks support #23209

Merged
merged 2 commits into from
May 23, 2023
Merged

feat: Add StarRocks support #23209

merged 2 commits into from
May 23, 2023

Conversation

miomiocat
Copy link
Contributor

@miomiocat miomiocat commented Feb 27, 2023

SUMMARY

Add StarRocks support

Starrocks database is based on mysql protocal but introduce the concept of catalog and other differences, so there will be incompatibility when directly using mysqldb engine spec and it is necessary to introduce a separate Starrocks engine spec to fix various issues and use starrocks python client directly

main differences:

  • Fixed field type issues because mysql does not have STRUCT/MAP/ARRAY types.
  • Support the catalog in StarRocks

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Congrats on making your first PR and thank you for contributing to Superset! 🎉 ❤️

We hope to see you in our Slack community too! Not signed up? Use our Slack App to self-register.

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks super solid, thanks for the contribution! ❤️ Tagging a few more committers for reviews who I believe have experience with Starrocks.

@codecov
Copy link

codecov bot commented Feb 27, 2023

Codecov Report

Merging #23209 (7a20eec) into master (357745f) will decrease coverage by 0.03%.
The diff coverage is 71.66%.

❗ Current head 7a20eec differs from pull request most recent head c6408b2. Consider uploading reports for the commit c6408b2 to get more accurate results

@@            Coverage Diff             @@
##           master   #23209      +/-   ##
==========================================
- Coverage   68.31%   68.29%   -0.03%     
==========================================
  Files        1951     1956       +5     
  Lines       75385    75589     +204     
  Branches     8185     8223      +38     
==========================================
+ Hits        51496    51620     +124     
- Misses      21786    21861      +75     
- Partials     2103     2108       +5     
Flag Coverage Δ
hive 53.36% <48.60%> (+0.16%) ⬆️
javascript 54.69% <51.72%> (-0.01%) ⬇️
mysql 78.91% <73.67%> (-0.05%) ⬇️
postgres 78.98% <73.67%> (-0.06%) ⬇️
presto 53.29% <48.86%> (+0.16%) ⬆️
python 82.81% <78.98%> (-0.02%) ⬇️
sqlite 77.53% <72.65%> (-0.03%) ⬇️
unit 53.41% <55.94%> (+0.30%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...i-core/src/color/colorSchemes/sequential/common.ts 100.00% <ø> (ø)
...set-ui-core/src/ui-overrides/ExtensionsRegistry.ts 100.00% <ø> (ø)
...ackages/superset-ui-core/src/utils/featureFlags.ts 100.00% <ø> (ø)
...plugins/legacy-preset-chart-deckgl/src/factory.tsx 0.00% <ø> (ø)
...preset-chart-deckgl/src/layers/Heatmap/Heatmap.tsx 0.00% <0.00%> (ø)
...d/plugins/legacy-preset-chart-deckgl/src/preset.js 100.00% <ø> (ø)
...hart-echarts/src/MixedTimeseries/transformProps.ts 0.00% <0.00%> (ø)
...lugin-chart-echarts/src/Timeseries/transformers.ts 56.95% <ø> (-0.29%) ⬇️
...et-frontend/src/components/TableSelector/index.tsx 81.08% <ø> (ø)
...-frontend/src/features/alerts/AlertReportModal.tsx 54.44% <0.00%> (ø)
... and 120 more

... and 2 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@john-bodley
Copy link
Member

Thanks @miomiocat for the contribution. We (Airbnb) are using StarRocks within Superset, but given StarRocks is compatible with the MySQL protocol we've had good mileage by simply extending the MySQLEngineSpec, i.e.,

class StarRocksEngineSpec(MySQLEngineSpec):
    """
    The StarRocks engine specification.
    """

    engine = "starrocks"
    engine_name = "StarRocks"

I was wondering whether you considered this option as this adheres to the DRY principle.

@miomiocat
Copy link
Contributor Author

miomiocat commented Mar 1, 2023

Thanks @miomiocat for the contribution. We (Airbnb) are using StarRocks within Superset, but given StarRocks is compatible with the MySQL protocol we've had good mileage by simply extending the MySQLEngineSpec, i.e.,

class StarRocksEngineSpec(MySQLEngineSpec):
    """
    The StarRocks engine specification.
    """

    engine = "starrocks"
    engine_name = "StarRocks"

I was wondering whether you considered this option as this adheres to the DRY principle.

Thanks for your reply.

Actually, StarRocks is mostly compatible with mysql but not all.
If you are using internal table in StarRocks only, your implementation can work.
But for external catalog, it will throw Database not found exception in Superset.

for example, you can reproduce this issue with following steps:

  1. create a external catalog
  2. connect this catalog in superset using uri : starrocks://user:pass@host:port/<catalog name>.<database name>
  3. select a schema from superset UI and execute any SQL in SQL editor like show tables

The reason for this exception is that mysql does not handle the catalog in StarRocks so I resolve this issue from adjust_database_uri.

We are considering changing the format of the current URI and other things. For future extension, we need a separate StarRocks engine spec.

@miomiocat
Copy link
Contributor Author

@etr2460 @john-bodley Could you help review this PR?

superset/db_engine_specs/starrocks.py Outdated Show resolved Hide resolved
superset/db_engine_specs/starrocks.py Outdated Show resolved Hide resolved
superset/db_engine_specs/starrocks.py Outdated Show resolved Hide resolved
@john-bodley
Copy link
Member

Thanks @miomiocat for clarifying. I've left some comments.

@miomiocat
Copy link
Contributor Author

Thanks @miomiocat for clarifying. I've left some comments.

Thanks @john-bodley
I have fixed your comments, PTAL

@miomiocat miomiocat requested review from john-bodley and removed request for etr2460 March 15, 2023 09:41
@miomiocat miomiocat force-pushed the starrocks branch 2 times, most recently from e0150cc to fb6db49 Compare March 21, 2023 09:11
@rusackas
Copy link
Member

Closing and re-opening to kick-start CI (not sure why it's stuck in some eternal waiting state).

@rusackas
Copy link
Member

I've kicked CI a couple of times, but the DB tests still seem to be failing, so those need some 👀. @villebro do you know if this is victim to a flaky test, or if this looks legit?

@miomiocat
Copy link
Contributor Author

Fixed field type issues because mysql does not have STRUCT/MAP/ARRAY types.
Please help me review again.

@john-bodley @rusackas

@rusackas
Copy link
Member

PR was failing due to some pre-commit hook linting errors. Apologies for the churn as I try to get those ironed out ;)

@rusackas
Copy link
Member

Now getting an error from mypy:

superset/db_engine_specs/starrocks.py:71: error: Incompatible types in assignment (expression has type Tuple[Tuple[Pattern[str], TINYINT, GenericDataType], Tuple[Pattern[str], Any, GenericDataType], ... <14 more items>], base class "MySQLEngineSpec" defined the type as Tuple[Tuple[Pattern[str], Any, GenericDataType], Tuple[Pattern[str], Any, GenericDataType], Tuple[Pattern[str], Any, GenericDataType], Tuple[Pattern[str], Any, GenericDataType], Tuple[Pattern[str], Any, GenericDataType], Tuple[Pattern[str], Any, GenericDataType], Tuple[Pattern[str], Any, GenericDataType], Tuple[Pattern[str], Any, GenericDataType], Tuple[Pattern[str], Any, GenericDataType], Tuple[Pattern[str], Any, GenericDataType]])  [assignment]

@miomiocat miomiocat force-pushed the starrocks branch 7 times, most recently from c1b1761 to f9b4908 Compare May 19, 2023 09:46
"starrocks://user:password@host:port/catalog.db[?key=value&key=value...]"
)

_default_column_type_mappings = (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be

Suggested change
_default_column_type_mappings = (
column_type_mappings = (

Also note, that you don't need to reimplement all types - Any types that are already covered by _default_column_type_mappings can be omitted. In addition, I would suggest adding unit tests to ensure that the main types are identified correctly (see tests/unit_tests/db_engine_specs for other examples of how this is done).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks:)
added unit tests and updated codes, please review it again.

Signed-off-by: miomiocat <284487410@qq.com>
@miomiocat
Copy link
Contributor Author

Now getting an error from mypy:

superset/db_engine_specs/starrocks.py:71: error: Incompatible types in assignment (expression has type Tuple[Tuple[Pattern[str], TINYINT, GenericDataType], Tuple[Pattern[str], Any, GenericDataType], ... <14 more items>], base class "MySQLEngineSpec" defined the type as Tuple[Tuple[Pattern[str], Any, GenericDataType], Tuple[Pattern[str], Any, GenericDataType], Tuple[Pattern[str], Any, GenericDataType], Tuple[Pattern[str], Any, GenericDataType], Tuple[Pattern[str], Any, GenericDataType], Tuple[Pattern[str], Any, GenericDataType], Tuple[Pattern[str], Any, GenericDataType], Tuple[Pattern[str], Any, GenericDataType], Tuple[Pattern[str], Any, GenericDataType], Tuple[Pattern[str], Any, GenericDataType]])  [assignment]

@rusackas
Thanks for your reply

I've fixed this issue and passed all check except one error in pre-commit stage with following message.
can you give me some help?

[INFO] This may take a few minutes...
isort....................................................................Failed
- hook id: isort
- files were modified by this hook

Fixing /home/runner/work/superset/superset/tests/unit_tests/db_engine_specs/test_starrocks.py
Skipped [49](https://github.com/miomiocat/superset/actions/runs/5054598952/jobs/9069709016#step:6:50) files

mypy.....................................................................Passed
pip-compile-multi verify.................................................Passed
check docstring is first.................................................Passed
check for added large files..............................................Passed
check yaml...............................................................Passed
debug statements (python)................................................Passed
fix end of files.........................................................Passed
trim trailing whitespace.................................................Passed
black....................................................................Failed
- hook id: black
- files were modified by this hook

reformatted tests/unit_tests/db_engine_specs/test_starrocks.py

All done! ✨ 🍰 ✨
1 file reformatted, 1271 files left unchanged.

prettier.................................................................Passed
Blacklist................................................................Passed
Helm Docs................................................................Passed
Error: Process completed with exit code 1.

@villebro
Copy link
Member

@miomiocat if you check the CONTRIBUTING.md guide, it shows how to enable pre-commit hooks that automatically check and fix linting issues like this. I pushed a fix to resolve the linting issue, should be good to go after that.

@rusackas
Copy link
Member

Looks like we're good! Thanks so much, @miomiocat!!!

@rusackas rusackas merged commit f036adb into apache:master May 23, 2023
31 checks passed
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 3.0.0 labels Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 3.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants