Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add duckdb as DataSource - Fixes #14563 #19265

Closed
wants to merge 3 commits into from

Conversation

rwhaling
Copy link
Contributor

@rwhaling rwhaling commented Mar 20, 2022

SUMMARY

Adding DuckDB as a datasource, as it's SQLAlchemy dialect appears to have matured a good deal
Mause/duckdb_engine#219
To be clear, this is @alitrack's hard work, not mine, just opening up a PR to keep momentum, and working on testing it myself. Haven't set up for local dev on superset before, so might take me a few days.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Screen Shot 2022-03-22 at 3 22 53 PM

Screen Shot 2022-03-22 at 3 23 10 PM

Screen Shot 2022-03-22 at 3 23 42 PM

Almost works! Stuck on this error - I think the issue is the implementation of `do_ping()` in the duckdb dialect needs to be tweaked a bit, but struggling to get a local duckdb_engine working.

TESTING INSTRUCTIONS

  • ensure you have duckdb_engine installed: $ pip install duckdb-engine
  • should appear in the add database dropdown

ADDITIONAL INFORMATION

  • Has associated issue: duckdb as DataSource #14563
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

needs the forked version of [duckdb-engine](https://github.com/alitrack/duckdb_engine)
update  _time_grain_expressions
"P1Y": "DATE_TRUNC('year', {col})",
}

custom_errors: Dict[Pattern[str], Tuple[str, SupersetErrorType, Dict[str, Any]]] = {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be better solved on the duckdb-engine side?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you would have a better judgment call on that than me; however, this looks to be more or less copy-pasted from the equivalent SQLite datasource - https://github.com/apache/superset/blob/master/superset/db_engine_specs/sqlite.py#L61
so presumably fine for now at least?

Copy link
Member

@zhaoyongjie zhaoyongjie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution!

Comment on lines +65 to +89
@classmethod
def get_all_datasource_names(
cls, database: "Database", datasource_type: str
) -> List[utils.DatasourceName]:
schemas = database.get_all_schema_names(
cache=database.schema_cache_enabled,
cache_timeout=database.schema_cache_timeout,
force=True,
)
schema = schemas[0]
if datasource_type == "table":
return database.get_all_table_names_in_schema(
schema=schema,
force=True,
cache=database.table_cache_enabled,
cache_timeout=database.table_cache_timeout,
)
if datasource_type == "view":
return database.get_all_view_names_in_schema(
schema=schema,
force=True,
cache=database.table_cache_enabled,
cache_timeout=database.table_cache_timeout,
)
raise Exception(f"Unsupported datasource_type: {datasource_type}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part looks like it can directly inherit the Super Class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good catch -
https://github.com/apache/superset/blob/master/superset/db_engine_specs/base.py#L825
incidentally, looks like the SQLite datasource follows the same pattern
https://github.com/apache/superset/blob/master/superset/db_engine_specs/sqlite.py#L74
I don't think it would be wise to clean that up in this PR but might be a quick win.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made the change here - rwhaling@9fa02d7
I forgot that I can't point an existing PR to a new branch, might need to close this one - will wait for any additional feedback first though. Still working on getting set up for local testing as well.

@rwhaling rwhaling changed the title feat(duckdb) - add duckdb as DataSource - Fixes #14563 feat(duckdb) add duckdb as DataSource - Fixes #14563 Mar 22, 2022
@srinify srinify changed the title feat(duckdb) add duckdb as DataSource - Fixes #14563 feat: add duckdb as DataSource - Fixes #14563 Mar 22, 2022
@rwhaling
Copy link
Contributor Author

progress - got a local development environment mostly working, screenshots above. Working through some bugs still
Struggling to get a local version of the duckdb_engine running - the way either superset or sqlalchemy discover dialects isn't loving a pip install -e copy, throwing AttributeError: module 'duckdb_engine' has no attribute 'name'

Will do a little more debugging, might close this and open a new PR from a branch off my own fork.

@rwhaling
Copy link
Contributor Author

Got it working on a new branch, still kind of hackish - going to close this and open a fresh PR, apologies.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants