fix: share column type matching between model and result set #9161

Merged
merged 2 commits into from Mar 4, 2020

Conversation

villebro
Member

@villebro villebro commented Feb 18, 2020

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

While investigating a bug affecting BigQuery time grains, I noticed that exploring SQL Lab queries doesn't correctly detect TIME and DATE types, because the type matching logic in the result set code differs from the logic in the SQLAlchemy model. This PR adds a method to BaseEngineSpec for matching database-specific column types (e.g. NVARCHAR, DATETIME, BIGINT) to generic types (numeric, string and temporal). In addition, the type matching logic in SupersetResultSet is replaced with this shared method, ensuring that type inference is uniform across models.

TEST PLAN

Test locally + CI (old + new tests)

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

REVIEWERS

@codecov-io

codecov-io commented Feb 18, 2020

Codecov Report

Merging #9161 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #9161   +/-   ##
=======================================
  Coverage   59.06%   59.06%           
=======================================
  Files         372      372           
  Lines       11922    11922           
  Branches     2919     2919           
=======================================
  Hits         7042     7042           
  Misses       4698     4698           
  Partials      182      182

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0992445...f510a85.

Member

@willbarrett willbarrett left a comment

Nice refactor @villebro! Thanks!

tests/utils_tests.py (outdated, resolved)
superset/utils/core.py (outdated, resolved)
tests/sqla_models_tests.py (outdated, resolved)
Comment on lines +147 to +168
    # default matching patterns for identifying column types
    db_column_types: Dict[utils.DbColumnType, Tuple[Pattern, ...]] = {
        utils.DbColumnType.NUMERIC: (
            re.compile(r".*DOUBLE.*", re.IGNORECASE),
            re.compile(r".*FLOAT.*", re.IGNORECASE),
            re.compile(r".*INT.*", re.IGNORECASE),
            re.compile(r".*NUMBER.*", re.IGNORECASE),
            re.compile(r".*LONG.*", re.IGNORECASE),
            re.compile(r".*REAL.*", re.IGNORECASE),
            re.compile(r".*NUMERIC.*", re.IGNORECASE),
            re.compile(r".*DECIMAL.*", re.IGNORECASE),
            re.compile(r".*MONEY.*", re.IGNORECASE),
        ),
        utils.DbColumnType.STRING: (
            re.compile(r".*CHAR.*", re.IGNORECASE),
            re.compile(r".*STRING.*", re.IGNORECASE),
        ),
        utils.DbColumnType.TEMPORAL: (
            re.compile(r".*DATE.*", re.IGNORECASE),
            re.compile(r".*TIME.*", re.IGNORECASE),
        ),
    }
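
For readers following along, here is a minimal sketch of how a pattern table like the one above could be consulted. It is not the code from this PR: `DbColumnType` is a stand-in for `utils.DbColumnType`, the table is abbreviated, and `matches_generic_type` is a hypothetical helper name chosen for illustration.

```python
import re
from enum import Enum
from typing import Dict, Optional, Pattern, Tuple


class DbColumnType(Enum):
    """Stand-in for utils.DbColumnType (assumed to have these three members)."""

    NUMERIC = 0
    STRING = 1
    TEMPORAL = 2


# Abbreviated copy of the pattern table quoted above.
DB_COLUMN_TYPES: Dict[DbColumnType, Tuple[Pattern, ...]] = {
    DbColumnType.NUMERIC: (
        re.compile(r".*INT.*", re.IGNORECASE),
        re.compile(r".*DECIMAL.*", re.IGNORECASE),
    ),
    DbColumnType.STRING: (
        re.compile(r".*CHAR.*", re.IGNORECASE),
        re.compile(r".*STRING.*", re.IGNORECASE),
    ),
    DbColumnType.TEMPORAL: (
        re.compile(r".*DATE.*", re.IGNORECASE),
        re.compile(r".*TIME.*", re.IGNORECASE),
    ),
}


def matches_generic_type(
    db_column_type: Optional[str], target: DbColumnType
) -> bool:
    """Hypothetical helper: True if the native type name matches any
    pattern registered for the given generic type."""
    if not db_column_type:
        # A missing or empty type name cannot match anything.
        return False
    return any(
        pattern.match(db_column_type) for pattern in DB_COLUMN_TYPES[target]
    )
```

With this sketch, `matches_generic_type("DATETIME", DbColumnType.TEMPORAL)` and `matches_generic_type("bigint", DbColumnType.NUMERIC)` are both true, and any caller (model or result set) that goes through the same helper gets the same answer.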
Member Author

These regexes do essentially the same thing as the previous substring checks:

Before:
"CHAR" in "NVARCHAR"

After:
re.compile(r".*CHAR.*", re.IGNORECASE).match("NVARCHAR")
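
A small sketch of that comparison, for illustration only. The helper names `substring_match` and `regex_match` are hypothetical and do not appear in the PR. The two checks agree on upper-case type names; the one behavioural difference is that the regex, compiled with `re.IGNORECASE`, also accepts lower-case names.

```python
import re

# Same pattern as in the comment above.
CHAR_PATTERN = re.compile(r".*CHAR.*", re.IGNORECASE)


def substring_match(type_name: str) -> bool:
    """Old style: plain, case-sensitive substring test."""
    return "CHAR" in type_name


def regex_match(type_name: str) -> bool:
    """New style: case-insensitive regex anchored at the start of the string."""
    return CHAR_PATTERN.match(type_name) is not None


if __name__ == "__main__":
    for name in ("NVARCHAR", "BIGINT", "nvarchar"):
        print(name, substring_match(name), regex_match(name))
```

Because the pattern starts with `.*`, `match` (which anchors at the beginning of the string) behaves like a substring search for single-line type names.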

Comment on lines 29 to 40
-        col = TableColumn(column_name="__time", type="INTEGER")
+        database = Database(database_name="druid_db", sqlalchemy_uri="druid://db")
+        tbl = SqlaTable(table_name="druid_tbl", database=database)
+        col = TableColumn(column_name="__time", type="INTEGER", table=tbl)
         self.assertEqual(col.is_dttm, None)
         DruidEngineSpec.alter_new_orm_column(col)
         self.assertEqual(col.is_dttm, True)

-        col = TableColumn(column_name="__not_time", type="INTEGER")
+        col = TableColumn(column_name="__not_time", type="INTEGER", table=tbl)
         self.assertEqual(col.is_time, False)
Member Author

I intended to refactor out alter_new_orm_column in this PR, but will leave that for a later date to avoid complicating this fix.

Member

@dpgaspar dpgaspar left a comment

LGTM, just some non-blocking comments

superset/connectors/sqla/models.py (outdated, resolved)
superset/connectors/sqla/models.py (outdated, resolved)
@villebro villebro requested a review from dpgaspar March 4, 2020 14:20
@villebro villebro merged commit 7a91498 into apache:master Mar 4, 2020
john-bodley added a commit to john-bodley/superset that referenced this pull request Apr 24, 2020
john-bodley added a commit that referenced this pull request Apr 24, 2020
Co-authored-by: John Bodley <john.bodley@airbnb.com>
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PRs were auto-tagged with release labels 🚢 0.36.0 labels Feb 28, 2024
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PRs were auto-tagged with release labels size/L 🚢 0.36.0
6 participants