fix: share column type matching between model and result set #9161

villebro · 2020-02-18T15:46:46Z

SUMMARY

While investigating a bug affecting BigQuery time grains, I noticed that exploring a SQL Lab query doesn't correctly detect TIME and DATE types, because the type matching logic in the result set code differs from the logic in the SQLAlchemy model. This PR adds a method to BaseEngineSpec for matching database-specific column types (e.g. NVARCHAR, DATETIME, BIGINT) to generic types (numeric, string and temporal). In addition, the type matching logic in SupersetResultSet is replaced with this shared method, ensuring that type inference is uniform across the model and the result set.
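To illustrate the idea, here is a minimal sketch of a shared lookup that both the model and the result set could call. It is not the code in this PR: the `DbColumnType` stand-in, its member values, the abbreviated pattern table, and the helper name `matches_generic_type` are all assumptions made for the example.

```python
import re
from enum import Enum
from typing import Dict, Optional, Tuple


class DbColumnType(Enum):
    # Stand-in for utils.DbColumnType; the member values are assumed.
    NUMERIC = 0
    STRING = 1
    TEMPORAL = 2


# Abbreviated copy of the pattern table from superset/db_engine_specs/base.py.
DB_COLUMN_TYPES: Dict[DbColumnType, Tuple[re.Pattern, ...]] = {
    DbColumnType.NUMERIC: (
        re.compile(r".*INT.*", re.IGNORECASE),
        re.compile(r".*DECIMAL.*", re.IGNORECASE),
    ),
    DbColumnType.STRING: (re.compile(r".*CHAR.*", re.IGNORECASE),),
    DbColumnType.TEMPORAL: (
        re.compile(r".*DATE.*", re.IGNORECASE),
        re.compile(r".*TIME.*", re.IGNORECASE),
    ),
}


def matches_generic_type(db_type: Optional[str], target: DbColumnType) -> bool:
    """Hypothetical helper: True if the native type name falls under `target`."""
    if not db_type:
        # A missing type name cannot be classified.
        return False
    patterns = DB_COLUMN_TYPES.get(target, ())
    return any(pattern.match(db_type) for pattern in patterns)
```

With a single helper like this, a column's native type string is classified the same way wherever it comes from, which is the uniformity this PR is after.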

TEST PLAN

Test locally + CI (old + new tests)

ADDITIONAL INFORMATION

REVIEWERS

codecov-io · 2020-02-18T15:55:16Z

Codecov Report

Merging #9161 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #9161   +/-   ##
=======================================
  Coverage   59.06%   59.06%           
=======================================
  Files         372      372           
  Lines       11922    11922           
  Branches     2919     2919           
=======================================
  Hits         7042     7042           
  Misses       4698     4698           
  Partials      182      182

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0992445...f510a85.

willbarrett

Nice refactor @villebro! Thanks!

tests/utils_tests.py

superset/utils/core.py

tests/sqla_models_tests.py

villebro · 2020-03-03T22:01:23Z

superset/db_engine_specs/base.py

+    # default matching patterns for identifying column types
+    db_column_types: Dict[utils.DbColumnType, Tuple[Pattern, ...]] = {
+        utils.DbColumnType.NUMERIC: (
+            re.compile(r".*DOUBLE.*", re.IGNORECASE),
+            re.compile(r".*FLOAT.*", re.IGNORECASE),
+            re.compile(r".*INT.*", re.IGNORECASE),
+            re.compile(r".*NUMBER.*", re.IGNORECASE),
+            re.compile(r".*LONG.*", re.IGNORECASE),
+            re.compile(r".*REAL.*", re.IGNORECASE),
+            re.compile(r".*NUMERIC.*", re.IGNORECASE),
+            re.compile(r".*DECIMAL.*", re.IGNORECASE),
+            re.compile(r".*MONEY.*", re.IGNORECASE),
+        ),
+        utils.DbColumnType.STRING: (
+            re.compile(r".*CHAR.*", re.IGNORECASE),
+            re.compile(r".*STRING.*", re.IGNORECASE),
+        ),
+        utils.DbColumnType.TEMPORAL: (
+            re.compile(r".*DATE.*", re.IGNORECASE),
+            re.compile(r".*TIME.*", re.IGNORECASE),
+        ),
+    }


These regexes do essentially the same thing as the previous string matches:

Before:
"CHAR" in "NVARCHAR"

After:
re.compile(r".*CHAR.*", re.IGNORECASE).match("NVARCHAR")
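A small self-contained check of that equivalence, written as a sketch: the function names `old_style` and `new_style` are made up for the comparison, and whether the earlier code upper-cased the type name before the substring test is not shown here, so the case-sensitivity difference at the end is only a property of these two expressions as written.

```python
import re

CHAR_PATTERN = re.compile(r".*CHAR.*", re.IGNORECASE)


def old_style(db_type: str) -> bool:
    # Plain, case-sensitive substring test, as in the "Before" line.
    return "CHAR" in db_type


def new_style(db_type: str) -> bool:
    # Regex test, as in the "After" line; match() returns None on no match.
    return CHAR_PATTERN.match(db_type) is not None


# For upper-case type names the two approaches agree.
for name in ("NVARCHAR", "CHAR", "VARCHAR(255)", "BIGINT"):
    assert old_style(name) == new_style(name)

# They differ only on case: the regex ignores it, the substring test does not.
assert not old_style("nvarchar")
assert new_style("nvarchar")
```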

villebro · 2020-03-04T06:18:31Z

tests/sqla_models_tests.py

-        col = TableColumn(column_name="__time", type="INTEGER")
+
+        database = Database(database_name="druid_db", sqlalchemy_uri="druid://db")
+        tbl = SqlaTable(table_name="druid_tbl", database=database)
+        col = TableColumn(column_name="__time", type="INTEGER", table=tbl)
        self.assertEqual(col.is_dttm, None)
        DruidEngineSpec.alter_new_orm_column(col)
        self.assertEqual(col.is_dttm, True)

-        col = TableColumn(column_name="__not_time", type="INTEGER")
+        col = TableColumn(column_name="__not_time", type="INTEGER", table=tbl)
        self.assertEqual(col.is_time, False)


I intended to refactor out alter_new_orm_column in this PR, but will leave that for a later date to avoid complicating this fix.

dpgaspar

LGTM, just some non blocking comments

superset/connectors/sqla/models.py

Co-authored-by: John Bodley <john.bodley@airbnb.com>

pull-request-size bot added the size/L label Feb 18, 2020

willbarrett approved these changes Feb 18, 2020

villebro requested a review from john-bodley February 19, 2020 16:49

john-bodley reviewed Feb 19, 2020

tests/utils_tests.py

superset/utils/core.py

tests/sqla_models_tests.py

Share column type matching between model and result set

75fdc4a

villebro commented Mar 3, 2020

villebro requested review from john-bodley and dpgaspar March 3, 2020 22:11

villebro commented Mar 4, 2020

dpgaspar reviewed Mar 4, 2020

superset/connectors/sqla/models.py

superset/connectors/sqla/models.py

Address comments

b5525dc

villebro requested a review from dpgaspar March 4, 2020 14:20

dpgaspar approved these changes Mar 4, 2020

villebro merged commit 7a91498 into apache:master Mar 4, 2020

john-bodley mentioned this pull request Apr 23, 2020

[fix] Fixing regression from #9161 #9641

Merged

12 tasks

john-bodley added a commit to john-bodley/superset that referenced this pull request Apr 24, 2020

[fix] Fixing regression from apache#9161

f5b047f

john-bodley added a commit that referenced this pull request Apr 24, 2020

[fix] Fixing regression from #9161 (#9641)

8ae92b5

Co-authored-by: John Bodley <john.bodley@airbnb.com>

mistercrunch added the 🏷️ bot and 🚢 0.36.0 labels Feb 28, 2024
