New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: share column type matching between model and result set #9161
Conversation
Codecov Report
@@ Coverage Diff @@
## master #9161 +/- ##
=======================================
Coverage 59.06% 59.06%
=======================================
Files 372 372
Lines 11922 11922
Branches 2919 2919
=======================================
Hits 7042 7042
Misses 4698 4698
Partials 182 182 Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice refactor @villebro! Thanks!
# default matching patterns for identifying column types | ||
db_column_types: Dict[utils.DbColumnType, Tuple[Pattern, ...]] = { | ||
utils.DbColumnType.NUMERIC: ( | ||
re.compile(r".*DOUBLE.*", re.IGNORECASE), | ||
re.compile(r".*FLOAT.*", re.IGNORECASE), | ||
re.compile(r".*INT.*", re.IGNORECASE), | ||
re.compile(r".*NUMBER.*", re.IGNORECASE), | ||
re.compile(r".*LONG.*", re.IGNORECASE), | ||
re.compile(r".*REAL.*", re.IGNORECASE), | ||
re.compile(r".*NUMERIC.*", re.IGNORECASE), | ||
re.compile(r".*DECIMAL.*", re.IGNORECASE), | ||
re.compile(r".*MONEY.*", re.IGNORECASE), | ||
), | ||
utils.DbColumnType.STRING: ( | ||
re.compile(r".*CHAR.*", re.IGNORECASE), | ||
re.compile(r".*STRING.*", re.IGNORECASE), | ||
), | ||
utils.DbColumnType.TEMPORAL: ( | ||
re.compile(r".*DATE.*", re.IGNORECASE), | ||
re.compile(r".*TIME.*", re.IGNORECASE), | ||
), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These regexes essentially do the same as the previous string matches
Before:
"CHAR" in "NVARCHAR"
After:
re.compile(r".*CHAR.*", re.IGNORECASE).match("NVARCHAR")
tests/sqla_models_tests.py
Outdated
col = TableColumn(column_name="__time", type="INTEGER") | ||
|
||
database = Database(database_name="druid_db", sqlalchemy_uri="druid://db") | ||
tbl = SqlaTable(table_name="druid_tbl", database=database) | ||
col = TableColumn(column_name="__time", type="INTEGER", table=tbl) | ||
self.assertEqual(col.is_dttm, None) | ||
DruidEngineSpec.alter_new_orm_column(col) | ||
self.assertEqual(col.is_dttm, True) | ||
|
||
col = TableColumn(column_name="__not_time", type="INTEGER") | ||
col = TableColumn(column_name="__not_time", type="INTEGER", table=tbl) | ||
self.assertEqual(col.is_time, False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I intended to refactor out alter_new_orm_column
in this PR, but will leave that for a later date to avoid convoluting this fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just some non blocking comments
CATEGORY
Choose one
SUMMARY
When investigating a bug affecting BigQuery time grains, I noticed that Exploring SQL Lab queries don't correctly detect
TIME
andDATE
types due to the logic being different in the result set code vs the SqlAlchemy model. This PR adds a method toBaseEngineSpec
for matching database specific column types (e.g.NVARCHAR
,DATETIME
,BIGINT
) to generic types (temporal, numeric and date). In addition, type matching logic inSupersetResultSet
is replaced with said logic, ensuring that type inference is uniform across models.TEST PLAN
Test locally + CI (old + new tests)
ADDITIONAL INFORMATION
REVIEWERS