chore: Change get_table_names/get_view_names return type #22085

john-bodley · 2022-11-09T22:42:21Z

SUMMARY

After working on #21794 I realized that the get_table_names and get_view_names methods should really return a set of names rather than an ordered list. The results can then be sorted—if needed—within the API response. Hopefully this refactor simplifies the code logic.
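A minimal sketch of the idea in the summary, not the actual Superset code: `get_table_names` below is a simplified stand-in that takes a plain iterable of names, and `table_names_response` is a hypothetical helper representing the API layer.

```python
from typing import Iterable, List, Set


def get_table_names(raw_names: Iterable[str]) -> Set[str]:
    """Return the unique table names; no ordering is implied."""
    return set(raw_names)


def table_names_response(names: Set[str]) -> List[str]:
    """Sort only at the API boundary, where a stable order matters."""
    return sorted(names)


names = get_table_names(["orders", "users", "events"])
print(table_names_response(names))
```

The lookup code stays free of ordering concerns, and the one place that needs a deterministic order applies it explicitly.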

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

CI.

ADDITIONAL INFORMATION

Has associated issue:
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

villebro

LGTM with some besserwisser-comments

villebro · 2022-11-11T09:20:42Z

superset/models/core.py

@@ -556,13 +556,17 @@ def get_all_table_names_in_schema(  # pylint: disable=unused-argument
        :param cache: whether cache is enabled for the function
        :param cache_timeout: timeout in seconds for the cache
        :param force: whether to force refresh the cache
-        :return: list of tables
+        :return: set of tables


code style: Some people consider including the type in the return description bad practice if it's already typed in the sig. I'm not sure which camp I belong to (it varies from day to day), but maybe we could rather try to be more expressive regarding the funny Tuple[str, str] thingy, something like

:return: table name and schema pairs
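For illustration, here is how the suggested wording might look on a function whose signature already carries the type. `pair_tables_with_schema` is a hypothetical helper, not a Superset function, and its parameters are assumptions made for this sketch.

```python
from typing import Set, Tuple


def pair_tables_with_schema(schema: str, tables: Set[str]) -> Set[Tuple[str, str]]:
    """
    Pair every table name with the schema it belongs to.

    :param schema: name of the schema the tables live in
    :param tables: table names found in that schema
    :return: table name and schema pairs
    """
    # The return annotation states the container type; the docstring
    # explains what the two strings in each tuple mean.
    return {(table, schema) for table in tables}


print(pair_tables_with_schema("public", {"orders"}))
```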

villebro · 2022-11-11T09:22:09Z

tests/integration_tests/db_engine_specs/base_engine_spec_tests.py

        base_result = BaseEngineSpec.get_table_names(
            database=mock.ANY, schema="schema", inspector=inspector
        )
-        self.assertListEqual(base_result_expected, base_result)
+        self.assertSetEqual(base_result_expected, base_result)


could we just do this as that's what appears to be preferred these days?

assert base_result_expected == base_result

john-bodley · 2022-11-16

I agree. I didn't know if unittest supported this.
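A small sketch showing that both styles work inside a `unittest.TestCase`. The class and test names are made up for this example. A failing plain `assert` raises `AssertionError`, which unittest reports as a test failure, although the failure message is usually less detailed than the one from `assertSetEqual` unless the tests run under pytest.

```python
import unittest


class TableNamesTest(unittest.TestCase):
    def test_set_equality_with_assert_method(self) -> None:
        # unittest's dedicated helper for comparing sets
        self.assertSetEqual({"a", "b"}, {"b", "a"})

    def test_set_equality_with_plain_assert(self) -> None:
        # Plain assert: element order is irrelevant for sets
        assert {"a", "b"} == {"b", "a"}
```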

ktmud · 2022-11-13T08:54:56Z

I don't understand why you have to return set. Shouldn't DBAPI return unique table names to begin with?

john-bodley · 2022-11-16T16:14:28Z

@ktmud regarding your comment,

I don't understand why you have to return set. Shouldn't DBAPI return unique table names to begin with?

Yes, the DB-API will return a unique set of table names, so the set logic isn't for deduping purposes (except when we need a set difference to separate the tables from the views). Personally I think, from reading the code, that the type hints help one grok the logic better, i.e., when I see Set[str] I know the values are unique, whereas when I see List[str] that isn't as evident. We should really only use List[...] when order matters.
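The set difference mentioned above can be sketched like this, assuming a backend that reports tables and views together in one listing. The variable names and sample values are made up for illustration.

```python
from typing import Set

# Hypothetical listing that mixes tables and views
all_relations: Set[str] = {"orders", "users", "daily_summary"}

# Hypothetical listing of views only
views: Set[str] = {"daily_summary"}

# Whatever is not a view is treated as a table
tables: Set[str] = all_relations - views
print(sorted(tables))
```

With lists, the same step would need an explicit membership check per element, so the set type also keeps this operation short.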

codecov · 2022-11-16T16:30:11Z

Codecov Report

Merging #22085 (a876fb4) into master (9f7bd1e) will increase coverage by 14.51%.
The diff coverage is 62.50%.

@@             Coverage Diff             @@
##           master   #22085       +/-   ##
===========================================
+ Coverage   52.50%   67.02%   +14.51%     
===========================================
  Files        1818     1819        +1     
  Lines       69632    69555       -77     
  Branches     7496     7496               
===========================================
+ Hits        36562    46618    +10056     
+ Misses      31132    20999    -10133     
  Partials     1938     1938

Flag	Coverage Δ
hive	`52.61% <31.25%> (?)`
mysql	`78.15% <56.25%> (?)`
postgres	`78.22% <56.25%> (?)`
presto	`52.50% <31.25%> (?)`
python	`81.39% <62.50%> (+30.17%)`	⬆️
sqlite	`76.68% <62.50%> (?)`
unit	`50.86% <31.25%> (-0.35%)`	⬇️

Impacted Files	Coverage Δ
superset/db_engine_specs/base.py	`89.21% <ø> (+16.37%)`	⬆️
superset/views/core.py	`76.11% <ø> (+49.31%)`	⬆️
superset/models/core.py	`90.50% <33.33%> (+19.11%)`	⬆️
superset/db_engine_specs/databricks.py	`82.92% <50.00%> (+5.65%)`	⬆️
superset/db_engine_specs/duckdb.py	`75.86% <50.00%> (ø)`
superset/db_engine_specs/postgres.py	`96.55% <100.00%> (+42.31%)`	⬆️
superset/db_engine_specs/presto.py	`88.22% <100.00%> (+61.77%)`	⬆️
superset/db_engine_specs/sqlite.py	`96.42% <100.00%> (+7.14%)`	⬆️
superset/tables/schemas.py	`0.00% <0.00%> (-100.00%)`	⬇️
superset/columns/schemas.py	`0.00% <0.00%> (-100.00%)`	⬇️
... and 325 more

ktmud · 2022-11-16T16:36:48Z

Set has the additional memory cost of keeping the hashtable of elements: https://towardsdatascience.com/memory-efficiency-of-common-python-data-structures-88f0f720421

Maybe not a big deal in this case but in general I don't think we should sacrifice performance for an implied convention for readability that is not very obvious.
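A rough way to see the overhead being described, assuming CPython. `sys.getsizeof` measures only the container itself, not the strings it references, and the exact numbers vary by Python version and platform.

```python
import sys

names = [f"table_{i}" for i in range(1000)]

as_list = list(names)
as_set = set(names)

# Size of the container only; the string objects are shared by both
list_size = sys.getsizeof(as_list)
set_size = sys.getsizeof(as_set)

print(f"list: {list_size} bytes, set: {set_size} bytes")
```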

Regarding readability, in the case where sets are used, people would wonder "why this HAS to be a set" (like I did), which actually makes the code more confusing to them.

Also, using sets to collect data that is already supposed to be unique may hide duplicates that were erroneously generated, hiding a potential performance issue or bug.
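The concern about hidden duplicates can be illustrated with a short sketch. `find_duplicates` is a hypothetical helper written for this example, not part of Superset.

```python
from collections import Counter
from typing import List


def find_duplicates(names: List[str]) -> List[str]:
    """Return the names that occur more than once, in sorted order."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


raw = ["orders", "users", "orders"]

# Converting to a set silently drops the repeated entry
unique = set(raw)
print(len(raw), len(unique))

# Checking the raw list first makes the duplicate visible
print(find_duplicates(raw))
```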

pull-request-size bot added the size/L label Nov 9, 2022

john-bodley requested review from ktmud, villebro and betodealmeida November 9, 2022 23:01

john-bodley marked this pull request as ready for review November 9, 2022 23:02

chore: Change get_table_names/get_view_names return type

e7c73d8

john-bodley force-pushed the john-bodley--get-table-view-names-set branch from 6e49fc4 to e7c73d8 Compare November 9, 2022 23:20

villebro approved these changes Nov 11, 2022

john-bodley added 3 commits November 16, 2022 08:18

Update core.py

0945e68

Update base_engine_spec_tests.py

e2fdf07

Update postgres_tests.py

a876fb4

john-bodley merged commit 7e54b88 into apache:master Nov 18, 2022

john-bodley deleted the john-bodley--get-table-view-names-set branch November 18, 2022 20:41

diegomedina248 pushed a commit to preset-io/superset that referenced this pull request Dec 3, 2022

chore: Change get_table_names/get_view_names return type (apache#22085)

fc8fae0

mistercrunch added the 🚢 2.1.3 label Feb 18, 2024

mistercrunch added the 🏷️ bot and 🚢 2.1.0 labels and removed the 🚢 2.1.3 label Mar 13, 2024
