-
Notifications
You must be signed in to change notification settings - Fork 16.5k
chore: annotate important types #36034
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've completed my review and didn't find any issues.
Files scanned
| File Path | Reviewed |
|---|---|
| superset/commands/explore/get.py | ✅ |
| superset/models/sql_lab.py | ✅ |
| superset/common/query_object.py | ✅ |
| superset/superset_typing.py | ✅ |
| superset/views/utils.py | ✅ |
| superset/views/core.py | ✅ |
| superset/jinja_context.py | ✅ |
| superset/utils/core.py | ✅ |
| superset/connectors/sqla/models.py | ✅ |
| superset/models/helpers.py | ✅ |
| superset/viz.py | ✅ |
Explore our documentation to understand the languages and file types we support and the files we ignore.
Check out our docs on how you can make Korbit work best for you and your team.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review Agent Run #5bb146
Actionable Suggestions - 1
-
superset/utils/core.py - 1
- Add null check for metric column access · Line 1621-1621
Additional Suggestions - 10
-
superset/connectors/sqla/models.py - 1
-
Move typing imports to type checking block · Line 107-112Move application imports `AdhocColumn`, `AdhocMetric`, `BaseDatasourceData`, `Metric`, `QueryObjectDict`, and `ResultSetColumnType` from `superset.superset_typing` into a type-checking block to improve runtime performance. These imports are only used for type annotations.
Code suggestion
@@ -18,6 +18,7 @@ from __future__ import annotations import builtins +from typing import TYPE_CHECKING import dataclasses import logging from collections import defaultdict @@ -106,12 +107,15 @@ from superset.models.slice import Slice from superset.sql.parse import Table from superset.superset_typing import ( - AdhocColumn, - AdhocMetric, - BaseDatasourceData, - Metric, - QueryObjectDict, - ResultSetColumnType, +) + +if TYPE_CHECKING: + from superset.superset_typing import ( + AdhocColumn, + AdhocMetric, + BaseDatasourceData, + Metric, + QueryObjectDict, + ResultSetColumnType, ) from superset.utils import core as utils, json from superset.utils.backports import StrEnum
-
-
superset/models/sql_lab.py - 2
-
Missing docstring and unused parameter · Line 334-334The `get_extra_cache_keys` method is missing a docstring and has an unused parameter `query_obj`. Consider adding documentation and removing or using the parameter.
Code suggestion
@@ -334,2 +334,3 @@ def get_extra_cache_keys(self, query_obj: QueryObjectDict) -> list[Hashable]: + """Return additional cache keys for the query object.""" return [] -
Missing period in docstring first line · Line 242-242The docstring for the `data` property should end with a period. This issue also affects the same docstring with rule D415.
Code suggestion
@@ -241,3 +241,3 @@ @property def data(self) -> QueryData: - """Returns query data for the frontend""" + """Returns query data for the frontend.""" order_by_choices = []
-
-
superset/common/query_object.py - 2
-
Move typing imports to TYPE_CHECKING block · Line 39-39Move application imports `Column`, `Metric`, `OrderBy`, and `QueryObjectDict` from `superset.superset_typing` into the existing `TYPE_CHECKING` block to improve runtime performance and avoid circular imports.
Code suggestion
@@ -38,3 +38,7 @@ from superset.sql.parse import sanitize_clause -from superset.superset_typing import Column, Metric, OrderBy, QueryObjectDict from superset.utils import json, pandas_postprocessing +from superset.utils import json, pandas_postprocessing @@ -51,3 +51,7 @@ if TYPE_CHECKING: from superset.connectors.sqla.models import BaseDatasource + from superset.superset_typing import Column, Metric, OrderBy, QueryObjectDict
-
Add docstring to public method to_dict · Line 373-373Add a docstring to the public method `to_dict` to document its purpose and return value.
Code suggestion
@@ -372,2 +372,5 @@ def to_dict(self) -> QueryObjectDict: + """Convert QueryObject to dictionary representation. + + Returns: Dictionary containing all query object properties. + """
-
-
superset/superset_typing.py - 1
-
Fix multi-line docstring summary formatting · Line 115-115Multi-line docstring summary should start at the first line. This issue also appears at lines 194 and 290.
Code suggestion
@@ -114,3 +114,2 @@ class QueryObjectDict(TypedDict, total=False): - """ - TypedDict representation of query objects used throughout Superset. + """TypedDict representation of query objects used throughout Superset.
-
-
tests/integration_tests/sqla_models_tests.py - 1
-
Use of assert detected in test · Line 980-980Multiple `assert` statements detected in test functions (lines 980, 984, 1025, 1026). Consider using pytest assertions or test framework methods for better error reporting.
Code suggestion
@@ -979,2 +979,2 @@ - assert ( - table.has_extra_cache_key_calls(cast(QueryObjectDict, query_obj)) + assert table.has_extra_cache_key_calls(cast(QueryObjectDict, query_obj)) == has_extra_cache_keys
-
-
superset/commands/explore/get.py - 1
-
Move typing import to type checking block · Line 40-40The import `BaseDatasourceData` from `superset.superset_typing` should be moved into a `TYPE_CHECKING` block since it's only used for type annotations. This improves runtime performance by avoiding unnecessary imports.
Code suggestion
@@ -19,6 +19,7 @@ from abc import ABC -from typing import Any, cast, Optional +from typing import Any, cast, Optional, TYPE_CHECKING from flask import request @@ -38,7 +39,6 @@ from superset.explore.permalink.exceptions import ExplorePermalinkGetFailedError from superset.extensions import security_manager -from superset.superset_typing import BaseDatasourceData from superset.utils import core as utils, json from superset.views.utils import ( get_datasource_info, @@ -46,6 +46,9 @@ sanitize_datasource_data, ) +if TYPE_CHECKING: + from superset.superset_typing import BaseDatasourceData + logger = logging.getLogger(__name__)
-
-
superset/views/utils.py - 2
-
Use modern union syntax for type annotations · Line 95-95Consider using the modern `X | Y` union syntax instead of `Union[X, Y]` for type annotations on line 95. This follows PEP 604 and improves readability.
Code suggestion
@@ -94,3 +94,3 @@ def sanitize_datasource_data( - datasource_data: Union[dict[str, Any], BaseDatasourceData, QueryData], + datasource_data: dict[str, Any] | BaseDatasourceData | QueryData, ) -> dict[str, Any]:
-
Fix multi-line docstring summary formatting · Line 97-97The docstring summary should start on the first line instead of the second line. Move the summary text to line 97 for proper docstring formatting.
Code suggestion
@@ -96,5 +96,4 @@ ) -> dict[str, Any]: - """ - Sanitize datasource data by removing sensitive database parameters. + """Sanitize datasource data by removing sensitive database parameters. Accepts TypedDict types (BaseDatasourceData, QueryData) or plain dicts.
-
Review Details
-
Files reviewed - 12 · Commit Range:
be80712..be80712- superset/commands/explore/get.py
- superset/common/query_object.py
- superset/connectors/sqla/models.py
- superset/jinja_context.py
- superset/models/helpers.py
- superset/models/sql_lab.py
- superset/superset_typing.py
- superset/utils/core.py
- superset/views/core.py
- superset/views/utils.py
- superset/viz.py
- tests/integration_tests/sqla_models_tests.py
-
Files skipped - 0
-
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful
Bito Usage Guide
Commands
Type the following command in the pull request comment and save the comment.
-
/review- Manually triggers a full AI review. -
/pause- Pauses automatic reviews on this pull request. -
/resume- Resumes automatic reviews. -
/resolve- Marks all Bito-posted review comments as resolved. -
/abort- Cancels all in-progress reviews.
Refer to the documentation for additional commands.
Configuration
This repository uses Default Agent You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.
Documentation & Help
| metric = cast(AdhocMetric, metric) | ||
| if metric["expressionType"] == AdhocMetricExpressionType.SIMPLE: | ||
| return cast(dict[str, Any], metric["column"])["column_name"] | ||
| column = metric["column"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original code directly accesses metric["column"]["column_name"] without checking if column is None, which can cause a TypeError in production when the column field is missing. The change adds a null check by extracting column into a variable and verifying it exists before accessing its properties, preventing runtime errors and ensuring graceful handling of None values.
Code suggestion
Check the AI-generated fix before applying
- return cast(dict[str, Any], metric["column"])["column_name"]
+ column = metric["column"]
+ if column:
+ return column["column_name"]
Code Review Run #5bb146
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
637401c to
b73f166
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR improves type safety throughout the Superset codebase by introducing structured TypedDict definitions for commonly used dictionary types and replacing loose dict[str, Any] annotations with more precise types. The changes modernize type annotations using Python 3.10+ syntax (Union to |, Optional to | None) and add comprehensive documentation for the new TypedDict structures.
- Introduces three new TypedDict classes (
QueryObjectDict,BaseDatasourceData,QueryData) with extensive documentation - Updates type annotations across datasource models, visualization classes, and API endpoints to use the new TypedDicts
- Removes redundant code that was overriding values already present in Query.data property
- Adds defensive null checking in
get_column_name_from_metricfunction
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| superset/superset_typing.py | Defines new TypedDict classes (QueryObjectDict, BaseDatasourceData, QueryData) with comprehensive documentation; modernizes type aliases using Python 3.10+ syntax |
| superset/common/query_object.py | Updates QueryObject.to_dict() return type to QueryObjectDict; adds explicit dict conversion in cache_key method |
| superset/connectors/sqla/models.py | Updates BaseDatasource.data and SqlaTable.data return types to BaseDatasourceData; adds filtering logic for get_sqla_query parameters in get_extra_cache_keys |
| superset/models/helpers.py | Updates get_extra_cache_keys signature to use QueryObjectDict; adds parameter filtering in get_query_str_extended |
| superset/models/sql_lab.py | Updates Query.data return type to QueryData; updates get_extra_cache_keys signature |
| superset/views/core.py | Updates datasource_data types to BaseDatasourceData or QueryData; removes redundant Query-specific column assignment since Query.data already includes columns |
| superset/views/utils.py | Updates sanitize_datasource_data to accept BaseDatasourceData or QueryData types; adds documentation |
| superset/commands/explore/get.py | Updates datasource_data type annotations to BaseDatasourceData or QueryData |
| superset/viz.py | Updates cache_key method with explicit cast to dict[str, Any] for mutation operations |
| superset/utils/core.py | Adds null check for column field in get_column_name_from_metric to prevent potential AttributeError |
| superset/jinja_context.py | Updates dataset_macro query_obj type to QueryObjectDict with explicit Column list cast |
| tests/integration_tests/sqla_models_tests.py | Adds cast to QueryObjectDict in test assertions to satisfy type checkers |
superset/models/helpers.py
Outdated
| sqla_query_keys = { | ||
| "apply_fetch_values_predicate", | ||
| "columns", | ||
| "extras", | ||
| "filter", | ||
| "from_dttm", | ||
| "granularity", | ||
| "groupby", | ||
| "inner_from_dttm", | ||
| "inner_to_dttm", | ||
| "is_rowcount", | ||
| "is_timeseries", | ||
| "metrics", | ||
| "orderby", | ||
| "order_desc", | ||
| "to_dttm", | ||
| "series_columns", | ||
| "series_limit", | ||
| "series_limit_metric", | ||
| "group_others_when_limit_reached", | ||
| "row_limit", | ||
| "row_offset", | ||
| "timeseries_limit", | ||
| "timeseries_limit_metric", | ||
| "time_shift", | ||
| } |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sqla_query_keys set is duplicated in both get_query_str_extended (here) and in superset/connectors/sqla/models.py in the SqlaTable.get_extra_cache_keys method (lines 1924-1949). Consider extracting this set to a module-level constant or a shared helper method to avoid maintaining two copies of the same parameter list.
superset/connectors/sqla/models.py
Outdated
| sqla_query_keys = { | ||
| "apply_fetch_values_predicate", | ||
| "columns", | ||
| "extras", | ||
| "filter", | ||
| "from_dttm", | ||
| "granularity", | ||
| "groupby", | ||
| "inner_from_dttm", | ||
| "inner_to_dttm", | ||
| "is_rowcount", | ||
| "is_timeseries", | ||
| "metrics", | ||
| "orderby", | ||
| "order_desc", | ||
| "to_dttm", | ||
| "series_columns", | ||
| "series_limit", | ||
| "series_limit_metric", | ||
| "group_others_when_limit_reached", | ||
| "row_limit", | ||
| "row_offset", | ||
| "timeseries_limit", | ||
| "timeseries_limit_metric", | ||
| "time_shift", | ||
| } |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sqla_query_keys set is duplicated in both get_extra_cache_keys (here) and in superset/models/helpers.py in the BaseDatasource.get_query_str_extended method (lines 978-1003). Consider extracting this set to a module-level constant or a shared helper method to avoid maintaining two copies of the same parameter list.
| for column_ in data["columns"]: | ||
| generic_type = column_.get("type_generic") | ||
| for column_ in cast(list[dict[str, Any]], data["columns"]): # type: ignore[assignment] | ||
| column_dict = cast(dict[str, Any], column_) |
Copilot
AI
Nov 7, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cast on this line is redundant since column_ is already typed as dict[str, Any] from the cast on line 480. The variable can be used directly without the additional cast.
| column_dict = cast(dict[str, Any], column_) | |
| column_dict = column_ |
| # Filter out keys that aren't parameters to get_sqla_query | ||
| filtered_query_obj = { | ||
| k: v for k, v in query_obj.items() if k in SQLA_QUERY_KEYS | ||
| } | ||
| sqla_query = self.get_sqla_query(**cast(Any, filtered_query_obj)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any guardrails or warning if there is a new query_obj.items() but they didn't add it to SQLA_QUERY_KEYS? Currently this would just drop the key, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, if someone adds a new parameter to get_sqla_query but forgets to add it to SQLA_QUERY_KEYS then it won't be passed. I assume they would catch the error when testing their changes, because otherwise their changes will be no-op.
|
|
||
| # Additional fields used throughout the codebase | ||
| time_range: str | None | ||
| datasource: Any # BaseDatasource instance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is supposed to be a BaseDatasource instance, why are we typing it as Any? does it fail if we don't use Any?
Extract the duplicated sqla_query_keys set to a shared constant SQLA_QUERY_KEYS to avoid maintaining two identical copies in helpers.py and sqla/models.py. This addresses code review feedback to reduce code duplication. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
aac7a46 to
bd53e14
Compare
91feaaf to
2550123
Compare
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com> (cherry picked from commit fb7d0e0)
Co-authored-by: Claude <noreply@anthropic.com> (cherry picked from commit fb7d0e0)
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com> (cherry picked from commit fb7d0e0)
Co-authored-by: Claude <noreply@anthropic.com> (cherry picked from commit fb7d0e0)
SUMMARY
We pass a
QueryObjectDictaround a lot, but it's annotated as onlydict[str, Any]. Also related to visualization, the return object fromdata()is also annotated asdict[str, Any], but we can do much better.This PR replaces those generic type annotations with typed dicts.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
ADDITIONAL INFORMATION