chore: annotate important types #36034

betodealmeida · 2025-11-06T21:03:29Z

SUMMARY

We pass a QueryObjectDict around a lot, but it's annotated as only dict[str, Any]. Also related to visualization, the return object from data() is also annotated as dict[str, Any], but we can do much better.

This PR replaces those generic type annotations with typed dicts.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

Has associated issue:
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

korbit-ai

I've completed my review and didn't find any issues.

Files scanned

File Path	Reviewed
superset/commands/explore/get.py	✅
superset/models/sql_lab.py	✅
superset/common/query_object.py	✅
superset/superset_typing.py	✅
superset/views/utils.py	✅
superset/views/core.py	✅
superset/jinja_context.py	✅
superset/utils/core.py	✅
superset/connectors/sqla/models.py	✅
superset/models/helpers.py	✅
superset/viz.py	✅

Explore our documentation to understand the languages and file types we support and the files we ignore.

Check out our docs on how you can make Korbit work best for you and your team.

Loving Korbit!? Share us on LinkedIn Reddit and X

bito-code-review

Code Review Agent Run #5bb146

Actionable Suggestions - 1

superset/utils/core.py - 1
- Add null check for metric column access · Line 1621-1621

Additional Suggestions - 10

superset/connectors/sqla/models.py - 1

Move typing imports to type checking block · Line 107-112

Move application imports `AdhocColumn`, `AdhocMetric`, `BaseDatasourceData`, `Metric`, `QueryObjectDict`, and `ResultSetColumnType` from `superset.superset_typing` into a type-checking block to improve runtime performance. These imports are only used for type annotations.

Code suggestion

 @@ -18,6 +18,7 @@
  from __future__ import annotations
 
  import builtins
+from typing import TYPE_CHECKING
  import dataclasses
  import logging
  from collections import defaultdict
 @@ -106,12 +107,15 @@
  from superset.models.slice import Slice
  from superset.sql.parse import Table
  from superset.superset_typing import (
-    AdhocColumn,
-    AdhocMetric,
-    BaseDatasourceData,
-    Metric,
-    QueryObjectDict,
-    ResultSetColumnType,
+)
+
+if TYPE_CHECKING:
+    from superset.superset_typing import (
+        AdhocColumn,
+        AdhocMetric,
+        BaseDatasourceData,
+        Metric,
+        QueryObjectDict,
+        ResultSetColumnType,
  )
  from superset.utils import core as utils, json
  from superset.utils.backports import StrEnum

superset/models/sql_lab.py - 2
- Missing docstring and unused parameter · Line 334-334
  
  The `get_extra_cache_keys` method is missing a docstring and has an unused parameter `query_obj`. Consider adding documentation and removing or using the parameter.
  Code suggestion
  @@ -334,2 +334,3 @@ def get_extra_cache_keys(self, query_obj: QueryObjectDict) -> list[Hashable]: + """Return additional cache keys for the query object.""" return []
- Missing period in docstring first line · Line 242-242
  
  The docstring for the `data` property should end with a period. This issue also affects the same docstring with rule D415.
  Code suggestion
  @@ -241,3 +241,3 @@ @property def data(self) -> QueryData: - """Returns query data for the frontend""" + """Returns query data for the frontend.""" order_by_choices = []

superset/common/query_object.py - 2

Move typing imports to TYPE_CHECKING block · Line 39-39

Move application imports `Column`, `Metric`, `OrderBy`, and `QueryObjectDict` from `superset.superset_typing` into the existing `TYPE_CHECKING` block to improve runtime performance and avoid circular imports.

Code suggestion

 @@ -38,3 +38,7 @@
 from superset.sql.parse import sanitize_clause
-from superset.superset_typing import Column, Metric, OrderBy, QueryObjectDict
 from superset.utils import json, pandas_postprocessing
+from superset.utils import json, pandas_postprocessing
 @@ -51,3 +51,7 @@
 if TYPE_CHECKING:
     from superset.connectors.sqla.models import BaseDatasource
+    from superset.superset_typing import Column, Metric, OrderBy, QueryObjectDict

Add docstring to public method to_dict · Line 373-373

Add a docstring to the public method `to_dict` to document its purpose and return value.

Code suggestion

 @@ -372,2 +372,5 @@
     def to_dict(self) -> QueryObjectDict:
+        """Convert QueryObject to dictionary representation.
+        
+        Returns: Dictionary containing all query object properties.
+        """

superset/superset_typing.py - 1

Fix multi-line docstring summary formatting · Line 115-115

Multi-line docstring summary should start at the first line. This issue also appears at lines 194 and 290.

Code suggestion

 @@ -114,3 +114,2 @@
 class QueryObjectDict(TypedDict, total=False):
-    """
-    TypedDict representation of query objects used throughout Superset.
+    """TypedDict representation of query objects used throughout Superset.

tests/integration_tests/sqla_models_tests.py - 1
- Use of assert detected in test · Line 980-980
  
  Multiple `assert` statements detected in test functions (lines 980, 984, 1025, 1026). Consider using pytest assertions or test framework methods for better error reporting.
  Code suggestion
  @@ -979,2 +979,2 @@ - assert ( - table.has_extra_cache_key_calls(cast(QueryObjectDict, query_obj)) + assert table.has_extra_cache_key_calls(cast(QueryObjectDict, query_obj)) == has_extra_cache_keys

superset/commands/explore/get.py - 1

Move typing import to type checking block · Line 40-40

The import `BaseDatasourceData` from `superset.superset_typing` should be moved into a `TYPE_CHECKING` block since it's only used for type annotations. This improves runtime performance by avoiding unnecessary imports.

Code suggestion

 @@ -19,6 +19,7 @@
  from abc import ABC
-from typing import Any, cast, Optional
+from typing import Any, cast, Optional, TYPE_CHECKING
 
  from flask import request
 @@ -38,7 +39,6 @@
  from superset.explore.permalink.exceptions import ExplorePermalinkGetFailedError
  from superset.extensions import security_manager
-from superset.superset_typing import BaseDatasourceData
  from superset.utils import core as utils, json
  from superset.views.utils import (
      get_datasource_info,
 @@ -46,6 +46,9 @@
      sanitize_datasource_data,
  )
 
+if TYPE_CHECKING:
+    from superset.superset_typing import BaseDatasourceData
+
  logger = logging.getLogger(__name__)

superset/views/utils.py - 2

Use modern union syntax for type annotations · Line 95-95

Consider using the modern `X | Y` union syntax instead of `Union[X, Y]` for type annotations on line 95. This follows PEP 604 and improves readability.

Code suggestion

 @@ -94,3 +94,3 @@
 def sanitize_datasource_data(
-    datasource_data: Union[dict[str, Any], BaseDatasourceData, QueryData],
+    datasource_data: dict[str, Any] | BaseDatasourceData | QueryData,
 ) -> dict[str, Any]:

Fix multi-line docstring summary formatting · Line 97-97

The docstring summary should start on the first line instead of the second line. Move the summary text to line 97 for proper docstring formatting.

Code suggestion

 @@ -96,5 +96,4 @@
 ) -> dict[str, Any]:
-    """
-    Sanitize datasource data by removing sensitive database parameters.
+    """Sanitize datasource data by removing sensitive database parameters.

      Accepts TypedDict types (BaseDatasourceData, QueryData) or plain dicts.

Review Details

Files reviewed - 12 · Commit Range: be80712..be80712
- superset/commands/explore/get.py
- superset/common/query_object.py
- superset/connectors/sqla/models.py
- superset/jinja_context.py
- superset/models/helpers.py
- superset/models/sql_lab.py
- superset/superset_typing.py
- superset/utils/core.py
- superset/views/core.py
- superset/views/utils.py
- superset/viz.py
- tests/integration_tests/sqla_models_tests.py
Files skipped - 0
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

/review - Manually triggers a full AI review.
/pause - Pauses automatic reviews on this pull request.
/resume - Resumes automatic reviews.
/resolve - Marks all Bito-posted review comments as resolved.
/abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Default Agent You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by

bito-code-review · 2025-11-06T21:13:43Z

superset/utils/core.py

        metric = cast(AdhocMetric, metric)
        if metric["expressionType"] == AdhocMetricExpressionType.SIMPLE:
-            return cast(dict[str, Any], metric["column"])["column_name"]
+            column = metric["column"]


Add null check for metric column access

The original code directly accesses metric["column"]["column_name"] without checking if column is None, which can cause a TypeError in production when the column field is missing. The change adds a null check by extracting column into a variable and verifying it exists before accessing its properties, preventing runtime errors and ensuring graceful handling of None values.

Code suggestion

Check the AI-generated fix before applying

- return cast(dict[str, Any], metric["column"])["column_name"] + column = metric["column"] + if column: + return column["column_name"]

Code Review Run #5bb146

Should Bito avoid suggestions like this for future reviews? (Manage Rules)

Yes, avoid them

Copilot

Pull Request Overview

This PR improves type safety throughout the Superset codebase by introducing structured TypedDict definitions for commonly used dictionary types and replacing loose dict[str, Any] annotations with more precise types. The changes modernize type annotations using Python 3.10+ syntax (Union to |, Optional to | None) and add comprehensive documentation for the new TypedDict structures.

Introduces three new TypedDict classes (QueryObjectDict, BaseDatasourceData, QueryData) with extensive documentation
Updates type annotations across datasource models, visualization classes, and API endpoints to use the new TypedDicts
Removes redundant code that was overriding values already present in Query.data property
Adds defensive null checking in get_column_name_from_metric function

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
superset/superset_typing.py	Defines new TypedDict classes (QueryObjectDict, BaseDatasourceData, QueryData) with comprehensive documentation; modernizes type aliases using Python 3.10+ syntax
superset/common/query_object.py	Updates QueryObject.to_dict() return type to QueryObjectDict; adds explicit dict conversion in cache_key method
superset/connectors/sqla/models.py	Updates BaseDatasource.data and SqlaTable.data return types to BaseDatasourceData; adds filtering logic for get_sqla_query parameters in get_extra_cache_keys
superset/models/helpers.py	Updates get_extra_cache_keys signature to use QueryObjectDict; adds parameter filtering in get_query_str_extended
superset/models/sql_lab.py	Updates Query.data return type to QueryData; updates get_extra_cache_keys signature
superset/views/core.py	Updates datasource_data types to BaseDatasourceData or QueryData; removes redundant Query-specific column assignment since Query.data already includes columns
superset/views/utils.py	Updates sanitize_datasource_data to accept BaseDatasourceData or QueryData types; adds documentation
superset/commands/explore/get.py	Updates datasource_data type annotations to BaseDatasourceData or QueryData
superset/viz.py	Updates cache_key method with explicit cast to dict[str, Any] for mutation operations
superset/utils/core.py	Adds null check for column field in get_column_name_from_metric to prevent potential AttributeError
superset/jinja_context.py	Updates dataset_macro query_obj type to QueryObjectDict with explicit Column list cast
tests/integration_tests/sqla_models_tests.py	Adds cast to QueryObjectDict in test assertions to satisfy type checkers

Copilot · 2025-11-07T18:27:06Z

superset/models/helpers.py

+        sqla_query_keys = {
+            "apply_fetch_values_predicate",
+            "columns",
+            "extras",
+            "filter",
+            "from_dttm",
+            "granularity",
+            "groupby",
+            "inner_from_dttm",
+            "inner_to_dttm",
+            "is_rowcount",
+            "is_timeseries",
+            "metrics",
+            "orderby",
+            "order_desc",
+            "to_dttm",
+            "series_columns",
+            "series_limit",
+            "series_limit_metric",
+            "group_others_when_limit_reached",
+            "row_limit",
+            "row_offset",
+            "timeseries_limit",
+            "timeseries_limit_metric",
+            "time_shift",
+        }


The sqla_query_keys set is duplicated in both get_query_str_extended (here) and in superset/connectors/sqla/models.py in the SqlaTable.get_extra_cache_keys method (lines 1924-1949). Consider extracting this set to a module-level constant or a shared helper method to avoid maintaining two copies of the same parameter list.

Copilot · 2025-11-07T18:27:07Z

superset/connectors/sqla/models.py

+            sqla_query_keys = {
+                "apply_fetch_values_predicate",
+                "columns",
+                "extras",
+                "filter",
+                "from_dttm",
+                "granularity",
+                "groupby",
+                "inner_from_dttm",
+                "inner_to_dttm",
+                "is_rowcount",
+                "is_timeseries",
+                "metrics",
+                "orderby",
+                "order_desc",
+                "to_dttm",
+                "series_columns",
+                "series_limit",
+                "series_limit_metric",
+                "group_others_when_limit_reached",
+                "row_limit",
+                "row_offset",
+                "timeseries_limit",
+                "timeseries_limit_metric",
+                "time_shift",
+            }


The sqla_query_keys set is duplicated in both get_extra_cache_keys (here) and in superset/models/helpers.py in the BaseDatasource.get_query_str_extended method (lines 978-1003). Consider extracting this set to a module-level constant or a shared helper method to avoid maintaining two copies of the same parameter list.

Copilot · 2025-11-07T18:27:07Z

superset/connectors/sqla/models.py

-        for column_ in data["columns"]:
-            generic_type = column_.get("type_generic")
+        for column_ in cast(list[dict[str, Any]], data["columns"]):  # type: ignore[assignment]
+            column_dict = cast(dict[str, Any], column_)


The cast on this line is redundant since column_ is already typed as dict[str, Any] from the cast on line 480. The variable can be used directly without the additional cast.

Suggested change

column_dict = cast(dict[str, Any], column_)

column_dict = column_

sadpandajoe · 2025-11-10T21:52:45Z

superset/connectors/sqla/models.py

+            # Filter out keys that aren't parameters to get_sqla_query
+            filtered_query_obj = {
+                k: v for k, v in query_obj.items() if k in SQLA_QUERY_KEYS
+            }
+            sqla_query = self.get_sqla_query(**cast(Any, filtered_query_obj))


Are there any guardrails or warning if there is a new query_obj.items() but they didn't add it to SQLA_QUERY_KEYS? Currently this would just drop the key, right?

Right, if someone adds a new parameter to get_sqla_query but forgets to add it to SQLA_QUERY_KEYS then it won't be passed. I assume they would catch the error when testing their changes, because otherwise their changes will be no-op.

sadpandajoe · 2025-11-12T23:22:23Z

superset/superset_typing.py

+
+    # Additional fields used throughout the codebase
+    time_range: str | None
+    datasource: Any  # BaseDatasource instance


if this is supposed to be a BaseDatasource instance, why are we typing it as Any? does it fail if we don't use Any?

Extract the duplicated sqla_query_keys set to a shared constant SQLA_QUERY_KEYS to avoid maintaining two identical copies in helpers.py and sqla/models.py. This addresses code review feedback to reduce code duplication. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Co-authored-by: Claude <noreply@anthropic.com>

Co-authored-by: Claude <noreply@anthropic.com> (cherry picked from commit fb7d0e0)

Co-authored-by: Claude <noreply@anthropic.com>

Co-authored-by: Claude <noreply@anthropic.com> (cherry picked from commit fb7d0e0)

pull-request-size bot added the size/L label Nov 6, 2025

github-actions bot added api Related to the REST API preset-io labels Nov 6, 2025

korbit-ai bot approved these changes Nov 6, 2025

View reviewed changes

bito-code-review bot reviewed Nov 6, 2025

View reviewed changes

betodealmeida force-pushed the annotate-important-types branch 4 times, most recently from 637401c to b73f166 Compare November 6, 2025 21:48

pull-request-size bot added size/XL and removed size/L labels Nov 6, 2025

sadpandajoe requested review from Copilot, msyavuz and sadpandajoe November 7, 2025 18:22

Copilot AI reviewed Nov 7, 2025

View reviewed changes

pull-request-size bot added size/L and removed size/XL labels Nov 10, 2025

sadpandajoe reviewed Nov 10, 2025

View reviewed changes

sadpandajoe reviewed Nov 12, 2025

View reviewed changes

eschutho approved these changes Nov 14, 2025

View reviewed changes

betodealmeida and others added 2 commits November 17, 2025 10:19

chore: annotate important types

8e0b2b6

betodealmeida force-pushed the annotate-important-types branch from aac7a46 to bd53e14 Compare November 17, 2025 15:20

Fix lint

2550123

betodealmeida force-pushed the annotate-important-types branch from 91feaaf to 2550123 Compare November 17, 2025 17:05

betodealmeida merged commit fb7d0e0 into master Nov 17, 2025
69 of 70 checks passed

betodealmeida deleted the annotate-important-types branch November 17, 2025 19:41

kshi020302 pushed a commit to jl141/superset that referenced this pull request Nov 30, 2025

chore: annotate important types (apache#36034)

50f49e8

Co-authored-by: Claude <noreply@anthropic.com>

sadpandajoe pushed a commit that referenced this pull request Dec 4, 2025

chore: annotate important types (#36034)

c6cf90e

Co-authored-by: Claude <noreply@anthropic.com> (cherry picked from commit fb7d0e0)

sadpandajoe pushed a commit that referenced this pull request Dec 4, 2025

chore: annotate important types (#36034)

b5ffb31

Co-authored-by: Claude <noreply@anthropic.com> (cherry picked from commit fb7d0e0)

Facyla pushed a commit to Facyla/superset-contrib that referenced this pull request Dec 16, 2025

chore: annotate important types (apache#36034)

ea84bc3

Co-authored-by: Claude <noreply@anthropic.com>

sadpandajoe pushed a commit that referenced this pull request Dec 16, 2025

chore: annotate important types (#36034)

64c1371

Co-authored-by: Claude <noreply@anthropic.com> (cherry picked from commit fb7d0e0)

aminghadersohi pushed a commit to aminghadersohi/superset that referenced this pull request Jan 17, 2026

chore: annotate important types (apache#36034)

9f178e6

Co-authored-by: Claude <noreply@anthropic.com> (cherry picked from commit fb7d0e0)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

chore: annotate important types #36034

chore: annotate important types #36034

Uh oh!

betodealmeida commented Nov 6, 2025 •

edited

Loading

Uh oh!

korbit-ai bot left a comment

Uh oh!

bito-code-review bot left a comment •

edited

Loading

Uh oh!

bito-code-review bot Nov 6, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Nov 7, 2025

Uh oh!

Copilot AI Nov 7, 2025

Uh oh!

Copilot AI Nov 7, 2025

Uh oh!

sadpandajoe Nov 10, 2025

Uh oh!

betodealmeida Nov 12, 2025

Uh oh!

sadpandajoe Nov 12, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	column_dict = cast(dict[str, Any], column_)
	column_dict = column_

chore: annotate important types #36034

chore: annotate important types #36034

Uh oh!

Conversation

betodealmeida commented Nov 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

SUMMARY

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

Uh oh!

korbit-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

bito-code-review bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Code Review Agent Run #5bb146

Uh oh!

bito-code-review bot Nov 6, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Nov 7, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 7, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 7, 2025

Choose a reason for hiding this comment

Uh oh!

sadpandajoe Nov 10, 2025

Choose a reason for hiding this comment

Uh oh!

betodealmeida Nov 12, 2025

Choose a reason for hiding this comment

Uh oh!

sadpandajoe Nov 12, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

betodealmeida commented Nov 6, 2025 •

edited

Loading

bito-code-review bot left a comment •

edited

Loading