Skip to content

Conversation

@betodealmeida
Copy link
Member

@betodealmeida betodealmeida commented Nov 6, 2025

SUMMARY

We pass a QueryObjectDict around a lot, but it's annotated as only dict[str, Any]. Also related to visualization, the return object from data() is also annotated as dict[str, Any], but we can do much better.

This PR replaces those generic type annotations with typed dicts.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@github-actions github-actions bot added api Related to the REST API preset-io labels Nov 6, 2025
Copy link

@korbit-ai korbit-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've completed my review and didn't find any issues.

Files scanned
File Path Reviewed
superset/commands/explore/get.py
superset/models/sql_lab.py
superset/common/query_object.py
superset/superset_typing.py
superset/views/utils.py
superset/views/core.py
superset/jinja_context.py
superset/utils/core.py
superset/connectors/sqla/models.py
superset/models/helpers.py
superset/viz.py

Explore our documentation to understand the languages and file types we support and the files we ignore.

Check out our docs on how you can make Korbit work best for you and your team.

Loving Korbit!? Share us on LinkedIn Reddit and X

Copy link
Contributor

@bito-code-review bito-code-review bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Agent Run #5bb146

Actionable Suggestions - 1
  • superset/utils/core.py - 1
Additional Suggestions - 10
  • superset/connectors/sqla/models.py - 1
    • Move typing imports to type checking block · Line 107-112
      Move application imports `AdhocColumn`, `AdhocMetric`, `BaseDatasourceData`, `Metric`, `QueryObjectDict`, and `ResultSetColumnType` from `superset.superset_typing` into a type-checking block to improve runtime performance. These imports are only used for type annotations.
      Code suggestion
       @@ -18,6 +18,7 @@
        from __future__ import annotations
       
        import builtins
      +from typing import TYPE_CHECKING
        import dataclasses
        import logging
        from collections import defaultdict
       @@ -106,12 +107,15 @@
        from superset.models.slice import Slice
        from superset.sql.parse import Table
        from superset.superset_typing import (
      -    AdhocColumn,
      -    AdhocMetric,
      -    BaseDatasourceData,
      -    Metric,
      -    QueryObjectDict,
      -    ResultSetColumnType,
      +)
      +
      +if TYPE_CHECKING:
      +    from superset.superset_typing import (
      +        AdhocColumn,
      +        AdhocMetric,
      +        BaseDatasourceData,
      +        Metric,
      +        QueryObjectDict,
      +        ResultSetColumnType,
        )
        from superset.utils import core as utils, json
        from superset.utils.backports import StrEnum
  • superset/models/sql_lab.py - 2
    • Missing docstring and unused parameter · Line 334-334
      The `get_extra_cache_keys` method is missing a docstring and has an unused parameter `query_obj`. Consider adding documentation and removing or using the parameter.
      Code suggestion
       @@ -334,2 +334,3 @@
           def get_extra_cache_keys(self, query_obj: QueryObjectDict) -> list[Hashable]:
      +        """Return additional cache keys for the query object."""
               return []
    • Missing period in docstring first line · Line 242-242
      The docstring for the `data` property should end with a period. This issue also affects the same docstring with rule D415.
      Code suggestion
       @@ -241,3 +241,3 @@
           @property
           def data(self) -> QueryData:
      -        """Returns query data for the frontend"""
      +        """Returns query data for the frontend."""
               order_by_choices = []
  • superset/common/query_object.py - 2
    • Move typing imports to TYPE_CHECKING block · Line 39-39
      Move application imports `Column`, `Metric`, `OrderBy`, and `QueryObjectDict` from `superset.superset_typing` into the existing `TYPE_CHECKING` block to improve runtime performance and avoid circular imports.
      Code suggestion
       @@ -38,3 +38,7 @@
       from superset.sql.parse import sanitize_clause
      -from superset.superset_typing import Column, Metric, OrderBy, QueryObjectDict
       from superset.utils import json, pandas_postprocessing
      +from superset.utils import json, pandas_postprocessing
       @@ -51,3 +51,7 @@
       if TYPE_CHECKING:
           from superset.connectors.sqla.models import BaseDatasource
      +    from superset.superset_typing import Column, Metric, OrderBy, QueryObjectDict
    • Add docstring to public method to_dict · Line 373-373
      Add a docstring to the public method `to_dict` to document its purpose and return value.
      Code suggestion
       @@ -372,2 +372,5 @@
           def to_dict(self) -> QueryObjectDict:
      +        """Convert QueryObject to dictionary representation.
      +        
      +        Returns: Dictionary containing all query object properties.
      +        """
  • superset/superset_typing.py - 1
    • Fix multi-line docstring summary formatting · Line 115-115
      Multi-line docstring summary should start at the first line. This issue also appears at lines 194 and 290.
      Code suggestion
       @@ -114,3 +114,2 @@
       class QueryObjectDict(TypedDict, total=False):
      -    """
      -    TypedDict representation of query objects used throughout Superset.
      +    """TypedDict representation of query objects used throughout Superset.
  • tests/integration_tests/sqla_models_tests.py - 1
    • Use of assert detected in test · Line 980-980
      Multiple `assert` statements detected in test functions (lines 980, 984, 1025, 1026). Consider using pytest assertions or test framework methods for better error reporting.
      Code suggestion
       @@ -979,2 +979,2 @@
      -    assert (
      -        table.has_extra_cache_key_calls(cast(QueryObjectDict, query_obj))
      +    assert table.has_extra_cache_key_calls(cast(QueryObjectDict, query_obj)) == has_extra_cache_keys
  • superset/commands/explore/get.py - 1
    • Move typing import to type checking block · Line 40-40
      The import `BaseDatasourceData` from `superset.superset_typing` should be moved into a `TYPE_CHECKING` block since it's only used for type annotations. This improves runtime performance by avoiding unnecessary imports.
      Code suggestion
       @@ -19,6 +19,7 @@
        from abc import ABC
      -from typing import Any, cast, Optional
      +from typing import Any, cast, Optional, TYPE_CHECKING
       
        from flask import request
       @@ -38,7 +39,6 @@
        from superset.explore.permalink.exceptions import ExplorePermalinkGetFailedError
        from superset.extensions import security_manager
      -from superset.superset_typing import BaseDatasourceData
        from superset.utils import core as utils, json
        from superset.views.utils import (
            get_datasource_info,
       @@ -46,6 +46,9 @@
            sanitize_datasource_data,
        )
       
      +if TYPE_CHECKING:
      +    from superset.superset_typing import BaseDatasourceData
      +
        logger = logging.getLogger(__name__)
  • superset/views/utils.py - 2
    • Use modern union syntax for type annotations · Line 95-95
      Consider using the modern `X | Y` union syntax instead of `Union[X, Y]` for type annotations on line 95. This follows PEP 604 and improves readability.
      Code suggestion
       @@ -94,3 +94,3 @@
       def sanitize_datasource_data(
      -    datasource_data: Union[dict[str, Any], BaseDatasourceData, QueryData],
      +    datasource_data: dict[str, Any] | BaseDatasourceData | QueryData,
       ) -> dict[str, Any]:
    • Fix multi-line docstring summary formatting · Line 97-97
      The docstring summary should start on the first line instead of the second line. Move the summary text to line 97 for proper docstring formatting.
      Code suggestion
       @@ -96,5 +96,4 @@
       ) -> dict[str, Any]:
      -    """
      -    Sanitize datasource data by removing sensitive database parameters.
      +    """Sanitize datasource data by removing sensitive database parameters.
      
            Accepts TypedDict types (BaseDatasourceData, QueryData) or plain dicts.
Review Details
  • Files reviewed - 12 · Commit Range: be80712..be80712
    • superset/commands/explore/get.py
    • superset/common/query_object.py
    • superset/connectors/sqla/models.py
    • superset/jinja_context.py
    • superset/models/helpers.py
    • superset/models/sql_lab.py
    • superset/superset_typing.py
    • superset/utils/core.py
    • superset/views/core.py
    • superset/views/utils.py
    • superset/viz.py
    • tests/integration_tests/sqla_models_tests.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Default Agent You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

metric = cast(AdhocMetric, metric)
if metric["expressionType"] == AdhocMetricExpressionType.SIMPLE:
return cast(dict[str, Any], metric["column"])["column_name"]
column = metric["column"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add null check for metric column access

The original code directly accesses metric["column"]["column_name"] without checking if column is None, which can cause a TypeError in production when the column field is missing. The change adds a null check by extracting column into a variable and verifying it exists before accessing its properties, preventing runtime errors and ensuring graceful handling of None values.

Code suggestion
Check the AI-generated fix before applying
 -            return cast(dict[str, Any], metric["column"])["column_name"]
 +            column = metric["column"]
 +            if column:
 +                return column["column_name"]

Code Review Run #5bb146


Should Bito avoid suggestions like this for future reviews? (Manage Rules)

  • Yes, avoid them

@betodealmeida betodealmeida force-pushed the annotate-important-types branch 4 times, most recently from 637401c to b73f166 Compare November 6, 2025 21:48
@pull-request-size pull-request-size bot added size/XL and removed size/L labels Nov 6, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR improves type safety throughout the Superset codebase by introducing structured TypedDict definitions for commonly used dictionary types and replacing loose dict[str, Any] annotations with more precise types. The changes modernize type annotations using Python 3.10+ syntax (Union to |, Optional to | None) and add comprehensive documentation for the new TypedDict structures.

  • Introduces three new TypedDict classes (QueryObjectDict, BaseDatasourceData, QueryData) with extensive documentation
  • Updates type annotations across datasource models, visualization classes, and API endpoints to use the new TypedDicts
  • Removes redundant code that was overriding values already present in Query.data property
  • Adds defensive null checking in get_column_name_from_metric function

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
superset/superset_typing.py Defines new TypedDict classes (QueryObjectDict, BaseDatasourceData, QueryData) with comprehensive documentation; modernizes type aliases using Python 3.10+ syntax
superset/common/query_object.py Updates QueryObject.to_dict() return type to QueryObjectDict; adds explicit dict conversion in cache_key method
superset/connectors/sqla/models.py Updates BaseDatasource.data and SqlaTable.data return types to BaseDatasourceData; adds filtering logic for get_sqla_query parameters in get_extra_cache_keys
superset/models/helpers.py Updates get_extra_cache_keys signature to use QueryObjectDict; adds parameter filtering in get_query_str_extended
superset/models/sql_lab.py Updates Query.data return type to QueryData; updates get_extra_cache_keys signature
superset/views/core.py Updates datasource_data types to BaseDatasourceData or QueryData; removes redundant Query-specific column assignment since Query.data already includes columns
superset/views/utils.py Updates sanitize_datasource_data to accept BaseDatasourceData or QueryData types; adds documentation
superset/commands/explore/get.py Updates datasource_data type annotations to BaseDatasourceData or QueryData
superset/viz.py Updates cache_key method with explicit cast to dict[str, Any] for mutation operations
superset/utils/core.py Adds null check for column field in get_column_name_from_metric to prevent potential AttributeError
superset/jinja_context.py Updates dataset_macro query_obj type to QueryObjectDict with explicit Column list cast
tests/integration_tests/sqla_models_tests.py Adds cast to QueryObjectDict in test assertions to satisfy type checkers

Comment on lines 978 to 1003
sqla_query_keys = {
"apply_fetch_values_predicate",
"columns",
"extras",
"filter",
"from_dttm",
"granularity",
"groupby",
"inner_from_dttm",
"inner_to_dttm",
"is_rowcount",
"is_timeseries",
"metrics",
"orderby",
"order_desc",
"to_dttm",
"series_columns",
"series_limit",
"series_limit_metric",
"group_others_when_limit_reached",
"row_limit",
"row_offset",
"timeseries_limit",
"timeseries_limit_metric",
"time_shift",
}
Copy link

Copilot AI Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sqla_query_keys set is duplicated in both get_query_str_extended (here) and in superset/connectors/sqla/models.py in the SqlaTable.get_extra_cache_keys method (lines 1924-1949). Consider extracting this set to a module-level constant or a shared helper method to avoid maintaining two copies of the same parameter list.

Copilot uses AI. Check for mistakes.
Comment on lines 1924 to 1949
sqla_query_keys = {
"apply_fetch_values_predicate",
"columns",
"extras",
"filter",
"from_dttm",
"granularity",
"groupby",
"inner_from_dttm",
"inner_to_dttm",
"is_rowcount",
"is_timeseries",
"metrics",
"orderby",
"order_desc",
"to_dttm",
"series_columns",
"series_limit",
"series_limit_metric",
"group_others_when_limit_reached",
"row_limit",
"row_offset",
"timeseries_limit",
"timeseries_limit_metric",
"time_shift",
}
Copy link

Copilot AI Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sqla_query_keys set is duplicated in both get_extra_cache_keys (here) and in superset/models/helpers.py in the BaseDatasource.get_query_str_extended method (lines 978-1003). Consider extracting this set to a module-level constant or a shared helper method to avoid maintaining two copies of the same parameter list.

Copilot uses AI. Check for mistakes.
for column_ in data["columns"]:
generic_type = column_.get("type_generic")
for column_ in cast(list[dict[str, Any]], data["columns"]): # type: ignore[assignment]
column_dict = cast(dict[str, Any], column_)
Copy link

Copilot AI Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cast on this line is redundant since column_ is already typed as dict[str, Any] from the cast on line 480. The variable can be used directly without the additional cast.

Suggested change
column_dict = cast(dict[str, Any], column_)
column_dict = column_

Copilot uses AI. Check for mistakes.
Comment on lines +1924 to +1937
# Filter out keys that aren't parameters to get_sqla_query
filtered_query_obj = {
k: v for k, v in query_obj.items() if k in SQLA_QUERY_KEYS
}
sqla_query = self.get_sqla_query(**cast(Any, filtered_query_obj))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any guardrails or warning if there is a new query_obj.items() but they didn't add it to SQLA_QUERY_KEYS? Currently this would just drop the key, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, if someone adds a new parameter to get_sqla_query but forgets to add it to SQLA_QUERY_KEYS then it won't be passed. I assume they would catch the error when testing their changes, because otherwise their changes will be no-op.


# Additional fields used throughout the codebase
time_range: str | None
datasource: Any # BaseDatasource instance
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is supposed to be a BaseDatasource instance, why are we typing it as Any? does it fail if we don't use Any?

betodealmeida and others added 2 commits November 17, 2025 10:19
Extract the duplicated sqla_query_keys set to a shared constant SQLA_QUERY_KEYS
to avoid maintaining two identical copies in helpers.py and sqla/models.py.

This addresses code review feedback to reduce code duplication.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@betodealmeida betodealmeida force-pushed the annotate-important-types branch from aac7a46 to bd53e14 Compare November 17, 2025 15:20
@betodealmeida betodealmeida force-pushed the annotate-important-types branch from 91feaaf to 2550123 Compare November 17, 2025 17:05
@betodealmeida betodealmeida merged commit fb7d0e0 into master Nov 17, 2025
69 of 70 checks passed
@betodealmeida betodealmeida deleted the annotate-important-types branch November 17, 2025 19:41
kshi020302 pushed a commit to jl141/superset that referenced this pull request Nov 30, 2025
Co-authored-by: Claude <noreply@anthropic.com>
sadpandajoe pushed a commit that referenced this pull request Dec 4, 2025
Co-authored-by: Claude <noreply@anthropic.com>
(cherry picked from commit fb7d0e0)
sadpandajoe pushed a commit that referenced this pull request Dec 4, 2025
Co-authored-by: Claude <noreply@anthropic.com>
(cherry picked from commit fb7d0e0)
Facyla pushed a commit to Facyla/superset-contrib that referenced this pull request Dec 16, 2025
Co-authored-by: Claude <noreply@anthropic.com>
sadpandajoe pushed a commit that referenced this pull request Dec 16, 2025
Co-authored-by: Claude <noreply@anthropic.com>
(cherry picked from commit fb7d0e0)
aminghadersohi pushed a commit to aminghadersohi/superset that referenced this pull request Jan 17, 2026
Co-authored-by: Claude <noreply@anthropic.com>
(cherry picked from commit fb7d0e0)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api Related to the REST API preset-io size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants