Skip to content

fix: use pd.to_numeric in df_metrics_to_num to handle string-encoded numerics from ClickHouse#40190

Open
Cloud-Architect-Emma wants to merge 1 commit into
apache:masterfrom
Cloud-Architect-Emma:fix/clickhouse-numeric-conversion
Open

fix: use pd.to_numeric in df_metrics_to_num to handle string-encoded numerics from ClickHouse#40190
Cloud-Architect-Emma wants to merge 1 commit into
apache:masterfrom
Cloud-Architect-Emma:fix/clickhouse-numeric-conversion

Conversation

@Cloud-Architect-Emma
Copy link
Copy Markdown
Contributor

Fixes #39951

Problem

ClickHouse (and some other backends) return numeric SUM() columns as
strings. The existing infer_objects() call does not convert these to
numeric types, causing pandas/core/nanops.py to raise:
TypeError: Could not convert string '655' to numeric

Fix

Replace infer_objects() with pd.to_numeric(errors="ignore") which
correctly converts string-encoded numerics while leaving truly
non-numeric columns unchanged.

Testing

The existing df_metrics_to_num tests cover this path.

@dosubot dosubot Bot added change:backend Requires changing the backend data:connect:clickhouse Related to Clickhouse labels May 16, 2026
@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review Bot commented May 16, 2026

Code Review Agent Run #27c151

Actionable Suggestions - 0
Additional Suggestions - 1
  • superset/common/utils/dataframe_utils.py - 1
    • Error handling inconsistency · Line 60-60
      The `errors="ignore"` parameter keeps original values on conversion failure, while the equivalent code in `viz.py` uses `errors="coerce"` (converts failures to NaN). If the intent is to silently keep original values on failure (matching the comment "will stay as strings if conversion fails"), this is correct. Otherwise, align the error handling strategy between both files for consistent metric conversion behavior.
Review Details
  • Files reviewed - 1 · Commit Range: 7e3cb89..7e3cb89
    • superset/common/utils/dataframe_utils.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

@Cloud-Architect-Emma
Copy link
Copy Markdown
Contributor Author

Hi @konmitya, I've opened a PR to fix this: #40190

The root cause is that ClickHouse returns numeric SUM() columns as
strings, and the existing infer_objects() call doesn't convert them.
The fix replaces it with pd.to_numeric(errors="ignore") which
correctly handles string-encoded numerics.

@netlify
Copy link
Copy Markdown

netlify Bot commented May 16, 2026

Deploy Preview for superset-docs-preview ready!

Name Link
🔨 Latest commit 56b724d
🔍 Latest deploy log https://app.netlify.com/projects/superset-docs-preview/deploys/6a083b354262be000858068f
😎 Deploy Preview https://deploy-preview-40190--superset-docs-preview.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.
🤖 Make changes Run an agent on this branch

To edit notification comments on pull requests, go to your Netlify project configuration.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 16, 2026

Codecov Report

❌ Patch coverage is 66.66667% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 63.73%. Comparing base (d62f154) to head (ebdcd9c).
⚠️ Report is 509 commits behind head on master.

Files with missing lines Patch % Lines
superset/common/utils/dataframe_utils.py 66.66% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #40190      +/-   ##
==========================================
- Coverage   64.58%   63.73%   -0.86%     
==========================================
  Files        2564     2586      +22     
  Lines      133576   137510    +3934     
  Branches    31033    31631     +598     
==========================================
+ Hits        86271    87636    +1365     
- Misses      45813    48338    +2525     
- Partials     1492     1536      +44     
Flag Coverage Δ
hive 39.24% <0.00%> (-0.64%) ⬇️
mysql 58.76% <66.66%> (-1.68%) ⬇️
postgres 58.84% <0.00%> (-1.69%) ⬇️
presto 40.92% <0.00%> (-0.74%) ⬇️
python 60.40% <66.66%> (-1.71%) ⬇️
sqlite 58.48% <0.00%> (-1.69%) ⬇️
unit 100.00% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Cloud-Architect-Emma Cloud-Architect-Emma force-pushed the fix/clickhouse-numeric-conversion branch from 56b724d to 7e3cb89 Compare May 16, 2026 09:45
Comment thread superset/common/utils/dataframe_utils.py Outdated
@bito-code-review
Copy link
Copy Markdown
Contributor

The flagged issue is valid. Using errors='ignore' in pd.to_numeric leaves the column as an object type if any value is non-numeric, which can cause downstream numeric operations to fail. The suggested fix is to use errors='coerce' instead, which converts invalid entries to NaN and ensures the column becomes numeric. Here's the fix for the specific line in dataframe_utils.py:

superset/common/utils/dataframe_utils.py

df[col] = pd.to_numeric(df[col], errors='coerce')

@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review Bot commented May 16, 2026

Code Review Agent Run #8c3520

Actionable Suggestions - 0
Additional Suggestions - 1
  • superset/common/utils/dataframe_utils.py - 1
    • Inconsistent error handling strategy · Line 60-60
      The error handling strategy (`errors="ignore"`) is inconsistent with the deprecated equivalent `BaseViz.df_metrics_to_num` at `viz.py:317` which uses `errors="coerce"`. Both functions share the same name, docstring, and purpose. Using `"coerce"` converts unconvertible values to NaN (matching `viz.py`), while `"ignore"` preserves the original value. Align these to prevent silently divergent behavior when conversion fails on malformed metric data.
      Code suggestion
      --- superset/common/utils/dataframe_utils.py
      +++ superset/common/utils/dataframe_utils.py
       @@ -57,6 +57,6 @@ def df_metrics_to_num(df: pd.DataFrame, query_object: QueryObject) -> None:
                if dtype.type == np.object_ and col in query_object.metric_names:
                    # soft-convert a metric column to numeric
                    # will stay as strings if conversion fails
      -            df[col] = pd.to_numeric(df[col], errors="ignore")
      +            df[col] = pd.to_numeric(df[col], errors="coerce")
Review Details
  • Files reviewed - 1 · Commit Range: 7e3cb89..7e3cb89
    • superset/common/utils/dataframe_utils.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

@Cloud-Architect-Emma Cloud-Architect-Emma force-pushed the fix/clickhouse-numeric-conversion branch from 7e3cb89 to dd2a0d2 Compare May 16, 2026 13:16
@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review Bot commented May 16, 2026

Code Review Agent Run #bbb707

Actionable Suggestions - 0
Additional Suggestions - 1
  • superset/common/utils/dataframe_utils.py - 1
    • Docstring mismatch with behavior · Line 60-60
      The docstring and comment claim this function 'Converting metrics to numeric' with 'will stay as strings if conversion fails', but the old code using `infer_objects()` did NOT convert string columns to numeric. Verified locally: `infer_objects()` leaves string numeric values as strings (object dtype), while `pd.to_numeric()` correctly converts to float64. The comment implies the method tries to convert but fails gracefully — in reality, `infer_objects()` simply doesn't attempt string-to-numeric conversion at all. Update comment to: '# convert a metric column to numeric (NaN if conversion fails)'
Review Details
  • Files reviewed - 1 · Commit Range: dd2a0d2..dd2a0d2
    • superset/common/utils/dataframe_utils.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

@Cloud-Architect-Emma
Copy link
Copy Markdown
Contributor Author

Hi @hainenber, all checks are passing and the fix has been updated
to use errors="coerce" as suggested by the code review bots.
Could you take a look when you get a chance?

Thank you!

Copy link
Copy Markdown
Contributor

@hainenber hainenber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix is a bit broad and can potentially affect other DB sources as well. Is there a way to limit it to ClickHouse instead?

@Cloud-Architect-Emma
Copy link
Copy Markdown
Contributor Author

Hi @hainenber, updated the fix to be safer, now it only applies
pd.to_numeric when all non-null values in the column are successfully
convert to numeric (i.e. the null pattern is preserved). This way
it only affects columns where the backend returned numeric values
as strings (like ClickHouse SUM()), and leaves truly mixed/non-numeric
columns unchanged.

@Cloud-Architect-Emma Cloud-Architect-Emma force-pushed the fix/clickhouse-numeric-conversion branch from dd2a0d2 to 4379aea Compare May 19, 2026 02:53
@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review Bot commented May 20, 2026

Code Review Agent Run #6844e8

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: 4379aea..4379aea
    • superset/common/utils/dataframe_utils.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

@Cloud-Architect-Emma
Copy link
Copy Markdown
Contributor Author

Hi @hainenber, I've updated the fix based on your feedback. The implementation now only converts a column when ALL non-null values successfully convert to numeric, making it safe for mixed-type columns from any backend:

converted = pd.to_numeric(df[col], errors="coerce")
if converted.notna().eq(df[col].notna()).all():
    df[col] = converted

This is safer than the original infer_objects() and scoped by behavior rather than by database backend, it only affects columns where the backend returned genuinely numeric values as strings (like ClickHouse SUM()). Could you re-review? Thank you!

Copy link
Copy Markdown
Contributor

@hainenber hainenber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a unit test to prevent regression since this is quite impactful.

@Cloud-Architect-Emma Cloud-Architect-Emma force-pushed the fix/clickhouse-numeric-conversion branch from 4379aea to a650c9f Compare May 23, 2026 10:14
@pull-request-size pull-request-size Bot added size/M and removed size/S labels May 23, 2026
@Cloud-Architect-Emma
Copy link
Copy Markdown
Contributor Author

Hi @hainenber, added a unit test to tests/unit_tests/common/test_dataframe_utils.py
that covers:

  • String-encoded numeric columns (e.g. ClickHouse SUM()) → converted to numeric
  • Mixed columns with non-numeric values → NOT converted
  • Non-numeric text columns → NOT converted
  • Non-metric columns → NOT touched

Could you re-review? Thank you!

@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review Bot commented May 23, 2026

Code Review Agent Run #fd79c1

Actionable Suggestions - 0
Additional Suggestions - 1
  • tests/unit_tests/common/test_dataframe_utils.py - 1
    • SEMANTIC_DEAD_CODE: Unused import · Line 55-55
      Remove unused `numpy as np` import from line 55. The test function does not reference `np` anywhere, making this import dead code that adds maintenance overhead and potential confusion.
      Code suggestion
      --- a/tests/unit_tests/common/test_dataframe_utils.py
      +++ b/tests/unit_tests/common/test_dataframe_utils.py
       @@ -52,7 +52,6 @@
        def test_df_metrics_to_num_converts_string_numerics():
            """Test that string-encoded numeric columns (e.g. from ClickHouse) are converted."""
            from unittest.mock import MagicMock
      -    import numpy as np
Review Details
  • Files reviewed - 2 · Commit Range: a650c9f..a650c9f
    • superset/common/utils/dataframe_utils.py
    • tests/unit_tests/common/test_dataframe_utils.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

hainenber
hainenber previously approved these changes May 24, 2026
Comment thread tests/unit_tests/common/test_dataframe_utils.py Outdated
@hainenber hainenber dismissed their stale review May 24, 2026 08:53

there are some changes needed

@Cloud-Architect-Emma Cloud-Architect-Emma force-pushed the fix/clickhouse-numeric-conversion branch from a650c9f to e6c1532 Compare May 24, 2026 13:15
@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review Bot commented May 24, 2026

Code Review Agent Run #1b821b

Actionable Suggestions - 0
Filtered by Review Rules

Bito filtered these suggestions based on rules created automatically for your feedback. Manage rules.

  • tests/unit_tests/common/test_dataframe_utils.py - 1
    • CWE-arness dtype conversion in Pandas 3.0 · Line 60-63
Review Details
  • Files reviewed - 2 · Commit Range: e6c1532..e6c1532
    • superset/common/utils/dataframe_utils.py
    • tests/unit_tests/common/test_dataframe_utils.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

@Cloud-Architect-Emma Cloud-Architect-Emma force-pushed the fix/clickhouse-numeric-conversion branch 3 times, most recently from 48435ee to 12c9a4a Compare May 24, 2026 18:03
@Cloud-Architect-Emma
Copy link
Copy Markdown
Contributor Author

Hi @hainenber, the only failing check is Python-Integration / test-postgres (next)
which is failing due to a pre-existing flaky test in
tests/integration_tests/queries/saved_queries/api_tests.py::TestSavedQueryApi::test_get_saved_query
with AssertionError: assert 'a second ago' == 'now', a timing race condition
completely unrelated to this PR.

All other required checks are passing. Could you re-review and approve?

Thank you!

Copy link
Copy Markdown
Contributor

@bito-code-review bito-code-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Agent Run #fc4dc1

Actionable Suggestions - 1
  • tests/unit_tests/common/test_dataframe_utils.py - 1
Review Details
  • Files reviewed - 2 · Commit Range: 12c9a4a..12c9a4a
    • superset/common/utils/dataframe_utils.py
    • tests/unit_tests/common/test_dataframe_utils.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

Comment thread tests/unit_tests/common/test_dataframe_utils.py Outdated
Fixes apache#39951

The exclude_mount_points patterns used glob syntax (e.g. /dev/*) with
match_type: regexp. Replace with properly anchored regexp patterns.

Co-authored-by: Đỗ Trọng Hải <41283691+hainenber@users.noreply.github.com>
@Cloud-Architect-Emma Cloud-Architect-Emma force-pushed the fix/clickhouse-numeric-conversion branch from 12c9a4a to ebdcd9c Compare May 25, 2026 08:51
@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review Bot commented May 25, 2026

Code Review Agent Run #86e44d

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: ebdcd9c..ebdcd9c
    • superset/common/utils/dataframe_utils.py
    • tests/unit_tests/common/test_dataframe_utils.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

change:backend Requires changing the backend data:connect:clickhouse Related to Clickhouse size/M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Superset 6.0.0 bug: Could not convert string '849' to numeric

2 participants