Skip to content

fix(mixed-timeseries): preserve all-NaN metric columns after pivot when Jinja evaluates to NULL#40004

Closed
aminghadersohi wants to merge 1 commit into
apache:masterfrom
aminghadersohi:bug-100398
Closed

fix(mixed-timeseries): preserve all-NaN metric columns after pivot when Jinja evaluates to NULL#40004
aminghadersohi wants to merge 1 commit into
apache:masterfrom
aminghadersohi:bug-100398

Conversation

@aminghadersohi
Copy link
Copy Markdown
Contributor

SUMMARY

When a Jinja conditional metric expression (e.g. {% if filter_values('x') %}SUM(col){% else %}null{% endif %}) evaluates to null because no dashboard filter is selected, the SQL query returns an all-NULL column for that metric. pandas pivot_table with dropna=True (the default, controlled by drop_missing_columns: !show_empty_columns in pivotOperator.ts) then silently drops that column from the pivot result.

Downstream post-processing steps — rename and rolling — use validate_column_args to assert that all referenced columns exist before executing. Because the metric column was dropped, they raise:

InvalidPostProcessingError("Referenced columns not available in DataFrame.")

This surfaces as an error on mixed-timeseries charts whenever rename (triggered by groupby + truncate_metric) or rolling is configured.

Root cause chain:

  1. Jinja template with filter_values() evaluates to null when no filter selected
  2. SQL returns all-NULL column for the metric
  3. pivot_table(dropna=True) drops the all-NaN column
  4. rename / rolling post-processing fails because column is missing

Fix: Introduce _restore_dropped_metric_columns() in pivot.py which re-adds any expected metric columns that pivot_table dropped due to all-NaN values. This preserves the expected DataFrame schema for downstream post-processing, while still dropping sparse category combinations (the intended behavior of drop_missing_columns).

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Before: Mixed-timeseries chart with Jinja SQL metric and no active dashboard filter shows error "Referenced columns not available in DataFrame."

After: Chart renders correctly (empty series since metric evaluates to NULL, but no error).

TESTING INSTRUCTIONS

  1. Create a mixed-timeseries chart with a SQL-type custom metric using a Jinja conditional:
    {% if filter_values('my_column') %}
      SUM(CASE WHEN my_column IN {{ filter_values('my_column') | where_in }} THEN value ELSE 0 END)
    {% else %}
      null
    {% endif %}
  2. View the chart without any dashboard filter selected
  3. Before fix: chart shows "Referenced columns not available in DataFrame" error
  4. After fix: chart renders without error (no data series shown for the NULL metric)

ADDITIONAL INFORMATION

  • Has associated issue: SC-100398
  • Required feature flags:
  • Changes UI
  • Includes DB Migration
  • Introduces new feature or API
  • Removes existing feature or API

Regression tests added in tests/unit_tests/pandas_postprocessing/test_pivot.py:

  • test_pivot_preserves_all_nan_metric_flat: flat pivot (no groupby) — metric column restored as all-NaN
  • test_pivot_preserves_all_nan_metric_with_columns: MultiIndex pivot (with groupby) — metric level preserved at level 0

…en Jinja evaluates to NULL

When a Jinja conditional metric expression (e.g. using filter_values()) evaluates
to null/NULL because no dashboard filter is selected, the SQL query returns an
all-NULL column. pandas pivot_table with dropna=True (controlled by
drop_missing_columns, default True) then silently drops that column.

Downstream post-processing steps like rename and rolling use validate_column_args
to assert referenced columns exist before executing. Because the metric column was
dropped, they raise InvalidPostProcessingError("Referenced columns not available
in DataFrame"), which surfaces as an error on mixed-timeseries charts.

Fix: introduce _restore_dropped_metric_columns() which re-adds any expected metric
columns that pivot_table dropped due to all-NaN values. This keeps the DataFrame
schema consistent with what the frontend and subsequent post-processing expect,
while still allowing sparse category combinations to be dropped.

Fixes SC-100398

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review Bot commented May 11, 2026

Code Review Agent Run #fd63b5

Actionable Suggestions - 0
Filtered by Review Rules

Bito filtered these suggestions based on rules created automatically for your feedback. Manage rules.

  • tests/unit_tests/pandas_postprocessing/test_pivot.py - 2
Review Details
  • Files reviewed - 2 · Commit Range: 4da9c03..4da9c03
    • superset/utils/pandas_postprocessing/pivot.py
    • tests/unit_tests/pandas_postprocessing/test_pivot.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

@dosubot dosubot Bot added change:backend Requires changing the backend global:jinja Related to Jinja templating viz:charts:timeseries Related to Timeseries labels May 11, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 11, 2026

Codecov Report

❌ Patch coverage is 37.50000% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.83%. Comparing base (f67dd4a) to head (4da9c03).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
superset/utils/pandas_postprocessing/pivot.py 37.50% 7 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #40004      +/-   ##
==========================================
- Coverage   63.83%   63.83%   -0.01%     
==========================================
  Files        2589     2589              
  Lines      137821   137837      +16     
  Branches    31928    31935       +7     
==========================================
+ Hits        87978    87984       +6     
- Misses      48327    48334       +7     
- Partials     1516     1519       +3     
Flag Coverage Δ
hive 39.36% <12.50%> (-0.01%) ⬇️
mysql 59.00% <37.50%> (-0.01%) ⬇️
postgres 59.08% <37.50%> (-0.01%) ⬇️
presto 41.05% <12.50%> (-0.01%) ⬇️
python 60.52% <37.50%> (-0.01%) ⬇️
sqlite 58.72% <37.50%> (-0.01%) ⬇️
unit 100.00% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@aminghadersohi
Copy link
Copy Markdown
Contributor Author

Closing in favour of PR from correct fork.

Comment on lines +45 to +52
category_values = (
df.columns.get_level_values(-1).unique()
if len(df.columns) > 0
else [None]
)
for metric in missing:
for cat in category_values:
df[(metric, cat)] = float("nan")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The restoration logic for MultiIndex pivots only uses the last column level (get_level_values(-1)) and then writes 2-tuples ((metric, cat)), which is incorrect when the pivot has more than one columns dimension. For pivots like columns=["a","b"], this creates malformed/incomplete column keys (or can raise when tuple depth does not match), so all-NaN metrics are not restored correctly and downstream operations can still fail. Build restored keys using the full non-metric tuple for every existing column combination (all levels after metric), not just the last level. [logic error]

Severity Level: Major ⚠️
- ❌ Multi-column pivot post-processing can crash during pivot().
- ❌ All-NaN metrics not restored for multi-level pivots.
- ⚠️ Mixed-timeseries charts using multi-column pivots may fail.
- ⚠️ Rolling/rename post-processing fails for affected metrics.
Steps of Reproduction ✅
1. Configure a query with post-processing in `superset/common/query_object.py:500-534` so
that `exec_post_processing()` includes a pivot operation with `operation: "pivot"` and
`options` containing `columns=["a", "b"]`, `drop_missing_columns=True` (default), and
multiple metrics (e.g. `"metric": {"operator": "mean"}` and `"metric2": {"operator":
"mean"}`).

2. Ensure the database result feeding this query contains multiple grouping columns `a`
and `b` and that, for at least one metric (e.g. `"metric"`), all values are NULL/NaN for
every row; this matches the all-NaN metric scenario already covered for single-column
pivots in
`tests/unit_tests/pandas_postprocessing/test_pivot.py:test_pivot_preserves_all_nan_metric_flat`
(lines 70-92), but extended here to `columns=["a", "b"]` as in
`test_pivot_eliminate_cartesian_product_columns` (lines 16-66).

3. When `exec_post_processing()` calls `pandas_postprocessing.pivot()` (implemented in
`superset/utils/pandas_postprocessing/pivot.py:60-142`) with `columns=["a", "b"]`,
`df.pivot_table(...)` (lines 122-131) produces a MultiIndex on `df.columns` where level 0
is the metric name and subsequent levels correspond to each `columns` entry (`a`, then
`b`). Because one metric is entirely NaN and `dropna=drop_missing_columns=True`, pandas
drops that metric's columns completely.

4. `_restore_dropped_metric_columns()` (lines 31-57 in `pivot.py`) then runs for this
MultiIndex DataFrame. In the MultiIndex branch (lines 41-52), it collects only the last
column level via `df.columns.get_level_values(-1).unique()` (line 45) and, for each
missing metric, assigns new columns using 2‑tuples `df[(metric, cat)] = float("nan")`
(line 52). For a MultiIndex with more than two levels (metric + `a` + `b`), this key has
the wrong depth: on current pandas versions this either (a) raises a pandas error when
attempting to insert a column with a tuple shorter than `df.columns.nlevels`, or (b)
creates malformed/incomplete column labels that don't match the original `(metric, a, b)`
shape. In either case, the all-NaN metric is not properly restored for every `(a, b)`
combination, so subsequent post-processing such as `rename` and `rolling` (both decorated
with `validate_column_args` in `superset/utils/pandas_postprocessing/utils.py:114-131`)
can still fail with `InvalidPostProcessingError("Referenced columns not available in
DataFrame.")` when they reference that metric.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** superset/utils/pandas_postprocessing/pivot.py
**Line:** 45:52
**Comment:**
	*Logic Error: The restoration logic for MultiIndex pivots only uses the last column level (`get_level_values(-1)`) and then writes 2-tuples (`(metric, cat)`), which is incorrect when the pivot has more than one `columns` dimension. For pivots like `columns=["a","b"]`, this creates malformed/incomplete column keys (or can raise when tuple depth does not match), so all-NaN metrics are not restored correctly and downstream operations can still fail. Build restored keys using the full non-metric tuple for every existing column combination (all levels after metric), not just the last level.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

@bito-code-review
Copy link
Copy Markdown
Contributor

The flagged issue is correct: the restoration logic incorrectly uses only the last column level for MultiIndex pivots, creating tuples with wrong depth for multi-column pivots. To resolve, modify the code to collect all unique combinations of non-metric levels and build full tuples. No other comments found in the PR.

superset/utils/pandas_postprocessing/pivot.py

if isinstance(df.columns, pd.MultiIndex):
        existing_metrics = set(df.columns.get_level_values(0))
        missing = [m for m in expected_metrics if m not in existing_metrics]
        if missing:
            if df.columns.nlevels > 1:
                unique_combos = df.columns.droplevel(0).unique()
                for metric in missing:
                    for combo in unique_combos:
                        df[(metric,) + combo] = float("nan")
            else:
                # Fallback for single level, though unlikely
                pass

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

change:backend Requires changing the backend global:jinja Related to Jinja templating size/M viz:charts:timeseries Related to Timeseries

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant