fix: allow comma typing in D3 format for Table chart#37740
fix: allow comma typing in D3 format for Table chart#37740RamiNoodle733 wants to merge 2 commits intoapache:masterfrom
Conversation
Add .copy() after dropna() to prevent pandas SettingWithCopyWarning when modifying the DataFrame column. Fixes apache#36530
Add commaChoosesOption: false to d3NumberFormat and d3TimeFormat controls in ColumnConfigControl to prevent comma from triggering option selection. Fixes apache#37038
Code Review Agent Run #b3d15fActionable Suggestions - 0Additional Suggestions - 2
Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
|
|
||
| # drop empty values from the target column | ||
| df = df.dropna(subset=[column]) | ||
| df = df.dropna(subset=[column]).copy() |
There was a problem hiding this comment.
Suggestion: Dropping NaN rows from the entire DataFrame before grouping removes groups that only contain null values in the target column, so those groups will disappear from the histogram output instead of being represented with zero counts, which breaks the documented "each row corresponds to a group" contract. [logic error]
Severity Level: Major ⚠️
- ❌ superset/utils/pandas_postprocessing/histogram.histogram() omits groups.
- ⚠️ Grouped histogram outputs have missing group rows.
- ⚠️ Downstream consumers receive fewer rows than expected.| df = df.dropna(subset=[column]).copy() | |
| # NaN values in the target column are dropped later in hist_values(), so keep all rows here |
Steps of Reproduction ✅
1. Create a DataFrame file and call the function `histogram()` defined in
`superset/utils/pandas_postprocessing/histogram.py` (function signature at lines 24-31).
2. Prepare input DataFrame where one group has only nulls, e.g.:
- DataFrame with columns `group` and `value` where group "B" rows exist but `value` is
NaN for all "B" rows.
3. Call histogram(df, column="value", groupby=["group"]) -> execution reaches the line
that pre-filters nulls at line 52: `df = df.dropna(subset=[column]).copy()`.
4. Rows for group "B" are removed by that dropna at line 52, so later grouping (code paths
at lines 81-86: the df.groupby(...) branch) never sees group "B" and no output row for "B"
is produced. The helper `hist_values` (lines 71-74) which drops NaNs inside a group's
Series is never given a chance to generate a zero-count vector for an absent group.
Explanation: The current code intentionally removes rows with NaN at line 52; that causes
groups whose only rows had NaN in the target column to disappear instead of appearing with
zero-count bins.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/utils/pandas_postprocessing/histogram.py
**Line:** 52:52
**Comment:**
*Logic Error: Dropping NaN rows from the entire DataFrame before grouping removes groups that only contain null values in the target column, so those groups will disappear from the histogram output instead of being represented with zero counts, which breaks the documented "each row corresponds to a group" contract.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
✅ Deploy Preview for superset-docs-preview ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
closing since there are multiple PRs for this fix. It looks like the same pr may have been opened multiple times. |
Add commaChoosesOption: false to d3NumberFormat and d3TimeFormat controls in ColumnConfigControl to prevent comma from triggering option selection.
Fixes #37038