chore(deps): bump pandas >=2.0 #24705

sebastianliebscher · 2023-07-15T10:33:11Z

SUMMARY

As of pandas 2.0, the optional but recommended performance dependencies can be installed via pip extras (reference). These deps can also be installed alongside 1.5.3, but with pip extras and pip-compile-multi it should be much cleaner and more obvious why they are installed. I think pandas[performance] should be installed in a separate follow-up PR.
This PR bumps pandas from 1.5.3 to the latest stable version, 2.0.3, so that these optional dependencies can be added.

With pandas 2.0 we could also potentially:

- leverage new pandas features
- mitigate bugs
- improve performance

Changes in 2.0: https://pandas.pydata.org/docs/whatsnew/v2.0.0.html

TESTING INSTRUCTIONS

scripts/tests/run.sh --module tests/integration_tests/
pytest ./tests/unit_tests/

ADDITIONAL INFORMATION

Has associated issue:
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

- https://pandas.pydata.org/docs/whatsnew/v1.5.0.html#other-deprecations

- https://pandas.pydata.org/docs/whatsnew/v2.0.0.html#removal-of-prior-version-deprecations-changes

sebastianliebscher · 2023-07-15T10:37:33Z

superset/common/query_context_processor.py

@@ -570,7 +568,7 @@ def get_data(self, df: pd.DataFrame) -> str | list[dict[str, Any]]:
                    df, index=include_index, **config["CSV_EXPORT"]
                )
            elif self._query_context.result_format == ChartDataResultFormat.XLSX:
-                result = excel.df_to_excel(df, **config["EXCEL_EXPORT"])
+                result = excel.df_to_excel(df)


The keyword arg "encoding" has been deprecated since 1.5.0: https://pandas.pydata.org/pandas-docs/version/1.5/reference/api/pandas.DataFrame.to_excel.html

sebastianliebscher · 2023-07-15T10:39:50Z

superset/reports/notifications/slack.py

@@ -121,17 +122,19 @@ def _get_body(self) -> str:
        # need to truncate the data
        for i in range(len(df) - 1):
            truncated_df = df[: i + 1].fillna("")
-            truncated_df = truncated_df.append(


df.append has been deprecated since 1.4: https://pandas.pydata.org/pandas-docs/version/1.5/reference/api/pandas.DataFrame.append.html
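
For readers following along, here is a minimal sketch of the usual replacement for DataFrame.append, which is pd.concat (the commit list for this PR also mentions pandas.concat). The frame contents and the "..." marker row are made up for illustration and are not taken from slack.py:

```python
import pandas as pd

# Illustrative data; not the report data used in slack.py.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Keep only the first two rows, as a truncation step would.
truncated_df = df[:2].fillna("")

# Hypothetical marker row signalling that rows were cut off.
marker_row = pd.DataFrame({col: ["..."] for col in df.columns})

# pandas < 2.0: truncated_df = truncated_df.append(marker_row, ignore_index=True)
# pandas >= 2.0: DataFrame.append no longer exists, so concatenate instead.
truncated_df = pd.concat([truncated_df, marker_row], ignore_index=True)

print(truncated_df)
```

pd.concat takes a list of frames, so the row to add has to be wrapped in its own DataFrame first.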

codecov · 2023-07-15T10:40:32Z

Codecov Report

Merging #24705 (4f54588) into master (0328dd2) will decrease coverage by 0.08%.
The diff coverage is 63.63%.

❗ Current head 4f54588 differs from pull request most recent head 07360b0. Consider uploading reports for the commit 07360b0 to get more accurate results

@@            Coverage Diff             @@
##           master   #24705      +/-   ##
==========================================
- Coverage   68.97%   68.89%   -0.08%     
==========================================
  Files        1901     1901              
  Lines       74008    73927      -81     
  Branches     8183     8183              
==========================================
- Hits        51047    50932     -115     
- Misses      20840    20874      +34     
  Partials     2121     2121

Flag	Coverage Δ
hive	`54.16% <28.57%> (+0.04%)`	⬆️
mysql	`79.21% <63.63%> (-0.15%)`	⬇️
postgres	`79.29% <63.63%> (-0.15%)`	⬇️
presto	`54.06% <28.57%> (+0.04%)`	⬆️
python	`83.31% <63.63%> (-0.14%)`	⬇️
sqlite	`77.88% <63.63%> (-0.15%)`	⬇️
unit	`54.88% <27.27%> (+0.04%)`	⬆️

Impacted Files	Coverage Δ
...rontend/src/filters/components/Range/buildQuery.ts	`0.00% <ø> (ø)`
superset/examples/birth_names.py	`69.69% <ø> (ø)`
superset/views/database/views.py	`92.68% <ø> (ø)`
superset/common/query_context_processor.py	`85.08% <10.71%> (-4.88%)`	⬇️
superset/reports/notifications/slack.py	`90.58% <60.00%> (-0.88%)`	⬇️
superset/charts/commands/warm_up_cache.py	`98.14% <96.00%> (+0.64%)`	⬆️
superset/charts/commands/export.py	`94.11% <100.00%> (ø)`
superset/config.py	`92.17% <100.00%> (ø)`
superset/row_level_security/schemas.py	`100.00% <100.00%> (ø)`
superset/views/core.py	`69.60% <100.00%> (-0.75%)`	⬇️
... and 1 more

... and 3 files with indirect coverage changes

sebastianliebscher · 2023-07-15T10:41:14Z

superset/views/database/views.py

@@ -201,7 +201,6 @@ def form_post(self, form: CsvToDatabaseForm) -> Response:
                    infer_datetime_format=form.infer_datetime_format.data,
                    iterator=True,
                    keep_default_na=not form.null_values.data,
-                    mangle_dupe_cols=form.overwrite_duplicate.data,


mangle_dupe_cols has been deprecated since 1.5: https://pandas.pydata.org/pandas-docs/version/1.5/reference/api/pandas.read_csv.html
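
A small sketch of what dropping the argument means in practice, as far as I understand it: read_csv renames duplicate header names by default, and with mangle_dupe_cols gone in 2.0 there is no switch to change that. The CSV text below is made up:

```python
import io

import pandas as pd

# Made-up CSV with a duplicated header name "a".
csv_text = "a,a,b\n1,2,3\n"

# No mangle_dupe_cols argument: duplicate names get a numeric suffix.
df = pd.read_csv(io.StringIO(csv_text))

print(list(df.columns))  # ['a', 'a.1', 'b']
```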

sebastianliebscher · 2023-07-15T10:42:27Z

superset/viz.py

@@ -2849,7 +2849,7 @@ def levels_for(
        for i in range(0, len(groups) + 1):
            agg_df = df.groupby(groups[:i]) if i else df
            levels[i] = (
-                agg_df.mean()
+                agg_df.mean(numeric_only=True)


The new default for numeric_only in 2.0 is False: https://pandas.pydata.org/docs/whatsnew/v2.0.0.html#removal-of-prior-version-deprecations-changes
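
To make the effect concrete, a short sketch with made-up data. It covers both branches of the hunk above, the grouped case and the plain DataFrame case. The column names are assumptions for illustration only:

```python
import pandas as pd

# Made-up frame with one non-numeric column ("label") next to a metric.
df = pd.DataFrame(
    {
        "group": ["a", "a", "b"],
        "label": ["x", "y", "z"],
        "value": [1.0, 3.0, 5.0],
    }
)

# Grouped case: without numeric_only=True, pandas 2.0 is expected to raise
# on the string column instead of silently dropping it.
grouped_means = df.groupby("group").mean(numeric_only=True)

# Ungrouped case: the same keyword works on a plain DataFrame.
overall_means = df.mean(numeric_only=True)

print(grouped_means)
print(overall_means)
```

Passing numeric_only=True explicitly keeps the pre-2.0 behaviour of averaging only the numeric columns.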

EugeneTorap · 2023-07-15T15:52:19Z

LGTM! Thanks for this nice PR!

john-bodley · 2023-07-18T17:25:10Z

@sebastianliebscher would you mind updating the PR description to include the reason for bumping said package as this would help provide more context when reviewing? Is it purely from a code maintenance/hygiene perspective or is this change a precursor for something else?

john-bodley · 2023-07-18T17:32:21Z

superset/config.py

-# Excel Options: key/value pairs that will be passed as argument to DataFrame.to_excel
-# method.
-# note: index option should not be overridden
-EXCEL_EXPORT = {"encoding": "utf-8"}


EXCEL_EXPORT could include additional keyword arguments other than encoding. The challenge is that if any of the other DataFrame.to_excel arguments changed, this PR would be deemed a breaking change, and we would need to punt on it until Superset 4.0.

Only encoding and verbose have been removed in 2.0. Both arguments were never used. The other args still have the same default.

I will leave EXCEL_EXPORT as an empty variable, so users can still set their custom overrides.

Thank you for your feedback!
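
To illustrate the concern about user overrides, here is a rough sketch of how a dict like EXCEL_EXPORT could be filtered before being forwarded as keyword arguments. export_kwargs and REMOVED_IN_PANDAS_2 are hypothetical names, not Superset code; the two removed argument names come from the reply above:

```python
# Arguments named above as removed from DataFrame.to_excel in pandas 2.0.
REMOVED_IN_PANDAS_2 = {"encoding", "verbose"}


def export_kwargs(excel_export: dict) -> dict:
    """Hypothetical helper: drop options that to_excel no longer accepts."""
    return {
        key: value
        for key, value in excel_export.items()
        if key not in REMOVED_IN_PANDAS_2
    }


# The old default from superset/config.py, and a made-up custom override.
old_default = {"encoding": "utf-8"}
custom = {"encoding": "utf-8", "sheet_name": "report"}

print(export_kwargs(old_default))
print(export_kwargs(custom))
```

As described above, the PR itself does not add such a filter; it leaves EXCEL_EXPORT empty so that custom overrides are still passed through.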

sebastianliebscher · 2023-07-19T12:50:29Z

@john-bodley PR summary updated

john-bodley · 2023-07-19T22:41:39Z

superset/common/query_context_processor.py

@@ -134,17 +134,15 @@ def get_df_payload(

        if query_obj and cache_key and not cache.is_loaded:
            try:
-                invalid_columns = [
+                if invalid_columns := [


Love the walrus.
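
For anyone unfamiliar with the pattern, a minimal sketch of an assignment expression used in a condition. check_columns and its arguments are hypothetical and only mirror the shape of the hunk above:

```python
def check_columns(requested: list[str], allowed: set[str]) -> None:
    """Hypothetical check: raise if any requested column is not allowed."""
    # The walrus operator binds the list and tests its truthiness in one step,
    # so no separate assignment line is needed before the if.
    if invalid_columns := [col for col in requested if col not in allowed]:
        raise ValueError(f"Invalid columns: {invalid_columns}")


check_columns(["a", "b"], {"a", "b", "c"})  # passes silently
```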

john-bodley · 2023-07-19T22:43:53Z

tests/unit_tests/pandas_postprocessing/test_rolling.py

@@ -162,8 +162,8 @@ def test_rolling_after_pivot_with_single_metric():
        pd.DataFrame(
            data={
                "dttm": pd.to_datetime(["2019-01-01", "2019-01-02"]),
-                FLAT_COLUMN_SEPARATOR.join(["sum_metric", "UK"]): [5.0, 12.0],
-                FLAT_COLUMN_SEPARATOR.join(["sum_metric", "US"]): [6.0, 14.0],
+                FLAT_COLUMN_SEPARATOR.join(["sum_metric", "UK"]): [5, 12],


Nice! I'm glad to see integers remain integers when summed. Would you mind updating the comment above, as it still references floats?

EugeneTorap · 2023-07-20T17:59:05Z

Hey @john-bodley @rusackas! Can we merge the PR?

Co-authored-by: EugeneTorap <evgenykrutpro@gmail.com>

sebastianliebscher added 7 commits July 14, 2023 14:50

feat(deps): bump pandas

69daeac

remove deprecated mangle_dupe_cols

5f93d39

- https://pandas.pydata.org/docs/whatsnew/v1.5.0.html#other-deprecations

remove deprecated mangle_dupe_cols

71c991a

- https://pandas.pydata.org/docs/whatsnew/v1.5.0.html#other-deprecations

explicitly set numeric_only

15b3cec

- https://pandas.pydata.org/docs/whatsnew/v2.0.0.html#removal-of-prior-version-deprecations-changes

to_excel and pandas.concat

5ffbf40

update requirements

f93ccb9

update requirements

5bd5a2c

pull-request-size bot added the size/M label Jul 15, 2023

sebastianliebscher commented Jul 15, 2023

Merge branch 'apache:master' into bump_pandas

b5f71b3

john-bodley self-requested a review July 18, 2023 16:48

john-bodley reviewed Jul 18, 2023

re-enable EXCEL_EXPORT

a09d22d

john-bodley reviewed Jul 19, 2023

update comment examples

07360b0

john-bodley approved these changes Jul 20, 2023

john-bodley merged commit 91e6f5c into apache:master Jul 20, 2023

EugeneTorap deleted the bump_pandas branch July 20, 2023 18:04

sebastianliebscher mentioned this pull request Jul 21, 2023

feat: add pandas performance dependencies #24768

Merged

9 tasks

michael-s-molina added v3.0 and removed v3.0 labels Jul 24, 2023

gnought mentioned this pull request Nov 17, 2023

chore: cleanup unused code in pandas 2.0+ #26013

Merged

9 tasks

mistercrunch added 🏷️ bot and 🚢 3.1.0 labels Mar 8, 2024

vinothkumar66 pushed a commit to vinothkumar66/superset that referenced this pull request Nov 11, 2024

chore(deps): bump pandas >=2.0 (apache#24705)

d6727a6

Co-authored-by: EugeneTorap <evgenykrutpro@gmail.com>
