chore(deps): bump pandas >=2.0 #24705

Merged
merged 10 commits into from
Jul 20, 2023

Conversation

sebastianliebscher
Contributor

@sebastianliebscher sebastianliebscher commented Jul 15, 2023

SUMMARY

In pandas 2.0, the optional but recommended performance dependencies can be installed via pip extras (reference). These dependencies are also available for 1.5.3, but with pip extras and pip-compile-multi it should be much cleaner and more obvious why they are installed. I think pandas[performance] should be installed in a separate follow-up PR.
This PR bumps the pandas dependency from 1.5.3 to the latest stable version, 2.0.3, so that these optional dependencies can be added.

With pandas 2.0 we could also potentially:

  • leverage new pandas features
  • mitigate bugs
  • improve performance

Changes in 2.0: https://pandas.pydata.org/docs/whatsnew/v2.0.0.html
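
To make the summary's point about the performance extra concrete, here is a minimal sketch that reports which of the optional packages are importable in the current environment. It assumes the `performance` extra covers `bottleneck`, `numba`, and `numexpr`, which is my understanding of pandas 2.0 and is not stated in this PR; the helper name is hypothetical.

```python
from importlib.util import find_spec

# Assumption: packages pulled in by the pandas "performance" extra as of 2.0.
PERFORMANCE_PACKAGES = ("bottleneck", "numba", "numexpr")


def installed_performance_packages() -> dict[str, bool]:
    """Report which optional performance packages are importable (hypothetical helper)."""
    return {name: find_spec(name) is not None for name in PERFORMANCE_PACKAGES}


status = installed_performance_packages()
for name, present in status.items():
    print(f"{name}: {'installed' if present else 'missing'}")
```

The output depends on the environment, so none is shown here.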

TESTING INSTRUCTIONS

  • scripts/tests/run.sh --module tests/integration_tests/
  • pytest ./tests/unit_tests/

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@@ -570,7 +568,7 @@ def get_data(self, df: pd.DataFrame) -> str | list[dict[str, Any]]:
         df, index=include_index, **config["CSV_EXPORT"]
     )
 elif self._query_context.result_format == ChartDataResultFormat.XLSX:
-    result = excel.df_to_excel(df, **config["EXCEL_EXPORT"])
+    result = excel.df_to_excel(df)
Contributor Author

@sebastianliebscher sebastianliebscher Jul 15, 2023


@@ -121,17 +122,19 @@ def _get_body(self) -> str:
 # need to truncate the data
 for i in range(len(df) - 1):
     truncated_df = df[: i + 1].fillna("")
-    truncated_df = truncated_df.append(
Contributor Author
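
For context on the hunk above: `DataFrame.append` was removed in pandas 2.0, and `pd.concat` is the usual replacement. The sketch below uses hypothetical data and column names; it is not the code from this PR, only an illustration of the migration pattern.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})  # hypothetical data

# pandas < 2.0 allowed: df.append({"name": "...", "value": "..."}, ignore_index=True)
# pandas >= 2.0: build a one-row frame and concatenate it instead.
ellipsis_row = pd.DataFrame([{"name": "...", "value": "..."}])
truncated = pd.concat([df.iloc[:1], ellipsis_row], ignore_index=True)
print(truncated)
```

`ignore_index=True` renumbers the rows, which mirrors what `append(..., ignore_index=True)` used to do.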

@codecov

codecov bot commented Jul 15, 2023

Codecov Report

Merging #24705 (4f54588) into master (0328dd2) will decrease coverage by 0.08%.
The diff coverage is 63.63%.

❗ Current head 4f54588 differs from pull request most recent head 07360b0. Consider uploading reports for the commit 07360b0 to get more accurate results

@@            Coverage Diff             @@
##           master   #24705      +/-   ##
==========================================
- Coverage   68.97%   68.89%   -0.08%     
==========================================
  Files        1901     1901              
  Lines       74008    73927      -81     
  Branches     8183     8183              
==========================================
- Hits        51047    50932     -115     
- Misses      20840    20874      +34     
  Partials     2121     2121              
Flag Coverage Δ
hive 54.16% <28.57%> (+0.04%) ⬆️
mysql 79.21% <63.63%> (-0.15%) ⬇️
postgres 79.29% <63.63%> (-0.15%) ⬇️
presto 54.06% <28.57%> (+0.04%) ⬆️
python 83.31% <63.63%> (-0.14%) ⬇️
sqlite 77.88% <63.63%> (-0.15%) ⬇️
unit 54.88% <27.27%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...rontend/src/filters/components/Range/buildQuery.ts 0.00% <ø> (ø)
superset/examples/birth_names.py 69.69% <ø> (ø)
superset/views/database/views.py 92.68% <ø> (ø)
superset/common/query_context_processor.py 85.08% <10.71%> (-4.88%) ⬇️
superset/reports/notifications/slack.py 90.58% <60.00%> (-0.88%) ⬇️
superset/charts/commands/warm_up_cache.py 98.14% <96.00%> (+0.64%) ⬆️
superset/charts/commands/export.py 94.11% <100.00%> (ø)
superset/config.py 92.17% <100.00%> (ø)
superset/row_level_security/schemas.py 100.00% <100.00%> (ø)
superset/views/core.py 69.60% <100.00%> (-0.75%) ⬇️
... and 1 more

... and 3 files with indirect coverage changes


@@ -201,7 +201,6 @@ def form_post(self, form: CsvToDatabaseForm) -> Response:
 infer_datetime_format=form.infer_datetime_format.data,
 iterator=True,
 keep_default_na=not form.null_values.data,
-mangle_dupe_cols=form.overwrite_duplicate.data,
Contributor Author
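
As background for the removed line: to my knowledge pandas 2.0 dropped the `mangle_dupe_cols` argument from `read_csv`, and duplicate header names are always renamed. A minimal sketch with a hypothetical in-memory CSV:

```python
import io

import pandas as pd

csv_data = io.StringIO("a,a,b\n1,2,3\n")  # hypothetical CSV with a duplicated header

# No mangle_dupe_cols argument: duplicate names are renamed automatically.
df = pd.read_csv(csv_data)
print(list(df.columns))  # ['a', 'a.1', 'b']
```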

@@ -2849,7 +2849,7 @@ def levels_for(
 for i in range(0, len(groups) + 1):
     agg_df = df.groupby(groups[:i]) if i else df
     levels[i] = (
-        agg_df.mean()
+        agg_df.mean(numeric_only=True)
Contributor Author

@sebastianliebscher sebastianliebscher Jul 15, 2023

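
The likely reason for the change above is that pandas 2.0 switched the default of `numeric_only` to `False`, so `mean()` on a frame with non-numeric columns raises instead of silently dropping them. A minimal sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "group": ["x", "x", "y"],
        "label": ["p", "q", "r"],  # non-numeric column
        "value": [1.0, 3.0, 5.0],
    }
)

# Passing numeric_only=True explicitly keeps the pre-2.0 behavior of
# averaging only the numeric columns.
means = df.groupby("group").mean(numeric_only=True)
print(means)
```

Only the `value` column survives the aggregation; `label` is excluded rather than causing an error.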

@EugeneTorap
Contributor

LGTM! Thanks for this nice PR!

@john-bodley john-bodley self-requested a review July 18, 2023 16:48
@john-bodley
Member

@sebastianliebscher would you mind updating the PR description to include the reason for bumping said package as this would help provide more context when reviewing? Is it purely from a code maintenance/hygiene perspective or is this change a precursor for something else?

# Excel Options: key/value pairs that will be passed as argument to DataFrame.to_excel
# method.
# note: index option should not be overridden
EXCEL_EXPORT = {"encoding": "utf-8"}
Member

EXCEL_EXPORT could include additional keyword arguments other than encoding. The challenge is that if any of the other DataFrame.to_excel arguments changed, this PR would be deemed a breaking change, and we would need to punt on it until Superset 4.0.

Contributor Author

@sebastianliebscher sebastianliebscher Jul 19, 2023


Only encoding and verbose have been removed in 2.0. Both arguments were never used. The other args still have the same default.

I will leave EXCEL_EXPORT as an empty variable, so users can still set their custom overrides.
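
One way a deployment could guard against overrides that the installed pandas no longer accepts is to filter them against the signature of `DataFrame.to_excel`. This is only a sketch: the helper name and the sample options are hypothetical, and it is not something this PR implements.

```python
import inspect

import pandas as pd


def supported_excel_options(options: dict) -> dict:
    """Keep only the keys that the installed DataFrame.to_excel accepts (hypothetical helper)."""
    accepted = set(inspect.signature(pd.DataFrame.to_excel).parameters)
    accepted -= {"self", "excel_writer"}  # supplied by the caller, not by config
    return {key: value for key, value in options.items() if key in accepted}


# Hypothetical user overrides; "not_a_real_option" stands in for a removed argument.
user_options = {"sheet_name": "data", "not_a_real_option": True}
print(supported_excel_options(user_options))
```

Silently dropping unknown keys is a design choice; logging a warning for each dropped key may be preferable so that stale configuration is noticed.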

Contributor Author

Thank you for your feedback!

@sebastianliebscher
Contributor Author

@john-bodley PR summary updated

@@ -134,17 +134,15 @@ def get_df_payload(

 if query_obj and cache_key and not cache.is_loaded:
     try:
-        invalid_columns = [
+        if invalid_columns := [
Member

Love the walrus.
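
For readers unfamiliar with the pattern praised here, the assignment expression (`:=`, Python 3.8+) binds a value and tests it in a single step. A minimal sketch with hypothetical helper names and error text, not the code from this PR:

```python
def find_invalid_columns(requested: list[str], available: set[str]) -> list[str]:
    """Return requested column names that are not available (hypothetical helper)."""
    return [col for col in requested if col not in available]


def validate(requested: list[str], available: set[str]) -> None:
    # Bind and test in one step; the branch runs only when the list is non-empty.
    if invalid := find_invalid_columns(requested, available):
        raise ValueError(f"Columns missing in dataset: {invalid}")


validate(["a", "b"], {"a", "b", "c"})  # passes silently
```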

@@ -162,8 +162,8 @@ def test_rolling_after_pivot_with_single_metric():
 pd.DataFrame(
     data={
         "dttm": pd.to_datetime(["2019-01-01", "2019-01-02"]),
-        FLAT_COLUMN_SEPARATOR.join(["sum_metric", "UK"]): [5.0, 12.0],
-        FLAT_COLUMN_SEPARATOR.join(["sum_metric", "US"]): [6.0, 14.0],
+        FLAT_COLUMN_SEPARATOR.join(["sum_metric", "UK"]): [5, 12],
Member

Nice! I'm glad to see integers remain integers when summed. Would you mind updating the comment above, as it still references floats?

Contributor Author

Done.

@EugeneTorap
Contributor

Hey @john-bodley @rusackas! Can we merge the PR?

@john-bodley john-bodley merged commit 91e6f5c into apache:master Jul 20, 2023
@EugeneTorap EugeneTorap deleted the bump_pandas branch July 20, 2023 18:04
@michael-s-molina michael-s-molina added v3.0 Label added by the release manager to track PRs to be included in the 3.0 branch and removed v3.0 Label added by the release manager to track PRs to be included in the 3.0 branch labels Jul 24, 2023
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 3.1.0 labels Mar 8, 2024
vinothkumar66 pushed a commit to vinothkumar66/superset that referenced this pull request Nov 11, 2024
Co-authored-by: EugeneTorap <evgenykrutpro@gmail.com>
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/M 🚢 3.1.0
5 participants