chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6 #21002

Merged
merged 13 commits into from Aug 17, 2022

EugeneTorap
Contributor

Fixes #19986: installing Superset on Python 3.10 fails because pyarrow 5.0.0 doesn't ship a wheel for Python 3.10.

SUMMARY

To use Python 3.10 in Superset, we need to bump PyArrow from 5.0.0 to 6.0.1.
This PR also bumps Pandas to the latest minor release (from 1.3.4 to 1.4.3).

Pandas 1.4 added a wheel for Python 3.9 on Apple Silicon.

Pandas 1.4 introduced support for using pyarrow as an engine for reading CSVs, which brings performance improvements (see https://pandas.pydata.org/docs/whatsnew/v1.4.0.html#multi-threaded-csv-reading-with-a-new-csv-engine-based-on-pyarrow for details). Therefore engine="pyarrow" has been added everywhere we're calling pd.read_csv.
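
A minimal sketch of the kind of call this change affects, not the actual Superset code: the inline CSV bytes and the `read_csv_fast` helper are made up for illustration, and the fallback to the default parser when the pyarrow engine is unavailable is an assumption of this sketch, not something the PR is known to do.

```python
import io

import pandas as pd

# Hypothetical sample data; the real call sites read example datasets instead.
CSV_BYTES = b"station,riders\nEmbarcadero,120\nMontgomery,95\n"


def read_csv_fast(data: bytes) -> pd.DataFrame:
    """Parse CSV bytes with the pyarrow engine, falling back to the default parser."""
    try:
        # engine="pyarrow" is available from pandas 1.4 and needs pyarrow installed.
        return pd.read_csv(io.BytesIO(data), engine="pyarrow")
    except (ImportError, ValueError):
        # ImportError: pyarrow is missing; ValueError: pandas predates the engine.
        return pd.read_csv(io.BytesIO(data))


df = read_csv_fast(CSV_BYTES)
print(df)
```

Note that the pyarrow engine does not support every `pd.read_csv` option, so call sites that pass less common keyword arguments may need to stay on the default engine.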

@codecov

codecov bot commented Aug 7, 2022

Codecov Report

Merging #21002 (122d691) into master (e214e1a) will decrease coverage by 0.09%.
The diff coverage is 63.98%.

❗ Current head 122d691 differs from pull request most recent head 213bf79. Consider uploading reports for the commit 213bf79 to get more accurate results

@@            Coverage Diff             @@
##           master   #21002      +/-   ##
==========================================
- Coverage   66.34%   66.25%   -0.10%     
==========================================
  Files        1767     1770       +3     
  Lines       67312    67526     +214     
  Branches     7144     7182      +38     
==========================================
+ Hits        44656    44737      +81     
- Misses      20828    20953     +125     
- Partials     1828     1836       +8     
Flag      Coverage Δ
hive      53.17% <45.76%> (+0.01%) ⬆️
mysql     80.96% <69.49%> (+0.04%) ⬆️
postgres  81.00% <69.49%> (+0.01%) ⬆️
presto    53.07% <45.76%> (+0.01%) ⬆️
python    81.43% <69.49%> (-0.04%) ⬇️
sqlite    ?
unit      50.74% <52.54%> (+0.27%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...packages/superset-ui-core/src/query/types/Query.ts 100.00% <ø> (ø)
...set-ui-core/src/ui-overrides/ExtensionsRegistry.ts 100.00% <ø> (ø)
...ackages/superset-ui-core/src/utils/featureFlags.ts 100.00% <ø> (ø)
...rts/src/BigNumber/BigNumberTotal/transformProps.ts 0.00% <0.00%> (ø)
...lugin-chart-echarts/src/BigNumber/BigNumberViz.tsx 0.00% <0.00%> (ø)
...lugin-chart-echarts/src/BoxPlot/EchartsBoxPlot.tsx 0.00% <0.00%> (ø)
.../plugins/plugin-chart-echarts/src/BoxPlot/types.ts 0.00% <ø> (ø)
.../plugin-chart-echarts/src/Funnel/EchartsFunnel.tsx 0.00% <0.00%> (ø)
...d/plugins/plugin-chart-echarts/src/Funnel/types.ts 100.00% <ø> (ø)
...ns/plugin-chart-echarts/src/Gauge/EchartsGauge.tsx 0.00% <0.00%> (ø)
... and 89 more

@EugeneTorap
Contributor Author

@hughhhh @betodealmeida Can you review it?

Member

@betodealmeida betodealmeida left a comment

This looks awesome! But I'm very concerned with how in the unit tests some of the NaNs are now being returned as zeros, since it would lead to wrong results. Any idea why that is happening here?

@EugeneTorap
Contributor Author

EugeneTorap commented Aug 8, 2022

How should I fix this test?
Pandas returns 0 instead of nan for the API

@betodealmeida
Member

How should I fix this test? Pandas returns 0 instead of nan for the API

Taking another look, I guess 0 makes sense from a contribution point of view. It should be fine in this case.

Member

@betodealmeida betodealmeida left a comment

I took another look and have a few questions.

superset/charts/post_processing.py (outdated, resolved)
superset/examples/bart_lines.py (outdated, resolved)
@EugeneTorap
Contributor Author

@betodealmeida @villebro Can you review again?

Member

@villebro villebro left a comment

LGTM, thanks for all the iterations!

Member

@zhaoyongjie zhaoyongjie left a comment

LGTM! Thanks @EugeneTorap and @villebro

Member

@betodealmeida betodealmeida left a comment

This is great! Thanks for the work, @EugeneTorap!

Comment on lines +73 to +74
def get_example_url(filepath: str) -> str:
    return f"{BASE_URL}{filepath}?raw=true"
Member

Nice!
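
For readers unfamiliar with the helper quoted in this review comment, here is a small self-contained sketch of how it behaves. The `BASE_URL` value and the file name are placeholders invented for illustration; the real constant lives in the Superset examples module and is not shown in this conversation.

```python
# Placeholder base URL (an assumption, not the value used by Superset).
BASE_URL = "https://example.invalid/examples-data/"


def get_example_url(filepath: str) -> str:
    """Build the raw-download URL for an example data file."""
    return f"{BASE_URL}{filepath}?raw=true"


# Hypothetical file name, used only to show the resulting URL shape.
url = get_example_url("some_dataset.csv.gz")
print(url)
```

Centralizing the URL construction means each example loader passes only a relative file path and no longer repeats the base URL and the `?raw=true` suffix.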

@@ -49,6 +49,9 @@ def contribution(
     """
     contribution_df = df.copy()
     numeric_df = contribution_df.select_dtypes(include=["number", Decimal])
     # TODO: copy needed due to following regression in 1.4, remove if not needed:
     # https://github.com/pandas-dev/pandas/issues/48090
     numeric_df = numeric_df.copy()
Member

Awesome!
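
The pattern in the hunk above (select the numeric columns, then take an explicit copy before doing arithmetic on them) can be sketched in isolation. This is a simplified illustration under stated assumptions, not Superset's `contribution` function: `row_contribution` and the sample frame are hypothetical, and only row-wise shares are computed.

```python
import pandas as pd


def row_contribution(df: pd.DataFrame) -> pd.DataFrame:
    """Return each numeric cell as a share of its row total."""
    # Explicit copy, mirroring the hunk above, so the numeric subset is an
    # independent frame and the caller's data is never modified.
    numeric_df = df.select_dtypes(include="number").copy()
    row_totals = numeric_df.sum(axis=1)
    # Divide every column by the per-row total (aligned on the index).
    return numeric_df.div(row_totals, axis=0)


# Hypothetical input: one label column and two numeric series.
frame = pd.DataFrame({"label": ["x", "y"], "a": [1, 3], "b": [3, 1]})
shares = row_contribution(frame)
print(shares)
```

The input frame keeps its original values, and the non-numeric `label` column is left out of the result.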

@betodealmeida betodealmeida merged commit 76d6a9a into apache:master Aug 17, 2022
@EugeneTorap EugeneTorap deleted the feat/support-python3.10 branch August 17, 2022 13:38
@cwegener
Contributor

Nice work! Going to test this out very soon.

I know that there used to be a problem where an empty result set from SQLAlchemy caused an exception in pandas when using PyArrow 6.0 and higher, leading to unfriendly error messages in Explore (and in charts on dashboards) instead of the friendly "No data" message.

@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 2.1.0 and removed 🚢 2.1.3 labels Mar 13, 2024
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 2.1.0

Successfully merging this pull request may close these issues.

Fails to install on Python 3.10 because of Numpy dependency
7 participants