-
Notifications
You must be signed in to change notification settings - Fork 13.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(deps): bump pandas >=2.0 #24705
Conversation
@@ -570,7 +568,7 @@ def get_data(self, df: pd.DataFrame) -> str | list[dict[str, Any]]: | |||
df, index=include_index, **config["CSV_EXPORT"] | |||
) | |||
elif self._query_context.result_format == ChartDataResultFormat.XLSX: | |||
result = excel.df_to_excel(df, **config["EXCEL_EXPORT"]) | |||
result = excel.df_to_excel(df) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keyword arg "encoding" is deprecated since 1.5.0 https://pandas.pydata.org/pandas-docs/version/1.5/reference/api/pandas.DataFrame.to_excel.html
@@ -121,17 +122,19 @@ def _get_body(self) -> str: | |||
# need to truncate the data | |||
for i in range(len(df) - 1): | |||
truncated_df = df[: i + 1].fillna("") | |||
truncated_df = truncated_df.append( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
df.append deprecated since 1.4 https://pandas.pydata.org/pandas-docs/version/1.5/reference/api/pandas.DataFrame.append.html
Codecov Report
@@ Coverage Diff @@
## master #24705 +/- ##
==========================================
- Coverage 68.97% 68.89% -0.08%
==========================================
Files 1901 1901
Lines 74008 73927 -81
Branches 8183 8183
==========================================
- Hits 51047 50932 -115
- Misses 20840 20874 +34
Partials 2121 2121
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 3 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
@@ -201,7 +201,6 @@ def form_post(self, form: CsvToDatabaseForm) -> Response: | |||
infer_datetime_format=form.infer_datetime_format.data, | |||
iterator=True, | |||
keep_default_na=not form.null_values.data, | |||
mangle_dupe_cols=form.overwrite_duplicate.data, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deprecated since 1.5 https://pandas.pydata.org/pandas-docs/version/1.5/reference/api/pandas.read_csv.html
@@ -2849,7 +2849,7 @@ def levels_for( | |||
for i in range(0, len(groups) + 1): | |||
agg_df = df.groupby(groups[:i]) if i else df | |||
levels[i] = ( | |||
agg_df.mean() | |||
agg_df.mean(numeric_only=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new default in 2.0 is False
https://pandas.pydata.org/docs/whatsnew/v2.0.0.html#removal-of-prior-version-deprecations-changes
LGTM! Thanks for this nice PR! |
@sebastianliebscher would you mind updating the PR description to include the reason for bumping said package as this would help provide more context when reviewing? Is it purely from a code maintenance/hygiene perspective or is this change a precursor for something else? |
# Excel Options: key/value pairs that will be passed as argument to DataFrame.to_excel | ||
# method. | ||
# note: index option should not be overridden | ||
EXCEL_EXPORT = {"encoding": "utf-8"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The EXCEL_EXPORT
could include additional keyword arguments other than encoding
. The challenge is if any of the other DataFrame.to_excel arguments changed, then this PR would be deemed a breaking change and thus we would need to punt on this PR until Superset 4.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only encoding
and verbose
have been removed in 2.0. Both arguments were never used. The other args still have the same default.
I will leave EXCEL_EXPORT
as an empty variable, so users can still set their custom overrides.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your feedback!
@john-bodley PR summary updated |
@@ -134,17 +134,15 @@ def get_df_payload( | |||
|
|||
if query_obj and cache_key and not cache.is_loaded: | |||
try: | |||
invalid_columns = [ | |||
if invalid_columns := [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love the walrus.
@@ -162,8 +162,8 @@ def test_rolling_after_pivot_with_single_metric(): | |||
pd.DataFrame( | |||
data={ | |||
"dttm": pd.to_datetime(["2019-01-01", "2019-01-02"]), | |||
FLAT_COLUMN_SEPARATOR.join(["sum_metric", "UK"]): [5.0, 12.0], | |||
FLAT_COLUMN_SEPARATOR.join(["sum_metric", "US"]): [6.0, 14.0], | |||
FLAT_COLUMN_SEPARATOR.join(["sum_metric", "UK"]): [5, 12], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIce! I'm glad to see integers remain integers when summed. Would you mind updating the comment above as it still references floats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Hey @john-bodley @rusackas! Can we merge the PR? |
Co-authored-by: EugeneTorap <evgenykrutpro@gmail.com>
SUMMARY
In Pandas 2.0 installing the optional but recommended performance dependencies is now possible via pip extras, reference. These deps are also available in 1.5.3 but with the help of pip extras and pip-compile-multi, it should be much cleaner and more obvious as why these deps are installed. I think
pandas[performance]
should be installed in a separate follow-up PR.This PR bumps dependency pandas from
1.5.3
to latest stable version2.0.3
to be able to add these optional dependencies.With Pandas 2.0 we could also potentially
Changes in 2.0: https://pandas.pydata.org/docs/whatsnew/v2.0.0.html
TESTING INSTRUCTIONS
scripts/tests/run.sh --module tests/integration_tests/
pytest ./tests/unit_tests/
ADDITIONAL INFORMATION