Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upstream fixes #10549

Merged
merged 21 commits into from Oct 18, 2023
Merged

upstream fixes #10549

merged 21 commits into from Oct 18, 2023

Conversation

graingert
Copy link
Member

extends #10539

  • Closes #xxxx
  • Tests added / passed
  • Passes pre-commit run --all-files

@graingert graingert marked this pull request as draft October 9, 2023 14:14
@graingert
Copy link
Member Author

graingert commented Oct 11, 2023

hmm these tests are failing every time in CI but not locally for me

FAILED dask/dataframe/tests/test_multi.py::test_concat_categorical[True-False-True] - AssertionError: Series are different

Series values are different (26.66667 %)
[index]: [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
[left]:  ['a', 'a', 'b', 'b', 'b', ..., 'b', 'b', 'a', 'c', 'c']
Length: 15
Categories (3, object): ['a', 'b', 'c']
[right]: ['a', 'a', 'b', 'b', 'c', ..., 'b', 'b', 'a', 'c', 'c']
Length: 15
Categories (3, object): ['a', 'b', 'c']
At positional index 4, first diff: b != c
FAILED dask/dataframe/tests/test_multi.py::test_concat_categorical[False-False-True] - AssertionError: Series are different

Series values are different (26.66667 %)
[index]: [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
[left]:  ['a', 'a', 'b', 'b', 'b', ..., 'b', 'b', 'a', 'c', 'c']
Length: 15
Categories (3, object): ['a', 'b', 'c']
[right]: ['a', 'a', 'b', 'b', 'c', ..., 'b', 'b', 'a', 'c', 'c']
Length: 15
Categories (3, object): ['a', 'b', 'c']
At positional index 4, first diff: b != c

eg:

dask/dataframe/tests/test_multi.py::test_concat_categorical[True-True-False] PASSED                                                                         [ 16%]
dask/dataframe/tests/test_multi.py::test_concat_categorical[True-False-True] PASSED                                                                         [ 33%]
dask/dataframe/tests/test_multi.py::test_concat_categorical[True-False-False] PASSED                                                                        [ 50%]
dask/dataframe/tests/test_multi.py::test_concat_categorical[False-True-False] PASSED                                                                        [ 66%]
dask/dataframe/tests/test_multi.py::test_concat_categorical[False-False-True] PASSED                                                                        [ 83%]
dask/dataframe/tests/test_multi.py::test_concat_categorical[False-False-False] PASSED                                                                       [100%]

all pass for me locally

@graingert
Copy link
Member Author

dask/tests/test_sizeof.py::test_sparse_matrix is failing due to scipy/scipy#18929

@graingert
Copy link
Member Author

diffing my environment gets:

--- ci.txt	2023-10-11 13:28:30.957665048 +0100
+++ mine.txt	2023-10-11 13:26:42.778166612 +0100
@@ -1,4 +1,4 @@
-# packages in environment at /usr/share/miniconda3/envs/test-environment:
+# packages in environment at /home/graingert/mambaforge/envs/test-environment-upstream:
 #
 # Name                    Version                   Build  Channel
 _libgcc_mutex             0.1                 conda_forge    conda-forge
@@ -67,7 +67,7 @@
 cryptography              41.0.4          py310h75e40e8_0    conda-forge
 curl                      8.4.0                hca28451_0    conda-forge
 cytoolz                   0.12.2          py310h2372a71_1    conda-forge
-dask                      0+untagged.1.ge3a25a3.dirty          pypi_0    pypi
+dask                      2023.9.3+13.g010b3099.dirty          pypi_0    pypi
 dav1d                     1.2.1                hd590300_0    conda-forge
 debugpy                   1.8.0           py310hc6cd4ac_1    conda-forge
 decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
@@ -220,7 +220,7 @@
 nbformat                  5.9.2              pyhd8ed1ab_0    conda-forge
 ncurses                   6.4                  hcb278e6_0    conda-forge
 ndindex                   1.7                      pypi_0    pypi
-nest-asyncio              1.5.7              pyhd8ed1ab_0    conda-forge
+nest-asyncio              1.5.6              pyhd8ed1ab_0    conda-forge
 networkx                  3.1                pyhd8ed1ab_0    conda-forge
 nodeenv                   1.8.0              pyhd8ed1ab_0    conda-forge
 nomkl                     1.0                  h5ca1d4c_0    conda-forge
@@ -271,6 +271,7 @@
 pyspark                   3.5.0              pyhd8ed1ab_0    conda-forge
 pytest                    7.4.2              pyhd8ed1ab_0    conda-forge
 pytest-cov                4.1.0              pyhd8ed1ab_0    conda-forge
 pytest-reportlog          0.1.2              pyhd8ed1ab_0    conda-forge
 pytest-rerunfailures      12.0               pyhd8ed1ab_0    conda-forge
 pytest-timeout            2.2.0              pyhd8ed1ab_0    conda-forge

@phofl
Copy link
Collaborator

phofl commented Oct 11, 2023

Using the newest Dask release and pandas main I get the following:

Traceback (most recent call last):
  File "/Users/patrick/Library/Application Support/JetBrains/Toolbox/apps/PyCharm-P/ch-0/232.8660.197/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1500, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Users/patrick/Library/Application Support/JetBrains/Toolbox/apps/PyCharm-P/ch-0/232.8660.197/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/patrick/Library/Application Support/JetBrains/PyCharm2023.2/scratches/scratch.py", line 552, in <module>
    res = check_and_return(dframes, frames, join)
  File "/Users/patrick/Library/Application Support/JetBrains/PyCharm2023.2/scratches/scratch.py", line 539, in check_and_return
    assert_eq(res, sol)
  File "/Users/patrick/mambaforge/envs/pandas-dev/lib/python3.10/site-packages/dask/dataframe/utils.py", line 602, in assert_eq
    tm.assert_frame_equal(
  File "/Users/patrick/PycharmProjects/pandas/pandas/_testing/asserters.py", line 1210, in assert_frame_equal
    assert_series_equal(
  File "/Users/patrick/PycharmProjects/pandas/pandas/_testing/asserters.py", line 919, in assert_series_equal
    assert_attr_equal("dtype", left, right, obj=f"Attributes of {obj}")
  File "/Users/patrick/PycharmProjects/pandas/pandas/_testing/asserters.py", line 414, in assert_attr_equal
    raise_assert_detail(obj, msg, left_attr, right_attr)
  File "/Users/patrick/PycharmProjects/pandas/pandas/_testing/asserters.py", line 599, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="w") are different

Attribute "dtype" are different
[left]:  CategoricalDtype(categories=['x', 'y', 'z'], ordered=False, categories_dtype=object)
[right]: string[pyarrow]

@graingert graingert marked this pull request as ready for review October 17, 2023 14:12
@graingert
Copy link
Member Author

I've added to the issue here #10558

Copy link
Collaborator

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 small comments, otherwise lgtm

continuous_integration/scripts/install.sh Outdated Show resolved Hide resolved
dask/dataframe/tests/test_multi.py Outdated Show resolved Hide resolved
@@ -148,6 +148,8 @@ filterwarnings = [
"ignore:'H' is deprecated and will be removed in a future version. Please use 'h' instead of 'H':FutureWarning",
'ignore:DataFrameGroupBy\.apply operated on the grouping columns\. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation\. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning\.:FutureWarning',
"ignore:'BA' is deprecated and will be removed in a future version. Please use 'BY' instead of 'BA':FutureWarning",
'ignore:numpy\.core.* is deprecated and has been renamed to numpy\._core\.*:DeprecationWarning', # needed for pandas to import
'ignore:Passing a BlockManager to DataFrame is deprecated and will raise in a future version. Use public APIs instead:DeprecationWarning',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something we have to address over in partd, can you open an issue there?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue opened here dask/partd#77

@@ -150,7 +150,7 @@ filterwarnings = [
"ignore:'BA' is deprecated and will be removed in a future version. Please use 'BY' instead of 'BA':FutureWarning",
'ignore:numpy\.core.* is deprecated and has been renamed to numpy\._core\.*:DeprecationWarning', # needed for pandas to import
'ignore:Passing a BlockManager to DataFrame is deprecated and will raise in a future version. Use public APIs instead:DeprecationWarning',

'ignore:The previous implementation of stack is deprecated and will be removed in a future version of pandas\.:FutureWarning',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you open a follow up issue to test the new implementation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opened here #10568

@phofl
Copy link
Collaborator

phofl commented Oct 18, 2023

2 non blocking comments

feel free to merge when ci is through

@graingert
Copy link
Member Author

@phofl I don't have commit-bit on dask/dask so I'll need you to merge

@phofl phofl merged commit 51210a6 into dask:main Oct 18, 2023
24 checks passed
@phofl
Copy link
Collaborator

phofl commented Oct 18, 2023

thx @graingert

@graingert graingert deleted the upstream-fix branch October 18, 2023 12:32
"ignore:'A' is deprecated and will be removed in a future version. Please use 'Y' instead of 'A':FutureWarning",
"ignore:'A-JUN' is deprecated and will be removed in a future version. Please use 'Y-JUN' instead of 'A-JUN':FutureWarning",
"ignore:'H' is deprecated and will be removed in a future version. Please use 'h' instead of 'H':FutureWarning",
'ignore:DataFrameGroupBy\.apply operated on the grouping columns\. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation\. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning\.:FutureWarning',
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs an issue

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue here: #10573

@@ -139,6 +139,18 @@ filterwarnings = [
# https://pandas.pydata.org/docs/dev/whatsnew/v1.5.0.html#using-group-keys-with-transformers-in-groupby-apply
"ignore:Not prepending group keys:FutureWarning",
"ignore:.*:dask.tests.warning_aliases.RemovedIn20Warning",
"ignore:When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas:FutureWarning",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs an issue

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue here: #10572

This was referenced Apr 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants