[CI][Python] Pandas upstream_devel and nightlies are failing #35235

Closed
raulcd opened this issue Apr 19, 2023 · 5 comments · Fixed by #35248

Comments

@raulcd
Member

raulcd commented Apr 19, 2023

Describe the bug, including details regarding any error messages, version, and platform.

The following pandas testing jobs have started failing:

There are a couple of test failures:

 =================================== FAILURES ===================================
_____ TestConvertDateTimeLikeTypes.test_timestamp_to_pandas_out_of_bounds ______

self = <pyarrow.tests.test_pandas.TestConvertDateTimeLikeTypes object at 0x7fdc7d41bb40>

    def test_timestamp_to_pandas_out_of_bounds(self):
        # ARROW-7758 check for out of bounds timestamps for non-ns timestamps
    
        for unit in ['s', 'ms', 'us']:
            for tz in [None, 'America/New_York']:
                arr = pa.array([datetime(1, 1, 1)], pa.timestamp(unit, tz=tz))
                table = pa.table({'a': arr})
    
                msg = "would result in out of bounds timestamp"
                with pytest.raises(ValueError, match=msg):
                    arr.to_pandas()
    
                with pytest.raises(ValueError, match=msg):
                    table.to_pandas()
    
                with pytest.raises(ValueError, match=msg):
                    # chunked array
                    table.column('a').to_pandas()
    
                # just ensure those don't give an error, but do not
                # check actual garbage output
                arr.to_pandas(safe=False)
                table.to_pandas(safe=False)
>               table.column('a').to_pandas(safe=False)

opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/tests/test_pandas.py:1461: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pyarrow/array.pxi:837: in pyarrow.lib._PandasConvertible.to_pandas
    ???
pyarrow/table.pxi:469: in pyarrow.lib.ChunkedArray._to_pandas
    ???
opt/conda/envs/arrow/lib/python3.8/site-packages/pandas/core/dtypes/dtypes.py:848: in __from_arrow__
    array = array.cast(pyarrow.timestamp(unit=self._unit), safe=True)
pyarrow/table.pxi:551: in pyarrow.lib.ChunkedArray.cast
    ???
opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/compute.py:400: in cast
    return call_function("cast", [arr], options, memory_pool)
pyarrow/_compute.pyx:572: in pyarrow._compute.call_function
    ???
pyarrow/_compute.pyx:367: in pyarrow._compute.Function.call
    ???
pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   pyarrow.lib.ArrowInvalid: Casting from timestamp[s, tz=America/New_York] to timestamp[ns] would result in out of bounds timestamp: -62135596800

pyarrow/error.pxi:100: ArrowInvalid
__________________ test_chunked_array_to_pandas_preserve_name __________________

    @pytest.mark.pandas
    def test_chunked_array_to_pandas_preserve_name():
        # https://issues.apache.org/jira/browse/ARROW-7709
        import pandas as pd
        import pandas.testing as tm
    
        for data in [
                pa.array([1, 2, 3]),
                pa.array(pd.Categorical(["a", "b", "a"])),
                pa.array(pd.date_range("2012", periods=3)),
                pa.array(pd.date_range("2012", periods=3, tz="Europe/Brussels")),
                pa.array([1, 2, 3], pa.timestamp("ms")),
                pa.array([1, 2, 3], pa.timestamp("ms", "Europe/Brussels"))]:
            table = pa.table({"name": data})
            result = table.column("name").to_pandas()
>           assert result.name == "name"
E           AssertionError: assert None == 'name'
E            +  where None = 0   2012-01-01 00:00:00+01:00\n1   2012-01-02 00:00:00+01:00\n2   2012-01-03 00:00:00+01:00\ndtype: datetime64[ns, Europe/Brussels].name

opt/conda/envs/arrow/lib/python3.8/site-packages/pyarrow/tests/test_table.py:348: AssertionError

This is happening on the 12.0.0 maintenance branch, which suggests the failures come from a recent change in pandas rather than in Arrow.
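
For context on the ArrowInvalid message in the first failure, here is a minimal standard-library sketch of why the value -62135596800 (the seconds-resolution timestamp for datetime(1, 1, 1)) cannot be cast to timestamp[ns]. It assumes only that nanosecond timestamps are stored as signed 64-bit integers; the helper names are made up for illustration and are not pyarrow or pandas APIs.

```python
from datetime import datetime, timedelta

# Range of a signed 64-bit integer, the storage type assumed for timestamp[ns].
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
NANOS_PER_SECOND = 1_000_000_000


def seconds_since_epoch(value: datetime) -> int:
    """Whole seconds between the Unix epoch and a naive datetime (hypothetical helper)."""
    return (value - datetime(1970, 1, 1)) // timedelta(seconds=1)


def fits_in_int64_nanoseconds(seconds: int) -> bool:
    """True if a seconds-resolution value survives rescaling to nanoseconds."""
    return INT64_MIN <= seconds * NANOS_PER_SECOND <= INT64_MAX


seconds = seconds_since_epoch(datetime(1, 1, 1))
print(seconds)                             # -62135596800, as in the error message
print(fits_in_int64_nanoseconds(seconds))  # False: the rescaled value overflows int64
```

This only illustrates the arithmetic behind the "out of bounds" check; it does not reproduce the pandas code path that now performs the cast with safe=True.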

Component(s)

Continuous Integration, Python

@raulcd
Member Author

raulcd commented Apr 19, 2023

@jorisvandenbossche @AlenkaF I don't think this requires a fix on our side for 12.0.0, but I would appreciate your thoughts on this one.

@raulcd
Member Author

raulcd commented Apr 19, 2023

The failure in pyarrow/tests/test_pandas.py::TestConvertDateTimeLikeTypes::test_timestamp_to_pandas_out_of_bounds seems to be related to time zones: the test passes locally with pandas dev if I apply this change:

$ git diff
diff --git a/python/pyarrow/tests/test_pandas.py b/python/pyarrow/tests/test_pandas.py
index 60c4831..129502b 100644
--- a/python/pyarrow/tests/test_pandas.py
+++ b/python/pyarrow/tests/test_pandas.py
@@ -1458,7 +1458,11 @@ class TestConvertDateTimeLikeTypes:
                 # check actual garbage output
                 arr.to_pandas(safe=False)
                 table.to_pandas(safe=False)
-                table.column('a').to_pandas(safe=False)
+                if not tz:
+                    table.column('a').to_pandas(safe=False)
+                else:
+                    with pytest.raises(pa.ArrowInvalid, match=msg):
+                        table.column('a').to_pandas(safe=False)
 
     def test_timestamp_to_pandas_empty_chunked(self):
         # ARROW-7907 table with chunked array with 0 chunks

Looking at the latest pandas commits (https://github.com/pandas-dev/pandas/commits/main), it might be related to pandas-dev/pandas#52677, but I am unsure.

@AlenkaF
Member

AlenkaF commented Apr 19, 2023

I have come to the same conclusion. Both tests fail only when tz is not None.
There have been two PRs merged in the last day on the Pandas side:

and the PR that you linked, Raul, seems the closest to being the reason for this. But I haven't managed to figure out the connection.

@AlenkaF
Member

AlenkaF commented Apr 20, 2023

Here is an update after going through the issue with Joris:

I will create a PR asap to skip/fix the tests.

@raulcd
Member Author

raulcd commented Apr 20, 2023

Thanks @AlenkaF! I am marking this issue for 12.0.0 and as a blocker. I will hold off on creating the RC until this is merged.

raulcd added this to the 12.0.0 milestone Apr 20, 2023
raulcd added the "Priority: Blocker" label (marks a blocker for the release) Apr 20, 2023
raulcd pushed a commit that referenced this issue Apr 20, 2023
#35248)

* Closes: #35235

Lead-authored-by: Alenka Frim <frim.alenka@gmail.com>
Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
raulcd pushed a commit that referenced this issue Apr 21, 2023
#35248)

* Closes: #35235

Lead-authored-by: Alenka Frim <frim.alenka@gmail.com>
Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this issue May 11, 2023
…failing (apache#35248)

* Closes: apache#35235

Lead-authored-by: Alenka Frim <frim.alenka@gmail.com>
Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this issue May 15, 2023
…failing (apache#35248)

* Closes: apache#35235

Lead-authored-by: Alenka Frim <frim.alenka@gmail.com>
Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
rtpsw pushed a commit to rtpsw/arrow that referenced this issue May 16, 2023
…failing (apache#35248)

* Closes: apache#35235

Lead-authored-by: Alenka Frim <frim.alenka@gmail.com>
Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>