@wesm wesm commented Apr 21, 2017

No description provided.

wesm commented Apr 21, 2017

We're down to 4 test failures

================================== FAILURES ===================================
_______________ TestConvertSequence.test_decimal_large_integer ________________

self = <pyarrow.tests.test_convert_builtin.TestConvertSequence testMethod=test_decimal_large_integer>

    def test_decimal_large_integer(self):
        data = [decimal.Decimal('-394029506937548693.42983'),
                decimal.Decimal('32358695912932.01033')]
        type = pa.decimal(precision=23, scale=5)
        arr = pa.array(data, type=type)
>       assert arr.to_pylist() == data
E       AssertionError: assert [Decimal('-44...95161.12521')] == [Decimal('-394...12932.01033')]
E         At index 0 diff: Decimal('-443025006214094900.02695') != Decimal('-394029506937548693.42983')
E         Use -v to get the full diff

pyarrow\tests\test_convert_builtin.py:210: AssertionError
---------------------------- Captured stdout call -----------------------------
parsing decimal: -394029506937548693.42983
parsing decimal: 32358695912932.01033
_______________ TestPandasConversion.test_decimal_128_to_pandas _______________

self = <pyarrow.tests.test_convert_pandas.TestPandasConversion testMethod=test_decimal_128_to_pandas>

    def test_decimal_128_to_pandas(self):
        expected = pd.DataFrame({
            'decimals': [
                decimal.Decimal('394092382910493.12341234678'),
                -decimal.Decimal('314292388910493.12343437128'),
            ]
        })
        converted = pa.Table.from_pandas(expected)
        df = converted.to_pandas()
>       tm.assert_frame_equal(df, expected)

pyarrow\tests\test_convert_pandas.py:588:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
..\..\..\Miniconda\envs\arrow-test\lib\site-packages\pandas\util\testing.py:1313: in assert_frame_equal
    obj='DataFrame.iloc[:, {0}]'.format(i))
..\..\..\Miniconda\envs\arrow-test\lib\site-packages\pandas\util\testing.py:1181: in assert_series_equal
    obj='{0}'.format(obj))
pandas\src\testing.pyx:59: in pandas._testing.assert_almost_equal (pandas\src\testing.c:4156)
    ???
pandas\src\testing.pyx:173: in pandas._testing.assert_almost_equal (pandas\src\testing.c:3274)
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

obj = 'DataFrame.iloc[:, 0]'
message = 'DataFrame.iloc[:, 0] values are different (100.0 %)'
left = '[241742630788033.69620444150, -161942636788033.69622646600]'
right = '[394092382910493.12341234678, -314292388910493.12343437128]'
diff = None

    def raise_assert_detail(obj, message, left, right, diff=None):
        if isinstance(left, np.ndarray):
            left = pprint_thing(left)
        if isinstance(right, np.ndarray):
            right = pprint_thing(right)

        msg = """{0} are different

    {1}
    [left]:  {2}
    [right]: {3}""".format(obj, message, left, right)

        if diff is not None:
            msg = msg + "\n[diff]: {diff}".format(diff=diff)

>       raise AssertionError(msg)
E       AssertionError: DataFrame.iloc[:, 0] are different
E
E       DataFrame.iloc[:, 0] values are different (100.0 %)
E       [left]:  [241742630788033.69620444150, -161942636788033.69622646600]
E       [right]: [394092382910493.12341234678, -314292388910493.12343437128]

..\..\..\Miniconda\envs\arrow-test\lib\site-packages\pandas\util\testing.py:1018: AssertionError
---------------------------- Captured stdout call -----------------------------
parsing decimal: 394092382910493.12341234678
parsing decimal: -314292388910493.12343437128
_________________ TestPandasConversion.test_integer_no_nulls __________________

self = <pyarrow.tests.test_convert_pandas.TestPandasConversion testMethod=test_integer_no_nulls>

    def test_integer_no_nulls(self):
        data = OrderedDict()
        fields = []

        numpy_dtypes = [
            ('i1', pa.int8()), ('i2', pa.int16()),
            ('i4', pa.int32()), ('i8', pa.int64()),
            ('u1', pa.uint8()), ('u2', pa.uint16()),
            ('u4', pa.uint32()), ('u8', pa.uint64()),
            ('longlong', pa.int64()), ('ulonglong', pa.uint64())
        ]
        num_values = 100

        for dtype, arrow_dtype in numpy_dtypes:
            info = np.iinfo(dtype)
            values = np.random.randint(info.min,
                                       min(info.max, np.iinfo('i8').max),
>                                      size=num_values)

pyarrow\tests\test_convert_pandas.py:160:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   ValueError: low is out of bounds for int32

mtrand.pyx:973: ValueError
_____________ TestFeatherReader.test_delete_partial_file_on_error _____________

self = <pyarrow.tests.test_feather.TestFeatherReader testMethod=test_delete_partial_file_on_error>

    def test_delete_partial_file_on_error(self):
        # strings will fail
        df = pd.DataFrame(
            {
                'numbers': range(5),
                'strings': [b'foo', None, u'bar', 'qux', np.nan]},
            columns=['numbers', 'strings'])

        path = random_path()
        try:
            write_feather(df, path)
        except:
            pass

>       assert not os.path.exists(path)
E       AssertionError: assert not True
E        +  where True = <function exists at 0x0000020693214400>('feather_90d59fbf6104454bbb9f777ca849b132')
E        +    where <function exists at 0x0000020693214400> = <module 'ntpath' from 'C:\\Users\\wesm\\Miniconda\\envs\\arrow-test\\lib\\ntpath.py'>.exists
E        +      where <module 'ntpath' from 'C:\\Users\\wesm\\Miniconda\\envs\\arrow-test\\lib\\ntpath.py'> = os.path

pyarrow\tests\test_feather.py:267: AssertionError
========= 4 failed, 163 passed, 37 skipped, 2 xfailed in 2.30 seconds =========

It seems that the 128-bit string-to-integer function used in decimal parsing may not work on Windows:

void StringToInteger(
    const std::string& whole, const std::string& fractional, int8_t sign, int128_t* out) {
  DCHECK(sign == -1 || sign == 1);
  DCHECK_NE(out, nullptr);
  DCHECK(!whole.empty() || !fractional.empty());
  *out = int128_t(whole + fractional) * sign;
}
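
For comparison, here is a minimal Python sketch of what this function is expected to compute. Python's arbitrary-precision `int` stands in for `int128_t`, and the name `string_to_integer` is hypothetical (it only mirrors the C++ signature and is not part of pyarrow):

```python
def string_to_integer(whole: str, fractional: str, sign: int) -> int:
    """Concatenate the digit strings and apply the sign, like the C++ version."""
    assert sign in (-1, 1)
    assert whole or fractional
    # Python ints are unbounded, so there is no 128-bit overflow to worry about.
    return int(whole + fractional) * sign


# Inputs taken from test_decimal_large_integer (precision=23, scale=5).
print(string_to_integer("394029506937548693", "42983", -1))
print(string_to_integer("32358695912932", "01033", 1))
```

Any platform whose 128-bit parsing disagrees with these values for the same inputs would point at the string-to-integer step rather than at the Python side.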

I'm looking into the two other test failures
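
Of those two, the `ValueError: low is out of bounds for int32` in test_integer_no_nulls is consistent with `np.random.randint` defaulting to the platform C long, which is 32 bits wide on Windows in NumPy releases of that era. A minimal sketch of one possible workaround, assuming NumPy 1.11 or later (where `randint` accepts `dtype`); the helper name `random_integers_for` is hypothetical and this is not necessarily the fix applied in this PR:

```python
import numpy as np


def random_integers_for(dtype, num_values=100):
    """Draw random integers spanning (almost) the full range of `dtype`."""
    info = np.iinfo(dtype)
    # randint's upper bound is exclusive; the cap mirrors the original test.
    high = min(info.max, np.iinfo('i8').max)
    # Passing dtype explicitly avoids relying on the platform default integer.
    return np.random.randint(info.min, high, size=num_values, dtype=dtype)


for dtype in ['i1', 'i2', 'i4', 'i8', 'u1', 'u2', 'u4', 'u8']:
    values = random_integers_for(dtype)
    assert values.dtype == np.dtype(dtype)
```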

wesm commented Apr 21, 2017

Here are print statements from inside StringToInteger:

pyarrow/tests/test_convert_builtin.py::TestConvertSequence::test_decimal_large_integer whole: 394029506937548693 fractional: 42983 out: -39402950693754869342983
whole: 32358695912932 fractional: 01033 out: 3235869591293201033
FAILED

The parsed values look right, so I'm digging into the conversion back to a Python Decimal.

wesm commented Apr 21, 2017

Here's telemetry from inside decimal::ToString

(Pdb) data[0]                         
Decimal('-394029506937548693.42983')  
(Pdb) arr[0]                          
0000000000000000000000000             
0000000000000000000000005             
0000000000000000000000015             
0000000000000000000000015             
0000000000000000000009015             
0000000000000000000.39015             
0000000000000000005.39015             
0000000000000000005.39015             
0000000000000000905.39015             
0000000000000003905.39015             
0000000000000003905.39015             
0000000000000703905.39015             
0000000000000703905.39015             
0000000000040703905.39015             
0000000000440703905.39015             
0000000009440703905.39015             
0000000049440703905.39015             
0000000649440703905.39015             
0000001649440703905.39015             
0000071649440703905.39015             
0000571649440703905.39015             
0008571649440703905.39015             
0038571649440703905.39015             
-038571649440703905.39015             
Decimal('-38571649440703905.39015')   
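
For reference, the expected string can be reproduced with a short Python sketch that emits digits from right to left, which is how the trace above appears to fill its buffer. Python's arbitrary-precision `int` stands in for `int128_t`, and `unscaled_to_string` is a hypothetical name, not the Arrow function:

```python
def unscaled_to_string(value: int, scale: int) -> str:
    """Render an unscaled integer as a decimal string with `scale` fractional digits."""
    negative = value < 0
    remaining = abs(value)
    digits = []
    # Peel off the least-significant digit on each pass (right to left),
    # padding with zeros until there is at least one digit before the point.
    while remaining > 0 or len(digits) <= scale:
        remaining, digit = divmod(remaining, 10)
        digits.append(str(digit))
    digits.reverse()
    text = "".join(digits)
    if scale > 0:
        text = text[:-scale] + "." + text[-scale:]
    return ("-" if negative else "") + text


print(unscaled_to_string(-39402950693754869342983, 5))
```

This prints `-394029506937548693.42983`, the value the test expects, so the digits in the trace diverge from the reference at the very first step.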

wesm commented Apr 21, 2017

Alright, all tests are passing now except for the Decimal128 issue

wesm commented Apr 21, 2017

Looks like there might be more issues than just the decimal thing

(C:\arrow-conda-env) C:\projects\arrow\python>py.test pyarrow -v   || exit /B 
============================= test session starts =============================
platform win32 -- Python 3.5.3, pytest-3.0.7, py-1.4.33, pluggy-0.4.0 -- C:\arrow-conda-env\python.exe
cachedir: .cache
rootdir: C:\projects\arrow\python, inifile:
collecting ... collected 206 items
pyarrow/tests/test_array.py::test_total_bytes_allocated PASSED
pyarrow/tests/test_array.py::test_repr_on_pre_init_array PASSED
pyarrow/tests/test_array.py::test_getitem_NA PASSED
pyarrow/tests/test_array.py::test_list_format PASSED
pyarrow/tests/test_array.py::test_string_format PASSED
pyarrow/tests/test_array.py::test_long_array_format PASSED
pyarrow/tests/test_array.py::test_to_pandas_zero_copy 
(C:\arrow-conda-env) C:\projects\arrow\python>set lastexitcode=-1073741819 
(C:\arrow-conda-env) C:\projects\arrow\python>set  1>C:\Users\appveyor\AppData\Local\Temp\1\tmp6968.tmp 
(C:\arrow-conda-env) C:\projects\arrow\python>echo C:\projects\arrow\python  1>C:\Users\appveyor\AppData\Local\Temp\1\tmp6969.tmp 
(C:\arrow-conda-env) C:\projects\arrow\python>exit /b -1073741819 
Command exited with code -1073741819

Will run test suite under valgrind on Linux to see if there's a bad memory access that we've been ignoring
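
The exit code is easier to interpret as an unsigned 32-bit value. A quick sketch of the conversion:

```python
exit_code = -1073741819
# Reinterpret the signed 32-bit exit code as unsigned to get the NTSTATUS value.
status = exit_code & 0xFFFFFFFF
print(hex(status))  # 0xc0000005
```

0xC0000005 is the Windows STATUS_ACCESS_VIOLATION code, meaning the process died on an invalid memory access, which is the Windows counterpart of a segfault.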

wesm commented Apr 21, 2017

Valgrind output on the master branch. It shows some of the typical spurious Python valgrind warnings; I don't see anything overtly worrisome:

https://gist.github.com/wesm/fbbd9ccf19267a6bbd6872a6e5dc31dc

@wesm wesm force-pushed the ARROW-867 branch 3 times, most recently from 6854e36 to af6125b Compare April 21, 2017 22:52

wesm commented Apr 21, 2017

So it seems there's a segfault the first time the NumPy C API is used. Not sure why it works locally and fails in Appveyor.

wesm commented Apr 22, 2017

I'm completely stumped about this Appveyor issue with the NumPy C API. I cannot reproduce it locally on Windows 10. If we can fix the Decimal unit tests (locally), then I suggest we merge this without running the unit tests in Appveyor until we can figure out a way to debug it.

maxhora commented Apr 24, 2017

@wesm, linking against the wrong shared library .lib file may well be the cause of the NumPy C API issue. However, I tried running your changes locally, both as-is and rebased onto the latest commits, and in both cases got errors in the Python test session, like the following:

________________ ERROR collecting pyarrow/tests/test_array.py _________________
ImportError while importing test module 'C:\Work\TwoSigma\wesm-arrow-src\python\pyarrow\tests\test_array.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
C:\Users\Max\Miniconda3\lib\site-packages\pandas\__init__.py:25: in <module>
    from pandas import hashtable, tslib, lib
E   ImportError: DLL load failed: The specified procedure could not be found.

During handling of the above exception, another exception occurred:
pyarrow\tests\test_array.py:22: in <module>
    import pandas as pd
C:\Users\Max\Miniconda3\lib\site-packages\pandas\__init__.py:31: in <module>
    "the C extensions first.".format(module))

wesm commented Apr 24, 2017

@Maxris can you create a clean conda environment and follow the build instructions in

https://github.com/apache/arrow/blob/master/ci/msvc-build.bat

The pandas import error looks unrelated

@wesm wesm changed the title from "WIP ARROW-867: [Python] Add pyarrow unit tests to Appveyor, fixes" to "ARROW-867: [Python] Add pyarrow unit tests to Appveyor, fixes" Apr 25, 2017
@wesm wesm changed the title from "ARROW-867: [Python] Add pyarrow unit tests to Appveyor, fixes" to "ARROW-867: [Python] pyarrow MSVC fixes" Apr 25, 2017

wesm commented Apr 25, 2017

@Maxris indeed, the static lib issue was the cause of the problems I was having. Now we just have test failures remaining.

cpcloud commented Apr 25, 2017

Are these identical?

Member Author

Yeah

wesm commented Apr 26, 2017

Yes, that line is required to import numpy

…sion,

platform ints. Add release/acquire methods to PyAcquireGIL lock object. Remove
a couple unneeded GIL acquisitions

wesm commented Apr 27, 2017

I disabled the py.test call in ci/msvc-build.bat. Let's merge this when the build is green; once we fix the decimal tests, we can turn the unit tests back on.

wesm commented Apr 27, 2017

+1

@asfgit asfgit closed this in 909f826 Apr 27, 2017
jeffknupp pushed a commit to jeffknupp/arrow that referenced this pull request Jun 3, 2017
Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#575 from wesm/ARROW-867 and squashes the following commits:

0483cfb [Wes McKinney] Do not encode file paths to utf-16le on Windows. Fix date/time conversion, platform ints. Add release/acquire methods to PyAcquireGIL lock object. Remove a couple unneeded GIL acquisitions
@wesm wesm deleted the ARROW-867 branch July 29, 2017 15:01