Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coverage Plugin causes segfault #2879

Closed
jbrockmendel opened this issue Mar 3, 2019 · 4 comments
Closed

Coverage Plugin causes segfault #2879

jbrockmendel opened this issue Mar 3, 2019 · 4 comments

Comments

@jbrockmendel
Copy link
Contributor

Just compiled pandas with cython coverage enabled to test #2871 and got a segfault during the test suite. This occurs in both master and in 0.29.1, so is not unique to #2871.

The test that segfaults is tests/arithmetic/test_datetime64.py TestDatetime64DateOffsetArithmetictest_dt64arr_add_sub_DateOffsets[Index-0-True-BusinessDay]. I'll see if I can track down a smaller piece of code that causes the segfault. I can reproduce it with

import pandas as pd
from pandas._libs.tslibs import period as libperiod

ts = pd.Timestamp('2000-01-05 00:15:00')
vec = pd.DatetimeIndex([ts])
asper = vec.to_period('B')._data
libperiod.periodarr_to_dt64arr(asper.asi8, 5000)

period_to_dt64arr is mostly a for-loop calling period_ordinal_to_dt64, but the latter does not segfault when called on its own:

libperiod.period_ordinal_to_dt64(asper.asi8[0], 5000)

The offending function:

@cython.wraparound(False)
@cython.boundscheck(False)
def periodarr_to_dt64arr(int64_t[:] periodarr, int freq):
    """
    Convert array to datetime64 values from a set of ordinals corresponding to
    periods per period convention.
    """
    cdef:
        int64_t[:] out
        Py_ssize_t i, l

    l = len(periodarr)

    out = np.empty(l, dtype='i8')

    with nogil:
        for i in range(l):
            if periodarr[i] == NPY_NAT:
                out[i] = NPY_NAT
                continue
            out[i] = period_ordinal_to_dt64(periodarr[i], freq)

    return out.base  # .base to access underlying np.ndarray
scoder added a commit that referenced this issue Mar 3, 2019
@scoder
Copy link
Contributor

scoder commented Mar 3, 2019

Thanks for the report. Looking through the generated code, I could apply a tiny fix to the error handling in the nogil tracing code, although it's unlikely that this was causing your crash since it should only apply to errors raised by the trace function itself.

In general, tracing inside of nogil blocks is rather heavy since it needs to acquire and release the GIL around each trace call, which, in the worst case, might also have a behavioural impact. Again, not likely in the given code. Also, it looks like you are not using the fast_gil directive in pandas, so that's probably unrelated as well.

I don't see anything else that might be wrong here, so stripping down the example to a self-contained reproducer would help to understand the problem.

@jbrockmendel
Copy link
Contributor Author

so stripping down the example to a self-contained reproducer would help to understand the problem.

I'll give this a go.

@jbrockmendel
Copy link
Contributor Author

OK, I've got a partially stripped-down example and a decent guess as to where the problem lies.

TL;DR: several functions in period.pyx that should be marked as nogil are not, and somehow this isn't noticed unless coverage is enabled.

Breaking down period_to_dt64arr into its constituent pieces and tearing out pieces not relevant to the example, we can use:

def check(int64_t[:] periodarr):
    cdef:
        ndarray[int64_t] out
        Py_ssize_t i, l
        asfreq_info af_info
        freq_conv_func toDaily = (<freq_conv_func>asfreq_BtoDT)  # <-- the important part, I think

    l = len(periodarray)
    out = np.empty(l, dtype='i8')

    with nogil:
       get_asfreq_info(5000, FR_DAY, True, &af_info)
       for i in range(l):
            out[i] = toDaily(periodarr[i], &af_info)

    return out

and to trigger the segfault we just need libperiod.check(np.array([0], dtype=np.int64))

freq_conv_func is declared as ctypedef int64_t (*freq_conv_func)(int64_t, asfreq_info*) nogil, but asfreq_BtoDT (and the one function it calls) aren't marked with nogil. Adding those marks tentatively fixes the segfault.

is this something we would expect cython to have noticed? Or should this be considered purely user (me) error?

@scoder scoder closed this as completed in 61505ce Mar 3, 2019
@scoder
Copy link
Contributor

scoder commented Mar 3, 2019

Thanks for your investigations. Then it's the cast of a function that requires the GIL into a `nogil' function. That is obviously a user error, but Cython should at least warn about it. I'm not sure if it should be an error, because a cast is the intended way for users to say "I know better", however unlikely that is in this case. So, I've made it a visible warning for now, with the perspective of making it an error in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants