New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: python-visible namespace for debugging #1875

Closed
jbrockmendel opened this Issue Sep 20, 2017 · 8 comments

Comments

Projects
None yet
4 participants
@jbrockmendel

jbrockmendel commented Sep 20, 2017

I've spent untold hours in the following loop:

  1. python setup.py build_ext --inplace
  2. import pandas as pd
  3. ImportError: pandas/_libs/tslibs/_testing.so: undefined symbol: parse_iso_8601_datetime
  4. Yell profanity at my keyboard

So the feature I'm requesting is a way to determine what symbols are defined. Or near-equivalently: what is available for cimport?

For a specific example, take this extern from pandas._libs.hashtable:

cdef extern from "numpy/npy_math.h":
    double NAN "NPY_NAN"

I'd like to be able to accomplish this with cimport, and a way to figure out where to cimport it from.

I'll volunteer to put together a PR, need some guidance on where to start.

@robertwb

This comment has been minimized.

Show comment
Hide comment
@robertwb

robertwb Sep 20, 2017

Contributor
Contributor

robertwb commented Sep 20, 2017

@jbrockmendel

This comment has been minimized.

Show comment
Hide comment
@jbrockmendel

jbrockmendel Sep 20, 2017

What symbols are defined in where? Where do you expect parse_iso_8601_datetime to be defined?

Trial and error with many variants and lots of very similar names means I'm going to have a hard time exactly replicating the exact errors from earlier. Rather than give a kinda-sorta example, I'll hold onto this until next time this comes up and I can produce a real case.

Update Real life example: I cannot for the life of me figure out how to cimport PyExc_ValueError

though for numpy in particular we ship our own numpy.pxd so you can just do "from numpy cimport NAN" or whatever

The "or whatever" bit is the part that could be made more python-friendly. This might be an easy case of what I have in mind. Imagine an interactive session:

>>> import cython
>>> dir(cython.numpy)
["import_array", "ndarray", "int64_t", ...]

FWIW, it's also recommended that one use cythonize so that when pxd files [...]

Yah, there's been a couple of threads over at pandas recently regarding the specification of dependencies in the setup.py file. The cython files seem pretty well-behaved in terms of cache-invalidation, but there are some .c and .h files that seem fragile.

(Incidentally, that's why I was tinkering with parse8601, trying to implement https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/src/datetime/np_datetime_strings.c#L377 in cython. It came out something like 10x slower than the original, which probably indicates a good stretch of learning curve ahead)

jbrockmendel commented Sep 20, 2017

What symbols are defined in where? Where do you expect parse_iso_8601_datetime to be defined?

Trial and error with many variants and lots of very similar names means I'm going to have a hard time exactly replicating the exact errors from earlier. Rather than give a kinda-sorta example, I'll hold onto this until next time this comes up and I can produce a real case.

Update Real life example: I cannot for the life of me figure out how to cimport PyExc_ValueError

though for numpy in particular we ship our own numpy.pxd so you can just do "from numpy cimport NAN" or whatever

The "or whatever" bit is the part that could be made more python-friendly. This might be an easy case of what I have in mind. Imagine an interactive session:

>>> import cython
>>> dir(cython.numpy)
["import_array", "ndarray", "int64_t", ...]

FWIW, it's also recommended that one use cythonize so that when pxd files [...]

Yah, there's been a couple of threads over at pandas recently regarding the specification of dependencies in the setup.py file. The cython files seem pretty well-behaved in terms of cache-invalidation, but there are some .c and .h files that seem fragile.

(Incidentally, that's why I was tinkering with parse8601, trying to implement https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/src/datetime/np_datetime_strings.c#L377 in cython. It came out something like 10x slower than the original, which probably indicates a good stretch of learning curve ahead)

@cython-notifications

This comment has been minimized.

Show comment
Hide comment
@cython-notifications

cython-notifications Sep 20, 2017

cython-notifications commented Sep 20, 2017

@jbrockmendel

This comment has been minimized.

Show comment
Hide comment
@jbrockmendel

jbrockmendel Sep 20, 2017

You shouldn't ever need [PyExc_ValueError] from Cython, just use the builtin name ValueError

The context of this was trying to match the original parse_iso_8601_datetime more closely. It has:

parse_error:
    PyErr_Format(PyExc_ValueError,
                 "Error parsing datetime string \"%s\" at position %d", str,
                 (int)(substr - str));
    return -1;

So i tried defining:

cdef int parse_error(unsigned char* dstr, int position):
    raise ValueError("Error parsing datetime string \"%s\" at position %d",
                     <str>dstr,
                     position)
    #return -1

(Tried various other types for dstr, let's skip over that for the moment) This works fine until I try to tack nogil onto the definition, at which point I get compile-time errors Converting to Python object not allowed without gil. So the thought was more closely matching the original might help.

Yeah, that'd probably be a tough one to replicate or beat starting from
scratch in plain old C.

The actual goal (ease of packaging and dependency specification) would be accomplished if I could just drop the original C version into a pyx file. Is that an option?

jbrockmendel commented Sep 20, 2017

You shouldn't ever need [PyExc_ValueError] from Cython, just use the builtin name ValueError

The context of this was trying to match the original parse_iso_8601_datetime more closely. It has:

parse_error:
    PyErr_Format(PyExc_ValueError,
                 "Error parsing datetime string \"%s\" at position %d", str,
                 (int)(substr - str));
    return -1;

So i tried defining:

cdef int parse_error(unsigned char* dstr, int position):
    raise ValueError("Error parsing datetime string \"%s\" at position %d",
                     <str>dstr,
                     position)
    #return -1

(Tried various other types for dstr, let's skip over that for the moment) This works fine until I try to tack nogil onto the definition, at which point I get compile-time errors Converting to Python object not allowed without gil. So the thought was more closely matching the original might help.

Yeah, that'd probably be a tough one to replicate or beat starting from
scratch in plain old C.

The actual goal (ease of packaging and dependency specification) would be accomplished if I could just drop the original C version into a pyx file. Is that an option?

@scoder

This comment has been minimized.

Show comment
Hide comment
@scoder

scoder Sep 20, 2017

Contributor

This discussion seems to derail a little, although it would make an excellent thread on the cython-users list. I would suggest to migrate it there, and if anything suitable comes out of it that motivates this ticket, copy the relevant comments back here.

Contributor

scoder commented Sep 20, 2017

This discussion seems to derail a little, although it would make an excellent thread on the cython-users list. I would suggest to migrate it there, and if anything suitable comes out of it that motivates this ticket, copy the relevant comments back here.

@jbrockmendel

This comment has been minimized.

Show comment
Hide comment
@jbrockmendel

jbrockmendel Sep 20, 2017

Good call, will do.

jbrockmendel commented Sep 20, 2017

Good call, will do.

@jbrockmendel

This comment has been minimized.

Show comment
Hide comment
@jbrockmendel

jbrockmendel Sep 21, 2017

Not quite following what you mean by cython.numpy.

This part is actually pertinent to the original topic. cython.numpy was an example, the idea being that it would basically just implement __dir__ to make it easy for beginners to figure out from the interpreter what can be cimported.

With things like np.get_include() it can be non-obvious what is available in the namespace, and errors can pass silently.

An example of a silent namespace error: For quite a while I had a branch of datetime.pxd where I tried to extern-import "get_datetimestruct_days" as:

cdef extern from "numpy/ndarrayobject.h":
    npy_int64 get_datetimestruct_days(const npy_datetimestruct *dts)

... a name which I now know is not exported in that namespace (are "exported" and "namespace" the right terminology here? Let me know if the meaning is not clear). It did not cause an error because the same function was exported by pandas' src/datetime/np_datetime.h, and while I still had a cdef extern from "datetime/np_datetime.h": block (even though get_datetimestruct_days was not referenced in this block).

To recap the original issue: it would be awesome to have a python-facing way of grokking namespaces. I'll offer labor if there's an interest in making this happen.

jbrockmendel commented Sep 21, 2017

Not quite following what you mean by cython.numpy.

This part is actually pertinent to the original topic. cython.numpy was an example, the idea being that it would basically just implement __dir__ to make it easy for beginners to figure out from the interpreter what can be cimported.

With things like np.get_include() it can be non-obvious what is available in the namespace, and errors can pass silently.

An example of a silent namespace error: For quite a while I had a branch of datetime.pxd where I tried to extern-import "get_datetimestruct_days" as:

cdef extern from "numpy/ndarrayobject.h":
    npy_int64 get_datetimestruct_days(const npy_datetimestruct *dts)

... a name which I now know is not exported in that namespace (are "exported" and "namespace" the right terminology here? Let me know if the meaning is not clear). It did not cause an error because the same function was exported by pandas' src/datetime/np_datetime.h, and while I still had a cdef extern from "datetime/np_datetime.h": block (even though get_datetimestruct_days was not referenced in this block).

To recap the original issue: it would be awesome to have a python-facing way of grokking namespaces. I'll offer labor if there's an interest in making this happen.

@robertwb

This comment has been minimized.

Show comment
Hide comment
@robertwb

robertwb Oct 13, 2017

Contributor

If you try to cimport something that doesn't exist, you'll get a Cython compile error. Likewise, if you try to cdef extern declare something that doesn't exist in the .c or .h files, you'll get a C compile error (unless of course the .h file declared a symbol but you didn't link against the library providing that symbol, but there's nothing you can do about that, that's how C works).

I'm going to close this because it's still not clear what you're asking for. Feel free to take it up again on the list.

Contributor

robertwb commented Oct 13, 2017

If you try to cimport something that doesn't exist, you'll get a Cython compile error. Likewise, if you try to cdef extern declare something that doesn't exist in the .c or .h files, you'll get a C compile error (unless of course the .h file declared a symbol but you didn't link against the library providing that symbol, but there's nothing you can do about that, that's how C works).

I'm going to close this because it's still not clear what you're asking for. Feel free to take it up again on the list.

@robertwb robertwb closed this Oct 13, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment