Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Running multiarray with JyNI #6

Closed
lokuz opened this issue Mar 23, 2016 · 13 comments
Closed

Running multiarray with JyNI #6

lokuz opened this issue Mar 23, 2016 · 13 comments

Comments

@lokuz
Copy link

lokuz commented Mar 23, 2016

Hi,

I am trying to import multiarray as a next step towards numpy support. As test I use the following snippet:


import sys

sys.path.append('/usr/lib64/python2.7/lib-dynload')
sys.path.append('/usr/lib64/python2.7/site-packages/numpy')
sys.path.append('/usr/lib64/python2.7/site-packages/numpy/core')

print 'before multiarray import'

import datetime

import multiarray

print 'after multiarray import'

a = multiarray.array([1, 2, 3])

print a

Running this script usually fails with something like the following besides some rare cases where it actually works: (I can also provide the hs_err files if needed.)

> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x00007f4998fe073c, pid=11479, tid=139954600363776
> #
> # JRE version: OpenJDK Runtime Environment (8.0_72-b15) (build 1.8.0_72-b15)
> # Java VM: OpenJDK 64-Bit Server VM (25.72-b15 mixed mode linux-amd64 compressed oops)
> # Problematic frame:
> # C  [libJyNI.so+0xe073c]  pushExStack+0xa9
> #
> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /home/<user>/programming/jyni/JyNI/hs_err_pid11479.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.java.com/bugreport/crash.jsp
> # The crash happened outside the Java Virtual Machine in native code.
> # See problematic frame for where to report the bug.
> #
> 

This looks like some side effect due to some memory, which is overwritten. It looks like this is caused in the _PyImport_LoadDynamicModuleJy, when the "dyn load func" is called. I would appreciate any hints about this before I dig deeper into this.

Thanks,

Lothar

@Stewori
Copy link
Owner

Stewori commented Mar 23, 2016

Hello Lothar,
thanks for your interest in this project!

The error message states it fails in the method pushExStack, which relates to garbage collection.
"exStack" is short for exploration-stack and is a simple ad-hoc written extensible stack to store
exploration position when JyNI explores the native reference graph to mirror its connectivity in
Java. You can find this stuff in gcmodule.c. You might also want to take a look at my upcoming
EuroSciPy-paper "Garbage Collection in JyNI – How to bridge Mark/Sweep and Reference Counting GC"
which gives an overview of what is happening here including background motivation why such a complex
process is needed to support garbage collection properly. A preprint is available at
https://dl.dropboxusercontent.com/u/50299996/paper.pdf

The relation to gc is also an explanation to why it works in rare cases. I suppose it works if no
gc-run pops up during initialization (which is a somewhat non-deterministic event in the JVM).
So why are there still memory issues in JyNI's gc process you might wonder. A hot candidate for
failure is the clumpsy ability to distinguish between heap- and non-heap PyObjects, which needs
improvement. Currently this is just decided via a white-list approach in the macro
Is_Static_PyObject in JyNI.h. see the standing todo-comment there:

/* Todo Invent a flag or something to explicitly mark heap-allocated PyObjects.
 * The current solution (i.e. the macro below) fails if an extension defines its
 * own static singletons or whatever.
 * Checking for tp_dealloc == NULL is a good indicator for static objects, but
 * is not entirely safe - the extension might use some equivalent of none_dealloc.
 */
#define Is_Static_PyObject(pyObject) \
    (Is_StaticSingleton(pyObject) || Is_StaticTypeObject(pyObject) || !Has_Dealloc(pyObject))

In this sense the term "static" refers to "not heap-allocated" in the sense of static opposed to dynamic.
Maybe NumPy introduces some object where this macro fails or the issue is due to something completely different.

To continue digging I recommend to set up your own numpy-build, such that you can insert logging,
additional verbosity, out-comment stuff etc. In NumPy look at multiarraymodule.c, it contains a
method initmultiarray, which is the entry-method. A good first step would be to out-comment the entire content of the method and then in-comment it line by line or in a binary-search manner until you find the line where it starts getting unstable. (In case you can identify such a point, first come back here for discussion!)
Another approach is to add verbosity to pushExStack, e.g. print tp_name of the (object being pushed)'s type.

Currently I am busy with designing new-style class and custom Java-PyObject support for native side. I suppose this will be needed anyway to fully support the Python-part of NumPy. So I won't invest much time into multiarray front before JyNI alpha 4 is released, but of course you are more than welcome to dig here, do preparational work or maybe even get it working already.

@Stewori
Copy link
Owner

Stewori commented Mar 23, 2016

I also recommend to remove the last part

a = multiarray.array([1, 2, 3])

print a

from your test snippet until it reliably survives the import part, or does it already?

@lokuz
Copy link
Author

lokuz commented Mar 23, 2016

Hi Stefan,

thanks for your fast and detailed reply. It does actually finishes normal in 1% of the starts and outputs the given array. But the other cases it fails at different stages, but mostly with the posted error message.

I will compile my own numpy version and go this direction with the additional background knowledge from your paper.

Totally understand your priorities for new-style classes as it is clearly stated on the website as next goal.

@lokuz
Copy link
Author

lokuz commented Mar 24, 2016

Hi Stefan,

can you please take a look at the following function in gcmodule.c:

static void pushExStack(PyObject* op) {
    if (explorationStack.position < EX_STACK_BLOCK_SIZE) {
        explorationStack.stack[explorationStack.position++] = op;
    } else {
        Ex_Stack_Block* stack = explorationStack.next;
        if (stack && stack->position < EX_STACK_BLOCK_SIZE) {
            stack->stack[stack->position++] = op;
        } else {
            Ex_Stack_Block* newNext = (Ex_Stack_Block*) malloc(sizeof(Ex_Stack_Block));
            newNext->stack[newNext->position++] = op;
            newNext->next = stack;
            explorationStack.next = newNext;
        }
    }
}

I think after the memory allocation of newNext, it is not guranteed that newNext->position is zero. So this variable is not initialized at that point. Inserting a newNext->position = 0; oder something similar should fix this. After this change the multiarray python snippet above seems to be running ok.

But the topic you mentioned may still be an issue. I uncommented the line:
JyNI-Warning: JyNI_GC_Explore found non-heap object.
and get plenty of these warnings. But I have to further read your paper to understand this topic better.

@Stewori
Copy link
Owner

Stewori commented Mar 24, 2016

it is not guranteed that newNext->position is zero

Argh, right. Stuff like this was also main reason why JyNI did not work on OSX out of the box.
My setup (LMDE with gcc) happens to usually supply zeroed memory in malloc, which is why I easily overlook such issues. Are you using OSX?

Feel encouraged to file a pull request for this fix.

Regarding the non-heap object warning:

Check whether you restored the warning properly. The context is:

if (!Is_Static_PyObject(toExplore)) {
//  jputs("JyNI-Warning: JyNI_GC_Explore found non-heap object.");
//  jputs(Py_TYPE(toExplore)->tp_name);
//} else {
    if (IS_UNEXPLORED(toExplore)) {
        JyNI_GC_ExploreObject(toExplore);
    }
}

You might notice the out-commented "else", which indicates that when I removed the warning I also added the '!' to Is_Static_PyObject(toExplore). With restored warning the code must be as follows:

if (Is_Static_PyObject(toExplore)) {
    jputs("JyNI-Warning: JyNI_GC_Explore found non-heap object.");
    jputs(Py_TYPE(toExplore)->tp_name);
} else {
    if (IS_UNEXPLORED(toExplore)) {
        JyNI_GC_ExploreObject(toExplore);
    }
}

(Yes (and sorry for that), code clean-up is overdue; I target this for beta-phase.)

If you already did it this way, note that it is actually okay to explore non-heap objects as long as the macro Is_Static_PyObject is recognizing them correctly. The warning is intended to investigate issues that might ground on failure of Is_Static_PyObject.

There is no reliable way in C to detect whether a pointer points to heap or static memory. I thought about making PyObject_Malloc flag all heap objects, but where to place the flag? PyObject headers don't contain space for flags, the type features some CPython specific flags. Maybe there could be room for a JyNI-specific flag (looking at object.h of Python 2.7 there are some unused flag positions, e.g. 1L<<11, 1L<<16, 1L<<22 and in some sense also 1L<<2). But remember that extensions might do really odd stuff. What if the same type is used for heap- and non-heap objects. Then a flag in the type would never be sufficient.
Another idea would build on the assumption that satic memory positions are always before heap positions. Then PyObject_Malloc could just store the lowest memory position ever created in a global field and Is_Static_PyObject could check the given pointer against it. However in general it is discouraged to make any assumptions about malloc behavior since it is implementation specific. Storing all created heap-pointers in a hash map or something would be a safe solution, but I consider this too expensive.
I guess I will go for the type-flag once I resume work on this front and then look whether an extension occurs that manages to break this approach.

@lokuz
Copy link
Author

lokuz commented Mar 24, 2016

The way of uncommenting was wrong, thanks for pointing that out! The warnings are now gone, but I will keep this open issue in general in my head.

My setup is "openSUSE Leap 42.1 (x86_64)".

As next step I will check, if the methods of multiarray work.

@lokuz
Copy link
Author

lokuz commented Mar 29, 2016

Just an update on this. Multiarray seems to be working fine with the current release of multiarray in numpy, but the precompiled package in Suse Linux does not work together with JyNI. In the latter case severe memory crashes are happening at random locations. The origin (JyNI vs. numpy) for these could not be identified. Some broken strings might indicate that at some point string arrays might be handled with insufficient memory.

An analysis with valgrind is confirming "conditional jump or move depends on uninitialized value(s)”. Since there are many other valgrind messages from the jvm, further analysis is not pursued until the errors are confirmed with a different setup.

These topic seems to be independent of the multiarray library. Therefore one could close the issue regarding its original intend.

@lokuz lokuz closed this as completed Mar 29, 2016
@Stewori
Copy link
Owner

Stewori commented Apr 13, 2016

So; finally I found time to actually focus on this issue. First note that "today's" JyNI incorporates some fixes (also regarding memory flaws) compared to the version you used some weeks ago. So it's actually a new game.

On my system (LMDE2, 64 bit) importing the multiarray module from numpy from package management works fine. (There is an issue with datetime though, because it is statically linked into bundled Python 2. So I had to provide datetime.so form a self-built Python 2. Given that Python 2 distributions with statically linked datetime are around, maybe JyNI should link it statically too, but that's another story.)

However the next line
a = multiarray.array([1, 2, 3])
fails with symbol lookup error:
undefined symbol: PyMemoryView_FromObject

This essentially means support for PyMemoryView builtin type is required, which in turn isn't feasible without supporting the buffer protocol. We will get there, but it will target JyNI 2.7-alpha.5.

Interesting that you didn't get that error. The package manager reports my installed NumPy as python-numpy 1:1.8.2-2.

@Stewori Stewori reopened this Apr 13, 2016
@Stewori
Copy link
Owner

Stewori commented Apr 13, 2016

Just repeated it with self-built current NumPy cloned from github/numpy.
This had another symbol error on importing multiarray. It was missing Py_OptimizeFlag, which is easy to fix. Will commit this fix soon. Adding this flag results in the same behavior I described before with NumPy from package manager.

@lokuz
Copy link
Author

lokuz commented Apr 18, 2016

The multiarray works the same way as before for me i.e. it is working with the self built NumPy version from github/numpy.

@Stewori
Copy link
Owner

Stewori commented Apr 18, 2016

Interesting. I investigated where PyMemoryView_FromObject is used and also added a dummy-implementation to JyNI like this:

PyObject *
PyMemoryView_FromObject(PyObject *base)
{
    return NULL;
}

With this dummy implementation the demo script works fine. However I still started to trace where NumPy requires this method on my system and it is called indirectly from PyArray_FromAny (ctors.c, 1673) via PyArray_GetArrayParamsFromObject (ctors.c, 1427) and _array_from_buffer_3118 (ctors.c, 1264).

All of these methods contain various if/else etc branching statements so maybe on your system the call just doesn't happen for some reason. I think it's tedious to find the exact cause for this and I'd better spend my time on providing a proper implementation of PyMemoryView_FromObject. Support for the buffer protocol will be needed sooner or later anyway.

@Stewori Stewori added this to the BufferProtocol milestone May 25, 2016
@Stewori
Copy link
Owner

Stewori commented Jun 30, 2016

See #9 (comment) for this issue too.

@Stewori Stewori added the fixed label Aug 23, 2016
@Stewori
Copy link
Owner

Stewori commented Aug 23, 2016

Given that JyNI significantly changed since the last activity here and I cannot reproduce this issue any more for two months or so now, I will close it.
If you encounter problems again, feel free to re-open.

@Stewori Stewori closed this as completed Aug 23, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants