V1.63 boost::python::numpy ndarray from_data method 'own' parameter usage #97

Open
abemammen opened this issue Jan 4, 2017 · 10 comments

@abemammen

I'm not sure how to use the 'own' parameter.
The end result is that on the Python side, when the ndarray 'flags' are printed, I see:
OWNDATA: False
The returned array on the Python side is garbled (though it works sometimes!).
The data on the C++ side of the ndarray (the one being returned to Python) is always correct.
Is this related to the 'own' parameter, or to some other scope issue?
Currently I have 'own' set to: py::object()

@stefanseefeld
Member

If you mean the owner parameter, please see the docs in http://boostorg.github.io/python/doc/html/numpy/reference/ndarray.html.
If you suspect an issue please provide a self-contained & minimal test case (including detailed instructions on how you built it and what you observed) that allows others to reproduce your findings.

@abemammen
Author

abemammen commented Jan 4, 2017

I've read that documentation but it isn't very clear what own should be or the side effects if 'ownership' is messed up. So I thought I would ask a high level question to see if the flag OWNDATA: False on the Python side would contribute to corrupt data.
I realize that is somewhat vague but figured I would ask a high level question before creating a test case that can be shared etc.
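
For readers with the same question: as far as NumPy is concerned, OWNDATA: False only says that the array did not allocate its buffer itself and is borrowing it from somewhere else. That alone does not corrupt anything; trouble starts only if the borrowed memory is released while the array is still in use. A minimal Python sketch, where a bytearray stands in for memory allocated elsewhere (an assumption of this example, not something taken from the code in this issue):

```python
import numpy as np

# An array that allocates its own buffer reports OWNDATA: True.
owned = np.empty(4, dtype=np.float64)

# An array that wraps memory it did not allocate reports OWNDATA: False and
# keeps a reference to whatever provides the memory in its `base` attribute.
backing = bytearray(4 * 8)                     # stand-in for an external buffer
borrowed = np.frombuffer(backing, dtype=np.float64)
borrowed[:] = [1.0, 2.0, 3.0, 4.0]             # writes go straight into `backing`

print("owned.flags.owndata    =", owned.flags.owndata)
print("borrowed.flags.owndata =", borrowed.flags.owndata)
print("borrowed.base is None  =", borrowed.base is None)
```

As long as something keeps the backing memory alive, a borrowed array reads and writes correctly; the flag is a hint about who frees the memory, not a sign of damage.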

@abemammen
Author

In case someone is interested...
It appears that the issue is with boost::python::numpy::from_data().
I switched to boost::python::numpy::empty() followed by a std::copy from a vector<>, and that fixed the problem: Python can now interpret the ndarray correctly.
Interestingly, for what it's worth:
in the from_data() case, NumPy on the Python side reports OWNDATA: False, while it reports OWNDATA: True in the empty() case, so perhaps data ownership has something to do with it.

I'll have to strip this down to a test case that's small enough (that will take a bit of time that I don't currently have), but in case this is useful for someone, I'm attaching some of the relevant methods:

struct record {
    char id[32];
    double score;
};

BOOST_PYTHON_MODULE(_distancerank) {
    using namespace boost::python;

    Py_Initialize();
    boost::python::numpy::initialize();

    class_<CdistanceRank>("CdistanceRank", init<bool>())
        .def("ndistancerank", &CdistanceRank::NCosineRank)
        .enable_pickling()
    ;
}

void CdistanceRank::VectorToNDArray(vector<record>& vec) {
    // Structured dtype matching `record`: a 32-byte string and a double.
    py::tuple str1dtype = py::make_tuple("id", "S32");
    py::tuple dbldtype = py::make_tuple("score", "f8");
    py::list list_dtype;
    list_dtype.append(str1dtype);
    list_dtype.append(dbldtype);
    npy::dtype dtype = npy::dtype(list_dtype);

    /*
    // Failing case
    this->mArray = npy::from_data(data,
                                  dtype,
                                  py::make_tuple(svec.size()),
                                  py::make_tuple(sizeof(double)),
                                  py::object());
    */

    // This works: let NumPy allocate the array, then copy the records into it.
    this->mArray = npy::empty(py::make_tuple(vec.size()), dtype);
    std::copy(vec.begin(), vec.end(), reinterpret_cast<record*>(this->mArray.get_data()));
}

npy::ndarray CdistanceRank::NCosineRank(npy::ndarray& pyfeaturematrix, py::list& pyidvector, npy::ndarray& pyrefvector) {
    py::list slist;
    vector<record> scoreVector;

    this->NPyMatrixToDistanceVector(pyfeaturematrix, pyidvector, pyrefvector, scoreVector);
    sort(scoreVector.begin(), scoreVector.end(), greaterthan());
    this->VectorToNDArray(scoreVector);

    return this->mArray;
}

Environment:
Using Python 3.4
On Mac OS 10.11
Using Boost::Python v1.63.0
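
Judging only from the snippet above, one possible contributor to the garbled output in the failing case is the stride: from_data() is given sizeof(double) (8 bytes), while each record occupies 40 bytes on typical platforms (32 bytes of id plus an 8-byte score). A second candidate is lifetime: with py::object() as owner, the array does not keep the vector alive, so if data points into scoreVector (an assumption, since data is not shown), it dangles once NCosineRank returns. The sketch below illustrates only the stride effect, in plain Python, with the struct module standing in for the C++ record layout; read_records is a hypothetical helper.

```python
import struct

# Native layout of `struct record { char id[32]; double score; }`.
RECORD = struct.Struct("@32sd")

rows = [(b"alpha", 0.9), (b"beta", 0.5), (b"gamma", 0.1)]
buf = b"".join(RECORD.pack(name, score) for name, score in rows)

def read_records(buffer, count, stride):
    """Decode `count` records, stepping `stride` bytes between them."""
    out = []
    for i in range(count):
        name, score = RECORD.unpack_from(buffer, i * stride)
        out.append((name.rstrip(b"\0"), score))
    return out

good = read_records(buf, 3, RECORD.size)   # stride = size of one record
bad = read_records(buf, 3, 8)              # stride = sizeof(double)

print("record size   :", RECORD.size)
print("correct stride:", good)
print("8-byte stride :", bad)
```

With the 8-byte stride only the first record decodes correctly; every later one starts in the middle of its predecessor. If that is what happens here, passing sizeof(record) as the stride would be the corresponding change on the C++ side.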

@stefanseefeld
Member

Thanks! I'll try to look into this as well. If you pass object() as the owner argument, the array should definitely own its data (and thus report OWNDATA=True), so something isn't quite right...

@LukasBommes

I am experiencing the exact same issue. Is there already a fix for this? Or does someone know a workaround for it?

@anands-repo

I wonder what it means for the array to own its own data. Does it mean that if the array is passed to python, then I do not need to manage the data associated with the array in C++, and the array and the associated data will be correctly deleted without memory leaks when it goes out of scope in python?

@PhilippeCarphin

PhilippeCarphin commented Aug 8, 2019

I have the cure for what ails you!

I will make a post on Stack Overflow asking the question and then answering it, but for now, here is a straightforward way to return an ndarray to Python whose data has been allocated in C++ and which owns its data (in the sense that the data is deallocated when nothing references the Python object any more). Furthermore, it doesn't use the NumPy C API, only Boost.NumPy.

This is pretty much exactly the same as the basic example usage of boost::python::numpy::ndarray::from_data() except for the capsule thing which was taken directly from 7starsea (whom I refer to as The Messiah). ndarray/Boost.NumPy#28 (comment)

typedef long int my_data_type;
inline void destroyManagerCObject(PyObject* self) {
    auto * b = reinterpret_cast<my_data_type*>( PyCapsule_GetPointer(self, NULL) );
    std::cout << "C++      : " << __PRETTY_FUNCTION__ << " delete [] " << b << std::endl;
    delete [] b;
}

boost::python::numpy::ndarray get_array_that_owns_through_capsule()
{
    // Change this to see how the addresses change.
    unsigned int last_dim = 6000;
    boost::python::object shape = boost::python::make_tuple(4, 5, last_dim);

    boost::python::numpy::dtype dt = boost::python::numpy::dtype::get_builtin<my_data_type>();

    auto * const data_ptr = new my_data_type[4*5*last_dim];

    const size_t s = sizeof(my_data_type);
    boost::python::object strides = boost::python::make_tuple(5*last_dim*s, last_dim*s, s);

    for(int i = 1; i <= 4*5*last_dim; ++i){ data_ptr[i-1] = i; }

    // This sets up a python object whose destruction will free data_ptr
    PyObject *capsule = ::PyCapsule_New((void *)data_ptr, NULL, (PyCapsule_Destructor)&destroyManagerCObject);
    boost::python::handle<> h_capsule{capsule};
    boost::python::object owner_capsule{h_capsule};

    std::cout << "C++      : " << __PRETTY_FUNCTION__ << "data_ptr = " << data_ptr << std::endl;

    return boost::python::numpy::from_data( data_ptr, dt, shape, strides, owner_capsule);
}

Export the function with boost and use it like this:

i = 0
while i < 4:
    print("PYTHON   : ---------------- While iteration ------------------- ({})".format(i))
    print("PYTHON   : BEFORE calling test_capsule_way()")
    arr = interface.get_array_that_owns_through_capsule()
    print("PYTHON   : AFTER calling test_capsule_way()")
    i += 1
    if i % 1000 == 0:
        print("PYTHON   : Nb arrays created and destroyed : {}".format(i))

    print("PYTHON   : ----------- End while iteration")
print("PYTHON   : SCRIPT END")

And it will produce the following output

PYTHON   : ---------------- While iteration ------------------- (0)
PYTHON   : BEFORE calling test_capsule_way()
C++      : boost::python::numpy::ndarray get_array_that_owns_through_capsule()data_ptr = 0x7fb7c9831010
PYTHON   : AFTER calling test_capsule_way()
PYTHON   : ----------- End while iteration
PYTHON   : ---------------- While iteration ------------------- (1)
PYTHON   : BEFORE calling test_capsule_way()
C++      : boost::python::numpy::ndarray get_array_that_owns_through_capsule()data_ptr = 0x7fb7c9746010
C++      : void destroyManagerCObject(PyObject*) free(0x7fb7c9831010)
PYTHON   : AFTER calling test_capsule_way()
PYTHON   : ----------- End while iteration
PYTHON   : ---------------- While iteration ------------------- (2)
PYTHON   : BEFORE calling test_capsule_way()
C++      : boost::python::numpy::ndarray get_array_that_owns_through_capsule()data_ptr = 0x14c9f20
C++      : void destroyManagerCObject(PyObject*) free(0x7fb7c9746010)
PYTHON   : AFTER calling test_capsule_way()
PYTHON   : ----------- End while iteration
PYTHON   : ---------------- While iteration ------------------- (3)
PYTHON   : BEFORE calling test_capsule_way()
C++      : boost::python::numpy::ndarray get_array_that_owns_through_capsule()data_ptr = 0x15b4530
C++      : void destroyManagerCObject(PyObject*) free(0x14c9f20)
PYTHON   : AFTER calling test_capsule_way()
PYTHON   : ----------- End while iteration
PYTHON   : SCRIPT END
C++      : void destroyManagerCObject(PyObject*) free(0x15b4530)

Notice that the PyCapsule_Destructor is not called during the first iteration. It is called during the second iteration, when a new array is assigned to the variable arr, and it frees the memory that was allocated for the previous array.
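
The same lifetime rule can be reproduced from Python alone, without Boost: an array that borrows memory keeps its owner alive, and on CPython the owner is released as soon as the last reference to the array goes away, which is the moment arr is rebound. In this sketch the Buffer class, make_array and the released list are hypothetical names, with a bytearray subclass standing in for the capsule-wrapped C++ allocation.

```python
import weakref
import numpy as np

class Buffer(bytearray):
    """Hypothetical owner object; plays the role of the capsule."""

released = []  # records which buffers have been freed, in order

def make_array(tag):
    buf = Buffer(8 * 4)
    # Log when the owner is destroyed, much like the capsule destructor does.
    weakref.finalize(buf, released.append, tag)
    return np.frombuffer(buf, dtype=np.int64)  # the array keeps `buf` alive

arr = make_array("first")
freed_after_first = list(released)   # snapshot while the first array is alive
arr = make_array("second")           # rebinding drops the first array
freed_after_second = list(released)
del arr                              # dropping the last array frees its owner
freed_at_end = list(released)

print(freed_after_first, freed_after_second, freed_at_end)
```

The order of the entries in released mirrors the capsule log above: nothing is freed while the first array is the only one, and each buffer goes away when its array loses its last reference.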

@PhilippeCarphin

PhilippeCarphin commented Aug 11, 2019

If we take the exact same code but skip the capsule and pass boost::python::object() instead, then running the Python code above with True instead of i < 4 in the while loop causes the process to take up more and more memory until you interrupt it with Ctrl-C.

Put the capsule back in and then we're back to having the data owned by the array.

This also works, but I wanted to interact only through the Boost.NumPy objects if possible:

boost::python::object get_numpy_array_owning_data()
{
    std::cout << "C++      : " << __PRETTY_FUNCTION__ << std::endl;
    int nd = 4;
    npy_intp npy_dims[] = {20,30,40,50};

    int * data_ptr = (int*)(malloc(20 * 30 * 40 * 50 * sizeof(int)));
    for(int i = 0; i < 20*30*40*50; ++i){
        data_ptr[i] = i;
    }

    std::cout << "C++      : " << __PRETTY_FUNCTION__ << "   Calling PyArray_SimpleNewFromData()" << std::endl;
    PyObject *array = PyArray_SimpleNewFromData(nd, npy_dims, NPY_INT32, (void*)(data_ptr));
    PyArray_ENABLEFLAGS((PyArrayObject *)array, NPY_ARRAY_OWNDATA);

    return boost::python::object(boost::python::handle<>(array));
}

int import_array_wrapper(){
    import_array();
    return 0;
}

using namespace boost::python;
BOOST_PYTHON_MODULE(THIS_PYTHON_MODULE_NAME)
{
    import_array_wrapper();
    def("get_numpy_array_owning_data", get_numpy_array_owning_data);
}

@PhilippeCarphin

PhilippeCarphin commented Aug 12, 2019

This is a demonstration that passing boost::python::object() as the owner parameter (not own) of the from_data() function doesn't cause the array to own the data given to the function, in the sense that nothing is freed when the array is no longer referenced.

#include <boost/python.hpp>
#include <iostream>
#include <boost/python/numpy.hpp>

typedef long int my_data_type;
boost::python::numpy::ndarray get_array_that_owns_through_default_object()
{
    // Change this to see how the addresses change.
    unsigned int last_dim = 6000;
    boost::python::object shape = boost::python::make_tuple(4, 5, last_dim);

    boost::python::numpy::dtype dt = boost::python::numpy::dtype::get_builtin<my_data_type>();

    auto * const data_ptr = new my_data_type[4*5*last_dim];

    const size_t s = sizeof(my_data_type);
    boost::python::object strides = boost::python::make_tuple(5*last_dim*s, last_dim*s, s);

    for(int i = 1; i <= 4*5*last_dim; ++i){ data_ptr[i-1] = i; }

    std::cout << "C++      : " << __PRETTY_FUNCTION__ << "data_ptr = " << data_ptr << std::endl;

    return boost::python::numpy::from_data( data_ptr, dt, shape, strides, boost::python::object());
}

using namespace boost::python;
BOOST_PYTHON_MODULE(THIS_PYTHON_MODULE_NAME)
{
    boost::python::numpy::initialize();
    def("get_array_that_owns_through_default_object", get_array_that_owns_through_default_object);
}

import numpy_default_object_way as interface
import os
import psutil

def get_process_memory_usage():
    process = psutil.Process(os.getpid())
    return process.memory_info().rss

one_mb = 1000000

MEMORY_MAX = 100 * one_mb
i = 0
while True:
    j = 4
    print("PYTHON   : ---------------- While iteration ------------------- ({})".format(i))
    print("PYTHON   : BEFORE calling test_capsule_way()")
    arr = interface.get_array_that_owns_through_default_object()
    print("PYTHON   : AFTER calling test_capsule_way()")
    i += 1
    if i % 1000 == 0:
        mem = get_process_memory_usage()
        if mem > MEMORY_MAX:
            print("PYTHON   : Bro chill with the memory, you're using {}MB over here!".format(mem/one_mb))
            quit()
        print("PYTHON   : Nb arrays created (and pretty sure not destroyed) : {}".format(i))

    print("PYTHON   : ----------- End while iteration\n")

print("PYTHON   : SCRIPT END")

And corresponding output from running the python script


PYTHON   : ---------------- While iteration ------------------- (998)
PYTHON   : BEFORE calling test_capsule_way()
C++      : boost::python::numpy::ndarray get_array_that_owns_through_default_object()data_ptr = 0x7f4a5dba3010
PYTHON   : AFTER calling test_capsule_way()
PYTHON   : ----------- End while iteration

PYTHON   : ---------------- While iteration ------------------- (999)
PYTHON   : BEFORE calling test_capsule_way()
C++      : boost::python::numpy::ndarray get_array_that_owns_through_default_object()data_ptr = 0x7f4a5dab8010
PYTHON   : AFTER calling test_capsule_way()
PYTHON   : Bro chill with the memory, you're using 993.378304MB over here!

@PastelDew

PastelDew commented Jun 29, 2021

@PhilippeCarphin Thank you so much!! I solved my problem thanks to you!
I actually ran into this problem when converting an OpenCV Mat to a Boost.NumPy ndarray.
If somebody has the same problem, the following code may help.

#include <boost/python.hpp>
#include <boost/python/numpy.hpp>
#include <opencv2/core.hpp>

namespace py = boost::python;
namespace np = boost::python::numpy;

// Capsule destructor: deletes the heap-allocated cv::Mat that keeps the pixel data alive.
inline void destroyMatCapsule(PyObject* self) {
	delete reinterpret_cast<cv::Mat*>(PyCapsule_GetPointer(self, NULL));
}

np::ndarray ConvertMatToNDArray(const cv::Mat& mat) {
	np::dtype dt = np::dtype::get_builtin<uchar>();
	switch (mat.depth()) {
		case CV_8U:
			break;
		case CV_8S:
			dt = np::dtype::get_builtin<signed char>();
			break;
		case CV_16U:
			dt = np::dtype::get_builtin<uint16_t>();
			break;
		case CV_16S:
			dt = np::dtype::get_builtin<int16_t>();
			break;
		default:
			PyErr_SetString(PyExc_TypeError, "unsupported cv::Mat depth");
			py::throw_error_already_set();
	}

	// cv::Mat manages mat.data itself, so the buffer must not be released with
	// delete[]. Keep a deep copy alive instead; the capsule deletes that copy.
	cv::Mat* owner = new cv::Mat(mat.clone());
	PyObject* capsule = ::PyCapsule_New(owner, NULL, &destroyMatCapsule);
	if (!capsule) {
		delete owner;
		py::throw_error_already_set();
	}
	py::handle<> h_capsule{capsule};
	py::object owner_capsule{h_capsule};

	const size_t depth = owner->elemSize1();  // bytes per channel value
	py::tuple shape = py::make_tuple(owner->rows, owner->cols, owner->channels());
	py::tuple stride = py::make_tuple(owner->step[0], owner->channels() * depth, depth);
	return np::from_data(owner->data, dt, shape, stride, owner_capsule);
}
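
The stride tuples passed to from_data() in this thread are byte strides for a row-major (C-contiguous) layout: the last axis advances by one item, and each earlier axis advances by the product of the later extents times the item size. A small Python sketch of that arithmetic, where c_contiguous_strides is a hypothetical helper name (not part of Boost.Python or NumPy) and the 480x640 image size is made up for illustration:

```python
def c_contiguous_strides(shape, itemsize):
    """Return byte strides for a row-major array of the given shape."""
    strides = []
    step = itemsize
    for extent in reversed(shape):
        strides.append(step)   # bytes to advance one index along this axis
        step *= extent         # the next axis outward spans this whole axis
    return tuple(reversed(strides))

# Shape (4, 5, 6000) as in the capsule example, assuming sizeof(long) is 8:
capsule_strides = c_contiguous_strides((4, 5, 6000), 8)
print(capsule_strides)   # (240000, 48000, 8)

# A 480x640 image with 3 one-byte channels, laid out as (rows, cols, channels):
image_strides = c_contiguous_strides((480, 640, 3), 1)
print(image_strides)     # (1920, 3, 1)
```

These values only hold for contiguous memory; a cv::Mat that is a view into a larger image can have padded rows, in which case the row stride has to come from the matrix itself rather than from this formula.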
