[Python][C++] Memory leak when converting to arrow dtype #39599

Closed
seberg opened this issue Jan 14, 2024 · 2 comments · Fixed by #39636

Comments

@seberg
Contributor

seberg commented Jan 14, 2024

Describe the bug, including details regarding any error messages, version, and platform.

I was running valgrind on some code which uses pyarrow as a dependency. This was leaking NumPy dtypes. Now the issue could be downstream, but this line (and similar ones) to me look like they will leak the NumPy descriptors:

RETURN_NOT_OK(NumPyDtypeToArrow(PyArray_DescrFromScalar(obj), &numpy_type));

I don't see how the NumPy descriptor would ever be decref'd. The fix would seem to be to first assign it to an OwnedRef? For some dtypes that may only leak a reference, but not for all of them (a pure reference leak being only a nuisance when debugging reference counts).
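To illustrate the pattern in isolation, here is a minimal, self-contained sketch. It deliberately does not use the real NumPy or Arrow APIs: `Descr`, `descr_from_scalar`, `decref`, `OwnedDescr`, `dtype_to_arrow`, and `live_descrs` are all hypothetical stand-ins, assumed only for illustration. The point is just that a function returning a new reference needs an owner, and that an RAII guard releases it on every return path.

```cpp
// Hypothetical stand-in for a reference-counted NumPy descriptor.
struct Descr {
  int refcount = 1;
};

// Counts live descriptors so the sketch can observe leaks.
inline int& live_descrs() {
  static int count = 0;
  return count;
}

// Mimics a "new reference" API: the caller owns the returned reference.
inline Descr* descr_from_scalar() {
  ++live_descrs();
  return new Descr();
}

// Drops one reference and frees the object when the count reaches zero.
inline void decref(Descr* d) {
  if (d != nullptr && --d->refcount == 0) {
    --live_descrs();
    delete d;
  }
}

// Minimal RAII owner in the spirit of an "owned reference" wrapper.
class OwnedDescr {
 public:
  explicit OwnedDescr(Descr* d) : d_(d) {}
  ~OwnedDescr() { decref(d_); }
  OwnedDescr(const OwnedDescr&) = delete;
  OwnedDescr& operator=(const OwnedDescr&) = delete;
  Descr* get() const { return d_; }

 private:
  Descr* d_;
};

// Borrows the descriptor; it does not take ownership of it.
inline bool dtype_to_arrow(const Descr* d) { return d != nullptr; }

// Leaky: the temporary new reference is never released.
inline bool convert_leaky() { return dtype_to_arrow(descr_from_scalar()); }

// Fixed: the guard releases the reference when it goes out of scope.
inline bool convert_owned() {
  OwnedDescr descr(descr_from_scalar());
  return dtype_to_arrow(descr.get());
}
```

In this model, each call to `convert_leaky()` leaves one more live descriptor behind, while `convert_owned()` leaves the count unchanged. Whether the real code leaks memory or only a reference count would depend on whether the descriptor is freshly allocated or a shared singleton, which this sketch does not model.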

This is likely not often a big deal in practice, although I could imagine it being a nuisance in some very long-running code. I am not planning on running such leak checks regularly right now, so it shouldn't really affect me.

The reported origin of the leaked memory is:
==1392== 136 (+136) (96 (+96) direct, 40 (+40) indirect) bytes in 1 (+1) blocks are definitely lost in new loss record 146,562 of 186,418
==1392==    at 0x4849724: malloc (vg_replace_malloc.c:442)
==1392==    by 0x23186A: UnknownInlinedFun (obmalloc.c:685)
==1392==    by 0x23186A: _PyObject_New (object.c:183)
==1392==    by 0xA69EA92: PyArray_DescrNew (in .../python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
==1392==    by 0xA69EBEA: PyArray_DescrNewFromType (in .../python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
==1392==    by 0xA7461A9: PyArray_DescrFromScalar.part.0 (in .../python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
==1392==    by 0x1692324F: arrow::py::(anonymous namespace)::PyPrimitiveConverter<arrow::TimestampType, void>::Append(_object*) (in .../python3.10/site-packages/pyarrow/libarrow_python.so)
==1392==    by 0x16901CF8: arrow::Status arrow::py::internal::VisitSequenceGeneric<arrow::py::internal::VisitSequence<arrow::py::(anonymous namespace)::PyConverter::Extend(_object*, long, long)::{lambda(_object*, bool*)#1}>(_object*, long, arrow::py::(anonymous namespace)::PyConverter::Extend(_object*, long, long)::{lambda(_object*, bool*)#1}&&)::{lambda(_object*, long, bool*)#1}>(_object*, long, arrow::py::(anonymous namespace)::PyConverter::Extend(_object*, long, long)::{lambda(_object*, bool*)#1}&&) (in .../python3.10/site-packages/pyarrow/libarrow_python.so)
==1392==    by 0x16902034: arrow::py::(anonymous namespace)::PyConverter::Extend(_object*, long, long) (in .../python3.10/site-packages/pyarrow/libarrow_python.so)
==1392==    by 0x16928A0B: arrow::py::ConvertPySequence(_object*, _object*, arrow::py::PyConversionOptions, arrow::MemoryPool*) (in .../python3.10/site-packages/pyarrow/libarrow_python.so)
==1392==    by 0x16686BD1: __pyx_pw_7pyarrow_3lib_163scalar(_object*, _object* const*, long, _object*) (in .../python3.10/site-packages/pyarrow/lib.cpython-310-x86_64-linux-gnu.so)
==1392==    by 0x16550B54: __Pyx_CyFunction_CallAsMethod(_object*, _object*, _object*) (in .../python3.10/site-packages/pyarrow/lib.cpython-310-x86_64-linux-gnu.so)
==1392==    by 0x9F91098E: __Pyx_PyObject_Call(_object*, _object*, _object*) (in .../python3.10/site-packages/cudf/_lib/scalar.cpython-310-x86_64-linux-gnu.so)

Component(s)

Python

@raulcd raulcd changed the title Memory leak when converting to arrow dtype [Python][C++] Memory leak when converting to arrow dtype Jan 15, 2024
@raulcd
Member

raulcd commented Jan 15, 2024

cc @jorisvandenbossche @pitrou

@pitrou
Member

pitrou commented Jan 16, 2024

Thanks for the report, @seberg. This does indeed look like an issue.

pitrou added a commit to pitrou/arrow that referenced this issue Jan 16, 2024
`PyArray_DescrFromScalar` returns a new reference, so we should be careful to decref it when we don't use it anymore.
jorisvandenbossche pushed a commit that referenced this issue Jan 17, 2024
### Rationale for this change

`PyArray_DescrFromScalar` returns a new reference, so we should be careful to decref it when we don't use it anymore.

### Are these changes tested?

No.

### Are there any user-facing changes?

No.
* Closes: #39599

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
@jorisvandenbossche jorisvandenbossche added this to the 16.0.0 milestone Jan 17, 2024
idailylife pushed a commit to idailylife/arrow that referenced this issue Jan 18, 2024
clayburn pushed a commit to clayburn/arrow that referenced this issue Jan 23, 2024
@pitrou pitrou modified the milestones: 16.0.0, 15.0.1 Feb 14, 2024
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
raulcd pushed a commit that referenced this issue Feb 20, 2024
zanmato1984 pushed a commit to zanmato1984/arrow that referenced this issue Feb 28, 2024
thisisnic pushed a commit to thisisnic/arrow that referenced this issue Mar 8, 2024

4 participants