What happens?
FunctionCall() passes a newly created tuple directly to PyObject_CallObject():
File: src/map.cpp
Function: FunctionCall
auto *df_obj = PyObject_CallObject(function, PyTuple_Pack(1, in_df.ptr()));
PyTuple_Pack() returns a new reference, while PyObject_CallObject() does
not steal its args reference. Because the tuple is not stored in a local
variable, it is never passed to Py_DECREF().
As a result, every invocation leaks one tuple. The tuple also owns a reference
to in_df, so the input pandas DataFrame remains alive after FunctionCall()
returns. This occurs on both successful and failed calls.
The function is used during bind-time schema inference and query execution, so
the leak is reachable through ordinary DuckDBPyRelation.map() operations.
The handling of df_obj is unrelated and correct:
auto df = py::reinterpret_steal<py::object>(df_obj);
PyObject_CallObject() returns a new reference on success, which
reinterpret_steal() adopts.
To Reproduce
This issue can be confirmed by reading the reference ownership in
src/map.cpp; it was found by code inspection rather than by running a build.
In FunctionCall(), the argument tuple is created inline:
auto *df_obj = PyObject_CallObject(function, PyTuple_Pack(1, in_df.ptr()));
According to the CPython C API reference ownership rules:
- PyTuple_Pack() returns a new reference.
- PyObject_CallObject() does not steal the reference passed as args.
- The tuple pointer is not stored, so there is no subsequent
  Py_DECREF() for that new reference.
- The tuple therefore leaks on every call and retains its reference to
  in_df.
This issue is specific to the Python API and is not reproducible through plain
SQL in the DuckDB CLI.
OS:
x86_64
DuckDB Package Version:
latest version
Python Version:
3.12
Full Name:
Ksx
Affiliation:
SMU
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have not tested with any build
Did you include all relevant data sets for reproducing the issue?
No - Other reason (please specify in the issue body)
Did you include all code required to reproduce the issue?
Did you include all relevant configuration to reproduce the issue?