Data import methods free memory #763

@pepijnve

The various import methods in the Data class are documented to transfer ownership of the Arrow data from the input object to internal values, as per the C Data Interface. If I'm reading https://arrow.apache.org/docs/format/CDataInterface.html#moving-an-array correctly, implementations are supposed to hollow out the input object in that case by nulling its release field. In practice, a class like ArrayImporter does this:

void importArray(ArrowArray src) {
  ...
  // Move imported array
  ArrowArray ownedArray = ArrowArray.allocateNew(allocator);
  ownedArray.save(snapshot);
  src.markReleased();
  src.close();
  ...
}

So not only is the array marked as released, it is also closed, so the ArrowArray instance is no longer usable. This may or may not happen when you call Data#importIntoVector, for instance, because that method might abort early when given a null allocator, which leaves the postconditions of Data#importIntoVector unclear. You might expect the ArrowArray to have been closed, but you still need to call close yourself as well or risk a memory leak.
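To make the ambiguity concrete, here is a minimal, self-contained sketch of the two paths. It does not use the real Arrow classes: `FakeArray`, `OwnershipSketch`, and `importWithCallerCleanup` are hypothetical stand-ins that only model the "marked released" and "closed" states, and the sketch assumes that calling close twice is harmless, which would need to be verified against the actual ArrowArray implementation.

```java
// Hypothetical stand-in for an FFI struct wrapper. This is NOT
// org.apache.arrow.c.ArrowArray; it only models the two states discussed above.
final class FakeArray implements AutoCloseable {
    private boolean released = false; // models a nulled `release` callback
    private boolean closed = false;   // models freed native memory
    private int closeCalls = 0;

    void markReleased() { released = true; }
    boolean isReleased() { return released; }
    boolean isClosed() { return closed; }
    int closeCalls() { return closeCalls; }

    @Override
    public void close() {
        closeCalls++;
        closed = true; // assumption: a second close is a harmless no-op
    }
}

final class OwnershipSketch {
    // Hypothetical importer modeled on the snippet above: it validates its
    // arguments first, and only then moves and closes the source.
    static void importArray(Object allocator, FakeArray src) {
        if (allocator == null) {
            // Early abort: src has been neither released nor closed.
            throw new NullPointerException("allocator");
        }
        src.markReleased();
        src.close();
    }

    // Caller-side pattern: close unconditionally via try-with-resources,
    // regardless of whether the import got far enough to close the source.
    static FakeArray importWithCallerCleanup(Object allocator) {
        FakeArray src = new FakeArray();
        try (FakeArray guard = src) {
            importArray(allocator, guard);
        } catch (NullPointerException e) {
            // Import aborted early; guard was already closed before this runs.
        }
        return src;
    }

    public static void main(String[] args) {
        FakeArray ok = importWithCallerCleanup(new Object());
        FakeArray aborted = importWithCallerCleanup(null);
        System.out.println("ok:      released=" + ok.isReleased()
                + " closed=" + ok.isClosed() + " closeCalls=" + ok.closeCalls());
        System.out.println("aborted: released=" + aborted.isReleased()
                + " closed=" + aborted.isClosed() + " closeCalls=" + aborted.closeCalls());
    }
}
```

In this model the successful path closes the source twice (once in the importer, once in the caller), while the aborted path relies entirely on the caller's close. That redundancy is the unclear postcondition described above.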

What's the rationale behind the choice to also call close on objects passed to Data rather than expecting people to use try-with-resources/try-finally themselves?
