Conversation
Greptile Summary

This PR adds an owning `make_tensor` path for DLPack tensors. Previous review threads raised three concerns that remain open in this revision:

- redundant shape/stride arrays in `make_tensor`
- `managed_ptr` is allocated before the deleter assertion, so it leaks on exception
- the shared_ptr deleter never calls `delete managed_ptr`

Confidence Score: 3/5

Merge is blocked by the unresolved `managed_ptr` memory leak (every owning tensor creation leaks the heap-copied `DLManagedTensor` struct), which was raised in previous review threads but is not addressed here. The dangling-pointer bug from the first review round is fixed by heap-copying the struct, and the scoped-lifetime test correctly exercises that fix. However, two memory-safety issues from prior threads are still open: the allocation-before-assertion path leaks `managed_ptr` on exception, and the shared_ptr deleter never calls `delete`. See `include/matx/core/make_tensor.h` lines 927–936: `managed_ptr` allocation, assertion order, and missing `delete` in the shared_ptr deleter lambda.

Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant make_tensor
    participant HeapCopy as "DLManagedTensor (heap)"
    participant SharedPtr as "shared_ptr deleter"
    participant DLDeleter as "DLPack deleter"
    Caller->>make_tensor: make_tensor(tensor, dlp, owning=true)
    make_tensor->>HeapCopy: new DLManagedTensor{dlp}
    make_tensor->>SharedPtr: shared_ptr(data_ptr, lambda[managed_ptr])
    make_tensor->>Caller: tensor.Shallow(tmp)
    Note over Caller: original dlp may go out of scope
    Caller->>SharedPtr: tensor destroyed (refcount → 0)
    SharedPtr->>DLDeleter: managed_ptr->deleter(managed_ptr)
    DLDeleter-->>SharedPtr: frees CUDA memory, shape, strides, ctx
    Note over HeapCopy: managed_ptr struct itself is NOT deleted (leak)
```
Reviews (3): Last reviewed commit: "Redundant arrays"
```cpp
index_t strides[TensorType::Rank()];
index_t shape[TensorType::Rank()];
cuda::std::array<index_t, TensorType::Rank()> shape_arr{};
cuda::std::array<index_t, TensorType::Rank()> strides_arr{};

for (int r = 0; r < TensorType::Rank(); r++) {
  strides[r] = dt.strides[r];
  shape[r] = dt.shape[r];
  strides_arr[r] = strides[r];
  shape_arr[r] = shape[r];
}
```
`strides[]`/`shape[]` (raw arrays) and `strides_arr`/`shape_arr` (`cuda::std::array`) hold identical data. The raw arrays are consumed only by the non-owning branch (lines 922–923) and the `cuda::std::array`s only by the owning branch (line 928). You can simplify by populating just one set per branch, e.g., initialize `shape_arr`/`strides_arr` once and use them for both paths, dropping the duplicate raw arrays.
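A minimal sketch of the suggested simplification, populating one pair of arrays and using it for both branches. `std::array` stands in for `cuda::std::array`, and `DLTensorView` is a hypothetical stand-in for the DLPack struct fields used here:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using index_t = int64_t;

// Hypothetical stand-in for the two DLPack tensor fields this sketch needs.
struct DLTensorView {
  const index_t* shape;
  const index_t* strides;
};

// Fill shape_arr/strides_arr once; both the owning and non-owning branches
// can then consume them (via .data() where a raw pointer is required),
// making the duplicate index_t[] copies unnecessary.
template <int Rank>
void copy_extents(const DLTensorView& dt,
                  std::array<index_t, Rank>& shape_arr,
                  std::array<index_t, Rank>& strides_arr) {
  for (int r = 0; r < Rank; r++) {
    shape_arr[r] = dt.shape[r];
    strides_arr[r] = dt.strides[r];
  }
}
```

Where the non-owning branch previously read `shape`/`strides`, it can read `shape_arr.data()`/`strides_arr.data()` instead.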
```cpp
auto managed_ptr = new DLManagedTensor{dlp_tensor};
MATX_ASSERT_STR(managed_ptr->deleter != nullptr, matxInvalidParameter,
                "Owning DLPack tensor requires a non-null deleter");
```
**Memory leak of `managed_ptr` on assertion failure**

`managed_ptr` is heap-allocated before the null check. In debug builds where `MATX_THROW` raises an exception, if `managed_ptr->deleter == nullptr` the exception unwinds the stack and `managed_ptr` is never freed. Moving the check before the allocation avoids the leak entirely.
Suggested change:

```diff
-auto managed_ptr = new DLManagedTensor{dlp_tensor};
-MATX_ASSERT_STR(managed_ptr->deleter != nullptr, matxInvalidParameter,
-                "Owning DLPack tensor requires a non-null deleter");
+MATX_ASSERT_STR(dlp_tensor.deleter != nullptr, matxInvalidParameter,
+                "Owning DLPack tensor requires a non-null deleter");
+auto managed_ptr = new DLManagedTensor{dlp_tensor};
```
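An alternative that stays leak-free even if more validation is added after the allocation is to hold the copy in a `std::unique_ptr` until ownership is handed off. This is a hedged sketch, not the PR's code: it uses a minimal stand-in for dlpack's `DLManagedTensor` and a plain `std::invalid_argument` in place of `MATX_ASSERT_STR`:

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Minimal stand-in for dlpack's DLManagedTensor; only the field the
// sketch needs.
struct DLManagedTensor {
  void (*deleter)(DLManagedTensor*);
};

// Heap-copy the producer's struct, then validate. If validation throws,
// the unique_ptr destructor frees the copy during stack unwinding, so
// the ordering of allocation vs. check no longer matters for leaks.
std::unique_ptr<DLManagedTensor> copy_checked(const DLManagedTensor& dlp) {
  auto managed = std::make_unique<DLManagedTensor>(dlp);
  if (managed->deleter == nullptr) {  // mirrors the assertion in the PR
    throw std::invalid_argument(
        "Owning DLPack tensor requires a non-null deleter");
  }
  return managed;  // caller release()s it into the shared_ptr deleter
}
```

The caller would `release()` the pointer when constructing the shared_ptr deleter, keeping the existing ownership flow intact.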
```cpp
[managed_ptr](typename TensorType::value_type *) {
  if (managed_ptr->deleter != nullptr) {
    managed_ptr->deleter(managed_ptr);
  }
}),
```
**`managed_ptr` is never freed: persistent memory leak**

The shared-pointer deleter calls `managed_ptr->deleter(managed_ptr)` but never `delete managed_ptr`. For DLPack producers whose deleters only clean up `manager_ctx` and nested allocations (e.g. PyTorch's `ATenDLMTensor` pattern, or the test's `DlpackOwnershipDeleter`), the heap-allocated copy created on line 930 is never returned to the allocator. The test itself demonstrates the problem: `DlpackOwnershipDeleter` frees `shape`, `strides`, and `manager_ctx` but does not `delete self`, so `managed_ptr` leaks on every owning tensor lifetime. To prevent this without requiring every downstream producer to call `delete self`, wrap the copy in a struct that invokes the deleter from its destructor, then let the lambda delete the wrapper:
```cpp
struct OwnedDLMTensor {
  DLManagedTensor tensor;
  ~OwnedDLMTensor() {
    if (tensor.deleter) tensor.deleter(&tensor);
  }
};

auto owned = new OwnedDLMTensor{.tensor = dlp_tensor};
auto storage = make_storage_from_shared_ptr<...>(
    std::shared_ptr<...>(data_ptr,
        [owned](...) { delete owned; }),
    desc.TotalSize());
```

This guarantees the heap allocation is freed regardless of what the producer's deleter does with `self`.
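To see that the wrapper pattern runs the producer's deleter exactly once and also frees the heap copy, here is a self-contained sketch. A minimal stand-in replaces dlpack's `DLManagedTensor`, and `producer_deleter` imitates a deleter that, like the `DlpackOwnershipDeleter` described above, cleans up `manager_ctx` but not `self`:

```cpp
#include <cassert>

// Minimal stand-in for dlpack's DLManagedTensor.
struct DLManagedTensor {
  void* manager_ctx;
  void (*deleter)(DLManagedTensor*);
};

// Wrapper from the suggestion: the destructor invokes the producer's
// deleter, so a single `delete` on the wrapper both runs the deleter
// and frees the heap-allocated copy.
struct OwnedDLMTensor {
  DLManagedTensor tensor;
  ~OwnedDLMTensor() {
    if (tensor.deleter) tensor.deleter(&tensor);
  }
};

int deleter_calls = 0;

// Producer-style deleter that cleans up manager_ctx but never deletes
// `self`, which is exactly the case that leaks without the wrapper.
void producer_deleter(DLManagedTensor* self) {
  ++deleter_calls;
  self->manager_ctx = nullptr;
}
```

With the wrapper, the shared_ptr lambda's single `delete owned;` replaces both the deleter call and the missing `delete managed_ptr`.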
No description provided.