Skip to content

arrow/scalar: *scalar.Extension does not implement Release()/Retain(), leaking storage through compute.ScalarDatum.Release() #827

@zeroshade

Description

@zeroshade

Describe the bug, including details regarding any error messages, version, and platform.

Surfaced while implementing #825 (closes #815). See the PR's "Out of scope" section and this comment.

*scalar.Extension does not implement Release() or Retain(). As a result, when compute.ScalarDatum.Release() is called on a *ScalarDatum whose Value is an extension scalar, the storage scalar's Release() is never invoked and any underlying buffers leak.

Mechanism

compute.ScalarDatum.Release() only forwards Release() if the wrapped scalar satisfies the unexported releasable interface (arrow/compute/datum.go#L122-L130):

type releasable interface {
    Release()
}

func (d *ScalarDatum) Release() {
    if v, ok := d.Value.(releasable); ok {
        v.Release()
    }
}

*scalar.Extension (arrow/scalar/scalar.go#L397-L467) defines value(), equals(), Validate(), ValidateFull(), CastTo(), and String() — but neither Release() nor Retain(). The type assertion in ScalarDatum.Release() therefore fails and the inner storage scalar is never released, even though the storage scalar itself (e.g. *scalar.FixedSizeList) carries a buffer-bearing array.

Peer scalar types implement the pair correctly by delegating to their inner storage. For example, *scalar.Binary (arrow/scalar/binary.go#L43-L53):

func (b *Binary) Retain() {
    if b.Value != nil {
        b.Value.Retain()
    }
}

func (b *Binary) Release() {
    if b.Value != nil {
        b.Value.Release()
    }
}

*scalar.List, *scalar.Struct, *scalar.Dictionary, *scalar.SparseUnion, *scalar.DenseUnion, and *scalar.RunEndEncoded all follow the same pattern in arrow/scalar/nested.go. *scalar.Extension is the lone outlier among scalar types whose storage holds buffers.

Affected paths

Anything that produces a *scalar.Extension and reaches ScalarDatum.Release(). Concretely in this repo:

Reproducer

Drive any of the affected paths with memory.NewCheckedAllocator and call Release() on the resulting *ScalarDatum. AssertSize(t, 0) will fail because the storage scalar's buffer is never released. (#825's TestLiteralToDatumIntervalYearToMonth deliberately uses memory.DefaultAllocator for exactly this reason and explicitly defers the leak fix to this issue.)

Expected behavior

*scalar.Extension should implement Release() and Retain() by delegating to s.Value when the inner storage scalar implements Releasable (arrow/scalar/scalar.go#L68-L71), mirroring *scalar.Binary and the nested scalars. Suggested implementation:

func (s *Extension) Retain() {
    if r, ok := s.Value.(Releasable); ok {
        r.Retain()
    }
}

func (s *Extension) Release() {
    if r, ok := s.Value.(Releasable); ok {
        r.Release()
    }
}

This is purely additive:

  • Storage scalars that don't implement Releasable (e.g. primitive numeric storage) become no-ops, matching today's behavior.
  • Storage scalars that do (e.g. *scalar.FixedSizeList for the interval extension types) get properly released — closing the leak on the existing *types.IntervalYearToMonthType / *types.IntervalDayType paths and on any future extension types whose storage carries buffers.
  • The Retain() half restores symmetry: today, a caller that takes ownership of an extension scalar has no way to bump the storage refcount.

A follow-up test using memory.NewCheckedAllocator with AssertSize(t, 0) around the interval extension paths in arrow/compute/exprs/exec_internal_test.go would lock in the fix.

Component(s)

Other (arrow/scalar, arrow/compute)

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type: bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions