ParquetFileReader does not close previous PageReadStore on readNextRowGroup() or on close() #3487

@arouel

Description

Describe the bug, including details regarding any error messages, version, and platform.

ParquetFileReader stores a reference to the last returned ColumnChunkPageReadStore in currentRowGroup, but never closes it:

  1. readNextRowGroup() (line 1153) overwrites this.currentRowGroup = rowGroup without closing the previous instance.
  2. readNextFilteredRowGroup() (line 1409) does the same.
  3. close() (lines 1816-1827) does not close currentRowGroup at all; it only closes the input stream, dictionary reader, and codec factory.

ColumnChunkPageReadStore.close() releases the ByteBufferReleaser that holds both the compressed file I/O buffers (from ConsecutivePartList.readAll()) and any off-heap decompressed page buffers (from the useOffHeapDecryptBuffer path). Since ParquetFileReader never calls that close(), these buffers are never released unless the caller does it.

With the default HeapByteBufferAllocator this is masked by GC because HeapByteBufferAllocator.release() is a no-op. With a direct ByteBufferAllocator, this becomes a hard native memory leak that grows with every row group read.

Note that InternalParquetRecordReader works around this by manually calling currentRowGroup.close() before each readNextRowGroup() (lines 134-135) and in its own close() (lines 164-167). Other direct callers of ParquetFileReader (e.g., ParquetRewriter, ColumnIndexValidator) also close the PageReadStore themselves. However, any caller that does not manually close the returned PageReadStore will leak buffers.
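
To make the leak and the caller-side workaround concrete, here is a minimal, self-contained model. It does not use the Parquet API: CountingAllocator, ModelRowGroup, ModelLeakyReader and CallerSideCloseDemo are hypothetical stand-ins invented for illustration, and the two buffers per row group are an arbitrary choice. It only sketches the lifecycle described above, under the assumption that a direct allocator holds every buffer until release() is called.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a direct ByteBufferAllocator: counts buffers not yet released.
class CountingAllocator {
    private int outstanding = 0;

    Object allocate() {
        outstanding++;
        return new Object();
    }

    void release(Object buffer) {
        outstanding--;
    }

    int outstanding() {
        return outstanding;
    }
}

// Hypothetical stand-in for ColumnChunkPageReadStore: holds its buffers until close().
class ModelRowGroup implements AutoCloseable {
    private final CountingAllocator allocator;
    private final List<Object> buffers = new ArrayList<>();

    ModelRowGroup(CountingAllocator allocator, int bufferCount) {
        this.allocator = allocator;
        for (int i = 0; i < bufferCount; i++) {
            buffers.add(allocator.allocate());
        }
    }

    @Override
    public void close() {
        for (Object buffer : buffers) {
            allocator.release(buffer);
        }
        buffers.clear(); // a second close() then releases nothing
    }
}

// Models the reported behavior: the previous row group is overwritten, never closed.
class ModelLeakyReader implements AutoCloseable {
    private final CountingAllocator allocator;
    private int remainingRowGroups;
    private ModelRowGroup currentRowGroup;

    ModelLeakyReader(CountingAllocator allocator, int rowGroups) {
        this.allocator = allocator;
        this.remainingRowGroups = rowGroups;
    }

    ModelRowGroup readNextRowGroup() {
        if (remainingRowGroups == 0) {
            return null;
        }
        remainingRowGroups--;
        currentRowGroup = new ModelRowGroup(allocator, 2); // old instance is dropped unclosed
        return currentRowGroup;
    }

    @Override
    public void close() {
        // Intentionally does not close currentRowGroup, mirroring the report.
    }
}

class CallerSideCloseDemo {
    // Reads three row groups and returns how many buffers are still held afterwards.
    static int outstandingAfterRead(boolean closeEachRowGroup) {
        CountingAllocator allocator = new CountingAllocator();
        try (ModelLeakyReader reader = new ModelLeakyReader(allocator, 3)) {
            ModelRowGroup rowGroup;
            while ((rowGroup = reader.readNextRowGroup()) != null) {
                if (closeEachRowGroup) {
                    rowGroup.close(); // the manual workaround callers need today
                }
            }
        }
        return allocator.outstanding();
    }
}
```

In this model, outstandingAfterRead(false) leaves all six buffers held after the reader is closed, while outstandingAfterRead(true) leaves none.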

Expected behavior

ParquetFileReader should close the previous currentRowGroup before assigning a new one in readNextRowGroup() / readNextFilteredRowGroup(), and close the final currentRowGroup in its own close() method. This matches the lifecycle that InternalParquetRecordReader implements manually.
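
The expected lifecycle could look roughly like the sketch below. This is not the actual ParquetFileReader code and not a proposed patch: TrackedRowGroup, ModelFixedReader and closeCurrentRowGroup() are hypothetical names used only to illustrate "close the previous row group before replacing it, and close the last one in close()".

```java
// Hypothetical stand-in for a PageReadStore that records whether it was closed.
class TrackedRowGroup implements AutoCloseable {
    private boolean closed = false;

    @Override
    public void close() {
        closed = true; // idempotent: closing twice has no further effect
    }

    boolean isClosed() {
        return closed;
    }
}

// Hypothetical model of a reader that owns the lifecycle of the row group it returned.
class ModelFixedReader implements AutoCloseable {
    private int remainingRowGroups;
    private TrackedRowGroup currentRowGroup;

    ModelFixedReader(int rowGroups) {
        this.remainingRowGroups = rowGroups;
    }

    TrackedRowGroup readNextRowGroup() {
        closeCurrentRowGroup(); // release the previous row group first
        if (remainingRowGroups == 0) {
            return null;
        }
        remainingRowGroups--;
        currentRowGroup = new TrackedRowGroup();
        return currentRowGroup;
    }

    @Override
    public void close() {
        closeCurrentRowGroup(); // release the final row group as well
    }

    private void closeCurrentRowGroup() {
        if (currentRowGroup != null) {
            currentRowGroup.close();
            currentRowGroup = null;
        }
    }
}
```

Two consequences of this design are worth noting. Callers that already close the returned store themselves would close it a second time, so the store's close() would need to tolerate repeated calls (the model's does; whether the real one does is an assumption to verify). And a caller that keeps using a row group after requesting the next one would find it already closed.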

Error messages

No error is thrown. The buffers silently leak. With a direct allocator, native memory grows without bound until the process is killed or the allocator is explicitly closed.

Version

1.17.0 (older versions are also affected)

Component(s)

Core
