Skip to content

parquet/file: WriteBatchSpaced panics escape the API and silently discards commit-write errors #826

@zeroshade

Description

@zeroshade

Describe the bug, including details regarding any error messages, version, and platform.

Discovered while reviewing #824 (which fixes the related panic in file.NewParquetWriter).

Across all 8 generated *ColumnChunkWriter types in parquet/file/column_writer_types.gen.go, WriteBatch and WriteBatchSpaced are an asymmetric pair:

  • WriteBatch returns (valueOffset int64, err error) and opens with a defer recover() that converts any panic into a returned error via utils.FormatRecoveredError.
  • WriteBatchSpaced returns nothing (void signature), has no defer recover(), and silently discards the error returned by commitWriteAndCheckPageLimit.

This is the same class of "panic escapes the public API" bug as #820 (fixed in #824) but on a different code path.

Code

The asymmetry is most visible in Int32ColumnChunkWriter (column_writer_types.gen.go:66-143):

WriteBatch (line 66) — recovers panics, surfaces as err:

func (w *Int32ColumnChunkWriter) WriteBatch(values []int32, defLevels, repLevels []int16) (valueOffset int64, err error) {
    defer func() {
        if r := recover(); r != nil {
            err = utils.FormatRecoveredError("unknown error type", r)
        }
    }()
    ...
    w.doBatches(n, repLevels, func(offset, batch int64) {
        ...
        if err := w.commitWriteAndCheckPageLimit(batch, toWrite); err != nil {
            panic(err) // caught above
        }
        ...
    })
    return
}

WriteBatchSpaced (line 118) — no recover, no error return, silently swallows the commitWriteAndCheckPageLimit error:

func (w *Int32ColumnChunkWriter) WriteBatchSpaced(values []int32, defLevels, repLevels []int16, validBits []byte, validBitsOffset int64) {
    ...
    doBatches(int64(length), w.props.WriteBatchSize(), func(offset, batch int64) {
        ...
        w.commitWriteAndCheckPageLimit(batch, info.numSpaced()) // <-- error ignored
        ...
    })
}

commitWriteAndCheckPageLimit is defined in parquet/file/column_writer.go:264 as func (w *columnWriter) commitWriteAndCheckPageLimit(numLevels, numValues int64) error — the return value is meaningful.

The pattern is identical for the other 7 types — Int64, Int96, Float32, Float64, Boolean, ByteArray, FixedLenByteArray. The third method on each writer, WriteDictIndices, mirrors WriteBatch: it both returns error and uses defer recover(). So WriteBatchSpaced is the lone outlier.

Why this matters in practice

WriteBatchSpaced is reachable through public API surface — for example, (*pqarrow.FileWriter).WriteBuffered and the pqarrow array-encoder paths drive it for nullable columns. Any of the panic sources reachable from the inner closure (maybeCalculateValidityBits, writeLevelsSpaced, writeValuesSpaced, commitWriteAndCheckPageLimit returning a non-nil error in a context where the caller chose to panic, dictionary-fallback paths that panic(err) on bookkeeping failures) escapes the goroutine instead of surfacing as an error.

Concrete failure modes already in the package:

  • commitWriteAndCheckPageLimit returns an error → ignored entirely (the simpler bug)
  • Any helper called from the closure that panics on its own internal invariant (e.g. dictionary fallback's panic(err) calls in FallbackToPlain, or the panic(err) paths inside WriteDictionaryPage/drainBufferedDataPages) → propagates out of WriteBatchSpaced, past any caller that expected WriteBatch-style error semantics, and crashes the goroutine.

For consumers writing nullable columns through pqarrow to a network-attached sink (the same scenario described in #820 — GCS, S3 multipart, HDFS, etc.), this means a transient downstream failure on a spaced-write path crashes the worker process rather than returning an error.

Source of truth

Both methods are generated from parquet/file/column_writer_types.gen.go.tmpl:

  • WriteBatch template body: lines 71–172 (returns (valueOffset int64, err error), has defer recover()).
  • WriteBatchSpaced template body: lines 189 onward (no error return, no defer recover()).

A fix lands in one place and regenerates 8 method bodies plus the equivalent dictionary-write paths.

Expected behavior

WriteBatchSpaced should match WriteBatch:

  1. Signature (valueOffset int64, err error) (or at minimum error — matching WriteBatch is the cleanest API).
  2. defer func() { if r := recover(); r != nil { err = utils.FormatRecoveredError("unknown error type", r) } }() at the top.
  3. Check the error returned by commitWriteAndCheckPageLimit and panic(err) to let the deferred recover handle it (matching the WriteBatch pattern in the same file).

Adding a return value to a public method is a breaking API change, so a minimal-breakage path is also viable:

  • Introduce a sibling WriteBatchSpacedWithError(...) (int64, error)
  • Keep WriteBatchSpaced(...) as a thin wrapper that calls the new method and panics on error with a documented panic-string format

— exactly the pattern #824 chose for NewParquetWriter / NewParquetWriterWithError. That preserves source compatibility while giving callers a clean error path.

The breaking-API option is also defensible if maintainers prefer fixing the surface properly given that WriteBatchSpaced returning nothing is itself a Go API smell (it cannot signal failure at all today).

Component(s)

Parquet

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type: bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions