Describe the bug, including details regarding any error messages, version, and platform.
Discovered while reviewing #824 (which fixes the related panic in file.NewParquetWriter).
Across all 8 generated *ColumnChunkWriter types in parquet/file/column_writer_types.gen.go, WriteBatch and WriteBatchSpaced are an asymmetric pair:
WriteBatch returns (valueOffset int64, err error) and opens with a defer recover() that converts any panic into a returned error via utils.FormatRecoveredError.
WriteBatchSpaced returns nothing (void signature), has no defer recover(), and silently discards the error returned by commitWriteAndCheckPageLimit.
This is the same class of "panic escapes the public API" bug as #820 (fixed in #824) but on a different code path.
Code
The asymmetry is most visible in Int32ColumnChunkWriter (column_writer_types.gen.go:66-143):
WriteBatch (line 66) — recovers panics, surfaces as err:
func (w *Int32ColumnChunkWriter) WriteBatch(values []int32, defLevels, repLevels []int16) (valueOffset int64, err error) {
defer func() {
if r := recover(); r != nil {
err = utils.FormatRecoveredError("unknown error type", r)
}
}()
...
w.doBatches(n, repLevels, func(offset, batch int64) {
...
if err := w.commitWriteAndCheckPageLimit(batch, toWrite); err != nil {
panic(err) // caught above
}
...
})
return
}
WriteBatchSpaced (line 118) — no recover, no error return, silently swallows the commitWriteAndCheckPageLimit error:
func (w *Int32ColumnChunkWriter) WriteBatchSpaced(values []int32, defLevels, repLevels []int16, validBits []byte, validBitsOffset int64) {
...
doBatches(int64(length), w.props.WriteBatchSize(), func(offset, batch int64) {
...
w.commitWriteAndCheckPageLimit(batch, info.numSpaced()) // <-- error ignored
...
})
}
commitWriteAndCheckPageLimit is defined in parquet/file/column_writer.go:264 as func (w *columnWriter) commitWriteAndCheckPageLimit(numLevels, numValues int64) error — the return value is meaningful.
The pattern is identical for the other 7 types — Int64, Int96, Float32, Float64, Boolean, ByteArray, FixedLenByteArray. The third method on each writer, WriteDictIndices, mirrors WriteBatch: it both returns error and uses defer recover(). So WriteBatchSpaced is the lone outlier.
Why this matters in practice
WriteBatchSpaced is reachable through public API surface — for example, (*pqarrow.FileWriter).WriteBuffered and the pqarrow array-encoder paths drive it for nullable columns. Any of the panic sources reachable from the inner closure (maybeCalculateValidityBits, writeLevelsSpaced, writeValuesSpaced, commitWriteAndCheckPageLimit returning a non-nil error in a context where the caller chose to panic, dictionary-fallback paths that panic(err) on bookkeeping failures) escapes the goroutine instead of surfacing as an error.
Concrete failure modes already in the package:
commitWriteAndCheckPageLimit returns an error → ignored entirely (the simpler bug)
- Any helper called from the closure that panics on its own internal invariant (e.g. dictionary fallback's
panic(err) calls in FallbackToPlain, or the panic(err) paths inside WriteDictionaryPage/drainBufferedDataPages) → propagates out of WriteBatchSpaced, past any caller that expected WriteBatch-style error semantics, and crashes the goroutine.
For consumers writing nullable columns through pqarrow to a network-attached sink (the same scenario described in #820 — GCS, S3 multipart, HDFS, etc.), this means a transient downstream failure on a spaced-write path crashes the worker process rather than returning an error.
Source of truth
Both methods are generated from parquet/file/column_writer_types.gen.go.tmpl:
WriteBatch template body: lines 71–172 (returns (valueOffset int64, err error), has defer recover()).
WriteBatchSpaced template body: lines 189 onward (no error return, no defer recover()).
A fix lands in one place and regenerates 8 method bodies plus the equivalent dictionary-write paths.
Expected behavior
WriteBatchSpaced should match WriteBatch:
- Signature
(valueOffset int64, err error) (or at minimum error — matching WriteBatch is the cleanest API).
defer func() { if r := recover(); r != nil { err = utils.FormatRecoveredError("unknown error type", r) } }() at the top.
- Check the
error returned by commitWriteAndCheckPageLimit and panic(err) to let the deferred recover handle it (matching the WriteBatch pattern in the same file).
Adding a return value to a public method is a breaking API change, so a minimal-breakage path is also viable:
- Introduce a sibling
WriteBatchSpacedWithError(...) (int64, error)
- Keep
WriteBatchSpaced(...) as a thin wrapper that calls the new method and panics on error with a documented panic-string format
— exactly the pattern #824 chose for NewParquetWriter / NewParquetWriterWithError. That preserves source compatibility while giving callers a clean error path.
The breaking-API option is also defensible if maintainers prefer fixing the surface properly given that WriteBatchSpaced returning nothing is itself a Go API smell (it cannot signal failure at all today).
Component(s)
Parquet
Describe the bug, including details regarding any error messages, version, and platform.
Discovered while reviewing #824 (which fixes the related panic in
file.NewParquetWriter).Across all 8 generated
*ColumnChunkWritertypes inparquet/file/column_writer_types.gen.go,WriteBatchandWriteBatchSpacedare an asymmetric pair:WriteBatchreturns(valueOffset int64, err error)and opens with adefer recover()that converts any panic into a returnederrorviautils.FormatRecoveredError.WriteBatchSpacedreturns nothing (voidsignature), has nodefer recover(), and silently discards theerrorreturned bycommitWriteAndCheckPageLimit.This is the same class of "panic escapes the public API" bug as #820 (fixed in #824) but on a different code path.
Code
The asymmetry is most visible in
Int32ColumnChunkWriter(column_writer_types.gen.go:66-143):WriteBatch(line 66) — recovers panics, surfaces aserr:WriteBatchSpaced(line 118) — no recover, no error return, silently swallows thecommitWriteAndCheckPageLimiterror:commitWriteAndCheckPageLimitis defined inparquet/file/column_writer.go:264asfunc (w *columnWriter) commitWriteAndCheckPageLimit(numLevels, numValues int64) error— the return value is meaningful.The pattern is identical for the other 7 types —
Int64,Int96,Float32,Float64,Boolean,ByteArray,FixedLenByteArray. The third method on each writer,WriteDictIndices, mirrorsWriteBatch: it both returnserrorand usesdefer recover(). SoWriteBatchSpacedis the lone outlier.Why this matters in practice
WriteBatchSpacedis reachable through public API surface — for example,(*pqarrow.FileWriter).WriteBufferedand thepqarrowarray-encoder paths drive it for nullable columns. Any of the panic sources reachable from the inner closure (maybeCalculateValidityBits,writeLevelsSpaced,writeValuesSpaced,commitWriteAndCheckPageLimitreturning a non-nil error in a context where the caller chose to panic, dictionary-fallback paths thatpanic(err)on bookkeeping failures) escapes the goroutine instead of surfacing as anerror.Concrete failure modes already in the package:
commitWriteAndCheckPageLimitreturns an error → ignored entirely (the simpler bug)panic(err)calls inFallbackToPlain, or thepanic(err)paths insideWriteDictionaryPage/drainBufferedDataPages) → propagates out ofWriteBatchSpaced, past any caller that expectedWriteBatch-style error semantics, and crashes the goroutine.For consumers writing nullable columns through
pqarrowto a network-attached sink (the same scenario described in #820 — GCS, S3 multipart, HDFS, etc.), this means a transient downstream failure on a spaced-write path crashes the worker process rather than returning an error.Source of truth
Both methods are generated from
parquet/file/column_writer_types.gen.go.tmpl:WriteBatchtemplate body: lines 71–172 (returns(valueOffset int64, err error), hasdefer recover()).WriteBatchSpacedtemplate body: lines 189 onward (no error return, nodefer recover()).A fix lands in one place and regenerates 8 method bodies plus the equivalent dictionary-write paths.
Expected behavior
WriteBatchSpacedshould matchWriteBatch:(valueOffset int64, err error)(or at minimumerror— matchingWriteBatchis the cleanest API).defer func() { if r := recover(); r != nil { err = utils.FormatRecoveredError("unknown error type", r) } }()at the top.errorreturned bycommitWriteAndCheckPageLimitandpanic(err)to let the deferred recover handle it (matching theWriteBatchpattern in the same file).Adding a return value to a public method is a breaking API change, so a minimal-breakage path is also viable:
WriteBatchSpacedWithError(...) (int64, error)WriteBatchSpaced(...)as a thin wrapper that calls the new method and panics on error with a documented panic-string format— exactly the pattern #824 chose for
NewParquetWriter/NewParquetWriterWithError. That preserves source compatibility while giving callers a clean error path.The breaking-API option is also defensible if maintainers prefer fixing the surface properly given that
WriteBatchSpacedreturning nothing is itself a Go API smell (it cannot signal failure at all today).Component(s)
Parquet