Skip to content

fix(parquet): return error instead of panicking on first-write failure#824

Open
zeroshade wants to merge 1 commit into
apache:mainfrom
zeroshade:fix/820-parquet-writer-error-return
Open

fix(parquet): return error instead of panicking on first-write failure#824
zeroshade wants to merge 1 commit into
apache:mainfrom
zeroshade:fix/820-parquet-writer-error-return

Conversation

@zeroshade
Copy link
Copy Markdown
Member

Rationale for this change

Closes #820.

(*file.Writer).startFile() panics with the literal string "failed to write magic number" when the underlying sink's first Write call returns an error or short-writes. Because file.NewParquetWriter calls startFile synchronously inside the constructor and has no error return, callers had no idiomatic Go recovery path. The higher-level pqarrow.NewFileWriter already returns (*FileWriter, error), but the panic propagated through it unchanged, so its error return was never reached for this failure mode.

In production this manifests when the sink is a network-attached writer (*storage.Writer from GCS, an S3 multipart upload, an HDFS client, etc.). A transient network blip on the first 4-byte magic-header write reliably crashes the worker process, orphans any in-flight upload session, and forces a container restart.

What changes are included in this PR?

  1. (*Writer).startFile() now returns error instead of panicking on sink-write failures. The configuration-validation panic for "encrypted column not found in file schema" is preserved — that path is a programmer error, not an I/O failure.
  2. New public constructor file.NewParquetWriterWithError(w, sc, opts...) (*Writer, error) is the preferred entry point for callers whose sink may transiently fail.
  3. file.NewParquetWriter is now a thin wrapper over NewParquetWriterWithError that panics with the exact same string ("failed to write magic number") on init failure. Any consumer with a recover() block that string-matches the panic value continues to work unchanged.
  4. pqarrow.NewFileWriter switches to NewParquetWriterWithError internally. Its public signature is unchanged; its error return now meaningfully covers transient sink failures, with no API change for any of its callers.

Are these changes tested?

Yes. Six new tests in parquet/file/file_writer_test.go exercise the new behavior using a flakyMagicSink test helper modeled on the existing errCloseWriter pattern in the same file:

  • TestNewParquetWriterWithError_Success — happy path on a bytes.Buffer.
  • TestNewParquetWriterWithError_FirstWriteFails — first Write returns io.ErrUnexpectedEOF; constructor returns a wrapped error and does not panic.
  • TestNewParquetWriterWithError_FirstWriteShortWrites — first Write returns n=len(p)-1, nil; constructor returns an error mentioning the short write.
  • TestNewParquetWriter_PreservesPanicMessage — the legacy constructor still panics, the panic value is still a string, and the message is still the literal "failed to write magic number" (back-compat guard).
  • TestPqarrowNewFileWriter_PropagatesInitErrorpqarrow.NewFileWriter returns the wrapped sink error rather than panicking.
  • TestPqarrowNewFileWriter_PropagatesShortWrite — same for the short-write case.

The existing TestCloseError, full parquet/file/... and parquet/pqarrow/... test suites all pass (with PARQUET_TEST_DATA pointed at parquet-testing/data).

Are there any user-facing changes?

No breaking changes.

  • file.NewParquetWriter(w, sc, opts...) *Writer — signature unchanged, still panics on first-write failure with the identical literal string "failed to write magic number". Existing recover() workarounds keep working.
  • file.NewParquetWriterWithError(w, sc, opts...) (*Writer, error) — new exported symbol. Adding an exported function is non-breaking in Go.
  • pqarrow.NewFileWriter(...) (*FileWriter, error) — signature unchanged. Its error return now also covers the magic-header-write failure that previously escaped as a panic. Callers who already check err get strictly better behavior.

The new constructor's docstring is the recommended path for users on cloud-storage upload sinks. The legacy constructor's docstring now documents the panic behavior the original issue called out as undocumented.

Out of scope (separate follow-up)

While investigating, I noticed WriteBatchSpaced in parquet/file/column_writer_types.gen.go lacks the defer recover() that WriteBatch has, so an I/O error during a spaced write panics out of the calling goroutine instead of returning an error. Same class of bug as #820 but a different code path; happy to file a separate issue.

apacheGH-820)

(*file.Writer).startFile() previously panicked with "failed to write
magic number" when the underlying sink's first Write returned an error
or short-wrote. Because NewParquetWriter has no error return, callers
on flaky network sinks (e.g. cloud-storage upload writers) had no
idiomatic recovery path, and the higher-level pqarrow.NewFileWriter
propagated the panic unchanged through its (*FileWriter, error) return.

Add a sibling constructor file.NewParquetWriterWithError that returns
(*Writer, error). The legacy file.NewParquetWriter is now a thin wrapper
that panics with the original message string for back-compat. Switch
pqarrow.NewFileWriter to the error-returning constructor so its error
return finally covers this failure mode without any change to its own
public signature.

Closes apache#820.
@zeroshade zeroshade requested review from lidavidm and wgtmac May 22, 2026 19:05
@lidavidm
Copy link
Copy Markdown
Member

While investigating, I noticed WriteBatchSpaced in parquet/file/column_writer_types.gen.go lacks the defer recover() that WriteBatch has, so an I/O error during a spaced write panics out of the calling goroutine instead of returning an error. Same class of bug as #820 but a different code path; happy to file a separate issue.

Uhh, so @zeroshade do you (or your agent) want to file that issue? 😅

@zeroshade
Copy link
Copy Markdown
Member Author

Filed #826 🫤

// cloud-storage upload writer or other network-attached writer) should
// prefer [NewParquetWriterWithError], which surfaces the failure as a
// returned error instead.
func NewParquetWriter(w io.Writer, sc *schema.GroupNode, opts ...WriteOption) *Writer {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we deprecate this as well?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For someone coming to the codebase without being a maintainer, these doc comments are too focused on historical knowledge. I think NewParquetWriterWithError should be documented without reference to NewParquetWriter and simply mention that it may fail if it cannot write the header (e.g. because of a flaky writer), and NewParquetWriter can document the history if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

parquet/file: NewParquetWriter panics on transient sink.Write errors during file initialization

2 participants