
[pqarrow] pqarrow.FileWriter.WriteBuffered returns "unknown error type: interface conversion: interface is nil, not encoding.Encoder[int64]" #727

@alexandre-normand

Description


Describe the bug, including details regarding any error messages, version, and platform.

github.com/apache/arrow-go/v18: v18.5.1

I'm using arrow-go to write Arrow data to Parquet files, using WriteBuffered to periodically flush record batches buffered in memory to disk every 10 seconds. It generally works fine, but sometimes WriteBuffered fails with an error like this:

unknown error type: interface conversion: interface is nil, not encoding.Encoder[int64]

or

unknown error type: interface conversion: interface is nil, not encoding.Encoder[github.com/apache/arrow-go/v18/parquet.ByteArray]

I tried dumping the record batch on which WriteBuffered fails to disk as JSON, and the rows all look complete and correct.

While I don't have an easily reproducible test because I can't find the trigger for the issue, the parquet writer is created like this:

fileWriter, err := pqarrow.NewFileWriter(schema.arrowSchema, file, parquet.NewWriterProperties(parquet.WithAllocator(memory.DefaultAllocator)), pqarrow.DefaultWriterProps())
...

And then we call something like this every 10 seconds:

...
		builder := array.NewRecordBuilder(memory.DefaultAllocator, schema.arrowSchema)
		defer builder.Release()

		arrowBuilder := newArrowBuilder(schema.arrowSchema, builder)
		for _, op := range toWrite {
			aErr = op.appendArrowRecord(arrowBuilder)
			if aErr != nil {
				aErr = ingestion.NewIrrecoverableError(fmt.Errorf("failed to append arrow record for write operation %s: %w", op.key.String(), aErr), "arrow_transformation_failure")
				return aErr
			}
		}

		// Create a new record batch from the buffer we just filled
		batch := builder.NewRecordBatch()
		defer batch.Release()

		aErr = activeBuffer.fileWriter.WriteBuffered(batch)
		if aErr != nil {
			// Dump a json representation of the batch in blob storage for inspection/troubleshooting
			jsonFileLocation, err := dumpBatchJSON(activeBuffer.fs, activeBuffer.locationProvider, batch)
			if err != nil {
				jsonFileLocation = "unavailable_failure_to_dump_batch"
			}
			aErr = fmt.Errorf("failed to write batch to parquet file, see json dump at '%s' for troubleshooting: %w", jsonFileLocation, aErr)
			return aErr
		}
...

That code obviously doesn't compile as-is, but it's a decent representation of how we use the APIs. I initially suspected we were misusing arrow-go/pqarrow, but the fact that the failure isn't deterministic given the same data makes me think there might be a bug.

I will also add that a failure on WriteBuffered seems to leave the file writer in an inconsistent state: I also see occasional failures on Close(), like the one below, and they seem to correlate with WriteBuffered failures that happened earlier on the same file:

row mismatch for buffered row group: 0, column: 60, count expected: 55000, actual: 54397

Note that I could run with a modified version of arrow-go if there are hypotheses to test. Usually, I can get this error within 30 minutes of startup.

Component(s)

Parquet
