Description
github.com/apache/arrow-go/v18: v18.5.1
I'm using arrow-go to write Arrow data to parquet files, using WriteBuffered to flush record batches buffered in memory to disk every 10 seconds. This generally works fine, but sometimes WriteBuffered fails with an error like:
unknown error type: interface conversion: interface is nil, not encoding.Encoder[int64]
or
unknown error type: interface conversion: interface is nil, not encoding.Encoder[github.com/apache/arrow-go/v18/parquet.ByteArray]
I tried dumping the record batch on which WriteBuffered fails to disk as JSON, and the rows all look complete and correct.
While I don't have an easily reproducible test case because I haven't found the trigger for the issue, the parquet writer is created like this:
fileWriter, err := pqarrow.NewFileWriter(
    schema.arrowSchema,
    file,
    parquet.NewWriterProperties(parquet.WithAllocator(memory.DefaultAllocator)),
    pqarrow.DefaultWriterProps(),
)
...and then we call something like this every 10 seconds:
...
builder := array.NewRecordBuilder(memory.DefaultAllocator, schema.arrowSchema)
defer builder.Release()
arrowBuilder := newArrowBuilder(schema.arrowSchema, builder)
for _, op := range toWrite {
    aErr = op.appendArrowRecord(arrowBuilder)
    if aErr != nil {
        aErr = ingestion.NewIrrecoverableError(fmt.Errorf("failed to append arrow record for write operation %s: %w", op.key.String(), aErr), "arrow_transformation_failure")
        return aErr
    }
}
// Create a new record batch from the buffer we just filled
batch := builder.NewRecordBatch()
defer batch.Release()
aErr = activeBuffer.fileWriter.WriteBuffered(batch)
if aErr != nil {
    // Dump a JSON representation of the batch in blob storage for inspection/troubleshooting
    jsonFileLocation, err := dumpBatchJSON(activeBuffer.fs, activeBuffer.locationProvider, batch)
    if err != nil {
        jsonFileLocation = "unavailable_failure_to_dump_batch"
    }
    aErr = fmt.Errorf("failed to write batch to parquet file, see json dump at '%s' for troubleshooting: %w", jsonFileLocation, aErr)
    return aErr
}
...
That code obviously doesn't compile, but it's a decent representation of how we use the APIs. I initially suspected we were misusing arrow-go/pqarrow, but the fact that the failure isn't deterministic for the same data makes me think there might be a bug.
I will also add that a failure in WriteBuffered seems to leave the file writer in an inconsistent state: I also see occasional failures on Close() like the one below, and they seem to correlate with WriteBuffered failures that happened earlier on the same file:
row mismatch for buffered row group: 0, column: 60, count expected: 55000, actual: 54397
Note that I could run with a modified version of arrow-go if there are hypotheses to test. I can usually hit this error within 30 minutes of startup.
Component(s)
Parquet