[Python] pyarrow.read_serialized cannot read concatenated records #17979

@asfimport

Description

The following code

import pyarrow as pa

f = pa.OSFile('arrow_test', 'w')
pa.serialize_to(12, f)
pa.serialize_to(23, f)
f.close()

f = pa.OSFile('arrow_test', 'r')
print('First: {}'.format(pa.read_serialized(f).deserialize()))
print('Second: {}'.format(pa.read_serialized(f).deserialize()))
f.close()

gives the following result:

$ python pyarrow_test.py
First: 12
Traceback (most recent call last):
  File "pyarrow_test.py", line 10, in <module>
    print('Second: {}'.format(pa.read_serialized(f).deserialize()))
  File "pyarrow/serialization.pxi", line 347, in pyarrow.lib.read_serialized (/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:79159)
  File "pyarrow/error.pxi", line 77, in pyarrow.lib.check_status (/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:8270)
pyarrow.lib.ArrowInvalid: Expected schema message in stream, was null or length 0
I would have expected read_serialized to successfully read the second value.
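
For comparison, here is a minimal sketch of the behavior I had in mind, using the standard library's pickle module rather than pyarrow. This is only an analogy: pickle and the in-memory buffer are stand-ins chosen for illustration, and nothing here says how read_serialized is or should be implemented. Two records are written back to back into one stream, and each load call consumes exactly one record and leaves the stream positioned at the start of the next.

```python
import io
import pickle

# Write two records back to back into a single stream
# (an in-memory buffer stands in for the 'arrow_test' file).
buf = io.BytesIO()
pickle.dump(12, buf)
pickle.dump(23, buf)

# Rewind and read the records one at a time; each load stops
# at the end of its own record.
buf.seek(0)
first = pickle.load(buf)
second = pickle.load(buf)

print('First: {}'.format(first))
print('Second: {}'.format(second))
```

With pickle, both values come back in the order they were written, which is what I expected from two consecutive read_serialized calls on the same file.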

Environment: Linux
Reporter: Richard Shin / @rshin
Assignee: Wes McKinney / @wesm

Note: This issue was originally created as ARROW-1996. Please see the migration documentation for further details.
