@robertnishihara
Contributor

This exposes MockOutputStream in pyarrow so that you can compute the serialized size of a record batch and its schema by writing them to a mock output stream, which counts the bytes written without storing them. You can then use the resulting size to allocate an appropriately sized buffer for the actual write.

Example usage:

import pyarrow as pa
import pandas as pd

val = pd.DataFrame({'a': [1, 2, 3]})
record_batch = pa.RecordBatch.from_pandas(val)

# Get the size of the record batch and schema
sink = pa.MockOutputStream()
stream_writer = pa.RecordBatchStreamWriter(sink, record_batch.schema)
stream_writer.write_batch(record_batch)
size = sink.size()

@robertnishihara
Contributor Author

I think the Travis test failures are unrelated to this PR.

@wesm
Member

wesm commented Jul 11, 2017

They're unrelated; I'm working on fixing parquet-cpp after the API changes.

@robertnishihara changed the title from "ARROW-1194: [Python] Expose MockOutputStream to in pyarrow." to "ARROW-1194: [Python] Expose MockOutputStream in pyarrow." on Jul 11, 2017
@wesm
Member

wesm commented Jul 12, 2017

The builds should be OK now. Reviewing this

Member

@wesm wesm left a comment

LGTM. Could you rebase so we hopefully get a passing build?

@robertnishihara
Contributor Author

Looks like the tests are passing now.

Member

@wesm wesm left a comment

+1, thanks!

@asfgit closed this in 28e06d8 on Jul 13, 2017
@robertnishihara deleted the mockoutputstream branch on July 13, 2017 15:01