38 changes: 29 additions & 9 deletions python/source/io.rst
@@ -7,15 +7,12 @@ Apache Arrow.

.. contents::

.. testsetup::

    import pyarrow as pa

Write a Parquet file
====================

Given an array with 100 numbers, from 0 to 99

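The full recipe is collapsed in this part of the diff; below is a minimal sketch of writing such an array to a Parquet file, assuming it is wrapped in a single-column :class:`pyarrow.Table` and written with :func:`pyarrow.parquet.write_table` (the column name ``col1`` and the file name ``example.parquet`` are illustrative choices, not taken from the original recipe)

.. testcode::

    import pyarrow.parquet as pq

    arr = pa.array(range(100))

    # Parquet files store named columns, so wrap the array in a one-column table
    table = pa.Table.from_arrays([arr], names=["col1"])

    # write the table to disk as a Parquet file
    pq.write_table(table, "example.parquet")
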
@@ -179,14 +176,37 @@ format can be memory mapped back directly from the disk.
Writing CSV files
=================

It is possible to write an Arrow :class:`pyarrow.Table` to
a CSV file using the :func:`pyarrow.csv.write_csv` function

.. testcode::

    arr = pa.array(range(100))
    table = pa.Table.from_arrays([arr], names=["col1"])

    import pyarrow.csv
    pa.csv.write_csv(table, "table.csv",
                     write_options=pa.csv.WriteOptions(include_header=True))
Comment on lines 182 to +189

Member: There has been some movement here: #2 to avoid testsetup (which is hidden) in favor of fully standalone testcode blocks (at the risk of duplication).

Member Author: I saw, that's why I moved the arr = pa.array(range(100)) into the test code, so that it's more explicit.


Writing CSV files incrementally
===============================

If you need to write data to a CSV file incrementally,
as you generate or retrieve it, and you don't want to keep
the whole table in memory to write it all at once, you can use
:class:`pyarrow.csv.CSVWriter` to write the data in chunks

.. testcode::

    schema = pa.schema([("col1", pa.int32())])

    with pa.csv.CSVWriter("table.csv", schema=schema) as writer:
        for chunk in range(10):
            datachunk = range(chunk*10, (chunk+1)*10)
            # build each chunk with the same int32 type declared in the schema
            table = pa.Table.from_arrays([pa.array(datachunk, type=pa.int32())],
                                         schema=schema)
            writer.write(table)

It's equally possible to write :class:`pyarrow.RecordBatch` instances
by passing them to ``write`` just as you would a table.
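
For example, a minimal sketch of the same incremental write using record batches, assuming the same single int32 column schema as above (the ``table_batches.csv`` output name is illustrative)

.. testcode::

    schema = pa.schema([("col1", pa.int32())])

    with pa.csv.CSVWriter("table_batches.csv", schema=schema) as writer:
        for chunk in range(10):
            datachunk = range(chunk*10, (chunk+1)*10)
            # a RecordBatch can be passed to write() just like a Table
            batch = pa.RecordBatch.from_arrays(
                [pa.array(datachunk, type=pa.int32())], schema=schema)
            writer.write(batch)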

Reading CSV files
=================