Commit

ARROW-13454: [C++][Docs] Tables vs Record Batches (#14008)
Adds a little more explanation of the difference between tables and record batches, as well as a diagram representation.

Authored-by: Will Jones <willjones127@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
wjones127 committed Sep 20, 2022
1 parent cd67e51 commit ab71673
Showing 3 changed files with 120 additions and 2 deletions.
102 changes: 102 additions & 0 deletions docs/source/cpp/tables-versus-record-batches.svg
12 changes: 12 additions & 0 deletions docs/source/cpp/tables.rst
@@ -77,6 +77,18 @@ has a schema which must match its arrays' datatypes.
Record batches are a convenient unit of work for various serialization
and computation functions, possibly incremental.

+.. image:: tables-versus-record-batches.svg
+   :alt: A graphical representation of an Arrow Table and a Record Batch, with
+         structure as described in text above.
+
+Record batches can be sent between implementations, such as via
+:ref:`IPC <format-ipc>` or
+via the :doc:`C Data Interface <../format/CDataInterface>`. Tables and
+chunked arrays, on the other hand, are concepts in the C++ implementation,
+not in the Arrow format itself, so they aren't directly portable.
+
+However, a table can be converted to and built from a sequence of record
+batches easily without needing to copy the underlying array buffers.
A table can be streamed as an arbitrary number of record batches using
a :class:`arrow::TableBatchReader`. Conversely, a logical sequence of
record batches can be assembled to form a table using one of the
8 changes: 6 additions & 2 deletions docs/source/format/Glossary.rst
@@ -196,7 +196,11 @@ Glossary
different buffers for different indices.

Not part of the columnar format; this term is specific to
-certain language implementations of Arrow (primarily C++ and
-its bindings).
+certain language implementations of Arrow (for example C++ and
+its bindings, and Go).
+
+.. image:: ../cpp/tables-versus-record-batches.svg
+   :alt: A graphical representation of an Arrow Table and a
+         Record Batch, with structure as described in text above.

.. seealso:: :term:`chunked array`, :term:`record batch`
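
As a rough illustration of the zero-copy relationship described in the tables.rst change above, the following C++ sketch models a record batch as one shared buffer per column and a table as a list of such buffers (chunks) per column. All names in it (``ToyRecordBatch``, ``ToyTable``, ``TableFromBatches``, ``BatchesFromTable``, ``MakeBuffer``) are hypothetical stand-ins and not part of the Arrow API; it uses only the standard library and assumes every column of a table is chunked identically, which real Arrow tables do not require. In Arrow itself, the table-to-batches direction is what :class:`arrow::TableBatchReader` provides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Toy model only, NOT the Arrow API: a "buffer" is a shared, immutable vector.
using Buffer = std::shared_ptr<const std::vector<std::int64_t>>;

// A record batch holds exactly one contiguous buffer per column.
struct ToyRecordBatch {
  std::vector<Buffer> columns;
  std::size_t num_rows() const {
    return columns.empty() ? 0 : columns[0]->size();
  }
};

// A table holds a chunked array per column: columns[i][c] is chunk c of column i.
struct ToyTable {
  std::vector<std::vector<Buffer>> columns;
  std::size_t num_rows() const {
    std::size_t n = 0;
    if (!columns.empty()) {
      for (const auto& chunk : columns[0]) n += chunk->size();
    }
    return n;
  }
};

// Hypothetical helper: wrap values in a shared buffer.
Buffer MakeBuffer(std::vector<std::int64_t> values) {
  return std::make_shared<std::vector<std::int64_t>>(std::move(values));
}

// Assemble a table from batches. Only shared_ptr handles are copied,
// so the table's chunks alias the batches' buffers.
ToyTable TableFromBatches(const std::vector<ToyRecordBatch>& batches) {
  ToyTable table;
  if (batches.empty()) return table;
  table.columns.resize(batches[0].columns.size());
  for (const auto& batch : batches) {
    // Every batch must have the same number of columns (same "schema").
    assert(batch.columns.size() == table.columns.size());
    for (std::size_t i = 0; i < batch.columns.size(); ++i) {
      table.columns[i].push_back(batch.columns[i]);
    }
  }
  return table;
}

// Stream a table back out as one batch per chunk, again without copying data.
// Assumes all columns share the same chunk layout.
std::vector<ToyRecordBatch> BatchesFromTable(const ToyTable& table) {
  std::vector<ToyRecordBatch> batches;
  if (table.columns.empty()) return batches;
  for (std::size_t c = 0; c < table.columns[0].size(); ++c) {
    ToyRecordBatch batch;
    for (const auto& column : table.columns) batch.columns.push_back(column[c]);
    batches.push_back(std::move(batch));
  }
  return batches;
}
```

Building a table from two batches and then splitting it again yields batches whose column pointers compare equal to the original buffers, which is the sense in which the conversion does not copy the underlying data.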
