diff --git a/docs/source/cpp/tables-versus-record-batches.svg b/docs/source/cpp/tables-versus-record-batches.svg
new file mode 100644
index 0000000000000..d793b1de2bf7e
--- /dev/null
+++ b/docs/source/cpp/tables-versus-record-batches.svg
@@ -0,0 +1,102 @@
[SVG body, 102 added lines; markup lost in extraction. Surviving text: title "Arrow Table versus Record Batch"; "Arrow Table" panel (Schema, Field, Chunked Array, Array): "A Table is a C++ data structure, allowing for a mixed chunking structure and very large arrays."; "Arrow Record Batch" panel (Schema, Field, Array): "A Record Batch is a common Arrow data structure which is recognized by all implementations."]
\ No newline at end of file
diff --git a/docs/source/cpp/tables.rst b/docs/source/cpp/tables.rst
index ea9198771cfac..b28a9fc1e13a5 100644
--- a/docs/source/cpp/tables.rst
+++ b/docs/source/cpp/tables.rst
@@ -77,6 +77,18 @@ has a schema which must match its arrays' datatypes.
 Record batches are a convenient unit of work for various serialization
 and computation functions, possibly incremental.
 
+.. image:: tables-versus-record-batches.svg
+   :alt: A graphical representation of an Arrow Table and a Record Batch, with
+         structure as described in text above.
+
+Record batches can be sent between implementations, such as via
+:ref:`IPC ` or
+via the :doc:`C Data Interface <../format/CDataInterface>`. Tables and
+chunked arrays, on the other hand, are concepts in the C++ implementation,
+not in the Arrow format itself, so they aren't directly portable.
+
+However, a table can be converted to and built from a sequence of record
+batches easily without needing to copy the underlying array buffers.
 A table can be streamed as an arbitrary number of record batches using
 a :class:`arrow::TableBatchReader`.
Conversely, a logical sequence of record batches can be assembled to form a table using one of the
diff --git a/docs/source/format/Glossary.rst b/docs/source/format/Glossary.rst
index 423ebf85783f6..5944d7c18cffe 100644
--- a/docs/source/format/Glossary.rst
+++ b/docs/source/format/Glossary.rst
@@ -196,7 +196,11 @@ Glossary
        different buffers for different indices.
 
        Not part of the columnar format; this term is specific to
-       certain language implementations of Arrow (primarily C++ and
-       its bindings).
+       certain language implementations of Arrow (for example C++ and
+       its bindings, and Go).
+
+       .. image:: ../cpp/tables-versus-record-batches.svg
+          :alt: A graphical representation of an Arrow Table and a
+                Record Batch, with structure as described in text above.
 
        .. seealso:: :term:`chunked array`, :term:`record batch`
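
The tables.rst hunk says a table can be converted to and built from a sequence of record batches without copying the underlying array buffers. A minimal sketch of why that is cheap follows. It is a toy model using only the C++ standard library, not the Arrow API: `ToyArray`, `ToyRecordBatch`, `ToyTable`, `TableFromBatches`, and `BatchesFromTable` are hypothetical names invented for illustration, and the sketch assumes that every batch has the same columns and that every table column has the same chunk layout.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy stand-ins, NOT the Arrow API. An "array" is an immutable, shared buffer.
using ToyArray = std::shared_ptr<const std::vector<int>>;

// A record batch: one contiguous array per column.
struct ToyRecordBatch {
  std::vector<ToyArray> columns;
};

// A table: each column is a chunked array, modeled as a list of arrays.
struct ToyTable {
  std::vector<std::vector<ToyArray>> columns;
};

// Assemble a table from batches. Only shared pointers are copied, so the
// underlying buffers are shared rather than duplicated.
// Assumption: all batches have the same number of columns.
ToyTable TableFromBatches(const std::vector<ToyRecordBatch>& batches) {
  ToyTable table;
  if (batches.empty()) return table;
  table.columns.resize(batches.front().columns.size());
  for (const ToyRecordBatch& batch : batches) {
    for (std::size_t c = 0; c < batch.columns.size(); ++c) {
      table.columns[c].push_back(batch.columns[c]);
    }
  }
  return table;
}

// Split a table back into batches, one batch per chunk index.
// Assumption: every column has the same number of chunks, aligned row-wise.
std::vector<ToyRecordBatch> BatchesFromTable(const ToyTable& table) {
  std::vector<ToyRecordBatch> batches;
  if (table.columns.empty()) return batches;
  batches.resize(table.columns.front().size());
  for (const auto& chunks : table.columns) {
    for (std::size_t i = 0; i < chunks.size(); ++i) {
      batches[i].columns.push_back(chunks[i]);
    }
  }
  return batches;
}
```

In this model both directions only move pointers, which is the intuition behind the "without needing to copy" wording. The toy splitter cannot handle the mixed chunking structure the figure attributes to tables; a real reader such as the :class:`arrow::TableBatchReader` named in the hunk has to deal with that case itself.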