Make changes to Arrow Columnar Fromat sction (tabular data, table, etc.)
AlenkaF committed Jun 10, 2024
1 parent c644e36 commit f8bd643
Showing 1 changed file with 9 additions and 3 deletions.
12 changes: 9 additions & 3 deletions docs/source/format/Intro.rst
@@ -44,18 +44,21 @@ in-memory analytical data processing. This includes such topics as:
Arrow Columnar Format
=====================

+Apache Arrow focuses on tabular data so let's consider we have data
+which are tabular so they can be organized into a table:
+
.. figure:: ./images/columnar-diagram_1.svg
   :scale: 70%
   :alt: Diagram with tabular data of 4 rows and columns.

-Data can be represented in memory using a row based format or a column
-based format. The row based format stores data by row meaning the rows
+This kind of data can be represented in memory using a row based format or a
+column based format. The row based format stores data by row meaning the rows
are adjacent in the computer memory:

.. figure:: ./images/columnar-diagram_2.svg
   :alt: Tabular data being structured row by row in computer memory.

-In a columnar format, on the other hand, the data is organised by column
+In a columnar format, on the other hand, the data is organized by column
instead of by row making analytical operations like filtering, grouping,
aggregations and others more efficient because the CPU can maintain memory locality
and require less memory jumps to process the data. By keeping the data contiguous
@@ -64,6 +67,9 @@ CPUs have single instructions, multiple data (SIMD) enabling parallel
processing and execution of operations on vector data using a single CPU
instruction.

+Apache Arrow is solving this exact problem. It is the specification that
+uses the columnar layout.
+
.. figure:: ./images/columnar-diagram_3.svg
   :alt: Tabular data being structured column by column in computer memory.
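
The difference between the two layouts described in the changed text can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Arrow itself is implemented: the table contents are made up, and the standard-library ``array`` module stands in for a contiguous, typed column buffer.

```python
from array import array

# Hypothetical 4-row table (id, price); the values are illustrative only.
table = [
    (1, 10.5),
    (2, 20.0),
    (3, 7.25),
    (4, 12.25),
]

# Row-based layout: the values of each row are adjacent in one flat sequence.
row_layout = [value for row in table for value in row]

# Column-based layout: each column is stored as its own contiguous typed buffer.
ids = array("q", (row[0] for row in table))       # 64-bit signed integers
prices = array("d", (row[1] for row in table))    # 64-bit floats

# Summing one column in the row layout has to step over the other column's values.
row_total = sum(row_layout[1::2])

# In the columnar layout the same aggregation scans a single contiguous buffer.
col_total = sum(prices)
```

Both totals are equal; what changes is the access pattern. The columnar version reads neighbouring values of the same type, which is the memory-locality property the documentation text attributes to the columnar format.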
