Fix format issues in basic_arrow.rst
amoeba committed Feb 7, 2024
1 parent f609bb1 commit d15468a
Showing 1 changed file with 36 additions and 36 deletions.
72 changes: 36 additions & 36 deletions docs/source/cpp/tutorials/basic_arrow.rst
@@ -25,15 +25,15 @@ Basic Arrow Data Structures
===========================

Apache Arrow provides fundamental data structures for representing data:
:class:`Array`, :class:`ChunkedArray`, :class:`RecordBatch`, and :class:`Table`.
This article shows how to construct these data structures from primitive
data types; specifically, we will work with integers of varying size
representing days, months, and years. We will use them to create the following data structures:

#. Arrow :class:`Arrays <Array>`
#. :class:`ChunkedArrays<ChunkedArray>`
#. :class:`RecordBatch`, from :class:`Arrays <Array>`
#. :class:`Table`, from :class:`ChunkedArrays<ChunkedArray>`

Pre-requisites
--------------
@@ -50,14 +50,14 @@ Setup
Before trying out Arrow, we need to fill in a couple of gaps:

1. We need to include the necessary headers.

2. A ``main()`` is needed to glue things together.

Includes
^^^^^^^^

First, as ever, we need some includes. We'll get ``iostream`` for output, then import Arrow's basic
functionality from ``api.h``, like so:

.. literalinclude:: ../../../../cpp/examples/tutorial_examples/arrow_example.cc
   :language: cpp
@@ -75,14 +75,14 @@ following:
   :start-after: (Doc section: Main)
   :end-before: (Doc section: Main)

This allows us to easily use Arrow's error-handling macros, which will
return to ``main()`` with a :class:`arrow::Status` object if a failure occurs – and
this ``main()`` will report the error. Note that this means Arrow never
raises exceptions, instead relying upon returning :class:`Status`. For more on
that, see :doc:`/cpp/conventions`.
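
The snippet below is a minimal sketch of that error flow, not code from the tutorial's example file. ``CheckPositive()``, ``Validate()`` and ``ReportStatus()`` are hypothetical names used only for illustration; only :class:`arrow::Status` and :c:macro:`ARROW_RETURN_NOT_OK` come from Arrow, and the headers are assumed to be available as ``arrow/api.h``:

.. code-block:: cpp

   #include <arrow/api.h>

   #include <cstdlib>
   #include <iostream>

   // Hypothetical check that fails for non-positive input.
   arrow::Status CheckPositive(int value) {
     if (value <= 0) {
       // Status::Invalid() concatenates its arguments into the message.
       return arrow::Status::Invalid("value must be positive, got ", value);
     }
     return arrow::Status::OK();
   }

   // If CheckPositive() fails, ARROW_RETURN_NOT_OK returns that Status from
   // Validate() immediately, so the print below only runs on success.
   arrow::Status Validate(int value) {
     ARROW_RETURN_NOT_OK(CheckPositive(value));
     std::cout << value << " is valid" << std::endl;
     return arrow::Status::OK();
   }

   // Reports a failed Status and converts it into a process exit code.
   int ReportStatus(const arrow::Status& st) {
     if (!st.ok()) {
       std::cerr << st.ToString() << std::endl;
       return EXIT_FAILURE;
     }
     return EXIT_SUCCESS;
   }

Calling ``Validate(0)`` yields a :class:`Status` for which ``IsInvalid()`` is true, and ``ReportStatus()`` turns it into a non-zero exit code, similar in spirit to the ``main()`` described above.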

To accompany this ``main()``, we have a ``RunMain()`` from which any :class:`Status`
objects can return – this is where we'll write the rest of the program:

.. literalinclude:: ../../../../cpp/examples/tutorial_examples/arrow_example.cc
   :language: cpp
@@ -97,14 +97,14 @@ Building int8 Arrays
^^^^^^^^^^^^^^^^^^^^

Given that we have some data in standard C++ arrays, and want to use Arrow, we need to move
the data from said arrays into Arrow arrays. An :class:`Array` still stores its values in contiguous
memory, so there's no need to worry about a performance loss when using :class:`Array` instead of C++ arrays.
The easiest way to construct an :class:`Array` is with an :class:`ArrayBuilder`.

.. seealso:: :doc:`/cpp/arrays` for more technical details on :class:`Array`

The following code initializes an :class:`ArrayBuilder` for an :class:`Array` that will hold 8-bit
integers. Specifically, it uses the ``AppendValues()`` method, present in concrete
:class:`arrow::ArrayBuilder` subclasses, to fill the :class:`ArrayBuilder` with the
contents of a standard C++ array. Note the use of :c:macro:`ARROW_RETURN_NOT_OK`.
If ``AppendValues()`` fails, this macro will return to ``main()``, which will
@@ -115,10 +115,10 @@ print out the meaning of the failure.
   :start-after: (Doc section: int8builder 1 Append)
   :end-before: (Doc section: int8builder 1 Append)
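
As a standalone sketch of the append step (the function name ``AppendDays()`` and the values are made up for illustration), a builder can take a whole C++ array through ``AppendValues()`` or single values through ``Append()``, and ``length()`` reports how many values it currently holds:

.. code-block:: cpp

   #include <arrow/api.h>

   #include <cstdint>

   // Hypothetical helper: fills an Int8Builder and returns how many values it holds.
   arrow::Result<int64_t> AppendDays() {
     int8_t days_raw[5] = {1, 12, 17, 23, 28};

     arrow::Int8Builder builder;
     // Copy all five values from the C++ array in one call.
     ARROW_RETURN_NOT_OK(builder.AppendValues(days_raw, 5));
     // Individual values can be appended one at a time, too.
     ARROW_RETURN_NOT_OK(builder.Append(static_cast<int8_t>(30)));
     return builder.length();  // 5 + 1 = 6 values so far
   }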

Once an :class:`ArrayBuilder` holds the values we want in our :class:`Array`, we can use
:func:`ArrayBuilder::Finish` to output the final structure to an :class:`Array` – specifically,
we output to a ``std::shared_ptr<arrow::Array>``. Note the use of :c:macro:`ARROW_ASSIGN_OR_RAISE`
in the following code. :func:`~ArrayBuilder::Finish` returns an :class:`arrow::Result` object, which :c:macro:`ARROW_ASSIGN_OR_RAISE`
can unwrap. If the method fails, the macro will return to ``main()`` with a :class:`Status`
that explains what went wrong. If it succeeds, the macro assigns
the final output to the left-hand variable.
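
A minimal sketch of the finishing step, assuming the same kind of ``int8_t`` input as above; ``BuildDaysArray()`` is a hypothetical helper, not part of Arrow. Returning an :class:`arrow::Result` from it lets a caller apply :c:macro:`ARROW_ASSIGN_OR_RAISE` in turn:

.. code-block:: cpp

   #include <arrow/api.h>

   #include <cstdint>
   #include <memory>

   // Hypothetical helper: builds an Int8Array wrapped in an arrow::Result.
   arrow::Result<std::shared_ptr<arrow::Array>> BuildDaysArray() {
     int8_t days_raw[5] = {1, 12, 17, 23, 28};
     arrow::Int8Builder builder;
     ARROW_RETURN_NOT_OK(builder.AppendValues(days_raw, 5));

     // Finish() returns arrow::Result<std::shared_ptr<arrow::Array>>.
     // ARROW_ASSIGN_OR_RAISE either assigns the value to `days` or returns
     // the error Status from this function.
     std::shared_ptr<arrow::Array> days;
     ARROW_ASSIGN_OR_RAISE(days, builder.Finish());
     return days;
   }

A caller could then write ``ARROW_ASSIGN_OR_RAISE(auto days, BuildDaysArray());`` and get either the array or an early return with the error.
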
@@ -141,7 +141,7 @@ Building int16 Arrays

An :class:`ArrayBuilder` has its type specified at the time of declaration,
and that type cannot be changed afterward. We have to make a new one when we switch to year data, which
requires at least a 16-bit integer. Of course, there's an :class:`ArrayBuilder` for that.
It uses the exact same methods, but with the new data type:

.. literalinclude:: ../../../../cpp/examples/tutorial_examples/arrow_example.cc
@@ -154,16 +154,16 @@ Now, we have three Arrow :class:`Arrays <arrow::Array>`, with some variance in t
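
The same pattern with a 16-bit builder, as a hedged sketch (``BuildYearsArray()`` and the sample years are illustrative only). Year values such as 1990 exceed the ``int8_t`` maximum of 127, so the data needs :class:`arrow::Int16Builder`, and the resulting :class:`Array` reports ``int16`` as its type:

.. code-block:: cpp

   #include <arrow/api.h>

   #include <cstdint>
   #include <memory>

   // Hypothetical helper: years exceed int8_t's range, so an Int16Builder is used.
   arrow::Result<std::shared_ptr<arrow::Array>> BuildYearsArray() {
     int16_t years_raw[5] = {1990, 2000, 1995, 2000, 1995};
     // The builder class fixes the element type; it cannot be switched later.
     arrow::Int16Builder builder;
     ARROW_RETURN_NOT_OK(builder.AppendValues(years_raw, 5));
     std::shared_ptr<arrow::Array> years;
     ARROW_ASSIGN_OR_RAISE(years, builder.Finish());
     return years;
   }
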
Making a RecordBatch
--------------------

A columnar data format only really comes into play when you have a table.
So, let's make one. The first kind we'll make is the :class:`RecordBatch` – this
uses :class:`Arrays <Array>` internally, which means all data will be contiguous within each
column, but any appending or concatenating will require copying. Making a :class:`RecordBatch`
has two steps, given existing :class:`Arrays <Array>`:

#. Defining a :class:`Schema`
#. Loading the :class:`Schema` and Arrays into the constructor

Defining a Schema
^^^^^^^^^^^^^^^^^

To get started making a :class:`RecordBatch`, we first need to define
@@ -180,8 +180,8 @@ so:
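
For reference, a :class:`Schema` for the day, month, and year columns could be sketched as follows. The helper name ``MakeDateSchema()`` is hypothetical, and the field names are assumptions chosen to match the data described above. Each :class:`Field` pairs a column name with a data type:

.. code-block:: cpp

   #include <arrow/api.h>

   #include <memory>

   // Hypothetical helper: one Field (column name + data type) per column.
   std::shared_ptr<arrow::Schema> MakeDateSchema() {
     std::shared_ptr<arrow::Field> field_day = arrow::field("Day", arrow::int8());
     std::shared_ptr<arrow::Field> field_month = arrow::field("Month", arrow::int8());
     std::shared_ptr<arrow::Field> field_year = arrow::field("Year", arrow::int16());
     return arrow::schema({field_day, field_month, field_year});
   }
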
Building a RecordBatch
^^^^^^^^^^^^^^^^^^^^^^

With data in :class:`Arrays <Array>` from the previous section, and column descriptions in our
:class:`Schema` from the previous step, we can make the :class:`RecordBatch`. Note that the
constructor also needs the length of the columns (the number of rows), which is shared by all columns.

.. literalinclude:: ../../../../cpp/examples/tutorial_examples/arrow_example.cc
@@ -190,26 +190,26 @@ length of the columns is necessary, and the length is shared by all columns.
   :end-before: (Doc section: RBatch)

Now, we have our data in a nice tabular form, safely within the :class:`RecordBatch`.
What we can do with this will be discussed in the later tutorials.
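
Putting a schema and arrays together, here is a condensed, self-contained sketch with only two columns and three rows (``MakeDayMonthBatch()`` and its values are illustrative, not taken from the tutorial). It reuses a single builder, since ``Finish()`` resets the builder, and calls ``Validate()`` as an explicit consistency check of the columns against the schema:

.. code-block:: cpp

   #include <arrow/api.h>

   #include <cstdint>
   #include <memory>

   // Hypothetical helper: a two-column, three-row RecordBatch.
   arrow::Result<std::shared_ptr<arrow::RecordBatch>> MakeDayMonthBatch() {
     int8_t days_raw[3] = {1, 12, 17};
     int8_t months_raw[3] = {1, 3, 5};

     // One builder serves both columns because Finish() resets it.
     arrow::Int8Builder builder;
     std::shared_ptr<arrow::Array> days, months;
     ARROW_RETURN_NOT_OK(builder.AppendValues(days_raw, 3));
     ARROW_ASSIGN_OR_RAISE(days, builder.Finish());
     ARROW_RETURN_NOT_OK(builder.AppendValues(months_raw, 3));
     ARROW_ASSIGN_OR_RAISE(months, builder.Finish());

     std::shared_ptr<arrow::Schema> schema = arrow::schema(
         {arrow::field("Day", arrow::int8()), arrow::field("Month", arrow::int8())});

     // RecordBatch::Make(schema, num_rows, columns): the row count comes second
     // and must match the length of every column.
     std::shared_ptr<arrow::RecordBatch> batch =
         arrow::RecordBatch::Make(schema, days->length(), {days, months});
     // Explicit consistency check of columns against schema and row count.
     ARROW_RETURN_NOT_OK(batch->Validate());
     return batch;
   }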

Making a ChunkedArray
---------------------

Let's say that we want an array made up of sub-arrays, because it
can be useful for avoiding data copies when concatenating, for parallelizing work, for fitting each chunk
neatly into cache, or for exceeding the 2,147,483,647 row limit in a
standard Arrow :class:`Array`. For this, Arrow offers :class:`ChunkedArray`, which can be
made up of individual Arrow :class:`Arrays <Array>`. In this example, we can reuse the arrays
we made earlier as part of our chunked array, allowing us to extend them without having to copy
data. So, let's build a few more :class:`Arrays <Array>`,
using the same builders for ease of use:

.. literalinclude:: ../../../../cpp/examples/tutorial_examples/arrow_example.cc
   :language: cpp
   :start-after: (Doc section: More Arrays)
   :end-before: (Doc section: More Arrays)

In order to support an arbitrary number of :class:`Arrays <Array>` in the construction of the
:class:`ChunkedArray`, Arrow supplies :class:`ArrayVector`. This is a ``std::vector`` of ``std::shared_ptr<arrow::Array>``,
and we'll use it here to prepare to make a :class:`ChunkedArray`:

@@ -233,18 +233,18 @@ for the month and year data:
   :start-after: (Doc section: ChunkedArray Month Year)
   :end-before: (Doc section: ChunkedArray Month Year)

With that, we are left with three :class:`ChunkedArrays <ChunkedArray>`, varying in type.
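
A compact, self-contained sketch of the idea (``MakeDayChunks()`` and its values are hypothetical): two separately built ``int8`` arrays are collected in an ``arrow::ArrayVector`` and handed to the :class:`ChunkedArray` constructor, which holds shared pointers to the existing arrays rather than copying their data:

.. code-block:: cpp

   #include <arrow/api.h>

   #include <cstdint>
   #include <memory>

   // Hypothetical helper: joins two separately built Int8 arrays into one
   // logical column of five values.
   arrow::Result<std::shared_ptr<arrow::ChunkedArray>> MakeDayChunks() {
     int8_t first_raw[3] = {1, 12, 17};
     int8_t second_raw[2] = {6, 8};

     arrow::Int8Builder builder;
     std::shared_ptr<arrow::Array> first, second;
     ARROW_RETURN_NOT_OK(builder.AppendValues(first_raw, 3));
     ARROW_ASSIGN_OR_RAISE(first, builder.Finish());
     ARROW_RETURN_NOT_OK(builder.AppendValues(second_raw, 2));
     ARROW_ASSIGN_OR_RAISE(second, builder.Finish());

     // arrow::ArrayVector is std::vector<std::shared_ptr<arrow::Array>>; the
     // ChunkedArray shares these arrays instead of copying their buffers.
     arrow::ArrayVector chunks{first, second};
     return std::make_shared<arrow::ChunkedArray>(chunks);
   }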

Making a Table
--------------

One particularly useful thing we can do with the :class:`ChunkedArrays <ChunkedArray>` from the previous section is to create
:class:`Tables <Table>`. Much like a :class:`RecordBatch`, a :class:`Table` stores tabular data. However, a
:class:`Table` does not guarantee contiguity, due to being made up of :class:`ChunkedArrays <ChunkedArray>`.
This can be useful for parallelizing work, for fitting chunks into cache, or for exceeding the 2,147,483,647 row limit
present in :class:`Array` and, thus, :class:`RecordBatch`.

If you read the :class:`RecordBatch` section, you may notice that the :class:`Table` constructor in the following code is
effectively identical; it just puts the length of the columns
in position 3, and makes a :class:`Table`. We reuse the :class:`Schema` from before, and
make our :class:`Table`:
@@ -255,23 +255,23 @@ make our :class:`Table`:
   :end-before: (Doc section: Table)

Now, we have our data in a nice tabular form, safely within the :class:`Table`.
What we can do with this will be discussed in the later tutorials.
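
A condensed sketch of building a :class:`Table` from a two-chunk column, with made-up values (``Int8Chunk()`` and ``MakeDayTable()`` are hypothetical helpers). Note the argument order of ``arrow::Table::Make()``: schema, columns, then number of rows:

.. code-block:: cpp

   #include <arrow/api.h>

   #include <cstdint>
   #include <memory>

   // Hypothetical helper: builds one Int8 chunk from a C++ array.
   arrow::Result<std::shared_ptr<arrow::Array>> Int8Chunk(const int8_t* values,
                                                          int64_t n) {
     arrow::Int8Builder builder;
     ARROW_RETURN_NOT_OK(builder.AppendValues(values, n));
     return builder.Finish();
   }

   // Hypothetical helper: a one-column Table whose column has two chunks.
   arrow::Result<std::shared_ptr<arrow::Table>> MakeDayTable() {
     int8_t first_raw[3] = {1, 12, 17};
     int8_t second_raw[2] = {6, 8};
     ARROW_ASSIGN_OR_RAISE(auto first, Int8Chunk(first_raw, 3));
     ARROW_ASSIGN_OR_RAISE(auto second, Int8Chunk(second_raw, 2));
     auto days = std::make_shared<arrow::ChunkedArray>(arrow::ArrayVector{first, second});

     std::shared_ptr<arrow::Schema> schema =
         arrow::schema({arrow::field("Day", arrow::int8())});
     // Table::Make(schema, columns, num_rows): the row count comes third,
     // unlike RecordBatch::Make(schema, num_rows, columns).
     std::shared_ptr<arrow::Table> table =
         arrow::Table::Make(schema, {days}, days->length());
     ARROW_RETURN_NOT_OK(table->Validate());
     return table;
   }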

Ending Program
--------------

At the end, we just return :func:`Status::OK()`, so that ``main()`` knows that
we're done, and that everything's okay.

.. literalinclude:: ../../../../cpp/examples/tutorial_examples/arrow_example.cc
   :language: cpp
   :start-after: (Doc section: Ret)
   :end-before: (Doc section: Ret)

Wrapping Up
-----------

With that, you've created the fundamental data structures in Arrow, and
can move on to getting them in and out of a program with file I/O in the next article.

Refer to the following for a copy of the complete code:
@@ -281,4 +281,4 @@ Refer to the below for a copy of the complete code:
   :start-after: (Doc section: Basic Example)
   :end-before: (Doc section: Basic Example)
   :linenos:
   :lineno-match:
