Start moving/adding to integration test docs
nealrichardson committed Feb 7, 2020
1 parent 2b690d6 commit 3579d5b
Showing 3 changed files with 95 additions and 72 deletions.
90 changes: 90 additions & 0 deletions docs/source/format/Integration.rst
@@ -20,8 +20,92 @@
Integration Testing
===================

Our strategy for integration testing between Arrow implementations is:

* Test datasets are specified in a custom human-readable, JSON-based format
  designed for Arrow
* Each implementation provides a testing executable capable of converting
  between the JSON and the binary Arrow file representation
* The test executable is also capable of validating the contents of a binary
  file against a corresponding JSON file
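
As a concrete sketch of the last two points, each implementation ships a small
command-line tool for these conversions. The tool name and flags below are
taken from the C++ implementation and should be treated as illustrative, since
other implementations differ:

.. code-block:: shell

   # Convert a JSON test file to the binary Arrow file format
   arrow-json-integration-test --integration --mode=JSON_TO_ARROW \
       --json=example.json --arrow=example.arrow

   # Validate a binary Arrow file against a corresponding JSON file
   arrow-json-integration-test --integration --mode=VALIDATE \
       --json=example.json --arrow=example.arrow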

Running integration tests
-------------------------

The integration test data generator and runner uses ``archery``, a Python script
that requires Python 3.6 or higher. You can create a standalone Python
distribution and environment for running the tests by using
`miniconda <https://conda.io/miniconda.html>`_. On Linux this is:

.. code-block:: shell

   MINICONDA_URL=https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
   wget -O miniconda.sh $MINICONDA_URL
   bash miniconda.sh -b -p miniconda
   export PATH=`pwd`/miniconda/bin:$PATH

   conda create -n arrow-integration python=3.6 nomkl numpy six
   conda activate arrow-integration

If you are on macOS, instead use the URL:

.. code-block:: shell

   MINICONDA_URL=https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh

Once you have Python, you can install ``archery``:

.. code-block:: shell

   pip install -e dev/archery

The integration tests are run using the ``archery integration`` command.

.. code-block:: shell

   archery integration --help

To run integration tests, you first need to build each component you want to
include. See the respective developer docs for C++, Java, etc. for build
instructions.

Some languages may require additional build options to enable integration
testing. For C++, for example, you need to add ``-DARROW_BUILD_INTEGRATION=ON``
to your CMake invocation.
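
For example, a minimal C++ configuration that enables the integration test
executables might look like the following sketch; the build directory and
parallelism here are illustrative, and the C++ developer docs describe the
full set of options:

.. code-block:: shell

   # Illustrative build; see the C++ developer docs for a complete setup.
   mkdir -p cpp/build
   cd cpp/build
   cmake .. -DARROW_BUILD_INTEGRATION=ON
   make -j4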

Depending on which components you have built, you can enable and add them to
the archery test run. For example, if you only have the C++ project built, run:

.. code-block:: shell

   archery integration --with-cpp=1

For Java, it may look like:

.. code-block:: shell

   VERSION=0.11.0-SNAPSHOT
   export ARROW_JAVA_INTEGRATION_JAR=$JAVA_DIR/tools/target/arrow-tools-$VERSION-jar-with-dependencies.jar
   archery integration --with-cpp=1 --with-java=1

To run all tests, including Flight integration tests, do:

.. code-block:: shell

   archery integration --with-all --run-flight

Note that we run these tests in continuous integration, and the CI job uses
docker-compose. You may also run the docker-compose job locally, or at least
refer to it if you have questions about how to build other languages or enable
certain tests.
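
If you want to mirror the CI job locally, the invocation looks roughly like
the following; the service name here is an assumption, so check
``docker-compose.yml`` in the repository root for the current one:

.. code-block:: shell

   # Service name is illustrative -- consult docker-compose.yml.
   docker-compose build conda-integration
   docker-compose run conda-integration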

JSON test data format
---------------------

A JSON representation of Arrow columnar data is provided for
cross-language integration testing purposes.
This representation is `not canonical <https://lists.apache.org/thread.html/6947fb7666a0f9cc27d9677d2dad0fb5990f9063b7cf3d80af5e270f%40%3Cdev.arrow.apache.org%3E>`_
but it provides a human-readable way of verifying language implementations.

See `here <https://github.com/apache/arrow/tree/master/integration/data>`_
for some examples of this JSON data.

.. can we check in more examples, e.g. from the generated_*.json test files?

The high-level structure of a JSON integration test file is as follows:

@@ -33,6 +117,9 @@ The high level structure of a JSON integration test files is as follows:
"dictionaries": [ /*DictionaryBatch*/ ],
}

All files contain ``schema`` and ``batches``, while ``dictionaries`` is only
present if there are dictionary type fields in the schema.

**Schema** ::

{
@@ -68,6 +155,9 @@ The high level structure of a JSON integration test files is as follows:
"children": [ /* FieldData */ ]
}

The "name" member of a ``Field`` in the ``Schema`` corresponds to the "name"
of a ``FieldData`` contained in the "columns" of a ``RecordBatch``.
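
For illustration, here is a minimal sketch of a complete file containing a
single nullable 32-bit integer field. The field name and values are made up;
the ``generated_*.json`` test files remain the authoritative examples::

    {
      "schema": {
        "fields": [
          {
            "name": "foo",
            "type": {"name": "int", "bitWidth": 32, "isSigned": true},
            "nullable": true,
            "children": []
          }
        ]
      },
      "batches": [
        {
          "count": 3,
          "columns": [
            {
              "name": "foo",
              "count": 3,
              "VALIDITY": [1, 0, 1],
              "DATA": [1, 0, 3]
            }
          ]
        }
      ]
    }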

Here ``BUFFER_TYPE`` is one of ``VALIDITY``, ``OFFSET`` (for
variable-length types), ``TYPE`` (for unions), or ``DATA``.
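
As another sketch (values illustrative), the ``FieldData`` for a nullable,
variable-length ``utf8`` column combines a ``VALIDITY`` buffer, an ``OFFSET``
buffer, and a ``DATA`` buffer::

    {
      "name": "bar",
      "count": 3,
      "VALIDITY": [1, 0, 1],
      "OFFSET": [0, 3, 3, 6],
      "DATA": ["abc", "", "def"]
    }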

15 changes: 4 additions & 11 deletions docs/source/index.rst
@@ -34,16 +34,6 @@ such topics as:

.. _toc.columnar:

.. Deprecated documents for Google searches
.. toctree::
:hidden:

format/Guidelines
format/Layout
format/IPC
format/Metadata

.. toctree::
:maxdepth: 2
:caption: Arrow Specifications and Protocols
@@ -62,7 +52,10 @@ such topics as:

cpp/index
python/index
java/index
Java <https://arrow.apache.org/docs/java/>
C GLib <https://arrow.apache.org/docs/c_glib/>
JavaScript <https://arrow.apache.org/docs/js/>
R <https://arrow.apache.org/docs/r/>

.. _toc.development:

62 changes: 1 addition & 61 deletions integration/README.md
@@ -19,64 +19,4 @@

# Arrow integration testing

Our strategy for integration testing between Arrow implementations is as follows:

* Test datasets are specified in a custom human-readable, JSON-based format
designed for Arrow

* Each implementation provides a testing executable capable of converting
between the JSON and the binary Arrow file representation

* The test executable is also capable of validating the contents of a binary
file against a corresponding JSON file

## Environment setup

The integration test data generator and runner is written in Python and
currently requires Python 3.6 or higher. You can create a standalone Python
distribution and environment for running the tests by using [miniconda][1]. On
Linux this is:

```shell
MINICONDA_URL=https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
wget -O miniconda.sh $MINICONDA_URL
bash miniconda.sh -b -p miniconda
export PATH=`pwd`/miniconda/bin:$PATH

conda create -n arrow-integration python=3.6 nomkl numpy six
conda activate arrow-integration
```

If you are on macOS, instead use the URL:

```shell
MINICONDA_URL=https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
```

After this, you can follow the instructions in the next section.

## Running the existing integration tests

The integration tests are run using the `archery integration` command.

```
archery integration --help
```

Depending on which components you have built, you can enable and add them to
the test run. For example, if you only have the C++ project built, you set:

```
export ARROW_CPP_EXE_PATH=$CPP_BUILD_DIR/debug
archery integration --enable-cpp=1
```

For Java, it may look like:

```
VERSION=0.11.0-SNAPSHOT
export ARROW_JAVA_INTEGRATION_JAR=$JAVA_DIR/tools/target/arrow-tools-$VERSION-jar-with-dependencies.jar
archery integration --enable-cpp=1 --enable-java=1
```

[1]: https://conda.io/miniconda.html
See https://github.com/apache/arrow/blob/master/docs/source/format/Integration.rst
