Start moving/adding to integration test docs
nealrichardson committed Feb 7, 2020
1 parent 2b690d6 commit 3579d5b
Showing 3 changed files with 95 additions and 72 deletions.
90 changes: 90 additions & 0 deletions docs/source/format/Integration.rst
@@ -20,8 +20,92 @@
Integration Testing
===================

Our strategy for integration testing between Arrow implementations is:

* Test datasets are specified in a custom human-readable, JSON-based format
  designed for Arrow
* Each implementation provides a testing executable capable of converting
  between the JSON and the binary Arrow file representation
* The test executable is also capable of validating the contents of a binary
  file against a corresponding JSON file
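
As a concrete sketch of the last two points, each implementation ships a small
command-line tool for these conversions. The tool name and flags below are
taken from the C++ implementation and should be treated as illustrative, since
other implementations differ:

.. code-block:: shell

   # Convert a JSON test file to the binary Arrow file format
   arrow-json-integration-test --integration --mode=JSON_TO_ARROW \
       --json=example.json --arrow=example.arrow

   # Validate a binary Arrow file against a corresponding JSON file
   arrow-json-integration-test --integration --mode=VALIDATE \
       --json=example.json --arrow=example.arrow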

Running integration tests
-------------------------

The integration test data generator and runner uses ``archery``, a Python script
that requires Python 3.6 or higher. You can create a standalone Python
distribution and environment for running the tests by using
`miniconda <https://conda.io/miniconda.html>`_. On Linux this is:

.. code-block:: shell

   MINICONDA_URL=https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
   wget -O miniconda.sh $MINICONDA_URL
   bash miniconda.sh -b -p miniconda
   export PATH=`pwd`/miniconda/bin:$PATH

   conda create -n arrow-integration python=3.6 nomkl numpy six
   conda activate arrow-integration

If you are on macOS, instead use the URL:

.. code-block:: shell

   MINICONDA_URL=https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh

Once you have Python, you can install ``archery``:

.. code-block:: shell

   pip install -e dev/archery

The integration tests are run using the ``archery integration`` command.

.. code-block:: shell

   archery integration --help

To run integration tests, you first need to build each component you want to
include. See the respective developer docs for C++, Java, etc. for build
instructions.

Some languages may require additional build options to enable integration
testing. For C++, for example, you need to add ``-DARROW_BUILD_INTEGRATION=ON``
to your CMake invocation.
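
For example, a minimal C++ configuration that enables the integration test
executables might look like the following sketch; the build directory and
parallelism here are illustrative, and the C++ developer docs describe the
full set of options:

.. code-block:: shell

   # Illustrative build; see the C++ developer docs for a complete setup.
   mkdir -p cpp/build
   cd cpp/build
   cmake .. -DARROW_BUILD_INTEGRATION=ON
   make -j4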

Depending on which components you have built, you can enable and add them to
the archery test run. For example, if you only have the C++ project built, run:

.. code-block:: shell

   archery integration --with-cpp=1

For Java, it may look like:

.. code-block:: shell

   VERSION=0.11.0-SNAPSHOT
   export ARROW_JAVA_INTEGRATION_JAR=$JAVA_DIR/tools/target/arrow-tools-$VERSION-jar-with-dependencies.jar
   archery integration --with-cpp=1 --with-java=1

To run all tests, including Flight integration tests, do:

.. code-block:: shell

   archery integration --with-all --run-flight

Note that we run these tests in continuous integration, and the CI job uses
docker-compose. You may also run the docker-compose job locally, or at least
refer to it if you have questions about how to build other languages or enable
certain tests.
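
If you want to mirror the CI job locally, the invocation looks roughly like
the following; the service name here is an assumption, so check
``docker-compose.yml`` in the repository root for the current one:

.. code-block:: shell

   # Service name is illustrative -- consult docker-compose.yml.
   docker-compose build conda-integration
   docker-compose run conda-integration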

JSON test data format
---------------------

A JSON representation of Arrow columnar data is provided for
cross-language integration testing purposes.
This representation is `not canonical <https://lists.apache.org/thread.html/6947fb7666a0f9cc27d9677d2dad0fb5990f9063b7cf3d80af5e270f%40%3Cdev.arrow.apache.org%3E>`_
but it provides a human-readable way of verifying language implementations.

See `here <https://github.com/apache/arrow/tree/master/integration/data>`_
for some examples of this JSON data.

.. can we check in more examples, e.g. from the generated_*.json test files?

The high-level structure of a JSON integration test file is as follows:

@@ -33,6 +117,9 @@ The high level structure of a JSON integration test files is as follows:
"dictionaries": [ /*DictionaryBatch*/ ],
}

All files contain ``schema`` and ``batches``, while ``dictionaries`` is only
present if there are dictionary type fields in the schema.

**Schema** ::

{
@@ -68,6 +155,9 @@ The high level structure of a JSON integration test files is as follows:
"children": [ /* FieldData */ ]
}

The "name" member of a ``Field`` in the ``Schema`` corresponds to the "name"
of a ``FieldData`` contained in the "columns" of a ``RecordBatch``.
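
For illustration, here is a minimal sketch of a complete file containing a
single nullable 32-bit integer field. The field name and values are made up;
the ``generated_*.json`` test files remain the authoritative examples::

    {
      "schema": {
        "fields": [
          {
            "name": "foo",
            "type": {"name": "int", "bitWidth": 32, "isSigned": true},
            "nullable": true,
            "children": []
          }
        ]
      },
      "batches": [
        {
          "count": 3,
          "columns": [
            {
              "name": "foo",
              "count": 3,
              "VALIDITY": [1, 0, 1],
              "DATA": [1, 0, 3]
            }
          ]
        }
      ]
    }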

Here ``BUFFER_TYPE`` is one of ``VALIDITY``, ``OFFSET`` (for
variable-length types), ``TYPE`` (for unions), or ``DATA``.
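
As another sketch (values illustrative), the ``FieldData`` for a nullable,
variable-length ``utf8`` column combines a ``VALIDITY`` buffer, an ``OFFSET``
buffer, and a ``DATA`` buffer::

    {
      "name": "bar",
      "count": 3,
      "VALIDITY": [1, 0, 1],
      "OFFSET": [0, 3, 3, 6],
      "DATA": ["abc", "", "def"]
    }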

15 changes: 4 additions & 11 deletions docs/source/index.rst
@@ -34,16 +34,6 @@ such topics as:

.. _toc.columnar:

.. Deprecated documents for Google searches
.. toctree::
:hidden:

format/Guidelines
format/Layout
format/IPC
format/Metadata

.. toctree::
:maxdepth: 2
:caption: Arrow Specifications and Protocols
@@ -62,7 +52,10 @@ such topics as:

cpp/index
python/index
java/index
Java <https://arrow.apache.org/docs/java/>
C GLib <https://arrow.apache.org/docs/c_glib/>
JavaScript <https://arrow.apache.org/docs/js/>
R <https://arrow.apache.org/docs/r/>

.. _toc.development:

62 changes: 1 addition & 61 deletions integration/README.md
@@ -19,64 +19,4 @@

# Arrow integration testing

Our strategy for integration testing between Arrow implementations is as follows:

* Test datasets are specified in a custom human-readable, JSON-based format
designed for Arrow

* Each implementation provides a testing executable capable of converting
between the JSON and the binary Arrow file representation

* The test executable is also capable of validating the contents of a binary
file against a corresponding JSON file

## Environment setup

The integration test data generator and runner is written in Python and
currently requires Python 3.6 or higher. You can create a standalone Python
distribution and environment for running the tests by using [miniconda][1]. On
Linux this is:

```shell
MINICONDA_URL=https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
wget -O miniconda.sh $MINICONDA_URL
bash miniconda.sh -b -p miniconda
export PATH=`pwd`/miniconda/bin:$PATH

conda create -n arrow-integration python=3.6 nomkl numpy six
conda activate arrow-integration
```

If you are on macOS, instead use the URL:

```shell
MINICONDA_URL=https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
```

After this, you can follow the instructions in the next section.

## Running the existing integration tests

The integration tests are run using the `archery integration` command.

```
archery integration --help
```

Depending on which components you have built, you can enable and add them to
the test run. For example, if you only have the C++ project built, you set:

```
export ARROW_CPP_EXE_PATH=$CPP_BUILD_DIR/debug
archery integration --enable-cpp=1
```

For Java, it may look like:

```
VERSION=0.11.0-SNAPSHOT
export ARROW_JAVA_INTEGRATION_JAR=$JAVA_DIR/tools/target/arrow-tools-$VERSION-jar-with-dependencies.jar
archery integration --enable-cpp=1 --enable-java=1
```

[1]: https://conda.io/miniconda.html
See https://github.com/apache/arrow/blob/master/docs/source/format/Integration.rst
