diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index bf5ca08d53c32..7dcc1c9816d12 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -136,5 +136,11 @@ repos: rev: v0.9.1 hooks: - id: sphinx-lint - files: ^docs/ - args: ['--disable', 'all', '--enable', 'trailing-whitespace,missing-final-newline', 'docs'] + files: ^docs/source + exclude: ^docs/source/python/generated + args: [ + '--enable', + 'all', + '--disable', + 'dangling-hyphen,line-too-long', + ] diff --git a/docs/source/conf.py b/docs/source/conf.py index b487200555a09..1e6c113e33188 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -535,7 +535,7 @@ # # latex_appendices = [] -# It false, will not define \strong, \code, itleref, \crossref ... but only +# If false, will not define \strong, \code, \titleref, \crossref ... but only # \sphinxstrong, ..., \sphinxtitleref, ... To help avoid clash with user added # packages. # diff --git a/docs/source/cpp/acero/developer_guide.rst b/docs/source/cpp/acero/developer_guide.rst index 80ca68556fc40..7dd08fe3ce2ce 100644 --- a/docs/source/cpp/acero/developer_guide.rst +++ b/docs/source/cpp/acero/developer_guide.rst @@ -327,8 +327,8 @@ An engine could choose to create a thread task for every execution of a node. H this leads to problems with cache locality. For example, let's assume we have a basic plan consisting of three exec nodes, scan, project, and then filter (this is a very common use case). Now let's assume there are 100 batches. In a task-per-operator model we would have tasks like "Scan Batch 5", "Project Batch 5", and "Filter Batch 5". Each -of those tasks is potentially going to access the same data. For example, maybe the `project` and `filter` nodes need -to read the same column. A column which is intially created in a decode phase of the `scan` node. To maximize cache +of those tasks is potentially going to access the same data. For example, maybe the ``project`` and ``filter`` nodes need +to read the same column. 
This column is initially created during the decode phase of the ``scan`` node. To maximize cache utilization we would need to carefully schedule our tasks to ensure that all three of those tasks are run consecutively and assigned to the same CPU core. @@ -412,7 +412,7 @@ Ordered Execution ================= Some nodes either establish an ordering to their outgoing batches or they need to be able to process batches in order. -Acero handles ordering using the `batch_index` property on an ExecBatch. If a node has a deterministic output order +Acero handles ordering using the ``batch_index`` property on an ExecBatch. If a node has a deterministic output order then it should apply a batch index on batches that it emits. For example, the OrderByNode applies a new ordering to batches (regardless of the incoming ordering). The scan node is able to attach an implicit ordering to batches which reflects the order of the rows in the files being scanned. @@ -461,8 +461,8 @@ Acero's tracing is currently half-implemented and there are major gaps in profil effort at tracing with open telemetry and most of the necessary pieces are in place. The main thing currently lacking is some kind of effective visualization of the tracing results. -In order to use the tracing that is present today you will need to build with Arrow with `ARROW_WITH_OPENTELEMETRY=ON`. -Then you will need to set the environment variable `ARROW_TRACING_BACKEND=otlp_http`. This will configure open telemetry +In order to use the tracing that is present today you will need to build Arrow with ``ARROW_WITH_OPENTELEMETRY=ON``. +Then you will need to set the environment variable ``ARROW_TRACING_BACKEND=otlp_http``. This will configure open telemetry to export trace results (as OTLP) to the HTTP endpoint http://localhost:4318/v1/traces. 
You will need to configure an open telemetry collector to collect results on that endpoint and you will need to configure a trace viewer of some kind such as Jaeger: https://www.jaegertracing.io/docs/1.21/opentelemetry/ diff --git a/docs/source/cpp/acero/overview.rst b/docs/source/cpp/acero/overview.rst index 8be4cbc1b1772..34e0b143bc2ce 100644 --- a/docs/source/cpp/acero/overview.rst +++ b/docs/source/cpp/acero/overview.rst @@ -209,16 +209,16 @@ must have the same length. There are a few key differences from ExecBatch: Both the record batch and the exec batch have strong ownership of the arrays & buffers -* An `ExecBatch` does not have a schema. This is because an `ExecBatch` is assumed to be +* An ``ExecBatch`` does not have a schema. This is because an ``ExecBatch`` is assumed to be part of a stream of batches and the stream is assumed to have a consistent schema. So - the schema for an `ExecBatch` is typically stored in the ExecNode. -* Columns in an `ExecBatch` are either an `Array` or a `Scalar`. When a column is a `Scalar` - this means that the column has a single value for every row in the batch. An `ExecBatch` + the schema for an ``ExecBatch`` is typically stored in the ExecNode. +* Columns in an ``ExecBatch`` are either an ``Array`` or a ``Scalar``. When a column is a ``Scalar`` + this means that the column has a single value for every row in the batch. An ``ExecBatch`` also has a length property which describes how many rows are in a batch. So another way to - view a `Scalar` is a constant array with `length` elements. -* An `ExecBatch` contains additional information used by the exec plan. For example, an - `index` can be used to describe a batch's position in an ordered stream. We expect - that `ExecBatch` will also evolve to contain additional fields such as a selection vector. + view a ``Scalar`` is as a constant array with ``length`` elements. +* An ``ExecBatch`` contains additional information used by the exec plan. 
For example, an + ``index`` can be used to describe a batch's position in an ordered stream. We expect + that ``ExecBatch`` will also evolve to contain additional fields such as a selection vector. .. figure:: scalar_vs_array.svg @@ -231,8 +231,8 @@ only zero copy if there are no scalars in the exec batch. .. note:: Both Acero and the compute module have "lightweight" versions of batches and arrays. - In the compute module these are called `BatchSpan`, `ArraySpan`, and `BufferSpan`. In - Acero the concept is called `KeyColumnArray`. These types were developed concurrently + In the compute module these are called ``BatchSpan``, ``ArraySpan``, and ``BufferSpan``. In + Acero the concept is called ``KeyColumnArray``. These types were developed concurrently and serve the same purpose. They aim to provide an array container that can be completely stack allocated (provided the data type is non-nested) in order to avoid heap allocation overhead. Ideally these two concepts will be merged someday. @@ -247,9 +247,9 @@ execution of the nodes. Both ExecPlan and ExecNode are tied to the lifecycle of They have state and are not expected to be restartable. .. warning:: - The structures within Acero, including `ExecBatch`, are still experimental. The `ExecBatch` - class should not be used outside of Acero. Instead, an `ExecBatch` should be converted to - a more standard structure such as a `RecordBatch`. + The structures within Acero, including ``ExecBatch``, are still experimental. The ``ExecBatch`` + class should not be used outside of Acero. Instead, an ``ExecBatch`` should be converted to + a more standard structure such as a ``RecordBatch``. Similarly, an ExecPlan is an internal concept. Users creating plans should be using Declaration objects. 
APIs for consuming and executing plans should abstract away the details of the underlying diff --git a/docs/source/cpp/acero/user_guide.rst b/docs/source/cpp/acero/user_guide.rst index adcc17216e5ae..0271be2180e99 100644 --- a/docs/source/cpp/acero/user_guide.rst +++ b/docs/source/cpp/acero/user_guide.rst @@ -455,8 +455,8 @@ can be selected from :ref:`this list of aggregation functions will be added which should alleviate this constraint. The aggregation can provide results as a group or scalar. For instances, -an operation like `hash_count` provides the counts per each unique record -as a grouped result while an operation like `sum` provides a single record. +an operation like ``hash_count`` provides the counts for each unique record +as a grouped result while an operation like ``sum`` provides a single record. Scalar Aggregation example: @@ -490,7 +490,7 @@ caller will repeatedly call this function until the generator function is exhaus will accumulate in memory. An execution plan should only have one "terminal" node (one sink node). An :class:`ExecPlan` can terminate early due to cancellation or an error, before the output is fully consumed. However, the plan can be safely destroyed independently -of the sink, which will hold the unconsumed batches by `exec_plan->finished()`. +of the sink, which will hold the unconsumed batches by ``exec_plan->finished()``. As a part of the Source Example, the Sink operation is also included; @@ -515,7 +515,7 @@ The consuming function may be called before a previous invocation has completed. function does not run quickly enough then many concurrent executions could pile up, blocking the CPU thread pool. The execution plan will not be marked finished until all consuming function callbacks have been completed. 
-Once all batches have been delivered the execution plan will wait for the `finish` future to complete +Once all batches have been delivered the execution plan will wait for the ``finish`` future to complete before marking the execution plan finished. This allows for workflows where the consumption function converts batches into async tasks (this is currently done internally for the dataset write node). diff --git a/docs/source/cpp/build_system.rst b/docs/source/cpp/build_system.rst index 0c94d7e5ce5dc..e80bca4c949dc 100644 --- a/docs/source/cpp/build_system.rst +++ b/docs/source/cpp/build_system.rst @@ -167,7 +167,7 @@ file into an executable linked with the Arrow C++ shared library: .. code-block:: makefile my_example: my_example.cc - $(CXX) -o $@ $(CXXFLAGS) $< $$(pkg-config --cflags --libs arrow) + $(CXX) -o $@ $(CXXFLAGS) $< $$(pkg-config --cflags --libs arrow) Many build systems support pkg-config. For example: diff --git a/docs/source/cpp/compute.rst b/docs/source/cpp/compute.rst index 546b6e5716df7..701c7d573ac0e 100644 --- a/docs/source/cpp/compute.rst +++ b/docs/source/cpp/compute.rst @@ -514,8 +514,8 @@ Mixed time resolution temporal inputs will be cast to finest input resolution. +------------+---------------------------------------------+ It's compatible with Redshift's decimal promotion rules. All decimal digits - are preserved for `add`, `subtract` and `multiply` operations. The result - precision of `divide` is at least the sum of precisions of both operands with + are preserved for ``add``, ``subtract`` and ``multiply`` operations. The result + precision of ``divide`` is at least the sum of precisions of both operands with enough scale kept. Error is returned if the result precision is beyond the decimal value range. 
@@ -1029,7 +1029,7 @@ These functions trim off characters on both sides (trim), or the left (ltrim) or +--------------------------+------------+-------------------------+---------------------+----------------------------------------+---------+ * \(1) Only characters specified in :member:`TrimOptions::characters` will be - trimmed off. Both the input string and the `characters` argument are + trimmed off. Both the input string and the ``characters`` argument are interpreted as ASCII characters. * \(2) Only trim off ASCII whitespace characters (``'\t'``, ``'\n'``, ``'\v'``, @@ -1570,7 +1570,7 @@ is the same, even though the UTC years would be different. Timezone handling ~~~~~~~~~~~~~~~~~ -`assume_timezone` function is meant to be used when an external system produces +The ``assume_timezone`` function is meant to be used when an external system produces "timezone-naive" timestamps which need to be converted to "timezone-aware" timestamps (see for example the `definition `__ @@ -1581,11 +1581,11 @@ Input timestamps are assumed to be relative to the timezone given in UTC-relative timestamps with the timezone metadata set to the above value. An error is returned if the timestamps already have the timezone metadata set. -`local_timestamp` function converts UTC-relative timestamps to local "timezone-naive" +The ``local_timestamp`` function converts UTC-relative timestamps to local "timezone-naive" timestamps. The timezone is taken from the timezone metadata of the input -timestamps. This function is the inverse of `assume_timezone`. Please note: +timestamps. This function is the inverse of ``assume_timezone``. Please note: **all temporal functions already operate on timestamps as if they were in local -time of the metadata provided timezone**. Using `local_timestamp` is only meant to be +time of the metadata-provided timezone**. ``local_timestamp`` is only meant to be used when an external system expects local timestamps. 
+-----------------+-------+-------------+---------------+---------------------------------+-------+ @@ -1649,8 +1649,8 @@ overflow is detected. * \(1) CumulativeOptions has two optional parameters. The first parameter :member:`CumulativeOptions::start` is a starting value for the running - accumulation. It has a default value of 0 for `sum`, 1 for `prod`, min of - input type for `max`, and max of input type for `min`. Specified values of + accumulation. It has a default value of 0 for ``sum``, 1 for ``prod``, min of + input type for ``max``, and max of input type for ``min``. Specified values of ``start`` must be castable to the input type. The second parameter :member:`CumulativeOptions::skip_nulls` is a boolean. When set to false (the default), the first encountered null is propagated. When set to diff --git a/docs/source/developers/cpp/building.rst b/docs/source/developers/cpp/building.rst index 040a046c5153d..83ca4915c3355 100644 --- a/docs/source/developers/cpp/building.rst +++ b/docs/source/developers/cpp/building.rst @@ -312,7 +312,7 @@ depends on ``python`` being available). On some Linux distributions, running the test suite might require setting an explicit locale. If you see any locale-related errors, try setting the -environment variable (which requires the `locales` package or equivalent): +environment variable (which requires the ``locales`` package or equivalent): .. code-block:: diff --git a/docs/source/developers/documentation.rst b/docs/source/developers/documentation.rst index 8b1ea28c0f54b..a479065f6297e 100644 --- a/docs/source/developers/documentation.rst +++ b/docs/source/developers/documentation.rst @@ -259,7 +259,7 @@ Build the docs in the target directory: sphinx-build ./source/developers ./source/developers/_build -c ./source -D master_doc=temp_index This builds everything in the target directory to a folder inside of it -called ``_build`` using the config file in the `source` directory. 
+called ``_build`` using the config file in the ``source`` directory. Once you have verified the HTML documents, you can remove temporary index file: diff --git a/docs/source/developers/guide/step_by_step/arrow_codebase.rst b/docs/source/developers/guide/step_by_step/arrow_codebase.rst index 0beece991b197..0c194ab3a3f70 100644 --- a/docs/source/developers/guide/step_by_step/arrow_codebase.rst +++ b/docs/source/developers/guide/step_by_step/arrow_codebase.rst @@ -99,8 +99,8 @@ can be called from a function in another language. After a function is defined C++ we must create the binding manually to use it in that implementation. .. note:: - There is much you can learn by checking **Pull Requests** - and **unit tests** for similar issues. + There is much you can learn by checking **Pull Requests** + and **unit tests** for similar issues. .. tab-set:: diff --git a/docs/source/developers/guide/step_by_step/set_up.rst b/docs/source/developers/guide/step_by_step/set_up.rst index 9a2177568d6f5..9c808ceee7be6 100644 --- a/docs/source/developers/guide/step_by_step/set_up.rst +++ b/docs/source/developers/guide/step_by_step/set_up.rst @@ -118,10 +118,10 @@ Should give you a result similar to this: .. 
code:: console - origin https://github.com//arrow.git (fetch) - origin https://github.com//arrow.git (push) - upstream https://github.com/apache/arrow (fetch) - upstream https://github.com/apache/arrow (push) + origin https://github.com//arrow.git (fetch) + origin https://github.com//arrow.git (push) + upstream https://github.com/apache/arrow (fetch) + upstream https://github.com/apache/arrow (push) If you did everything correctly, you should now have a copy of the code in the ``arrow`` directory and two remotes that refer to your own GitHub diff --git a/docs/source/developers/java/development.rst b/docs/source/developers/java/development.rst index 17d47c324ce12..3f0ff6cdd0103 100644 --- a/docs/source/developers/java/development.rst +++ b/docs/source/developers/java/development.rst @@ -118,7 +118,7 @@ This checks the code style of all source code under the current directory or fro $ mvn checkstyle:check -Maven `pom.xml` style is enforced with Spotless using `Apache Maven pom.xml guidelines`_ +Maven ``pom.xml`` style is enforced with Spotless using `Apache Maven pom.xml guidelines`_. You can also just check the style without building the project. This checks the style of all pom.xml files under the current directory or from within an individual module. diff --git a/docs/source/developers/release.rst b/docs/source/developers/release.rst index 0b3a83dc5aabe..d903cc71bd5c4 100644 --- a/docs/source/developers/release.rst +++ b/docs/source/developers/release.rst @@ -106,7 +106,7 @@ If there is consensus and there is a Release Manager willing to take the effort the release a patch release can be created. Committers can tag issues that should be included on the next patch release using the -`backport-candidate` label. Is the responsability of the author or the committer to add the +``backport-candidate`` label. It is the responsibility of the author or the committer to add the label to the issue to help the Release Manager identify the issues that should be backported. 
If a specific issue is identified as the reason to create a patch release the Release Manager @@ -117,7 +117,7 @@ Be sure to go through on the following checklist: #. Create milestone #. Create maintenance branch #. Include issue that was requested as requiring new patch release -#. Add new milestone to issues with `backport-candidate` label +#. Add new milestone to issues with ``backport-candidate`` label #. cherry-pick issues into maintenance branch Creating a Release Candidate diff --git a/docs/source/format/CanonicalExtensions.rst b/docs/source/format/CanonicalExtensions.rst index c60f095dd354d..c258f889dc6ac 100644 --- a/docs/source/format/CanonicalExtensions.rst +++ b/docs/source/format/CanonicalExtensions.rst @@ -77,7 +77,7 @@ Official List Fixed shape tensor ================== -* Extension name: `arrow.fixed_shape_tensor`. +* Extension name: ``arrow.fixed_shape_tensor``. * The storage type of the extension: ``FixedSizeList`` where: @@ -153,7 +153,7 @@ Fixed shape tensor Variable shape tensor ===================== -* Extension name: `arrow.variable_shape_tensor`. +* Extension name: ``arrow.variable_shape_tensor``. * The storage type of the extension is: ``StructArray`` where struct is composed of **data** and **shape** fields describing a single diff --git a/docs/source/format/Columnar.rst b/docs/source/format/Columnar.rst index ec6a7fa5e334a..7c853de7829be 100644 --- a/docs/source/format/Columnar.rst +++ b/docs/source/format/Columnar.rst @@ -312,7 +312,7 @@ Each value in this layout consists of 0 or more bytes. While primitive arrays have a single values buffer, variable-size binary have an **offsets** buffer and **data** buffer. -The offsets buffer contains `length + 1` signed integers (either +The offsets buffer contains ``length + 1`` signed integers (either 32-bit or 64-bit, depending on the logical type), which encode the start position of each slot in the data buffer. 
The length of the value in each slot is computed using the difference between the offset @@ -374,7 +374,7 @@ locations are indicated using a **views** buffer, which may point to one of potentially several **data** buffers or may contain the characters inline. -The views buffer contains `length` view structures with the following layout: +The views buffer contains ``length`` view structures with the following layout: :: @@ -394,7 +394,7 @@ should be interpreted. In the short string case the string's bytes are inlined — stored inside the view itself, in the twelve bytes which follow the length. Any remaining bytes -after the string itself are padded with `0`. +after the string itself are padded with ``0``. In the long string case, a buffer index indicates which data buffer stores the data bytes and an offset indicates where in that buffer the diff --git a/docs/source/format/FlightSql.rst b/docs/source/format/FlightSql.rst index 181efce286e70..c37c407f57c6d 100644 --- a/docs/source/format/FlightSql.rst +++ b/docs/source/format/FlightSql.rst @@ -196,7 +196,7 @@ in the ``app_metadata`` field of the Flight RPC ``PutResult`` returned. When used with DoPut: load the stream of Arrow record batches into the specified target table and return the number of rows ingested - via a `DoPutUpdateResult` message. + via a ``DoPutUpdateResult`` message. Flight Server Session Management -------------------------------- diff --git a/docs/source/format/Integration.rst b/docs/source/format/Integration.rst index c800255687796..436747989acf3 100644 --- a/docs/source/format/Integration.rst +++ b/docs/source/format/Integration.rst @@ -501,7 +501,7 @@ integration testing actually tests. There are two types of integration test cases: the ones populated on the fly by the data generator in the Archery utility, and *gold* files that exist -in the `arrow-testing ` +in the `arrow-testing `_ repository. 
Data Generator Tests diff --git a/docs/source/java/algorithm.rst b/docs/source/java/algorithm.rst index 06ed32bd48cf7..d4838967d614f 100644 --- a/docs/source/java/algorithm.rst +++ b/docs/source/java/algorithm.rst @@ -82,7 +82,7 @@ for fixed width and variable width vectors, respectively. Both algorithms run in 3. **Index sorter**: this sorter does not actually sort the vector. Instead, it returns an integer vector, which correspond to indices of vector elements in sorted order. With the index vector, one can -easily construct a sorted vector. In addition, some other tasks can be easily achieved, like finding the ``k``th +easily construct a sorted vector. In addition, some other tasks can be easily achieved, like finding the ``k``\ th smallest value in the vector. Index sorting is supported by ``org.apache.arrow.algorithm.sort.IndexSorter``, which runs in ``O(nlog(n))`` time. It is applicable to vectors of any type. diff --git a/docs/source/java/flight_sql_jdbc_driver.rst b/docs/source/java/flight_sql_jdbc_driver.rst index cc8822247b007..f95c2ac755d97 100644 --- a/docs/source/java/flight_sql_jdbc_driver.rst +++ b/docs/source/java/flight_sql_jdbc_driver.rst @@ -162,7 +162,7 @@ the Flight SQL service as gRPC headers. For example, the following URI :: This will connect without authentication or encryption, to a Flight SQL service running on ``localhost`` on port 12345. Each request will -also include a `database=mydb` gRPC header. +also include a ``database=mydb`` gRPC header. Connection parameters may also be supplied using the Properties object when using the JDBC Driver Manager to connect. 
When supplying using diff --git a/docs/source/java/install.rst b/docs/source/java/install.rst index a551edc36c477..dc6a55c87fcd6 100644 --- a/docs/source/java/install.rst +++ b/docs/source/java/install.rst @@ -63,7 +63,7 @@ Modifying the command above for Flight: Otherwise, you may see errors like ``java.lang.IllegalAccessError: superclass access check failed: class org.apache.arrow.flight.ArrowMessage$ArrowBufRetainingCompositeByteBuf (in module org.apache.arrow.flight.core) cannot access class io.netty.buffer.CompositeByteBuf (in unnamed module ...) because module -org.apache.arrow.flight.core does not read unnamed module ... +org.apache.arrow.flight.core does not read unnamed module ...`` Finally, if you are using arrow-dataset, you'll also need to report that JDK internals need to be exposed. Modifying the command above for arrow-memory: diff --git a/docs/source/java/ipc.rst b/docs/source/java/ipc.rst index 01341ff2cc391..f5939179177d5 100644 --- a/docs/source/java/ipc.rst +++ b/docs/source/java/ipc.rst @@ -81,7 +81,7 @@ Here we used an in-memory stream, but this could have been a socket or some othe writer.end(); Note that, since the :class:`VectorSchemaRoot` in the writer is a container that can hold batches, batches flow through -:class:`VectorSchemaRoot` as part of a pipeline, so we need to populate data before `writeBatch`, so that later batches +:class:`VectorSchemaRoot` as part of a pipeline, so we need to populate data before ``writeBatch``, so that later batches could overwrite previous ones. Now the :class:`ByteArrayOutputStream` contains the complete stream which contains 5 record batches. diff --git a/docs/source/java/quickstartguide.rst b/docs/source/java/quickstartguide.rst index a71ddc5b5e55f..1f3ec861d3f46 100644 --- a/docs/source/java/quickstartguide.rst +++ b/docs/source/java/quickstartguide.rst @@ -195,10 +195,10 @@ Example: Create a dataset of names (strings) and ages (32-bit signed integers). .. 
code-block:: shell VectorSchemaRoot created: - age name - 10 Dave - 20 Peter - 30 Mary + age name + 10 Dave + 20 Peter + 30 Mary Interprocess Communication (IPC) @@ -306,10 +306,10 @@ Example: Read the dataset from the previous example from an Arrow IPC file (rand Record batches in file: 1 VectorSchemaRoot read: - age name - 10 Dave - 20 Peter - 30 Mary + age name + 10 Dave + 20 Peter + 30 Mary More examples available at `Arrow Java Cookbook`_. diff --git a/docs/source/java/substrait.rst b/docs/source/java/substrait.rst index c5857dcc23f75..fa20dbd61dbfb 100644 --- a/docs/source/java/substrait.rst +++ b/docs/source/java/substrait.rst @@ -100,9 +100,9 @@ Here is an example of a Java program that queries a Parquet file using Java Subs .. code-block:: text // Results example: - FieldPath(0) FieldPath(1) FieldPath(2) FieldPath(3) - 0 ALGERIA 0 haggle. carefully final deposits detect slyly agai - 1 ARGENTINA 1 al foxes promise slyly according to the regular accounts. bold requests alon + FieldPath(0) FieldPath(1) FieldPath(2) FieldPath(3) + 0 ALGERIA 0 haggle. carefully final deposits detect slyly agai + 1 ARGENTINA 1 al foxes promise slyly according to the regular accounts. bold requests alon Executing Projections and Filters Using Extended Expressions ============================================================ @@ -189,13 +189,13 @@ This Java program: .. code-block:: text - column-1 column-2 - 13 ROMANIA - ular asymptotes are about the furious multipliers. express dependencies nag above the ironically ironic account - 14 SAUDI ARABIA - ts. silent requests haggle. closely express packages sleep across the blithely - 12 VIETNAM - hely enticingly express accounts. even, final - 13 RUSSIA - requests against the platelets use never according to the quickly regular pint - 13 UNITED KINGDOM - eans boost carefully special requests. accounts are. carefull - 11 UNITED STATES - y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. 
unusual pinto be + column-1 column-2 + 13 ROMANIA - ular asymptotes are about the furious multipliers. express dependencies nag above the ironically ironic account + 14 SAUDI ARABIA - ts. silent requests haggle. closely express packages sleep across the blithely + 12 VIETNAM - hely enticingly express accounts. even, final + 13 RUSSIA - requests against the platelets use never according to the quickly regular pint + 13 UNITED KINGDOM - eans boost carefully special requests. accounts are. carefull + 11 UNITED STATES - y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be .. _`Substrait`: https://substrait.io/ .. _`Substrait Java`: https://github.com/substrait-io/substrait-java diff --git a/docs/source/java/table.rst b/docs/source/java/table.rst index 603910f51694f..5aa95e153cea0 100644 --- a/docs/source/java/table.rst +++ b/docs/source/java/table.rst @@ -75,7 +75,7 @@ Tables are created from a ``VectorSchemaRoot`` as shown below. The memory buffer Table t = new Table(someVectorSchemaRoot); -If you now update the vectors held by the ``VectorSchemaRoot`` (using some version of `ValueVector#setSafe()`), it would reflect those changes, but the values in table *t* are unchanged. +If you now update the vectors held by the ``VectorSchemaRoot`` (using some version of ``ValueVector#setSafe()``), it would reflect those changes, but the values in table *t* are unchanged. Creating a Table from FieldVectors ********************************** @@ -243,7 +243,7 @@ It is important to recognize that rows are NOT reified as objects, but rather op Getting a row ************* -Calling `immutableRow()` on any table instance returns a new ``Row`` instance. +Calling ``immutableRow()`` on any table instance returns a new ``Row`` instance. .. 
code-block:: Java @@ -262,7 +262,7 @@ Since rows are iterable, you can traverse a table using a standard while loop: // do something useful here } -``Table`` implements `Iterable` so you can access rows directly from a table in an enhanced *for* loop: +``Table`` implements ``Iterable`` so you can access rows directly from a table in an enhanced *for* loop: .. code-block:: Java @@ -272,7 +272,7 @@ Since rows are iterable, you can traverse a table using a standard while loop: ... } -Finally, while rows are usually iterated in the order of the underlying data vectors, but they are also positionable using the `Row#setPosition()` method, so you can skip to a specific row. Row numbers are 0-based. +Finally, rows are usually iterated in the order of the underlying data vectors, but they are also positionable using the ``Row#setPosition()`` method, so you can skip to a specific row. Row numbers are 0-based. .. code-block:: Java @@ -281,7 +281,7 @@ Finally, while rows are usually iterated in the order of the underlying data vec Any changes to position are applied to all the columns in the table. -Note that you must call `next()`, or `setPosition()` before accessing values via a row. Failure to do so results in a runtime exception. +Note that you must call ``next()`` or ``setPosition()`` before accessing values via a row. Failure to do so results in a runtime exception. Read operations using rows ************************** @@ -304,7 +304,7 @@ You can also get value using a nullable ``ValueHolder``. For example: This can be used to retrieve values without creating a new Object for each. -In addition to getting values, you can check if a value is null using `isNull()`. This is important if the vector contains any nulls, as asking for a value from a vector can cause NullPointerExceptions in some cases. +In addition to getting values, you can check if a value is null using ``isNull()``. 
This is important if the vector contains any nulls, as asking for a value from a vector can cause NullPointerExceptions in some cases. .. code-block:: Java @@ -352,13 +352,13 @@ Working with the C-Data interface The ability to work with native code is required for many Arrow features. This section describes how tables can be be exported for use with native code -Exporting works by converting the data to a ``VectorSchemaRoot`` instance and using the existing facilities to transfer the data. You could do it yourself, but that isn't ideal because conversion to a vector schema root breaks the immutability guarantees. Using the `exportTable()` methods in the `Data`_ class avoids this concern. +Exporting works by converting the data to a ``VectorSchemaRoot`` instance and using the existing facilities to transfer the data. You could do it yourself, but that isn't ideal because conversion to a vector schema root breaks the immutability guarantees. Using the ``exportTable()`` methods in the `Data`_ class avoids this concern. .. code-block:: Java Data.exportTable(bufferAllocator, table, dictionaryProvider, outArrowArray); -If the table contains dictionary-encoded vectors and was constructed with a ``DictionaryProvider``, the provider argument to `exportTable()` can be omitted and the table's provider attribute will be used: +If the table contains dictionary-encoded vectors and was constructed with a ``DictionaryProvider``, the provider argument to ``exportTable()`` can be omitted and the table's provider attribute will be used: .. code-block:: Java diff --git a/docs/source/python/api/compute.rst b/docs/source/python/api/compute.rst index ae48578a1bd61..09fd9765738dc 100644 --- a/docs/source/python/api/compute.rst +++ b/docs/source/python/api/compute.rst @@ -173,7 +173,7 @@ variants which detect domain errors where appropriate. Comparisons ----------- -These functions expect two inputs of the same type. 
If one of the inputs is `null` +These functions expect two inputs of the same type. If one of the inputs is ``null`` they return ``null``. .. autosummary:: diff --git a/docs/source/python/data.rst b/docs/source/python/data.rst index 9156157fcd0c2..f17475138c9a4 100644 --- a/docs/source/python/data.rst +++ b/docs/source/python/data.rst @@ -76,7 +76,7 @@ We use the name **logical type** because the **physical** storage may be the same for one or more types. For example, ``int64``, ``float64``, and ``timestamp[ms]`` all occupy 64 bits per value. -These objects are `metadata`; they are used for describing the data in arrays, +These objects are ``metadata``; they are used for describing the data in arrays, schemas, and record batches. In Python, they can be used in functions where the input data (e.g. Python objects) may be coerced to more than one Arrow type. @@ -99,7 +99,7 @@ types' children. For example, we can define a list of int32 values with: t6 = pa.list_(t1) t6 -A `struct` is a collection of named fields: +A ``struct`` is a collection of named fields: .. ipython:: python diff --git a/docs/source/python/extending_types.rst b/docs/source/python/extending_types.rst index 8df0ef0b1fe99..83fce84f47c08 100644 --- a/docs/source/python/extending_types.rst +++ b/docs/source/python/extending_types.rst @@ -101,7 +101,7 @@ define the ``__arrow_array__`` method to return an Arrow array:: import pyarrow return pyarrow.array(..., type=type) -The ``__arrow_array__`` method takes an optional `type` keyword which is passed +The ``__arrow_array__`` method takes an optional ``type`` keyword which is passed through from :func:`pyarrow.array`. The method is allowed to return either a :class:`~pyarrow.Array` or a :class:`~pyarrow.ChunkedArray`. 
diff --git a/docs/source/python/filesystems.rst b/docs/source/python/filesystems.rst
index 22f983a60c349..23d10aaaad720 100644
--- a/docs/source/python/filesystems.rst
+++ b/docs/source/python/filesystems.rst
@@ -182,7 +182,7 @@ Example how you can read contents from a S3 bucket::

 Note that it is important to configure :class:`S3FileSystem` with the correct
-region for the bucket being used. If `region` is not set, the AWS SDK will
+region for the bucket being used. If ``region`` is not set, the AWS SDK will
 choose a value, defaulting to 'us-east-1' if the SDK version is <1.8.
 Otherwise it will try to use a variety of heuristics (environment variables,
 configuration profile, EC2 metadata server) to resolve the region.

@@ -277,7 +277,7 @@ load time, since the library may not be in your LD_LIBRARY_PATH), and relies on
 some environment variables.

 * ``HADOOP_HOME``: the root of your installed Hadoop distribution. Often has
-  `lib/native/libhdfs.so`.
+  ``lib/native/libhdfs.so``.

 * ``JAVA_HOME``: the location of your Java SDK installation.

diff --git a/docs/source/python/install.rst b/docs/source/python/install.rst
index 4b966e6d2653d..12555c93067f9 100644
--- a/docs/source/python/install.rst
+++ b/docs/source/python/install.rst
@@ -83,7 +83,7 @@ While Arrow uses the OS-provided timezone database on Linux and macOS, it requir
 user-provided database on Windows. To download and extract the text version of
 the IANA timezone database follow the instructions in the C++
 :ref:`download-timezone-database` or use pyarrow utility function
-`pyarrow.util.download_tzdata_on_windows()` that does the same.
+``pyarrow.util.download_tzdata_on_windows()`` that does the same.

 By default, the timezone database will be detected at ``%USERPROFILE%\Downloads\tzdata``.
 If the database has been downloaded in a different location, you will need to set

diff --git a/docs/source/python/integration/extending.rst b/docs/source/python/integration/extending.rst
index b380fea7e902c..d4d099bcf43c8 100644
--- a/docs/source/python/integration/extending.rst
+++ b/docs/source/python/integration/extending.rst
@@ -474,7 +474,7 @@ Toolchain Compatibility (Linux)

 The Python wheels for Linux are built using the
 `PyPA manylinux images `_ which use
-the CentOS `devtoolset-9`. In addition to the other notes
+the CentOS ``devtoolset-9``. In addition to the other notes
 above, if you are compiling C++ using these shared libraries, you will
 need to make sure you use a compatible toolchain as well or you might
 see a segfault during runtime.

diff --git a/docs/source/python/memory.rst b/docs/source/python/memory.rst
index 23474b923718d..7b49d48ab20fa 100644
--- a/docs/source/python/memory.rst
+++ b/docs/source/python/memory.rst
@@ -46,7 +46,7 @@ parent-child relationships.

 There are many implementations of ``arrow::Buffer``, but they all provide a
 standard interface: a data pointer and length. This is similar to Python's
-built-in `buffer protocol` and ``memoryview`` objects.
+built-in ``buffer protocol`` and ``memoryview`` objects.

 A :class:`Buffer` can be created from any Python object implementing
 the buffer protocol by calling the :func:`py_buffer` function. Let's consider

diff --git a/docs/source/python/timestamps.rst b/docs/source/python/timestamps.rst
index cecbd5b595bc7..80a1b7280cbfa 100644
--- a/docs/source/python/timestamps.rst
+++ b/docs/source/python/timestamps.rst
@@ -24,7 +24,7 @@ Arrow/Pandas Timestamps

 Arrow timestamps are stored as a 64-bit integer with column metadata to
 associate a time unit (e.g. milliseconds, microseconds, or nanoseconds), and an
-optional time zone. Pandas (`Timestamp`) uses a 64-bit integer representing
+optional time zone. Pandas (``Timestamp``) uses a 64-bit integer representing
 nanoseconds and an optional time zone.
 Python/Pandas timestamp types without a associated time zone are referred to as
 "Time Zone Naive". Python/Pandas timestamp types with an associated time zone are