Conversation

alphalfalfa commented Feb 6, 2018

Currently the take kernel supports:

  • Primitive value types (NA, Boolean and numeric types)
  • Single numeric index
  • Multiple flat indices in an array

The kernel does not yet handle special indices such as negative integers or out-of-bounds indices; ARROW-2105 has been created for that feature. The parent ticket is ARROW-772.
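
For orientation, here is a minimal sketch of invoking the kernel as proposed in this PR, using the `Take` signature quoted in the review below; the header paths and `FunctionContext` construction are assumptions about this era's API:

```cpp
#include "arrow/api.h"
#include "arrow/compute/api.h"  // assumed umbrella header for compute

// Take rows from `values` at positions given by flat integer `indices`.
arrow::Status TakeExample(const arrow::Array& values, const arrow::Array& indices,
                          std::shared_ptr<arrow::Array>* out) {
  arrow::compute::FunctionContext ctx(arrow::default_memory_pool());
  arrow::compute::TakeOptions options;  // out-of-bounds modes not yet implemented
  return arrow::compute::Take(&ctx, values, indices, options, out);
}
```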

wagavulin and others added 30 commits October 20, 2017 09:42
The current build script for Travis CI (`ci/travis_before_script_c_glib.sh`) does not use the `--enable-gtk-doc` option on Mac. The option can now be used on Mac as well, so it should be enabled to check the build status.

Author: Wataru Shimizu <waruzilla@gmail.com>

Closes apache#1195 from wagavulin/ci-mac-enable-gtk-doc and squashes the following commits:

1b93aa7 [Wataru Shimizu] Add environment variable for xsltproc on macOS
675c910 [Wataru Shimizu] Enable gtk-doc on Travis CI Mac environment
…eated in custom serializer

This uses the NumPyBuffer built into Arrow's Tensor facility to protect the numpy arrays holding the Tensors to be serialized. See also the problem description in https://issues.apache.org/jira/browse/ARROW-1695.

Author: Philipp Moritz <pcmoritz@gmail.com>

Closes apache#1220 from pcmoritz/fix-serialize-tensors and squashes the following commits:

7e23bb5 [Philipp Moritz] fix linting
dce92ad [Philipp Moritz] fix handling of numpy arrays generated in the custom serializer methods
… directory

Omitting the trailing slash results in a `java/apidocs` subdirectory being created

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#1221 from wesm/ARROW-1686 and squashes the following commits:

565681e [Wes McKinney] Move API docs generation readme to RELEASE_MANAGEMENT.md
2fcf5fc [Wes McKinney] rsync contents of apidocs directory into site java directory
…ch more than once

Author: Brian Hulette <brian.hulette@ccri.com>

Closes apache#1222 from TheNeuralBit/ARROW-1698 and squashes the following commits:

aec2f6d [Brian Hulette] readDictionary now returns a single vector or null
f7b251b [Brian Hulette] only load a dictionary once
Author: Uwe L. Korn <uwelk@xhochy.com>

Closes apache#1230 from xhochy/update-manylinux1-jemalloc and squashes the following commits:

6a29fa8 [Uwe L. Korn] Update jemalloc in manylinux1 build
This is being used in PySpark. After 0.8.0 we should migrate Spark to use the `pyarrow.types` APIs and then remove this from the top level namespace again

cc @BryanCutler

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#1229 from wesm/ARROW-1683 and squashes the following commits:

7378d9d [Wes McKinney] Restore TimestampType to pyarrow namespace
This will save us repeating ourselves when users report bugs or request features on GitHub

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#1226 from wesm/ARROW-1697 and squashes the following commits:

8a2ed58 [Wes McKinney] Add note about mailing list
b11e497 [Wes McKinney] Add ISSUE_TEMPLATE.md
…ation

This also restructures the code such that it is easier to reset the default serialization context back to the initial state after more handlers have been registered.

Author: Philipp Moritz <pcmoritz@gmail.com>

Closes apache#1223 from pcmoritz/torch-tensor and squashes the following commits:

db09afc [Philipp Moritz] fix test
ba0856c [Philipp Moritz] don't run pytorch test on python 2.7
264d199 [Philipp Moritz] remove import that is not required
882d9a5 [Philipp Moritz] small restructuring and support all PyTorch tensor types
c6dac9e [Philipp Moritz] add -q flag
23de67b [Philipp Moritz] add -y to torch installation
9814897 [Philipp Moritz] test torch tensor conversion
375bbfa [Philipp Moritz] support serializing torch tensors
This is caused by the GObject Introspection package in Ubuntu 14.04 being old.
Recent GObject Introspection doesn't have this problem.
This PR makes it possible to add serialization handlers that allow deserializing Python objects using zero copy. If the serialization handler returns an Arrow buffer, it will be appended to the serialized objects, and the object can be reconstructed from there without copying the buffer.

TODO before merge:

- [x] Add pandas zero copy buffer read
- [x] See if fixing the TODO about union tags makes the code cleaner

Author: Philipp Moritz <pcmoritz@gmail.com>

Closes apache#1231 from pcmoritz/buffer-serialization and squashes the following commits:

10a01c0 [Philipp Moritz] run clang-format
529366d [Philipp Moritz] fix linting
7d88a67 [Philipp Moritz] fixes and make pandas deserialization zero-copy
2b75a0d [Philipp Moritz] fix linting
e078cb0 [Philipp Moritz] use union array fields to distinguish between ints and tensor/buffer references
dbd2364 [Philipp Moritz] add read path
0cc6d45 [Philipp Moritz] add read path
9f0173d [Philipp Moritz] buffer serialization write path
Initially I was looking to see if we could gracefully skip the tests when we have the requisite libraries available but are unable to connect to a running HDFS cluster, but I think this would mask errors that we want to see. So this just makes it possible for developers who don't want to see the test failure to skip building the unit test executable.

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#1232 from wesm/ARROW-641 and squashes the following commits:

eed9ea4 [Wes McKinney] Do not build io-hdfs-test if ARROW_HDFS is off
…s in all builds

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#1219 from wesm/ARROW-1087 and squashes the following commits:

5444b82 [Wes McKinney] Use more stable include
6cc855d [Wes McKinney] Add stub rst file about C extensions with pyarrow
…actor existing code to new variant

We should be very strict about not returning Status from functions that cannot fail.

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#1227 from wesm/ARROW-1671 and squashes the following commits:

8a15a06 [Wes McKinney] Deprecate arrow::MakeArray that returns Status, refactor existing code to use new MakeArray returning std::shared_ptr<Array>
Author: Uwe L. Korn <uwelk@xhochy.com>

Closes apache#1236 from xhochy/ARROW-1708 and squashes the following commits:

0695745 [Uwe L. Korn] ARROW-1708: [JS] Fix linter error
Author: Uwe L. Korn <uwelk@xhochy.com>

Closes apache#1235 from xhochy/ARROW-1707 and squashes the following commits:

21a6734 [Uwe L. Korn] Add link to GitBox for account setup
d484bb0 [Uwe L. Korn] ARROW-1707: Update dev README after movement to GitBox
…ng, improve docs

This was actually already documented. I improved the docstring for `ParquetWriter` and added a unit test to validate the user API. Also added `ParquetWriter` to the API listing.

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#1218 from wesm/ARROW-571 and squashes the following commits:

e4f44b4 [Wes McKinney] Use py27-compatible class __doc__ setting
bb16d9b [Wes McKinney] Add ParquetWriter to API listing
4cda8ab [Wes McKinney] Add unit test for incremental Parquet file building
…nit tests

In the event that the offsets array has nulls, this will backward-fill the offsets to compute the correct value sizes.
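
A minimal sketch of the C++ entry point this commit adds, assuming the `ListArray::FromArrays(offsets, values, pool, out)` signature:

```cpp
#include "arrow/api.h"

arrow::Status BuildListWithNullOffset(std::shared_ptr<arrow::Array>* out) {
  arrow::Int32Builder offsets_builder;
  RETURN_NOT_OK(offsets_builder.Append(0));
  RETURN_NOT_OK(offsets_builder.Append(2));
  RETURN_NOT_OK(offsets_builder.AppendNull());  // null offset -> a null list slot
  RETURN_NOT_OK(offsets_builder.Append(5));
  std::shared_ptr<arrow::Array> offsets;
  RETURN_NOT_OK(offsets_builder.Finish(&offsets));

  arrow::Int64Builder values_builder;
  for (int64_t v : {1, 2, 3, 4, 5}) {
    RETURN_NOT_OK(values_builder.Append(v));
  }
  std::shared_ptr<arrow::Array> values;
  RETURN_NOT_OK(values_builder.Finish(&values));

  // Produces a list<int64> array of length 3; the slot corresponding to the
  // null offset is null, and the remaining offsets are backward-filled.
  return arrow::ListArray::FromArrays(*offsets, *values,
                                      arrow::default_memory_pool(), out);
}
```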

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#1224 from wesm/ARROW-507 and squashes the following commits:

9027c14 [Wes McKinney] Clean valid bits to remove trailing set bit
8d2cb51 [Wes McKinney] Implement / add tests for ListArray.from_arrays in Python
1c6a870 [Wes McKinney] Complete C++ implementation, unit test for ListArray::FromArrays, handling of offsets with nulls
This patch includes ARROW-1172, which I put up separately. Putting this up for comments on the API before I go too far down the rabbit hole. The idea is to make it simpler for users to construct record batches given a known schema; for example, this could be used in turbodbc or another database driver.

Incidentally, this closes apache#810.
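
A short sketch of the API direction described above, assuming the `Make`/`GetFieldAs`/`Flush` names from the commit log below:

```cpp
#include "arrow/api.h"

arrow::Status BuildBatch(std::shared_ptr<arrow::RecordBatch>* batch) {
  auto schema = arrow::schema({arrow::field("x", arrow::int64())});
  std::unique_ptr<arrow::RecordBatchBuilder> builder;
  RETURN_NOT_OK(arrow::RecordBatchBuilder::Make(
      schema, arrow::default_memory_pool(), &builder));
  // Append through the typed builder for field 0, then flush a completed
  // batch (per this PR, Flush takes a flag controlling whether builders reset).
  auto x_builder = builder->GetFieldAs<arrow::Int64Builder>(0);
  RETURN_NOT_OK(x_builder->Append(42));
  return builder->Flush(batch);
}
```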

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#802 from wesm/ARROW-1114 and squashes the following commits:

5f104c4 [Wes McKinney] Rename Create to Make, remove const method versions
a8b6c5c [Wes McKinney] Update doxygen comments, change FlushAndReset to instead have Flush with a flag for resetting
2573ae7 [Wes McKinney] Test invalid field length
256419f [Wes McKinney] Basic test passing
be7e232 [Wes McKinney] Start table_builder-test.cc
9660502 [Wes McKinney] Draft RecordBatchBuilder, no tests or benchmarks yet
Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#1238 from wesm/ARROW-1654 and squashes the following commits:

2e6f9e3 [Wes McKinney] Add pickling test cases for timestamp, decimal
1827b23 [Wes McKinney] Fix pickling on py27, implement for Schema. Also pickle field/schema metadata
1395583 [Wes McKinney] Implement pickling for list, struct, add __richcmp__ for Field
366f428 [Wes McKinney] Start implementing pickling for DataType, Field
This closes [ARROW-1720](https://issues.apache.org/jira/browse/ARROW-1720).

Author: Licht-T <licht-t@outlook.jp>

Closes apache#1243 from Licht-T/fix-unbound-chunk and squashes the following commits:

cabdd43 [Licht-T] TST: Add bounds check tests for chunk getter
bda7f4c [Licht-T] BUG: Implement bounds check in chunk getter
This got messed up during one of the patches in which these files were refactored. Once the build fails, I will fix the lint errors

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#1242 from wesm/ARROW-1711 and squashes the following commits:

cd4b655 [Wes McKinney] Fix more flake8 warnings
2eb8bf4 [Wes McKinney] Fix flake8 issues
cef7a7c [Wes McKinney] Fix flake8 calls to lint the right directories
… to avoid using nullptr in public headers

cc @TobyShaw. Can you test this?

Closes apache#1098

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#1228 from wesm/ARROW-1134 and squashes the following commits:

bf18158 [Wes McKinney] Only define NULLPTR if not already defined
a51dd88 [Wes McKinney] Add NULLPTR macro to avoid using nullptr in public headers for C++/CLI users
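
The NULLPTR change above can be sketched as a header guard ("only define NULLPTR if not already defined"); the C++/CLI branch using MSVC's `__nullptr` is an assumption:

```cpp
// nullptr is not usable in code compiled as C++/CLI, so public headers use a
// macro that client code can override.
#ifndef NULLPTR
#ifdef __cplusplus_cli
#define NULLPTR __nullptr  // assumption: MSVC C++/CLI null literal
#else
#define NULLPTR nullptr
#endif
#endif
```
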
Author: Phillip Cloud <cpcloud@gmail.com>

Closes apache#1211 from cpcloud/ARROW-1588 and squashes the following commits:

ae0d562 [Phillip Cloud] ARROW-1588: [C++/Format] Harden Decimal Format
Author: Kouhei Sutou <kou@clear-code.com>

Closes apache#1247 from kou/c-glib-release-verify and squashes the following commits:

e9f2307 [Kouhei Sutou] [GLib] Add setup description to verify C GLib build
…p units

Several JIRAs here that made sense to tackle together:

* ARROW-1680
* ARROW-1482
* ARROW-1484
* ARROW-1524

This also fixes bugs relating to ignoring the offset in sliced arrays in some of the cast kernel implementations.

cc @BryanCutler @xhochy @cpcloud

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#1245 from wesm/ARROW-1484 and squashes the following commits:

6f6107f [Wes McKinney] Test that timestamp truncation raises for safe casts
1d28ce2 [Wes McKinney] Fix typo
a820ed9 [Wes McKinney] Add test case for ARROW-1680
ac9845e [Wes McKinney] Implement date32<->date64 casts, unit tests
4911b09 [Wes McKinney] Add time casts
6441a95 [Wes McKinney] Write unit tests for timestamp to timestamp casts
1bde233 [Wes McKinney] Cast table for timestamps
569dfb8 [Wes McKinney] Fix existing cast tests with sliced arrays
884f6d2 [Wes McKinney] Baby steps
a040136 [Wes McKinney] Placeholders for time32/64/timestamp cast functions
fddf466 [Wes McKinney] Stub out unit test for timestamp unit change
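
The commit above implements unit-changing timestamp casts with safety checks; here is a minimal sketch of invoking such a cast. The `CastOptions` field name and header paths are assumptions about this era's API:

```cpp
#include "arrow/api.h"
#include "arrow/compute/api.h"

// Cast timestamp[ms] -> timestamp[s]; a "safe" cast errors out if values
// would be truncated rather than silently losing sub-second precision.
arrow::Status CastToSeconds(const arrow::Array& ts_millis,
                            std::shared_ptr<arrow::Array>* out) {
  arrow::compute::FunctionContext ctx(arrow::default_memory_pool());
  arrow::compute::CastOptions options;
  options.allow_time_truncate = false;  // assumed field controlling safety
  return arrow::compute::Cast(&ctx, ts_millis,
                              arrow::timestamp(arrow::TimeUnit::SECOND),
                              options, out);
}
```
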
…'t supported in numpy_to_arrow.cc

This closes [ARROW-1721](https://issues.apache.org/jira/projects/ARROW/issues/ARROW-1721).

Author: Licht-T <licht-t@outlook.jp>
Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#1246 from Licht-T/feature-object-from_pandas-mask and squashes the following commits:

41a1229 [Wes McKinney] Fix flake8 issues
d754533 [Licht-T] Fix lint issues by clang-format-4.0
7ef7f78 [Licht-T] Revert "Fix lint issues"
5c6c182 [Licht-T] Fix lint issues
78d3c3f [Licht-T] TST: Add tests of null-mask check for object types
72030bf [Licht-T] ENH: Implement null-mask check for object types
This also makes Feather writes more robust to columns having a mix of unicode and bytes (these get coerced to binary).

Also resolves ARROW-1672

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#1250 from wesm/ARROW-1675 and squashes the following commits:

9d55886 [Wes McKinney] Use RecordBatch.from_pandas in Feather write path. Fix test cases
…test pandas roundtrips

I ran into this rough edge today. Invariably, serialization code paths will need to send across a DataFrame with no columns, so this needs to work even if `preserve_index=False`.

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#1252 from wesm/ARROW-1732 and squashes the following commits:

a240c05 [Wes McKinney] Permit creating record batches with no columns, test pandas roundtrips
This makes performance debugging much easier, as it allows you to track down what (Arrow) data is causing unexpected delays in loading. It also makes testing features like ARROW-1689 easier, as you can prove (via unit tests) that copies are not being made.

Author: Nick White <n.j.white@gmail.com>

Closes apache#1233 from njwhite/feature/zerocopycategories and squashes the following commits:

b06f50d [Nick White] ARROW-1689 Don't Deserialize the Dictionary Twice
a968b0b [Nick White] ARROW-1689 Allow User To Request No Data Copies
This was ported from parquet-mr/parquet-cpp and modified to work in Python 2 and 3

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#1248 from wesm/ARROW-587 and squashes the following commits:

da49b1c [Wes McKinney] Another buglet
d65a1b5 [Wes McKinney] Fix a python2 only statement
938b6dc [Wes McKinney] Fix script to pass right JIRA object
bbbce86 [Wes McKinney] Add fix version to PR merge tool (ported from apache/parquet-mr)
robertnishihara and others added 21 commits January 20, 2018 13:41
…tRequest.

Author: Robert Nishihara <robertnishihara@gmail.com>

Closes apache#1479 from robertnishihara/deduplicatefiledescriptors and squashes the following commits:

9be9643 [Robert Nishihara] Fix bug.
8a827cf [Robert Nishihara] Remove mmap_size from PlasmaObject.
ab30d7d [Robert Nishihara] Fix tests.
2916e87 [Robert Nishihara] Remove mmap_size from PlasmaObjectSpec, and file_descriptor -> fd.
7f5c618 [Robert Nishihara] Deduplicate file descriptors when store replies to Get.
ab12d63 [Robert Nishihara] Make Create return a MutableBuffer.
Author: Kouhei Sutou <kou@clear-code.com>

Closes apache#1494 from kou/glib-support-distclean and squashes the following commits:

d660e0f [Kouhei Sutou] [GLib] Support "make distclean"
Author: Phillip Cloud <cpcloud@gmail.com>

Closes apache#1489 from cpcloud/ARROW-1580 and squashes the following commits:

ff81567 [Phillip Cloud] Move to sphinx
700d2c5 [Phillip Cloud] Remove link to nightlies
9fbe9ac [Phillip Cloud] Add build artifact location
cb9f2a5 [Phillip Cloud] [Python] Instructions for setting up nightly builds on Linux
…trings_to_categorical` is True

This closes [ARROW-1997](https://issues.apache.org/jira/browse/ARROW-1997).

The problem is:
```python
>>> import pandas as pd
>>> import pyarrow as pa
>>>
>>> df = pd.DataFrame({
... 'Foo': ['A', 'A', 'B', 'B']
... })
>>> table = pa.Table.from_pandas(df)
>>> df = table.to_pandas(strings_to_categorical=True)
<class 'pandas.core.categorical.Categorical'>
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "table.pxi", line 1043, in pyarrow.lib.Table.to_pandas
  File "pyarrow/pandas_compat.py", line 535, in table_to_blockmanager
    blocks = _table_to_blocks(options, block_table, nthreads, memory_pool)
  File "pyarrow/pandas_compat.py", line 629, in _table_to_blocks
    return [_reconstruct_block(item) for item in result]
  File "pyarrow/pandas_compat.py", line 436, in _reconstruct_block
    ordered=item['ordered'])
  File "/home/rito/miniconda3/envs/pyarrow-dev-27/lib/python2.7/site-packages/pandas/core/categorical.py", line 624, in from_codes
    raise ValueError("codes need to be between -1 and "
ValueError: codes need to be between -1 and len(categories)-1
```

When `strings_to_categorical=True`, the categorical index is newly created in the `to_pandas` procedure.
But the data is passed to Python by zero copy, so the newly created array is deallocated while its memory is still referenced.
https://github.com/Licht-T/arrow/blob/be58af6dd0333652abbe2333ee5968df3f2e371f/cpp/src/arrow/python/arrow_to_pandas.cc#L1040

Author: Licht-T <licht-t@outlook.jp>
Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#1480 from Licht-T/fix-to_pandas-with-strings_to_categorical and squashes the following commits:

61eac9c [Wes McKinney] Adjust error message
c1bc353 [Licht-T] TST: Add test for to_pandas no-NA strings with strings_to_categorical
cce3f50 [Licht-T] BUG: Avoid zero-copy-option in to_pandas when strings_to_categorical is True
Author: yosuke shiro <ys2014hs@gmail.com>

Closes apache#1496 from shiro615/build-instruction-on-macos-and-homebrew-is-incomplete and squashes the following commits:

6b32f68 [yosuke shiro] [C++] fix Build instruction on macOS and Homebrew
…lue data

Modified BinaryBuilder::Resize(int64_t) so that when building BinaryArrays with a known size, space is also reserved for value_data_builder_ to prevent internal reallocation.
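
A short sketch of using the new reservation method, assuming `ReserveData` takes the total number of value bytes:

```cpp
#include <string>
#include <vector>
#include "arrow/api.h"

arrow::Status BuildBinary(const std::vector<std::string>& values,
                          std::shared_ptr<arrow::Array>* out) {
  arrow::BinaryBuilder builder;
  int64_t total_bytes = 0;
  for (const auto& v : values) total_bytes += static_cast<int64_t>(v.size());
  RETURN_NOT_OK(builder.Resize(static_cast<int64_t>(values.size())));
  RETURN_NOT_OK(builder.ReserveData(total_bytes));  // no reallocation while appending
  for (const auto& v : values) {
    RETURN_NOT_OK(builder.Append(reinterpret_cast<const uint8_t*>(v.data()),
                                 static_cast<int32_t>(v.size())));
  }
  return builder.Finish(out);
}
```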

Author: Panchen Xue <pan.panchen.xue@gmail.com>

Closes apache#1481 from xuepanchen/master and squashes the following commits:

707b67b [Panchen Xue] ARROW-1712: [C++] Fix lint errors
360e601 [Panchen Xue] Merge branch 'master' of https://github.com/xuepanchen/arrow
d4bbd15 [Panchen Xue] ARROW-1712: [C++] Modify test case for BinaryBuilder::ReserveData() and change arguments for offsets_builder_.Resize()
77f8f3c [Panchen Xue] Merge pull request apache#5 from apache/master
bc5db7d [Panchen Xue] ARROW-1712: [C++] Remove unneeded data member in BinaryBuilder and modify test case
5a5b70e [Panchen Xue] Merge pull request apache#4 from apache/master
8e4c892 [Panchen Xue] Merge pull request apache#3 from xuepanchen/xuepanchen-arrow-1712
d3c8202 [Panchen Xue] ARROW-1945: [C++] Fix a small typo
0b07895 [Panchen Xue] ARROW-1945: [C++] Add data_capacity_ to track capacity of value data
18f90fb [Panchen Xue] ARROW-1945: [C++] Add data_capacity_ to track capacity of value data
bbc6527 [Panchen Xue] ARROW-1945: [C++] Update test case for BinaryBuild data value space reservation
15e045c [Panchen Xue] Add test case for array-test.cc
5a5593e [Panchen Xue] Update again ReserveData(int64_t) method for BinaryBuilder
9b5e805 [Panchen Xue] Update ReserveData(int64_t) method signature for BinaryBuilder
8dd5eaa [Panchen Xue] Update builder.cc
b002e0b [Panchen Xue] Remove override keyword from ReserveData(int64_t) method for BinaryBuilder
de318f4 [Panchen Xue] Implement ReserveData(int64_t) method for BinaryBuilder
e0434e6 [Panchen Xue] Add ReserveData(int64_t) and value_data_capacity() for methods for BinaryBuilder
5ebfb32 [Panchen Xue] Add capacity() method for TypedBufferBuilder
5b73c1c [Panchen Xue] Update again BinaryBuilder::Resize(int64_t capacity) in builder.cc
d021c54 [Panchen Xue] Merge pull request apache#2 from xuepanchen/xuepanchen-arrow-1712
232024e [Panchen Xue] Update BinaryBuilder::Resize(int64_t capacity) in builder.cc
c2f8dc4 [Panchen Xue] Merge pull request apache#1 from apache/master
…rks fine

Previously, creating two instances of `HadoopFileSystem` using the same init parameters would result in both pointing to the same `hdfsFS` object. If one `HadoopFileSystem` disconnected, the underlying `hdfsFS` would be closed for both instances.

To fix this, we force a new instance of `hdfsFS` on connect, removing this caching behavior.

Author: Jim Crist <jiminy.crist@gmail.com>

Closes apache#1499 from jcrist/no-cache-hdfs and squashes the following commits:

f8ff135 [Jim Crist] Add test
bf6627e [Jim Crist] Force libhdfs/libhdfs3 to return new FS on connect
…internals.make_block

Apparently this argument is not used at all in pandas, and the pandas developers wish to simply remove it rather than go through a deprecation cycle.

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#1507 from wesm/ARROW-2003 and squashes the following commits:

a838226 [Wes McKinney] Remove use of fastpath parameter to pandas.core.internals.make_block
* [C++] Update README for linting

There are hidden gotchas when trying to move the linting Makefile targets, so mention them.

* Mention the LLVM releases page
Previously, checking whether the file was closed was subclass-specific and wasn't caught in the HDFS-backed file, leading to program crashes.

This adds a check in `NativeFile.tell` that the file is open, and a test on a few subclasses of `NativeFile` to ensure the error is raised.

Note that since most Python file-like objects raise a `ValueError` for operations after close, I changed the type of the existing error for these cases. This could be changed back, but an error should at least be thrown.

Author: Jim Crist <jiminy.crist@gmail.com>

Closes apache#1502 from jcrist/hdfs-tell-on-closed and squashes the following commits:

8a9dc94 [Jim Crist] NativeFile.tell errors after close
…IST (apache#1497)

* ARROW-2019: [JAVA] Control the memory allocated for inner vector in LIST

* address review comments
Adds support for pickling `HadoopFileSystem`

A few additional small fixes:
- Adds a check that `driver` is one of {'libhdfs3', 'libhdfs'}
- Makes a few small tweaks to the HDFS tests so they are easier to run locally.

Author: Jim Crist <jiminy.crist@gmail.com>

Closes apache#1505 from jcrist/pickle-hdfs-filesystem and squashes the following commits:

b19ed3e [Jim Crist] Compat with older cython versions
1f74726 [Jim Crist] HadoopFileSystem is pickleable
… files when passing flavor='spark'

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#1511 from wesm/ARROW-1961 and squashes the following commits:

e13b6b4 [Wes McKinney] Preserve pre-existing schema metadata when sanitizing fields when passing flavor='spark'
…nt64 range

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#1504 from wesm/ARROW-2017 and squashes the following commits:

56e67dc [Wes McKinney] Use unsigned PyLong API for uint64 values over int64 range
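
The fix above can be sketched with CPython's unsigned PyLong API; the helper name is hypothetical:

```cpp
#include <Python.h>
#include <cstdint>
#include <limits>

// Box a uint64 value without overflowing the signed conversion path.
PyObject* UInt64ToPyLong(uint64_t value) {
  if (value > static_cast<uint64_t>(std::numeric_limits<int64_t>::max())) {
    return PyLong_FromUnsignedLongLong(value);
  }
  return PyLong_FromLongLong(static_cast<int64_t>(value));
}
```
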
…hen possible for inner arrays

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#1509 from wesm/ARROW-2007 and squashes the following commits:

cd12626 [Wes McKinney] Pin thrift-cpp in Appveyor
326c82e [Wes McKinney] Pin Thrift 0.10.0 in toolchain
e334f4e [Wes McKinney] Add explicit type check
db04659 [Wes McKinney] Implement float32 conversions, use NumPy dtype when possible for inner arrays rather than dispatching to the generic sequence routine
Author: Wes McKinney <wes.mckinney@twosigma.com>
Author: Antoine Pitrou <antoine@python.org>

Closes apache#1516 from pitrou/ARROW-2035-update-cpplint and squashes the following commits:

8f2f892 [Wes McKinney] Fix IWYU errors surfaced by newer cpplint
f5175dd [Antoine Pitrou] ARROW-2035: [C++] Update vendored cpplint.py to a Py3-compatible one
Author: yosuke shiro <ys2014hs@gmail.com>

Closes apache#1521 from shiro615/change-description-from-osx-to-macos and squashes the following commits:

ab03f72 [yosuke shiro] [C++] change description from OS X to macOS
alphalfalfa commented

@wesm @xuepanchen


wesm commented Feb 12, 2018

@alphalfalfa thank you for starting on these kernels; I'm sorry it's taken me a little while to get down to reviewing, but I intend to do so this week.

namespace compute {

struct ARROW_EXPORT TakeOptions {
enum OutOfBoundMode { RAISE, WRAP, CLIP };

Add comments describing the behavior of each option.
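
For instance (illustrative wording, mirroring NumPy's take modes rather than text from this PR):

```cpp
struct ARROW_EXPORT TakeOptions {
  enum OutOfBoundMode {
    RAISE,  // return an error Status for negative or out-of-bounds indices
    WRAP,   // wrap indices around modulo the input length
    CLIP    // clamp indices to the range [0, length - 1]
  };
};
```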

}
};

TEST_F(TestTake, NumericValueTypeCases) {

This could be simplified with GTest's templated tests feature (and type_traits.h).
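
A sketch of that suggestion, assuming a hypothetical fixture `TestTakeNumeric` and this era's `TYPED_TEST_CASE` macro:

```cpp
#include <gtest/gtest.h>
#include "arrow/type.h"

template <typename ArrowType>
class TestTakeNumeric : public ::testing::Test {};

using NumericArrowTypes =
    ::testing::Types<arrow::Int8Type, arrow::Int16Type, arrow::Int32Type,
                     arrow::Int64Type, arrow::FloatType, arrow::DoubleType>;
TYPED_TEST_CASE(TestTakeNumeric, NumericArrowTypes);

TYPED_TEST(TestTakeNumeric, ValueTypeCases) {
  using CType = typename TypeParam::c_type;
  // ... build input/indices/expected arrays of CType and run Take ...
}
```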

const vector<bool>& is_valid_expect,
const TakeOptions& options) {
shared_ptr<Array> input_a, indices_a, expect_a;
if (is_valid_input.empty()) {

Is there a reason why the empty check is done? You can still pass a vector with zero elements.

ASSERT_RAISES(NotImplemented, Take(&ctx_, input, indices, options, &result));
}

template <typename ValueType, typename V, typename IndexType, typename I>

Consider using default template arguments to simplify the call site, e.g.

template <typename ValueType, typename IndexType, typename V = typename ValueType::c_type, typename I = typename IndexType::c_type>

Status Take(FunctionContext* context, const Array& in, const Array& indices,
const TakeOptions& options, std::shared_ptr<Array>* out);

// Single integer index

Why does it require a special case for a single element? I think it adds a lot of clutter; if it's for testing purposes, consider using ArrayFromVector (you already do) or ArrayFromJSON in tests.
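
For reference, a sketch of the suggested alternative, assuming the `ArrayFromVector` helper from Arrow's test utilities:

```cpp
#include "arrow/test-util.h"  // assumed header for ArrayFromVector in this era

void MakeSingleIndex(std::shared_ptr<arrow::Array>* indices) {
  // A length-1 indices array covers the single-index case without a
  // dedicated Take overload.
  arrow::ArrayFromVector<arrow::Int32Type, int32_t>({7}, indices);
}
```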

const Type::type type_id = out->type->id();
if (!(is_primitive(type_id) || type_id == Type::FIXED_SIZE_BINARY ||
type_id == Type::DECIMAL)) {
std::stringstream ss;

No need for a stringstream; see the other comment on the subject.


// ----------------------------------------------------------------------

typedef std::function<void(FunctionContext*,

Why the typedef? Can it just pass the functor around?

class TakeKernel : public UnaryKernel {};

template <>
class TakeKernel<Int32Type> : public UnaryKernel {

Why Int32Type? This seems fishy, as it doesn't appear to be used in the rest of the code.

FN(VALUE_TYPE, UInt64Type);

#define GET_TAKE_FUNCTION(CASE_GENERATOR, ValueType) \
static std::unique_ptr<UnaryKernel> Get##ValueType##TakeFunc( \

I'd make this function return Status and pass the unique_ptr as an output parameter (as the parent call site also does).

func = [](FunctionContext* ctx, const TakeOptions& options, const ArrayData& input, \
const ArrayData& indices, ArrayData* out) { \
TakeFunctor<ValueType, IndexType> func; \
func(ctx, options, input, indices, out); \

There's way too much `func` shadowing involved here.


wesm commented Feb 12, 2019

As this PR is over a year old, I think it probably needs a fresh start.


wesm commented Mar 13, 2019

Closing in favor of new PR, thank you for starting on this @alphalfalfa!
