@fsaintjacques
Contributor

@fsaintjacques fsaintjacques commented May 27, 2019

Comparison was previously supported only when the left argument is an array and the right argument is a scalar. This PR extends support to comparing two arrays, and to the case where the left argument is a scalar and the right is an array.
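
To make the three input shapes concrete, here is a minimal sketch over plain `std::vector<int64_t>` values, ignoring nulls. All names (`CompareOp`, `Apply`, `Flip`, `CompareArrayScalar`, and so on) are hypothetical and are not taken from the Arrow code base. It assumes the (Scalar, Array) case can be served by swapping the operands and flipping the operator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names for illustration; this is not the Arrow API.
enum class CompareOp { EQUAL, NOT_EQUAL, LESS, LESS_EQUAL, GREATER, GREATER_EQUAL };

// Evaluate `lhs op rhs` for one pair of values.
inline bool Apply(CompareOp op, int64_t lhs, int64_t rhs) {
  switch (op) {
    case CompareOp::EQUAL: return lhs == rhs;
    case CompareOp::NOT_EQUAL: return lhs != rhs;
    case CompareOp::LESS: return lhs < rhs;
    case CompareOp::LESS_EQUAL: return lhs <= rhs;
    case CompareOp::GREATER: return lhs > rhs;
    case CompareOp::GREATER_EQUAL: return lhs >= rhs;
  }
  return false;  // not reached for valid operators
}

// Operator to use once the operands are swapped: `a < b` holds iff `b > a`.
inline CompareOp Flip(CompareOp op) {
  switch (op) {
    case CompareOp::LESS: return CompareOp::GREATER;
    case CompareOp::LESS_EQUAL: return CompareOp::GREATER_EQUAL;
    case CompareOp::GREATER: return CompareOp::LESS;
    case CompareOp::GREATER_EQUAL: return CompareOp::LESS_EQUAL;
    default: return op;  // EQUAL and NOT_EQUAL are symmetric
  }
}

// (Array, Scalar) -> Array
std::vector<bool> CompareArrayScalar(const std::vector<int64_t>& lhs, int64_t rhs,
                                     CompareOp op) {
  std::vector<bool> out(lhs.size());
  for (std::size_t i = 0; i < lhs.size(); ++i) out[i] = Apply(op, lhs[i], rhs);
  return out;
}

// (Array, Array) -> Array; both inputs must have the same length.
std::vector<bool> CompareArrayArray(const std::vector<int64_t>& lhs,
                                    const std::vector<int64_t>& rhs, CompareOp op) {
  assert(lhs.size() == rhs.size());
  std::vector<bool> out(lhs.size());
  for (std::size_t i = 0; i < lhs.size(); ++i) out[i] = Apply(op, lhs[i], rhs[i]);
  return out;
}

// (Scalar, Array) -> Array: swap the operands and flip the operator.
std::vector<bool> CompareScalarArray(int64_t lhs, const std::vector<int64_t>& rhs,
                                     CompareOp op) {
  return CompareArrayScalar(rhs, lhs, Flip(op));
}
```

Note that swapping alone is only correct for the symmetric operators; the ordering operators need the flip.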

@fsaintjacques fsaintjacques force-pushed the ARROW-4990-compare-array-array branch from 5c47b82 to 6cfef25 Compare May 29, 2019 15:32
Contributor

does it pay to support scalar-scalar comparison (i.e. should this be 1)?

Member

I think we can support comparing scalars by promoting Scalar to Array of length 1. Is there a JIRA about Scalar->Array promotion?
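
A rough sketch of the promotion idea, assuming an element-wise array kernel already exists. `PromoteToArray`, `EqualArrays`, and `EqualScalars` are hypothetical names used only for illustration, and nulls are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: wrap a scalar in an array of length 1.
std::vector<int64_t> PromoteToArray(int64_t scalar) {
  return std::vector<int64_t>{scalar};
}

// Element-wise equality on two arrays of the same length.
std::vector<bool> EqualArrays(const std::vector<int64_t>& lhs,
                              const std::vector<int64_t>& rhs) {
  assert(lhs.size() == rhs.size());
  std::vector<bool> out(lhs.size());
  for (std::size_t i = 0; i < lhs.size(); ++i) out[i] = (lhs[i] == rhs[i]);
  return out;
}

// Scalar-scalar comparison routed through the array kernel.
// The result is itself an array of length 1.
std::vector<bool> EqualScalars(int64_t lhs, int64_t rhs) {
  return EqualArrays(PromoteToArray(lhs), PromoteToArray(rhs));
}
```

The cost of this approach is one small allocation per scalar, in exchange for not needing a separate scalar-scalar code path.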

Contributor Author

Contributor

Is this different than AssignNullIntersection?

Contributor Author

@fsaintjacques fsaintjacques May 30, 2019

It absolutely is not; I don't know how I missed this one. Should I refactor to call AssignNullIntersection, or remove it?

Contributor

I think refactoring to call AssignNullIntersection sounds good, since I clearly didn't pick a good name.

Contributor Author

It's more that I didn't look at what was already there than a problem with the name. Even worse, I think I even reviewed your function...

@emkornfield
Contributor

Took a quick pass on this, will look at it in more detail tomorrow.

Contributor

@emkornfield emkornfield left a comment

Looks like the AppVeyor build is failing on something related.

Contributor

Not for this change, but I think the convention for now should be to use detail::Invoke to handle chunked arrays in the top-level method, and to worry about parallelization when we expose the raw kernels. Thoughts?

Contributor Author

I agree, but I suspect it will be hard to do with the current Invoke method, since the shape can vary: the signature is (S, T) -> R, where each of S, T, and R can be an Array, a ChunkedArray, or a Scalar.

Contributor

Hmm, so I think the complication is the scalar case (I thought Invoke handled combinations of Array and ChunkedArray). One potential approach is to convert the kernel into a unary one when the function is called with a scalar, and use InvokeUnary instead (and handle scalar-to-scalar here).
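
One way to picture the binary-to-unary conversion is to bind the scalar into a closure and then apply the resulting unary kernel chunk by chunk. This is a sketch under assumptions: `BindRightScalar` and `InvokeUnaryOnChunks` are hypothetical stand-ins and do not reflect the actual signatures of Arrow's Invoke helpers. Chunked arrays are modeled as a vector of vectors, and nulls are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

using Chunk = std::vector<int64_t>;
using UnaryKernel = std::function<std::vector<bool>(const Chunk&)>;
using BinaryPredicate = std::function<bool(int64_t, int64_t)>;

// Bind the scalar as the right operand, yielding an Array -> Array kernel.
UnaryKernel BindRightScalar(BinaryPredicate pred, int64_t scalar) {
  return [pred, scalar](const Chunk& values) -> std::vector<bool> {
    std::vector<bool> out;
    out.reserve(values.size());
    for (int64_t v : values) out.push_back(pred(v, scalar));
    return out;
  };
}

// Stand-in for a unary invoke helper: apply the kernel to each chunk in turn.
std::vector<std::vector<bool>> InvokeUnaryOnChunks(const UnaryKernel& kernel,
                                                   const std::vector<Chunk>& chunks) {
  std::vector<std::vector<bool>> out;
  out.reserve(chunks.size());
  for (const Chunk& chunk : chunks) out.push_back(kernel(chunk));
  return out;
}
```

With the scalar bound, the chunk-walking helper only has to deal with one varying input, which is the simplification suggested above.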

@fsaintjacques fsaintjacques force-pushed the ARROW-4990-compare-array-array branch from b97c787 to 8c3d1de Compare May 31, 2019 13:56
@fsaintjacques
Contributor Author

@emkornfield updated per your comments. I filed https://issues.apache.org/jira/browse/ARROW-5489

Contributor

This seems strange; why the r-value reference?

Contributor Author

@emkornfield
Contributor

One small nit about the r-value reference, which I don't understand; otherwise LGTM.

@fsaintjacques
Contributor Author

@ursabot build

@ursabot

ursabot commented Jun 4, 2019

I've successfully started builds for this PR

@fsaintjacques
Contributor Author

@emkornfield I changed the forward to a pointer, as per your comment and the style guide.

Convenience method for sub-type supporting the length method.
When given 2 arrays, it should compose the validity bitmap by
intersecting the input validity bitmaps.
- Extend CompareKernel to support (Array, Array) -> Array signature.
- Extend CompareKernel to support (Scalar, Array) -> Array signature
  (swap the order of inputs).
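
The validity-bitmap rule in the commit message above can be sketched as a byte-wise AND. This assumes bit i set means slot i is non-null, with bits packed least-significant-bit first, and it ignores array offsets. `IntersectValidity` and `IsValid` are hypothetical helpers, not the `AssignNullIntersection` function discussed in the review.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: a slot is valid in the output only if it is valid in
// both inputs, so the output bitmap is the bitwise AND of the input bitmaps.
std::vector<uint8_t> IntersectValidity(const std::vector<uint8_t>& left,
                                       const std::vector<uint8_t>& right) {
  assert(left.size() == right.size());
  std::vector<uint8_t> out(left.size());
  for (std::size_t i = 0; i < left.size(); ++i) {
    out[i] = static_cast<uint8_t>(left[i] & right[i]);
  }
  return out;
}

// Read bit i of a bitmap packed least-significant-bit first.
inline bool IsValid(const std::vector<uint8_t>& bitmap, std::size_t i) {
  return ((bitmap[i / 8] >> (i % 8)) & 1) != 0;
}
```

For example, if the left input has slots 0, 2, and 3 valid (`0x0D`) and the right input has slots 0, 1, and 2 valid (`0x07`), only slots 0 and 2 are valid in the result (`0x05`).
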
@fsaintjacques fsaintjacques force-pushed the ARROW-4990-compare-array-array branch from 8fafd6d to 864c679 Compare June 5, 2019 11:46
@fsaintjacques
Contributor Author

@ursabot benchmark

@ursabot

ursabot commented Jun 5, 2019

I've successfully started builds for this PR

@ursabot

ursabot commented Jun 5, 2019

AMD64 Ubuntu 18.04 C++ Benchmark (#14985) builder failed.

Revision: 864c679

Archery: 'archery benchmark ...' step's stderr:

Selected compiler gcc 7.4.0
Using ld linker
Configured for RELEASE build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...})
CMake Warning at cmake_modules/ThirdpartyToolchain.cmake:148 (find_package):
  No "Findbenchmark.cmake" found in CMAKE_MODULE_PATH.
Call Stack (most recent call first):
  cmake_modules/ThirdpartyToolchain.cmake:1378 (resolve_dependency)
  CMakeLists.txt:365 (include)


CMake Warning (dev) at cmake_modules/ThirdpartyToolchain.cmake:148 (find_package):
  Findbenchmark.cmake must either be part of this project itself, in this
  case adjust CMAKE_MODULE_PATH so that it points to the correct location
  inside its source tree.

  Or it must be installed by a package which has already been found via
  find_package().  In this case make sure that package has indeed been found
  and adjust CMAKE_MODULE_PATH to contain the location where that package has
  installed Findbenchmark.cmake.  This must be a location provided by that
  package.  This error in general means that the buildsystem of this project
  is relying on a Find-module without ensuring that it is actually available.

Call Stack (most recent call first):
  cmake_modules/ThirdpartyToolchain.cmake:1378 (resolve_dependency)
  CMakeLists.txt:365 (include)
This warning is for project developers.  Use -Wno-dev to suppress it.

Cloning into '/tmp/arrow-bench-xheqpnh7/master/arrow'...
done.
Switched to a new branch 'master'
Selected compiler gcc 7.4.0
Using ld linker
Configured for RELEASE build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...})
CMake Warning at cmake_modules/ThirdpartyToolchain.cmake:148 (find_package):
  No "Findbenchmark.cmake" found in CMAKE_MODULE_PATH.
Call Stack (most recent call first):
  cmake_modules/ThirdpartyToolchain.cmake:1378 (resolve_dependency)
  CMakeLists.txt:365 (include)


CMake Warning (dev) at cmake_modules/ThirdpartyToolchain.cmake:148 (find_package):
  Findbenchmark.cmake must either be part of this project itself, in this
  case adjust CMAKE_MODULE_PATH so that it points to the correct location
  inside its source tree.

  Or it must be installed by a package which has already been found via
  find_package().  In this case make sure that package has indeed been found
  and adjust CMAKE_MODULE_PATH to contain the location where that package has
  installed Findbenchmark.cmake.  This must be a location provided by that
  package.  This error in general means that the buildsystem of this project
  is relying on a Find-module without ensuring that it is actually available.

Call Stack (most recent call first):
  cmake_modules/ThirdpartyToolchain.cmake:1378 (resolve_dependency)
  CMakeLists.txt:365 (include)
This warning is for project developers.  Use -Wno-dev to suppress it.

2019-06-05 11:59:02
Running /tmp/arrow-bench-xheqpnh7/WORKSPACE/build/release/arrow-ipc-read-write-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:08:45
Running /tmp/arrow-bench-xheqpnh7/WORKSPACE/build/release/arrow-int-util-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:09:14
Running /tmp/arrow-bench-xheqpnh7/WORKSPACE/build/release/arrow-trie-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:09:27
Running /tmp/arrow-bench-xheqpnh7/WORKSPACE/build/release/arrow-json-parser-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:10:10
Running /tmp/arrow-bench-xheqpnh7/WORKSPACE/build/release/arrow-io-memory-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:11:15
Running /tmp/arrow-bench-xheqpnh7/WORKSPACE/build/release/arrow-io-file-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:11:57
Running /tmp/arrow-bench-xheqpnh7/WORKSPACE/build/release/arrow-number-parsing-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:13:37
Running /tmp/arrow-bench-xheqpnh7/WORKSPACE/build/release/arrow-csv-converter-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:13:52
Running /tmp/arrow-bench-xheqpnh7/WORKSPACE/build/release/arrow-thread-pool-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:17:13
Running /tmp/arrow-bench-xheqpnh7/WORKSPACE/build/release/arrow-compute-filter-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:22:02
Running /tmp/arrow-bench-xheqpnh7/master/build/release/arrow-ipc-read-write-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:23:59
Running /tmp/arrow-bench-xheqpnh7/master/build/release/arrow-utf8-util-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:24:57
Running /tmp/arrow-bench-xheqpnh7/master/build/release/arrow-hashing-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:25:26
Running /tmp/arrow-bench-xheqpnh7/master/build/release/arrow-csv-parser-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:26:02
Running /tmp/arrow-bench-xheqpnh7/master/build/release/arrow-compute-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:28:01
Running /tmp/arrow-bench-xheqpnh7/master/build/release/arrow-decimal-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:28:08
Running /tmp/arrow-bench-xheqpnh7/master/build/release/arrow-compute-aggregate-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:28:37
Running /tmp/arrow-bench-xheqpnh7/master/build/release/arrow-bit-util-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:29:25
Running /tmp/arrow-bench-xheqpnh7/master/build/release/arrow-column-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:30:16
Running /tmp/arrow-bench-xheqpnh7/master/build/release/arrow-builder-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:31:48
Running /tmp/arrow-bench-xheqpnh7/master/build/release/arrow-int-util-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:32:17
Running /tmp/arrow-bench-xheqpnh7/master/build/release/arrow-trie-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:32:30
Running /tmp/arrow-bench-xheqpnh7/master/build/release/arrow-json-parser-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:33:14
Running /tmp/arrow-bench-xheqpnh7/master/build/release/arrow-io-memory-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:34:20
Running /tmp/arrow-bench-xheqpnh7/master/build/release/arrow-io-file-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:35:03
Running /tmp/arrow-bench-xheqpnh7/master/build/release/arrow-number-parsing-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:36:45
Running /tmp/arrow-bench-xheqpnh7/master/build/release/arrow-csv-converter-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:36:59
Running /tmp/arrow-bench-xheqpnh7/master/build/release/arrow-thread-pool-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
2019-06-05 12:40:14
Running /tmp/arrow-bench-xheqpnh7/master/build/release/arrow-compute-filter-benchmark
Run on (40 X 3600 MHz CPU s)
CPU Caches:
  L1 Data 32K (x20)
  L1 Instruction 32K (x20)
  L2 Unified 256K (x20)
  L3 Unified 51200K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.

@codecov-io

Codecov Report

Merging #4398 into master will decrease coverage by 11.34%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           master    #4398       +/-   ##
===========================================
- Coverage   88.26%   76.92%   -11.35%     
===========================================
  Files         846       51      -795     
  Lines      103360     1976   -101384     
  Branches     1253        0     -1253     
===========================================
- Hits        91231     1520    -89711     
+ Misses      11882      456    -11426     
+ Partials      247        0      -247
Impacted Files Coverage Δ
python/pyarrow/ipc.pxi
cpp/src/arrow/csv/chunker-test.cc
cpp/src/parquet/column_page.h
cpp/src/parquet/bloom_filter-test.cc
cpp/src/arrow/array/builder_decimal.cc
cpp/src/plasma/client.cc
cpp/src/arrow/io/test-common.h
cpp/src/arrow/util/int-util-test.cc
cpp/src/arrow/python/io.cc
python/pyarrow/hdfs.py
... and 778 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5a024f7...864c679. Read the comment docs.

@emkornfield
Contributor

+1, LGTM. It looks like the Ursabot output could be more useful.

@fsaintjacques
Contributor Author

I'm fixing this with Kristian.
