ARROW-89: [Python] Add benchmarks for Arrow<->Pandas conversion #51

Closed
wants to merge 2 commits into from


xhochy
Member

@xhochy xhochy commented Mar 29, 2016

No description provided.

@xhochy
Member Author

xhochy commented Mar 29, 2016

It seems we currently only have zero-copy pandas-to-Arrow conversion for int64 columns; float64 has overhead that grows with the column size:

[ 42.86%] ··· Running array.PandasConversions.time_from_series                                                                               ok
[ 42.86%] ···· 
               ========== ========== ========== ==========
               --                      dtype              
               ---------- --------------------------------
                  size      int64     float64      str    
               ========== ========== ========== ==========
                   1       197.17μs   197.98μs   201.56μs 
                 100000    198.50μs    2.04ms     9.51ms  
                1000000    196.28μs   17.12ms    96.38ms  
                10000000   195.93μs   165.74ms    1.07s   
               ========== ========== ========== ==========
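
A quick way to read the table is to divide each timing by the number of elements. The sketch below does only that arithmetic on the figures copied from the table; the variable and function names are illustrative and not part of the benchmark suite.

```python
# Timings copied from the benchmark table above, converted to seconds.
timings = {
    "int64":   {100_000: 198.50e-6, 1_000_000: 196.28e-6, 10_000_000: 195.93e-6},
    "float64": {100_000: 2.04e-3,   1_000_000: 17.12e-3,  10_000_000: 165.74e-3},
    "str":     {100_000: 9.51e-3,   1_000_000: 96.38e-3,  10_000_000: 1.07},
}


def ns_per_element(seconds, size):
    """Convert a total run time in seconds to nanoseconds per element."""
    return seconds / size * 1e9


per_element = {
    dtype: {size: ns_per_element(t, size) for size, t in by_size.items()}
    for dtype, by_size in timings.items()
}

for dtype, by_size in per_element.items():
    row = ", ".join(f"{size:,d}: {ns:.2f} ns" for size, ns in by_size.items())
    print(f"{dtype:<8} {row}")
```

The int64 cost per element keeps shrinking as the column grows, which is what a constant-time (zero-copy) conversion looks like, while float64 stays at roughly 17 to 20 ns per element and str at roughly 95 to 107 ns per element.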

@wesm
Member

wesm commented Mar 29, 2016

@xhochy yes -- because integer data in pandas does not support nulls, the null accounting can be skipped. In the case of float64, you must convert the NaN values to a bitmap. See the ValuesToBitmap function.
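
To illustrate the idea, here is a minimal pure-Python sketch of turning NaN values into a validity bitmap. It is not Arrow's `ValuesToBitmap` implementation (that lives in the C++ code); the function name `values_to_validity_bitmap` is made up for this example. It assumes the Arrow convention that a set bit means "valid" and that bits are packed least-significant-bit first.

```python
import math


def values_to_validity_bitmap(values):
    """Return (bitmap, null_count) for a sequence of floats, treating NaN as null.

    Hypothetical helper for illustration only. Bit i of the bitmap
    (LSB-first within each byte) is 1 when values[i] is not NaN.
    """
    bitmap = bytearray((len(values) + 7) // 8)  # one bit per value, rounded up
    null_count = 0
    for i, value in enumerate(values):
        if math.isnan(value):
            null_count += 1
        else:
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap), null_count


nan = float("nan")
bitmap, null_count = values_to_validity_bitmap(
    [1.0, nan, 3.0, 4.0, nan, 6.0, 7.0, 8.0, 9.0]
)
print(bitmap.hex(), null_count)
```

Every value has to be inspected once, so the cost is linear in the column length, which matches the float64 timings growing with size. An int64 column from pandas has no nulls, so this pass can be skipped entirely.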

@wesm
Member

wesm commented Mar 29, 2016

Not sure if you are benchmarking -O3 release builds but that should make a performance difference.

@xhochy
Member Author

xhochy commented Mar 29, 2016

I'm benchmarking -O3 builds; I've got a separate build env locally for these benchmarks ;)

@wesm
Member

wesm commented Mar 29, 2016

I figured =)

@xhochy
Member Author

xhochy commented Mar 31, 2016

I also put together a VM setup to run the benchmarks independently of my current environment (https://github.com/xhochy/arrow-performance-setup). Maybe it is of use to someone else; for me, at least, it guarantees that all of those builds are -O3 builds.

@wesm
Member

wesm commented Apr 1, 2016

Ah, that is super useful. I starred it for now; let's see how things develop. Having reproducible testing VMs (we could use Docker, too?) will be very helpful, I'm sure.

@wesm
Member

wesm commented Apr 1, 2016

+1, thank you

@asfgit asfgit closed this in b3ebce1 Apr 1, 2016
@xhochy xhochy deleted the arrow-89 branch March 7, 2017 16:16
wesm pushed a commit to wesm/arrow that referenced this pull request Sep 2, 2018
This PR implements a SetData interface for the LevelDecoder class similar to existing value decoders.
This PR also adds a test for PARQUET-523

Author: Deepak Majeti <deepak.majeti@hp.com>

Closes apache#51 from majetideepak/PARQUET-515 and squashes the following commits:

dde3654 [Deepak Majeti] fixed headers order
c26db08 [Deepak Majeti] rebased with upstream
1420bbf [Deepak Majeti] PARQUET-515
wesm pushed a commit to wesm/arrow that referenced this pull request Sep 4, 2018
wesm pushed a commit to wesm/arrow that referenced this pull request Sep 6, 2018
wesm pushed a commit to wesm/arrow that referenced this pull request Sep 7, 2018
wesm pushed a commit to wesm/arrow that referenced this pull request Sep 8, 2018
kou pushed a commit that referenced this pull request May 10, 2020
This PR enables tests for `ARROW_COMPUTE`, `ARROW_DATASET`, `ARROW_FILESYSTEM`, `ARROW_HDFS`, `ARROW_ORC`, and `ARROW_IPC` (default on). #7131 enabled a minimal set of tests as a starting point.

I confirmed that these tests pass locally with the current master. In the current TravisCI environment, we cannot see this result due to a lot of error messages in `arrow-utility-test`.

```
$ git log | head -1
commit ed5f534
% ctest
...
      Start  1: arrow-array-test
 1/51 Test  #1: arrow-array-test .....................   Passed    4.62 sec
      Start  2: arrow-buffer-test
 2/51 Test  #2: arrow-buffer-test ....................   Passed    0.14 sec
      Start  3: arrow-extension-type-test
 3/51 Test  #3: arrow-extension-type-test ............   Passed    0.12 sec
      Start  4: arrow-misc-test
 4/51 Test  #4: arrow-misc-test ......................   Passed    0.14 sec
      Start  5: arrow-public-api-test
 5/51 Test  #5: arrow-public-api-test ................   Passed    0.12 sec
      Start  6: arrow-scalar-test
 6/51 Test  #6: arrow-scalar-test ....................   Passed    0.13 sec
      Start  7: arrow-type-test
 7/51 Test  #7: arrow-type-test ......................   Passed    0.14 sec
      Start  8: arrow-table-test
 8/51 Test  #8: arrow-table-test .....................   Passed    0.13 sec
      Start  9: arrow-tensor-test
 9/51 Test  #9: arrow-tensor-test ....................   Passed    0.13 sec
      Start 10: arrow-sparse-tensor-test
10/51 Test #10: arrow-sparse-tensor-test .............   Passed    0.16 sec
      Start 11: arrow-stl-test
11/51 Test #11: arrow-stl-test .......................   Passed    0.12 sec
      Start 12: arrow-concatenate-test
12/51 Test #12: arrow-concatenate-test ...............   Passed    0.53 sec
      Start 13: arrow-diff-test
13/51 Test #13: arrow-diff-test ......................   Passed    1.45 sec
      Start 14: arrow-c-bridge-test
14/51 Test #14: arrow-c-bridge-test ..................   Passed    0.18 sec
      Start 15: arrow-io-buffered-test
15/51 Test #15: arrow-io-buffered-test ...............   Passed    0.20 sec
      Start 16: arrow-io-compressed-test
16/51 Test #16: arrow-io-compressed-test .............   Passed    3.48 sec
      Start 17: arrow-io-file-test
17/51 Test #17: arrow-io-file-test ...................   Passed    0.74 sec
      Start 18: arrow-io-hdfs-test
18/51 Test #18: arrow-io-hdfs-test ...................   Passed    0.12 sec
      Start 19: arrow-io-memory-test
19/51 Test #19: arrow-io-memory-test .................   Passed    2.77 sec
      Start 20: arrow-utility-test
20/51 Test #20: arrow-utility-test ...................***Failed    5.65 sec
      Start 21: arrow-threading-utility-test
21/51 Test #21: arrow-threading-utility-test .........   Passed    1.34 sec
      Start 22: arrow-compute-compute-test
22/51 Test #22: arrow-compute-compute-test ...........   Passed    0.13 sec
      Start 23: arrow-compute-boolean-test
23/51 Test #23: arrow-compute-boolean-test ...........   Passed    0.15 sec
      Start 24: arrow-compute-cast-test
24/51 Test #24: arrow-compute-cast-test ..............   Passed    0.22 sec
      Start 25: arrow-compute-hash-test
25/51 Test #25: arrow-compute-hash-test ..............   Passed    2.61 sec
      Start 26: arrow-compute-isin-test
26/51 Test #26: arrow-compute-isin-test ..............   Passed    0.81 sec
      Start 27: arrow-compute-match-test
27/51 Test #27: arrow-compute-match-test .............   Passed    0.40 sec
      Start 28: arrow-compute-sort-to-indices-test
28/51 Test #28: arrow-compute-sort-to-indices-test ...   Passed    3.33 sec
      Start 29: arrow-compute-nth-to-indices-test
29/51 Test #29: arrow-compute-nth-to-indices-test ....   Passed    1.51 sec
      Start 30: arrow-compute-util-internal-test
30/51 Test #30: arrow-compute-util-internal-test .....   Passed    0.13 sec
      Start 31: arrow-compute-add-test
31/51 Test #31: arrow-compute-add-test ...............   Passed    0.12 sec
      Start 32: arrow-compute-aggregate-test
32/51 Test #32: arrow-compute-aggregate-test .........   Passed   14.70 sec
      Start 33: arrow-compute-compare-test
33/51 Test #33: arrow-compute-compare-test ...........   Passed    7.96 sec
      Start 34: arrow-compute-take-test
34/51 Test #34: arrow-compute-take-test ..............   Passed    4.80 sec
      Start 35: arrow-compute-filter-test
35/51 Test #35: arrow-compute-filter-test ............   Passed    8.23 sec
      Start 36: arrow-dataset-dataset-test
36/51 Test #36: arrow-dataset-dataset-test ...........   Passed    0.25 sec
      Start 37: arrow-dataset-discovery-test
37/51 Test #37: arrow-dataset-discovery-test .........   Passed    0.13 sec
      Start 38: arrow-dataset-file-ipc-test
38/51 Test #38: arrow-dataset-file-ipc-test ..........   Passed    0.21 sec
      Start 39: arrow-dataset-file-test
39/51 Test #39: arrow-dataset-file-test ..............   Passed    0.12 sec
      Start 40: arrow-dataset-filter-test
40/51 Test #40: arrow-dataset-filter-test ............   Passed    0.16 sec
      Start 41: arrow-dataset-partition-test
41/51 Test #41: arrow-dataset-partition-test .........   Passed    0.13 sec
      Start 42: arrow-dataset-scanner-test
42/51 Test #42: arrow-dataset-scanner-test ...........   Passed    0.20 sec
      Start 43: arrow-filesystem-test
43/51 Test #43: arrow-filesystem-test ................   Passed    1.62 sec
      Start 44: arrow-hdfs-test
44/51 Test #44: arrow-hdfs-test ......................   Passed    0.13 sec
      Start 45: arrow-feather-test
45/51 Test #45: arrow-feather-test ...................   Passed    0.91 sec
      Start 46: arrow-ipc-read-write-test
46/51 Test #46: arrow-ipc-read-write-test ............   Passed    5.77 sec
      Start 47: arrow-ipc-json-simple-test
47/51 Test #47: arrow-ipc-json-simple-test ...........   Passed    0.16 sec
      Start 48: arrow-ipc-json-test
48/51 Test #48: arrow-ipc-json-test ..................   Passed    0.27 sec
      Start 49: arrow-json-integration-test
49/51 Test #49: arrow-json-integration-test ..........   Passed    0.13 sec
      Start 50: arrow-json-test
50/51 Test #50: arrow-json-test ......................   Passed    0.26 sec
      Start 51: arrow-orc-adapter-test
51/51 Test #51: arrow-orc-adapter-test ...............   Passed    1.92 sec

98% tests passed, 1 tests failed out of 51

Label Time Summary:
arrow-tests      =  27.38 sec (27 tests)
arrow_compute    =  45.11 sec (14 tests)
arrow_dataset    =   1.21 sec (7 tests)
arrow_ipc        =   6.20 sec (3 tests)
unittest         =  79.91 sec (51 tests)

Total Test time (real) =  79.99 sec

The following tests FAILED:
	 20 - arrow-utility-test (Failed)
Errors while running CTest
```
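
When the log is noisy, the per-test result lines can be summarized with a small script. The following is only a sketch: the regular expression is written against the line shape shown in the output above and handles just the `Passed` and `***Failed` statuses; the helper name `parse_ctest_results` is made up for this example.

```python
import re

# Matches lines such as:
#   20/51 Test #20: arrow-utility-test ...................***Failed    5.65 sec
RESULT_LINE = re.compile(
    r"\s*\d+/\d+\s+Test\s+#(\d+):\s+(\S+)\s+\.+\s*(\*\*\*Failed|Passed)\s+([\d.]+)\s+sec"
)


def parse_ctest_results(text):
    """Return a list of (test_name, passed, seconds) tuples from ctest output."""
    results = []
    for line in text.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            _, name, status, seconds = match.groups()
            results.append((name, status == "Passed", float(seconds)))
    return results


# Three lines copied from the log above.
sample = """\
 1/51 Test  #1: arrow-array-test .....................   Passed    4.62 sec
20/51 Test #20: arrow-utility-test ...................***Failed    5.65 sec
51/51 Test #51: arrow-orc-adapter-test ...............   Passed    1.92 sec
"""

results = parse_ctest_results(sample)
failed = [name for name, passed, _ in results if not passed]
print(f"{len(results)} tests parsed, failed: {failed}")
```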

Closes #7142 from kiszk/ARROW-8754

Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
zhouyuan pushed a commit to zhouyuan/arrow that referenced this pull request Dec 3, 2021
zhztheplayer added a commit to zhztheplayer/arrow-1 that referenced this pull request Feb 8, 2022
zhztheplayer added a commit to zhztheplayer/arrow-1 that referenced this pull request Mar 3, 2022
rui-mo pushed a commit to rui-mo/arrow-1 that referenced this pull request Mar 23, 2022
zhouyuan pushed a commit to zhouyuan/arrow that referenced this pull request Apr 26, 2022
rui-mo pushed a commit to rui-mo/arrow-1 that referenced this pull request May 27, 2022