Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Integration tests against arrow #10

Merged
merged 6 commits into from Apr 21, 2021

Conversation

jorgecarleitao
Copy link
Member

@jorgecarleitao jorgecarleitao commented Apr 20, 2021

This enables the integration tests on our side by cloning apache/arrow@master and running our PRs against it.

This job takes some time, as there docker layers are not being cached. I think that this is already a problem in apache/arrow@master and imo we should address there.

@codecov-commenter
Copy link

Codecov Report

Merging #10 (e2b0faa) into master (ae812db) will not change coverage.
The diff coverage is n/a.

❗ Current head e2b0faa differs from pull request most recent head 0f389bd. Consider uploading reports for the commit 0f389bd to get more accurate results
Impacted file tree graph

@@           Coverage Diff           @@
##           master      #10   +/-   ##
=======================================
  Coverage   82.47%   82.47%           
=======================================
  Files         162      162           
  Lines       43414    43414           
=======================================
  Hits        35806    35806           
  Misses       7608     7608           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ae812db...0f389bd. Read the comment docs.

@jorgecarleitao jorgecarleitao force-pushed the integration branch 2 times, most recently from e2235a2 to 89afe67 Compare April 20, 2021 05:23
@jorgecarleitao jorgecarleitao marked this pull request as draft April 20, 2021 05:33
@jorgecarleitao jorgecarleitao marked this pull request as ready for review April 20, 2021 05:33
@alamb
Copy link
Contributor

alamb commented Apr 20, 2021

I rewrote master to remove the history of the other languages (and 50MB of history). This means this PR will need to be "rebased" against the current master.

The best way I found to do this was to find the relevant commits (via git log) and then cherry-pick them one at a time. For example (change COMMIT-SHA to whatever the commit actually is):

git reset --hard apache/master
git cherry-pick COMMIT-SHA
git push -f alamb

alamb pushed a commit that referenced this pull request Apr 20, 2021
This PR enables tests for `ARROW_COMPUTE`, `ARROW_DATASET`, `ARROW_FILESYSTEM`, `ARROW_HDFS`, `ARROW_ORC`, and `ARROW_IPC` (default on). #7131 enabled a minimal set of tests as a starting point.

I confirmed that these tests pass locally with the current master. In the current TravisCI environment, we cannot see this result due to a lot of error messages in `arrow-utility-test`.

```
$ git log | head -1
commit e2e6db2
% ctest
...
      Start  1: arrow-array-test
 1/51 Test  #1: arrow-array-test .....................   Passed    4.62 sec
      Start  2: arrow-buffer-test
 2/51 Test  #2: arrow-buffer-test ....................   Passed    0.14 sec
      Start  3: arrow-extension-type-test
 3/51 Test  #3: arrow-extension-type-test ............   Passed    0.12 sec
      Start  4: arrow-misc-test
 4/51 Test  #4: arrow-misc-test ......................   Passed    0.14 sec
      Start  5: arrow-public-api-test
 5/51 Test  #5: arrow-public-api-test ................   Passed    0.12 sec
      Start  6: arrow-scalar-test
 6/51 Test  #6: arrow-scalar-test ....................   Passed    0.13 sec
      Start  7: arrow-type-test
 7/51 Test  #7: arrow-type-test ......................   Passed    0.14 sec
      Start  8: arrow-table-test
 8/51 Test  #8: arrow-table-test .....................   Passed    0.13 sec
      Start  9: arrow-tensor-test
 9/51 Test  #9: arrow-tensor-test ....................   Passed    0.13 sec
      Start 10: arrow-sparse-tensor-test
10/51 Test #10: arrow-sparse-tensor-test .............   Passed    0.16 sec
      Start 11: arrow-stl-test
11/51 Test #11: arrow-stl-test .......................   Passed    0.12 sec
      Start 12: arrow-concatenate-test
12/51 Test #12: arrow-concatenate-test ...............   Passed    0.53 sec
      Start 13: arrow-diff-test
13/51 Test #13: arrow-diff-test ......................   Passed    1.45 sec
      Start 14: arrow-c-bridge-test
14/51 Test #14: arrow-c-bridge-test ..................   Passed    0.18 sec
      Start 15: arrow-io-buffered-test
15/51 Test #15: arrow-io-buffered-test ...............   Passed    0.20 sec
      Start 16: arrow-io-compressed-test
16/51 Test #16: arrow-io-compressed-test .............   Passed    3.48 sec
      Start 17: arrow-io-file-test
17/51 Test #17: arrow-io-file-test ...................   Passed    0.74 sec
      Start 18: arrow-io-hdfs-test
18/51 Test #18: arrow-io-hdfs-test ...................   Passed    0.12 sec
      Start 19: arrow-io-memory-test
19/51 Test #19: arrow-io-memory-test .................   Passed    2.77 sec
      Start 20: arrow-utility-test
20/51 Test #20: arrow-utility-test ...................***Failed    5.65 sec
      Start 21: arrow-threading-utility-test
21/51 Test #21: arrow-threading-utility-test .........   Passed    1.34 sec
      Start 22: arrow-compute-compute-test
22/51 Test #22: arrow-compute-compute-test ...........   Passed    0.13 sec
      Start 23: arrow-compute-boolean-test
23/51 Test #23: arrow-compute-boolean-test ...........   Passed    0.15 sec
      Start 24: arrow-compute-cast-test
24/51 Test #24: arrow-compute-cast-test ..............   Passed    0.22 sec
      Start 25: arrow-compute-hash-test
25/51 Test #25: arrow-compute-hash-test ..............   Passed    2.61 sec
      Start 26: arrow-compute-isin-test
26/51 Test #26: arrow-compute-isin-test ..............   Passed    0.81 sec
      Start 27: arrow-compute-match-test
27/51 Test #27: arrow-compute-match-test .............   Passed    0.40 sec
      Start 28: arrow-compute-sort-to-indices-test
28/51 Test #28: arrow-compute-sort-to-indices-test ...   Passed    3.33 sec
      Start 29: arrow-compute-nth-to-indices-test
29/51 Test #29: arrow-compute-nth-to-indices-test ....   Passed    1.51 sec
      Start 30: arrow-compute-util-internal-test
30/51 Test #30: arrow-compute-util-internal-test .....   Passed    0.13 sec
      Start 31: arrow-compute-add-test
31/51 Test #31: arrow-compute-add-test ...............   Passed    0.12 sec
      Start 32: arrow-compute-aggregate-test
32/51 Test #32: arrow-compute-aggregate-test .........   Passed   14.70 sec
      Start 33: arrow-compute-compare-test
33/51 Test #33: arrow-compute-compare-test ...........   Passed    7.96 sec
      Start 34: arrow-compute-take-test
34/51 Test #34: arrow-compute-take-test ..............   Passed    4.80 sec
      Start 35: arrow-compute-filter-test
35/51 Test #35: arrow-compute-filter-test ............   Passed    8.23 sec
      Start 36: arrow-dataset-dataset-test
36/51 Test #36: arrow-dataset-dataset-test ...........   Passed    0.25 sec
      Start 37: arrow-dataset-discovery-test
37/51 Test #37: arrow-dataset-discovery-test .........   Passed    0.13 sec
      Start 38: arrow-dataset-file-ipc-test
38/51 Test #38: arrow-dataset-file-ipc-test ..........   Passed    0.21 sec
      Start 39: arrow-dataset-file-test
39/51 Test #39: arrow-dataset-file-test ..............   Passed    0.12 sec
      Start 40: arrow-dataset-filter-test
40/51 Test #40: arrow-dataset-filter-test ............   Passed    0.16 sec
      Start 41: arrow-dataset-partition-test
41/51 Test #41: arrow-dataset-partition-test .........   Passed    0.13 sec
      Start 42: arrow-dataset-scanner-test
42/51 Test #42: arrow-dataset-scanner-test ...........   Passed    0.20 sec
      Start 43: arrow-filesystem-test
43/51 Test #43: arrow-filesystem-test ................   Passed    1.62 sec
      Start 44: arrow-hdfs-test
44/51 Test #44: arrow-hdfs-test ......................   Passed    0.13 sec
      Start 45: arrow-feather-test
45/51 Test #45: arrow-feather-test ...................   Passed    0.91 sec
      Start 46: arrow-ipc-read-write-test
46/51 Test #46: arrow-ipc-read-write-test ............   Passed    5.77 sec
      Start 47: arrow-ipc-json-simple-test
47/51 Test #47: arrow-ipc-json-simple-test ...........   Passed    0.16 sec
      Start 48: arrow-ipc-json-test
48/51 Test #48: arrow-ipc-json-test ..................   Passed    0.27 sec
      Start 49: arrow-json-integration-test
49/51 Test #49: arrow-json-integration-test ..........   Passed    0.13 sec
      Start 50: arrow-json-test
50/51 Test #50: arrow-json-test ......................   Passed    0.26 sec
      Start 51: arrow-orc-adapter-test
51/51 Test #51: arrow-orc-adapter-test ...............   Passed    1.92 sec

98% tests passed, 1 tests failed out of 51

Label Time Summary:
arrow-tests      =  27.38 sec (27 tests)
arrow_compute    =  45.11 sec (14 tests)
arrow_dataset    =   1.21 sec (7 tests)
arrow_ipc        =   6.20 sec (3 tests)
unittest         =  79.91 sec (51 tests)

Total Test time (real) =  79.99 sec

The following tests FAILED:
	 20 - arrow-utility-test (Failed)
Errors while running CTest
```

Closes #7142 from kiszk/ARROW-8754

Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
@alamb alamb changed the base branch from master to master_pre_rewrite April 20, 2021 11:09
@alamb alamb changed the base branch from master_pre_rewrite to master April 20, 2021 11:09
@alamb
Copy link
Contributor

alamb commented Apr 20, 2021

@jorgecarleitao I don't see the Integration test being triggered on this PR . Do we need to do something else to get it to run?

@jorgecarleitao
Copy link
Member Author

I think I made a mistake: we probably need synchronize on the list of triggers in the pull_request. I was trying to not trigger this when the PR was marked as drafted but caused a false-positive.

@jorgecarleitao
Copy link
Member Author

(it did triggered the first time I made it, but I can see that a force push is not triggering it)

@andygrove
Copy link
Member

I'm not too familiar with archery but the overall approach seems reasonable

Copy link
Contributor

@nevi-me nevi-me left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could make integration opt-in, like how the rustlang project only triggers certain jobs manually if it's deemed that a PR could affect performance.

I can investigate this over the weekend, having quite a busy week.

@jorgecarleitao
Copy link
Member Author

@alamb , thanks for the rebase; @nevi-me , thanks for the review.

@wesm @kszucs , any concerns / alternatives with this approach? I suspect that other implementations (Julia) may copy this, and thus some validation beyond rust is appreciated.

@kszucs
Copy link
Member

kszucs commented Apr 20, 2021

It seems like a clean approach, though we'll need to update the integration testing in apache/arrow to include arrow-rs, so this configuration will need an update after that.

@jorgecarleitao
Copy link
Member Author

though we'll need to update the integration testing in apache/arrow to include arrow-rs, so this configuration will need an update after that.

Got it. In apache/arrow#10096, the last commit introduces a git clone from this repo to rust/ in apacher/arrow/, so that the tests run on apache/arrow pick this one. Wouldn't that be enough?

@kszucs
Copy link
Member

kszucs commented Apr 20, 2021

Wouldn't that be enough?

Yes, at least I think so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants