New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-11026 [Rust]: Run tests without requiring environment variables #8996
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rust/README.md
Outdated
The following Env vars are required to run `cargo test`, examples, etc. | ||
By default, `cargo test` will look for these directories at their | ||
standard location. The following Env vars can be used to override the | ||
location shoud you choose |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
location shoud you choose | |
location should you choose. |
} | ||
}; | ||
let mut pathbuf = | ||
PathBuf::from_str(&arrow::util::test_util::parquet_test_data()).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, better to move the parquet test data fn into this crate, because arrow
is an optional dependency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with @nevi-me
Actually I had asked similar question "where to put them?", finally I decided leaving this question to reviewers, because the most important thing at that time is coding.
At present, with this PR, thanks to @alamb, we finally got a chance to discuss this issue. Actually I don't mind how to improve or split the code in #8967 at all, as long as the fundamental purpose could be achieved, and they do not mess up the existing testing code too much :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nevi-me and @mqy -- I tried to move parquet_test_data
into https://github.com/apache/arrow/blob/master/rust/parquet/src/util/test_common/file_util.rs -- however, the code quickly got messy because parquet::util
is not publically exported and thus I can't use functions defined there in places (like datafusion) outside the parquet crate. Furthermore, the test_utils
are only compiled in test
config, but several datafusion examples use the parquet test data but they are not compiled in test
config.
I can think of several possibilities:
- Leave the
parquet_test_data
function in the arrow crate as it is in this PR - Make a copy of parquet_test_data in the parquet crate
- Make the parquet util module public and export test_util in all configurations
Given that this function is used in tests and the other options seem messy to me, I suggest number 1 (though perhaps I am being lazy)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for the first choice right now. Perhaps these functions should stay in a top crate parallel to arrow crate, as common utilities for testing or something else. We may create that crate later.
0125b0f
to
195e743
Compare
Codecov Report
@@ Coverage Diff @@
## master #8996 +/- ##
=======================================
Coverage 82.87% 82.88%
=======================================
Files 201 201
Lines 49729 49734 +5
=======================================
+ Hits 41213 41220 +7
+ Misses 8516 8514 -2
Continue to review full report at Codecov.
|
@alamb those two envs also stay in rust/arrow/README.md |
195e743
to
84ceb2b
Compare
Good idea @mqy -- done. |
@nevi-me -- I have not been able to find a satisfactory way to avoid using the common |
Hey @alamb, it's fine, we can merge the changes as they are. Thanks for trying to move the function. |
This PR is based on @mqy 's great work in: apache#8967. (If we want to take this PR, we can either merge it in to https://github.com/apache/arrow/pull/8967/files# directly or I can make a new independent PR when that is merged). ## Rationale The outcome is that developers can now simply run `cargo test` in a typical checkout without having to mess with environment variables. I think this will lower the barrier to entry for people to contribute. ## Changes 1. Code from apache#8967 to encode heuristics of where to check for test data 1. Remove all references to ARROW_TEST_DATA and PARQUET_TEST_DATA and uses the test_util methods instead 2. Update the comments / error messages in test_util ## Example Error Handling Error handling: here is what happens with a fresh checkout and no git modules checked out and no environment variables set: ``` cargo test -p arrow ---- ipc::reader::tests::read_decimal_be_file_should_panic stdout ---- thread 'ipc::reader::tests::read_decimal_be_file_should_panic' panicked at 'failed to get arrow data dir: env `ARROW_TEST_DATA` is undefined or has empty value, and the pre-defined data dir `/private/tmp/arrow/rust/arrow/../../testing/data` not found HINT: try running `git submodule update --init`', arrow/src/util/test_util.rs:81:21 ``` Here is an example of what happens when `ARROW_TEST_DATA` is pointing somewhere non existent ``` ARROW_TEST_DATA=blargh cargo test -p arrow ... --- ipc::reader::tests::read_decimal_be_file_should_panic stdout ---- thread 'ipc::reader::tests::read_decimal_be_file_should_panic' panicked at 'failed to get arrow data dir: the data dir `blargh` defined by env ARROW_TEST_DATA not found', arrow/src/util/test_util.rs:81:21 ``` Closes apache#8996 from alamb/alamb/tests_without_env Authored-by: Andrew Lamb <andrew@nerdnetworks.org> Signed-off-by: Neville Dipale <nevilledips@gmail.com>
This PR is based on @mqy 's great work in: #8967.
(If we want to take this PR, we can either merge it in to https://github.com/apache/arrow/pull/8967/files# directly or I can make a new independent PR when that is merged).
Rationale
The outcome is that developers can now simply run
cargo test
in a typical checkout without having to mess with environment variables. I think this will lower the barrier to entry for people to contribute.Changes
Example Error Handling
Error handling: here is what happens with a fresh checkout and no git modules checked out and no environment variables set:
Here is an example of what happens when
ARROW_TEST_DATA
is pointing somewhere non existent