New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-11775: [Rust][DataFusion] Feature Flags for Dependencies #9567
Conversation
Codecov Report
@@ Coverage Diff @@
## master #9567 +/- ##
==========================================
+ Coverage 82.25% 82.38% +0.12%
==========================================
Files 244 244
Lines 55685 56234 +549
==========================================
+ Hits 45806 46329 +523
- Misses 9879 9905 +26
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@seddonm1 I really like this approach (where the user gets told what is wrong if the function doesn't work). THANK YOU!
I made a little test program that refers to this branch and tried to call the md5
function and it works as I expected!
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ArrowError(ExternalError(Internal("requires compilation with feature flag: crypto-functions")))', src/main.rs:12:48
Code Thoughts
I noticed that the PR has a bunch of
#[cfg(feature = "crypto-functions")]
annotations sprinkled around and then a bunch of duplicated stubs like
#[cfg(not(feature = "crypto-functions"))]
pub fn md5(_: &[ColumnarValue]) -> Result<ColumnarValue> {
Err(DataFusionError::Internal(
"requires compilation with feature flag: crypto-functions".to_string(),
))
}
I wonder if we could hide that all behind a macro or something -- like maybe attach #[cfg(not(feature = "crypto-functions"))]
to the entire crypto_functions
module and then rather than directly use
ing the functions, have a macro that is like
import_if_enabled!(crypto_functions::md5, md5, "crypto-functions");
which would boil down to
use crypto_functions::md5;
if the feature flag crypto-functions
was enabled, and
pub fn md5(_: &[ColumnarValue]) -> Result<ColumnarValue> {
Err(DataFusionError::Internal(
"requires compilation with feature flag: crypto-functions".to_string(),
))
}
CI Thoughts
I think adding a new CI job for datafusion without default features would be enough (aka test without any of the functions enabled) -- and as we add more feature flags we could just keep using the same CI test.
Warnings
Just FYI I got a bunch of warnings when compiling without the feature flags:
cd /Users/alamb/Software/rust_datafusion_deps && cargo run
warning: unused import: `std::sync::Arc`
--> /Users/alamb/Software/arrow2/rust/datafusion/src/physical_plan/crypto_expressions.rs:19:5
|
19 | use std::sync::Arc;
| ^^^^^^^^^^^^^^
|
= note: `#[warn(unused_imports)]` on by default
warning: unused import: `scalar::ScalarValue`
--> /Users/alamb/Software/arrow2/rust/datafusion/src/physical_plan/crypto_expressions.rs:31:5
|
31 | scalar::ScalarValue,
| ^^^^^^^^^^^^^^^^^^^
warning: unused imports: `Array`, `BinaryArray`, `GenericStringArray`, `StringOffsetSizeTrait`, `datatypes::DataType`
--> /Users/alamb/Software/arrow2/rust/datafusion/src/physical_plan/crypto_expressions.rs:34:13
|
34 | array::{Array, BinaryArray, GenericStringArray, StringOffsetSizeTrait},
| ^^^^^ ^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^
35 | datatypes::DataType,
| ^^^^^^^^^^^^^^^^^^^
warning: unused import: `string_expressions::unary_string_function`
--> /Users/alamb/Software/arrow2/rust/datafusion/src/physical_plan/crypto_expressions.rs:38:13
|
38 | use super::{string_expressions::unary_string_function, ColumnarValue};
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
warning: missing documentation for a function
--> /Users/alamb/Software/arrow2/rust/datafusion/src/physical_plan/crypto_expressions.rs:213:1
|
213 | pub fn md5(_: &[ColumnarValue]) -> Result<ColumnarValue> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
note: the lint level is defined here
--> /Users/alamb/Software/arrow2/rust/datafusion/src/lib.rs:17:9
|
17 | #![warn(missing_docs)]
| ^^^^^^^^^^^^
warning: missing documentation for a function
--> /Users/alamb/Software/arrow2/rust/datafusion/src/physical_plan/crypto_expressions.rs:220:1
|
220 | pub fn sha224(_: &[ColumnarValue]) -> Result<ColumnarValue> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
warning: missing documentation for a function
--> /Users/alamb/Software/arrow2/rust/datafusion/src/physical_plan/crypto_expressions.rs:227:1
|
227 | pub fn sha256(_: &[ColumnarValue]) -> Result<ColumnarValue> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
warning: missing documentation for a function
--> /Users/alamb/Software/arrow2/rust/datafusion/src/physical_plan/crypto_expressions.rs:234:1
|
234 | pub fn sha384(_: &[ColumnarValue]) -> Result<ColumnarValue> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
warning: missing documentation for a function
--> /Users/alamb/Software/arrow2/rust/datafusion/src/physical_plan/crypto_expressions.rs:241:1
|
241 | pub fn sha512(_: &[ColumnarValue]) -> Result<ColumnarValue> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
warning: 9 warnings emitted
FYI @ovr -- given you contributed the original crypto functions, do you have any thoughts about making them optional (on by default)? |
Looks ok to me :) |
Thanks @alamb I did try to wrap code blocks I am not sure if we will ever be able to get away from the warnings unless MANY annotations are added which makes the code really horrible. I will try a few more tests. |
Thanks @alamb and @elferherrera I have incorporated both your ideas within the limitations of the rust compiler; namely:
I have also updated the rust.yml CICD file and added tests for consistency. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this @seddonm1 -- really nice work and a pattern that we can follow with other functions.
@@ -447,6 +446,26 @@ pub fn return_type( | |||
} | |||
} | |||
|
|||
#[cfg(feature = "crypto_expressions")] | |||
macro_rules! invoke_if_crypto_expressions_feature_flag { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
@andygrove @Dandandan @elferherrera what do you think of this approach? I really like it |
}}; | ||
} | ||
|
||
#[cfg(not(feature = "crypto_expressions"))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can put the #[cfg(feature)]
inside the macro so you dont have to maintain two funtions
What do you think?
macro_rules! invoke_if_crypto_expressions_feature_flag {
($FUNC:ident, $NAME:expr) => {
#[cfg(not(feature = "crypto_expressions"))]
{
|_: &[ColumnarValue]| -> Result<ColumnarValue> {
Err(DataFusionError::Internal(format!(
"function {} requires compilation with feature flag: crypto_expressions.",
$NAME
)))
}
}
#[cfg(feature = "crypto_expressions")]
{
use crate::physical_plan::crypto_expressions;
crypto_expressions::$FUNC
}
};
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elferherrera I tried this but unfortunately the compiler does not like it. The #[cfg]
tag does not operate on {..}
blocks.
The end-state of this was my least-bad solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@seddonm1 That's odd, because before posting that I did a small test and it did compile.
#[cfg(feature = "rand_calc")]
use playground::myrand;
macro_rules! testing_something {
($name:expr) => {
#[cfg(not(feature = "rand_calc"))]
{
println!("No feature: {}", $name);
}
#[cfg(feature = "rand_calc")]
{
println!("Feature: {}", $name);
}
};
}
fn main() {
let a = 3.0001f64.to_le_bytes();
println!("{:?}", a);
#[cfg(feature = "rand_calc")]
{
let res = myrand::with_rand::create_rnd();
println!("{:?}", res);
}
testing_something!("Name");
}
I should have tested it with your code to see if it would compile as well
@@ -40,9 +40,10 @@ name = "datafusion-cli" | |||
path = "src/bin/main.rs" | |||
|
|||
[features] | |||
default = ["cli"] | |||
default = ["cli", "crypto_expressions"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By having them on by default, how do we reduce compile times?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By having them on by default, how do we reduce compile times?
You can locally run the tests with cargo test --features cli
or something, right?
The features for cryptography functions looks good to me! Thanks @seddonm1 ! I think it is a good idea to make some things optional, to be able to exclude certain features. I think the features introduce some more complexity to the build, as we introduce more variants that maybe could fail to compile or to fail passing test in some configuration. If we can reduce the number of dependencies or move something out, I think that might be preferable if at all possible. |
@Dandandan this was discussed at the last Arrow Rust sync call. The intention was to allow reducing compile time of projects which depend on DataFusion - not the DataFusion project itself. I think that care should be taken to keep the dependencies sensible but ultimately if we are chasing some sort of SQL compatibility (i.e. Postgres) then having the standard set of SQL functions on by default seems the correct approach. |
I agree with the macro for the dependencies. Im just curious on why in the TOML file in the features section the crypto feature is added: |
@elferherreraI I think the response above answers this? (#9567 (comment)) |
@Dandandan I suggest the approach, long term, of putting extension packages (and their tests) in their own crates. I envision this as a two step process:
The thing is that we don't have a huge amount of dev time to devote to this project, and I worry about the overhead of splitting into separate crates. However until the time we have time to do something better, I think feature flags is the minimal overhead way to make the dependencies optional. |
Sorry, didn't read carefully the last bit. So I guess if we don't want crypto you can only select I like the changes |
@alamb @andygrove @nevi-me Based on the Rust Arrow sync call we discussed setting feature flags to allow use of DataFusion without having to pull in unnecessary dependencies. Here is an example applied to the crypto functions. It: - puts the crypto functions behind a feature flag called `crypto-functions` with the required dependencies tagged. - adds the `crypto-functions` to the `default` feature flag meaning it will be included in the default build. - adds tests for both states of the feature flag. Note that the cargo behavior of specifying conditions in the rust test framework is inverted (at least in my mind) so that you can only specify `ignore` not `include` rules. Practically this means to run the test with the feature flag you do: `#[cfg_attr(not(feature = "crypto-functions"), ignore)]` which is not ideal but functional. - currently I am throwing `Err(DataFusionError::Internal("requires compilation with feature flag: crypto-functions".to_string()))` errors which I think is a better user experience than `panic` based macros but this could easily be changed to `unimplemented!` with a similar message. - unfortunately we will probably need to execute `cargo test --no-default-features --features cli` AND `cargo test` in CICD to ensure the proper coverage (`--features cli` is required to compile). Thoughts? Closes apache#9567 from seddonm1/feature-flags Authored-by: Mike Seddon <seddonm1@gmail.com> Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
@alamb @andygrove @nevi-me Based on the Rust Arrow sync call we discussed setting feature flags to allow use of DataFusion without having to pull in unnecessary dependencies. Here is an example applied to the crypto functions. It: - puts the crypto functions behind a feature flag called `crypto-functions` with the required dependencies tagged. - adds the `crypto-functions` to the `default` feature flag meaning it will be included in the default build. - adds tests for both states of the feature flag. Note that the cargo behavior of specifying conditions in the rust test framework is inverted (at least in my mind) so that you can only specify `ignore` not `include` rules. Practically this means to run the test with the feature flag you do: `#[cfg_attr(not(feature = "crypto-functions"), ignore)]` which is not ideal but functional. - currently I am throwing `Err(DataFusionError::Internal("requires compilation with feature flag: crypto-functions".to_string()))` errors which I think is a better user experience than `panic` based macros but this could easily be changed to `unimplemented!` with a similar message. - unfortunately we will probably need to execute `cargo test --no-default-features --features cli` AND `cargo test` in CICD to ensure the proper coverage (`--features cli` is required to compile). Thoughts? Closes apache#9567 from seddonm1/feature-flags Authored-by: Mike Seddon <seddonm1@gmail.com> Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
@alamb @andygrove @nevi-me
Based on the Rust Arrow sync call we discussed setting feature flags to allow use of DataFusion without having to pull in unnecessary dependencies.
Here is an example applied to the crypto functions. It:
crypto-functions
with the required dependencies tagged.crypto-functions
to thedefault
feature flag meaning it will be included in the default build.ignore
notinclude
rules. Practically this means to run the test with the feature flag you do:#[cfg_attr(not(feature = "crypto-functions"), ignore)]
which is not ideal but functional.Err(DataFusionError::Internal("requires compilation with feature flag: crypto-functions".to_string()))
errors which I think is a better user experience thanpanic
based macros but this could easily be changed tounimplemented!
with a similar message.cargo test --no-default-features --features cli
ANDcargo test
in CICD to ensure the proper coverage (--features cli
is required to compile).Thoughts?