Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-11775: [Rust][DataFusion] Feature Flags for Dependencies #9567

Closed
wants to merge 2 commits into from

Conversation

seddonm1
Copy link
Contributor

@alamb @andygrove @nevi-me

Based on the Rust Arrow sync call we discussed setting feature flags to allow use of DataFusion without having to pull in unnecessary dependencies.

Here is an example applied to the crypto functions. It:

  • puts the crypto functions behind a feature flag called crypto-functions with the required dependencies tagged.
  • adds the crypto-functions to the default feature flag meaning it will be included in the default build.
  • adds tests for both states of the feature flag. Note that the cargo behavior of specifying conditions in the rust test framework is inverted (at least in my mind) so that you can only specify ignore not include rules. Practically this means to run the test with the feature flag you do: #[cfg_attr(not(feature = "crypto-functions"), ignore)] which is not ideal but functional.
  • currently I am throwing Err(DataFusionError::Internal("requires compilation with feature flag: crypto-functions".to_string())) errors which I think is a better user experience than panic based macros but this could easily be changed to unimplemented! with a similar message.
  • unfortunately we will probably need to execute cargo test --no-default-features --features cli AND cargo test in CICD to ensure the proper coverage (--features cli is required to compile).

Thoughts?

@github-actions
Copy link

@codecov-io
Copy link

Codecov Report

Merging #9567 (56c854b) into master (5bea624) will increase coverage by 0.12%.
The diff coverage is 89.28%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9567      +/-   ##
==========================================
+ Coverage   82.25%   82.38%   +0.12%     
==========================================
  Files         244      244              
  Lines       55685    56234     +549     
==========================================
+ Hits        45806    46329     +523     
- Misses       9879     9905      +26     
Impacted Files Coverage Δ
rust/datafusion/src/logical_plan/expr.rs 81.56% <ø> (+0.42%) ⬆️
...datafusion/src/physical_plan/crypto_expressions.rs 52.45% <ø> (ø)
...ion-testing/src/bin/arrow-json-integration-test.rs 0.00% <0.00%> (ø)
rust/arrow/src/ipc/writer.rs 87.23% <50.00%> (-0.59%) ⬇️
...datafusion/src/physical_plan/string_expressions.rs 77.00% <82.23%> (+7.37%) ⬆️
rust/datafusion/src/physical_plan/functions.rs 83.84% <88.23%> (+10.02%) ⬆️
rust/parquet/src/arrow/array_reader.rs 77.61% <91.30%> (-0.02%) ⬇️
rust/datafusion/tests/sql.rs 98.28% <100.00%> (+0.14%) ⬆️
rust/parquet/src/arrow/arrow_reader.rs 91.91% <100.00%> (+0.34%) ⬆️
rust/parquet/src/schema/types.rs 89.54% <100.00%> (+0.02%) ⬆️
... and 16 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 81e9417...56c854b. Read the comment docs.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seddonm1 I really like this approach (where the user gets told what is wrong if the function doesn't work). THANK YOU!

I made a little test program that refers to this branch and tried to call the md5 function and it works as I expected!

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ArrowError(ExternalError(Internal("requires compilation with feature flag: crypto-functions")))', src/main.rs:12:48

Code Thoughts

I noticed that the PR has a bunch of

#[cfg(feature = "crypto-functions")]

annotations sprinkled around and then a bunch of duplicated stubs like

#[cfg(not(feature = "crypto-functions"))]
pub fn md5(_: &[ColumnarValue]) -> Result<ColumnarValue> {
    Err(DataFusionError::Internal(
        "requires compilation with feature flag: crypto-functions".to_string(),
    ))
}

I wonder if we could hide that all behind a macro or something -- like maybe attach #[cfg(not(feature = "crypto-functions"))] to the entire crypto_functions module and then rather than directly useing the functions, have a macro that is like

import_if_enabled!(crypto_functions::md5, md5, "crypto-functions");

which would boil down to

use crypto_functions::md5;

if the feature flag crypto-functions was enabled, and

pub fn md5(_: &[ColumnarValue]) -> Result<ColumnarValue> {
    Err(DataFusionError::Internal(
        "requires compilation with feature flag: crypto-functions".to_string(),
    ))
}

CI Thoughts

I think adding a new CI job for datafusion without default features would be enough (aka test without any of the functions enabled) -- and as we add more feature flags we could just keep using the same CI test.

Warnings

Just FYI I got a bunch of warnings when compiling without the feature flags:

cd /Users/alamb/Software/rust_datafusion_deps && cargo run
warning: unused import: `std::sync::Arc`
  --> /Users/alamb/Software/arrow2/rust/datafusion/src/physical_plan/crypto_expressions.rs:19:5
   |
19 | use std::sync::Arc;
   |     ^^^^^^^^^^^^^^
   |
   = note: `#[warn(unused_imports)]` on by default

warning: unused import: `scalar::ScalarValue`
  --> /Users/alamb/Software/arrow2/rust/datafusion/src/physical_plan/crypto_expressions.rs:31:5
   |
31 |     scalar::ScalarValue,
   |     ^^^^^^^^^^^^^^^^^^^

warning: unused imports: `Array`, `BinaryArray`, `GenericStringArray`, `StringOffsetSizeTrait`, `datatypes::DataType`
  --> /Users/alamb/Software/arrow2/rust/datafusion/src/physical_plan/crypto_expressions.rs:34:13
   |
34 |     array::{Array, BinaryArray, GenericStringArray, StringOffsetSizeTrait},
   |             ^^^^^  ^^^^^^^^^^^  ^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^^^^^^^
35 |     datatypes::DataType,
   |     ^^^^^^^^^^^^^^^^^^^

warning: unused import: `string_expressions::unary_string_function`
  --> /Users/alamb/Software/arrow2/rust/datafusion/src/physical_plan/crypto_expressions.rs:38:13
   |
38 | use super::{string_expressions::unary_string_function, ColumnarValue};
   |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

warning: missing documentation for a function
   --> /Users/alamb/Software/arrow2/rust/datafusion/src/physical_plan/crypto_expressions.rs:213:1
    |
213 | pub fn md5(_: &[ColumnarValue]) -> Result<ColumnarValue> {
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
note: the lint level is defined here
   --> /Users/alamb/Software/arrow2/rust/datafusion/src/lib.rs:17:9
    |
17  | #![warn(missing_docs)]
    |         ^^^^^^^^^^^^

warning: missing documentation for a function
   --> /Users/alamb/Software/arrow2/rust/datafusion/src/physical_plan/crypto_expressions.rs:220:1
    |
220 | pub fn sha224(_: &[ColumnarValue]) -> Result<ColumnarValue> {
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

warning: missing documentation for a function
   --> /Users/alamb/Software/arrow2/rust/datafusion/src/physical_plan/crypto_expressions.rs:227:1
    |
227 | pub fn sha256(_: &[ColumnarValue]) -> Result<ColumnarValue> {
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

warning: missing documentation for a function
   --> /Users/alamb/Software/arrow2/rust/datafusion/src/physical_plan/crypto_expressions.rs:234:1
    |
234 | pub fn sha384(_: &[ColumnarValue]) -> Result<ColumnarValue> {
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

warning: missing documentation for a function
   --> /Users/alamb/Software/arrow2/rust/datafusion/src/physical_plan/crypto_expressions.rs:241:1
    |
241 | pub fn sha512(_: &[ColumnarValue]) -> Result<ColumnarValue> {
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

warning: 9 warnings emitted

@alamb
Copy link
Contributor

alamb commented Feb 25, 2021

FYI @ovr -- given you contributed the original crypto functions, do you have any thoughts about making them optional (on by default)?

@ovr
Copy link
Contributor

ovr commented Feb 25, 2021

FYI @ovr -- given you contributed the original crypto functions, do you have any thoughts about making them optional (on by default)?

Looks ok to me :)

@seddonm1
Copy link
Contributor Author

Thanks @alamb

I did try to wrap code blocks {} with the macro and that did not work. I will see if the compiler is clever enough to handle macros as I agree this gets pretty messy.

I am not sure if we will ever be able to get away from the warnings unless MANY annotations are added which makes the code really horrible. I will try a few more tests.

@elferherrera
Copy link
Contributor

@seddonm1 I think you dont need to put the #[cfg(feature)] to all your functions. As long as you only put it where you are going to use the module. Something like this example.

@seddonm1
Copy link
Contributor Author

seddonm1 commented Mar 1, 2021

Thanks @alamb and @elferherrera

I have incorporated both your ideas within the limitations of the rust compiler; namely:

  1. feature flag names if used like #[cfg(feature = "crypto_expressions")] lose type safety on the feature name so care needs to be taken.
  2. using the cfg! macro looks like it would work but requires the function to compile under both conditions which is not easy.

I have also updated the rust.yml CICD file and added tests for consistency.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this @seddonm1 -- really nice work and a pattern that we can follow with other functions.

@@ -447,6 +446,26 @@ pub fn return_type(
}
}

#[cfg(feature = "crypto_expressions")]
macro_rules! invoke_if_crypto_expressions_feature_flag {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@alamb alamb requested a review from andygrove March 1, 2021 17:28
@alamb
Copy link
Contributor

alamb commented Mar 1, 2021

@andygrove @Dandandan @elferherrera what do you think of this approach? I really like it

}};
}

#[cfg(not(feature = "crypto_expressions"))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can put the #[cfg(feature)] inside the macro so you dont have to maintain two funtions
What do you think?

macro_rules! invoke_if_crypto_expressions_feature_flag {
    ($FUNC:ident, $NAME:expr) => {
        #[cfg(not(feature = "crypto_expressions"))]
        {
            |_: &[ColumnarValue]| -> Result<ColumnarValue> {
                Err(DataFusionError::Internal(format!(
                    "function {} requires compilation with feature flag: crypto_expressions.",
                    $NAME
                )))
            }
        }

        #[cfg(feature = "crypto_expressions")]
        {
            use crate::physical_plan::crypto_expressions;
            crypto_expressions::$FUNC
        }
    };
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elferherrera I tried this but unfortunately the compiler does not like it. The #[cfg] tag does not operate on {..} blocks.

The end-state of this was my least-bad solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seddonm1 That's odd, because before posting that I did a small test and it did compile.

#[cfg(feature = "rand_calc")]
use playground::myrand;

macro_rules! testing_something {
    ($name:expr) => {
        #[cfg(not(feature = "rand_calc"))]
        {
            println!("No feature: {}", $name);
        }

        #[cfg(feature = "rand_calc")]
        {
            println!("Feature: {}", $name);
        }
    };
}

fn main() {
    let a = 3.0001f64.to_le_bytes();
    println!("{:?}", a);

    #[cfg(feature = "rand_calc")]
    {
        let res = myrand::with_rand::create_rnd();
        println!("{:?}", res);
    }

    testing_something!("Name");
}

I should have tested it with your code to see if it would compile as well

@@ -40,9 +40,10 @@ name = "datafusion-cli"
path = "src/bin/main.rs"

[features]
default = ["cli"]
default = ["cli", "crypto_expressions"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By having them on by default, how do we reduce compile times?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By having them on by default, how do we reduce compile times?

You can locally run the tests with cargo test --features cli or something, right?

@Dandandan
Copy link
Contributor

The features for cryptography functions looks good to me! Thanks @seddonm1 !

I think it is a good idea to make some things optional, to be able to exclude certain features. I think the features introduce some more complexity to the build, as we introduce more variants that maybe could fail to compile or to fail passing test in some configuration.
So, overall I think we should strive to keep the usage relatively straightforward and avoid too complex usage.

If we can reduce the number of dependencies or move something out, I think that might be preferable if at all possible.

@seddonm1
Copy link
Contributor Author

seddonm1 commented Mar 1, 2021

@Dandandan this was discussed at the last Arrow Rust sync call.

The intention was to allow reducing compile time of projects which depend on DataFusion - not the DataFusion project itself.

I think that care should be taken to keep the dependencies sensible but ultimately if we are chasing some sort of SQL compatibility (i.e. Postgres) then having the standard set of SQL functions on by default seems the correct approach.

@elferherrera
Copy link
Contributor

I agree with the macro for the dependencies. Im just curious on why in the TOML file in the features section the crypto feature is added: default = ["cli", "crypto_expressions"]. Should it only be default = ["cli"]?

@seddonm1
Copy link
Contributor Author

seddonm1 commented Mar 2, 2021

@elferherreraI I think the response above answers this? (#9567 (comment))

@alamb
Copy link
Contributor

alamb commented Mar 2, 2021

If we can reduce the number of dependencies or move something out, I think that might be preferable if at all possible.

@Dandandan I suggest the approach, long term, of putting extension packages (and their tests) in their own crates. I envision this as a two step process:

  1. Pull the functions into crates in the same repo, but still release them all at the same time
  2. Eventually pull the functions into some other repo which would free them to be released on whatever timeline was good.

The thing is that we don't have a huge amount of dev time to devote to this project, and I worry about the overhead of splitting into separate crates.

However until the time we have time to do something better, I think feature flags is the minimal overhead way to make the dependencies optional.

@elferherrera
Copy link
Contributor

I think that care should be taken to keep the dependencies sensible but ultimately if we are chasing some sort of SQL compatibility (i.e. Postgres) then having the standard set of SQL functions on by default seems the correct approach.

Sorry, didn't read carefully the last bit. So I guess if we don't want crypto you can only select --features cli like Andrew suggested.

I like the changes

@alamb alamb closed this in cc49beb Mar 3, 2021
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
@alamb @andygrove @nevi-me

Based on the Rust Arrow sync call we discussed setting feature flags to allow use of DataFusion without having to pull in unnecessary dependencies.

Here is an example applied to the crypto functions. It:

- puts the crypto functions behind a feature flag called `crypto-functions` with the required dependencies tagged.
- adds the `crypto-functions` to the `default` feature flag meaning it will be included in the default build.
- adds tests for both states of the feature flag. Note that the cargo behavior of specifying conditions in the rust test framework is inverted (at least in my mind) so that you can only specify `ignore` not `include` rules. Practically this means to run the test with the feature flag you do: `#[cfg_attr(not(feature = "crypto-functions"), ignore)]` which is not ideal but functional.
- currently I am throwing `Err(DataFusionError::Internal("requires compilation with feature flag: crypto-functions".to_string()))` errors which I think is a better user experience than `panic` based macros but this could easily be changed to `unimplemented!` with a similar message.
- unfortunately we will probably need to execute `cargo test --no-default-features --features cli` AND `cargo test` in CICD to ensure the proper coverage (`--features cli` is required to compile).

Thoughts?

Closes apache#9567 from seddonm1/feature-flags

Authored-by: Mike Seddon <seddonm1@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
michalursa pushed a commit to michalursa/arrow that referenced this pull request Jun 13, 2021
@alamb @andygrove @nevi-me

Based on the Rust Arrow sync call we discussed setting feature flags to allow use of DataFusion without having to pull in unnecessary dependencies.

Here is an example applied to the crypto functions. It:

- puts the crypto functions behind a feature flag called `crypto-functions` with the required dependencies tagged.
- adds the `crypto-functions` to the `default` feature flag meaning it will be included in the default build.
- adds tests for both states of the feature flag. Note that the cargo behavior of specifying conditions in the rust test framework is inverted (at least in my mind) so that you can only specify `ignore` not `include` rules. Practically this means to run the test with the feature flag you do: `#[cfg_attr(not(feature = "crypto-functions"), ignore)]` which is not ideal but functional.
- currently I am throwing `Err(DataFusionError::Internal("requires compilation with feature flag: crypto-functions".to_string()))` errors which I think is a better user experience than `panic` based macros but this could easily be changed to `unimplemented!` with a similar message.
- unfortunately we will probably need to execute `cargo test --no-default-features --features cli` AND `cargo test` in CICD to ensure the proper coverage (`--features cli` is required to compile).

Thoughts?

Closes apache#9567 from seddonm1/feature-flags

Authored-by: Mike Seddon <seddonm1@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants