Config support type conversion #3522
| pub fn get_bool(&self, key: &str) -> bool { | ||
| match self.get(key) { | ||
| Some(ScalarValue::Boolean(Some(b))) => b, | ||
| Some(b) => b |
Thanks for looking at this @comphead. I think it might be better to do the conversion in the set methods rather than the get methods, and have the set methods return a Result but I haven't looked closely at this.
Agreed with @andygrove. It would be better to make the configuration dynamically and strongly typed:
pub fn get_bool(&self, key: &str) -> Result<bool> {
    match self.get(key) {
        Some(ScalarValue::Boolean(Some(b))) => Ok(b),
        Some(_) => Err(DataFusionError::Internal(format!("The configuration {} is not a Boolean", key))),
        None => Err(DataFusionError::Internal(format!("The configuration {} is not set", key))),
    }
}
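A minimal sketch of the set-side validation suggested in this thread, assuming a simplified model: `Value` is a hypothetical stand-in for `ScalarValue`, and `Config::register`, `Config::set`, and the `String` error type are invented for illustration, not DataFusion's API. The idea is that the type of the registered default decides what `set` accepts, and string inputs are parsed once at write time.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for DataFusion's `ScalarValue`, reduced to three variants.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Boolean(bool),
    UInt64(u64),
    Utf8(String),
}

/// Hypothetical config store: every key is registered with a typed default.
#[derive(Debug, Default)]
pub struct Config {
    options: HashMap<String, Value>,
}

impl Config {
    /// Register a key; the default value fixes the key's type.
    pub fn register(&mut self, key: &str, default: Value) {
        self.options.insert(key.to_string(), default);
    }

    /// Validate and convert at write time, so readers never see a mismatched type.
    pub fn set(&mut self, key: &str, value: Value) -> Result<(), String> {
        let current = self
            .options
            .get(key)
            .ok_or_else(|| format!("unknown configuration key {}", key))?;
        let converted = match (current, value) {
            // Same type as the registered default: accept as-is.
            (Value::Boolean(_), Value::Boolean(b)) => Value::Boolean(b),
            (Value::UInt64(_), Value::UInt64(n)) => Value::UInt64(n),
            (Value::Utf8(_), Value::Utf8(s)) => Value::Utf8(s),
            // String input for a typed key: parse it, or reject the write.
            (Value::Boolean(_), Value::Utf8(s)) => Value::Boolean(
                s.parse::<bool>().map_err(|e| format!("{}: {}", key, e))?,
            ),
            (Value::UInt64(_), Value::Utf8(s)) => Value::UInt64(
                s.parse::<u64>().map_err(|e| format!("{}: {}", key, e))?,
            ),
            (expected, got) => {
                return Err(format!("{}: expected {:?}, got {:?}", key, expected, got))
            }
        };
        self.options.insert(key.to_string(), converted);
        Ok(())
    }

    /// Reads stay simple because `set` has already enforced the type.
    pub fn get_bool(&self, key: &str) -> Option<bool> {
        match self.options.get(key) {
            Some(Value::Boolean(b)) => Some(*b),
            _ => None,
        }
    }
}
```

With this shape, a failed `set` leaves the previous value in place, and the typed getters never need to parse.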
Why does this PR close 3500?
| Some(ScalarValue::Utf8(Some(s))) => s, | ||
| _ => "".into(), | ||
| Some(s) => s.to_string(), | ||
| _ => "".to_string(), |
If the key is not set, should we return an Error?
I think a more common thing in Rust would be to return Option<String> so if the string wasn't set the caller can react to that.
@andygrove @HaoYang670 Thank you for the comments.
Personally, I prefer returning an error, because it tells users what has happened (a get_before_set error). And users can customize the error in their ways.
I guess this assumption is somewhat unrealistic. How could a reader use the config without knowing its type?
Could you please remove |
Hi @HaoYang670, thanks for pointing this out. That makes sense. Consider Spark config, which supports both initial and runtime config params: there, all values are strings, and the end user casts the config value to the type they need on their side. If that is not the case here, then making values strongly typed on set is a good idea.
Codecov Report
@@ Coverage Diff @@
## master #3522 +/- ##
==========================================
+ Coverage 85.79% 85.90% +0.11%
==========================================
Files 300 301 +1
Lines 55403 56046 +643
==========================================
+ Hits 47533 48147 +614
- Misses 7870 7899 +29
Changed the get methods to return Option, returning None if the key doesn't exist.
We already have a typeless
/// get a configuration option
pub fn get(&self, key: &str) -> Option<ScalarValue> {
    self.options.get(key).cloned()
}
for users who don't know the type of the config. For other |
# Conflicts: # datafusion/core/src/execution/context.rs
alamb left a comment
Thank you @comphead -- this is great progress. I am sorry for the late review.
I would really prefer:
- No panics
- Do not try to parse any arbitrary ScalarValue variant (parsing ScalarValue::Utf8 makes a lot of sense to me)
I don't have a strong opinion on how to handle config values stored as strings but that can't be parsed into whatever type DataFusion expects.
I think the most complete API would be returning Result<Option<..>>
/// Return the value of the key as a `u64`. If there is no value or a null
/// value stored for that key, returns Ok(None).
///
/// If the value was stored as a string but that string is not a valid `u64`, returns
/// Err.
pub fn get_u64(&self, key: &str) -> Result<Option<u64>> {
    ...
}
But I am not sure if that would be annoying to call in the rest of the DataFusion codebase
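To make the Result<Option<..>> proposal concrete, here is a rough sketch under stated assumptions: `Value` is a hypothetical stand-in for `ScalarValue`, and `ConfigError` and the `Config` struct are invented for illustration. Only string values are parsed; a missing key or a null value yields `Ok(None)`, and any other variant is treated as a type mismatch, which is one possible reading of the proposal.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `ScalarValue`; the inner `Option` models a null value.
#[derive(Debug, Clone)]
pub enum Value {
    UInt64(Option<u64>),
    Utf8(Option<String>),
    Boolean(Option<bool>),
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ConfigError {
    /// The value was a string that is not a valid `u64`.
    Parse(String),
    /// The value was stored as an unrelated type.
    TypeMismatch(String),
}

pub struct Config {
    pub options: HashMap<String, Value>,
}

impl Config {
    /// `Ok(None)` for a missing key or null value, `Ok(Some(n))` for a usable
    /// value, `Err` for a string that does not parse or a wrong variant.
    pub fn get_u64(&self, key: &str) -> Result<Option<u64>, ConfigError> {
        match self.options.get(key) {
            None => Ok(None),
            Some(Value::UInt64(n)) => Ok(*n),
            Some(Value::Utf8(None)) => Ok(None),
            // Only string values are parsed; no other variant is coerced.
            Some(Value::Utf8(Some(s))) => s
                .parse::<u64>()
                .map(Some)
                .map_err(|e| ConfigError::Parse(format!("{}: {}", key, e))),
            Some(other) => Err(ConfigError::TypeMismatch(format!(
                "{}: expected u64, got {:?}",
                key, other
            ))),
        }
    }
}
```

A caller that only wants a fallback can write `config.get_u64("some.key")?.unwrap_or(default)`, which keeps call sites short while still surfacing parse failures.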
| Some(b) => Some( | ||
| b.to_string() | ||
| .parse::<bool>() | ||
| .unwrap_or_else(|_| panic!("Cannot parse bool from {:?}", &b)), |
🤔 I feel like other PRs, such as #3316, are going in the opposite direction and removing the use of panic!
I would prefer that this API return None rather than panic (and maybe log a warn! message)
I agree. We don't want to introduce more panics at this point. There is an effort to remove as many of the existing panics as is practical.
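The non-panicking alternative discussed here could look roughly like the following. This is a sketch, not the PR's code: `parse_bool_lenient` is a hypothetical helper, and `eprintln!` stands in for the `warn!` macro so that the example needs no external crate.

```rust
/// Parse a boolean setting from its stored text without panicking.
/// `key` is only used for the diagnostic message.
pub fn parse_bool_lenient(key: &str, raw: &str) -> Option<bool> {
    match raw.trim().parse::<bool>() {
        Ok(b) => Some(b),
        Err(e) => {
            // Stand-in for `warn!`: report the problem and let the caller decide.
            eprintln!("Cannot parse bool for {} from {:?}: {}", key, raw, e);
            None
        }
    }
}
```

Note that Rust's `bool` parser accepts only the exact strings `true` and `false`, so inputs such as `TRUE` or `yes` take the warning path in this sketch.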
| Some(ScalarValue::UInt64(Some(n))) => n, | ||
| _ => 0, | ||
| Some(ScalarValue::UInt64(n)) => n, | ||
| Some(n) => Some( |
Calling to_string() on a ScalarValue calls its Display implementation: https://github.com/apache/arrow-datafusion/blob/master/datafusion/common/src/scalar.rs#L1991
I am not sure that trying to parse this as a u64 is what someone would expect. For example, if it is ScalarValue::Utf8(None) the value returned will be "None", which will fail to parse and panic.
I would recommend only trying to parse ScalarValue::Utf8(n) rather than any ScalarValue
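The concern about round-tripping through Display can be illustrated with a small model. Everything here is an assumption for illustration: `Value` is a hypothetical stand-in for `ScalarValue`, and the placeholder text it prints for nulls is chosen arbitrarily. Whatever placeholder a real Display implementation uses, it is not numeric text, so parsing it as a `u64` fails.

```rust
use std::fmt;

/// Hypothetical stand-in for `ScalarValue`; the null rendering below is an assumption.
pub enum Value {
    UInt64(Option<u64>),
    Utf8(Option<String>),
}

impl fmt::Display for Value {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            Value::UInt64(Some(n)) => write!(f, "{}", n),
            Value::Utf8(Some(s)) => write!(f, "{}", s),
            // A null has no numeric text form, whichever placeholder is printed.
            Value::UInt64(None) | Value::Utf8(None) => write!(f, "NULL"),
        }
    }
}

/// Narrower conversion: only string values are parsed, nulls map to `None`
/// explicitly, and nothing goes through `Display`.
pub fn as_u64(v: &Value) -> Option<u64> {
    match v {
        Value::UInt64(n) => *n,
        Value::Utf8(Some(s)) => s.parse::<u64>().ok(),
        Value::Utf8(None) => None,
    }
}
```

In this model `Value::Utf8(None).to_string().parse::<u64>()` is an `Err`, which is the case that would reach the `panic!` in the diff, while `as_u64` handles the same value without any parsing.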
I plan on reviewing this tomorrow.
Thanks @alamb for the detailed suggestions. I have dug into a couple of config implementations, namely https://docs.rs/config/latest/src/config/config.rs.html#180. They use Result with Err for both parse failures and missing keys.
@andygrove please let us know your thoughts as well, so we can finally close this.
| Some(ScalarValue::Utf8(Some(s))) => s, | ||
| _ => "".into(), | ||
| Some(ScalarValue::Utf8(s)) => s, | ||
| Some(s) => Some(s.to_string()), |
I'm not sure this is what we want. What does this return for a u64 value? Could you add a test for this?
|
Apologies for not getting to this sooner. I have been overstretched recently. Here is my view on how we should implement this (we don't have to do all of this in this PR, though). In the If we do the validation when setting configs, then there is no need to return a |
Thanks @andygrove. Implemented exactly like discussed. |
| Some(ScalarValue::$TPE(v)) => v, | ||
| Some(v) => { | ||
| warn!( | ||
| "Config type mismatch for {}. Expected: {}, got: {:?}", |
|
Benchmark runs are scheduled for baseline = be6ad1c and contender = 1e16e43. 1e16e43 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #3505 .
Rationale for this change
Avoid unexpected defaults when the type of a config value doesn't match
What changes are included in this PR?
Conversion between types when reading config values
Are there any user-facing changes?