Flexible inputs #97

Closed
wants to merge 4 commits into from

Conversation

nmathewson
Contributor

Hello!

Motivation

We're converting a pre-existing application from config-rs to Figment. As it stands, our users have a number of pre-existing configuration files that Figment will not accept, because those files:

  • represent booleans as numbers (1, 0)
  • represent booleans as strings ("true", "false")
  • represent numbers as strings ("5", "10").

Previously, all of these representations were accepted. Thus, we can't convert to Figment without breaking our existing users.

This PR

This PR adapts the deserialize implementations in Figment to accept these representations as well.

Questions for the reviewer:

  1. Would you prefer that this behavior not be the default? If so, please let me know what kind of option you would like it to be controlled by.

  2. I've written more tests and documentation this time, but I'm not sure I have as much documentation as you'd like. Just let me know! :) If you point me to the kind of tests and documentation you would prefer, I'll try to add it.

When a bool is expected, we now accept the numbers 1 and 0, and the
case-insensitive strings "true", "false", "on", "off", "yes", "no",
"1", and "0".

When a number is expected, we now accept a string that parses to a
number.

New Value::to_flexible_{bool,num} functions.

For many configuration formats, it's convenient to accept the string
`"true"` as if it were the boolean `true`, or the string `"7"` as if it
were the number `7`.  This is part of making that possible.
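
As a rough illustration of the conversions described in these commit messages, here is a minimal std-only sketch. `SketchValue`, `flexible_bool`, and `flexible_num` are hypothetical names used for illustration and are not Figment's API; numbers are narrowed to `i64` for brevity.

```rust
/// Hypothetical stand-in for a configuration value; not Figment's `Value`.
#[derive(Debug, Clone, PartialEq)]
pub enum SketchValue {
    Bool(bool),
    Num(i64),
    Str(String),
}

/// Interpret a value as a bool, accepting the numbers 1 and 0 and a few
/// case-insensitive strings.
pub fn flexible_bool(v: &SketchValue) -> Option<bool> {
    match v {
        SketchValue::Bool(b) => Some(*b),
        SketchValue::Num(1) => Some(true),
        SketchValue::Num(0) => Some(false),
        SketchValue::Num(_) => None,
        SketchValue::Str(s) => match s.to_ascii_lowercase().as_str() {
            "true" | "on" | "yes" | "1" => Some(true),
            "false" | "off" | "no" | "0" => Some(false),
            _ => None,
        },
    }
}

/// Interpret a value as a number, accepting strings that parse as one.
pub fn flexible_num(v: &SketchValue) -> Option<i64> {
    match v {
        SketchValue::Num(n) => Some(*n),
        SketchValue::Str(s) => s.parse().ok(),
        SketchValue::Bool(_) => None,
    }
}
```

With this sketch, `flexible_bool` maps the string "YES" to `true`, while the number 2 and the string "maybe" are rejected rather than guessed at.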
@SergioBenitez
Owner

This is great. Going to make a few edits and then push it back your way for some final comments.

@@ -563,6 +717,27 @@ impl Num {
Num::F64(v) => Actual::Float(v as f64),
}
}

// /// Given a number, return the narrowest representation of that number.
// fn compact(self) -> Self {
Owner


Previously, to_num_lossy() used this method to get the narrowest integer type that fits. I've now changed it to use usize or isize regardless of the narrowest size; this is consistent with how values are parsed from environment variables (see value/parse.rs). I'm wondering, however, if the compacting behavior was important for applications I can't think of at the moment. If it is, then we should change both the to_lossy() impl and the value parser.

Contributor Author


I don't 100% remember why I had this method; I think it should be okay to leave values in a wider form, unless I am also missing something. Perhaps I wrote a test that behaved badly when the result was truly u128?

Owner


If we want to choose the smallest representation, I think that's okay. I don't know if there's any precedent for doing this or not. Deserialization libraries like json and toml simply choose a representation like i64 and use it for everything.

Whatever we do, we should be consistent everywhere. I think the way to go here is to implement FromStr for Num and have that be the canonical implementation. I'll do that now and compact in it, and we can figure out if that's the most sensible thing or not.
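
A minimal sketch of what a compacting `FromStr` could look like, using only the standard library. `SketchNum` and its helper functions are hypothetical stand-ins and not Figment's `Num`; the real implementation may order or choose its variants differently.

```rust
use std::convert::TryFrom;
use std::str::FromStr;

/// Hypothetical, trimmed-down numeric type; not Figment's `Num`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SketchNum {
    U8(u8), U16(u16), U32(u32), U64(u64), U128(u128),
    I8(i8), I16(i16), I32(i32), I64(i64), I128(i128),
    F64(f64),
}

impl SketchNum {
    /// Pick the narrowest unsigned variant that can hold `v`.
    fn compact_unsigned(v: u128) -> Self {
        if let Ok(n) = u8::try_from(v) {
            SketchNum::U8(n)
        } else if let Ok(n) = u16::try_from(v) {
            SketchNum::U16(n)
        } else if let Ok(n) = u32::try_from(v) {
            SketchNum::U32(n)
        } else if let Ok(n) = u64::try_from(v) {
            SketchNum::U64(n)
        } else {
            SketchNum::U128(v)
        }
    }

    /// Pick the narrowest signed variant that can hold `v`.
    fn compact_signed(v: i128) -> Self {
        if let Ok(n) = i8::try_from(v) {
            SketchNum::I8(n)
        } else if let Ok(n) = i16::try_from(v) {
            SketchNum::I16(n)
        } else if let Ok(n) = i32::try_from(v) {
            SketchNum::I32(n)
        } else if let Ok(n) = i64::try_from(v) {
            SketchNum::I64(n)
        } else {
            SketchNum::I128(v)
        }
    }
}

impl FromStr for SketchNum {
    type Err = ();

    /// Try the widest unsigned type, then the widest signed type, then a
    /// float, compacting any integer result.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        if let Ok(v) = s.parse::<u128>() {
            Ok(Self::compact_unsigned(v))
        } else if let Ok(v) = s.parse::<i128>() {
            Ok(Self::compact_signed(v))
        } else {
            s.parse::<f64>().map(SketchNum::F64).map_err(|_| ())
        }
    }
}
```

Parsing at 128 bits first and narrowing afterwards means the accepted range does not depend on the platform's pointer width, which is one possible way to keep every string-to-number path consistent.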

@SergioBenitez
Owner

Okay, I've made those edits. It's mostly a simplification of what you'd written. There is one "large" change though, which I need your input for in the comment above.

src/value/de.rs (outdated)
Comment on lines 368 to 371
if let Ok(n) = s.parse::<usize>() {
    Some(n.into())
} else if let Ok(n) = s.parse::<isize>() {
    Some(n.into())
Contributor Author


If I understand right, using only usize/isize here means that 128-bit values can't be configured as strings (and neither can 64-bit values on 32-bit systems). That seems like a surprising behavior for users.

In other words,

# this works
x = 1
# and this works
x = "1"
# and this works
x = 300000000000000000000
# but this doesn't work
x = "300000000000000000000"

That seems like it's going to confuse somebody at some point.
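
To make the asymmetry concrete, here is a small std-only sketch. The helper names are hypothetical, and `u64` stands in for `usize` on a 64-bit target so the result does not depend on the platform.

```rust
/// Parse with a fixed 64-bit width, standing in for `usize` on a 64-bit target.
pub fn parse_64(s: &str) -> Option<u64> {
    s.parse().ok()
}

/// Parse with the widest unsigned integer type in the standard library.
pub fn parse_128(s: &str) -> Option<u128> {
    s.parse().ok()
}
```

Here `parse_64("300000000000000000000")` returns `None` because the value exceeds `u64::MAX`, while `parse_128` accepts the same string.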

Owner


That's true, but this is the way the value parser works right now. My point is not to say it should work this way, but that all conversions from strings to Values of any kind in the library should all work the same. Any reasonable and consistent behavior is okay.

Contributor Author


That makes sense; and I like the new approach. It seems to simplify things in a few ways.

@nmathewson
Contributor Author

Hi! I've left some comments and questions. They may be offbase; you know your codebase better than I do, so feel free to do what you think is best if I'm wrong here. ;)

@SergioBenitez
Owner

Hi! I've left some comments and questions. They may be offbase; you know your codebase better than I do, so feel free to do what you think is best if I'm wrong here. ;)

No, totally on-point! Let me know what you think about the updates here. Feel free to make any changes yourself, of course!

@nmathewson
Contributor Author

I'm happy with where this is now; I think it's good to merge.

@SergioBenitez
Owner

SergioBenitez commented Mar 28, 2024

I think there might still be consistency issues with the Env variable parser. Can you take a look at the Env provider and ensure that the way it parses values is consistent with this new lossy mechanism? Some tests would be great too.

@nmathewson
Contributor Author

Sure, I can give it a shot. Can you give me a little more clarity on what exactly I should be checking? I see three ways ahead, and you may see more.

My working guess is that what I should be checking is that whenever an option is set via APP_XYZ=STR in Env, we get the same result as if that same option were set via xyz = "STR" in a configuration file?

As I'm reading the code, that conversion from "STR" to a Value in Env happens in <Env as Provider>::data, where it calls v.parse(), which uses <Value as FromStr>, which is implemented in value/parse.rs.

This makes me think there are a few possibilities for testing/consistency:

Possibility 1

Write a test to ensure that for all strings s:

  1. If Value::from_str(s) is a bool, then Value::String(s).to_bool_lossy() is the same bool.
  2. If Value::String(s).to_bool_lossy() is a bool, then Value::from_str(s) is the same bool.
  3. If Value::from_str(s) is a number, then Value::String(s).to_num_lossy() is the same number.
  4. If Value::String(s).to_num_lossy() is a number, then Value::from_str(s) is the same number.

But note that property 2 is not true: "TRUE" and "on" and "1" will get handled as true by to_bool_lossy but not by Value::from_str.

So if we're going to take this approach, we'll need to tolerate some inconsistency, or we'll need to cause things like "APP_VAR=1" to get parsed as Value::Bool, which is probably not a great idea.
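
The mismatch can be sketched with two toy parsers. This is only an illustration: `strict_bool` assumes, based on the description above, that the existing string parser treats only the exact strings "true" and "false" as booleans, and all of the names are hypothetical.

```rust
/// Assumed strict rule: only the exact strings "true" and "false" are bools.
pub fn strict_bool(s: &str) -> Option<bool> {
    match s {
        "true" => Some(true),
        "false" => Some(false),
        _ => None,
    }
}

/// Lossy rule: case-insensitive, and also accepts on/off, yes/no, 1/0.
pub fn lossy_bool(s: &str) -> Option<bool> {
    match s.to_ascii_lowercase().as_str() {
        "true" | "on" | "yes" | "1" => Some(true),
        "false" | "off" | "no" | "0" => Some(false),
        _ => None,
    }
}

/// Property 1: if the strict parser yields a bool, the lossy one agrees.
pub fn property_1(s: &str) -> bool {
    match strict_bool(s) {
        Some(b) => lossy_bool(s) == Some(b),
        None => true,
    }
}

/// Property 2: if the lossy parser yields a bool, the strict one agrees.
pub fn property_2(s: &str) -> bool {
    match lossy_bool(s) {
        Some(b) => strict_bool(s) == Some(b),
        None => true,
    }
}
```

Under these assumptions, property 1 holds for every input, while property 2 fails for inputs such as "TRUE", "on", and "1".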

Possibility 2

Check something other than the four properties above. For example, we could make sure that Value::from_str(s) and Value::String(s) deserialize into the same booleans and numbers for every s.

This would amount to saying that we allow Value::from_str(s) to produce best-guess Values, so long as they eventually deserialize to the same thing as if those values had come in as Value::String(s).

Possibility 3

Revise Value::from_str so that it always converts its string inputs into Value::String, and never Value::Bool or Value::Num. Allow conversion to bool or num to come later, during deserialization.

This is my favorite option. On the positive side, it would remove redundant parsing code. It would also make Env more consistent in some cases. For example, the current Env parser can be lossy if it parses a number that was supposed to be a string, as in "APP_USERNAME=012345". (IIUC, this will parse to the number 12345 rather than the string "012345". If the application wanted a string here, there is no way to find the leading 0 that was originally there.)
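
The leading-zero loss is easy to reproduce with the standard library alone. The helper below is a hypothetical illustration and not part of Figment:

```rust
/// Parse `raw` as an unsigned integer, then print it back as a string.
/// Returns `None` when `raw` is not an unsigned integer.
pub fn parse_then_print(raw: &str) -> Option<String> {
    raw.parse::<u64>().ok().map(|n| n.to_string())
}
```

Calling `parse_then_print("012345")` yields `"12345"`: once the value has been stored as a number, the original spelling cannot be recovered.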

On the minus side, this would break compatibility with anybody who is currently relying on the Values returned from Env to do early conversion into Num and Bool.

What do you think?

@SergioBenitez
Owner

SergioBenitez commented Apr 2, 2024

Possibility 3 is interesting indeed, but I do worry it will break things. Perhaps the best approach is in fact to do nothing at all. After all, these are largely two different things. What we've done here, with this PR, is effectively use the deserializer hints we weren't using before. But this isn't and perhaps shouldn't be related to how a Value is parsed from a string.

By the same token, we've now added meaning to certain strings where nothing like it existed before. In particular, the lossy boolean strings on/off, 1/0, yes/no.

I'm a bit concerned about these, and I'm now reminded that we don't have to support this in Figment itself. Specifically, you're always free to write a custom deserializer for your boolean fields that accepts the strings you want. In fact, we provide one in figment itself that's almost identical to what we've done here:

#[derive(Deserialize)]
struct Config {
    #[serde(deserialize_with = "figment::util::bool_from_str_or_int")]
    cli_colors: bool,
}

So I believe my real suggestion here is to remove the lossy string -> bool in the deserializer that we've added here. Perhaps even scrap the PR entirely. All of this can be done on the deserialization side without us needing to perhaps unexpectedly do a "dynamic cast" of sorts during deserialization. In fact, this might be particularly concerning as this PR makes it impossible to reject a string that we've now made represent a bool or number. For example, it's impossible to write:

#[derive(Deserialize)]
struct Config {
    flag: bool,
}

Such that flag rejects the strings "on" and "off".

All of this leads me to believe that while the "lossy" conversions we've added here are useful, perhaps they're best suited as util functions like the existing bool_from_str_or_int so we can leave the decision to the user. We should also add more documentation to make this clear.

@nmathewson
Contributor Author

Hm. That approach would work, but the trouble is that we would need to annotate every one of our bool and numeric options with deserialize_with... and if we forgot one, we would have inconsistent semantics for that option. (Right now we have dozens of configuration values of each type, and our configuration structures are spread across the several crates that they configure, so it would be easy to forget.)

But I agree that we shouldn't break things.

Maybe instead it would make sense to provide an alternative Deserializer that would make this behavior the default? If you think that's reasonable, I can give it a try.

The implementation would probably look something like:

pub struct ConfiguredValueDe<'c, F=DefaultFlavor> {...}

where F is a zero-sized type that decides how to handle the primitive types.
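
A minimal sketch of the idea, using only the standard library. Every name here (`Flavor`, `DefaultFlavor`, `LossyFlavor`, `SketchDe`) is hypothetical, and the sketch handles only string-to-bool conversion rather than a full serde `Deserializer`:

```rust
use std::marker::PhantomData;

/// A zero-sized policy type decides how strings map to primitive types.
pub trait Flavor {
    fn bool_from_str(s: &str) -> Option<bool>;
}

/// Strict policy: a string is never silently treated as a bool.
pub struct DefaultFlavor;

/// Lossy policy: a few case-insensitive strings are accepted as bools.
pub struct LossyFlavor;

impl Flavor for DefaultFlavor {
    fn bool_from_str(_s: &str) -> Option<bool> {
        None
    }
}

impl Flavor for LossyFlavor {
    fn bool_from_str(s: &str) -> Option<bool> {
        match s.to_ascii_lowercase().as_str() {
            "true" | "on" | "yes" | "1" => Some(true),
            "false" | "off" | "no" | "0" => Some(false),
            _ => None,
        }
    }
}

/// Stand-in for a deserializer over a borrowed string value. The flavor is
/// carried only in the type, so it adds no runtime data.
pub struct SketchDe<'c, F = DefaultFlavor> {
    raw: &'c str,
    _flavor: PhantomData<F>,
}

impl<'c, F: Flavor> SketchDe<'c, F> {
    pub fn new(raw: &'c str) -> Self {
        SketchDe { raw, _flavor: PhantomData }
    }

    /// Ask the flavor whether the string may be read as a bool.
    pub fn bool_value(&self) -> Option<bool> {
        F::bool_from_str(self.raw)
    }
}
```

Because the policy lives in a type parameter with a default, existing callers would keep the strict behavior, and an application could opt in once, for example with `SketchDe::<LossyFlavor>::new("on")`, instead of annotating every field.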

nmathewson added a commit to nmathewson/Figment that referenced this pull request Apr 2, 2024
@nmathewson
Contributor Author

I have opened a new PR, #100, to show what I mean.

nmathewson added a commit to nmathewson/Figment that referenced this pull request Apr 2, 2024
SergioBenitez added a commit to nmathewson/Figment that referenced this pull request Apr 18, 2024
This commit adds `Figment::extract_{inner_}lossy()`, variants of the
existing methods that convert string representations of booleans and
integers into their boolean and integer forms. The original string form
is lost and is not directly recoverable.

Methods that perform the same conversion are added to `Value` types:

  * `Value::to_num_lossy()`
  * `Value::to_bool_lossy()`
  * `Num::to_u128_lossy()`
  * `Num::from_str()`

Closes SergioBenitez#97.

Co-authored-by: Sergio Benitez <sb@sergio.bz>