Flexible inputs #97

nmathewson · 2024-03-21T13:57:04Z

Hello!

Motivation

We're converting a pre-existing application from config-rs to Figment. As it stands, our users have a number of pre-existing configuration files that Figment will not accept, because those files:

represent booleans as numbers (1, 0)
represent booleans as strings ("true", "false")
represent numbers as strings ("5", "10").

Previously, all of these representations were accepted. Thus, we can't convert to Figment without breaking our existing users.

This PR

This PR adapts the deserialize implementations in Figment to accept these representations as well.

Questions for the reviewer:

Would you prefer that this behavior not be the default? If so, please let me know what kind of option you would like it to be controlled by.
I've written more tests and documentation this time, but I'm not sure I have as much documentation as you'd like. Just let me know! :) If you point me to the kind of tests and documentation you would prefer, I'll try to add it.

When a bool is expected, we now accept the numbers 1 and 0, and the case-insensitive strings "true", "false", "on", "off", "yes", "no", "1", and "0". When a number is expected, we now accept a string that parses to a number.

New Value::to_flexible_{bool,num} functions. For many configuration formats, it's convenient to accept the string `"true"` as if it were the boolean `true`, or the string `"7"` as if it were the number `7`. This is part of making that possible.
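A minimal sketch of the bool rule described above, written as standalone functions so it runs without Figment. The names `flexible_bool` and `flexible_bool_from_int` are hypothetical; they only mirror the accepted spellings listed here and are not the PR's actual `Value::to_flexible_bool`.

```rust
/// Hypothetical helper: interpret a config string as a bool using the
/// flexible spellings listed above (matched case-insensitively).
fn flexible_bool(s: &str) -> Option<bool> {
    match s.to_ascii_lowercase().as_str() {
        "true" | "on" | "yes" | "1" => Some(true),
        "false" | "off" | "no" | "0" => Some(false),
        _ => None,
    }
}

/// Hypothetical helper: interpret an integer as a bool; only 1 and 0 qualify.
fn flexible_bool_from_int(n: i64) -> Option<bool> {
    match n {
        1 => Some(true),
        0 => Some(false),
        _ => None,
    }
}

fn main() {
    assert_eq!(flexible_bool("TRUE"), Some(true));
    assert_eq!(flexible_bool("Off"), Some(false));
    assert_eq!(flexible_bool("maybe"), None);
    assert_eq!(flexible_bool_from_int(1), Some(true));
    assert_eq!(flexible_bool_from_int(2), None);
}
```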

SergioBenitez · 2024-03-26T23:02:08Z

This is great. Going to make a few edits and then push it back your way for some final comments.

SergioBenitez · 2024-03-27T00:01:34Z

src/value/value.rs

@@ -563,6 +717,27 @@ impl Num {
            Num::F64(v) => Actual::Float(v as f64),
        }
    }
+
+    // /// Given a number, return the narrowest representation of that number.
+    // fn compact(self) -> Self {


Previously, to_num_lossy() used this method to get the lowest fitting integer type. I've now changed it to use usize or isize irrespective of the narrowest size; this is commensurate with how values are parsed from environment variables (see value/parse.rs). I'm wondering, however, if the compacting behavior was important for applications I can't think of at the moment. If it is, then we should change both the to_num_lossy() impl and the value parser.

I don't 100% remember why I had this method; I think it should be okay to leave values in a wider form, unless I am also missing something. Perhaps I wrote a test that behaved badly when the result was truly u128?

If we want to choose the smallest representation, I think that's okay. I don't know if there's any precedent for doing this or not. Deserialization libraries like json and toml simply choose a representation like i64 and use it for everything.

Whatever we do, we should be consistent everywhere. I think the way to go here is to implement FromStr for Num and have that be the canonical implementation. I'll do that now and compact in it, and we can figure out if that's the most sensible thing or not.
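For illustration, a compacting `FromStr` along these lines might look like the sketch below. The `Num` enum is a simplified stand-in (Figment's real `Num` may have a different set of variants), and the strategy of parsing at the widest integer width and then narrowing is an assumption, not the code that was committed.

```rust
use std::convert::TryFrom;
use std::str::FromStr;

/// Simplified stand-in for a numeric config value; not Figment's actual `Num`.
#[derive(Debug, PartialEq)]
enum Num {
    U8(u8), U16(u16), U32(u32), U64(u64), U128(u128),
    I8(i8), I16(i16), I32(i32), I64(i64), I128(i128),
    F64(f64),
}

/// Narrow a non-negative integer to the smallest unsigned variant that fits.
fn compact_unsigned(n: u128) -> Num {
    if let Ok(v) = u8::try_from(n) {
        Num::U8(v)
    } else if let Ok(v) = u16::try_from(n) {
        Num::U16(v)
    } else if let Ok(v) = u32::try_from(n) {
        Num::U32(v)
    } else if let Ok(v) = u64::try_from(n) {
        Num::U64(v)
    } else {
        Num::U128(n)
    }
}

/// Narrow a signed integer to the smallest signed variant that fits.
fn compact_signed(n: i128) -> Num {
    if let Ok(v) = i8::try_from(n) {
        Num::I8(v)
    } else if let Ok(v) = i16::try_from(n) {
        Num::I16(v)
    } else if let Ok(v) = i32::try_from(n) {
        Num::I32(v)
    } else if let Ok(v) = i64::try_from(n) {
        Num::I64(v)
    } else {
        Num::I128(n)
    }
}

impl FromStr for Num {
    type Err = ();

    // Try unsigned, then signed, then floating point; integers are compacted.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        if let Ok(n) = s.parse::<u128>() {
            Ok(compact_unsigned(n))
        } else if let Ok(n) = s.parse::<i128>() {
            Ok(compact_signed(n))
        } else {
            s.parse::<f64>().map(Num::F64).map_err(|_| ())
        }
    }
}

fn main() {
    assert_eq!("7".parse::<Num>(), Ok(Num::U8(7)));
    assert_eq!("300".parse::<Num>(), Ok(Num::U16(300)));
    assert_eq!("-5".parse::<Num>(), Ok(Num::I8(-5)));
    assert_eq!("-40000".parse::<Num>(), Ok(Num::I32(-40000)));
    assert_eq!(
        "300000000000000000000".parse::<Num>(),
        Ok(Num::U128(300_000_000_000_000_000_000))
    );
    assert_eq!("2.5".parse::<Num>(), Ok(Num::F64(2.5)));
    assert!("abc".parse::<Num>().is_err());
}
```

Parsing at `u128`/`i128` width first also sidesteps the platform-dependent `usize`/`isize` limits; whether to narrow afterwards is the open question discussed above.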

SergioBenitez · 2024-03-27T00:02:38Z

Okay, I've made those edits. It's mostly a simplification of what you'd written. There is one "large" change though, which I need your input for in the comment above.

src/value/de.rs

nmathewson · 2024-03-27T12:26:41Z

nmathewson · 2024-03-27T12:30:03Z

src/value/value.rs

+                if let Ok(n) = s.parse::<usize>() {
+                    Some(n.into())
+                } else if let Ok(n) = s.parse::<isize>() {
+                    Some(n.into())


If I understand right, using only usize/isize here means that 128-bit values can't be configured as strings (and neither can 64-bit values on 32-bit systems). That seems like a surprising behavior for users.

In other words,

# this works
x = 1
# and this works
x = "1"
# and this works
x = 300000000000000000000
# but this doesn't work
x = "300000000000000000000"

That seems like it's going to confuse somebody at some point.
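To make the concern concrete, this sketch compares parsing through `usize`/`isize` (as in the snippet above) with parsing through `u128`/`i128`. The `Parsed` enum and both functions are hypothetical helpers, not Figment code.

```rust
/// Hypothetical result type that can hold either parse outcome losslessly.
#[derive(Debug, PartialEq)]
enum Parsed {
    Unsigned(u128),
    Signed(i128),
}

/// Parse using only the pointer-sized types, as in the reviewed snippet.
fn parse_pointer_sized(s: &str) -> Option<Parsed> {
    if let Ok(n) = s.parse::<usize>() {
        Some(Parsed::Unsigned(n as u128))
    } else if let Ok(n) = s.parse::<isize>() {
        Some(Parsed::Signed(n as i128))
    } else {
        None
    }
}

/// Parse using the widest integer types instead.
fn parse_widest(s: &str) -> Option<Parsed> {
    if let Ok(n) = s.parse::<u128>() {
        Some(Parsed::Unsigned(n))
    } else if let Ok(n) = s.parse::<i128>() {
        Some(Parsed::Signed(n))
    } else {
        None
    }
}

fn main() {
    let big = "300000000000000000000";
    // Exceeds usize::MAX even on 64-bit targets, so the narrow parser fails.
    assert_eq!(parse_pointer_sized(big), None);
    assert_eq!(parse_widest(big), Some(Parsed::Unsigned(300_000_000_000_000_000_000)));
    // Small values behave the same either way.
    assert_eq!(parse_pointer_sized("1"), Some(Parsed::Unsigned(1)));
    assert_eq!(parse_widest("1"), Some(Parsed::Unsigned(1)));
}
```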

That's true, but this is the way the value parser works right now. My point is not to say it should work this way, but that all conversions from strings to Values of any kind in the library should work the same. Any reasonable and consistent behavior is okay.

That makes sense; and I like the new approach. It seems to simplify things in a few ways.

nmathewson · 2024-03-27T12:31:19Z

Hi! I've left some comments and questions. They may be offbase; you know your codebase better than I do, so feel free to do what you think is best if I'm wrong here. ;)

SergioBenitez · 2024-03-27T22:57:37Z

No, totally on-point! Let me know what you think about the updates here. Feel free to make any changes yourself, of course!

nmathewson · 2024-03-28T14:33:25Z

I'm happy with where this is now; I think it's good to merge.

SergioBenitez · 2024-03-28T16:44:03Z

I think there might still be consistency issues with the Env variable parser. Can you take a look at the Env provider and ensure that the way it parses values is consistent with this new lossy mechanism? Some tests would be great too.

nmathewson · 2024-03-28T18:45:42Z

Sure, I can give it a shot. Can you give me a little more clarity on what exactly I should be checking? I see three ways ahead, and you may see more.

My working guess is that what I should be checking is that whenever an option is set via APP_XYZ=STR in Env, we get the same result as if that same option were set via xyz = "STR" in a configuration file?

As I'm reading the code, that conversion from "STR" to a Value in env happens in <Env as Provider>::data where it calls v::parse(), which uses <Value as FromStr>, which is implemented in value/parse.rs.

This makes me think there are a few possibilities for testing/consistency:

Possibility 1

Write a test to ensure that for all strings s:

If Value::from_str(s) is a bool, then Value::String(s).to_bool_lossy() is the same bool.
If Value::String(s).to_bool_lossy() is a bool, then Value::from_str(s) is the same bool.
If Value::from_str(s) is a number, then Value::String(s).to_num_lossy() is the same number.
If Value::String(s).to_num_lossy() is a number, then Value::from_str(s) is the same number.

But note that property 2 is not true: "TRUE" and "on" and "1" will get handled as true by to_bool_lossy but not by Value::from_str.

So if we're going to take this approach, we'll need to tolerate some inconsistency, or we'll need to cause things like "APP_VAR=1" to get parsed as Value::Bool, which is probably not a great idea.
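The asymmetry can be checked mechanically. This sketch uses two hypothetical stand-ins: `strict_bool`, which assumes the strict parser accepts only the exact strings "true" and "false", and `lossy_bool`, which follows the lossy spellings from this PR. On these samples property 1 holds, while property 2 fails for "TRUE", "on", and "1".

```rust
/// Stand-in (assumption) for the strict string parser: exact "true"/"false" only.
fn strict_bool(s: &str) -> Option<bool> {
    match s {
        "true" => Some(true),
        "false" => Some(false),
        _ => None,
    }
}

/// Stand-in for the lossy conversion described in this PR.
fn lossy_bool(s: &str) -> Option<bool> {
    match s.to_ascii_lowercase().as_str() {
        "true" | "on" | "yes" | "1" => Some(true),
        "false" | "off" | "no" | "0" => Some(false),
        _ => None,
    }
}

/// Property 1: whenever the strict parser yields a bool, the lossy one agrees.
fn property1(s: &str) -> bool {
    strict_bool(s).map_or(true, |b| lossy_bool(s) == Some(b))
}

/// Property 2: whenever the lossy conversion yields a bool, the strict one agrees.
fn property2(s: &str) -> bool {
    lossy_bool(s).map_or(true, |b| strict_bool(s) == Some(b))
}

fn main() {
    let samples = ["true", "false", "TRUE", "on", "1", "banana"];
    assert!(samples.iter().all(|s| property1(s)));
    let failing: Vec<&str> = samples.iter().copied().filter(|s| !property2(s)).collect();
    assert_eq!(failing, vec!["TRUE", "on", "1"]);
}
```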

Possibility 2

Check something other than the four properties above. For example, we could make sure that Value::from_str(s) and Value::String(s) deserialize into the same booleans and numbers for every s.

This would amount to saying that we allow Value::from_str(s) to produce best-guess Values, so long as they eventually deserialize to the same thing as if those values had come in as Value::String(s).

Possibility 3

Revise Value::from_str so that it always converts its string inputs into Value::String, and never Value::Bool or Value::Num. Allow conversion to bool or num to come later, during deserialization.

This is my favorite option. On the positive side, it would remove redundant parsing code. It would also make Env more consistent in some cases. For example, the current Env parser can be lossy if it parses a number that was supposed to be a string, as in "APP_USERNAME=012345". (IIUC, this will parse to the number 12345 rather than the string "012345". If the application wanted a string here, there is no way to find the leading 0 that was originally there.)

On the minus side, this would break compatibility with anybody who is currently relying on the Values returned from Env to do early conversion into Num and Bool.
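The leading-zero problem can be modeled in a few lines. `Val`, `parse_eager`, `parse_deferred`, and `as_string` are hypothetical stand-ins, not Figment's `Value` or Env code; the eager strategy approximates the current Env behavior described above, and the deferred one approximates Possibility 3.

```rust
/// Simplified stand-in for a configuration value (not Figment's `Value`).
#[derive(Debug, PartialEq)]
enum Val {
    Str(String),
    Num(u64),
}

/// Eager strategy: guess a number whenever the string parses as one.
fn parse_eager(s: &str) -> Val {
    match s.parse::<u64>() {
        Ok(n) => Val::Num(n),
        Err(_) => Val::Str(s.to_string()),
    }
}

/// Deferred strategy: always keep the string; any numeric conversion
/// happens later, once the target type is known.
fn parse_deferred(s: &str) -> Val {
    Val::Str(s.to_string())
}

/// A field that wants a string reads the value back.
fn as_string(v: &Val) -> String {
    match v {
        Val::Str(s) => s.clone(),
        Val::Num(n) => n.to_string(),
    }
}

fn main() {
    let raw = "012345"; // e.g. APP_USERNAME=012345
    assert_eq!(as_string(&parse_eager(raw)), "12345"); // leading zero lost
    assert_eq!(as_string(&parse_deferred(raw)), "012345"); // preserved
}
```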

What do you think?

SergioBenitez · 2024-04-02T02:20:52Z

Possibility 3 is interesting indeed, but I do worry it will break things. Perhaps the best approach is in fact to do nothing at all. After all, these are largely two different things. What we've done here, with this PR, is effectively use the deserializer hints we weren't using before. But this isn't and perhaps shouldn't be related to how a Value is parsed from a string.

By the same token, we've now added meaning to certain strings where nothing like it existed before. In particular, the lossy boolean strings on/off, 1/0, yes/no.

I'm a bit concerned about these, and I'm now reminded that we don't have to support this in Figment itself. Specifically, you're always free to write a custom deserializer for your boolean fields that accept the strings you want. In fact, we provide one that's almost identical to what we've done here in figment itself:

#[derive(Deserialize)]
struct Config {
    #[serde(deserialize_with = "figment::util::bool_from_str_or_int")]
    cli_colors: bool,
}

So I believe my real suggestion here is to remove the lossy string -> bool conversion in the deserializer that we've added here, or perhaps even scrap the PR entirely. All of this can be done on the deserialization side without us needing to do a potentially unexpected "dynamic cast" of sorts during deserialization. In fact, this might be particularly concerning, as this PR makes it impossible to reject a string that we've now made represent a bool or number. For example, it's impossible to write:

#[derive(Deserialize)]
struct Config {
    flag: bool,
}

Such that flag rejects the strings "on" and "off".

All of this leads me to believe that while the "lossy" conversions we've added here are useful, perhaps they're best suited as util functions like the existing bool_from_str_or_int so we can leave the decision to the user. We should also add more documentation to make this clear.

nmathewson · 2024-04-02T13:02:20Z

Hm. That approach would work, but the trouble is that we would need to annotate every one of our bool and numeric options with deserialize_with... and if we forgot one, we would have inconsistent semantics for that option. (Right now we have dozens of configuration values of each type, and our configuration structures are spread across the several crates that they configure, so it would be easy to forget.)

But I agree that we shouldn't break things.

Maybe instead it would make sense to provide an alternative Deserializer that would make this behavior the default? If you think that's reasonable, I can give it a try.

The implementation would probably look something like:

pub struct ConfiguredValueDe<'c, F=DefaultFlavor> {...}

where F is a zero-sized type that decides how to handle the primitive types.
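A rough sketch of the zero-sized-flavor idea, assuming a simplified deserializer that only reads a borrowed string and only handles bools. `Flavor`, `DefaultFlavor`, `LossyFlavor`, and the methods shown are hypothetical; a real implementation would implement serde's `Deserializer` trait instead.

```rust
use std::marker::PhantomData;

/// Hypothetical trait: a zero-sized "flavor" decides how primitives are read.
trait Flavor {
    fn read_bool(s: &str) -> Option<bool>;
}

/// Strict flavor: only the literal strings "true" and "false".
struct DefaultFlavor;

impl Flavor for DefaultFlavor {
    fn read_bool(s: &str) -> Option<bool> {
        match s {
            "true" => Some(true),
            "false" => Some(false),
            _ => None,
        }
    }
}

/// Lossy flavor: also accepts on/off, yes/no, 1/0, case-insensitively.
struct LossyFlavor;

impl Flavor for LossyFlavor {
    fn read_bool(s: &str) -> Option<bool> {
        match s.to_ascii_lowercase().as_str() {
            "true" | "on" | "yes" | "1" => Some(true),
            "false" | "off" | "no" | "0" => Some(false),
            _ => None,
        }
    }
}

/// Stand-in deserializer over a borrowed value; `F` occupies no space.
struct ConfiguredValueDe<'c, F = DefaultFlavor> {
    value: &'c str,
    _flavor: PhantomData<F>,
}

impl<'c, F: Flavor> ConfiguredValueDe<'c, F> {
    fn new(value: &'c str) -> Self {
        ConfiguredValueDe { value, _flavor: PhantomData }
    }

    /// Delegate the primitive conversion to the chosen flavor.
    fn deserialize_bool(&self) -> Option<bool> {
        F::read_bool(self.value)
    }
}

fn main() {
    // The default type parameter selects the strict flavor.
    let strict: ConfiguredValueDe = ConfiguredValueDe::new("on");
    let lossy = ConfiguredValueDe::<LossyFlavor>::new("on");
    assert_eq!(strict.deserialize_bool(), None);
    assert_eq!(lossy.deserialize_bool(), Some(true));
    assert_eq!(
        std::mem::size_of::<ConfiguredValueDe<LossyFlavor>>(),
        std::mem::size_of::<&str>()
    );
}
```

Because the flavor lives only in `PhantomData`, choosing a different flavor does not change the deserializer's size, and the default type parameter keeps the strict behavior unless a caller opts in.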

nmathewson · 2024-04-02T14:19:53Z

I have opened a new PR, #100, to show what I mean.

This commit adds `Figment::extract_{inner_}lossy()`, variants of the existing methods that convert string representations of booleans and integers into their boolean and integer forms. The original string form is lost and is not directly recoverable. Methods that perform the same conversion are added to `Value` types:

* `Value::to_num_lossy()`
* `Value::to_bool_lossy()`
* `Num::to_u128_lossy()`
* `Num::from_str()`

Closes SergioBenitez#97.

Co-authored-by: Sergio Benitez <sb@sergio.bz>

nmathewson force-pushed the flexible_inputs branch from 2abaa6d to 4c9258a March 21, 2024 14:00

Simplify lossy (née flexible) values impl.

7c98fae

SergioBenitez force-pushed the flexible_inputs branch from 4c9258a to 7c98fae March 26, 2024 23:57

SergioBenitez reviewed Mar 27, 2024

nmathewson commented Mar 27, 2024

SergioBenitez added 2 commits March 27, 2024 15:20

Fix lossy value tag propagation.

7c2595f

consistently compact string-parsed numbers

ab8e6c1

nmathewson added a commit to nmathewson/Figment that referenced this pull request Apr 2, 2024

Add to_num_lossy and to_bool_lossy conversion functions to Value.

2eef394

(These are taken from their final forms in SergioBenitez#97.)

nmathewson mentioned this pull request Apr 2, 2024

Flexible inputs, second attempt #100

Merged

nmathewson added a commit to nmathewson/Figment that referenced this pull request Apr 2, 2024

Add to_num_lossy and to_bool_lossy conversion functions to Value.

cc06d4d

(These are taken from their final forms in SergioBenitez#97.)

SergioBenitez closed this in 2c83ffa Apr 18, 2024
