Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nested builder support #254

Closed
ijackson opened this issue Apr 22, 2022 · 10 comments
Closed

Nested builder support #254

ijackson opened this issue Apr 22, 2022 · 10 comments

Comments

@ijackson
Copy link
Contributor

ijackson commented Apr 22, 2022

In Arti, we are using derive_builder for our configuration. Arti's configuration is fairly complicated, and isn't simply a flat list of keys and values. Higher-level configuration structs contain lower-level ones.

Right now we have two approaches to this. In both cases the contained ("sub-struct") derives Builder and the containing "super-struct" contains SubStruct. The two approaches are:

(i) the containing struct (super-struct) derives Builder, with no special builder instructions; so the super-struct contains Option<SubStruct>, and the user must call super.sub_struct(SubStruct::builder().sub_param(42).build()) or whatever. This is clumsy, and also makes it very awkward to adjust different sub_param of the same SubStruct other than at the same place in the calling code (and if a user tries to do so, incautiously, they may easily overwrite earlier settings). Another problem with this is that we need our configuration to be Deserialize. To centralise the defaulting, validation and compatibility code (in the build method), we deserialize the input configuration file into the builder, not the built struct. But if the super-struct contains a built sub-struct, we need to have the sub-structure's build() called in the middle of deserialisation by using special serde escape hatches, making things really quite complex (and making it hard to do good error handling, and handling of compat for old config inputs).

(ii) the containing struct contains SubStructBuilder, and we hand-write an accessor method. We hand-write SuperStruct::build, too, to call the build methods on the sub-structures. This provides a pretty good API, where you can say super.sub_struct().sub_param(42), and just call build once at the end. But it involves us handwriting the accessors and the build function. To give an idea of the size of the problem: in Arti, in two of our higher-level structs, that amounts to about 300 loc of boilerplate, which I would like to get rid of.

So, I would like to do (ii) but have the code autogenerated. derive_builder has most of the necessary code generation, but not a convenient way to specify the desired behaviour. Prior to #245 it would be impossible to combine such "nested" fields with derive-builder-generated fields and impls, but now one can do something like:

    #[builder(setter(custom), field(
        type="SubStructBuilder",
        build="self.sub_struct.build().context(\"sub_struct\")?"
    ))]
    sub_struct: SubStruct,

(and hand-write the accessor). This is pretty nasty, and not something you'd do unless you really wanted to mix sub-struct fields with leaf fields.

I considered whether it would be best for this functionality to live in derive_builder, or elsewhere. The generation of accessors could live elsewhere, although the necessary attributes would have to be smuggled through derive_builder onto the builder struct through builder(derive) and builder_field_attr. But I think the generation of appropriate builder code ought to live in derive_builder, so that it is possible to have a struct that derives Builder and which contains both nested and leaf fields, without having to write ugly derive instructions like the one I show above. (FTAOD, I don't think we currently have any structs in Arti that are mixed in this way, but it seems like it should be made possible.)

So #253 is what I ended up with. To give you an idea of what it looks like in use, here is one of the relevant commits on my corresponding branch of arti.

@TedDriggs
Copy link
Collaborator

The use-case makes sense to me. However, this is the first time I've heard of it being needed, so I'm not sure how widespread the need is.

In approach 2, would the getters described in #7 be sufficient? You'd still have to repeat the field name in the build block, but that's literally adjacent to the field declaration so you'd remove the non-local name repetition of the getter, and since an F2-rename of the field would immediately break the build I think that would be maintainable.


I considered whether it would be best for this functionality to live in derive_builder, or elsewhere. The generation of accessors could live elsewhere, although the necessary attributes would have to be smuggled through derive_builder onto the builder struct through builder(derive) and builder_field_attr

Could you do this the other way round, and have an attribute macro that did the necessary rewriting of SuperStruct?

#[sub_builder]
#[derive(Builder)]
struct SuperStruct {
    #[sub_builder(type = "SubStructBuilder")]
    sub_struct: SubStruct
}

I need to brush up on proc-macro execution order, but if #[sub_builder] runs before the derive macro then it could emit the following:

#[derive(Builder)]
struct SuperStruct {
    #[builder(field(type = "SubStructBuilder", build = "..."))]
    sub_struct: SubStruct
}

impl SuperStructBuilder {
    pub fn sub_struct(...)
}

@ijackson
Copy link
Contributor Author

(Sorry for the delay replying. I wrote this a couple of days ago and I could swear I hit the "comment" button.)

The use-case makes sense to me. However, this is the first time I've heard of it being needed, so I'm not sure how widespread the need is.

Thanks.

Well, Arti is the first project I've done significant work on that uses builder patterns for its configuration, but I understand the builder pattern is quite common. I don't think anything we're doing in Arti is particularly unique. Rust is still quite young in some of these areas, so I don't find it surprising that I'm breaking new ground.

Within Arti: I looked at my outstanding work and my tree has 24 uses of sub_builder. I am expecting to want to introduce more. I'm finding it really pleasant in practice.

In approach 2, would the getters described in #7 be sufficient?

I don't think they would be right, because the accessors need to take &mut self and return &mut SubStructBuilder.

You'd still have to repeat the field name in the build block, but that's literally adjacent to the field declaration so you'd remove the non-local name repetition of the getter, and since an F2-rename of the field would immediately break the build I think that would be maintainable.

That's true. But, I have seem soo many bugs where even adjacent repetition has been botched. If we end up writing code like this, the bug is too easy to write and too hard to spot:

    #[builder(field(
        type = "LoremOrIpsumBuilder",
        build = r#"self.lorem.build().map_err(|e| e.within("lorem")?"#
    ), setter(custom))]
    lorem: LoremOrIpsum,

    #[builder(field(
        type = "LoremOrIpsumBuilder",
        build = r#"self.lorem.build().map_err(|e| e.within("ipsum")?"#
    ), setter(custom))]
    ipsum: LoremOrIpsum,

    #[builder(field(
        type = "DolorBuilder",
        build = r#"self.dolor.build().map_err(|e| e.within("dolor")?"#
    ), setter(custom))]
    dolor: Dolor,

Could you do this the other way round, and have an attribute macro that did the necessary rewriting of SuperStruct?

I believe that would be possible, but I don't think it is a very good design.

It seems like an antipattern to set out to make a proc macro whose purpose is to massage the input for a diffeerent proc macro. The layering of abstractions becomes quite confusing, and the "wrapper" macro now has to unparse the inner proc-macro's input. I think this is a thing one would only do if there were compelling reasons why the "inner" macro couldn't be modified to add the desired features.

(It's a different matter if the two proc macros are semantically orthogonal: I often combine proc macros and am sometimes disappointed when they interact when I wish they wouldn't...)

@TedDriggs
Copy link
Collaborator

It seems like an antipattern to set out to make a proc macro whose purpose is to massage the input for a diffeerent proc macro.

I'm not sure this is true; it's still composition. It still gets the benefits of Rust's safety guarantees, though the errors may not be quite as nice.

The layering of abstractions becomes quite confusing, and the "wrapper" macro now has to unparse the inner proc-macro's input.

I'm not sure what you mean by "unparse" here - the sub_builder macro would have its own attributes that it would read, and then it would write out derive_builder attributes.

I think this is a thing one would only do if there were compelling reasons why the "inner" macro couldn't be modified to add the desired features.

serde has to deal with this, and does so by having some low-level hooks such as with and then having other crates like serde_with or the one for location tracking of errors that callers can use to get additional functionality.


I'm going to take a closer look at the usage of sub_builder in arti. I still have a vague sense that this is too focused on the specific sub-builder use-case, and that there's a more-generalized approach. One particular point of interest there is the addition of the field name to the error; since we already do this for UninitializedFieldError, I wonder if there's a pattern around "builder attempts to initialize field within a try-block, then adds field name if error is emitted."

I do have a fairly strong dislike for the fact that field(build = "...") embeds the field name - it's consistent with how the crate previously used blocks in attribute arguments, but this could potentially have been avoided if that took a path and called that function on the field rather than having the caller specify the field name.

@ijackson
Copy link
Contributor Author

ijackson commented May 3, 2022

(oops, pasted this comment into #253 by mistake, rather than here. I will delete it there.)

I'm going to take a closer look at the usage of sub_builder in arti.

Thanks.

One thing you might want to look at is the art MR I just made where I am trying to sort out our config API for "list-like" config fields. Some of these are plain values, and some are themselves things with builders.

Here's the macro_rules I currently have to help drive sub_builder:

https://gitlab.torproject.org/tpo/core/arti/-/blob/06e51f63843b83506543f7055f220cb4ff0b2538/crates/tor-config/src/list_builder.rs

I think this could usefully be done by a proc macro, perhaps in derive_builder, but it has quite a large API surface.

I think this is a thing one would only do if there were compelling reasons why the "inner" macro couldn't be modified to add the desired features.

serde has to deal with this, and does so by having some low-level hooks such as with and then having other crates like serde_with or the one for location tracking of errors that callers can use to get additional functionality.

Yes. However, of course, serde is already very complicated and is much more at risk of becoming an incoherent pile of random and badly-interoperating features than derive_builder is.

I found the fact that the proc_macro code in #253 is so small quite encouraging.

I still have a vague sense that this is too focused on the specific sub-builder use-case,

Well, the sub_builder attribute that I add in my MR is of course precisely focused on that. But it's just a shorthand for a longer set of attributes, and avoids repetition there. I think it's appropriate for a macro package like derive_builder to have convenience affordances for common patterns.

But I guess maybe you are also looking at the new error type I introduced. You're right that it is quite specific to the sub-builder case. Possibly it would be better to separate out the functions of "error occurred in a particular field" and "error was a sub-builder error" - but IME flat error conversion hierarchies are normally easier to work with than repeated conversions.

So having a specific error for precisely "sub builder for field failed" doesn't seem too bad. The logical conclusion of this argument is is that if derive_builder grew built-in support for Option<Vec<ItemBuilder>> like I have in my macro_rules, there would be a specific error precisely for "sub builder for entry in list in field failed", containing both the field name and the index.

I wonder if there's a pattern around "builder attempts to initialize field within a try-block, then adds field name if error is emitted."

That sounds like a reasonable feature to me. But there are of course two needless occurrences of the field name in the "just use #[builder(field(...))] approach: the one in the error handling, and the one for the sub-field in the outer builder (ie, the input value).

I do have a fairly strong dislike for the fact that field(build = "...") embeds the field name - it's consistent with how the crate previously used blocks in attribute arguments, but this could potentially have been avoided if that took a path and called that function on the field rather than having the caller specify the field name.

field(build = ) lets one make an output field out of multiple input fields. ISTM that there are all sorts of reasons why that might be useful, (particularly in cases where one is trying to do some kind of backward compatibilty). It's the fully general escape hatch which one ought only to use if absolutely necessary.


Anyway, thanks for your consideration. FYI, currently arti is using a git dependency for derive_builder. We can't release in that state, so it is possible that we may need to upload a fork of derive_builder to crates.io. My colleagues are not worried about the implications of having a pet fork of derive_builder, but I would very much like to avoid it.

@TedDriggs
Copy link
Collaborator

Having looked at the linked file and slept on this for a few days, I'm still not feeling that this has enough general value to merit the API expansion. While I'm not as close to the arti code as you are, I think in general it's better for builders to be simple, and to leverage each setters, or try_ setters if needed for intermediate configuration build-up.


Anyway, thanks for your consideration. FYI, currently arti is using a git dependency for derive_builder. We can't release in that state, so it is possible that we may need to upload a fork of derive_builder to crates.io. My colleagues are not worried about the implications of having a pet fork of derive_builder, but I would very much like to avoid it.

If you go this route, please name it something like arti-builder, and replace the README and crate-level docs with clear messaging that it's not the official builder derivation.


I'm going to leave this issue open in case others hit similar use-cases.

@ijackson
Copy link
Contributor Author

ijackson commented May 4, 2022

OK, thanks. (Your suggestion to use try_ setters wouldn't work for us because we need the builder to be deserialize, so the builder has to contain sub-builders, not validated structs.)

I haven't spoken to my colleages, but my usual practice is to name forks "originalname-fork-ourproject". That seems a widely adopted convention in the ecosystem,. so in this case that would be something like "derive-builder-fork-arti". Would that be OK with you?

And yes of course I'll adjust the docs.

@lcmgh
Copy link

lcmgh commented Jun 8, 2022

I'd also love to see examples in the doc on how config-rs can be combined with builder patterns / common builder libs.

@cowlicks
Copy link

I recently started using derive_builder for the first time. My usecase is that I have a top level struct that is built from other structs nested 3 levels deep which I've used derive_builder to generate builders for. There are very few attributes that can't actually be set to defaults in most cases. So I went looking for something like what @ijackson is describing. It seems think that having "Nested Builders" would be a natural way to make API's like mine much more ergonomic for users.

My existing code looks like:

    ClientBuilder::default()
        .api_url("my_api_url"),
        .auth_handler(
            AuthHandlerBuilder::default()
                .jwt_factory(
                    JwtFactoryBuilder::default()
                        .config(
                            ConfigBuilder::default()
                                .jwt_audience("jwt audience")
                                .build()?,
                        )
                        .encoding_key_from_bytes("my encoding key".as_bytes())
                        .build()?,
                )
                .build()?
        )
    );

This is what I'd like it to look like:

    let mut cb = ClientBuilder::default();                      // ClientBuilder

    cb.api_url("my_api_url")                                    // ClientBuilder
        .auth_handler_builder()                                 // AuthHandlerBuilder
        .jwt_factory_builder()                                  // JwtFactoryBuilder
        .encoding_key_from_bytes("my encoding key".as_bytes())  // JwtFactoryBuilder
        .config_builder()                                       // ConfigBuilder
        .jwt_audience("jwt audience");                          // ConfigBuilder

    cb.build();                                                 // Client

@ijackson
Copy link
Contributor Author

@cowlicks: This is roughly what we have with derive-builder-fork-arti, which contains the code from #253 and which we're using in Arti. (I'm not sure itt looks quite like your example - but it's closer.)

The docs there for the feature have an example.

If others had interest in this, I could be persuaded to perhaps make more regular releases of the fork.

@TedDriggs
Copy link
Collaborator

@ijackson I would consider download/star/dependent crate volume for the fork as evidence of a need for this in the main builder package, so if there appears to be substantial demand we can reopen this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants