Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace tera template with serde_starlark #1734

Merged
merged 3 commits into from
Dec 31, 2022

Conversation

dtolnay
Copy link
Contributor

@dtolnay dtolnay commented Dec 29, 2022

This PR is a proposal to move crate_universe away from using tera / jinja2 templates for BUILD files, to using serde + a serde data format for serializing Rust data structures directly to well-formatted Starlark. This approach was developed originally for use in https://github.com/facebookincubator/reindeer and has been used there happily for >3 years.

In this PR, I've started with changing only the generation of BUILD.bazel (i.e. not yet touching BUILD.$crate-$version.bazel). If, based on this PR the approach seems like something the maintainers would be on board with, then I can commit to updating BUILD.$crate-$version.bazel to be generated using serde also, in a following PR.

Overview

To illustrate the basic idea, generating something like the following target:

alias(
    name = "serde_starlark",
    actual = "@cui__serde_starlark-0.1.5//:serde_starlark",
    tags = ["manual"],
)

involves defining a Rust data structure that represents the content of the Starlark call:

#[derive(Serialize)]
#[serde(rename = "alias")]
pub struct Alias {
    pub name: String,
    pub actual: String,
    pub tags: Set<String>,
}

instantiating it:

let alias = Alias {
    name: "serde_starlark".to_owned(),
    actual: "@cui__serde_starlark-0.1.5//:serde_starlark".to_owned(),
    tags: Set::from(["manual".to_owned()]),
};

and then just serializing it:

let starlark = serde_starlark::to_string(&alias)?; // String

Motivation

  1. Just write plain old Rust code. Comparing the implementation before and after this PR, consider this part that deals with creating targets for all the bins:

    {%- set crate = context.crates | get(key=id) %}
    {%- for rule in crate.targets %}
    {%- for rule_type, target in rule %}
    {%- if rule_type in ["Binary"] %}
    alias(
    ...
    {%- endif %}
    {%- endfor %}
    {%- endfor %}

    This isn't Rust code; it's a separate DSL that someone needs to learn. In it, all the conveniences of working with the underlying data structure natively in Rust are gone, like unwrap_or or iterator combinators. Enums and pattern matching / if let are gone. Indentation doesn't work; we can't use it because indenting the body of the loops would end up visible in the output and we don't want that. IDE support is gone because everything is just dynamically typed. In general all compile-time type safety is gone: you need to run the template to find out whether there are basic mistakes like mistyped variables or invalid jinja2 template syntax.

    The equivalent new code:

    let krate = &context.crates[crate_id];
    for rule in &krate.targets {
        if let Rule::Binary(bin) = rule {
            binaries.push(Alias {
                ...
            });
        }
    }

    It's basic obvious Rust control flow that anyone could write.

  2. Generate good-looking, easily reviewable output. Look at crate_universe/3rdparty/crates/BUILD.syn-1.0.105.bazel. The tera-based renderer's output is full of meaningless noise like:

    proc_macro_deps = [
    ] + select({
        "//conditions:default": [
        ],
    }),
    rustc_env = {
    },
    rustc_env_files = select_with_or({
        "//conditions:default": [
        ],
    }),

    because it's simply too onerous to do something friendlier given the limited programming environment of these templates.

    Reindeer's output is much cleaner and compact. Simple things like #[serde(skip_serializing_if = "Foo::is_empty")] go a long way. Simple things like "don't include a # Binaries comment if there are no binaries targets being generated" are so easy in Rust and pretty much hopeless in the templates.

Copy link
Collaborator

@UebelAndre UebelAndre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing crate! You're a legend!

Would serde_starlark be capable of replacing all uses of tera? Any blockers in doing so?

crate_universe/src/context.rs Show resolved Hide resolved

#[derive(Serialize)]
#[serde(rename = "glob")]
pub struct Glob(pub Set<String>);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the content here seems to be simpler versions of things in crate::utils::starlark. Any reason not to move it to the same module?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@dtolnay
Copy link
Contributor Author

dtolnay commented Dec 30, 2022

Would serde_starlark be capable of replacing all uses of tera?

Yes.

Old clippy:

    warning: unknown lint: `clippy::overly_complex_bool_expr`
      --> src/utils/starlark/glob.rs:23:13
       |
    23 |     #[allow(clippy::overly_complex_bool_expr)]
       |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       |
       = note: `#[warn(unknown_lints)]` on by default

    error: this boolean expression contains a logic bug
      --> src/utils/starlark/glob.rs:28:12
       |
    28 |         if false && self.exclude.is_empty() {
       |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: it would look like the following: `false`
       |
       = note: `#[deny(clippy::logic_bug)]` on by default
    help: this expression can be optimized out by applying boolean operations to the outer expression
      --> src/utils/starlark/glob.rs:28:21
       |
    28 |         if false && self.exclude.is_empty() {
       |                     ^^^^^^^^^^^^^^^^^^^^^^^
       = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#logic_bug

New clippy:

    warning: lint `clippy::logic_bug` has been renamed to `clippy::overly_complex_bool_expr`
      --> src/utils/starlark/glob.rs:24:13
       |
    24 |     #[allow(clippy::logic_bug)] // clippy 1.64 and older
       |             ^^^^^^^^^^^^^^^^^ help: use the new name: `clippy::overly_complex_bool_expr`
       |
       = note: `#[warn(renamed_and_removed_lints)]` on by default
Copy link
Collaborator

@UebelAndre UebelAndre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much! Hopefully we can migrate everything to this crate in the near future 😄

@UebelAndre UebelAndre merged commit 31073ff into bazelbuild:main Dec 31, 2022
@dtolnay dtolnay deleted the serdestarlark branch December 31, 2022 22:51
illicitonion pushed a commit that referenced this pull request Jan 4, 2023
Context: see #1734.

This PR converts all of `crate_build_file.j2`, `partials/crate/*.j2` and `partials/starlark/*.j2` to serde_starlark.

### Remarks

- I kept the output of `bazel run //crate_universe/3rdparty:crates_vendor` out of this PR to keep GitHub's code review interface usable. You can find the output generated code in 44c264a.

- I tried to keep the conversion 1-to-1 to the largest extent possible. As an example, build_script.j2 duplicated most of the content of common_attrs.j2 even though it probably could have used that template via include, and I have mirrored the same here by not using `#[serde(flatten)] common: CommonAttrs` inside of `CargoBuildScript`. There is probably plenty of opportunity for cleaning up things in followup PRs.

- I came across several bugs in the existing code. Again, I mostly stayed away from these to keep the PR bounded in scope. For example the fact that `rustc_flags` is stored as a BTreeSet\<String\> is not correct; compiler flags are ordered and sorting them is not correct. For example `-C debuginfo=2`...

    https://github.com/bazelbuild/rules_rust/blob/532e60ff0ee3ac318333b3ae05392c220b17ce28/crate_universe/src/context/crate_context.rs#L178

- One bug that I fixed because in the new implementation it was easier to fix than to not fix, is that all existing uses of selectable_list.j2 and selectable_dict.j2 were broken because they all had the same failure mode as #1417. The generated code using `selects.with_or` could end up generating a map with duplicate keys if multiple distinct Cargo cfg expressions mapped to the same subset of supported platform triples. My implementation follows the same approach as #1647, but that PR only applied it to `deps` and `aliases`, not e.g. `rustc_env` and everything else.

- Anecdotally the new implementation has strong "if it compiles, it works" property. I mostly typed out the implementation of this PR top to bottom without running `crates_vendor`, and as soon as the whole thing was written and the compiler was okay with it, it was generating the output I wanted. I have not done much work in the templates in the past but I do not imagine they have this property.

### Remaining work

- `select_with_or` in private/selects.bzl is no longer used by the code generator's output. I had to keep it in this PR because all the generated BUILD files still include calls to it. It can be deleted after all those are regenerated.

- One of the tera templates still uses the old serialization of `SelectList` so the starlark serialization for it isn't a normal `Serialize` impl but instead an associated function. This causes there to be a bunch of `serialize_with = "SelectList::serialize_starlark"` attributes needed. This can be cleaned up after the last tera template is converted.

- ~~A lot of the contents of `crate::context` can be cleaned up after they're no longer used by tera. Tera needs those data structures to be serializable, whereas with serde_starlark we're not serializing anything from `crate::context`, only from `crate::utils::starlark`. For now I've just added a bunch of `serde(skip)` attributes in `crate::context` on all the parts tera is no longer using.~~ *Edit: apparently we still need to serialize all the fields in `Context` so that they end up in cargo-bazel-lock.json.*

- There are 2 remaining tera templates after this PR, module_bzl.j2 and vendor_module.j2.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants