
Huge LLVM line count, quadratic compile time for derive(JsonSchema) #246

Open
adamchalmers opened this issue Sep 13, 2023 · 8 comments

@adamchalmers
Contributor

adamchalmers commented Sep 13, 2023

Hi there!

Firstly, thanks for this library. It's really helped a lot of the Rust web ecosystem.

I've been using derive(schemars::JsonSchema) on large enums for a while. By "large" I mean 200 or 300 enum variants, e.g. an enum CountryCode with variants for each ISO-3166 country code like US, AU, CN etc.

See here for an example enum (250 variants) which derives JsonSchema.

When I do this, I've noticed that the generated impl JsonSchema outputs three orders of magnitude more LLVM than the next-largest function and takes up 99% of my codebase's LLVM lines -- it really outputs a lot of LLVM.

  Lines                 Copies            Function name
  -----                 ------            -------------
  117114                18                (TOTAL)
  115921 (99.0%, 99.0%)  1 (5.6%,  5.6%)  playground::_::<impl schemars::JsonSchema for playground::CountryCode>::json_schema
     318 (0.3%, 99.3%)   1 (5.6%, 11.1%)  alloc::alloc::Global::alloc_impl
     171 (0.1%, 99.4%)   1 (5.6%, 16.7%)  <schemars::schema::SchemaObject as core::default::Default>::default
     164 (0.1%, 99.5%)   2 (11.1%, 27.8%) <alloc::boxed::Box<T,A> as core::ops::drop::Drop>::drop
     155 (0.1%, 99.7%)   1 (5.6%, 33.3%)  alloc::slice::hack::into_vec
      99 (0.1%, 99.8%)   1 (5.6%, 38.9%)  <schemars::schema::SubschemaValidation as core::default::Default>::default
      80 (0.1%, 99.8%)   1 (5.6%, 44.4%)  <schemars::schema::Metadata as core::default::Default>::default
      65 (0.1%, 99.9%)   1 (5.6%, 50.0%)  alloc::alloc::exchange_malloc
      56 (0.0%, 99.9%)   1 (5.6%, 55.6%)  <alloc::alloc::Global as core::alloc::Allocator>::deallocate
      25 (0.0%, 99.9%)   1 (5.6%, 61.1%)  alloc::boxed::Box<T>::new
      25 (0.0%,100.0%)   1 (5.6%, 66.7%)  alloc::str::<impl alloc::borrow::ToOwned for str>::to_owned
       8 (0.0%,100.0%)   1 (5.6%, 72.2%)  <T as core::convert::Into<U>>::into
       8 (0.0%,100.0%)   1 (5.6%, 77.8%)  <playground::_::<impl serde::de::Deserialize for playground::CountryCode>::deserialize::__FieldVisitor as serde::de::Visitor>::expecting
       8 (0.0%,100.0%)   1 (5.6%, 83.3%)  <playground::_::<impl serde::de::Deserialize for playground::CountryCode>::deserialize::__Visitor as serde::de::Visitor>::expecting
       8 (0.0%,100.0%)   1 (5.6%, 88.9%)  alloc::slice::<impl [T]>::into_vec
       2 (0.0%,100.0%)   1 (5.6%, 94.4%)  playground::_::<impl schemars::JsonSchema for playground::CountryCode>::schema_name
       1 (0.0%,100.0%)   1 (5.6%,100.0%)  <bool as core::default::Default>::default

The good news is, the derive(JsonSchema) macro outputs a number of LLVM lines that is linear in the number of enum variants, so there's no hidden exponential or quadratic behaviour in the macro itself. Unfortunately, according to @jyn514, LLVM optimization is quadratic in the number of lines in a function. So derive(JsonSchema) outputs a huge number of LLVM lines, and LLVM takes quadratic time to compile them. This means compiling the example I linked above takes an insane amount of time!

Screenshot 2023-09-12 at 6 38 29 PM

Luckily this behaviour only manifests in release mode. Debug builds are very quick!

I'm not familiar with LLVM, so I'm not really sure why the JsonSchema derive expands to so many LLVM lines. By comparison, the serde derives output five orders of magnitude fewer LLVM lines.

I guess you could view this as a problem with derive(JsonSchema) (that it outputs so much LLVM) or with LLVM (that it should not take quadratic time to compile release builds). But schemars is probably easier to fix than LLVM.

Suggestions to fix:

  • Hacky workaround: break the large generated json_schema function into several smaller functions, to avoid LLVM's quadratic-in-lines-of-code behaviour. Many small functions should be faster to compile than one large function (see the sketch after this list).
  • Proper fix: figure out why the generated json_schema function compiles into so many LLVM lines.
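To make the first suggestion concrete, here's a rough hand-written sketch of what a split-up expansion could look like (the helper names and chunking are made up for illustration; this is not actual macro output):

// Sketch only: split the one giant json_schema body into small helpers so that
// no single function hands LLVM tens of thousands of lines at once.
fn json_schema(gen: &mut schemars::gen::SchemaGenerator) -> schemars::schema::Schema {
    let mut enum_values = Vec::new();
    enum_values_chunk_1(&mut enum_values);
    enum_values_chunk_2(&mut enum_values);
    // ...one small helper per chunk of variants...
    schemars::schema::Schema::Object(schemars::schema::SchemaObject {
        instance_type: Some(schemars::schema::InstanceType::String.into()),
        enum_values: Some(enum_values),
        ..Default::default()
    })
}

// Hypothetical helpers, each covering a small chunk of the variants.
fn enum_values_chunk_1(out: &mut Vec<serde_json::Value>) {
    out.push("AF".into());
    out.push("AX".into());
}

fn enum_values_chunk_2(out: &mut Vec<serde_json::Value>) {
    out.push("AL".into());
    out.push("DZ".into());
}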
@adamchalmers adamchalmers changed the title from "Exponential LLVM line count in derive(JsonSchema) on enums" to "Huge LLVM line count, quadratic compile time for derive(JsonSchema)" on Sep 13, 2023
@jyn514

jyn514 commented Sep 13, 2023

I'm not familiar with LLVM, so I'm not really sure why the JsonSchema derive expands to so many LLVM lines. By comparison, the serde derives output five orders of magnitude fewer LLVM lines.

It would be interesting to see the amount of generated MIR for each - the expansion you showed me had a lot of calls to ..SchemaObject::default, and I wonder if it's generating a new assignment for every field in SchemaObject.

You can use -Z unpretty=mir to see what the MIR is before LLVM lowering.

@adamchalmers
Contributor Author

@jyn514 How do I use that flag? I've tried

cargo +nightly -Z unpretty=mir build
cargo +nightly build -Z unpretty=mir

and other combinations, but it always just says "unknown -Z flag specified: unpretty".

@jyn514

jyn514 commented Sep 21, 2023

@adamchalmers It's a rustc flag - try something like cargo +nightly rustc -- -Z unpretty=mir

@adamchalmers
Contributor Author

adamchalmers commented Oct 21, 2023

Thanks, here's the expansion.

The majority of MIR is made up of code like this (this pattern occurs 64744 times):

    bb64701 (cleanup): {
        drop(_300) -> [return: bb64702, unwind terminate(cleanup)];
    }

    bb64702 (cleanup): {
        drop(_278) -> [return: bb64703, unwind terminate(cleanup)];
    }

    bb64703 (cleanup): {
        drop(_256) -> [return: bb64704, unwind terminate(cleanup)];
    }

This takes up the vast majority of lines.
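For what it's worth, that drop-heavy pattern is roughly what you'd expect when a single expression builds hundreds of owned temporaries: every "XX".into() in the big array produces an owned value, and MIR has to carry a cleanup (drop) block for each temporary that is still live in case a later call unwinds. A minimal illustration of the shape (using String instead of serde_json::Value to keep it short; this is not the actual expansion):

fn main() {
    // Each element is an owned String temporary. If building element N were to
    // unwind, elements 0..N would still need to be dropped, so MIR emits a
    // chain of cleanup blocks -- one drop per live temporary -- for the whole
    // array literal.
    let values: Vec<String> = vec![
        "AF".into(),
        "AX".into(),
        // ...roughly 250 more...
    ];
    assert_eq!(values.len(), 2);
}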

GREsau added a commit that referenced this issue Nov 11, 2023
This reduces size of MIR output, which should somewhat mitigate #246
@GREsau
Owner

GREsau commented Nov 11, 2023

Could you try schemars 0.8.16? It contains a small change that puts a temporary value in a variable instead of passing it directly as an argument to a function - when I tested it locally with your CountryCode example, this change reduced MIR output size by ~30%

I'm sure many further improvements could be made, but it seemed worth getting a quick minimal improvement out for now
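For illustration, the kind of change being described looks roughly like this (the helper names are made up and are not schemars' actual code):

// Hypothetical stand-ins, just to show the pattern of the change.
fn make_enum_values() -> Vec<String> {
    vec!["AF".into(), "AX".into()]
}

fn build_schema(values: Vec<String>) -> usize {
    values.len()
}

fn main() {
    // Before: the temporary is passed straight into the call, so it lives as an
    // anonymous temporary inside the larger expression.
    let _before = build_schema(make_enum_values());

    // After: the temporary is bound to a named local first -- the shape the
    // 0.8.16 change switched to, which reduced MIR output by ~30% for the
    // CountryCode example per the comment above.
    let values = make_enum_values();
    let _after = build_schema(values);
}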

@adamchalmers
Contributor Author

Thanks very much for that change -- compiling kittycad now takes 57 seconds, down from 90 seconds. A big improvement!

If it's OK with you, I'm going to keep this issue open so we can discuss further improvements -- I really appreciate the progress so far!

@saethlin

I don't know if this is already well-known, but I was reading Adam's great blog post about this situation (https://blog.adamchalmers.com/crazy-compile-time/), and I'm pretty sure that the compile time here would be effectively linear if the derive macro used a loop to build the array of variants.

Currently this code:

#[derive(schemars::JsonSchema, serde::Deserialize, serde::Serialize)]
pub enum CountryCode {
    #[serde(rename = "AF")]
    Af,
    #[serde(rename = "AX")]
    Ax
}

Expands to this (the <[_]>::into_vec(Box::new(...)) form is presumably just what vec! expands to):

fn json_schema(
    gen: &mut schemars::gen::SchemaGenerator,
) -> schemars::schema::Schema {
    schemars::schema::Schema::Object(schemars::schema::SchemaObject {
        instance_type: Some(schemars::schema::InstanceType::String.into()),
        enum_values: Some(
            <[_]>::into_vec(
                #[rustc_box]
                ::alloc::boxed::Box::new(["AF".into(), "AX".into()]),
            ),  
        ),  
        ..Default::default()
    })  
}   

But I'm suggesting that it expand to something like this:

fn json_schema(
    gen: &mut schemars::gen::SchemaGenerator,
) -> schemars::schema::Schema {
    schemars::schema::Schema::Object(schemars::schema::SchemaObject {
        instance_type: Some(schemars::schema::InstanceType::String.into()),
        enum_values: Some(
            ["AF", "AX"].into_iter().map(|v| v.into()).collect()
        ),  
        ..Default::default()
    })  
}

I know that's much easier to write in surface Rust than to make happen in a macro.
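For reference, a rough sketch of how a derive macro could emit that iterator-based form, assuming the usual proc_macro2/quote machinery (illustrative only -- this is not schemars' actual macro code):

use proc_macro2::TokenStream;
use quote::quote;

// variant_names would come from walking the enum's variants with syn. This
// emits the ["AF", "AX"].into_iter().map(...).collect() shape instead of a
// literal list of already-converted values.
fn enum_values_expr(variant_names: &[String]) -> TokenStream {
    quote! {
        Some([#(#variant_names),*].into_iter().map(|v| v.into()).collect())
    }
}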

@adamchalmers
Contributor Author

adamchalmers commented May 7, 2024

On my real-world project (i.e. https://github.com/KittyCAD/kittycad.rs/), compile time is VASTLY improved!

0.8.19
real	36.21s
user	205.51s
sys	10.14s
maxmem  1,134,992k

0.8.17
real	68.42s
user	239.05s
sys     9.93s
maxmem	1,246,608k

Thank you so much @icewind1991 and @GREsau.
