
Huge LLVM line count, quadratic compile time for derive(JsonSchema) #246

Open
adamchalmers opened this issue Sep 13, 2023 · 8 comments

@adamchalmers
Contributor

adamchalmers commented Sep 13, 2023

Hi there!

Firstly, thanks for this library. It's really helped a lot of the Rust web ecosystem.

I've been using derive(schemars::JsonSchema) on large enums for a while. By "large" I mean 200 or 300 enum variants, e.g. an enum CountryCode with variants for each ISO-3166 country code like US, AU, CN etc.

See here for an example enum (250 variants) which derives JsonSchema.

When I do this, I've noticed that the generated impl JsonSchema outputs three orders of magnitude more LLVM than the next-largest function and takes up 99% of my codebase's LLVM lines -- it really outputs a lot of LLVM.

  Lines                 Copies            Function name
  -----                 ------            -------------
  117114                18                (TOTAL)
  115921 (99.0%, 99.0%)  1 (5.6%,  5.6%)  playground::_::<impl schemars::JsonSchema for playground::CountryCode>::json_schema
     318 (0.3%, 99.3%)   1 (5.6%, 11.1%)  alloc::alloc::Global::alloc_impl
     171 (0.1%, 99.4%)   1 (5.6%, 16.7%)  <schemars::schema::SchemaObject as core::default::Default>::default
     164 (0.1%, 99.5%)   2 (11.1%, 27.8%) <alloc::boxed::Box<T,A> as core::ops::drop::Drop>::drop
     155 (0.1%, 99.7%)   1 (5.6%, 33.3%)  alloc::slice::hack::into_vec
      99 (0.1%, 99.8%)   1 (5.6%, 38.9%)  <schemars::schema::SubschemaValidation as core::default::Default>::default
      80 (0.1%, 99.8%)   1 (5.6%, 44.4%)  <schemars::schema::Metadata as core::default::Default>::default
      65 (0.1%, 99.9%)   1 (5.6%, 50.0%)  alloc::alloc::exchange_malloc
      56 (0.0%, 99.9%)   1 (5.6%, 55.6%)  <alloc::alloc::Global as core::alloc::Allocator>::deallocate
      25 (0.0%, 99.9%)   1 (5.6%, 61.1%)  alloc::boxed::Box<T>::new
      25 (0.0%,100.0%)   1 (5.6%, 66.7%)  alloc::str::<impl alloc::borrow::ToOwned for str>::to_owned
       8 (0.0%,100.0%)   1 (5.6%, 72.2%)  <T as core::convert::Into<U>>::into
       8 (0.0%,100.0%)   1 (5.6%, 77.8%)  <playground::_::<impl serde::de::Deserialize for playground::CountryCode>::deserialize::__FieldVisitor as serde::de::Visitor>::expecting
       8 (0.0%,100.0%)   1 (5.6%, 83.3%)  <playground::_::<impl serde::de::Deserialize for playground::CountryCode>::deserialize::__Visitor as serde::de::Visitor>::expecting
       8 (0.0%,100.0%)   1 (5.6%, 88.9%)  alloc::slice::<impl [T]>::into_vec
       2 (0.0%,100.0%)   1 (5.6%, 94.4%)  playground::_::<impl schemars::JsonSchema for playground::CountryCode>::schema_name
       1 (0.0%,100.0%)   1 (5.6%,100.0%)  <bool as core::default::Default>::default

The good news is, the derive(JsonSchema) macro outputs a number of LLVM lines that is linear in the number of enum variants, so there's no hidden exponential or quadratic behaviour in the macro itself. Unfortunately, according to @jyn514, LLVM optimization is quadratic in the number of lines in a function. So derive(JsonSchema) outputs a huge number of LLVM lines, and LLVM takes quadratic time to compile them. This means compiling the example I linked above takes an insane amount of time!

Screenshot 2023-09-12 at 6 38 29 PM

Luckily this behaviour only manifests in release mode. Debug builds are very quick!

I'm not familiar with LLVM, so I'm not really sure why the JsonSchema derive expands to so many LLVM lines. By comparison, the serde derives output five orders of magnitude fewer LLVM lines.

I guess you could view this as a problem with derive(JsonSchema) (that it outputs so much LLVM) or with LLVM (that it should not take quadratic time to compile release builds). But schemars is probably easier to fix than LLVM.

Suggestions to fix:

  • Hacky workaround: break the large generated json_schema function into several smaller functions, to avoid LLVM's quadratic-in-lines-of-code behaviour. Many small functions should be faster to compile than one large function (see the sketch after this list).
  • Proper fix: figure out why the generated json_schema function compiles into so many LLVM lines.
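To make the first suggestion concrete, here's a rough hand-written sketch of what a split-up expansion could look like (the helper names and chunking are made up for illustration; this is not actual macro output):

// Sketch only: split the one giant json_schema body into small helpers so that
// no single function hands LLVM tens of thousands of lines at once.
fn json_schema(gen: &mut schemars::gen::SchemaGenerator) -> schemars::schema::Schema {
    let mut enum_values = Vec::new();
    enum_values_chunk_1(&mut enum_values);
    enum_values_chunk_2(&mut enum_values);
    // ...one small helper per chunk of variants...
    schemars::schema::Schema::Object(schemars::schema::SchemaObject {
        instance_type: Some(schemars::schema::InstanceType::String.into()),
        enum_values: Some(enum_values),
        ..Default::default()
    })
}

// Hypothetical helpers, each covering a small chunk of the variants.
fn enum_values_chunk_1(out: &mut Vec<serde_json::Value>) {
    out.push("AF".into());
    out.push("AX".into());
}

fn enum_values_chunk_2(out: &mut Vec<serde_json::Value>) {
    out.push("AL".into());
    out.push("DZ".into());
}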
@adamchalmers adamchalmers changed the title from "Exponential LLVM line count in derive(JsonSchema) on enums" to "Huge LLVM line count, quadratic compile time for derive(JsonSchema)" on Sep 13, 2023
@jyn514

jyn514 commented Sep 13, 2023

I'm not familiar with LLVM, so I'm not really sure why the JsonSchema derive expands to so many LLVM lines. By comparison, the serde derives output five orders of magnitude fewer LLVM lines.

It would be interesting to see the amount of generated MIR for each - the expansion you showed me had a lot of calls to ..SchemaObject::default, and I wonder if it's generating a new assignment for every field in SchemaObject.

You can use -Z unpretty=mir to see what the MIR is before LLVM lowering.

@adamchalmers
Contributor Author

@jyn514 How do I use that flag? I've tried

cargo +nightly -Z unpretty=mir build
cargo +nightly build -Z unpretty=mir

and other combinations, but it always just says "unknown -Z flag specified: unpretty".

@jyn514

jyn514 commented Sep 21, 2023

@adamchalmers It's a rustc flag - try something like cargo +nightly rustc -- -Z unpretty=mir

@adamchalmers
Contributor Author

adamchalmers commented Oct 21, 2023

Thanks, here's the expansion.

The majority of MIR is made up of code like this (this pattern occurs 64744 times):

    bb64701 (cleanup): {
        drop(_300) -> [return: bb64702, unwind terminate(cleanup)];
    }

    bb64702 (cleanup): {
        drop(_278) -> [return: bb64703, unwind terminate(cleanup)];
    }

    bb64703 (cleanup): {
        drop(_256) -> [return: bb64704, unwind terminate(cleanup)];
    }

This takes up the vast majority of lines.
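For what it's worth, that drop-heavy pattern is roughly what you'd expect when a single expression builds hundreds of owned temporaries: every "XX".into() in the big array produces an owned value, and MIR has to carry a cleanup (drop) block for each temporary that is still live in case a later call unwinds. A minimal illustration of the shape (using String instead of serde_json::Value to keep it short; this is not the actual expansion):

fn main() {
    // Each element is an owned String temporary. If building element N were to
    // unwind, elements 0..N would still need to be dropped, so MIR emits a
    // chain of cleanup blocks -- one drop per live temporary -- for the whole
    // array literal.
    let values: Vec<String> = vec![
        "AF".into(),
        "AX".into(),
        // ...roughly 250 more...
    ];
    assert_eq!(values.len(), 2);
}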

GREsau added a commit that referenced this issue Nov 11, 2023
This reduces size of MIR output, which should somewhat mitigate #246
@GREsau
Owner

GREsau commented Nov 11, 2023

Could you try schemars 0.8.16? It contains a small change that puts a temporary value in a variable instead of passing it directly as an argument to a function - when I tested it locally with your CountryCode example, this change reduced MIR output size by ~30%

I'm sure many further improvements could be made, but it seemed worth getting a quick minimal improvement out for now
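For illustration, the kind of change being described looks roughly like this (the helper names are made up and are not schemars' actual code):

// Hypothetical stand-ins, just to show the pattern of the change.
fn make_enum_values() -> Vec<String> {
    vec!["AF".into(), "AX".into()]
}

fn build_schema(values: Vec<String>) -> usize {
    values.len()
}

fn main() {
    // Before: the temporary is passed straight into the call, so it lives as an
    // anonymous temporary inside the larger expression.
    let _before = build_schema(make_enum_values());

    // After: the temporary is bound to a named local first -- the shape the
    // 0.8.16 change switched to, which reduced MIR output by ~30% for the
    // CountryCode example per the comment above.
    let values = make_enum_values();
    let _after = build_schema(values);
}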

@adamchalmers
Contributor Author

Thanks very much for that change -- compiling kittycad now takes 57 seconds, down from 90 seconds. A big improvement!

If it's OK with you, I'm going to keep this issue open so we can discuss further improvements -- I really appreciate the progress so far!

@saethlin

I don't know if this is already well-known, but I was reading Adam's great blog post about this situation (https://blog.adamchalmers.com/crazy-compile-time/), and I'm pretty sure that the compile time here would be effectively linear if the derive macro used a loop to build the array of variants.

Currently this code:

#[derive(schemars::JsonSchema, serde::Deserialize, serde::Serialize)]
pub enum CountryCode {
    #[serde(rename = "AF")]
    Af,
    #[serde(rename = "AX")]
    Ax
}

Expands to this (the <[_]>::into_vec(Box::new(...)) form is presumably just what vec! expands to):

fn json_schema(
    gen: &mut schemars::gen::SchemaGenerator,
) -> schemars::schema::Schema {
    schemars::schema::Schema::Object(schemars::schema::SchemaObject {
        instance_type: Some(schemars::schema::InstanceType::String.into()),
        enum_values: Some(
            <[_]>::into_vec(
                #[rustc_box]
                ::alloc::boxed::Box::new(["AF".into(), "AX".into()]),
            ),  
        ),  
        ..Default::default()
    })  
}   

But I'm suggesting that it expand to something like this:

fn json_schema(
    gen: &mut schemars::gen::SchemaGenerator,
) -> schemars::schema::Schema {
    schemars::schema::Schema::Object(schemars::schema::SchemaObject {
        instance_type: Some(schemars::schema::InstanceType::String.into()),
        enum_values: Some(
            ["AF", "AX"].into_iter().map(|v| v.into()).collect()
        ),  
        ..Default::default()
    })  
}

I know that's much easier to write in surface Rust than to make happen in a macro.
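For reference, a rough sketch of how a derive macro could emit that iterator-based form, assuming the usual proc_macro2/quote machinery (illustrative only -- this is not schemars' actual macro code):

use proc_macro2::TokenStream;
use quote::quote;

// variant_names would come from walking the enum's variants with syn. This
// emits the ["AF", "AX"].into_iter().map(...).collect() shape instead of a
// literal list of already-converted values.
fn enum_values_expr(variant_names: &[String]) -> TokenStream {
    quote! {
        Some([#(#variant_names),*].into_iter().map(|v| v.into()).collect())
    }
}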

@adamchalmers
Contributor Author

adamchalmers commented May 7, 2024

On my real-world project (i.e. https://github.com/KittyCAD/kittycad.rs/), compile time is VASTLY improved!

0.8.19
real	36.21s
user	205.51s
sys	10.14s
maxmem  1,134,992k

0.8.17
real	68.42s
user	239.05s
sys     9.93s
maxmem	1,246,608k

Thank you so much @icewind1991 and @GREsau.
