Huge LLVM line count, quadratic compile time for derive(JsonSchema) #246
Comments
it would be interesting to see the amount of generated MIR for each - the expansion you showed me had a lot of calls to ..SchemaObject::default and i wonder if it's generating a new assignment for every field in SchemaObject. you can use -Z unpretty=mir to see what the MIR is before LLVM lowering
@jyn514 How do I use that flag? I've tried
and other combinations but it always just says "unknown -Z flag specified: unpretty"
@adamchalmers it's a rustc flag - try something like
Thanks, here's the expansion. The majority of MIR is made up of code like this (this pattern occurs 64744 times):
This takes up the vast majority of lines.
This reduces size of MIR output, which should somewhat mitigate #246
Could you try schemars 0.8.16? It contains a small change that puts a temporary value in a variable instead of passing it directly as an argument to a function - when I tested it locally with your
I'm sure many further improvements could be made, but it seemed worth getting a quick minimal improvement out for now.
Thanks very much for that improvement -- now compiling kittycad takes 57 seconds, down from 90 seconds, a big improvement! If it's OK with you I'm going to keep this issue open so we can discuss further improvements -- I really appreciate the dramatic improvement so far!
I don't know if this is already well-known, but I was reading Adam's great blog post about this situation: https://blog.adamchalmers.com/crazy-compile-time/ and I'm pretty sure that the compile time here would be effectively linear if the derive macro used a loop to build the array of variants. Currently this code:
    #[derive(schemars::JsonSchema, serde::Deserialize, serde::Serialize)]
    pub enum CountryCode {
        #[serde(rename = "AF")]
        Af,
        #[serde(rename = "AX")]
        Ax
    }
Expands to (I'm sure that's actually a
    fn json_schema(
        gen: &mut schemars::gen::SchemaGenerator,
    ) -> schemars::schema::Schema {
        schemars::schema::Schema::Object(schemars::schema::SchemaObject {
            instance_type: Some(schemars::schema::InstanceType::String.into()),
            enum_values: Some(
                <[_]>::into_vec(
                    #[rustc_box]
                    ::alloc::boxed::Box::new(["AF".into(), "AX".into()]),
                ),
            ),
            ..Default::default()
        })
    }
But I'm suggesting that it expand to something like this:
    fn json_schema(
        gen: &mut schemars::gen::SchemaGenerator,
    ) -> schemars::schema::Schema {
        schemars::schema::Schema::Object(schemars::schema::SchemaObject {
            instance_type: Some(schemars::schema::InstanceType::String.into()),
            enum_values: Some(
                ["AF", "AX"].into_iter().map(|v| v.into()).collect()
            ),
            ..Default::default()
        })
    }
I know that's much easier to write in surface Rust than to make happen in a macro.
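To make the loop idea concrete, here is a minimal, dependency-free sketch. It is not what schemars generates: `Value` is a hypothetical stand-in for `serde_json::Value`, and `VARIANT_NAMES` and `enum_values` are names invented for illustration. The point it shows is that the variant names become data in a static table, and a single small loop performs the conversion, so the function body does not grow with the number of variants.

```rust
// Hypothetical stand-in for serde_json::Value so the sketch has no dependencies.
#[derive(Debug, PartialEq)]
enum Value {
    String(String),
}

impl From<&str> for Value {
    fn from(s: &str) -> Self {
        Value::String(s.to_owned())
    }
}

// Variant names live in a static table: adding a variant adds data, not code.
const VARIANT_NAMES: &[&str] = &["AF", "AX"];

// One small loop converts every name, so this function body stays the same
// size no matter how many entries the table has.
fn enum_values() -> Vec<Value> {
    VARIANT_NAMES.iter().map(|name| Value::from(*name)).collect()
}

fn main() {
    let values = enum_values();
    println!("{} enum values: {:?}", values.len(), values);
}
```

Whether the optimizer unrolls such a loop for a particular table size is not something this sketch can promise; it only illustrates the shape of the suggested expansion.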
On my real-world project (i.e. https://github.com/KittyCAD/kittycad.rs/), compile time is VASTLY improved!
Thank you so much @icewind1991 and @GREsau.
Hi there!
Firstly, thanks for this library. It's really helped a lot of the Rust web ecosystem.
I've been using derive(schemars::JsonSchema) on large enums for a while. By "large" I mean 200 or 300 enum variants, e.g. an enum CountryCode with variants for each ISO-3166 country code like US, AU, CN etc.
See here for an example enum (250 variants) which derives JsonSchema.
When I do this, I've noticed that the impl JsonSchema outputs three orders of magnitude more LLVM and takes up 99% of my codebase's LLVM lines -- it really outputs a lot of LLVM.
The good news is, the derive(JsonSchema) macro outputs LLVM lines linear with the number of enum variants. So there's no hidden exponential or quadratic behaviour in the macro. Unfortunately, according to @jyn514, LLVM optimization is quadratic in the number of lines in a function. So the derive(JsonSchema) outputs a huge number of LLVM lines, and it takes quadratic time to compile them. This means compiling the example I linked above takes an insane amount of time!
Luckily this behaviour only manifests in release mode. Debug builds are very quick!
I'm not familiar with LLVM and so I'm not really sure why the JsonSchema derive expands to such a huge number of LLVM lines. By comparison, the serde derives output 5 orders of magnitude fewer LLVM lines.
I guess you could view this as a problem with derive(JsonSchema) (that it outputs so much LLVM) or with LLVM (that it should not take quadratic time to compile in release builds). But we can probably fix schemars more easily than LLVM.
Suggestions to fix:
- Split the fn json_schema generated function into several smaller functions, to avoid the quadratic-in-lines-of-code behaviour from LLVM. Many small functions should be faster to compile than one large function.
- Investigate why the fn json_schema generated function compiles into so many LLVM lines.
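As a rough illustration of the "several smaller functions" suggestion, the sketch below splits the work into fixed-size helper functions. Everything in it is hypothetical: `SchemaValue` stands in for the real schema value type, the helper names and chunk size are invented, and `#[inline(never)]` is only an assumption about how one might stop the optimizer from merging the helpers back into one large body. If optimization cost really does grow with the square of a function's size, as reported above, then n lines split across k functions cost roughly k * (n/k)^2 = n^2 / k instead of n^2.

```rust
// Hypothetical stand-in for the schema value type; not the schemars API.
#[derive(Debug, PartialEq)]
enum SchemaValue {
    String(String),
}

// Each helper handles one fixed-size group of variants.
// #[inline(never)] is an assumption: it asks the compiler to keep the helper
// as a separate function instead of folding it into the caller.
#[inline(never)]
fn push_chunk_0(out: &mut Vec<SchemaValue>) {
    out.push(SchemaValue::String("AF".to_owned()));
    out.push(SchemaValue::String("AX".to_owned()));
}

#[inline(never)]
fn push_chunk_1(out: &mut Vec<SchemaValue>) {
    out.push(SchemaValue::String("AL".to_owned()));
    out.push(SchemaValue::String("DZ".to_owned()));
}

// The top-level function only calls the helpers, so it grows by one line per
// chunk rather than by several lines per variant.
fn chunked_enum_values() -> Vec<SchemaValue> {
    let mut out = Vec::with_capacity(4);
    push_chunk_0(&mut out);
    push_chunk_1(&mut out);
    out
}

fn main() {
    let values = chunked_enum_values();
    println!("{} enum values: {:?}", values.len(), values);
}
```

A derive macro taking this approach would have to pick a chunk size and emit the helpers itself; that trade-off is not measured here.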