
Slow compile and cargo check #783

Open · clemens-msupply opened this issue Jan 12, 2022 · 32 comments
Labels: question (Further information is requested)

@clemens-msupply

We have a relatively small GraphQL schema, but it takes a long time for cargo check to complete (~20s). Is this a common problem, or is there a known issue that can cause the long check times? Happy to provide more details if needed.

Sources are at: https://github.com/openmsupply/remote-server/tree/develop/graphql/src/schema

clemens-msupply added the question label Jan 12, 2022
@sunli829
Collaborator

Really quite slow, very strange. 😂

@sunli829
Collaborator

AFAIK rustc can be very slow in some cases, but I don't know exactly why.

@clemens-msupply
Author

Thanks, I will try to narrow down which endpoint/feature makes it slow.

@sunli829
Collaborator

sunli829 commented Jan 13, 2022

On my computer:

rustc 1.57.0 (f1edd0429 2021-11-29)

first time

Finished dev [unoptimized + debuginfo] target(s) in 46.65s

second time

Finished dev [unoptimized + debuginfo] target(s) in 0.30s

change the code

Finished dev [unoptimized + debuginfo] target(s) in 6.49s

That doesn't seem to be slow?

@clemens-msupply
Author

Thanks for looking into it.

I just tried cargo check again and I get similar numbers on my machine (not sure if it takes longer when there are more errors). However, I noticed that rust-analyzer in VSCode seems to take longer than just doing a cargo check on the command line; it seems like it goes through some other crates as well, not sure why...

~6s is quite a long time for a type check though... Developing graphql becomes a bit of a pain with these long check times :(

I tried some profiling as described here: https://fasterthanli.me/articles/why-is-my-rust-build-so-slow
but I couldn't find anything that jumped out. It just spends a lot of time doing the type checking... Is async-graphql generating that much code/types, or does this look odd?

(screenshot of profiling output)

@sunli829
Collaborator

Yes, async-graphql's macros generate a lot of code.
Also, the development experience of CLion + the Rust plugin is much better than VSCode + RA.

@clemens-msupply
Author

Do you think it could help to split an endpoint over multiple mods/crates and then use #[derive(MergedObject, Default)] to put them all together? (is it actually possible to use an Object from a different crate?)
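For reference, the merging pattern from the async-graphql book looks roughly like this (a minimal sketch; the UserQuery and MapQuery names are made up, and in a multi-crate setup each part could live in its own crate, since MergedObject only needs types that implement ObjectType):

use async_graphql::{MergedObject, Object};

#[derive(Default)]
pub struct UserQuery; // in practice, defined in its own sub-crate

#[Object]
impl UserQuery {
    // A trivial resolver, just to make the sketch self-contained.
    async fn user_count(&self) -> i32 {
        42
    }
}

#[derive(Default)]
pub struct MapQuery; // ...and this one in another sub-crate

#[Object]
impl MapQuery {
    async fn map_count(&self) -> i32 {
        7
    }
}

// The merged root; the tuple fields can come from different crates.
#[derive(MergedObject, Default)]
pub struct Query(UserQuery, MapQuery);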

@sunli829
Collaborator

sunli829 commented Jan 17, 2022

I don't think it makes much sense to do this; in fact, I never bothered with this problem. Usually I only check once with cargo clippy after I have written and tested the code.

@sunli829
Collaborator

I think this problem is caused by some bug in the compiler, not because of too much code generated by the macro.

@clemens-msupply
Author

clemens-msupply commented Jan 19, 2022

I really hope this is the case :-) Do you have any specific bug in mind?

A small update: I just added a small endpoint, and after fixing all syntax errors, cargo check took ~25s, which is closer to the time I mentioned earlier. When just touching a file, cargo check only takes ~6s.

@sunli829
Collaborator

sunli829 commented Jan 19, 2022

I've been trying to figure out which code block causes the check to be slow by removing code.

@sunli829
Collaborator

Split large crates into smaller pieces.

This should reduce compilation time; try splitting Query and Mutation first.

Tips for addressing compile times:

  • Look for slow dependencies.
    • Check if they have features that you may wish to consider disabling.
    • Consider trying to remove the dependency completely.
  • Look for a crate being built multiple times with different versions. Try to remove the older versions from the dependency graph.
  • Split large crates into smaller pieces.
  • If there are a large number of crates bottlenecked on a single crate, focus your attention on improving that one crate to improve parallelism.

@clemens-msupply
Author

I tried to split our big Query struct into two structs and then merge them back like in https://async-graphql.github.io/async-graphql/en/merging_objects.html, but this didn't help, i.e. editing the smaller new Query struct resulted in the same long check times.

I managed to reproduce the long cargo check times though: when I make a significant change, e.g. adding or removing a method, I get long (~20s) compile times. Just touching a file results in ~6s compile times. So cargo check is smart enough to detect when nothing in the AST has changed (but I don't know why it needs 6s for that, haha).

I haven't tried to split our graphql crate into multiple crates yet.

@sunli829
Collaborator

sunli829 commented Jan 20, 2022

I tried to split our big Query struct into two structs and then merge them back like in https://async-graphql.github.io/async-graphql/en/merging_objects.html, but this didn't help, i.e. editing the smaller new Query struct resulted in the same long check times.

This definitely does not reduce check time. 😂

I haven't tried to split our graphql crate into multiple crates yet.

But this is possible; splitting into multiple crates can make full use of multi-core CPUs.

@Venryx
Contributor

Venryx commented Mar 11, 2022

I also have been having very slow "cargo check" times in my Rust project: https://github.com/debate-map/app/tree/master/Packages/app-server-rs/src/db

In Rust 1.58, merely changing the string in a println would take 6 seconds for the cargo check call to complete (as called by rust-analyzer automatically on file-save).

In Rust 1.59 (just updated to see if it would help), this same thing now takes 17 seconds! (I cannot wait 17 seconds every time I change one letter just to see syntax/type errors!)

I eventually narrowed one of the slowdowns down to the SimpleObject macro.

Specifically, given these structs:

Code for database structs

#[derive(SimpleObject, Clone, Serialize, Deserialize)]
pub struct Map {
    id: ID,
    accessPolicy: String,
    name: String,
    note: Option<String>,
    noteInline: Option<bool>,
    rootNode: String,
    defaultExpandDepth: i32,
    nodeAccessPolicy: Option<String>,
    featured: Option<bool>,
    editors: Vec<String>,
    creator: String,
    createdAt: i64,
    edits: i32,
    editedAt: Option<i64>,
    extras: serde_json::Value,
}
#[derive(SimpleObject, Clone, Serialize, Deserialize)]
pub struct Term {
    id: ID,
    accessPolicy: String,
    creator: String,
    createdAt: i64,
    name: String,
    forms: Vec<String>,
    disambiguation: Option<String>,
    r#type: String,
    definition: String,
    note: Option<String>,
    attachments: Vec<serde_json::Value>,
}
#[derive(SimpleObject, Clone, Serialize, Deserialize)]
pub struct AccessPolicy {
    id: ID,
    creator: String,
    createdAt: i64,
    name: String,
    permissions: serde_json::Value,
    permissions_userExtends: serde_json::Value,
}
#[derive(SimpleObject, Clone, Serialize, Deserialize)]
pub struct Media {
    id: ID,
    accessPolicy: String,
    creator: String,
    createdAt: i64,
    name: String,
    r#type: String,
    url: String,
    description: String,
}
#[derive(SimpleObject, Clone, Serialize, Deserialize)]
pub struct CommandRun {
    id: ID,
    actor: String,
    runTime: i64,
    public_base: bool,
    commandName: String,
    commandPayload: serde_json::Value,
    returnData: serde_json::Value,
    rlsTargets: serde_json::Value,
}
#[derive(SimpleObject, Clone, Serialize, Deserialize)]
pub struct Proposal {
    id: ID,
    r#type: String,
    title: String,
    text: String,
    creator: String,
    createdAt: i64,
    editedAt: Option<i64>,
    completedAt: Option<i64>,
}

#[derive(SimpleObject, Clone, Serialize, Deserialize)]
pub struct UserInfo {
    id: ID,
    proposalsOrder: Vec<String>,
}
#[derive(SimpleObject, Clone, Serialize, Deserialize)]
pub struct MapNodeEdit {
    id: ID,
    map: String,
    node: String,
    time: i64,
    r#type: String,
}
#[derive(SimpleObject, Clone, Serialize, Deserialize)]
pub struct NodeChildLink {
    id: ID,
    creator: String,
    createdAt: i64,
    parent: String,
    child: String,
    group: String,
    orderKey: String,
    form: Option<String>,
    seriesAnchor: Option<bool>,
    seriesEnd: Option<bool>,
    polarity: Option<String>,
    c_parentType: Option<String>,
    c_childType: Option<String>,
}
#[derive(SimpleObject, Clone, Serialize, Deserialize)]
pub struct MapNodePhrasing {
    id: ID,
    creator: String,
    createdAt: i64,
    node: String,
    r#type: String,
    text_base: String,
    text_negation: Option<String>,
    text_question: Option<String>,
    note: Option<String>,
    terms: Vec<serde_json::Value>,
    references: Vec<String>,
}
#[derive(SimpleObject, Clone, Serialize, Deserialize)]
pub struct NodeRating {
    id: ID,
    accessPolicy: String,
    node: String,
    r#type: String,
    creator: String,
    createdAt: i64,
    value: f32,
}
#[derive(SimpleObject, Clone, Serialize, Deserialize)]
pub struct MapNodeRevision {
    id: ID,
    node: String,
    creator: String,
    createdAt: i64,
    phrasing: serde_json::Value,
    phrasing_tsvector: String,
    note: Option<String>,
    displayDetails: Option<serde_json::Value>,
    equation: Option<serde_json::Value>,
    references: Option<serde_json::Value>,
    quote: Option<serde_json::Value>,
    media: Option<serde_json::Value>,
}
#[derive(SimpleObject, Clone, Serialize, Deserialize)]
pub struct MapNodeTag {
    id: ID,
    creator: String,
    createdAt: i64,
    nodes: Vec<String>,
    labels: Option<serde_json::Value>,
    mirrorChildrenFromXToY: Option<serde_json::Value>,
    xIsExtendedByY: Option<serde_json::Value>,
    mutuallyExclusiveGroup: Option<serde_json::Value>,
    restrictMirroringOfX: Option<serde_json::Value>,
}
#[derive(SimpleObject, Clone, Serialize, Deserialize)]
pub struct MapNode {
    id: ID,
    creator: String,
    createdAt: i64,
    r#type: String,
    rootNodeForMap: Option<String>,
    c_currentRevision: Option<String>,
    accessPolicy: String,
    multiPremiseArgument: Option<bool>,
    argumentType: Option<String>,
    extras: serde_json::Value,
}
#[derive(SimpleObject, Clone, Serialize, Deserialize)]
pub struct Share {
    id: ID,
    creator: String,
    createdAt: i64,
    name: String,
    r#type: String,
    mapID: Option<String>,
    mapView: serde_json::Value,
}

If I make no changes except removing the SimpleObject macro from the above structures, it speeds up my cargo check times by ~4.5 seconds.

4.5 seconds doesn't sound that bad? Well, it would be acceptable if the 4.5 seconds occurred only when running cargo build or the like. But that 4.5 seconds becomes painful when it's slowing down every instance of basic syntax/type checking in my IDE.

Also keep in mind that the 4.5 seconds is only one part of the overhead -- there is more overhead, eg. in the Object macro used for adding query/subscription/mutation fields to the graphql API, but I'm ignoring those parts for now to focus on the SimpleObject slowdown. Altogether, the "db/XXX" files end up adding 12+ seconds to my cargo check time on Rust 1.59, which is just too much.

If more details are desired, here is a text log I wrote detailing my cargo-check timing process:

Log of slowdown investigation

  1. Took 14 seconds for cargo-check to complete, from a simple string change in pgclient.rs. (time with "cargo check" text showing)
  2. Did it more times, and got: 6s, 5s, 5s
  3. Added a println!("...") call in start_streaming_changes func, and took the same (6s).
  4. Updated from Rust 1.58 to 1.59, and now the fastest I can get is 17s!
  5. When I mod-exclude all the files under "src/db" (as well as the imports/usages of them), the cargo-check takes ~1.5s.
  6. When I add back the mod-include for "db/_general": ~2.5s
  7. From now on, I'm doing the string-change test using a simple println!("TestXXX"); line in main.rs, just before the "mod ..." lines. (makes it easier to test impact of commenting files out)
  8. With all db files mod-included again (though imports/usages of them still commented): ~13s
  9. With 9 of the 19 db files mod-excluded: ~8s
  10. With 13 of the 19 db files mod-excluded: ~5s
  11. With only "db/_general" and "db/terms" left mod-included: ~3s
  12. With Term struct (with From impl) duplicated to 5 copies: ~4s [duplicates removed for future steps]
  13. With only users, user_hiddens, and global_data removed: ~11s
  14. With all code after the main struct+FromImpl (in each db file) removed: ~7s
  15. With all macros used on/in the main db-structs removed: ~2s
  16. With just the serde Serialize and Deserialize macros added-back: ~3s
  17. With the Clone macro also added back: ~3s
  18. With the SimpleObject macro also added back: ~7.5s

@Venryx
Contributor

Venryx commented Mar 11, 2022

I tried updating from Rust 1.59 to Rust 1.60.0-beta.3 (as per a comment on the Discord saying 1.59 had incremental-compilation broken), but the 4.5 second cargo-check cost from the SimpleObject macro was still present.

However, in a terminal I then ran cargo clean, followed by cargo check, and from that point on, I was getting much better timings, for the stripped down code:

  • ~3.6s with the SimpleObject macros in use
  • ~1.8s with the SimpleObject macros removed

So on Rust 1.60, it seems the overhead of using SimpleObject is 1.8s, rather than 4.5s as seen on 1.59.

When I revert all the stripping-down I did, the overall time for cargo check (after the change of a println string) is now back to the ~6s I was getting on 1.59.

For me, that still feels quite slow; is 6s considered acceptable for cargo-check for even smallish programs like this? (I'm thinking of TypeScript for example, where syntax/type checking generally shows up in the IDE in under 1 second, even in much larger codebases)

Anyway, I will see if I can speed it up somehow...

@sunli829
Collaborator

sunli829 commented Mar 11, 2022

Haha, I think the only way is to upgrade the computer. 😂
Or, as @clemens-msupply did, split a project into many sub crates.

@clemens-msupply
Author

@Venryx we don't use SimpleObject; maybe it would be worth trying #[Object] impl MyNode { ... } instead? Even though it would be a bit more work...

Small update on splitting graphql into smaller crates: it actually works very well, and developing graphql becomes fun again (you still have to wait a few seconds for cargo check, but it's much better now). I wouldn't call the issue fixed though, since it leads to some unnecessary overhead, like managing multiple crates and keeping their Cargo.toml files in sync. On the plus side, our graphql code is slightly better organized now, since cyclic dependencies are not possible between crates.

@sunli829
Collaborator

The SimpleObject macro doesn't generate any more code than the Object macro, so I think using Object might not make the check faster. 😁

@Venryx
Contributor

Venryx commented Mar 13, 2022

Haha, I think the only way is to upgrade the computer. 😂

Never give up! lol

I found a way to reduce that 6.1s down to 3.5s.

The trick? I'll just paste my comments from Discord:


The proof of concept works! By caching the tokens that the SimpleObject macro compiles to, and just reading that from disk (when there is a cache hit, from identical input tokens), cargo check processes a given (test) module in 1.5 seconds instead of 5.5 seconds!

The approach I used is also quite generic, so it should be usable for a fair number of other macros, not just async-graphql's SimpleObject.

As a user of the macro, basically all you need to do is wrap your macro-using code in cached_expand! {}, like so:

cached_expand!{

#[derive(SimpleObject, Clone, Serialize, Deserialize)]
struct MyStruct {
    field1: String,
    field2: String,
    field3: String,
}

}

The approach I used to expand the macros inside is somewhat "clumsy": it spawns a new process that runs cargo expand (see here), then parses the output for the generated code between the two "marker lines" it inserted at the start and end of its block. But so far it appears to work; and I'm willing to try out hacks like this if it means I can chip away at that painful 6-second cargo check time in VSCode.

After testing the bare-bones proof-of-concept, I then tried using it in my actual app-server, which was having cargo check times of 6.1 seconds. With the use of the new cached_expand macro, that time is now down to 3.5 seconds! (and 1.5 seconds of that is "base" compilation-time, ie. time-taken for files that cached_expand is not used in)

And I think I should be able to bring that number down even more, if I can improve the macro to handle more situations.

Right now it is working for these macros: SimpleObject, Object, Clone, Serialize, Deserialize
But it is not working for these macros: Subscription (had an error relating to the stream! macro IIRC), and possibly others (haven't tried it on all yet)

Anyway, I'm pretty pleased with the results so far, and will see if I can get the proof-of-concept polished up enough to be worth using long-term.
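For readers following along, the caching skeleton could look roughly like this (a sketch only, not the actual implementation; the target/macro-cache path and the expand_via_cargo_expand helper are made up):

use proc_macro::TokenStream;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::{fs, path::PathBuf};

#[proc_macro]
pub fn cached_expand(input: TokenStream) -> TokenStream {
    // Key the cache on the exact input tokens.
    let source = input.to_string();
    let mut hasher = DefaultHasher::new();
    source.hash(&mut hasher);
    let cache_file = PathBuf::from("target/macro-cache")
        .join(format!("{:x}.rs", hasher.finish()));

    // Cache hit: reuse the expansion saved by a previous run.
    if let Ok(cached) = fs::read_to_string(&cache_file) {
        return cached.parse().expect("cached expansion should be valid Rust");
    }

    // Cache miss: expand the inner macros (e.g. by spawning `cargo expand`
    // and cutting out the code between marker lines, as described above),
    // then persist the result for the next check.
    let expanded = expand_via_cargo_expand(&source); // hypothetical helper
    fs::create_dir_all(cache_file.parent().unwrap()).ok();
    fs::write(&cache_file, &expanded).ok();
    expanded.parse().expect("expansion should be valid Rust")
}

// Placeholder for the expansion step; a real version shells out to `cargo expand`.
fn expand_via_cargo_expand(source: &str) -> String {
    unimplemented!("spawn `cargo expand` for: {source}")
}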

@Venryx
Contributor

Venryx commented Mar 13, 2022

I just realized something: the execution of the Object and SimpleObject macros maybe isn't even necessary at cargo check time, ie. in the VSCode IDE.

It is needed during your actual build, because your GraphQL endpoints need that metadata. But it seems like it's not even needed for the IDE type-checking stage. (Alternatively, maybe it would need the same generated "shell", but could have the details of the GraphQL machinery underneath it left ungenerated.)

Anyway, what does this imply? Well, I could modify my wrapper macro to say, "If the current compilation is merely cargo-checking, then completely ignore all of async-graphql's macros, because we don't need the code they generate at this stage." That should improve performance even further!

Granted, you may lose some functionality (eg. compiler verification that the return types of the query/mutation/subscription methods are correct). But if the latency reductions for cargo check are high enough, I am willing to forgo that and just rely on memory/conventions.

Anyway, I will try experimenting with this follow-up idea tomorrow.

@Venryx
Contributor

Venryx commented Mar 14, 2022

It works! My crazy idea to have an "outer layer" macro that strips out all the async-graphql macros (when cargo check is running in rust-analyzer) actually works! (This maybe is not surprising to experienced rustaceans, but as a newcomer from JS/TS, Rust's macro system feels really empowering.)

The cargo check command is now down to 2.5 seconds, which is such a relief relative to the 6 second timings I was getting just a couple days ago.

What drawbacks does it have?

Not much! Basically, it just means that when you hover over the async-graphql macros in your IDE, you no longer see rust-doc descriptions and such.

That's really not a big deal though, because there are only six macros that get stripped (graphql, Object, Subscription, SimpleObject, MergedObject, MergedSubscription), so once you know what those 6 macros do, there is not much point to having their rust-doc info show up in your IDE like that.

How do you use it?

Pretty straightforward: for any structs where you use the async-graphql macros, simply wrap that area of code with the wrap_async_graphql macro, like so:

wrap_async_graphql!{

#[derive(SimpleObject, Clone, Serialize, Deserialize)]
pub struct CommandRun {
    field1: String,
    #[graphql(name = "field99")]
    field2: i64,
}

}

There is one other change you must make:

// replace this
Schema::build(QueryRoot::default(), MutationRoot::default(), SubscriptionRoot::default())

// with this
wrap_agql_schema_build!{
    Schema::build(QueryRoot::default(), MutationRoot::default(), SubscriptionRoot::default())
}

This wrap_agql_schema_build macro is necessary so that, when the rest of the async-graphql macros are stripped, the code that calls the schema-building still compiles. All it does, when macro-stripping is enabled, is replace the Schema::build expression with this:

Schema::build(EmptyMutation, EmptyMutation, EmptySubscription)

(If a new macro seems overkill for this, it may be possible to use a more standard replacement approach, like the built-in cfg! macro or something...)

Anyway, while the output-token-caching approach in my previous post is more flexible, this "dumber" macro-stripping is probably better for async-graphql, since it gives even better speeds, avoids some complications with caching, and you don't lose much. (since the macros are almost completely just for outputting that GraphQL API, not for in-IDE introspection or the like -- at least for the way I'm currently using it)

While I will likely work more on the output-token-caching approach eventually, for now this solves my concerns well, and lets me resume work on my actual project without frustration.

Anyway, I hope to eventually make a separate crate for this system, so that other users of async-graphql can easily speed up their cargo-check times as well.

Until then, you can find the initial source-code for it here (in the rust-macros sub-crate of my larger project): https://github.com/debate-map/app/blob/d55d0043b4cdaea558adf70840ab6d902df04cf8/Packages/rust-macros/src/wrap_async_graphql.rs

Besides the instructions already mentioned, the only other thing needed is to have STRIP_ASYNC_GRAPHQL=1 set as an environment variable for your in-IDE cargo check executions. In VSCode, this is accomplished by adding the following to your project's .vscode/settings.json:

"rust-analyzer.server.extraEnv": {
    "STRIP_ASYNC_GRAPHQL": "1",
},

(I might be able to remove this requirement later on, if I figure out a way to detect the rust-analyzer execution-context automatically.)
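For the curious, the core of the stripping trick can be condensed to something like this (a sketch written against syn 1.x and quote, not the linked source; it only handles structs, while the real macro also rewrites #[Object] impl blocks and more):

use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_macro_input, Attribute, File, Item, Meta, NestedMeta};

// The async-graphql derives to drop when the IDE flag is set.
const STRIP: &[&str] = &["SimpleObject", "Object", "Subscription", "MergedObject", "MergedSubscription"];

#[proc_macro]
pub fn wrap_async_graphql(input: TokenStream) -> TokenStream {
    // Normal builds: pass everything through untouched.
    if std::env::var("STRIP_ASYNC_GRAPHQL").as_deref() != Ok("1") {
        return input;
    }
    let mut file = parse_macro_input!(input as File);
    for item in &mut file.items {
        if let Item::Struct(s) = item {
            filter_attrs(&mut s.attrs);
            // Also drop `#[graphql(...)]` helper attributes on fields.
            for field in s.fields.iter_mut() {
                field.attrs.retain(|a| !a.path.is_ident("graphql"));
            }
        }
    }
    quote!(#file).into()
}

fn filter_attrs(attrs: &mut Vec<Attribute>) {
    let mut kept_attrs = Vec::new();
    for attr in attrs.drain(..) {
        if attr.path.is_ident("graphql") {
            continue; // drop `#[graphql(...)]` entirely
        }
        if attr.path.is_ident("derive") {
            if let Ok(Meta::List(list)) = attr.parse_meta() {
                // Keep only the non-async-graphql derives (Clone, Serialize, ...).
                let kept: Vec<NestedMeta> = list
                    .nested
                    .into_iter()
                    .filter(|n| !matches!(n, NestedMeta::Meta(Meta::Path(p)) if STRIP.iter().any(|s| p.is_ident(s))))
                    .collect();
                kept_attrs.push(syn::parse_quote!(#[derive(#(#kept),*)]));
                continue;
            }
        }
        kept_attrs.push(attr);
    }
    *attrs = kept_attrs;
}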

@sunli829
Collaborator

sunli829 commented Mar 14, 2022

👏🏻 This solution looks interesting, looking forward to it. 😁 @Venryx

@clemens-msupply
Author

@Venryx thanks for sharing your solution!
@sunli829 I just had a look at the simple_object.rs macro code, but it doesn't seem to generate that much code, and it all looks straightforward to me (unless I'm missing something). Do you have an idea why the Rust compiler struggles so much with it?

@Venryx
Contributor

Venryx commented Mar 14, 2022

@clemens-msupply Rust-analyzer has a function that expands a given macro into its final code-form: click on a macro-usage in your source, then run the Rust Analyzer: Expand macro recursively command.

Doing so for Object and SimpleObject does seem to produce quite a lot of code, making the compile-time impact more understandable.

Example source:

#[derive(SimpleObject)]
struct Test1 {
    field1: String,
    field2: String,
    field3: String,
}
Expanded code of SimpleObject macro

// Recursive expansion of SimpleObject! macro
// ===========================================

#[allow(clippy::all,clippy::pedantic)]
impl Test1 {
  #[inline]
  #[allow(missing_docs)]
  async fn field1(&self,ctx: &async_graphql::Context< '_>) -> async_graphql::Result< &String>{
    ::std::result::Result::Ok(&self.field1)
  }
  #[inline]
  #[allow(missing_docs)]
  async fn field2(&self,ctx: &async_graphql::Context< '_>) -> async_graphql::Result< &String>{
    ::std::result::Result::Ok(&self.field2)
  }
  #[inline]
  #[allow(missing_docs)]
  async fn field3(&self,ctx: &async_graphql::Context< '_>) -> async_graphql::Result< &String>{
    ::std::result::Result::Ok(&self.field3)
  }

  }
#[allow(clippy::all,clippy::pedantic)]
#[async_graphql::async_trait::async_trait]
impl async_graphql::resolver_utils::ContainerType for Test1 {
  async fn resolve_field(&self,ctx: &async_graphql::Context< '_>) -> async_graphql::ServerResult< ::std::option::Option<async_graphql::Value>>{
    if ctx.item.node.name.node=="field1" {
      let f = async move {
        self.field1(ctx).await.map_err(|err|err.into_server_error(ctx.item.pos))
      };
      let obj = f.await.map_err(|err|ctx.set_error_path(err))? ;
      let ctx_obj = ctx.with_selection_set(&ctx.item.node.selection_set);
      return async_graphql::OutputType::resolve(&obj, &ctx_obj,ctx.item).await.map(::std::option::Option::Some);
      
    }if ctx.item.node.name.node=="field2" {
      let f = async move {
        self.field2(ctx).await.map_err(|err|err.into_server_error(ctx.item.pos))
      };
      let obj = f.await.map_err(|err|ctx.set_error_path(err))? ;
      let ctx_obj = ctx.with_selection_set(&ctx.item.node.selection_set);
      return async_graphql::OutputType::resolve(&obj, &ctx_obj,ctx.item).await.map(::std::option::Option::Some);
      
    }if ctx.item.node.name.node=="field3" {
      let f = async move {
        self.field3(ctx).await.map_err(|err|err.into_server_error(ctx.item.pos))
      };
      let obj = f.await.map_err(|err|ctx.set_error_path(err))? ;
      let ctx_obj = ctx.with_selection_set(&ctx.item.node.selection_set);
      return async_graphql::OutputType::resolve(&obj, &ctx_obj,ctx.item).await.map(::std::option::Option::Some);
      
    }::std::result::Result::Ok(::std::option::Option::None)
  }

  }
#[allow(clippy::all,clippy::pedantic)]
#[async_graphql::async_trait::async_trait]
impl async_graphql::OutputType for Test1 {
  fn type_name() ->  ::std::borrow::Cow< 'static, ::std::primitive::str>{
    ::std::borrow::Cow::Borrowed("Test1")
  }
  fn create_type_info(registry: &mut async_graphql::registry::Registry) ->  ::std::string::String {
    registry.create_output_type:: <Self,_>(|registry|async_graphql::registry::MetaType::Object {
      name: ::std::borrow::ToOwned::to_owned("Test1"),description: ::std::option::Option::None,fields:{
        let mut fields = async_graphql::indexmap::IndexMap::new();
        fields.insert(::std::borrow::ToOwned::to_owned("field1"),async_graphql::registry::MetaField {
          name: ::std::borrow::ToOwned::to_owned("field1"),description: ::std::option::Option::None,args: ::std::default::Default::default(),ty: <String as async_graphql::OutputType> ::create_type_info(registry),deprecation:async_graphql::registry::Deprecation::NoDeprecated,cache_control:async_graphql::CacheControl {
            public:true,max_age:0usize,
          },external:false,provides: ::std::option::Option::None,requires: ::std::option::Option::None,visible: ::std::option::Option::None,compute_complexity: ::std::option::Option::None,
        });
        fields.insert(::std::borrow::ToOwned::to_owned("field2"),async_graphql::registry::MetaField {
          name: ::std::borrow::ToOwned::to_owned("field2"),description: ::std::option::Option::None,args: ::std::default::Default::default(),ty: <String as async_graphql::OutputType> ::create_type_info(registry),deprecation:async_graphql::registry::Deprecation::NoDeprecated,cache_control:async_graphql::CacheControl {
            public:true,max_age:0usize,
          },external:false,provides: ::std::option::Option::None,requires: ::std::option::Option::None,visible: ::std::option::Option::None,compute_complexity: ::std::option::Option::None,
        });
        fields.insert(::std::borrow::ToOwned::to_owned("field3"),async_graphql::registry::MetaField {
          name: ::std::borrow::ToOwned::to_owned("field3"),description: ::std::option::Option::None,args: ::std::default::Default::default(),ty: <String as async_graphql::OutputType> ::create_type_info(registry),deprecation:async_graphql::registry::Deprecation::NoDeprecated,cache_control:async_graphql::CacheControl {
            public:true,max_age:0usize,
          },external:false,provides: ::std::option::Option::None,requires: ::std::option::Option::None,visible: ::std::option::Option::None,compute_complexity: ::std::option::Option::None,
        });
        fields
      },cache_control:async_graphql::CacheControl {
        public:true,max_age:0usize,
      },extends:false,keys: ::std::option::Option::None,visible: ::std::option::Option::None,is_subscription:false,rust_typename: ::std::any::type_name:: <Self>(),
    })
  }
  async fn resolve(&self,ctx: &async_graphql::ContextSelectionSet< '_> ,_field: &async_graphql::Positioned<async_graphql::parser::types::Field>) -> async_graphql::ServerResult<async_graphql::Value>{
    async_graphql::resolver_utils::resolve_container(ctx,self).await
  }

  }
impl async_graphql::ObjectType for Test1{}

I maybe wouldn't expect the expansion above to take quite as long as it does, but it is a fair amount of code being generated.

@Venryx
Contributor

Venryx commented Mar 15, 2022

So today I thought:

If I can use higher-level macros to strip out async-graphql's macros during cargo-check, why can't I do the same thing for other slow macros like Serialize and Deserialize?

Turns out, you absolutely can. I now have a combined wrap_slow_macros macro (source here; new wrap_serde_macros macro source here) that I wrap around all my structs (well, the ones that use Serialize, Deserialize, or any of the async-graphql macros).

It strips out the async-graphql macros, and replaces Serialize and Deserialize with "stubs" that just produce this:

impl serde::Serialize for #struct_name {
    fn serialize<__S>(&self, __serializer: __S) -> serde::__private::Result<__S::Ok, __S::Error> where __S: serde::Serializer {
        Err(serde::ser::Error::custom("This is a placeholder generated by the Serialize_Stub macro, for quick resolution during cargo-check. You should not be seeing this at runtime."))
    }
}
impl <'de> serde::Deserialize<'de> for #struct_name {
    fn deserialize<__D>(__deserializer:__D) -> serde::__private::Result<Self, __D::Error> where __D: serde::Deserializer<'de> {
        Err(serde::de::Error::custom("This is a placeholder generated by the Deserialize_Stub macro, for quick resolution during cargo-check. You should not be seeing this at runtime."))
    }
}

With the optimization above, my cargo check time is now down to a pleasant 1.5 seconds (it was 6.1s just a few days ago), without having to split my codebase into subcrates or the like. 😄

I'm quite relieved! This was my biggest concern with Rust (slow macros for important crates like serde and async-graphql), but the flexibility of Rust's macro-system has enabled me to just skip/stub whatever macros I want to at cargo-check time -- retaining fast in-IDE type-checking.
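The stub generation itself fits in a tiny derive macro along these lines (a sketch, not the actual wrap_serde_macros source; the Serialize_Stub name follows the error string above, and a real version also has to handle generics and emit the matching Deserialize stub):

use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_macro_input, DeriveInput};

#[proc_macro_derive(Serialize_Stub)]
pub fn serialize_stub(input: TokenStream) -> TokenStream {
    // Only the type name is needed; the fields are never inspected,
    // which is exactly why this is so much cheaper than serde's derive.
    let ident = parse_macro_input!(input as DeriveInput).ident;
    quote! {
        impl serde::Serialize for #ident {
            fn serialize<__S>(&self, _serializer: __S) -> Result<__S::Ok, __S::Error>
            where __S: serde::Serializer {
                Err(serde::ser::Error::custom(
                    "Serialize_Stub placeholder; this should never run at runtime",
                ))
            }
        }
    }
    .into()
}

The wrapper macro then just swaps Serialize for Serialize_Stub (and likewise for Deserialize) in the derive lists when the stripping flag is set.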

@github-actions

github-actions bot commented Sep 8, 2022

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions bot added the Stale label Sep 8, 2022
@github-actions

This issue was closed because it has been stalled for 5 days with no activity.

Venryx added a commit to debate-map/app that referenced this issue Dec 24, 2022
…or gets very close to expiring), the jwt is destroyed, and the user is prompted to sign-in again.

* MS when user signs-in using google provider, the popup window automatically closes. (according to a stackoverflow comment, the approach used may not work on modern Safari though; not end of world, as user can close manually, but good to resolve in the future if possible)
* Removed sign-in panel for js backend. (will probably remove app-server-js in next commit; in any case, soon)

Note: To see how much porting the commands to rust increased the app-server-rs compile time, I did a test.
==========
CheckAction: Changing string in main.rs, time to recompile (ie. the time listed for the main "cargo build" step)

Results:
WithOnly2CommandsAddedToRust: Finished dev [unoptimized + debuginfo] target(s) in 12.43s <line-break, followed by docker line...> [done: 14.423s]
WithAllCommandsAddedToRust:   Finished dev [unoptimized + debuginfo] target(s) in 22.71s <line-break, followed by docker line...> [done: 25.079s]

So, adding the commands to app-server-rs did increase its "incremental compile time" by ~10 seconds. Not wonderful of course, but okay:
* The `cargo check` time is what's most important, and it's still fast.
* I'm pretty sure most of the slowdown is from the async-graphql macros (as mentioned at: async-graphql/async-graphql#783), and presumably Rust will someday have a way for macros to be incrementally compiled as well. (if not, there may be a way to cache-results, even in the docker builds -- along the lines of my attempt in the thread above, except done more robustly)
* There are some workarounds to try to manage slow compilation. (eg. splitting up project into multiple crates; granted, that probably won't be so helpful in this case, since async-graphql needs to compile "after" most code, and most likely needs to remain "in a single crate")
@frederikhors
Contributor

Dear brilliant @Venryx, are you still using those macros? Do you have a better solution?

@Venryx
Contributor

Venryx commented Mar 26, 2024

Dear brilliant @Venryx, are you still using those macros? Do you have a better solution?

Indeed, I am still using the macros. I tried with them off again briefly, and can confirm that they still have a significant impact on making the "cargo check" times more acceptable. (~4.2s with my wrap_slow_macros macro, vs ~7.5s with them disabled; not sure if other regressions are the reason for my cargo-check times increasing from the prior ~1.5s to ~4.2s; perhaps it's just due to my adding more code to the server over the months/years)

That said, the release build times for my project remain very high (my "trick" only works for cargo check, not for actual builds, since the macros all need to run there). You can see some of my (very slow) benchmark timings here: debate-map/app#285 (comment)

That thread also shows me trying a (non-open-source atm) fork of rust that someone made that caches the result of proc-macro expansions. Oddly though, that did not yield a meaningful improvement in the release build times for my project. (it did give a slight improvement on cargo check times, but interestingly, seemingly not as much as my "wrap_slow_macros" helper achieves if we extrapolate based on relative time-reductions between the two [did not directly test this particular matchup, and the timings I do have are on different machines])

Anyway, redoing the release build with -- -Ztime-passes showed that my local machine spends tons of time in the LLVM phase. I tried enabling incremental compilation for my release builds, and that did speed things up significantly (bringing the recompile times from ~5m to ~3m10s), but that is apparently not recommended for production.

So I'm not sure what the path forward is for getting faster release builds. The thread above did prove that it's possible to get much faster rebuilds than I am currently getting (the build server I was connecting to while testing the modified rust was taking only 1m20s for release builds rather than my ~5m, which is odd since its cpu is supposedly only slightly faster than mine), so maybe I can get the release builds closer to that 1m20s mark by:

  • Trying to do the builds on my machine, except with Linux as the host OS (eg. using dual-boot).
  • Upgrading my CPU. (either to the one on that build server, or maybe one that's newer / higher-spec than it)
  • Testing out the different optimization levels of Rust+LLVM (among other tweaks), to perhaps find a value that gives a better balance between compile-time and runtime performance.

@frederikhors
Contributor

If you're on Windows, can you try Linux on Windows and mold?

Miaxos reopened this Mar 29, 2024
Miaxos removed the Stale label Mar 29, 2024
@Venryx
Contributor

Venryx commented Apr 2, 2024

If you're on Windows, can you try Linux on Windows and mold?

I know you saw this already, but for others reading the thread, my timings on WSL2 (with the mold linker) can be seen here: debate-map/app#285 (comment)

Brief summary: Within WSL2, my timings were ~3m for a release build -- which is better than the ~5m on Windows itself (maybe because of NTFS slowness), but still a far cry from the 1m20s I was seeing on the build server.

I've bought a new development laptop which should arrive soon, which I plan to set up with dual-boot for Windows and Linux. So will be interesting to see what timings I get there. (eg. doing a release build on the exact same hardware and no VM, but just different OS and filesystem [NTFS vs ext4])

EDIT: I found that using Cranelift (and incremental compilation), I was able to drastically improve the speed of my release compiles (3m24s+ to ~20s) -- of course, others' projects may not have the same bottlenecks. More info on it (along with a compile-times table) in my comment here: #1287 (comment)
