Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build times have deteriorated significantly #3610

Closed
Nutomic opened this issue Jul 13, 2023 · 19 comments
Closed

Build times have deteriorated significantly #3610

Nutomic opened this issue Jul 13, 2023 · 19 comments

Comments

@Nutomic
Copy link
Member

Nutomic commented Jul 13, 2023

I noticed that incremental builds are now much slower than they used to be, so I did some testing. Im doing this with the following commands:

# make sure dependencies are built, can be cancelled after it reaches lemmy_utils
cargo build
# should be a full incremental build, dont reuse any 
cargo clean -p lemmy_utils
# measure build performance, disable sccache
RUSTC_WRAPPER='' cargo build --timings

I only did one run per commit and the results arent exactly consistent, but we can still draw some conclusions:

  • current main (6310f55): 6m 42s
  • after tracingcall in spawn_try_task , rebased on main: 3m 29s

So #3581 doesnt improve compile time at all.

Edit: Looks like this actually helps a lot but I want to do more testing.

  • after connections work with both conn and pool (1d38aad): 6m 45s
  • before connections work with both conn and pool (73492af): 2m 47s

#3420 is the worst offender, the build time is almost than doubled. We definitely need to find a solution for this, or otherwise revert the change

  • tag 0.18.1: 179s
  • tag 0.18.1 with strip symbols and disable debug info: 118s
  • tag 0.18.0: 47s

0.18.1 is much slower, specifically for compiling lemmy_server according to the html report. This is most likely from adding additional code in that crate like prometheus metrics. The solution is to move as much as possible into lemmy_routes so that it can be compiled in parallel with api and apub crates.

cc @dullbananas

Edit: All measurements updated so they are not polluted by sccache. They should be quite accurate now.

@phiresky
Copy link
Collaborator

That looks bad. Did you also try turning the [profile.dev] debug =0 option back on and see if that helps? i changed that in some pr last week and thought it didn't affect it, but not 100% sure

@phiresky
Copy link
Collaborator

screeenshot of timings on current main:
image
image

@Nutomic
Copy link
Member Author

Nutomic commented Jul 13, 2023

Okay the timings above are all crap, turns out sccache wasnt actually disabled because I have it configured via ~/.cargo/config.toml and not env var. Gonna have to redo the measurements.

That looks bad. Did you also try turning the [profile.dev] debug =0 option back on and see if that helps? i changed that in some pr last week and thought it didn't affect it, but not 100% sure

Good point, I didnt remember that change. I tried this on 0.18.0 (for faster builds):

  • with strip symbols and debug 0: 43s
  • without: 169s

So this is a huge difference and the dev profile should definitely be added back. Debug symbols are very rarely needed, and if so the lines can be temporarily commented out.

Nutomic added a commit that referenced this issue Jul 13, 2023
This significantly speeds up builds:
- with strip symbols and debug 0: 43s
- without: 169s
@phiresky
Copy link
Collaborator

Doesn't look extremely useful, but:

$ cargo rustc --bin lemmy_server -- -Zself-profile
$ summarize summarize lemmy_server-3845390.mm_profdata  -p 3
+------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| Item                         | Self time | % of total time | Time     | Item count | Incremental load time | Incremental result hashing time |
+------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| run_linker                   | 19.24s    | 81.818          | 19.24s   | 1          | 0.00ns                | 0.00ns                          |
+------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| LLVM_module_codegen_emit_obj | 1.63s     | 6.948           | 1.63s    | 42         | 0.00ns                | 0.00ns                          |
+------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| LLVM_passes                  | 708.47ms  | 3.012           | 708.47ms | 1          | 0.00ns                | 0.00ns                          |
+------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
Total cpu time: 23.519738655s
Filtered results account for 91.778% of total time.
+----------------------------+------------------+
| Item                       | Artifact Size    |
+----------------------------+------------------+
| cgu_instructions           | 142648 bytes     |
+----------------------------+------------------+
| codegen_unit_size_estimate | 185917 bytes     |
+----------------------------+------------------+
| dep_graph                  | 21596791 bytes   |
+----------------------------+------------------+
| linked_artifact            | 2133937888 bytes |
+----------------------------+------------------+
| object_file                | 37797040 bytes   |
+----------------------------+------------------+
| query_cache                | 5940646 bytes    |
+----------------------------+------------------+
| work_product_index         | 14816 bytes      |
+----------------------------+------------------+

and:

$ cargo rustc --lib -- -Zself-profile
+---------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| Item                                  | Self time | % of total time | Time     | Item count | Incremental load time | Incremental result hashing time |
+---------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| LLVM_module_codegen_emit_obj          | 4.41s     | 37.757          | 4.41s    | 40         | 0.00ns                | 0.00ns                          |
+---------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| LLVM_passes                           | 2.26s     | 19.314          | 2.26s    | 1          | 0.00ns                | 0.00ns                          |
+---------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| codegen_module                        | 1.05s     | 8.987           | 1.85s    | 40         | 0.00ns                | 0.00ns                          |
+---------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| monomorphization_collector_graph_walk | 454.77ms  | 3.893           | 1.34s    | 1          | 0.00ns                | 0.00ns                          |
+---------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
| evaluate_obligation                   | 453.54ms  | 3.882           | 471.67ms | 14493      | 0.00ns                | 0.00ns                          |
+---------------------------------------+-----------+-----------------+----------+------------+-----------------------+---------------------------------+
Total cpu time: 11.681932637s
Filtered results account for 73.833% of total time.
+----------------------------+-----------------+
| Item                       | Artifact Size   |
+----------------------------+-----------------+
| cgu_instructions           | 417456 bytes    |
+----------------------------+-----------------+
| codegen_unit_size_estimate | 880614 bytes    |
+----------------------------+-----------------+
| crate_metadata             | 2808877 bytes   |
+----------------------------+-----------------+
| dep_graph                  | 49658931 bytes  |
+----------------------------+-----------------+
| linked_artifact            | 303985238 bytes |
+----------------------------+-----------------+
| object_file                | 74554352 bytes  |
+----------------------------+-----------------+
| query_cache                | 16796196 bytes  |
+----------------------------+-----------------+
| work_product_index         | 14786 bytes     |
+----------------------------+-----------------+

@JustAnotherSoftwareDeveloper

Hey just out of curiosity are you guys building your dev environments directly from your device or is there some standardization?

dessalines pushed a commit that referenced this issue Jul 13, 2023
…3611)

* Add dev profile to strip symbols and disable debug info (ref #3610)

This significantly speeds up builds:
- with strip symbols and debug 0: 43s
- without: 169s

* add comment, no strip symbols
@dullbananas
Copy link
Collaborator

dullbananas commented Jul 13, 2023

I measured the impact of #3420 with a more consistent measurement method.

I made one folder contain the contents of the repository before the merge, and another folder with the contents after the merge:

cd lemmy
git checkout 73492af
git worktree add ../lemmy-after 1d38aad
git -C ../lemmy-after* submodule update
cd ..

I created this script called time.sh:

set -o errexit -o nounset
cd $1

# Parallelism might cause worse consistency. For example, jobs could become ide for different durations, and crates could be compiled in different parallel groups.
export CARGO_BUILD_JOBS=1

# Disk space runs out unless I set these environment variables.
export CARGO_PROFILE_DEV_DEBUG=0
export CARGO_BUILD_INCREMENTAL=false

cargo clean

cargo build --bin lemmy_server --timings

I ran test.sh for both folders in parallel and made the outputs visible in text files:

((bash time.sh lemmy) &> out1.txt) &
((bash time.sh lemmy-after*) &> out2.txt) &

Running one compilation at a time could cause the 2 results to be affected by different conditions (such as different CPU usage of other processes).

For both compilations, the timing was extremely similar for all dependencies and some lemmy crates. When switching the editor view between out1.txt and out2.txt, I wasn't totally sure that they were different processes until I saw the inconsistent ordering.

The build time of lemmy_apub changed from 72.2 secs to 78.8 secs. I couldn't find significant differences in other crates.

lemmy timings.zip

@Nutomic
Copy link
Member Author

Nutomic commented Jul 14, 2023

@dullbananas You are testing full builds while Im testing incremental builds so its a completely different thing. Also the env vars you are setting might affect the results. And of course running two builds in parallel will slow both of them down too. So you should check with the steps described above.

One problem Im having is that the timings are extremely inconsistent, one run is over 5 minutes and the next identical build is only 2 minutes. Almost like theres still some caching active.

@phiresky What does that mean?

@JustAnotherSoftwareDeveloper I dont understand, this requires only cargo build as dev environment.

@phiresky
Copy link
Collaborator

phiresky commented Jul 14, 2023

@phiresky What does that mean?

Nothing changeable for this issue I guess, you can ignore it.

I also had the thing where incremental builds sometimes took 2min and sometimes 4min, even though there was no real difference. Also sometimes it started rebuilding from scratch when changing these settings and sometimes it didn't.

@dullbananas
Copy link
Collaborator

dullbananas commented Jul 15, 2023

I ran it again without CARGO_BUILD_INCREMENTAL=false. Here's the big increases in seconds:

db_views: 25.1 to 390.5

db_views_actor: 20.5 to 196.8

lemmy incremental timings.zip

@dullbananas
Copy link
Collaborator

I changed time.sh to run this after cd ..:

# Delete previous output and succeed even if no files match
rm *.mm_profdata || true

# Running `clean -p lemmy_db_views` then building that crate with `cargo rustc` doesn't cause a lengthy rebuild
cargo +nightly clean
RUSTFLAGS="-Zself-profile" CARGO_PROFILE_DEV_DEBUG=0 cargo +nightly build --lib

After compiling twice in parallel like before, the profiling data for a single crate can be compared by running this:

# Install `summarize` (stable branch is not used because it does a segmentation fault)
cargo install --git https://github.com/rust-lang/measureme.git summarize

# Compare for `lemmy_db_views`
summarize diff {lemmy,lemmy-after*}/lemmy_db_views-*.mm_profdata > diff.txt

Here's the top part of the diff for lemmy_db_views:

+-----------------------------------------------------+-----------------+------------------+-----------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
| Item                                                | Self Time       | Self Time Change | Time            | Time Change | Item count | Cache hits | Blocked time | Incremental load time | Incremental hashing time |
+-----------------------------------------------------+-----------------+------------------+-----------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
| evaluate_obligation                                 | +211.016411783s | +1882.86%        | +211.186271064s | +1829.21%   | +2808      | +0         | +0ns         | +0ns                  | +1.894187ms              |
+-----------------------------------------------------+-----------------+------------------+-----------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
| type_op_prove_predicate                             | +61.559385085s  | +9331.05%        | +73.60123312s   | +8078.20%   | +562       | +0         | +0ns         | +0ns                  | +111.482µs               |
+-----------------------------------------------------+-----------------+------------------+-----------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
| normalize_projection_ty                             | +50.426422858s  | +5991.19%        | +78.595032244s  | +5779.51%   | +226       | +0         | +0ns         | +0ns                  | +706.391µs               |
+-----------------------------------------------------+-----------------+------------------+-----------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
| type_op_ascribe_user_type                           | +2.099559137s   | +1226.18%        | +2.104304975s   | +1210.39%   | +19        | +0         | +0ns         | +0ns                  | +48.362µs                |
+-----------------------------------------------------+-----------------+------------------+-----------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
| typeck                                              | +1.144748815s   | +6.12%           | +172.281216328s | +583.21%    | +0         | +0         | +0ns         | +0ns                  | +153.797µs               |
+-----------------------------------------------------+-----------------+------------------+-----------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
| incr_comp_encode_dep_graph                          | +319.49277ms    | +227.74%         | +319.49277ms    | +227.74%    | +214448    | +0         | +0ns         | +0ns                  | +0ns                     |
+-----------------------------------------------------+-----------------+------------------+-----------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
| specialization_graph_of                             | -273.819877ms   | -81.77%          | -495.97361ms    | -67.81%     | +0         | +0         | +0ns         | +0ns                  | -3.74261ms               |
+-----------------------------------------------------+-----------------+------------------+-----------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
| LLVM_module_codegen_emit_obj                        | +254.782767ms   | +67.54%          | +254.782767ms   | +67.54%     | +1         | +0         | +0ns         | +0ns                  | +0ns                     |
+-----------------------------------------------------+-----------------+------------------+-----------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
| implementations_of_trait                            | -142.80069ms    | -74.84%          | -125.632089ms   | -54.91%     | +0         | +0         | +0ns         | +0ns                  | -71.97774ms              |
+-----------------------------------------------------+-----------------+------------------+-----------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
| impl_trait_ref                                      | -140.059498ms   | -78.04%          | -152.183986ms   | -56.85%     | +4         | +0         | +0ns         | +0ns                  | -136.168477ms            |
+-----------------------------------------------------+-----------------+------------------+-----------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
| incr_comp_intern_dep_graph_node                     | -133.852969ms   | -56.58%          | -102.34653ms    | -33.99%     | +9516      | +0         | +0ns         | +0ns                  | +0ns                     |
+-----------------------------------------------------+-----------------+------------------+-----------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
| expand_proc_macro                                   | -123.962352ms   | -65.35%          | -123.963342ms   | -65.35%     | +0         | +0         | +0ns         | +0ns                  | +0ns                     |
+-----------------------------------------------------+-----------------+------------------+-----------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
| metadata_register_crate                             | -117.498081ms   | -78.83%          | -262.029345ms   | -71.98%     | +0         | +0         | +0ns         | +0ns                  | +0ns                     |
+-----------------------------------------------------+-----------------+------------------+-----------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
| LLVM_passes                                         | +91.742232ms    | +51.62%          | +91.742232ms    | +51.62%     | +0         | +0         | +0ns         | +0ns                  | +0ns                     |
+-----------------------------------------------------+-----------------+------------------+-----------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
| mir_borrowck                                        | +83.383766ms    | +45.83%          | +308.602548347s | +6752.16%   | +0         | +0         | +0ns         | +0ns                  | -15.478µs                |
+-----------------------------------------------------+-----------------+------------------+-----------------+-------------+------------+------------+--------------+-----------------------+--------------------------+
| metadata_decode_entry_exported_symbols              | -75.321233ms    | -40.33%          | -75.321233ms    | -40.33%     | +0         | +0         | +0ns         | +0ns                  | +0ns                     |

lemmy_db_views diff.zip

@dullbananas
Copy link
Collaborator

Maybe this is because of the increased amount of lifetimes caused by replacing &'a DbPool with &'a mut DbPool<'b>. I will try to fix this by create a type alias that equals &'a mut DbPool<'a> and remove the need for 2 separate lifetimes by reborrowing the reference inside of DbPool whenever a smaller lifetime is needed.

@googleben
Copy link
Contributor

This is a difficult one for sure. Spent a few hours trying some tricks and nothing made a dent. My guess is some weird interaction with Diesel types.

@googleben
Copy link
Contributor

It's not perfect, but I came up with a solution that compiles and I think should work, with the caveat that the unsafe is only sound as long as you always treat its origin as borrowed (not great, I know). Took my compile times on a Ryzen 5600x from around 150 seconds for lemmy_db_views_actor down to about 15 seconds, and ~280 to ~19 for lemmy_db_views. Here's the new DbPool with accoutrements:

pub type ActualDbPool = Pool<AsyncPgConnection>;

pub struct DbPool {
  pub pool: ActualDbPool,
  conn: Option<DbPoolConn>
}

pub enum DbPoolConn {
  Orig(PooledConnection<AsyncPgConnection>),
  Ref(*mut AsyncPgConnection)
}

unsafe impl Send for DbPoolConn {}
unsafe impl Sync for DbPoolConn {}

impl Deref for DbPoolConn {
  type Target = AsyncPgConnection;
  fn deref(&self) -> &Self::Target {
    match self {
      Self::Orig(conn) => conn,
      Self::Ref(conn) => unsafe { &**conn }
    }
  }
}

impl DerefMut for DbPoolConn {
  fn deref_mut(&mut self) -> &mut Self::Target {
    match self {
      Self::Orig(conn) => conn,
      Self::Ref(conn) => unsafe { &mut **conn }
    }
  }
}

impl DbPool {
  pub fn new(pool: ActualDbPool) -> Self {
    Self {
      pool,
      conn: None
    }
  }
  pub async fn connect(&mut self) -> Result<DbPoolConn, DieselError> {
    if self.conn.is_none() {
        self.conn = Some(DbPoolConn::Orig(self.pool.get().await.map_err(|e| QueryBuilderError(e.into()))?));
    }
    Ok(self.conn.take().unwrap())
  }
  pub fn is_connected(&self) -> bool {
    self.conn.is_some()
  }
  pub fn from_connection(pool: &DbPool, conn: DbPoolConn) -> Self {
    Self {
      pool: pool.pool.clone(),
      conn: Some(conn)
    }
  }
  /// # Safety
  /// You MUST treat `conn` as fully mutably borrowed (unusable) for the lifetime of the returned `DbPool`.
  pub unsafe fn from_raw_connection(pool: &DbPool, conn: &mut AsyncPgConnection) -> Self {
    Self {
      pool: pool.pool.clone(),
      conn: Some(DbPoolConn::Ref(conn as *mut AsyncPgConnection))
    }
  }
}

#[macro_export]
macro_rules! try_join_with_pool {
  ($pool:ident => ($($func:expr),+)) => {{
    // Check type
    let _: &mut $crate::utils::DbPool = $pool;

    if $pool.is_connected() {
      // Run sequentially
      async {
        Ok(($({
          // `?` prevents the error type from being inferred in an `async` block, so `match` is used instead
          match ($func)($pool).await {
            ::core::result::Result::Ok(__v) => __v,
            ::core::result::Result::Err(__v) => return ::core::result::Result::Err(__v),
          }
        }),+))
      }.await
    } else {
      // Run concurrently with `try_join`
      ::futures::try_join!(
        $(async {
          let mut tmp = $crate::utils::DbPool::new($pool.pool.clone());
          ($func)(&mut tmp).await
        }),+
      )
    }
  }};
}

It mostly works like the old version. And the unsafe+raw pointer are only necessary for one thing as far as I can tell, which is when you need to use a DbPool inside conn.build_transaction().run(|conn| {}), e.g. https://github.com/LemmyNet/lemmy/blob/main/crates/db_schema/src/impls/federation_blocklist.rs#L24. Maybe someone better than me at Rust can fix the unsoundness problem, but I'm really not sure it's possible without either closure wizardry or the lifetime parameters that caused this issue.

Also I don't have an environment set up yet so I haven't tested it - there may be bugs.

@dullbananas
Copy link
Collaborator

Unsafe code should be a last resort.

I'm currently trying to reduce compile time by removing use of TypedBuilder, which generates lots of code. It's almost finished.

Also, I recently realized that &mut AsyncPgConnection can be converted to tokio::sync::Mutex<&mut AsyncPgConnection>. Maybe &mut DbPool<'_> can be replaced with DbPool<'_> if DbPool is defined like this:

#[derive(Clone, Copy)]
enum DbPool<'a> {
  Pool(&'a DbPool),
  Conn(&'a Mutex<&'a mut AsyncPgConnection>),
}

@phiresky
Copy link
Collaborator

phiresky commented Jul 17, 2023

@dullbananas Maybe instead of those structs it would be easier to make the get_conn function a trait?
something like...

trait GetConn {
     fn get_conn(&mut self) -> &mut AsyncPgConnection;
}
impl GetConn for Pool<AsyncPgConnection> { ...self.get() }
impl GetConn for Object<AsyncPgConnection> { &mut self }

and then just pass around &dyn GetConn? That would get rid of all the monomorphization and complexity for a bit of overhead.

@Nutomic
Copy link
Member Author

Nutomic commented Jul 17, 2023

#3637 is a major improvement, it improves build time from 2m 16s to 42s in my measurements. This completely resolves the issue in my view, although it can never hurt to improve compile time further. Using unsafe is not an option though because we have zero experience with that, and its too risky in a web server which is exposed to the internet.

@dullbananas
Copy link
Collaborator

I think this is making it slow: diesel-rs/diesel#3223

Nutomic added a commit to cetra3/lemmy that referenced this issue Jul 19, 2023
…#3610) (LemmyNet#3611)

* Add dev profile to strip symbols and disable debug info (ref LemmyNet#3610)

This significantly speeds up builds:
- with strip symbols and debug 0: 43s
- without: 169s

* add comment, no strip symbols
Nutomic added a commit that referenced this issue Jul 21, 2023
…3611)

* Add dev profile to strip symbols and disable debug info (ref #3610)

This significantly speeds up builds:
- with strip symbols and debug 0: 43s
- without: 169s

* add comment, no strip symbols
Nutomic added a commit that referenced this issue Jul 21, 2023
…3611)

* Add dev profile to strip symbols and disable debug info (ref #3610)

This significantly speeds up builds:
- with strip symbols and debug 0: 43s
- without: 169s

* add comment, no strip symbols
@dullbananas
Copy link
Collaborator

It looks like removing TypedBuilder completely fixed the promblem caused by #3420. This is supported by the lack of significant change in build time of db_views_moderator, which didn't use TypedBuilder. So changing DbPool is unlikely to improve build time.

@Nutomic
Copy link
Member Author

Nutomic commented Jul 26, 2023

Gonna close this issue then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants