Measure execution speed #133

Open
bjorn3 opened this Issue Nov 2, 2018 · 29 comments

4 participants
@bjorn3
Owner

bjorn3 commented Nov 2, 2018

No description provided.

@bjorn3

This comment was marked as outdated.

Owner

bjorn3 commented Nov 3, 2018

As of b533f91:

Benchmark #1: ./target/mod_bench
  Time (mean ± σ):      5.644 s ±  0.043 s    [User: 5.617 s, System: 0.013 s]
  Range (min … max):    5.578 s …  5.696 s
 
Benchmark #2: ./target/mod_bench_llvm_0
  Time (mean ± σ):      3.474 s ±  0.030 s    [User: 3.459 s, System: 0.007 s]
  Range (min … max):    3.436 s …  3.516 s
 
Benchmark #3: ./target/mod_bench_llvm_1
  Time (mean ± σ):      1.220 s ±  0.016 s    [User: 1.211 s, System: 0.004 s]
  Range (min … max):    1.196 s …  1.242 s
 
Benchmark #4: ./target/mod_bench_llvm_2
  Time (mean ± σ):     351.7 ms ±   1.6 ms    [User: 347.3 ms, System: 1.4 ms]
  Range (min … max):   350.2 ms … 355.0 ms
 
Benchmark #5: ./target/mod_bench_llvm_3
  Time (mean ± σ):     356.1 ms ±   4.2 ms    [User: 350.9 ms, System: 1.8 ms]
  Range (min … max):   351.2 ms … 363.0 ms
 
Summary
  './target/mod_bench_llvm_2' ran
    1.01 ± 0.01 times faster than './target/mod_bench_llvm_3'
    3.47 ± 0.05 times faster than './target/mod_bench_llvm_1'
    9.88 ± 0.09 times faster than './target/mod_bench_llvm_0'
   16.05 ± 0.14 times faster than './target/mod_bench'

mod_bench is built with cg_clif; the mod_bench_llvm_* binaries are built with cg_llvm at the respective opt-level.

Comparing only mod_bench and mod_bench_llvm_0, the latter is ~1.65 times faster.
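As a quick sanity check (not from the thread itself), the pairwise ratio follows directly from the mean times reported above. A minimal sketch, using the numbers copied from the run:

```rust
// Speedup of one binary over another, from hyperfine mean wall times.
fn speedup(slower_mean_s: f64, faster_mean_s: f64) -> f64 {
    slower_mean_s / faster_mean_s
}

fn main() {
    let mod_bench = 5.644; // cg_clif, mean seconds
    let mod_bench_llvm_0 = 3.474; // cg_llvm -Copt-level=0, mean seconds
    let r = speedup(mod_bench, mod_bench_llvm_0);
    // ~1.62x, in the same ballpark as the ~1.65 quoted above.
    assert!(r > 1.6 && r < 1.7);
}
```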

cc @sunfishcode (You may be interested in the speed of cranelift. Please note that cg_clif itself is not yet optimized for output quality.)

@sunfishcode

Contributor

sunfishcode commented Nov 3, 2018

Currently in Cranelift the IR verifier is enabled by default, which can take a lot of time. Can you benchmark with the "enable_verifier" setting disabled?

@bjorn3

Owner

bjorn3 commented Nov 3, 2018

This is just execution speed.

@sunfishcode

Contributor

sunfishcode commented Nov 3, 2018

Ah, please update the issue title then :-). Also, you may want to try setting Cranelift's opt_level to best.

@bjorn3 bjorn3 changed the title from "Measure compilation speed" to "Measure execution speed" Nov 3, 2018

@bjorn3

Owner

bjorn3 commented Nov 3, 2018

Compilation speed is already at a decent level. I'm running hyperfine with opt_level set to best right now.

Edit: it doesn't seem to change much: flags_builder.set("opt_level", "best").unwrap();

@bjorn3

This comment was marked as outdated.

Owner

bjorn3 commented Nov 3, 2018

Compilation speed of libcore without optimizations for cg_clif and cg_llvm:

Benchmark #1: rustc -Zalways-encode-mir -Cpanic=abort -Zcodegen-backend=/Users/bjorn/Documents/rustc_codegen_cranelift/target/debug/librustc_codegen_cranelift.dylib -L crate=target/out --out-dir target/out target/libcore/src/libcore/lib.rs --crate-type lib --crate-name core
  Time (mean ± σ):     19.574 s ±  0.519 s    [User: 17.951 s, System: 1.171 s]
  Range (min … max):   18.884 s … 20.433 s
 
Benchmark #2: rustc -Zalways-encode-mir -Cpanic=abort target/libcore/src/libcore/lib.rs --crate-type lib --crate-name core
  Time (mean ± σ):     14.237 s ±  0.258 s    [User: 14.313 s, System: 0.539 s]
  Range (min … max):   13.845 s … 14.677 s
 
Summary
  'rustc -Zalways-encode-mir -Cpanic=abort target/libcore/src/libcore/lib.rs --crate-type lib --crate-name core' ran
    1.37 ± 0.04 times faster than 'rustc -Zalways-encode-mir -Cpanic=abort -Zcodegen-backend=/Users/bjorn/Documents/rustc_codegen_cranelift/target/debug/librustc_codegen_cranelift.dylib -L crate=target/out --out-dir target/out target/libcore/src/libcore/lib.rs --crate-type lib --crate-name core'

Edit: wait, that's with cg_clif compiled in semi-debug mode. (Cranelift itself is optimized, but cg_clif is not.)

Edit2: Using release mode for cg_clif:

Benchmark #1: rustc -Zalways-encode-mir -Cpanic=abort -Zcodegen-backend=/Users/bjorn/Documents/rustc_codegen_cranelift/target/release/librustc_codegen_cranelift.dylib -L crate=target/out --out-dir target/out target/libcore/src/libcore/lib.rs --crate-type lib --crate-name core
  Time (mean ± σ):     15.521 s ±  0.471 s    [User: 14.646 s, System: 0.600 s]
  Range (min … max):   14.824 s … 16.147 s
 
Benchmark #2: rustc -Zalways-encode-mir -Cpanic=abort target/libcore/src/libcore/lib.rs --crate-type lib --crate-name core
  Time (mean ± σ):     14.492 s ±  0.351 s    [User: 14.464 s, System: 0.590 s]
  Range (min … max):   13.761 s … 14.818 s
 
Summary
  'rustc -Zalways-encode-mir -Cpanic=abort target/libcore/src/libcore/lib.rs --crate-type lib --crate-name core' ran
    1.07 ± 0.04 times faster than 'rustc -Zalways-encode-mir -Cpanic=abort -Zcodegen-backend=/Users/bjorn/Documents/rustc_codegen_cranelift/target/release/librustc_codegen_cranelift.dylib -L crate=target/out --out-dir target/out target/libcore/src/libcore/lib.rs --crate-type lib --crate-name core'
@sunfishcode

Contributor

sunfishcode commented Nov 5, 2018

At a high level, it's not too surprising that Cranelift's execution speed on Rust would be in the ballpark of LLVM's O0 on Rust, because it's not doing any inlining. The rough short-term plan is to enable the MIR inliner to help with this.

There's probably a bunch of low-hanging fruit too, just making sure common Rust constructs are compiled well.
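As an illustration of the inlining point (my own sketch, not code from the thread): a tiny comparison helper like core::cmp::PartialOrd::lt for u32, if never inlined, pays full call overhead on every iteration, which is roughly the situation a backend with no inlining is in. The names below are hypothetical:

```rust
// Hypothetical stand-in for a tiny helper such as PartialOrd::lt for u32.
// With #[inline(never)], every use goes through a real call, mimicking a
// backend that performs no inlining at all.
#[inline(never)]
fn lt_no_inline(a: u32, b: u32) -> bool {
    a < b
}

// Counts values in 0..n that are below `limit`, calling the helper each time.
fn count_below(limit: u32, n: u32) -> u32 {
    (0..n).filter(|&i| lt_no_inline(i, limit)).count() as u32
}

fn main() {
    // The inlined and non-inlined forms compute the same result;
    // only the per-iteration call overhead differs.
    assert_eq!(count_below(10, 100), 10);
}
```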

@lachlansneff

lachlansneff commented Nov 8, 2018

Are you multithreading compilation? Cranelift is inherently very good at parallel compilation.

@bjorn3

Owner

bjorn3 commented Nov 8, 2018

@lachlansneff No, rustc's TyCtxt is not thread-safe (!Send + !Sync), and I believe cranelift's Module isn't either (!Sync).

@sunfishcode

Contributor

sunfishcode commented Nov 8, 2018

That's true, cranelift-codegen can be run with multiple instances in parallel, but cranelift-module doesn't yet make use of that.
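The split described here can be sketched abstractly (a hedged illustration; `lower` and the byte-vector "machine code" are made up, not Cranelift API): per-function codegen is independent and could run on worker threads, while the Module-level bookkeeping stays sequential on one thread.

```rust
use std::thread;

// Hypothetical stand-in for per-function codegen: independent input,
// independent output, no shared state (the cranelift-codegen side).
fn lower(func_id: usize) -> (usize, Vec<u8>) {
    (func_id, vec![func_id as u8; 4]) // pretend this is machine code
}

fn main() {
    // Lower each "function" on its own thread...
    let handles: Vec<_> = (0..4).map(|id| thread::spawn(move || lower(id))).collect();
    // ...then commit the results sequentially, as a single-threaded
    // Module (the cranelift-module side) would require.
    let mut module: Vec<(usize, Vec<u8>)> =
        handles.into_iter().map(|h| h.join().unwrap()).collect();
    module.sort_by_key(|entry| entry.0);
    assert_eq!(module.len(), 4);
    assert_eq!(module[3], (3, vec![3u8; 4]));
}
```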

@bjorn3

This comment was marked as outdated.

Owner

bjorn3 commented Nov 11, 2018

As of ef5d161

mod_bench_inline is built with -Zmir-opt-level=3 for both itself and the sysroot.

Note: this was run on a different computer than the previous benchmark.

Benchmark #1: ./target/out/mod_bench
  Time (mean ± σ):      6.790 s ±  0.195 s    [User: 6.774 s, System: 0.008 s]
  Range (min … max):    6.548 s …  7.159 s
 
Benchmark #2: ./target/out/mod_bench_inline
  Time (mean ± σ):      5.600 s ±  0.136 s    [User: 5.589 s, System: 0.004 s]
  Range (min … max):    5.422 s …  5.887 s
 
Benchmark #3: ./target/out/mod_bench_llvm_0
  Time (mean ± σ):      4.402 s ±  0.197 s    [User: 4.391 s, System: 0.004 s]
  Range (min … max):    4.165 s …  4.784 s
 
Benchmark #4: ./target/out/mod_bench_llvm_1
  Time (mean ± σ):      1.627 s ±  0.036 s    [User: 1.624 s, System: 0.000 s]
  Range (min … max):    1.589 s …  1.693 s
 
Benchmark #5: ./target/out/mod_bench_llvm_2
  Time (mean ± σ):     423.6 ms ±   2.6 ms    [User: 422.0 ms, System: 0.0 ms]
  Range (min … max):   420.0 ms … 428.3 ms
 
Benchmark #6: ./target/out/mod_bench_llvm_3
  Time (mean ± σ):     422.9 ms ±   2.5 ms    [User: 420.8 ms, System: 0.4 ms]
  Range (min … max):   420.5 ms … 429.3 ms
 
Summary
  './target/out/mod_bench_llvm_3' ran
    1.00 ± 0.01 times faster than './target/out/mod_bench_llvm_2'
    3.85 ± 0.09 times faster than './target/out/mod_bench_llvm_1'
   10.41 ± 0.47 times faster than './target/out/mod_bench_llvm_0'
   13.24 ± 0.33 times faster than './target/out/mod_bench_inline'
   16.05 ± 0.47 times faster than './target/out/mod_bench'
@bjorn3

Owner

bjorn3 commented Nov 11, 2018

Flamegraph for mod_bench_inline

@sunfishcode

Contributor

sunfishcode commented Nov 11, 2018

I took a quick look at the code. Here are some notes:

Some of the code will get better once Cranelift has more support for i8 and the associated workarounds are removed.

Is -Zmir-opt-level=3 in use when building libcore? I'm seeing things like core::cmp::impls::<impl core::cmp::PartialOrd for u32>::lt not being inlined, which is the kind of thing we're really going to want to inline.

If I'm reading this correctly, there's a small memmove in there, which the small memcpy/memmove/memset optimization should help with, once CraneStation/cranelift#586 is fixed.

There's a codegen abort when I enable opt_level=best. I'll investigate that.

@bjorn3

Owner

bjorn3 commented Nov 11, 2018

Is -Zmir-opt-level=3 in use when building libcore?

Yes, the whole sysroot.

If I'm reading this correctly, there's a small memmove in there

I am currently using my own code for copying locals:

CValue::ByRef(from, _src_layout) => {
    let size = dst_layout.size.bytes() as i32;
    // FIXME emit_small_memcpy has a bug as of commit CraneStation/cranelift@b2281ed
    // fx.bcx.emit_small_memcpy(fx.module.target_config(), addr, from, size, layout.align.abi() as u8, src_layout.align.abi() as u8);
    let mut offset = 0;
    while size - offset >= 8 {
        let byte = fx
            .bcx
            .ins()
            .load(fx.pointer_type, MemFlags::new(), from, offset);
        fx.bcx.ins().store(MemFlags::new(), byte, addr, offset);
        offset += 8;
    }
    while size - offset >= 4 {
        let byte = fx.bcx.ins().load(types::I32, MemFlags::new(), from, offset);
        fx.bcx.ins().store(MemFlags::new(), byte, addr, offset);
        offset += 4;
    }
    while offset < size {
        let byte = fx.bcx.ins().load(types::I8, MemFlags::new(), from, offset);
        fx.bcx.ins().store(MemFlags::new(), byte, addr, offset);
        offset += 1;
    }
}
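The same 8/4/1 chunking strategy can be sketched over plain byte slices (my own stand-in for the Cranelift load/store sequence above, not cg_clif code):

```rust
// Copy `src` into `dst` in 8-byte, then 4-byte, then 1-byte chunks,
// mirroring the three load/store loops in the snippet above.
fn chunked_copy(dst: &mut [u8], src: &[u8]) {
    assert_eq!(dst.len(), src.len());
    let size = src.len();
    let mut offset = 0;
    while size - offset >= 8 {
        dst[offset..offset + 8].copy_from_slice(&src[offset..offset + 8]);
        offset += 8;
    }
    while size - offset >= 4 {
        dst[offset..offset + 4].copy_from_slice(&src[offset..offset + 4]);
        offset += 4;
    }
    while offset < size {
        dst[offset] = src[offset];
        offset += 1;
    }
}

fn main() {
    // A 13-byte copy exercises all three loops: one 8-byte chunk,
    // one 4-byte chunk, and one trailing byte.
    let src: Vec<u8> = (0..13).collect();
    let mut dst = vec![0u8; 13];
    chunked_copy(&mut dst, &src);
    assert_eq!(dst, src);
}
```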

So that memmove comes from the copy_nonoverlapping intrinsic:

fx.bcx.call_memmove(fx.module.target_config(), dst, src, byte_amount);

Which likely came from core::mem::swap: https://github.com/rust-lang/rust/blob/b76ee83254ec0398da554f25c2168d917ba60f1c/src/libcore/iter/range.rs#L228

There's a codegen abort when I enable opt_level=best. I'll investigate that.

😭

@bjorn3

This comment was marked as outdated.

Owner

bjorn3 commented Nov 11, 2018

Repro of that codegen abort:

test compile
set opt_level=best
target x86_64

function u0:0(i64, i64, i64) fast {
    ss0 = explicit_slot 16
    ss1 = explicit_slot 8
    ss2 = explicit_slot 16
    ss3 = explicit_slot 16
    ss4 = explicit_slot 1
    ss5 = explicit_slot 16
    ss6 = explicit_slot 16
    ss7 = explicit_slot 16
    ss8 = explicit_slot 16
    ss9 = explicit_slot 16
    ss10 = explicit_slot 8
    sig0 = (i64) -> i64 fast
    sig1 = (i64) -> i64 fast
    sig2 = (i64, i64) -> i8 fast
    sig3 = (i64, i64) -> i8 fast
    sig4 = (i64, i64, i64) fast
    sig5 = (i64, i64, i64) fast
    fn0 = colocated u0:143 sig1
    fn1 = u0:710 sig3
    fn2 = colocated u0:194 sig5

ebb0(v0: i64, v1: i64, v2: i64):
    v16 -> v2
    v20 -> v2
    nop
    v3 = stack_addr.i64 ss0
    v4 = load.i64 v1
    store v4, v3
    v5 = load.i64 v1+8
    store v5, v3+8
    v6 = stack_addr.i64 ss1
    v7 = stack_addr.i64 ss2
    v8 = stack_addr.i64 ss3
    v9 = stack_addr.i64 ss4
    v10 = stack_addr.i64 ss5
    v11 = stack_addr.i64 ss6
    v12 = stack_addr.i64 ss7
    v13 = stack_addr.i64 ss8
    v14 = stack_addr.i64 ss9
    v15 = stack_addr.i64 ss10
    jump ebb1

ebb1:
    nop
    v17 = load.i64 v3
    store v17, v7
    v18 = load.i64 v3+8
    store v18, v7+8
;
; _5 = const str::<impl str>::len(move _6)
    v19 = call fn0(v7)
    store v19, v6
    jump ebb2

ebb2:
    v21 = load.i64 v6
    v22 = icmp.i64 uge v20, v21
    v23 = bint.i8 v22
    v24 = uextend.i32 v23
    v25 = icmp_imm eq v24, 0
    brnz v25, ebb4(v16)
    jump ebb3

ebb3:
    v26 = load.i64 v3
    store v26, v8
    v27 = load.i64 v3+8
    store v27, v8+8
    v28 = iconst.i8 0
    store v28, v0
    v29 = iadd_imm.i64 v0, 8
    v30 = load.i64 v8
    store v30, v29
    v31 = load.i64 v8+8
    store v31, v29+8
    jump ebb9

ebb4(v34: i64):
    v45 -> v34
    v56 -> v34
    v38 -> v56
    v32 = load.i64 v3
    store v32, v10
    v33 = load.i64 v3+8
    store v33, v10+8
    v35 = call fn1(v10, v34)
    store v35, v9
    jump ebb6

ebb5:
    nop
    v36 = load.i64 v3
    store v36, v14
    v37 = load.i64 v3+8
    store v37, v14+8
    store.i64 v38, v15
;
; _14 = const ops::index::Index::index(move _15, move _16)
    call fn2(v13, v14, v15)
    jump ebb8

ebb6:
    nop
    v39 = load.i8 v9
    v40 = uextend.i32 v39
    v41 = icmp_imm eq v40, 0
    v42 = bint.i8 v41
    v43 = uextend.i32 v42
    v44 = icmp_imm eq v43, 0
    brnz v44, ebb5
    jump ebb7

ebb7:
    v46 = iconst.i64 1
    v47 = isub.i64 v45, v46
    jump ebb4(v47)

ebb8:
    v48 = load.i64 v13
    store v48, v12
    v49 = load.i64 v13+8
    store v49, v12+8
    v50 = load.i64 v12
    store v50, v11
    v51 = load.i64 v12+8
    store v51, v11+8
    v52 = iconst.i8 1
    store v52, v0
    v53 = iadd_imm.i64 v0, 8
    v54 = load.i64 v11
    store v54, v53
    v55 = load.i64 v11+8
    store v55, v53+8
    jump ebb9

ebb9:
    return
}
@lachlansneff

lachlansneff commented Nov 11, 2018

As a side note, it's interesting to see functions like index and len getting called when they should definitely be inlined.

@sunfishcode

Contributor

sunfishcode commented Nov 12, 2018

Another perf issue: CraneStation/cranelift#597 .

@sunfishcode sunfishcode referenced this issue Nov 12, 2018

Merged

Licm fixes #598

@bjorn3

Owner

bjorn3 commented Nov 16, 2018

Now that CraneStation/cranelift#598 is merged, commit 8233ade enables opt_level=best for -Copt-level=3 (e.g. the sysroot and mod_bench_inline).
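A hedged sketch of the mapping this describes (the function name and the "default" fallback value are my assumptions, not cg_clif's actual code; only "best" for -Copt-level=3 is stated in the thread):

```rust
// Hypothetical mapping from rustc's -Copt-level to Cranelift's opt_level
// setting string: per the description above, only -Copt-level=3 gets "best".
fn cranelift_opt_level(rustc_opt_level: u32) -> &'static str {
    match rustc_opt_level {
        3 => "best",
        _ => "default", // assumed fallback value
    }
}

fn main() {
    assert_eq!(cranelift_opt_level(3), "best");
    assert_eq!(cranelift_opt_level(0), "default");
}
```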

mod_bench_inline is now faster than mod_bench_llvm_0 🎉

Benchmark #1: ./target/out/mod_bench
  Time (mean ± σ):      7.048 s ±  0.120 s    [User: 7.041 s, System: 0.000 s]
  Range (min … max):    6.944 s …  7.360 s
 
Benchmark #2: ./target/out/mod_bench_inline
  Time (mean ± σ):      3.975 s ±  0.100 s    [User: 3.972 s, System: 0.000 s]
  Range (min … max):    3.830 s …  4.122 s
 
Benchmark #3: ./target/out/mod_bench_llvm_0
  Time (mean ± σ):      4.243 s ±  0.059 s    [User: 4.240 s, System: 0.000 s]
  Range (min … max):    4.168 s …  4.329 s
 
Benchmark #4: ./target/out/mod_bench_llvm_1
  Time (mean ± σ):      1.625 s ±  0.015 s    [User: 1.622 s, System: 0.001 s]
  Range (min … max):    1.607 s …  1.649 s
 
Benchmark #5: ./target/out/mod_bench_llvm_2
  Time (mean ± σ):     422.1 ms ±   3.0 ms    [User: 419.6 ms, System: 0.0 ms]
  Range (min … max):   419.2 ms … 429.1 ms
 
Benchmark #6: ./target/out/mod_bench_llvm_3
  Time (mean ± σ):     421.5 ms ±   3.1 ms    [User: 419.2 ms, System: 0.0 ms]
  Range (min … max):   419.2 ms … 428.8 ms
 
Summary
  './target/out/mod_bench_llvm_3' ran
    1.00 ± 0.01 times faster than './target/out/mod_bench_llvm_2'
    3.86 ± 0.05 times faster than './target/out/mod_bench_llvm_1'
    9.43 ± 0.25 times faster than './target/out/mod_bench_inline'
   10.07 ± 0.16 times faster than './target/out/mod_bench_llvm_0'
   16.72 ± 0.31 times faster than './target/out/mod_bench'
@lachlansneff

lachlansneff commented Nov 16, 2018

@sunfishcode Are there any obvious optimizations that we're missing here?

@bjorn3

This comment was marked as outdated.

Owner

bjorn3 commented Nov 16, 2018

Compile-time measurements (somehow -Copt-level=2/3 is faster than -Copt-level=0/1 for LLVM; the former two are faster than cg_clif and the latter two are slower):

Benchmark #1: rustc -Zalways-encode-mir -Cpanic=abort -Zcodegen-backend=/home/bjorn/Documenten/rustc_codegen_cranelift/target/debug/librustc_codegen_cranelift.so -L crate=target/out --out-dir target/out --sysroot ~/.xargo/HOST example/mod_bench.rs --crate-type bin -Zmir-opt-level=3 -Og --crate-name mod_bench_inline
  Time (mean ± σ):     105.6 ms ±   3.1 ms    [User: 75.0 ms, System: 22.7 ms]
  Range (min … max):   102.6 ms … 117.4 ms
 
  Warning: The first benchmarking run for this command was significantly slower than the rest (114.9 ms). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
 
Benchmark #2: rustc example/mod_bench.rs --crate-type bin -Copt-level=0 -o target/out/mod_bench_llvm_0 -Cpanic=abort
  Time (mean ± σ):     112.3 ms ±   5.1 ms    [User: 103.2 ms, System: 20.1 ms]
  Range (min … max):   107.7 ms … 133.1 ms
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark #3: rustc example/mod_bench.rs --crate-type bin -Copt-level=1 -o target/out/mod_bench_llvm_1 -Cpanic=abort
  Time (mean ± σ):     125.8 ms ±   3.5 ms    [User: 110.6 ms, System: 17.4 ms]
  Range (min … max):   122.5 ms … 136.5 ms
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark #4: rustc example/mod_bench.rs --crate-type bin -Copt-level=2 -o target/out/mod_bench_llvm_2 -Cpanic=abort
  Time (mean ± σ):      99.6 ms ±   2.9 ms    [User: 86.9 ms, System: 14.5 ms]
  Range (min … max):    96.8 ms … 109.0 ms
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark #5: rustc example/mod_bench.rs --crate-type bin -Copt-level=3 -o target/out/mod_bench_llvm_3 -Cpanic=abort
  Time (mean ± σ):     102.7 ms ±   4.8 ms    [User: 87.6 ms, System: 17.1 ms]
  Range (min … max):    97.4 ms … 123.2 ms
 
Summary
  'rustc example/mod_bench.rs --crate-type bin -Copt-level=2 -o target/out/mod_bench_llvm_2 -Cpanic=abort' ran
    1.03 ± 0.06 times faster than 'rustc example/mod_bench.rs --crate-type bin -Copt-level=3 -o target/out/mod_bench_llvm_3 -Cpanic=abort'
    1.06 ± 0.04 times faster than 'rustc -Zalways-encode-mir -Cpanic=abort -Zcodegen-backend=/home/bjorn/Documenten/rustc_codegen_cranelift/target/debug/librustc_codegen_cranelift.so -L crate=target/out --out-dir target/out --sysroot ~/.xargo/HOST example/mod_bench.rs --crate-type bin -Zmir-opt-level=3 -Og --crate-name mod_bench_inline'
    1.13 ± 0.06 times faster than 'rustc example/mod_bench.rs --crate-type bin -Copt-level=0 -o target/out/mod_bench_llvm_0 -Cpanic=abort'
    1.26 ± 0.05 times faster than 'rustc example/mod_bench.rs --crate-type bin -Copt-level=1 -o target/out/mod_bench_llvm_1 -Cpanic=abort'

bjorn3 added a commit that referenced this issue Nov 16, 2018

@lachlansneff

lachlansneff commented Nov 16, 2018

@bjorn3 It looks like that's using the debug version of the codegen backend. Shouldn't that be the release version to maximize compilation speed?

@sunfishcode

Contributor

sunfishcode commented Nov 16, 2018

Here's a summary of the ideas from above for how we can improve performance from here:

@bjorn3

Owner

bjorn3 commented Nov 16, 2018

It looks like that's using the debug version of the codegen backend.

Oops :) Benchmarking it in release mode atm.

This fix for the small memcpy/etc. optimization, and then update this code to make use of it.

And more importantly

// FIXME emit_small_memcpy has a bug as of commit CraneStation/cranelift@b2281ed
// fx.bcx.emit_small_memcpy(fx.module.target_config(), addr, from, size, layout.align.abi() as u8, src_layout.align.abi() as u8);

@bjorn3

Owner

bjorn3 commented Nov 16, 2018

Now with --release:

Benchmark #1: rustc -Zalways-encode-mir -Cpanic=abort -Zcodegen-backend=/home/bjorn/Documenten/rustc_codegen_cranelift/target/release/librustc_codegen_cranelift.so -L crate=target/out --out-dir target/out --sysroot ~/.xargo/HOST example/mod_bench.rs --crate-type bin -Zmir-opt-level=3 -Og --crate-name mod_bench_inline
  Time (mean ± σ):      86.0 ms ±   5.6 ms    [User: 57.3 ms, System: 20.2 ms]
  Range (min … max):    81.1 ms … 106.6 ms
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark #2: rustc example/mod_bench.rs --crate-type bin -Copt-level=0 -o target/out/mod_bench_llvm_0 -Cpanic=abort
  Time (mean ± σ):     115.0 ms ±   7.2 ms    [User: 107.4 ms, System: 19.9 ms]
  Range (min … max):   107.4 ms … 138.3 ms
 
Benchmark #3: rustc example/mod_bench.rs --crate-type bin -Copt-level=1 -o target/out/mod_bench_llvm_1 -Cpanic=abort
  Time (mean ± σ):     129.3 ms ±   7.0 ms    [User: 112.6 ms, System: 18.8 ms]
  Range (min … max):   122.4 ms … 151.2 ms
 
Benchmark #4: rustc example/mod_bench.rs --crate-type bin -Copt-level=2 -o target/out/mod_bench_llvm_2 -Cpanic=abort
  Time (mean ± σ):     102.7 ms ±   6.0 ms    [User: 88.4 ms, System: 16.8 ms]
  Range (min … max):    97.3 ms … 123.8 ms
 
Benchmark #5: rustc example/mod_bench.rs --crate-type bin -Copt-level=3 -o target/out/mod_bench_llvm_3 -Cpanic=abort
  Time (mean ± σ):     103.0 ms ±   6.5 ms    [User: 87.8 ms, System: 17.8 ms]
  Range (min … max):    97.4 ms … 125.8 ms
 
Summary
  'rustc -Zalways-encode-mir -Cpanic=abort -Zcodegen-backend=/home/bjorn/Documenten/rustc_codegen_cranelift/target/release/librustc_codegen_cranelift.so -L crate=target/out --out-dir target/out --sysroot ~/.xargo/HOST example/mod_bench.rs --crate-type bin -Zmir-opt-level=3 -Og --crate-name mod_bench_inline' ran
    1.19 ± 0.10 times faster than 'rustc example/mod_bench.rs --crate-type bin -Copt-level=2 -o target/out/mod_bench_llvm_2 -Cpanic=abort'
    1.20 ± 0.11 times faster than 'rustc example/mod_bench.rs --crate-type bin -Copt-level=3 -o target/out/mod_bench_llvm_3 -Cpanic=abort'
    1.34 ± 0.12 times faster than 'rustc example/mod_bench.rs --crate-type bin -Copt-level=0 -o target/out/mod_bench_llvm_0 -Cpanic=abort'
    1.50 ± 0.13 times faster than 'rustc example/mod_bench.rs --crate-type bin -Copt-level=1 -o target/out/mod_bench_llvm_1 -Cpanic=abort'
@lachlansneff

lachlansneff commented Nov 16, 2018

Yay, we are now technically a faster debug backend for rustc! 😀

There are a couple of compile-time optimizations in the pipeline that should hopefully improve this further.

@bjorn3

Owner

bjorn3 commented Nov 16, 2018

Yes, at least on this small benchmark.

@bstrie

bstrie commented Nov 16, 2018

To help inform us as to how excited we ought to be, is there a document somewhere describing the path that would need to be taken to get Cranelift upstreamed into rustc for use with debug builds? As far as we random onlookers know, it could be anywhere from "oh, it's basically done, we just need to flip a switch" to "years and years away, don't hold your breath". :)

@bjorn3

Owner

bjorn3 commented Nov 16, 2018

is there a document somewhere describing the path that would need to be taken to get Cranelift upstreamed into rustc for use with debug builds?

No, getting this even close to upstreaming is blocked on at least rust-lang/rust#55627 and supporting libstd (#146). I haven't spoken to any Rust devs about this; I want to get an MVP working first before making this more widely known.

"oh, it's basically done, we just need to flip a switch"

This is not the case.

"years and years away, don't hold your breath"

I hope not :)

@bjorn3

Owner

bjorn3 commented Nov 17, 2018

Minimized some outdated benchmark results, because they are long.
