-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce threads spawned for codegen #745
Conversation
Before: 30.59user 45.65system 0:36.92elapsed 206%CPU (0avgtext+0avgdata 946896maxresident)k 0inputs+532768outputs (0major+824582minor)pagefaults 0swaps After: 29.47user 3.11system 0:22.75elapsed 143%CPU (0avgtext+0avgdata 945240maxresident)k 0inputs+531824outputs (0major+497122minor)pagefaults 0swaps
Hi! I tried your changes in Mac and they result in a slowdown. Here's what I'm doing:
You can do steps 1 to 4 for the current compiler and the compiler with your changes applies to see if there's a difference. At least in my case the "Codegen (bc+obj)" takes a few seconds more (about 4 seconds). Would there be a reason that in Mac forking many times is faster than in linux? If so, we might use an |
$ uname -a
Linux jhass-laptop 4.0.4-2-ARCH #1 SMP PREEMPT Fri May 22 03:05:23 UTC 2015 x86_64 GNU/Linux
$ rm -rf .crystal
$ time make clean crystal FLAGS="--release --stats"
rm -rf .build
rm -rf ./doc
CRYSTAL_CONFIG_PATH=`pwd`/src ./bin/crystal build --release --stats -o .build/crystal src/compiler/crystal.cr
Parse: 00:00:01.0255180
Type inference: 00:00:07.6041620
Codegen (crystal): 00:00:01.5148260
Codegen (bc+obj): 00:03:01.5516630
Codegen (clang): 00:00:03.8879810
make clean crystal FLAGS="--release --stats" 190.57s user 2.04s system 98% cpu 3:16.47 total
$ hub checkout https://github.com/manastech/crystal/pull/745
...
$ rm -rf .crystal/
$ time make clean crystal FLAGS="--release --stats"
rm -rf .build
rm -rf ./doc
CRYSTAL_CONFIG_PATH=`pwd`/src ./bin/crystal build --release --stats -o .build/crystal src/compiler/crystal.cr
Parse: 00:00:00.1183640
Type inference: 00:00:04.6174130
Codegen (crystal): 00:00:01.4481470
Codegen (bc+obj): 00:03:00.8868770
Codegen (clang): 00:00:01.4102650
make clean crystal FLAGS="--release --stats" 189.60s user 1.61s system 101% cpu 3:08.88 total |
This does not seem to provide a performance improvement in release mode as there is only one unit and
timing(msg) do
if units.length == 1
puts "only one"
first_unit = units.first
codegen_single_unit(first_unit, target_triple, multithreaded)
if emit = @emit
first_unit.emit(emit, original_output_filename || output_filename)
end
else
puts "many"
codegen_many_units(units, target_triple, multithreaded)
end
end While I'm unable to track down exactly where this happens, I assume it's because LLVM needs access to the whole program at the same time if it's going to perform optimizations like inlining or devirtualization. My results at the top represent compilation of the compiler in debug mode, which is significantly sped up. |
Right, disregard my above test, it's flawed anyway since not actually using the compiler from this branch. |
Yes, please profile this without @jhass I think you also erase the temporary compiler by doing |
|
$ rm -rf .crystal
$ time make clean crystal FLAGS="--release --stats"
rm -rf .build
rm -rf ./doc
CRYSTAL_CONFIG_PATH=`pwd`/src ./bin/crystal build --release --stats -o .build/crystal src/compiler/crystal.cr
Parse: 00:00:00.1133590
Type inference: 00:00:08.4715020
Codegen (crystal): 00:00:01.5372070
Codegen (bc+obj): 00:03:28.9384850
Codegen (clang): 00:00:06.4842070
make clean crystal FLAGS="--release --stats" 220.28s user 2.54s system 98% cpu 3:46.21 total
$ touch src/compiler/crystal.cr
$ rm -rf .crystal
$ time make crystal FLAGS="--stats"
CRYSTAL_CONFIG_PATH=`pwd`/src ./bin/crystal build --stats -o .build/crystal src/compiler/crystal.cr
Using compiled compiler at .build/crystal
Parse: 00:00:00.4365590
Type inference: 00:00:09.4770200
Codegen (crystal): 00:00:03.9863990
Codegen (bc+obj): 00:00:12.5514720
Codegen (clang): 00:00:01.9006570
make crystal FLAGS="--stats" 37.61s user 26.60s system 222% cpu 28.824 total
$ hub checkout https://github.com/manastech/crystal/pull/745
...
$ rm -rf .crystal
$ time make clean crystal FLAGS="--release --stats"
rm -rf .build
rm -rf ./doc
CRYSTAL_CONFIG_PATH=`pwd`/src ./bin/crystal build --release --stats -o .build/crystal src/compiler/crystal.cr
Parse: 00:00:00.1392960
Type inference: 00:00:04.9105870
Codegen (crystal): 00:00:01.5402750
Codegen (bc+obj): 00:03:31.4144840
Codegen (clang): 00:00:01.4782640
make clean crystal FLAGS="--release --stats" 220.10s user 1.79s system 100% cpu 3:40.01 total
$ touch src/compiler/crystal.cr
$ rm -rf .crystal
$ time make crystal FLAGS="--stats"
CRYSTAL_CONFIG_PATH=`pwd`/src ./bin/crystal build --stats -o .build/crystal src/compiler/crystal.cr
Using compiled compiler at .build/crystal
Parse: 00:00:00.1106770
Type inference: 00:00:04.9089220
Codegen (crystal): 00:00:01.8094630
Codegen (bc+obj): 00:00:07.0417690
Codegen (clang): 00:00:01.8946120
make crystal FLAGS="--stats" 29.84s user 2.28s system 197% cpu 16.236 total So ~16s vs ~28s for me. |
Taking a look back on this, running with recent $ uname -a
Darwin wing 15.4.0 Darwin Kernel Version 15.4.0: Fri Feb 12 15:56:47 PST 2016; root:xnu-3248.40.183~6/RELEASE_X86_64 x86_64
$ time make clean crystal FLAGS="--release --stats"
time make clean crystal FLAGS="--release --stats"
rm -rf .build
rm -rf ./doc
rm -rf src/llvm/ext/llvm_ext.o
rm -rf src/ext/sigfault.o src/ext/libcrystal.a
c++ -c -o src/llvm/ext/llvm_ext.o src/llvm/ext/llvm_ext.cc `/usr/local/bin/llvm-config-3.5 --cxxflags`
cc -fPIC -c -o src/ext/sigfault.o src/ext/sigfault.c
ar -rcs src/ext/libcrystal.a src/ext/sigfault.o
CRYSTAL_CONFIG_PATH=`pwd`/src ./bin/crystal build --release --stats -o .build/crystal src/compiler/crystal.cr -D without_openssl -D without_zlib
Parse: 00:00:00.1382580 ( 50.41MB)
Semantic (top level): 00:00:00.0799790 ( 66.41MB)
Semantic (abstract def check): 00:00:00.0009700 ( 66.41MB)
Semantic (type declarations): 00:00:00.0077000 ( 74.41MB)
Semantic (main): 00:00:07.1716160 ( 562.85MB)
Semantic (cleanup): 00:00:00.0001270 ( 562.85MB)
Semantic (recursive struct check): 00:00:00.0011330 ( 562.85MB)
Codegen (crystal): 00:00:02.9087470 ( 699.35MB)
Codegen (bc+obj): 00:04:30.0751160 ( 699.35MB)
Codegen (linking): 00:00:00.6957770 ( 699.35MB)
real 4m42.225s
user 4m36.965s
sys 0m3.485s
$ touch src/compiler/crystal.cr
$ rm -rf .crystal
$ time make crystal FLAGS="--stats"
CRYSTAL_CONFIG_PATH=`pwd`/src ./bin/crystal build --stats -o .build/crystal src/compiler/crystal.cr -D without_openssl -D without_zlib
Using compiled compiler at .build/crystal
Parse: 00:00:00.1474360 ( 50.28MB)
Semantic (top level): 00:00:00.1068450 ( 74.28MB)
Semantic (abstract def check): 00:00:00.0009770 ( 74.28MB)
Semantic (type declarations): 00:00:00.0087020 ( 74.28MB)
Semantic (initializers): 00:00:00.0207510 ( 74.28MB)
Semantic (main): 00:00:06.2693800 ( 626.71MB)
Semantic (cleanup): 00:00:00.0001230 ( 626.71MB)
Semantic (recursive struct check): 00:00:00.0011770 ( 626.71MB)
Codegen (crystal): 00:00:03.6329920 ( 676.21MB)
Codegen (bc+obj): 00:00:15.6289320 ( 676.21MB)
Codegen (linking): 00:00:01.1216190 ( 676.21MB)
real 0m27.443s
user 1m2.879s
sys 0m7.495s Manually applying the proposed patch (no longer mergable directly, needed some syntax changes), and running the same test: $ time make clean crystal FLAGS="--release --stats"
time make clean crystal FLAGS="--release --stats"
rm -rf .build
rm -rf ./doc
rm -rf src/llvm/ext/llvm_ext.o
rm -rf src/ext/sigfault.o src/ext/libcrystal.a
c++ -c -o src/llvm/ext/llvm_ext.o src/llvm/ext/llvm_ext.cc `/usr/local/bin/llvm-config-3.5 --cxxflags`
cc -fPIC -c -o src/ext/sigfault.o src/ext/sigfault.c
ar -rcs src/ext/libcrystal.a src/ext/sigfault.o
CRYSTAL_CONFIG_PATH=`pwd`/src ./bin/crystal build --release --stats -o .build/crystal src/compiler/crystal.cr -D without_openssl -D without_zlib
Parse: 00:00:00.1490020 ( 50.51MB)
Semantic (top level): 00:00:00.0842270 ( 66.51MB)
Semantic (abstract def check): 00:00:00.0008460 ( 66.51MB)
Semantic (type declarations): 00:00:00.0073360 ( 74.51MB)
Semantic (main): 00:00:05.0246380 ( 562.95MB)
Semantic (cleanup): 00:00:00.0001560 ( 562.95MB)
Semantic (recursive struct check): 00:00:00.0012300 ( 562.95MB)
Codegen (crystal): 00:00:02.6866090 ( 691.45MB)
Codegen (bc+obj): 00:04:32.6508970 ( 691.45MB)
Codegen (linking): 00:00:01.2189160 ( 691.45MB)
real 4m42.940s
user 4m35.450s
sys 0m3.778s
$ touch src/compiler/crystal.cr
$ rm -rf .crystal
$ time make crystal FLAGS="--stats"
time make crystal FLAGS="--stats"
CRYSTAL_CONFIG_PATH=`pwd`/src ./bin/crystal build --stats -o .build/crystal src/compiler/crystal.cr -D without_openssl -D without_zlib
Using compiled compiler at .build/crystal
Parse: 00:00:00.3199260 ( 50.27MB)
Semantic (top level): 00:00:00.1099910 ( 74.27MB)
Semantic (abstract def check): 00:00:00.0009440 ( 74.27MB)
Semantic (type declarations): 00:00:00.0089740 ( 74.27MB)
Semantic (initializers): 00:00:00.0211150 ( 74.27MB)
Semantic (main): 00:00:07.1027550 ( 594.71MB)
Semantic (cleanup): 00:00:00.0001190 ( 594.71MB)
Semantic (recursive struct check): 00:00:00.0013980 ( 594.71MB)
Codegen (crystal): 00:00:03.0466980 ( 739.21MB)
Codegen (bc+obj): 00:00:16.1062540 ( 739.21MB)
Codegen (linking): 00:00:01.1141030 ( 739.21MB)
real 0m28.352s
user 0m53.466s
sys 0m2.463s So now it looks pretty comparative in speed. Note: I never saw the Not certain what the future of this PR should be. |
If someone wants to do this, please open a separate PR that prevents many forks, but only in linux systems. On mac it should keep doing the same. |
I finally implemented something similar at c792139 . It's consistently faster on mac, and it should be much faster on linux |
Before:
After:
Note that the system time is reduced from 45 seconds to 3 seconds: this time is almost pure overhead spent copying pages.