Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression: System time is twice as slow vs 1.7.0 #45929

Open
PallHaraldsson opened this issue Jul 4, 2022 · 5 comments
Open

Regression: System time is twice as slow vs 1.7.0 #45929

PallHaraldsson opened this issue Jul 4, 2022 · 5 comments
Labels
performance Must go faster regression Regression in behavior compared to a previous version

Comments

@PallHaraldsson
Copy link
Contributor

PallHaraldsson commented Jul 4, 2022

$ hyperfine '~/julia-1.9-DEV-a60c76ea57/bin/julia -O3 --cpu-target=ivybridge --math-mode=ieee knucl.jl < knucleotide-input25000000.txt'
Benchmark 1: ~/julia-1.9-DEV-a60c76ea57/bin/julia -O3 --cpu-target=ivybridge --math-mode=ieee knucl.jl < knucleotide-input25000000.txt
  Time (mean ± σ):     11.567 s ±  0.057 s    [User: 11.673 s, System: 1.651 s]
  Range (min … max):   11.495 s … 11.666 s    10 runs

$ hyperfine 'julia -O3 --cpu-target=ivybridge --math-mode=ieee knucl.jl < knucleotide-input25000000.txt'
Benchmark 1: julia -O3 --cpu-target=ivybridge --math-mode=ieee knucl.jl < knucleotide-input25000000.txt
  Time (mean ± σ):     11.163 s ±  0.106 s    [User: 11.198 s, System: 0.858 s]
  Range (min … max):   10.978 s … 11.292 s    10 runs 

[You must first run other benchmark fasta, to get the input file.]

https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/knucleotide-julia-8.html

https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/fasta-julia-8.html

This benchmark is usually run multi-threaded, and there's it's also 2x slower, the total 7% slower:

$ hyperfine '~/julia-1.9-DEV-a60c76ea57/bin/julia -O3 -t4 --cpu-target=ivybridge --math-mode=ieee knucl.jl < knucleotide-input25000000.txt'
Benchmark 1: ~/julia-1.9-DEV-a60c76ea57/bin/julia -O3 -t4 --cpu-target=ivybridge --math-mode=ieee knucl.jl < knucleotide-input25000000.txt
  Time (mean ± σ):      5.610 s ±  0.065 s    [User: 12.680 s, System: 1.636 s]
  Range (min … max):    5.536 s …  5.754 s    10 runs

$ hyperfine 'julia -O3 -t4 --cpu-target=ivybridge --math-mode=ieee knucl.jl < knucleotide-input25000000.txt'
Benchmark 1: julia -O3 -t4 --cpu-target=ivybridge --math-mode=ieee knucl.jl < knucleotide-input25000000.txt
  Time (mean ± σ):      5.332 s ±  0.089 s    [User: 12.071 s, System: 0.877 s]
  Range (min … max):    5.170 s …  5.514 s    10 runs

Actually the above benchmarking was made with one line changed to:

isinteractive() || main(open("/dev/stdin"; lock=false), stdout)

but with the benchmark as in the original I get very similar:

$ hyperfine '~/julia-1.9-DEV-a60c76ea57/bin/julia -O3 -t4 --cpu-target=ivybridge --math-mode=ieee knucl.jl < knucleotide-input25000000.txt'
Benchmark 1: ~/julia-1.9-DEV-a60c76ea57/bin/julia -O3 -t4 --cpu-target=ivybridge --math-mode=ieee knucl.jl < knucleotide-input25000000.txt
  Time (mean ± σ):      5.645 s ±  0.122 s    [User: 12.755 s, System: 1.633 s]
  Range (min … max):    5.438 s …  5.887 s    10 runs

$ hyperfine 'julia -O3 -t4 --cpu-target=ivybridge --math-mode=ieee knucl.jl < knucleotide-input25000000.txt'
Benchmark 1: julia -O3 -t4 --cpu-target=ivybridge --math-mode=ieee knucl.jl < knucleotide-input25000000.txt
  Time (mean ± σ):      5.314 s ±  0.103 s    [User: 12.069 s, System: 0.873 s]
  Range (min … max):    5.168 s …  5.507 s    10 runs
@PallHaraldsson PallHaraldsson changed the title Regression: System time is twice as slow Regression: System time is twice as slow vs 1.7.0 Jul 4, 2022
@JeffBezanson
Copy link
Sponsor Member

Thanks, would be interesting to bisect.

@ViralBShah ViralBShah added regression Regression in behavior compared to a previous version performance Must go faster labels Jul 17, 2022
@inkydragon
Copy link
Sponsor Member

inkydragon commented Aug 16, 2022

For me, user time is a little bit slow.

woclass@wos-PC:/mnt/v/tmp$ hyperfine --runs 20 'julia-latest  -O3 --math-mode=ieee  -- knucleotide.jl 0 < fasta.txt'
Benchmark 1: julia-latest  -O3 --math-mode=ieee  -- knucleotide.jl 0 < fasta.txt
  Time (mean ± σ):      7.836 s ±  0.038 s    [User: 7.066 s, System: 0.321 s]
  Range (min … max):    7.703 s …  7.890 s    20 runs

woclass@wos-PC:/mnt/v/tmp$ hyperfine --runs 30 'julia-latest  -O3 --math-mode=ieee  -- knucleotide.jl 0 < fasta.txt'
Benchmark 1: julia-latest  -O3 --math-mode=ieee  -- knucleotide.jl 0 < fasta.txt
  Time (mean ± σ):      7.849 s ±  0.029 s    [User: 7.055 s, System: 0.342 s]
  Range (min … max):    7.806 s …  7.910 s    30 runs

woclass@wos-PC:/mnt/v/tmp$ hyperfine --runs 30 'julia-1.8  -O3 --math-mode=ieee  -- knucleotide.jl 0 < fasta.txt'
Benchmark 1: julia-1.8  -O3 --math-mode=ieee  -- knucleotide.jl 0 < fasta.txt
  Time (mean ± σ):      7.345 s ±  0.067 s    [User: 6.541 s, System: 0.350 s]
  Range (min … max):    7.278 s …  7.579 s    30 runs

woclass@wos-PC:/mnt/v/tmp$ hyperfine 'julia-1.7  -O3 --math-mode=ieee  -- knucleotide.jl 0 < fasta.txt'
Benchmark 1: julia-1.7  -O3 --math-mode=ieee  -- knucleotide.jl 0 < fasta.txt
  Time (mean ± σ):      7.577 s ±  0.040 s    [User: 6.785 s, System: 0.346 s]
  Range (min … max):    7.527 s …  7.673 s    10 runs

woclass@wos-PC:/mnt/v/tmp$ hyperfine 'julia-1.6  -O3 --math-mode=ieee  -- knucleotide.jl 0 < fasta.txt'
Benchmark 1: julia-1.6  -O3 --math-mode=ieee  -- knucleotide.jl 0 < fasta.txt
  Time (mean ± σ):      7.435 s ±  0.090 s    [User: 6.674 s, System: 0.343 s]
  Range (min … max):    7.368 s …  7.657 s    10 runs

test steps

  • fasta.jl: julia -O3 --math-mode=ieee -- fasta.jl 25000000 > fasta.txt
  • knucleotide.jl: julia -O3 --math-mode=ieee -- knucleotide.jl 0 < fasta.txt
  • hyperfine 'julia-1.7 -O3 --math-mode=ieee -- knucleotide.jl 0 < fasta.txt'

env

  • julia-latest == 1.9.0-DEV.1131 (2022-08-16) ec98087
  • julia-1.8 == 1.8.0-rc4 (2022-08-08)
  • julia-1.7 == 1.7.3 (2022-05-06)
  • julia-1.6 == 1.6.7 (2022-07-19)
$ uname -a
Linux wos-PC 5.10.102.1-microsoft-standard-WSL2 #1 SMP Wed Mar 2 00:30:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ hyperfine --version
hyperfine 1.14.0

julia> versioninfo()
Julia Version 1.7.3
Commit 742b9abb4d (2022-05-06 12:58 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)

@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented Aug 16, 2022

Your timing "7.278 s" being best on 1.8.0-rc4, is good to know, and your worst, only 5.8% slower (and 7.8x for "User" but likely based of "mean", not "min"), on 1.9.0-DEV.1131, but could you just be measuring noise? For sure web browsers etc closed?

At least you do not have the "2x slower" System time regression I reported, at best a minor other regression (on a non-supported WSL2 platform).

Windows is a supported platform, so it's helpful to test it (without WSL2, but also with while WSL2 not an officially supported platform?), and for someone to retest on Linux, to confirm my regression went away.

Actually I see I posted for "--cpu-target=ivybridge" (because used in Debian's benchmark game), but not for the non-regressed numbers... I doubt it affected System, but best to test both with (and without).

@inkydragon
Copy link
Sponsor Member

inkydragon commented Aug 17, 2022

On Windows

  1. run test
PS V:\tmp> hyperfine --runs 30 `
>> 'julia-latest -O3 --cpu-target=ivybridge --math-mode=ieee -- knucleotide.jl 0 < fasta.txt' `
>> 'julia  +1.8  -O3 --cpu-target=ivybridge --math-mode=ieee -- knucleotide.jl 0 < fasta.txt' `
>> 'julia  +1.7  -O3 --cpu-target=ivybridge --math-mode=ieee -- knucleotide.jl 0 < fasta.txt' `
>> 'julia  +1.6  -O3 --cpu-target=ivybridge --math-mode=ieee -- knucleotide.jl 0 < fasta.txt'
Benchmark 1: julia-latest -O3 --cpu-target=ivybridge --math-mode=ieee -- knucleotide.jl 0 < fasta.txt
  Time (mean ± σ):      7.644 s ±  0.023 s    [User: 0.005 s, System: 0.008 s]
  Range (min … max):    7.614 s …  7.749 s    30 runs

Benchmark 2: julia  +1.8  -O3 --cpu-target=ivybridge --math-mode=ieee -- knucleotide.jl 0 < fasta.txt
  Time (mean ± σ):      7.261 s ±  0.043 s    [User: 0.006 s, System: 0.019 s]
  Range (min … max):    7.231 s …  7.477 s    30 runs

Benchmark 3: julia  +1.7  -O3 --cpu-target=ivybridge --math-mode=ieee -- knucleotide.jl 0 < fasta.txt
  Time (mean ± σ):      7.225 s ±  0.092 s    [User: 0.005 s, System: 0.018 s]
  Range (min … max):    7.175 s …  7.669 s    30 runs

Benchmark 4: julia  +1.6  -O3 --cpu-target=ivybridge --math-mode=ieee -- knucleotide.jl 0 < fasta.txt
  Time (mean ± σ):      7.050 s ±  0.045 s    [User: 0.006 s, System: 0.020 s]
  Range (min … max):    6.991 s …  7.235 s    30 runs

Summary
  'julia  +1.6  -O3 --cpu-target=ivybridge --math-mode=ieee -- knucleotide.jl 0 < fasta.txt' ran
    1.02 ± 0.01 times faster than 'julia  +1.7  -O3 --cpu-target=ivybridge --math-mode=ieee -- knucleotide.jl 0 < fasta.txt'
    1.03 ± 0.01 times faster than 'julia  +1.8  -O3 --cpu-target=ivybridge --math-mode=ieee -- knucleotide.jl 0 < fasta.txt'
    1.08 ± 0.01 times faster than 'julia-latest -O3 --cpu-target=ivybridge --math-mode=ieee -- knucleotide.jl 0 < fasta.txt'

Note: I'm using juliaup, so julia +1.8 == julia-1.8

  1. -e "VERSION"
PS V:\tmp> hyperfine --runs 30 `
>> 'julia-latest -O3 --cpu-target=ivybridge --math-mode=ieee -e "VERSION"' `
>> 'julia  +1.8  -O3 --cpu-target=ivybridge --math-mode=ieee -e "VERSION"' `
>> 'julia  +1.7  -O3 --cpu-target=ivybridge --math-mode=ieee -e "VERSION"' `
>> 'julia  +1.6  -O3 --cpu-target=ivybridge --math-mode=ieee -e "VERSION"' `
>> 'julia  +1.0  -O3 --cpu-target=ivybridge --math-mode=ieee -e "VERSION"'
Benchmark 1: julia-latest -O3 --cpu-target=ivybridge --math-mode=ieee -e VERSION
  Time (mean ± σ):     223.8 ms ±   4.6 ms    [User: 5.8 ms, System: 10.7 ms]
  Range (min … max):   218.3 ms … 238.4 ms    30 runs

Benchmark 2: julia  +1.8  -O3 --cpu-target=ivybridge --math-mode=ieee -e VERSION
  Time (mean ± σ):     257.5 ms ±   2.9 ms    [User: 6.3 ms, System: 22.1 ms]
  Range (min … max):   254.1 ms … 267.5 ms    30 runs

Benchmark 3: julia  +1.7  -O3 --cpu-target=ivybridge --math-mode=ieee -e VERSION
  Time (mean ± σ):     244.9 ms ±  10.3 ms    [User: 4.9 ms, System: 23.7 ms]
  Range (min … max):   231.5 ms … 272.7 ms    30 runs

Benchmark 4: julia  +1.6  -O3 --cpu-target=ivybridge --math-mode=ieee -e VERSION
  Time (mean ± σ):     236.6 ms ±   2.5 ms    [User: 4.4 ms, System: 25.8 ms]
  Range (min … max):   232.3 ms … 241.4 ms    30 runs

Benchmark 5: julia  +1.0  -O3 --cpu-target=ivybridge --math-mode=ieee -e VERSION
  Time (mean ± σ):     243.7 ms ±   3.2 ms    [User: 6.8 ms, System: 23.7 ms]
  Range (min … max):   237.0 ms … 251.3 ms    30 runs

Summary
  'julia-latest -O3 --cpu-target=ivybridge --math-mode=ieee -e VERSION' ran
    1.06 ± 0.02 times faster than 'julia  +1.6  -O3 --cpu-target=ivybridge --math-mode=ieee -e VERSION'
    1.09 ± 0.03 times faster than 'julia  +1.0  -O3 --cpu-target=ivybridge --math-mode=ieee -e VERSION'
    1.09 ± 0.05 times faster than 'julia  +1.7  -O3 --cpu-target=ivybridge --math-mode=ieee -e VERSION'
    1.15 ± 0.03 times faster than 'julia  +1.8  -O3 --cpu-target=ivybridge --math-mode=ieee -e VERSION'

WSL (Ubuntu)

$ hyperfine --runs 30 \
> 'julia-latest -O3 --cpu-target=ivybridge --math-mode=ieee -- knucleotide.jl 0 < fasta.txt' \
> 'julia-1.8    -O3 --cpu-target=ivybridge --math-mode=ieee -- knucleotide.jl 0 < fasta.txt' \
> 'julia-1.7    -O3 --cpu-target=ivybridge --math-mode=ieee -- knucleotide.jl 0 < fasta.txt' \
> 'julia-1.6    -O3 --cpu-target=ivybridge --math-mode=ieee -- knucleotide.jl 0 < fasta.txt'
Benchmark 1: julia-latest -O3 --cpu-target=ivybridge --math-mode=ieee -- knucleotide.jl 0 < fasta.txt
  Time (mean ± σ):      7.937 s ±  0.100 s    [User: 7.158 s, System: 0.347 s]
  Range (min … max):    7.811 s …  8.292 s    30 runs

Benchmark 2: julia-1.8    -O3 --cpu-target=ivybridge --math-mode=ieee -- knucleotide.jl 0 < fasta.txt
  Time (mean ± σ):      7.444 s ±  0.102 s    [User: 6.598 s, System: 0.382 s]
  Range (min … max):    7.364 s …  7.931 s    30 runs

Benchmark 3: julia-1.7    -O3 --cpu-target=ivybridge --math-mode=ieee -- knucleotide.jl 0 < fasta.txt
  Time (mean ± σ):      7.650 s ±  0.045 s    [User: 6.854 s, System: 0.356 s]
  Range (min … max):    7.520 s …  7.757 s    30 runs

Benchmark 4: julia-1.6    -O3 --cpu-target=ivybridge --math-mode=ieee -- knucleotide.jl 0 < fasta.txt
  Time (mean ± σ):      7.536 s ±  0.106 s    [User: 6.749 s, System: 0.348 s]
  Range (min … max):    7.410 s …  7.965 s    30 runs

Summary
  'julia-1.8    -O3 --cpu-target=ivybridge --math-mode=ieee -- knucleotide.jl 0 < fasta.txt' ran
    1.01 ± 0.02 times faster than 'julia-1.6    -O3 --cpu-target=ivybridge --math-mode=ieee -- knucleotide.jl 0 < fasta.txt'
    1.03 ± 0.02 times faster than 'julia-1.7    -O3 --cpu-target=ivybridge --math-mode=ieee -- knucleotide.jl 0 < fasta.txt'
    1.07 ± 0.02 times faster than 'julia-latest -O3 --cpu-target=ivybridge --math-mode=ieee -- knucleotide.jl 0 < fasta.txt'

Note: My CPU is i5-9400, it's "Coffee Lake" not "Ivy Bridge".

@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented Aug 19, 2022

Doubling of systime is still there on real (Ubuntu) Linux (as opposed to WSL2), but at least overall time isn't higher:

$ time julia-1.8.0 knucl.jl 0 < knucleotide-input25000000.txt
real	0m11.229s
user	0m11.628s
sys	0m1.345s

$ time julia-1.7.3 knucl.jl 0 < knucleotide-input25000000.txt
[..]
real	0m11.406s
user	0m11.536s
sys	0m0.765s

so I didn't test 1.9, not sure if this is a worry, I want in general to get systime down, most concerned with combined (and startup cost). I'm not up-to-speed on bisecting, if still relevant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster regression Regression in behavior compared to a previous version
Projects
None yet
Development

No branches or pull requests

4 participants