Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[benchmark] Janitor Duty, Legacy Factor: A-C #20861

Merged
merged 18 commits into from
Dec 7, 2018

Conversation

palimondo
Copy link
Contributor

@palimondo palimondo commented Nov 29, 2018

This PR is first part of benchmarks clean-up to enable robust performance measurements by adjusting workloads to run in reasonable time (< 1000 μs), minimizing the accumulated error. To maintain long-term performance tracking, it applies legacy factor where necessary.

Legacy factor for benchmarks that had non-linear workload scaling (ArrayAppend...) is tuned for -O builds, which means some changes in measured times for -Osize and -Onone builds are unavoidable and to be expected, as a consequence of these workload scaling loops being fixed.

@palimondo palimondo force-pushed the a-tall-white-fountain-played branch 3 times, most recently from 3decde6 to ff688ab Compare December 4, 2018 14:42
@palimondo
Copy link
Contributor Author

@swift-ci please benchmark
cc @shahmishal

@palimondo palimondo changed the title [WIP][benchmark] Janitor Duty: Legacy Factor [benchmark] Janitor Duty, Legacy Factor: A-C Dec 4, 2018
@shahmishal
Copy link
Member

@palimondo Can you try again? The systems should now allow you to trigger swift-ci.

@palimondo
Copy link
Contributor Author

@swift-ci please benchmark

@palimondo
Copy link
Contributor Author

@swift-ci please test

@shahmishal
Copy link
Member

@palimondo Thanks!

@palimondo
Copy link
Contributor Author

palimondo commented Dec 4, 2018

@shahmishal Oh, thank YOU! 🙏😊
Mood: „By the Power of Grayskull…”

@swift-ci

This comment has been minimized.

@swift-ci

This comment has been minimized.

@palimondo palimondo removed the request for review from eeckstein December 5, 2018 08:15
@palimondo
Copy link
Contributor Author

There are expected errors in lit tests because I've enabled Ackermann. Now I want to see if smoke test still skips running them…

@swift-ci please smoke test

@palimondo palimondo changed the title [benchmark] Janitor Duty, Legacy Factor: A-C [WIP][benchmark] Janitor Duty, Legacy Factor: A-C Dec 5, 2018
Lowered the default sample cap from 2k to 200. (This doesn’t effect manually specified `--num-samples` argument in the driver.)

Swift benchmarks have pretty constant performance profile over time. It’s more beneficial to get multiple independent measurements faster, than more samples from the same run.
Reintroduce Ackermann benchmark with reasonably sized workload. Since this one was tagged `.unstable`, there’s no need to go through `legacyFactor`.

Adjusted `lit` test for Benchmark_O now that Ackermann isn’t marked `.unstable` anymore.

Removed incorrect `asserts` requirement from the benchmark lit tests.
Cleaned up doubled inner loops and applied legacyFactor to the resulting, more rational, workload sizes.

Factors for string appending benchmarks are not 10, but 11 to compensate for smaller amortized Array resizes. Similarily ArrayPlusEqualFiveElementCollection’s factor is 49.
@swift-ci

This comment has been minimized.

Lowered the workload of string appending bechmarks a bit further, to get them under 1ms in `O` and 10ms in `Onone` builds.

Fine tuned the legacy factor to match original `O` runtimes — the changes are non-linear because of complications from double loop and amortized array growth costs.

These string appending benchmarks along with
* ArrayPlusEqualFiveElementCollection and
* ArrayPlusEqualSingleElementCollection
are likely to show different performance in `Osize` and `Onone` builds because of the non-linearity of their original implementation.
The `legacyFactor` is 11 instead of 10, because of the wierd way the `repeatCount` translates to the workload size based on the number of lines in the `workloadBase`… it’s just so.
Lowered the workload to get under 1ms runtime.
Also fixed variable workload size (`N`-dependent).
…plus an attempt at reducing wide range of memory used between independent, repeated measurements in the `CharacterPropertiesPrecomputed` benchmark by reserving the appropriate set capacity in advance.
@palimondo
Copy link
Contributor Author

@swift-ci please benchmark

@palimondo
Copy link
Contributor Author

@swift-ci please test

@palimondo
Copy link
Contributor Author

@eeckstein Please review 🙏

@palimondo palimondo changed the title [WIP][benchmark] Janitor Duty, Legacy Factor: A-C [benchmark] Janitor Duty, Legacy Factor: A-C Dec 7, 2018
@swift-ci
Copy link
Contributor

swift-ci commented Dec 7, 2018

Build comment file:

Performance: -O

TEST MIN MAX MEAN MAX_RSS
Added
Ackermann 225 231 229

Code size: -O

TEST OLD NEW DELTA RATIO
Regression
ChainedFilterMap.o 3165 3374 +6.6% 0.94x
Improvement
ArraySubscript.o 3972 3892 -2.0% 1.02x
ArrayAppend.o 39272 38776 -1.3% 1.01x
AnyHashableWithAClass.o 3012 2980 -1.1% 1.01x

Performance: -Osize

TEST OLD NEW DELTA RATIO
Improvement
ChainedFilterMap 9685 8451 -12.7% 1.15x
Added
Ackermann 274 290 283

Code size: -Osize

TEST OLD NEW DELTA RATIO
Regression
ChainedFilterMap.o 3204 3494 +9.1% 0.92x
Improvement
ArraySubscript.o 4275 4163 -2.6% 1.03x
ArrayAppend.o 32412 31676 -2.3% 1.02x

Performance: -Onone

TEST OLD NEW DELTA RATIO
Regression
ArrayPlusEqualSingleElementCollection 206979 243272 +17.5% 0.85x
ArrayAppendAsciiSubstring 50448 54828 +8.7% 0.92x
ArrayAppendLatin1Substring 51071 55332 +8.3% 0.92x
ArrayAppendUTF16Substring 50594 54612 +7.9% 0.93x
Improvement
CharIteration_ascii_unicodeScalars 367844 269616 -26.7% 1.36x
ChainedFilterMap 378862 335826 -11.4% 1.13x
Added
Ackermann 2628 2857 2726
Benchmark Check Report
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB
--------------

Copy link
Contributor

@eeckstein eeckstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants