Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove legacyFactor and adopt blackHole where appropriate in Diffing and Myers benchmarks #26104

Merged

Conversation

numist
Copy link
Member

@numist numist commented Jul 12, 2019

Per post-merge feedback in #25808

@numist numist requested a review from palimondo July 12, 2019 02:18
@swift-ci
Copy link
Contributor

Performance: -O

Improvement OLD NEW DELTA RATIO
FlattenListFlatMap 10806 7869 -27.2% 1.37x (?)

Code size: -O

Performance: -Osize

Regression OLD NEW DELTA RATIO
ObjectiveCBridgeStubFromNSString 824 889 +7.9% 0.93x (?)

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@palimondo
Copy link
Contributor

palimondo commented Jul 12, 2019

The benchmark validation on CI runs only for the newly added tests, so we don’t see those results here now. Could you please also post your local result from running Benchmark_Driver check -f Diff -f Myers --format markdown? Otherwise the removal of legacyFactor and blackHole are good.

Copy link
Contributor

@palimondo palimondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please refactor to use a shared run function, parametrized with the workload String? You should probably also touch the lazily initialized global constants in setUpFunction using the blackHole, too. See AngryPhonebook benchmarks for an example of this.

@numist
Copy link
Member Author

numist commented Jul 12, 2019

Benchmark Check Report
⚠️ Myers execution took at least 8040 μs.
Decrease the workload of Myers by a factor of 16 (10), to be less than 1000 μs.
⚠️Ⓜ️ Myers has very wide range of memory used between independent, repeated measurements.
Myers mem_pages [i1, i2]: min=[2140, 2672] 𝚫=532 R=[1074, 569]

After touching the globals and adjusting the inputs of Myers to match Diffing.Similar in 16087175b7:

Benchmark Check Report
⚠️Ⓜ️ Myers has very wide range of memory used between independent, repeated measurements.
Myers mem_pages [i1, i2]: min=[35, 43] 𝚫=8 R=[20, 41]

Not sure why Myers would exhibit this problem and not Diffing.Similar as the two are practically identical (until I land further perf improvements to Collection.difference(from:), anyway).

As an aside, could you help me understand why everything gets blackHoled? I understand why you'd want to mask out setup and the like, but AngryPhoneBook even blackholes the work that is supposed to be benchmarked so it seems like I don't have the right idea around what it's for.

@palimondo
Copy link
Contributor

As I understand it, theblackHole is used as barrier to prevent sufficiently smart future compiler from eliminating calls with unused results as dead code. But thinking about this particular case… the let _ = should guarantee the same thing, right? 🤔 @lorentey, @eeckstein?

We started to use it in setUpFunctions to pre-initialize lazy globals that take significant time, so that their initialization does not spoil the first sample.

@numist
Copy link
Member Author

numist commented Jul 12, 2019

Sounds reasonable. Regardless, there's value in conforming to the existing patterns.

@numist
Copy link
Member Author

numist commented Jul 12, 2019

@swift-ci please test

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - ad3b488cd2ce44a78fdb206e3b294ab54dee946e

@palimondo
Copy link
Contributor

Myers mem_pages [i1, i2]: min=[35, 43] 𝚫=8 R=[20, 41]

This seems fine to me. The memory heuristic is quite sensitive.

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - ad3b488cd2ce44a78fdb206e3b294ab54dee946e

Copy link
Contributor

@palimondo palimondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scott, I apologize for the being skittish in my review… I’m on the road and doing all this on a phone.

Could you please also change the naming of these benchmarks to follow the Naming Convention? I think the benchmark family name here is clearly Diff, so putting a period after that would we a good first step. If I understand this correctly, Myers is an alternative diffing algorithm. In your last commit, you’ve changed the workloads in Myers to match those in DiffSimilar. Therefore it would make sense to try and make the results directly comparable by also applying the concept of variants from the benchmark naming convention. Could you also please:

  • change the variable names of workloads in Myers.swift to match those from Diffing.swift, so that it is clearer it is the same workload,
  • rename the file Myers.swift to DiffingMyers.swift or DiffMyers.swift,
  • rename Myers benchmark to Diff.Similar.Myers, to make it clear it is a variant of the same workload, using a different algorithm.

@palimondo
Copy link
Contributor

palimondo commented Jul 13, 2019

Tiny nitpick: I’d declare the reversed workloads like this:

let alphabetsReversed = Array(alphabets.reversed())
let loremReverse = Array(loremIpsum.reversed())

…and probably call the latter loremReversed… 😜

Also, your shared runFunction is great now, but there’s no need for the run_ prefix anymore, so I’d drop that, too.

@numist
Copy link
Member Author

numist commented Jul 15, 2019

@swift-ci Please test

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 16087175b772fe5e7590e54fc0c8a70abdcd4e20

@numist numist requested a review from palimondo July 15, 2019 23:12
@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 16087175b772fe5e7590e54fc0c8a70abdcd4e20

@palimondo
Copy link
Contributor

palimondo commented Jul 16, 2019

The changes in DiffingMyers and Diffing look great, but something went very wrong during the merge. It now has 148 changed files instead of 4.

@numist numist force-pushed the numist/diff-related-benchmark-cleanup branch from a184e3e to 3e2e4f8 Compare July 16, 2019 21:39
@numist
Copy link
Member Author

numist commented Jul 16, 2019

Something was extremely broken with my fork. Should be fixed now.

@palimondo
Copy link
Contributor

@swift-ci please benchmark

Copy link
Contributor

@palimondo palimondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you for your patience Scott!

public let Myers = [
BenchmarkInfo(name: "Myers", runFunction: run_Myers, tags: [.algorithm]),
]
// The DiffingMyers test benchmarks Swift's performance running the algorithm
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this explanation!

@palimondo
Copy link
Contributor

@swift-ci please smoke test

@swift-ci
Copy link
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
CharacterLiteralsLarge 97 108 +11.3% 0.90x
FlattenListLoop 4339 4824 +11.2% 0.90x (?)
 
Improvement OLD NEW DELTA RATIO
ObjectiveCBridgeStubFromNSDateRef 4200 3680 -12.4% 1.14x (?)
RandomShuffleLCG2 768 704 -8.3% 1.09x
Array2D 7520 6928 -7.9% 1.09x
MapReduce 397 368 -7.3% 1.08x (?)
MapReduceAnyCollection 398 369 -7.3% 1.08x (?)
 
Added MIN MAX MEAN MAX_RSS
Diffing.Disparate 0 0 0
Diffing.Myers.Similar 0 0 0
Diffing.PangramToAlphabet 0 0 0
Diffing.Pangrams 0 0 0
Diffing.ReversedAlphabets 0 0 0
Diffing.ReversedLorem 0 0 0
Diffing.Same 0 0 0
Diffing.Similar 0 0 0
 
Removed MIN MAX MEAN MAX_RSS
DiffDisparate 0 0 0
DiffPangramToAlphabet 0 0 0
DiffPangrams 0 0 0
DiffReversedAlphabets 0 0 0
DiffReversedLorem 0 0 0
DiffSame 0 0 0
DiffSimilar 0 0 0
Myers 0 0 0

Code size: -O

Regression OLD NEW DELTA RATIO
Diffing.o 8958 9134 +2.0% 0.98x

Performance: -Osize

Improvement OLD NEW DELTA RATIO
ObjectiveCBridgeStubFromNSStringRef 172 160 -7.0% 1.07x (?)
InsertCharacterEndIndexNonASCII 59 55 -6.8% 1.07x (?)
 
Added MIN MAX MEAN MAX_RSS
Diffing.Disparate 0 0 0
Diffing.Myers.Similar 0 0 0
Diffing.PangramToAlphabet 0 0 0
Diffing.Pangrams 0 0 0
Diffing.ReversedAlphabets 0 0 0
Diffing.ReversedLorem 0 0 0
Diffing.Same 0 0 0
Diffing.Similar 0 0 0
 
Removed MIN MAX MEAN MAX_RSS
DiffDisparate 0 0 0
DiffPangramToAlphabet 0 0 0
DiffPangrams 0 0 0
DiffReversedAlphabets 0 0 0
DiffReversedLorem 0 0 0
DiffSame 0 0 0
DiffSimilar 0 0 0
Myers 0 0 0

Code size: -Osize

Regression OLD NEW DELTA RATIO
Diffing.o 7966 8478 +6.4% 0.94x

Performance: -Onone

Improvement OLD NEW DELTA RATIO
ObjectiveCBridgeStubFromNSDateRef 4450 4140 -7.0% 1.07x (?)
 
Added MIN MAX MEAN MAX_RSS
Diffing.Disparate 0 0 0
Diffing.Myers.Similar 0 0 0
Diffing.PangramToAlphabet 0 0 0
Diffing.Pangrams 0 0 0
Diffing.ReversedAlphabets 0 0 0
Diffing.ReversedLorem 0 0 0
Diffing.Same 0 0 0
Diffing.Similar 0 0 0
 
Removed MIN MAX MEAN MAX_RSS
DiffDisparate 0 0 0
DiffPangramToAlphabet 0 0 0
DiffPangrams 0 0 0
DiffReversedAlphabets 0 0 0
DiffReversedLorem 0 0 0
DiffSame 0 0 0
DiffSimilar 0 0 0
Myers 0 0 0

Code size: -swiftlibs

Benchmark Check Report
⛔️⏱ Diffing.Disparate execution took 0 μs.
Ensure the workload of Diffing.Disparate has a properly measurable size (runtime > 20 μs) and is not eliminated by the compiler (use blackHole function if necessary).
⛔️⏱ Diffing.Pangrams execution took 0 μs.
Ensure the workload of Diffing.Pangrams has a properly measurable size (runtime > 20 μs) and is not eliminated by the compiler (use blackHole function if necessary).
⛔️⏱ Diffing.Similar execution took 0 μs.
Ensure the workload of Diffing.Similar has a properly measurable size (runtime > 20 μs) and is not eliminated by the compiler (use blackHole function if necessary).
⛔️⏱ Diffing.PangramToAlphabet execution took 0 μs.
Ensure the workload of Diffing.PangramToAlphabet has a properly measurable size (runtime > 20 μs) and is not eliminated by the compiler (use blackHole function if necessary).
⛔️⏱ Diffing.Same execution took 0 μs.
Ensure the workload of Diffing.Same has a properly measurable size (runtime > 20 μs) and is not eliminated by the compiler (use blackHole function if necessary).
⛔️⏱ Diffing.Myers.Similar execution took 0 μs.
Ensure the workload of Diffing.Myers.Similar has a properly measurable size (runtime > 20 μs) and is not eliminated by the compiler (use blackHole function if necessary).
⛔️⏱ Diffing.ReversedLorem execution took 0 μs.
Ensure the workload of Diffing.ReversedLorem has a properly measurable size (runtime > 20 μs) and is not eliminated by the compiler (use blackHole function if necessary).
⛔️⏱ Diffing.ReversedAlphabets execution took 0 μs.
Ensure the workload of Diffing.ReversedAlphabets has a properly measurable size (runtime > 20 μs) and is not eliminated by the compiler (use blackHole function if necessary).
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@numist numist merged commit 8a849b3 into swiftlang:master Jul 17, 2019
@numist
Copy link
Member Author

numist commented Jul 17, 2019

How did @swift-ci come up with zero run time for the diffing benchmarks after the local benchmark check report indicated everything was fine?

@numist
Copy link
Member Author

numist commented Jul 17, 2019

Ohhh, I bet it was not actually appropriate to use blackHole around the work function. @palimondo?

@lorentey
Copy link
Member

At worst, blackHole should slow things down, not speed things up. The use of blackHole in the run_ functions seems entirely appropriate to me.

(It looks somewhat out of place in the setup closures, but it won't cause harm there, either.)

@lorentey
Copy link
Member

What's likely happening is that the optimizer is clever enough to recognize that the same input is repeatedly passed to the same pure function, and it moves the function call outside the loop. If so, the identity helper should prevent that:

blackHole(diff(identity(arg1), identity(arg2))

@lorentey
Copy link
Member

This highlights an interesting point -- since the Myers algorithm is implemented internal to the benchmark, it has an unfair advantage over the stdlib: it can be specialized to the concrete element type, which can considerably speed things up.

@palimondo
Copy link
Contributor

I’m pretty sure the zero runtimes are caused by CI machines running on stable macOS 10.14 and not the 10.15 Beta, so it is the expected result of the availability checks we’ve been discussing in #25808.

@numist
Copy link
Member Author

numist commented Jul 18, 2019

Oh I see, the blackHole function prevents code elimination by the compiler.

This highlights an interesting point -- since the Myers algorithm is implemented internal to the benchmark, it has an unfair advantage over the stdlib: it can be specialized to the concrete element type, which can considerably speed things up.

I think this is ok since the Myers benchmark measures Swift vs the Diffing benchmark, which measures Collection.difference()

I’m pretty sure the zero runtimes are caused by CI machines running on stable macOS 10.14 and not the 10.15 Beta, so it is the expected result of the availability checks we’ve been discussing in #25808.

That makes sense, especially since that I got no such warnings from Benchmark_Driver check on a 10.15 system. We could put the BenchmarkInfo instances inside an availability check to omit them from the results, or just revisit this once 10.15 has GMed if the zeroes aren't hurting anything.

@palimondo
Copy link
Contributor

The zeros are not hurting anything directly, it’s just that these benchmarks also provide no value here until the machines upgraded to 10.15. So I still question the way we do the availability checks in these benchmarks.

What @lorentey explained in #25808 makes sense for app developers using the new diffing API, but I’m not convinced it applies equally to these benchmarks. I think the Swift Benchmark Suite’s use and usefulness outside of guarding compiler and stdlib development is rather theoretical and should not be used as argument to hamper this primary use case.

Can we please re-examine this issue? CC @eeckstein @jrose-apple

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants