Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[benchmark] Add insert(_:Character) benchmark with ASCII and non-ASCII characters #19941

Merged
merged 3 commits into from Oct 25, 2018

Conversation

BalestraPatrick
Copy link
Contributor

This PR adds some new benchmarks for insert(_:Character). This is a small part of an effort described in SR-8905 to expand the benchmarking suite.
I've tried my best to keep every benchmark ~1000μs as suggested.

Here you can find a short explanation of every benchmark for context:

  • InsertCharacterEndIndex: adds a character to the end of the String. This is a stretch benchmark since it's covered by ArrayAppend but I think it could still be useful to make sure they keep the same performance.
  • InsertCharacterTowardsEndIndex: as suggested by @milseman, this benchmark inserts a character towards the end of the collection. The pointer of insertion is moved back to the end every 1000 insertions to avoid a quadratic complexity of the operation.
  • InsertCharacterStartIndex: as discussed in the comments of the JIRA ticket, this is a possible way of benchmarking the insertion at the start. We do the same operation multiple times by resetting the string to the initial value. In this way, only the inner loop has quadratic complexity to shift all the characters when inserting at the start. The outer loop is then used by the driver to scale linearly the test.
  • InsertCharacterEndIndexNonASCII: similar to InsertCharacterEndIndex but inserting an emoji comprised of 4 unicode scalars.
  • InsertCharacterTowardsEndIndexNonASCII: similar to InsertCharacterTowardsEndIndex but inserting an emoji comprised of 4 unicode scalars.
  • InsertCharacterStartIndexNonASCII: similar to InsertCharacterStartIndex but inserting an emoji comprised of 4 unicode scalars.

This is my first PR to the project, so let me know if there's anything I've missed. Any kind of feedback is appreciated 👍

Resolves SR-8908.

@harlanhaskins
Copy link
Collaborator

@swift-ci please benchmark

@harlanhaskins
Copy link
Collaborator

Hi @BalestraPatrick! Welcome to the Swift project! I've kicked off a benchmark to see how we do, and I've tagged @moiseev and @stephentyrone as reviewers. Thanks for your contribution!

@swift-ci
Copy link
Collaborator

Build comment file:

Performance: -O

TEST OLD NEW DELTA RATIO
Improvement
Array2D 7510 6914 -7.9% 1.09x
MapReduceAnyCollection 398 370 -7.0% 1.08x
MapReduce 426 398 -6.6% 1.07x
Added
InsertCharacterEndIndex 826 829 828
InsertCharacterEndIndexNonASCII 794 802 798
InsertCharacterStartIndex 1484 1570 1513
InsertCharacterStartIndexNonASCII 1054 1055 1055
InsertCharacterTowardsEndIndex 896 918 904
InsertCharacterTowardsEndIndexNonASCII 818 819 818

Performance: -Osize

TEST OLD NEW DELTA RATIO
Improvement
Array2D 7208 6612 -8.3% 1.09x
Added
InsertCharacterEndIndex 830 831 831
InsertCharacterEndIndexNonASCII 813 834 820
InsertCharacterStartIndex 1507 1572 1529
InsertCharacterStartIndexNonASCII 1053 1058 1055
InsertCharacterTowardsEndIndex 920 921 921
InsertCharacterTowardsEndIndexNonASCII 852 863 856

Performance: -Onone

TEST MIN MAX MEAN MAX_RSS
Added
InsertCharacterEndIndex 324 325 324
InsertCharacterEndIndexNonASCII 501 503 502
InsertCharacterStartIndex 828 830 829
InsertCharacterStartIndexNonASCII 630 630 630
InsertCharacterTowardsEndIndex 350 351 350
InsertCharacterTowardsEndIndexNonASCII 510 511 510
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false alarms. Unexpected regressions which are marked with '(?)' are probably noise. If you see regressions which you cannot explain you can try to run the benchmarks again. If regressions still show up, please consult with the performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

Copy link
Member

@eeckstein eeckstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@harlanhaskins
Copy link
Collaborator

@swift-ci please test and merge

@harlanhaskins
Copy link
Collaborator

Thanks so much for contributing, @BalestraPatrick!

@harlanhaskins
Copy link
Collaborator

@swift-ci please test macOS platform

Copy link
Contributor

@milseman milseman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really glad to see this! I have a couple concerns about hardening this to avoid unintended optimizations, and concerts about CheckResults dominating run time. Did you check a profile, and if so, did you see String.count taking up significant time vs the actual insert?

}

@inline(never)
func run_InsertCharacterEndIndex(_ N: Int) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eeckstein, ideally we could make this scenario into a SILOptimizer test case demonstrating equivalent code to append, rather than a benchmark. But I don't now how to do that, and we may want resilience barriers in the way (though we can always add an inlinable fast-path). Any ideas?

let character: Character = "a"
let scale = 3000
for _ in 0..<N * scale {
workload.insert(character, at: workload.endIndex)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say: workload.insert(identity(character), at: identity(workload.endIndex)). That would make this a little more realistic as then the optimizer cannot know that every iteration of the loop is inserting the exact same character into the exact same (relative) position. Maybe even add a getCharacter to TestUtils.swift if the cross-module generic adds any real cost.

Similarly for the other benchmarks, and for the var workload = str above. What do you think, @aschwaighofer and @eeckstein ?

workload.insert(character, at: workload.endIndex)
}
blackHole(workload)
CheckResults(workload.count == str.count + N * scale)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little worried that this check could dominate the benchmark, as count is linear on the very String we're building up, and might be a more sophisticated operation than even just append. This may just not be a check we can do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have more details on how to use the profiler to gain more insights on how much CheckResults takes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The easiest way, if you have this extracted into a stand-alone file, you can run something like:

swiftc -O my_benchmark.swift && instruments -t "Time Profiler" ./my_benchmark. Then check where time is being spent for each of these benchmark functions. You'll have to read a little into things since the results are post-inlining, but basically it's a sanity check that the benchmark is spending time in the thing you're trying to benchmark, i.e. insert.

If you don't have a standalone file, you can still instrument the Benchmark_O binary, passing the specific benchmark you want as an argument on the command line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The count operation takes ~2-3% of the time in the benchmark after profiling it a couple of times. I'm removing all the CheckResults to be sure most of the time is spent in inserting rather than asserting the actual operation.

var index = workload.endIndex
let scale = 1000
for i in 0..<N * scale {
workload.insert(character, at: index)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can extract each of these 3 loop bodies into an @inline(__always) helper function, then ASCII vs nonASCII is just passing in a different Character.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, can add in the next commit!

@BalestraPatrick
Copy link
Contributor Author

@milseman Hey Michael, I pushed a new commit with some changes that you suggested 👍

Copy link
Contributor

@milseman milseman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good!

Nitpick: I think your last commit introduces tabs instead of spaces. Please replace them with spaces and then we're ready to merge!

Do you have a Twitter handle? I assume you're the same @BalestraPatrick on Twitter too. I'd to give you a shout-out if you're comfortable with that.

@BalestraPatrick
Copy link
Contributor Author

@milseman Not sure why my VS code didn't respect the preference anymore, I used Xcode now and set it to 2 spaces like all other benchmark files.

Totally happy with the shout-out, I'll also write a small blog post this week to get and try more people to contribute to the benchmark suite. I'll collect all the useful commands and information you've been sharing to help other people get started quickly 💯

@BalestraPatrick
Copy link
Contributor Author

@milseman Anything left to do before merging this one? 😄

Copy link
Contributor

@milseman milseman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@milseman
Copy link
Contributor

@swift-ci please smoke test

@milseman
Copy link
Contributor

I've been wanting to add a section with some of the "pro tips" to https://github.com/apple/swift/blob/master/docs/StandardLibraryProgrammersManual.md. I look forward to your blog post, and maybe we can incorporate some if it into the manual!

@milseman milseman merged commit 1c0778b into apple:master Oct 25, 2018
@BalestraPatrick BalestraPatrick deleted the insert-character-benchmark branch October 25, 2018 14:58
@palimondo
Copy link
Collaborator

@milseman From the benchmark results above, it appears that StartIndex and TowardsEndIndex variants of this benchmark are almost twice as fast in Onone than in the optimized builds. That seems like a rather strange de-optimization. Bug?

@milseman
Copy link
Contributor

CC @eeckstein. This is indeed surprising!

@eeckstein
Copy link
Member

No idea. It's worth looking at this in detail, e.g. by profiling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants