New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[benchmark] Add insert(_:Character) benchmark with ASCII and non-ASCII characters #19941
[benchmark] Add insert(_:Character) benchmark with ASCII and non-ASCII characters #19941
Conversation
@swift-ci please benchmark |
Hi @BalestraPatrick! Welcome to the Swift project! I've kicked off a benchmark to see how we do, and I've tagged @moiseev and @stephentyrone as reviewers. Thanks for your contribution! |
Build comment file:Performance: -O
Performance: -Osize
Performance: -Onone
How to read the dataThe tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.If you see any unexpected regressions, you should consider fixing the regressions before you merge the PR. Noise: Sometimes the performance results (not code size!) contain false alarms. Unexpected regressions which are marked with '(?)' are probably noise. If you see regressions which you cannot explain you can try to run the benchmarks again. If regressions still show up, please consult with the performance team (@eeckstein). Hardware Overview
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
@swift-ci please test and merge |
Thanks so much for contributing, @BalestraPatrick! |
@swift-ci please test macOS platform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really glad to see this! I have a couple concerns about hardening this to avoid unintended optimizations, and concerts about CheckResults
dominating run time. Did you check a profile, and if so, did you see String.count
taking up significant time vs the actual insert?
} | ||
|
||
@inline(never) | ||
func run_InsertCharacterEndIndex(_ N: Int) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eeckstein, ideally we could make this scenario into a SILOptimizer test case demonstrating equivalent code to append, rather than a benchmark. But I don't now how to do that, and we may want resilience barriers in the way (though we can always add an inlinable fast-path). Any ideas?
let character: Character = "a" | ||
let scale = 3000 | ||
for _ in 0..<N * scale { | ||
workload.insert(character, at: workload.endIndex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say: workload.insert(identity(character), at: identity(workload.endIndex))
. That would make this a little more realistic as then the optimizer cannot know that every iteration of the loop is inserting the exact same character into the exact same (relative) position. Maybe even add a getCharacter
to TestUtils.swift if the cross-module generic adds any real cost.
Similarly for the other benchmarks, and for the var workload = str
above. What do you think, @aschwaighofer and @eeckstein ?
workload.insert(character, at: workload.endIndex) | ||
} | ||
blackHole(workload) | ||
CheckResults(workload.count == str.count + N * scale) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little worried that this check could dominate the benchmark, as count
is linear on the very String we're building up, and might be a more sophisticated operation than even just append. This may just not be a check we can do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have more details on how to use the profiler to gain more insights on how much CheckResults
takes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The easiest way, if you have this extracted into a stand-alone file, you can run something like:
swiftc -O my_benchmark.swift && instruments -t "Time Profiler" ./my_benchmark
. Then check where time is being spent for each of these benchmark functions. You'll have to read a little into things since the results are post-inlining, but basically it's a sanity check that the benchmark is spending time in the thing you're trying to benchmark, i.e. insert
.
If you don't have a standalone file, you can still instrument the Benchmark_O
binary, passing the specific benchmark you want as an argument on the command line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The count
operation takes ~2-3% of the time in the benchmark after profiling it a couple of times. I'm removing all the CheckResults
to be sure most of the time is spent in inserting rather than asserting the actual operation.
var index = workload.endIndex | ||
let scale = 1000 | ||
for i in 0..<N * scale { | ||
workload.insert(character, at: index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can extract each of these 3 loop bodies into an @inline(__always)
helper function, then ASCII vs nonASCII is just passing in a different Character
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense, can add in the next commit!
@milseman Hey Michael, I pushed a new commit with some changes that you suggested 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good!
Nitpick: I think your last commit introduces tabs instead of spaces. Please replace them with spaces and then we're ready to merge!
Do you have a Twitter handle? I assume you're the same @BalestraPatrick on Twitter too. I'd to give you a shout-out if you're comfortable with that.
@milseman Not sure why my VS code didn't respect the preference anymore, I used Xcode now and set it to 2 spaces like all other benchmark files. Totally happy with the shout-out, I'll also write a small blog post this week to get and try more people to contribute to the benchmark suite. I'll collect all the useful commands and information you've been sharing to help other people get started quickly 💯 |
@milseman Anything left to do before merging this one? 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@swift-ci please smoke test |
I've been wanting to add a section with some of the "pro tips" to https://github.com/apple/swift/blob/master/docs/StandardLibraryProgrammersManual.md. I look forward to your blog post, and maybe we can incorporate some if it into the manual! |
@milseman From the benchmark results above, it appears that |
CC @eeckstein. This is indeed surprising! |
No idea. It's worth looking at this in detail, e.g. by profiling. |
This PR adds some new benchmarks for
insert(_:Character)
. This is a small part of an effort described in SR-8905 to expand the benchmarking suite.I've tried my best to keep every benchmark ~1000μs as suggested.
Here you can find a short explanation of every benchmark for context:
InsertCharacterEndIndex
: adds a character to the end of the String. This is a stretch benchmark since it's covered byArrayAppend
but I think it could still be useful to make sure they keep the same performance.InsertCharacterTowardsEndIndex
: as suggested by @milseman, this benchmark inserts a character towards the end of the collection. The pointer of insertion is moved back to the end every 1000 insertions to avoid a quadratic complexity of the operation.InsertCharacterStartIndex
: as discussed in the comments of the JIRA ticket, this is a possible way of benchmarking the insertion at the start. We do the same operation multiple times by resetting the string to the initial value. In this way, only the inner loop has quadratic complexity to shift all the characters when inserting at the start. The outer loop is then used by the driver to scale linearly the test.InsertCharacterEndIndexNonASCII
: similar toInsertCharacterEndIndex
but inserting an emoji comprised of 4 unicode scalars.InsertCharacterTowardsEndIndexNonASCII
: similar toInsertCharacterTowardsEndIndex
but inserting an emoji comprised of 4 unicode scalars.InsertCharacterStartIndexNonASCII
: similar toInsertCharacterStartIndex
but inserting an emoji comprised of 4 unicode scalars.This is my first PR to the project, so let me know if there's anything I've missed. Any kind of feedback is appreciated 👍
Resolves SR-8908.