How to read the data
The tables contain differences in performance which are larger than 8% and
differences in code size which are larger than 1%.
If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.
Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).
Hardware Overview
Model Name: Mac Pro
Model Identifier: MacPro6,1
Processor Name: 8-Core Intel Xeon E5
Processor Speed: 3 GHz
Number of Processors: 1
Total Number of Cores: 8
L2 Cache (per Core): 256 KB
L3 Cache: 25 MB
Memory: 64 GB
In assembly we could do better by using off-the-ends vector reads and masking, but that's not allowed in the Swift memory model.
On some uArches, we could do better by using unaligned leading and trailing loads, plus overlapping aligned loads while staying within the Swift model, but this is still a reasonable start.
On Apple's arm64 cores, this is probably not a win when the distribution of string lengths skews small: we have more integer ILP than on x86, but it takes longer to bring SIMD comparison results back to the GPRs. We may therefore want this to be x86-only, or to have a tunable crossover.
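A minimal sketch of what a tunable crossover could look like, assuming the function being vectorized is an all-ASCII byte scan (the PR text here does not name it). Every name below (`ASCIIScanTuning`, `allASCIIScalar`, `allASCIIVector`, `allASCII`) is hypothetical, and the threshold values are unmeasured placeholders, not benchmark results.

```swift
// Hypothetical tuning knob: inputs shorter than `simdCrossover` bytes take
// the scalar path. The numbers are placeholders and would need benchmarking.
enum ASCIIScanTuning {
    #if arch(arm64)
    static let simdCrossover = 64   // assumption: SIMD pays off later on arm64
    #else
    static let simdCrossover = 16   // assumption: SIMD pays off early on x86
    #endif
}

// Scalar path: check one byte at a time.
func allASCIIScalar(_ bytes: UnsafeBufferPointer<UInt8>) -> Bool {
    return bytes.allSatisfy { $0 < 0x80 }
}

// Vector path: 16-byte unaligned loads, then a scalar loop for the tail.
func allASCIIVector(_ bytes: UnsafeBufferPointer<UInt8>) -> Bool {
    guard let base = bytes.baseAddress else { return true }
    let raw = UnsafeRawPointer(base)
    var offset = 0
    while offset + 16 <= bytes.count {
        let chunk = raw.loadUnaligned(fromByteOffset: offset, as: SIMD16<UInt8>.self)
        // Any lane with the high bit set is a non-ASCII byte.
        if chunk.max() >= 0x80 { return false }
        offset += 16
    }
    while offset < bytes.count {
        if bytes[offset] >= 0x80 { return false }
        offset += 1
    }
    return true
}

// Dispatch on input length using the tunable crossover.
func allASCII(_ bytes: UnsafeBufferPointer<UInt8>) -> Bool {
    if bytes.count < ASCIIScanTuning.simdCrossover {
        return allASCIIScalar(bytes)
    }
    return allASCIIVector(bytes)
}
```

Setting the crossover to `Int.max` on a given architecture would effectively make the SIMD path x86-only, which is the other option mentioned above.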
Still, a perfectly reasonable start.
However, if you feel like doing more: under very reasonable assumptions about the string-length distribution, aligning accesses is probably not worthwhile on any recent uArch. Dropping the alignment step would let you eliminate the initial scalar loop and reduce the time spent in scalar code. It would be worth benchmarking that approach.
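The alignment-free approach suggested here could be sketched roughly as follows, again assuming an all-ASCII byte scan as the target (the function name `allASCIIUnaligned` is hypothetical and is not the code from the PR). Every load stays inside the buffer: the tail is handled by one extra load of the last 16 bytes, which overlaps bytes already checked instead of reading off the end.

```swift
// Hypothetical helper: returns true when every byte in `bytes` is ASCII.
// No alignment prologue; scalar code is used only for inputs under 16 bytes.
func allASCIIUnaligned(_ bytes: UnsafeBufferPointer<UInt8>) -> Bool {
    let count = bytes.count
    let width = 16
    guard let base = bytes.baseAddress, count > 0 else { return true }

    // Inputs shorter than one vector cannot be loaded without reading past
    // the end, so they fall back to a scalar check.
    if count < width {
        for i in 0..<count where bytes[i] >= 0x80 { return false }
        return true
    }

    let raw = UnsafeRawPointer(base)
    // OR all loaded vectors together; a high bit set anywhere survives.
    var accumulated = SIMD16<UInt8>(repeating: 0)

    var offset = 0
    while offset + width <= count {
        accumulated |= raw.loadUnaligned(fromByteOffset: offset, as: SIMD16<UInt8>.self)
        offset += width
    }

    // Trailing load: the final 16 bytes of the buffer. It overlaps the
    // previous load when `count` is not a multiple of 16, which is harmless
    // because re-checking bytes does not change the result.
    accumulated |= raw.loadUnaligned(fromByteOffset: count - width, as: SIMD16<UInt8>.self)

    return accumulated.max() < 0x80
}

// Usage: scan the UTF-8 bytes of a string.
let isASCII = Array("a perfectly reasonable start".utf8)
    .withUnsafeBufferPointer(allASCIIUnaligned)
```

Accumulating with OR and testing once at the end avoids moving a comparison result back to the general-purpose registers on every iteration, at the cost of never exiting early; whether that trade is a win depends on the input distribution and would need benchmarking.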
SIMD-izes internal func in StringCreate.swift