Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[string] SIMD-izes internal func in StringCreate.swift #30300

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

valeriyvan
Copy link
Contributor

SIMD-izes internal func in StringCreate.swift

@theblixguy theblixguy requested a review from milseman March 9, 2020 19:41
@valeriyvan valeriyvan changed the title SIMD-izes internal func in StringCreate.swift [string] SIMD-izes internal func in StringCreate.swift Mar 9, 2020
@milseman
Copy link
Contributor

milseman commented Mar 9, 2020

@swift-ci please benchmark

@swift-ci
Copy link
Collaborator

swift-ci commented Mar 9, 2020

Performance: -O

Regression OLD NEW DELTA RATIO
MapReduceClass2 33 38 +15.2% 0.87x
StringComparison_nonBMPSlowestPrenormal 1270 1410 +11.0% 0.90x
RemoveWhereSwapInts 53 58 +9.4% 0.91x (?)
ArrayPlusEqualFiveElementCollection 6919 7511 +8.6% 0.92x (?)
Array2D 6224 6752 +8.5% 0.92x
StringComparison_emoji 684 740 +8.2% 0.92x (?)
MapReduceAnyCollection 330 356 +7.9% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
UTF8Decode_InitDecoding 227 172 -24.2% 1.32x
UTF8Decode_InitFromData 219 171 -21.9% 1.28x
UTF8Decode_InitFromBytes 230 183 -20.4% 1.26x
LazilyFilteredArrayContains 32100 26000 -19.0% 1.23x
ArrayAppendLazyMap 6240 5420 -13.1% 1.15x
NSStringConversion.MutableCopy.Rebridge.Long 907 797 -12.1% 1.14x (?)
UTF8Decode_InitFromData_ascii 239 212 -11.3% 1.13x (?)
NSStringConversion.MutableCopy.UTF8 772 712 -7.8% 1.08x (?)
NSStringConversion.MutableCopy.Rebridge.UTF8 670 620 -7.5% 1.08x (?)
Set.isDisjoint.Int.Empty 82 76 -7.3% 1.08x (?)
Set.isStrictSubset.Seq.Int.Empty 180 167 -7.2% 1.08x (?)
Set.isSubset.Seq.Int.Empty 184 171 -7.1% 1.08x (?)
Set.isSuperset.Seq.Empty.Int 74 69 -6.8% 1.07x (?)
Set.isSubset.Int.Empty 75 70 -6.7% 1.07x (?)
Set.isStrictSubset.Int.Empty 75 70 -6.7% 1.07x (?)
Set.isDisjoint.Seq.Int.Empty 76 71 -6.6% 1.07x (?)
Set.isDisjoint.Seq.Box.Empty 137 128 -6.6% 1.07x (?)

Code size: -O

Regression OLD NEW DELTA RATIO
ReduceInto.o 10932 14012 +28.2% 0.78x
MapReduce.o 27943 29151 +4.3% 0.96x
DiffingMyers.o 6499 6635 +2.1% 0.98x
RandomTree.o 12391 12591 +1.6% 0.98x
StringWalk.o 40081 40673 +1.5% 0.99x
RC4.o 3927 3983 +1.4% 0.99x
LazyFilter.o 8063 8177 +1.4% 0.99x

Performance: -Osize

Regression OLD NEW DELTA RATIO
DropWhileAnySeqCntRange 146 177 +21.2% 0.82x
MapReduceAnyCollection 362 413 +14.1% 0.88x
StringComparison_emoji 684 760 +11.1% 0.90x
DropFirstAnyCollection 147 163 +10.9% 0.90x (?)
PrefixWhileAnyCollectionLazy 143 158 +10.5% 0.91x (?)
DropWhileAnyCollection 163 179 +9.8% 0.91x (?)
DictionaryBridgeToObjC_Access 752 824 +9.6% 0.91x (?)
SuffixAnyCollection 53 58 +9.4% 0.91x
ArrayPlusEqualFiveElementCollection 6919 7511 +8.6% 0.92x (?)
Array2D 6224 6752 +8.5% 0.92x (?)
StringComparison_nonBMPSlowestPrenormal 1300 1410 +8.5% 0.92x (?)
DropFirstAnySeqCRangeIterLazy 191 207 +8.4% 0.92x
MapReduceSequence 565 611 +8.1% 0.92x (?)
DropLastAnySeqCntRange 618 668 +8.1% 0.93x (?)
PrefixAnySequenceLazy 1416 1530 +8.1% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
MapReduceLazyCollectionShort 76 35 -53.9% 2.17x
UTF8Decode_InitDecoding 227 171 -24.7% 1.33x
UTF8Decode_InitFromData 217 168 -22.6% 1.29x
UTF8Decode_InitFromBytes 229 181 -21.0% 1.27x
ArrayAppendLazyMap 7070 6040 -14.6% 1.17x
Set.isSubset.Int.Empty 82 71 -13.4% 1.15x
Set.isDisjoint.Int.Empty 88 78 -11.4% 1.13x
CharacterLiteralsLarge 100 89 -11.0% 1.12x
PrefixAnySeqCRangeIterLazy 158 142 -10.1% 1.11x (?)
PrefixAnySeqCntRangeLazy 158 142 -10.1% 1.11x
PrefixAnyCollection 163 147 -9.8% 1.11x
UTF8Decode_InitFromData_ascii 239 217 -9.2% 1.10x (?)
Set.isSubset.Seq.Int.Empty 188 171 -9.0% 1.10x (?)
Set.isDisjoint.Seq.Box.Empty 137 125 -8.8% 1.10x (?)
DataAccessBytesMedium 92 84 -8.7% 1.10x
Set.isStrictSuperset.Seq.Empty.Int 261 239 -8.4% 1.09x (?)
UTF8Decode_InitDecoding_ascii 251 231 -8.0% 1.09x (?)
Set.subtracting.Seq.Empty.Int 176 162 -8.0% 1.09x (?)
UTF8Decode_InitDecoding_ascii_as_ascii 260 241 -7.3% 1.08x (?)
DropFirstAnySeqCntRangeLazy 206 191 -7.3% 1.08x
ObjectiveCBridgeStubNSDateRefAccess 359 333 -7.2% 1.08x (?)
UTF8Decode_InitFromBytes_ascii 294 273 -7.1% 1.08x (?)
Set.subtracting.Empty.Box 14 13 -7.1% 1.08x (?)
Set.isStrictSubset.Seq.Int.Empty 187 174 -7.0% 1.07x (?)
CharacterLiteralsSmall 310 289 -6.8% 1.07x (?)
NSStringConversion.MutableCopy.Rebridge.Long 874 816 -6.6% 1.07x (?)
Set.subtracting.Seq.Empty.Box 183 171 -6.6% 1.07x (?)

Code size: -Osize

Regression OLD NEW DELTA RATIO
Array2D.o 2724 2804 +2.9% 0.97x
ChainedFilterMap.o 3405 3477 +2.1% 0.98x
RC4.o 3429 3501 +2.1% 0.98x
LazyFilter.o 7583 7687 +1.4% 0.99x
DiffingMyers.o 6659 6747 +1.3% 0.99x
 
Improvement OLD NEW DELTA RATIO
ReversedCollections.o 8777 8633 -1.6% 1.02x
SortArrayInClass.o 2958 2911 -1.6% 1.02x
Diffing.o 8611 8515 -1.1% 1.01x
RandomShuffle.o 3878 3838 -1.0% 1.01x

Performance: -Onone

Regression OLD NEW DELTA RATIO
ReversedBidirectional 64617 73360 +13.5% 0.88x (?)
 
Improvement OLD NEW DELTA RATIO
FloatingPointPrinting_Float_description_uniform 20800 16200 -22.1% 1.28x (?)
UTF8Decode_InitFromData 218 171 -21.6% 1.27x
UTF8Decode_InitFromBytes 232 182 -21.6% 1.27x
UTF8Decode_InitDecoding 265 216 -18.5% 1.23x (?)
NSStringConversion.MutableCopy.Rebridge.UTF8 745 653 -12.3% 1.14x (?)
StringMatch 45900 40800 -11.1% 1.12x (?)
ArrayOfPOD 986 912 -7.5% 1.08x (?)
NSStringConversion.MutableCopy.Rebridge.Long 939 870 -7.3% 1.08x (?)
UTF8Decode_InitFromBytes_ascii 312 290 -7.1% 1.08x (?)
CStringLongAscii 249 232 -6.8% 1.07x (?)
UTF8Decode_InitFromData_ascii 251 234 -6.8% 1.07x (?)
UTF8Decode_InitDecoding_ascii_as_ascii 286 267 -6.6% 1.07x (?)

Code size: -swiftlibs

Regression OLD NEW DELTA RATIO
libswiftCoreAudio.dylib 20480 24576 +20.0% 0.83x
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 8-Core Intel Xeon E5
  Processor Speed: 3 GHz
  Number of Processors: 1
  Total Number of Cores: 8
  L2 Cache (per Core): 256 KB
  L3 Cache: 25 MB
  Memory: 64 GB

@milseman
Copy link
Contributor

@stephentyrone, thoughts on the implementation? Is there a better or more general approach to the alignment issues?

@stephentyrone
Copy link
Member

stephentyrone commented Mar 11, 2020

  1. In assembly we could do better by using off-the-ends vector reads and masking, but that's not allowed in the Swift memory model.
  2. On some uArches, we could do better by using unaligned leading and trailing loads, plus overlapping aligned loads while staying within the Swift model, but this is still a reasonable start.
  3. On Apple's arm64 cores, this is probably not a win when the distribution of strings skews small (we have more integer ILP compared to x86 but it takes longer to bring SIMD comparison results back to the GPRs), so we may want this to be x86-only or have a tunable crossover.

Still, a perfectly reasonable start.

However, if you feel like doing more, under very reasonable assumptions about string length distribution, it is probably not advantageous to bother aligning accesses on any recent uArch, allowing you to eliminate the initial scalar loop, and reducing the amount of time spent in scalar code. It would be worth benchmarking that approach.

@valeriyvan valeriyvan marked this pull request as draft April 15, 2020 10:31
@shahmishal
Copy link
Member

Please update the base branch to main by Oct 5th otherwise the pull request will be closed automatically.

  • How to change the base branch: (Link)
  • More detail about the branch update: (Link)

@valeriyvan valeriyvan changed the base branch from master to main October 1, 2020 08:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants