[string] SIMD-izes internal func in StringCreate.swift #30300

valeriyvan · 2020-03-09T19:37:01Z

SIMD-izes internal func in StringCreate.swift

milseman · 2020-03-09T20:36:01Z

@swift-ci please benchmark

swift-ci · 2020-03-09T21:37:54Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
MapReduceClass2	33	38	+15.2%	0.87x
StringComparison_nonBMPSlowestPrenormal	1270	1410	+11.0%	0.90x
RemoveWhereSwapInts	53	58	+9.4%	0.91x (?)
ArrayPlusEqualFiveElementCollection	6919	7511	+8.6%	0.92x (?)
Array2D	6224	6752	+8.5%	0.92x
StringComparison_emoji	684	740	+8.2%	0.92x (?)
MapReduceAnyCollection	330	356	+7.9%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
UTF8Decode_InitDecoding	227	172	-24.2%	1.32x
UTF8Decode_InitFromData	219	171	-21.9%	1.28x
UTF8Decode_InitFromBytes	230	183	-20.4%	1.26x
LazilyFilteredArrayContains	32100	26000	-19.0%	1.23x
ArrayAppendLazyMap	6240	5420	-13.1%	1.15x
NSStringConversion.MutableCopy.Rebridge.Long	907	797	-12.1%	1.14x (?)
UTF8Decode_InitFromData_ascii	239	212	-11.3%	1.13x (?)
NSStringConversion.MutableCopy.UTF8	772	712	-7.8%	1.08x (?)
NSStringConversion.MutableCopy.Rebridge.UTF8	670	620	-7.5%	1.08x (?)
Set.isDisjoint.Int.Empty	82	76	-7.3%	1.08x (?)
Set.isStrictSubset.Seq.Int.Empty	180	167	-7.2%	1.08x (?)
Set.isSubset.Seq.Int.Empty	184	171	-7.1%	1.08x (?)
Set.isSuperset.Seq.Empty.Int	74	69	-6.8%	1.07x (?)
Set.isSubset.Int.Empty	75	70	-6.7%	1.07x (?)
Set.isStrictSubset.Int.Empty	75	70	-6.7%	1.07x (?)
Set.isDisjoint.Seq.Int.Empty	76	71	-6.6%	1.07x (?)
Set.isDisjoint.Seq.Box.Empty	137	128	-6.6%	1.07x (?)

Code size: -O

Regression	OLD	NEW	DELTA	RATIO
ReduceInto.o	10932	14012	+28.2%	0.78x
MapReduce.o	27943	29151	+4.3%	0.96x
DiffingMyers.o	6499	6635	+2.1%	0.98x
RandomTree.o	12391	12591	+1.6%	0.98x
StringWalk.o	40081	40673	+1.5%	0.99x
RC4.o	3927	3983	+1.4%	0.99x
LazyFilter.o	8063	8177	+1.4%	0.99x

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
DropWhileAnySeqCntRange	146	177	+21.2%	0.82x
MapReduceAnyCollection	362	413	+14.1%	0.88x
StringComparison_emoji	684	760	+11.1%	0.90x
DropFirstAnyCollection	147	163	+10.9%	0.90x (?)
PrefixWhileAnyCollectionLazy	143	158	+10.5%	0.91x (?)
DropWhileAnyCollection	163	179	+9.8%	0.91x (?)
DictionaryBridgeToObjC_Access	752	824	+9.6%	0.91x (?)
SuffixAnyCollection	53	58	+9.4%	0.91x
ArrayPlusEqualFiveElementCollection	6919	7511	+8.6%	0.92x (?)
Array2D	6224	6752	+8.5%	0.92x (?)
StringComparison_nonBMPSlowestPrenormal	1300	1410	+8.5%	0.92x (?)
DropFirstAnySeqCRangeIterLazy	191	207	+8.4%	0.92x
MapReduceSequence	565	611	+8.1%	0.92x (?)
DropLastAnySeqCntRange	618	668	+8.1%	0.93x (?)
PrefixAnySequenceLazy	1416	1530	+8.1%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
MapReduceLazyCollectionShort	76	35	-53.9%	2.17x
UTF8Decode_InitDecoding	227	171	-24.7%	1.33x
UTF8Decode_InitFromData	217	168	-22.6%	1.29x
UTF8Decode_InitFromBytes	229	181	-21.0%	1.27x
ArrayAppendLazyMap	7070	6040	-14.6%	1.17x
Set.isSubset.Int.Empty	82	71	-13.4%	1.15x
Set.isDisjoint.Int.Empty	88	78	-11.4%	1.13x
CharacterLiteralsLarge	100	89	-11.0%	1.12x
PrefixAnySeqCRangeIterLazy	158	142	-10.1%	1.11x (?)
PrefixAnySeqCntRangeLazy	158	142	-10.1%	1.11x
PrefixAnyCollection	163	147	-9.8%	1.11x
UTF8Decode_InitFromData_ascii	239	217	-9.2%	1.10x (?)
Set.isSubset.Seq.Int.Empty	188	171	-9.0%	1.10x (?)
Set.isDisjoint.Seq.Box.Empty	137	125	-8.8%	1.10x (?)
DataAccessBytesMedium	92	84	-8.7%	1.10x
Set.isStrictSuperset.Seq.Empty.Int	261	239	-8.4%	1.09x (?)
UTF8Decode_InitDecoding_ascii	251	231	-8.0%	1.09x (?)
Set.subtracting.Seq.Empty.Int	176	162	-8.0%	1.09x (?)
UTF8Decode_InitDecoding_ascii_as_ascii	260	241	-7.3%	1.08x (?)
DropFirstAnySeqCntRangeLazy	206	191	-7.3%	1.08x
ObjectiveCBridgeStubNSDateRefAccess	359	333	-7.2%	1.08x (?)
UTF8Decode_InitFromBytes_ascii	294	273	-7.1%	1.08x (?)
Set.subtracting.Empty.Box	14	13	-7.1%	1.08x (?)
Set.isStrictSubset.Seq.Int.Empty	187	174	-7.0%	1.07x (?)
CharacterLiteralsSmall	310	289	-6.8%	1.07x (?)
NSStringConversion.MutableCopy.Rebridge.Long	874	816	-6.6%	1.07x (?)
Set.subtracting.Seq.Empty.Box	183	171	-6.6%	1.07x (?)

Code size: -Osize

Regression	OLD	NEW	DELTA	RATIO
Array2D.o	2724	2804	+2.9%	0.97x
ChainedFilterMap.o	3405	3477	+2.1%	0.98x
RC4.o	3429	3501	+2.1%	0.98x
LazyFilter.o	7583	7687	+1.4%	0.99x
DiffingMyers.o	6659	6747	+1.3%	0.99x

Improvement	OLD	NEW	DELTA	RATIO
ReversedCollections.o	8777	8633	-1.6%	1.02x
SortArrayInClass.o	2958	2911	-1.6%	1.02x
Diffing.o	8611	8515	-1.1%	1.01x
RandomShuffle.o	3878	3838	-1.0%	1.01x

Performance: -Onone

Regression	OLD	NEW	DELTA	RATIO
ReversedBidirectional	64617	73360	+13.5%	0.88x (?)

Improvement	OLD	NEW	DELTA	RATIO
FloatingPointPrinting_Float_description_uniform	20800	16200	-22.1%	1.28x (?)
UTF8Decode_InitFromData	218	171	-21.6%	1.27x
UTF8Decode_InitFromBytes	232	182	-21.6%	1.27x
UTF8Decode_InitDecoding	265	216	-18.5%	1.23x (?)
NSStringConversion.MutableCopy.Rebridge.UTF8	745	653	-12.3%	1.14x (?)
StringMatch	45900	40800	-11.1%	1.12x (?)
ArrayOfPOD	986	912	-7.5%	1.08x (?)
NSStringConversion.MutableCopy.Rebridge.Long	939	870	-7.3%	1.08x (?)
UTF8Decode_InitFromBytes_ascii	312	290	-7.1%	1.08x (?)
CStringLongAscii	249	232	-6.8%	1.07x (?)
UTF8Decode_InitFromData_ascii	251	234	-6.8%	1.07x (?)
UTF8Decode_InitDecoding_ascii_as_ascii	286	267	-6.6%	1.07x (?)

Code size: -swiftlibs

Regression	OLD	NEW	DELTA	RATIO
libswiftCoreAudio.dylib	20480	24576	+20.0%	0.83x

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 8-Core Intel Xeon E5
  Processor Speed: 3 GHz
  Number of Processors: 1
  Total Number of Cores: 8
  L2 Cache (per Core): 256 KB
  L3 Cache: 25 MB
  Memory: 64 GB

milseman · 2020-03-10T20:49:22Z

@stephentyrone, thoughts on the implementation? Is there a better or more general approach to the alignment issues?

stephentyrone · 2020-03-11T01:07:03Z

In assembly we could do better by using off-the-ends vector reads and masking, but that's not allowed in the Swift memory model.
On some uArches, we could do better by using unaligned leading and trailing loads, plus overlapping aligned loads while staying within the Swift model, but this is still a reasonable start.
On Apple's arm64 cores, this is probably not a win when the distribution of strings skews small (we have more integer ILP compared to x86 but it takes longer to bring SIMD comparison results back to the GPRs), so we may want this to be x86-only or have a tunable crossover.

Still, a perfectly reasonable start.

However, if you feel like doing more, under very reasonable assumptions about string length distribution, it is probably not advantageous to bother aligning accesses on any recent uArch, allowing you to eliminate the initial scalar loop, and reducing the amount of time spent in scalar code. It would be worth benchmarking that approach.

shahmishal · 2020-10-01T06:56:34Z

Please update the base branch to main by Oct 5th otherwise the pull request will be closed automatically.

How to change the base branch: (Link)
More detail about the branch update: (Link)

valeriyvan added 2 commits March 8, 2020 13:00

SIMD-izes _allASCII internal func in StringCreate.swift

7e33f08

Reads SIMD4<UInt> by pointer

09ced1e

theblixguy requested a review from milseman March 9, 2020 19:41

Adds TODO

e228c40

valeriyvan changed the title ~~SIMD-izes internal func in StringCreate.swift~~ [string] SIMD-izes internal func in StringCreate.swift Mar 9, 2020

milseman requested a review from stephentyrone March 9, 2020 20:36

valeriyvan marked this pull request as draft April 15, 2020 10:31

valeriyvan changed the base branch from master to main October 1, 2020 08:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[string] SIMD-izes internal func in StringCreate.swift #30300

[string] SIMD-izes internal func in StringCreate.swift #30300

valeriyvan commented Mar 9, 2020

milseman commented Mar 9, 2020

swift-ci commented Mar 9, 2020

milseman commented Mar 10, 2020

stephentyrone commented Mar 11, 2020 •

edited

shahmishal commented Oct 1, 2020

[string] SIMD-izes internal func in StringCreate.swift #30300

Are you sure you want to change the base?

[string] SIMD-izes internal func in StringCreate.swift #30300

Conversation

valeriyvan commented Mar 9, 2020

milseman commented Mar 9, 2020

swift-ci commented Mar 9, 2020

Performance: -O

Code size: -O

Performance: -Osize

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

milseman commented Mar 10, 2020

stephentyrone commented Mar 11, 2020 • edited

shahmishal commented Oct 1, 2020

stephentyrone commented Mar 11, 2020 •

edited