[stdlib] fast path for UMBP.initialize<C>(from: C) when C is a Slice #38677

Merged
merged 5 commits into from Aug 3, 2021

glessard
Contributor

@glessard glessard commented Jul 28, 2021

`UnsafeMutableBufferPointer.initialize(from:)` initializes a buffer from a `Sequence`. Many standard library collections have a fast path that performs the multi-element copy in a single call, but that fast path is currently defeated when the collection is wrapped in a `Slice`. This PR implements `Slice._copyContents`, which attempts a fast copy via `withContiguousStorageIfAvailable` before falling back to an element-by-element copy.

Resolves SR-14491 (rdar://76728166)
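
As a rough illustration of the call this PR speeds up, the sketch below fills a freshly allocated buffer from a `Slice` through `initialize(from:)`. The helper name `copyThroughBuffer` is hypothetical and exists only for this example; it is not part of the standard library or of this PR.

```swift
// Hypothetical helper for illustration only.
// Copies `source` into a temporary buffer and returns the buffer's contents.
func copyThroughBuffer<C: Collection>(_ source: C) -> [C.Element] {
  let buffer = UnsafeMutableBufferPointer<C.Element>.allocate(capacity: source.count)
  defer { buffer.deallocate() }

  // initialize(from:) returns an iterator over any elements that did not fit,
  // plus the index just past the last element it initialized.
  let result = buffer.initialize(from: source)
  var remainder = result.0
  let end = result.1
  precondition(remainder.next() == nil, "buffer was sized to hold every element")

  // Copy the initialized prefix out, then deinitialize it before deallocation.
  let contents = Array(buffer[..<end])
  buffer.baseAddress?.deinitialize(count: end)
  return contents
}

// A Slice wrapping an Array: the case the description says was missing a fast path.
let slice = Slice(base: Array(0..<8), bounds: 2..<6)
let copied = copyThroughBuffer(slice)
```

The observable result is the same with or without the fast path; only the cost of the copy should differ.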

@glessard glessard requested a review from lorentey July 28, 2021 21:09
`Sequence._copyContents(initializing:)` is the function relied on by
`UnsafeMutableBufferPointer` for performant initialization from Collections.
Until now, `Slice<Base: Collection>` has not had its own implementation,
and therefore fell back to the default version implemented on `Sequence`.

This implementation adds an attempted fast path, using
`withContiguousStorageIfAvailable`. If that fails, the `Sequence`
algorithm is used, as before.

This resolves https://bugs.swift.org/browse/SR-14491
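
The fast-path-then-fallback shape described in the commit message can be sketched with public API only. This is a simplified stand-in, not the code merged in this PR: the free function `copyContents(of:initializing:)` is a hypothetical name, it returns only the number of elements copied, and it silently stops at the buffer's capacity.

```swift
// Hypothetical sketch of "try contiguous storage, else copy element by element".
// Returns the number of elements written into `buffer`.
func copyContents<C: Collection>(
  of source: C,
  initializing buffer: UnsafeMutableBufferPointer<C.Element>
) -> Int {
  // Fast path: if the collection can expose contiguous storage,
  // copy the whole run of elements in a single call.
  if let copied = source.withContiguousStorageIfAvailable({ storage -> Int in
    let count = min(storage.count, buffer.count)
    guard count > 0,
          let src = storage.baseAddress,
          let dst = buffer.baseAddress else { return 0 }
    dst.initialize(from: src, count: count)
    return count
  }) {
    return copied
  }

  // Slow path: no contiguous storage, so initialize one element at a time.
  var copied = 0
  var iterator = source.makeIterator()
  while copied < buffer.count, let element = iterator.next() {
    // baseAddress is non-nil here because buffer.count > 0.
    (buffer.baseAddress! + copied).initialize(to: element)
    copied += 1
  }
  return copied
}
```

A `Slice` over an `Array` would be expected to take the first branch, while a `Slice` over a `Range<Int>` has no contiguous storage and takes the second.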
@glessard
Contributor Author

@swift-ci please benchmark

@swift-ci
Collaborator

Performance (x86_64): -O

Regression OLD NEW DELTA RATIO
DictionaryOfAnyHashableStrings_insert 2800 5096 +82.0% 0.55x
Set.isDisjoint.Box25 323 454 +40.6% 0.71x (?)
Breadcrumbs.MutatedUTF16ToIdx.ASCII 3 4 +33.3% 0.75x (?)
ObjectiveCBridgeStubFromNSDateRef 3750 4540 +21.1% 0.83x (?)
DictionaryKeysContainsNative 19 23 +21.1% 0.83x (?)
String.data.Medium 88 104 +18.2% 0.85x (?)
ObjectiveCBridgeStubFromNSDate 5550 6330 +14.1% 0.88x (?)
String.data.LargeUnicode 94 107 +13.8% 0.88x (?)
 
Improvement OLD NEW DELTA RATIO
BufferFillFromSlice 77 41 -46.8% 1.88x
FlattenListLoop 2200 1453 -34.0% 1.51x (?)
Data.init.Sequence.64kB.Count.I 52 35 -32.7% 1.49x
Data.append.Sequence.64kB.Count.I 52 35 -32.7% 1.49x
Data.append.Sequence.64kB.Count 52 35 -32.7% 1.49x
Data.init.Sequence.64kB.Count 52 35 -32.7% 1.49x
Data.init.Sequence.2047B.Count.I 89 63 -29.2% 1.41x
Data.init.Sequence.2049B.Count.I 89 63 -29.2% 1.41x
Data.init.Sequence.809B.Count 81 61 -24.7% 1.33x
Data.init.Sequence.809B.Count.I 81 61 -24.7% 1.33x
FlattenListFlatMap 5673 4411 -22.2% 1.29x (?)
Data.init.Sequence.511B.Count.I 93 74 -20.4% 1.26x (?)
Data.append.Sequence.809B.Count.I 98 78 -20.4% 1.26x (?)
Data.append.Sequence.809B.Count 97 78 -19.6% 1.24x (?)
Data.init.Sequence.513B.Count.I 93 78 -16.1% 1.19x (?)
RemoveWhereMoveInts 14 12 -14.3% 1.17x
LessSubstringSubstring 38 35 -7.9% 1.09x
EqualSubstringSubstring 38 35 -7.9% 1.09x (?)
EqualStringSubstring 38 35 -7.9% 1.09x
EqualSubstringSubstringGenericEquatable 38 35 -7.9% 1.09x (?)
EqualSubstringString 38 35 -7.9% 1.09x
LessSubstringSubstringGenericComparable 38 35 -7.9% 1.09x
Set.subtracting.Empty.Box 14 13 -7.1% 1.08x (?)
SortStringsUnicode 2805 2610 -7.0% 1.07x
Array2D 6480 6048 -6.7% 1.07x (?)

Code size: -O

Regression OLD NEW DELTA RATIO
RemoveWhere.o 16155 16571 +2.6% 0.97x
Queue.o 12119 12263 +1.2% 0.99x
 
Improvement OLD NEW DELTA RATIO
BufferFill.o 7940 7764 -2.2% 1.02x

Performance (x86_64): -Osize

Regression OLD NEW DELTA RATIO
Breadcrumbs.MutatedUTF16ToIdx.ASCII 3 4 +33.3% 0.75x (?)
ObjectiveCBridgeStubFromNSDate 5660 6400 +13.1% 0.88x (?)
ObjectiveCBridgeFromNSArrayAnyObjectForced 4160 4480 +7.7% 0.93x (?)
MapReduceShortString 13 14 +7.7% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
BufferFillFromSlice 77 41 -46.8% 1.88x
FlattenListFlatMap 5250 3677 -30.0% 1.43x (?)
Hanoi 3910 3550 -9.2% 1.10x
LessSubstringSubstring 38 35 -7.9% 1.09x
EqualSubstringSubstring 38 35 -7.9% 1.09x
EqualStringSubstring 38 35 -7.9% 1.09x (?)
EqualSubstringString 38 35 -7.9% 1.09x
LessSubstringSubstringGenericComparable 38 35 -7.9% 1.09x
ParseFloat.Double.Exp 13 12 -7.7% 1.08x (?)
Array2D 6736 6224 -7.6% 1.08x (?)
SortStringsUnicode 2810 2615 -6.9% 1.07x (?)

Code size: -Osize

Regression OLD NEW DELTA RATIO
BufferFill.o 7135 8118 +13.8% 0.88x
UTF8Decode.o 21531 21770 +1.1% 0.99x

Performance (x86_64): -Onone

Regression OLD NEW DELTA RATIO
ObjectiveCBridgeFromNSSetAnyObjectToStringForced 80000 87000 +8.7% 0.92x (?)
StringToDataMedium 5800 6250 +7.8% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
BufferFillFromSlice 51061 42 -99.9% 1215.71x
DictionaryBridgeToObjC_Access 1066 915 -14.2% 1.17x (?)

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).
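
The DELTA and RATIO columns appear to be a relative change and a speedup factor computed from OLD and NEW. That reading is inferred from the table values, not taken from the benchmark tooling, and the function names below are hypothetical. A small sketch using the `BufferFillFromSlice` row at `-O` (OLD 77, NEW 41):

```swift
// Assumed definitions, inferred from the table values above.
// DELTA: change of NEW relative to OLD, in percent (negative means faster).
func delta(old: Double, new: Double) -> Double {
  (new - old) / old * 100
}

// RATIO: OLD divided by NEW (greater than 1 means faster).
func ratio(old: Double, new: Double) -> Double {
  old / new
}

let bufferFillDelta = delta(old: 77, new: 41)  // about -46.8
let bufferFillRatio = ratio(old: 77, new: 41)  // about 1.88
```

Under this reading a regression shows a positive DELTA and a RATIO below 1, as in the `DictionaryOfAnyHashableStrings_insert` row. The reported figures may differ slightly from values recomputed this way, since OLD and NEW are shown rounded.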

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 8-Core Intel Xeon E5
  Processor Speed: 3 GHz
  Number of Processors: 1
  Total Number of Cores: 8
  L2 Cache (per Core): 256 KB
  L3 Cache: 25 MB
  Memory: 64 GB

@glessard
Contributor Author

@swift-ci Apple Silicon benchmark

@glessard
Contributor Author

@swift-ci please test

@swift-ci
Collaborator

Performance (arm64): -O

Regression OLD NEW DELTA RATIO
DictionaryOfAnyHashableStrings_insert 1330 2254 +69.5% 0.59x (?)
Set.isDisjoint.Box25 114 193 +69.3% 0.59x (?)
Set.isDisjoint.Int25 86 130 +51.2% 0.66x (?)
Set.isDisjoint.Int50 86 127 +47.7% 0.68x (?)
StringRemoveDupes 114 142 +24.6% 0.80x (?)
CharIteration_punctuatedJapanese_unicodeScalars 240 280 +16.7% 0.86x (?)
SetIsSubsetBox25 84 95 +13.1% 0.88x (?)
ArrayAppend 390 440 +12.8% 0.89x (?)
ConvertFloatingPoint.MockFloat64ToDouble 9 10 +11.1% 0.90x (?)
DictionaryRemove 1790 1930 +7.8% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
BufferFillFromSlice 46 11 -76.1% 4.18x (?)
LessSubstringSubstring 21 18 -14.3% 1.17x (?)
EqualSubstringSubstring 21 18 -14.3% 1.17x (?)
EqualStringSubstring 21 18 -14.3% 1.17x (?)
EqualSubstringSubstringGenericEquatable 21 18 -14.3% 1.17x (?)
EqualSubstringString 21 18 -14.3% 1.17x (?)
LessSubstringSubstringGenericComparable 21 18 -14.3% 1.17x (?)
Set.filter.Int100.24k 855 784 -8.3% 1.09x (?)
Dict.CopyKeyValue.24k 863 795 -7.9% 1.09x (?)
Set.filter.Int100.20k 731 678 -7.3% 1.08x (?)

Code size: -O

Regression OLD NEW DELTA RATIO
RemoveWhere.o 13988 14348 +2.6% 0.97x
Queue.o 11757 11877 +1.0% 0.99x
 
Improvement OLD NEW DELTA RATIO
BufferFill.o 7314 7142 -2.4% 1.02x

Performance (arm64): -Osize

Improvement OLD NEW DELTA RATIO
BufferFillFromSlice 46 11 -76.1% 4.18x
ArrayAppendOptionals 930 410 -55.9% 2.27x (?)
EqualStringSubstring 21 18 -14.3% 1.17x (?)
EqualSubstringString 21 18 -14.3% 1.17x (?)
LessSubstringSubstring 22 19 -13.6% 1.16x (?)
EqualSubstringSubstring 22 19 -13.6% 1.16x (?)
EqualSubstringSubstringGenericEquatable 22 19 -13.6% 1.16x (?)
LessSubstringSubstringGenericComparable 22 19 -13.6% 1.16x (?)

Code size: -Osize

Regression OLD NEW DELTA RATIO
BufferFill.o 7646 8818 +15.3% 0.87x

Performance (arm64): -Onone

Regression OLD NEW DELTA RATIO
ParseFloat.Float.Exp 7 8 +14.3% 0.88x (?)
ArrayAppendOptionals 380 420 +10.5% 0.90x (?)
RecursiveOwnedParameter 3097 3421 +10.5% 0.91x (?)
DistinctClassFieldAccesses 580 624 +7.6% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
BufferFillFromSlice 20494 11 -99.9% 1862.92x
IterateData 1613 1417 -12.2% 1.14x
DataAppendDataLargeToLarge 13800 12600 -8.7% 1.10x (?)
LessSubstringSubstring 39 36 -7.7% 1.08x (?)
EqualSubstringSubstring 39 36 -7.7% 1.08x (?)
EqualStringSubstring 39 36 -7.7% 1.08x (?)
EqualSubstringSubstringGenericEquatable 39 36 -7.7% 1.08x (?)
EqualSubstringString 39 36 -7.7% 1.08x (?)
LessSubstringSubstringGenericComparable 39 36 -7.7% 1.08x (?)

Code size: -swiftlibs

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini9,1
  Total Number of Cores: 8 (4 performance and 4 efficiency)
  Memory: 16 GB

@lorentey
Member

Nice! Cc @karwa -- he has been interested in getting these fixed.

stdlib/public/core/Sequence.swift (outdated, resolved)
stdlib/public/core/Slice.swift (outdated, resolved)
stdlib/public/core/Slice.swift (outdated, resolved)
@swift-ci
Collaborator

Build failed
Swift Test OS X Platform
Git Sha - 2bc053f

@glessard
Contributor Author

glessard commented Aug 2, 2021

@swift-ci please smoke test

@glessard
Contributor Author

glessard commented Aug 2, 2021

@swift-ci please test macOS platform

@glessard glessard merged commit 92335b1 into apple:main Aug 3, 2021
@glessard glessard deleted the sr14491-v1 branch August 3, 2021 17:28
@karwa
Contributor

karwa commented Aug 9, 2021

Thanks a lot for taking this on, @glessard!
