stdlib: Code size improvements for Dictionary for -Osize #21306

eeckstein · 2018-12-13T22:16:59Z

The first change is to remove some @inline(__always) attributes. Those were added before we had the guaranteed-by-default calling convention. They are not necessary anymore.

The second change is to not specialize some slow-path functions. As a result, no specialized code for these functions is generated on the client side; instead, those functions are called directly in libSwiftCore.
Note that Key-related hashing and equality comparisons are still specialized, because otherwise the performance hit for -Osize would be too big.

Some Dictionary benchmarks regress a bit with -Osize, but the code size wins are big.

rdar://problem/46534453

eeckstein · 2018-12-13T22:17:20Z

@swift-ci benchmark

swift-ci · 2018-12-13T23:08:42Z

Build comment file:

Performance: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
DictionaryGroup	229	259	+13.1%	0.88x
CountAlgoString	1725	1940	+12.5%	0.89x
Improvement
SortAdjacentIntPyramids	1257	990	-21.2%	1.27x
DataSetCountSmall	154	137	-11.0%	1.12x (?)
Histogram	535	476	-11.0%	1.12x
WordCountHistogramASCII	4308	3989	-7.4%	1.08x

Code size: -O

TEST	OLD	NEW	DELTA	RATIO
Regression
DictionaryKeysContains.o	8615	10719	+24.4%	0.80x
DictionaryGroup.o	16963	18395	+8.4%	0.92x
WordCount.o	43171	44315	+2.6%	0.97x
HashQuadratic.o	5508	5644	+2.5%	0.98x
DictionaryOfAnyHashableStrings.o	11053	11229	+1.6%	0.98x
DictionaryCopy.o	7773	7885	+1.4%	0.99x
Improvement
DictTest3.o	24459	22475	-8.1%	1.09x
ReduceInto.o	18201	17145	-5.8%	1.06x
DictionarySwap.o	27750	26478	-4.6%	1.05x
DictTest4.o	25055	24034	-4.1%	1.04x
DictTest4Legacy.o	26513	25540	-3.7%	1.04x
DriverUtils.o	149581	144669	-3.3%	1.03x
StringRemoveDupes.o	7728	7520	-2.7%	1.03x
DictionaryCompactMapValues.o	19294	18950	-1.8%	1.02x
ObjectiveCBridging.o	43320	42696	-1.4%	1.01x

Performance: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
DictionaryLiteral	2856	7032	+146.2%	0.41x
DictionaryRemove	3498	5820	+66.4%	0.60x
FrequenciesUsingReduce	4018	6634	+65.1%	0.61x
DictionarySubscriptDefaultMutation	269	361	+34.2%	0.75x
DictionaryCopy	54175	72500	+33.8%	0.75x
DictionaryRemoveOfObjects	23229	29555	+27.2%	0.79x
FrequenciesUsingReduceInto	891	1104	+23.9%	0.81x
TwoSum	1167	1444	+23.7%	0.81x
DictionarySubscriptDefaultMutationArray	609	727	+19.4%	0.84x
DictionaryGroup	321	362	+12.8%	0.89x
Dictionary2OfObjects	2245	2510	+11.8%	0.89x
DictionarySwap	997	1108	+11.1%	0.90x
Dictionary2	882	970	+10.0%	0.91x
Dictionary	534	587	+9.9%	0.91x
CountAlgoString	1730	1900	+9.8%	0.91x
Dictionary3	229	251	+9.6%	0.91x
StringEnumRawValueInitialization	846	923	+9.1%	0.92x
StringRemoveDupes	339	369	+8.8%	0.92x
Improvement
DataSetCountSmall	157	140	-10.8%	1.12x

Code size: -Osize

TEST	OLD	NEW	DELTA	RATIO
Improvement
Histogram.o	4016	1920	-52.2%	2.09x
HashQuadratic.o	5164	2880	-44.2%	1.79x
TwoSum.o	5357	3109	-42.0%	1.72x
DictTest4Legacy.o	23880	14336	-40.0%	1.67x
StringRemoveDupes.o	7561	4761	-37.0%	1.59x
DictTest3.o	21233	13945	-34.3%	1.52x
DictionaryCompactMapValues.o	18126	12358	-31.8%	1.47x
DictTest2.o	14145	9833	-30.5%	1.44x
DictionaryRemove.o	15746	11028	-30.0%	1.43x
DictionarySubscriptDefault.o	27545	19601	-28.8%	1.41x
DictionaryLiteral.o	1509	1075	-28.8%	1.40x
DictTest.o	17233	12521	-27.3%	1.38x
ReduceInto.o	13403	9955	-25.7%	1.35x
DictionaryOfAnyHashableStrings.o	10757	8013	-25.5%	1.34x
DictTest4.o	20854	16894	-19.0%	1.23x
ReversedCollections.o	11842	9746	-17.7%	1.22x
DictionarySwap.o	26886	22214	-17.4%	1.21x
DictionaryCopy.o	7241	6177	-14.7%	1.17x
RGBHistogram.o	27566	23974	-13.0%	1.15x
DictionaryGroup.o	16143	14175	-12.2%	1.14x
DictOfArraysToArrayOfDicts.o	30614	27094	-11.5%	1.13x
DictionaryBridgeToObjC.o	5981	5419	-9.4%	1.10x
Prims.o	39308	37212	-5.3%	1.06x
PrimsSplit.o	39360	37264	-5.3%	1.06x
DictionaryKeysContains.o	8815	8359	-5.2%	1.05x
DriverUtils.o	130357	124053	-4.8%	1.05x
WordCount.o	40660	39532	-2.8%	1.03x
Hash.o	20367	19841	-2.6%	1.03x
ObjectiveCBridging.o	40695	40143	-1.4%	1.01x

Performance: -Onone

TEST	OLD	NEW	DELTA	RATIO
Improvement
DataSetCountSmall	217	200	-7.8%	1.08x (?)

Code size: -swiftlibs

TEST	OLD	NEW	DELTA	RATIO
Improvement
libswiftAVFoundation.dylib	65536	61440	-6.2%	1.07x
libswiftXCTest.dylib	81920	77824	-5.0%	1.05x
libswiftAppKit.dylib	81920	77824	-5.0%	1.05x
libswiftFoundation.dylib	1654784	1626112	-1.7%	1.02x
libswiftStdlibUnittest.dylib	380928	376832	-1.1%	1.01x

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).
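
For readers checking the tables by hand, here is a small sketch of how the DELTA and RATIO columns appear to be derived. It assumes DELTA is (NEW - OLD) / OLD and RATIO is OLD / NEW, which matches the rows shown above; the `BenchmarkRow` type and the `rounded(_:places:)` helper are hypothetical names used only for illustration and are not part of the benchmark tooling.

```swift
// Hypothetical model of one table row (not the real benchmark driver).
struct BenchmarkRow {
    let name: String
    let old: Double
    let new: Double

    // Assumed formula: relative change of NEW against OLD, in percent.
    var deltaPercent: Double { (new - old) / old * 100 }

    // Assumed formula: OLD / NEW, so values below 1.0 are regressions.
    var ratio: Double { old / new }
}

// Rounds to a fixed number of decimal places, as the tables do.
func rounded(_ value: Double, places: Int) -> Double {
    var scale = 1.0
    for _ in 0..<places { scale *= 10 }
    return (value * scale).rounded() / scale
}

// Values taken from the "Performance: -O" table above.
let row = BenchmarkRow(name: "DictionaryGroup", old: 229, new: 259)
print(row.name, rounded(row.deltaPercent, places: 1), rounded(row.ratio, places: 2))
```

Under these assumptions a positive DELTA always pairs with a RATIO below 1.0, which is how the regression rows read in every table.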

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

--------------

eeckstein · 2018-12-13T23:24:57Z

@swift-ci test

swift-ci · 2018-12-14T00:22:09Z

Build failed
Swift Test Linux Platform
Git Sha - 16fcc3e7946cf74f3a13a4012417191d27c321c8

swift-ci · 2018-12-14T00:54:11Z

Build failed
Swift Test OS X Platform
Git Sha - 16fcc3e7946cf74f3a13a4012417191d27c321c8

…ion when compiled with -Osize

It's @_semantics("optimize.sil.specialize.generic.size.never"). It is similar to "optimize.sil.specialize.generic.partial.never", but only prevents specialization if the optimization mode is Size.
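
A minimal sketch of how such an annotation might look on a slow-path function. Only the attribute string is taken from the commit message above; the `TinyBag` type, its members, and the choice of `grow()` as the slow path are hypothetical and are not the actual stdlib code. `@_semantics` is an underscored, unofficial attribute, so treat this purely as an illustration.

```swift
// Hypothetical container used only to show where the attribute would go.
struct TinyBag<Element: Hashable> {
    private var items: [Element] = []
    private(set) var growCount = 0

    init() {}

    var count: Int { items.count }

    // Fast path: left unannotated, so it stays eligible for specialization.
    mutating func insert(_ element: Element) {
        if items.count == items.capacity {
            grow()
        }
        items.append(element)
    }

    // Slow path: annotated so that, per the commit message, the optimizer
    // does not specialize it when the optimization mode is Size.
    @_semantics("optimize.sil.specialize.generic.size.never")
    private mutating func grow() {
        growCount += 1
        items.reserveCapacity(max(4, items.capacity * 2))
    }
}
```

The attribute does not change what the program computes. According to the commit message it only affects whether a specialized copy of the annotated function is generated in Size mode.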

eeckstein · 2018-12-14T01:00:12Z

@swift-ci test

eeckstein · 2018-12-14T01:00:34Z

@swift-ci test

lorentey · 2018-12-14T15:05:43Z

Hm; I wonder if some of these would better be implemented by moving methods that don't depend on Key/Value to a non-generic _CoreNativeDictionary struct.
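
To make the suggestion concrete, here is a rough sketch of the general pattern, assuming a hash table with a power-of-two bucket count. The names `CoreTable` and `Table` and all of their members are hypothetical; this is not the layout of the real native dictionary storage, and it does not claim that any particular stdlib function can be moved this way.

```swift
// Hypothetical non-generic core: nothing here mentions Key or Value,
// so there is nothing for the compiler to specialize per client type.
struct CoreTable {
    let bucketCount: Int  // always a power of two

    init(minimumCapacity: Int) {
        var n = 1
        while n < max(1, minimumCapacity) { n <<= 1 }
        bucketCount = n
    }

    var bucketMask: Int { bucketCount - 1 }

    // Maps any hash value (including negative ones) into 0..<bucketCount.
    func idealBucket(forHashValue hashValue: Int) -> Int {
        hashValue & bucketMask
    }

    // Next bucket in probe order, wrapping around at the end.
    func bucket(wrappedAfter bucket: Int) -> Int {
        (bucket + 1) & bucketMask
    }
}

// Hypothetical generic wrapper: only the Key-dependent work lives here.
struct Table<Key: Hashable> {
    var core: CoreTable

    func idealBucket(for key: Key) -> Int {
        core.idealBucket(forHashValue: key.hashValue)
    }
}
```

In this sketch the generic layer shrinks to the hashing call, while the index arithmetic exists once regardless of how many `Key` types the client uses.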

eeckstein · 2018-12-14T16:32:56Z

As I see it, all of the functions I annotated depend on the Value type.
In general, it's an interesting idea, because you probably could make such functions resilient without hurting performance.

aroben · 2018-12-14T16:49:24Z

Any idea what's causing the two -O performance regressions? Is it the removal of @inline(__always)?

And the -O code size regressions seem particularly surprising, given that this change should produce less code, not more.

eeckstein · 2018-12-14T16:55:21Z

The -O regressions could be noise, e.g. caused by different code alignment (I didn't see them when running the benchmarks locally).
The -Osize regressions occur because we now execute generic code instead of specialized code.
I experimented a lot to keep those regressions in an acceptable range. For example, Key-type related functions (e.g. lookup) are still specialized. Otherwise the regressions would be really bad.

aroben · 2018-12-14T16:58:27Z

Any idea what's going on with the "Code size: -O" regressions? For example, DictionaryKeysContains.o got about 2KB bigger.

eeckstein · 2018-12-14T17:44:59Z

This is a result of different inlining decisions.

eeckstein requested review from lorentey and airspeedswift December 13, 2018 22:17

eeckstein added 2 commits December 13, 2018 16:59

eeckstein force-pushed the dictionary-code-size branch from 16fcc3e to aae60ff Compare December 14, 2018 00:59

airspeedswift approved these changes Dec 14, 2018

eeckstein merged commit 18e1905 into apple:master Dec 14, 2018

eeckstein deleted the dictionary-code-size branch December 14, 2018 18:05
