Add SortArrayInClass benchmark. #26663

atrick · 2019-08-15T05:54:08Z

Add SortArrayInClass benchmark.

This currently copies the array each time it swaps elements. This
makes it 1500x slower than it should be to sort the array. The
benchmark now runs in 15ms but should be around 10us when fully
optimized.

This algorithm is an interesting optimization problem involving array
optimization, uniqueness, bounds checks, and exclusivity. But the
general first order problem is how to modify a CoW data structure
that's stored in a class property. As it stands, the property getter
retains the class property around the modify accesses that checks
uniqueness.

atrick · 2019-08-15T05:54:22Z

@swift-ci test

atrick · 2019-08-15T05:54:28Z

@swift-ci benchmark

atrick · 2019-08-15T06:15:14Z

@swift-ci test.

atrick · 2019-08-15T06:15:21Z

@swift-ci benchmark

swift-ci · 2019-08-15T06:15:33Z

Build failed
Swift Test Linux Platform
Git Sha - 50fcbeb3367ddd8f3bf9a5bbb74cc7075b52020e

swift-ci · 2019-08-15T06:15:42Z

Build failed
Swift Test OS X Platform
Git Sha - 50fcbeb3367ddd8f3bf9a5bbb74cc7075b52020e

swift-ci · 2019-08-15T06:39:13Z

Performance: -O

Improvement	OLD	NEW	DELTA	RATIO
ObjectiveCBridgeStubToNSStringRef	95	72	-24.2%	1.32x (?)
ProtocolDispatch	217	196	-9.7%	1.11x
PointerArithmetics	23900	21700	-9.2%	1.10x
ParseFloat.Float.Exp	11	10	-9.1%	1.10x (?)
LazilyFilteredArrayContains	22000	20300	-7.7%	1.08x (?)

Added	MIN	MAX	MEAN	MAX_RSS
SortArrayInClass	16017	16185	16096	—

Code size: -O

Regression	OLD	NEW	DELTA	RATIO
TestsUtils.o	25134	25670	+2.1%	0.98x

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
FlattenListLoop	2167	2954	+36.3%	0.73x (?)
DataSubscriptSmall	13	17	+30.8%	0.76x
Dictionary4	166	207	+24.7%	0.80x (?)
ObjectiveCBridgeStubFromNSStringRef	83	102	+22.9%	0.81x (?)
RemoveWhereSwapInts	29	33	+13.8%	0.88x (?)
Array2D	3696	4192	+13.4%	0.88x
RandomShuffleLCG2	368	416	+13.0%	0.88x
ObjectiveCBridgeStubNSDateRefAccess	174	196	+12.6%	0.89x (?)
RemoveWhereMoveInts	17	19	+11.8%	0.89x (?)
MapReduce	217	238	+9.7%	0.91x (?)
RemoveWhereFilterInts	23	25	+8.7%	0.92x (?)
MapReduceAnyCollection	240	260	+8.3%	0.92x (?)
ArraySetElement	262	283	+8.0%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
DataCountSmall	17	13	-23.5%	1.31x
DataCreateEmpty	100	80	-20.0%	1.25x (?)

Added	MIN	MAX	MEAN	MAX_RSS
SortArrayInClass	15998	16030	16019	—

Code size: -Osize

Performance: -Onone

Regression	OLD	NEW	DELTA	RATIO
StringBuilderLong	980	1070	+9.2%	0.92x (?)
StringToDataEmpty	600	650	+8.3%	0.92x (?)

Improvement	OLD	NEW	DELTA	RATIO
ObjectiveCBridgeStubDateMutation	361	326	-9.7%	1.11x (?)

Added	MIN	MAX	MEAN	MAX_RSS
SortArrayInClass	21976	22181	22046	—

Code size: -swiftlibs

✅	Benchmark Check Report
⛔️⏱	`SortArrayInClass` execution took at least 15585 μs. _{Decrease the workload of SortArrayInClass by a factor of 16 (100), to be less than 1000 μs.}
⚠️Ⓜ️	`SortArrayInClass` has very wide range of memory used between independent, repeated measurements. _{SortArrayInClass mem_pages [i1, i2]: min=[7, 7] 𝚫=0 R=[68, 92]}

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

compnerd

Thanks for the help in getting this added.

benchmark/single-source/SortArrayInClass.swift

jrose-apple · 2019-08-15T18:46:00Z

Isn't this correct behavior, since there could be a subclass that overrides array?

This currently copies the array each time it swaps elements. This makes it 1500x slower than it should be to sort the array. The benchmark now runs in 15ms but should be around 10us when fully optimized. This algorithm is an interesting optimization problem involving array optimization, uniqueness, bounds checks, and exclusivity. But the general first order problem is how to modify a CoW data structure that's stored in a class property. As it stands, the property getter retains the class property around the modify accesses that checks uniqueness.

atrick · 2019-08-15T23:33:09Z

@swift-ci smoke test.

atrick · 2019-08-15T23:33:17Z

@swift-ci benchmark

atrick · 2019-08-16T00:00:41Z

@jrose-apple It's true that if the class were open and the variable was public, or if we didn't use wmo, then we wouldn't be able to inline the array accessor. But the benchmark class is internal and it's built with -O -wmo, so the array accessor does get inlined. That's beside the point though because I'm questioning whether the array should be copied even at -Onone.

From any normal programmers's point of view this code:

let t = array[i]
array[i] = array[j]
array[j] = t

Should be equivalent to:

let t = array[i]
let u = array[j]
array[i] = u
array[j] = t

The fact that we force an array copy here seems like an unintended artifact of SILGen to me rather than fundamental lanuage semantics. For some reason the scope of the assignment's RHS extends beyond the modification rather than ending after the RHS is fully evaluated. @rjmccall will remember why and know if there's a good reason we can't fix it.

jrose-apple · 2019-08-16T00:03:13Z

Oh, yeah, sure, the problem is a real problem. My question was whether the benchmark made any sense, and you're right, it does.

swift-ci · 2019-08-16T00:20:35Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
RandomShuffleLCG2	704	768	+9.1%	0.92x
FlattenListLoop	3977	4333	+9.0%	0.92x (?)
Array2D	6912	7520	+8.8%	0.92x (?)
ObjectiveCBridgeStubNSDateRefAccess	343	371	+8.2%	0.92x (?)
RemoveWhereSwapInts	62	67	+8.1%	0.93x
MapReduce	369	398	+7.9%	0.93x (?)
MapReduceAnyCollection	370	398	+7.6%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
NSStringConversion.UTF8	847	783	-7.6%	1.08x (?)

Added	MIN	MAX	MEAN	MAX_RSS
SortArrayInClass	36343	36641	36494	—

Code size: -O

Regression	OLD	NEW	DELTA	RATIO
TestsUtils.o	25134	25670	+2.1%	0.98x

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
ObjectiveCBridgeStubDateAccess	228	257	+12.7%	0.89x
ObjectiveCBridgeStubFromNSStringRef	155	173	+11.6%	0.90x (?)

Improvement	OLD	NEW	DELTA	RATIO
CharacterLiteralsLarge	111	100	-9.9%	1.11x (?)
ObjectiveCBridgeStubDateMutation	285	257	-9.8%	1.11x
CharacterLiteralsSmall	345	322	-6.7%	1.07x (?)

Added	MIN	MAX	MEAN	MAX_RSS
SortArrayInClass	36156	36551	36371	—

Code size: -Osize

Performance: -Onone

Added	MIN	MAX	MEAN	MAX_RSS
SortArrayInClass	44472	44533	44509	—

Code size: -swiftlibs

✅	Benchmark Check Report
⛔️⏱	`SortArrayInClass` execution took at least 35364 μs. _{Decrease the workload of SortArrayInClass by a factor of 64 (100), to be less than 1000 μs.}
⚠️Ⓜ️	`SortArrayInClass` has very wide range of memory used between independent, repeated measurements. _{SortArrayInClass mem_pages [i1, i2]: min=[7, 7] 𝚫=0 R=[54, 46]}

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

atrick requested a review from compnerd August 15, 2019 05:55

compnerd approved these changes Aug 15, 2019

View reviewed changes

benchmark/single-source/SortArrayInClass.swift Outdated Show resolved Hide resolved

atrick merged commit bdf0890 into apple:master Aug 23, 2019

atrick deleted the add-sortarrayinclass branch September 13, 2019 16:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add SortArrayInClass benchmark. #26663

Add SortArrayInClass benchmark. #26663

atrick commented Aug 15, 2019 •

edited

atrick commented Aug 15, 2019

atrick commented Aug 15, 2019

atrick commented Aug 15, 2019

atrick commented Aug 15, 2019

swift-ci commented Aug 15, 2019

swift-ci commented Aug 15, 2019

swift-ci commented Aug 15, 2019

compnerd left a comment

jrose-apple commented Aug 15, 2019

atrick commented Aug 15, 2019

atrick commented Aug 15, 2019

atrick commented Aug 16, 2019

jrose-apple commented Aug 16, 2019

swift-ci commented Aug 16, 2019

Add SortArrayInClass benchmark. #26663

Add SortArrayInClass benchmark. #26663

Conversation

atrick commented Aug 15, 2019 • edited

atrick commented Aug 15, 2019

atrick commented Aug 15, 2019

atrick commented Aug 15, 2019

atrick commented Aug 15, 2019

swift-ci commented Aug 15, 2019

swift-ci commented Aug 15, 2019

swift-ci commented Aug 15, 2019

Performance: -O

Code size: -O

Performance: -Osize

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

compnerd left a comment

Choose a reason for hiding this comment

jrose-apple commented Aug 15, 2019

atrick commented Aug 15, 2019

atrick commented Aug 15, 2019

atrick commented Aug 16, 2019

jrose-apple commented Aug 16, 2019

swift-ci commented Aug 16, 2019

Performance: -O

Code size: -O

Performance: -Osize

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

atrick commented Aug 15, 2019 •

edited