Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SortArrayInClass benchmark. #26663

Merged
merged 1 commit into from Aug 23, 2019
Merged

Add SortArrayInClass benchmark. #26663

merged 1 commit into from Aug 23, 2019

Conversation

atrick
Copy link
Member

@atrick atrick commented Aug 15, 2019

Add SortArrayInClass benchmark.

This currently copies the array each time it swaps elements. This
makes it 1500x slower than it should be to sort the array. The
benchmark now runs in 15ms but should be around 10us when fully
optimized.

This algorithm is an interesting optimization problem involving array
optimization, uniqueness, bounds checks, and exclusivity. But the
general first order problem is how to modify a CoW data structure
that's stored in a class property. As it stands, the property getter
retains the class property around the modify accesses that checks
uniqueness.

@atrick
Copy link
Member Author

atrick commented Aug 15, 2019

@swift-ci test

@atrick
Copy link
Member Author

atrick commented Aug 15, 2019

@swift-ci benchmark

@atrick atrick requested a review from compnerd August 15, 2019 05:55
@atrick
Copy link
Member Author

atrick commented Aug 15, 2019

@swift-ci test.

@atrick
Copy link
Member Author

atrick commented Aug 15, 2019

@swift-ci benchmark

@swift-ci
Copy link
Collaborator

Build failed
Swift Test Linux Platform
Git Sha - 50fcbeb3367ddd8f3bf9a5bbb74cc7075b52020e

@swift-ci
Copy link
Collaborator

Build failed
Swift Test OS X Platform
Git Sha - 50fcbeb3367ddd8f3bf9a5bbb74cc7075b52020e

@swift-ci
Copy link
Collaborator

Performance: -O

Improvement OLD NEW DELTA RATIO
ObjectiveCBridgeStubToNSStringRef 95 72 -24.2% 1.32x (?)
ProtocolDispatch 217 196 -9.7% 1.11x
PointerArithmetics 23900 21700 -9.2% 1.10x
ParseFloat.Float.Exp 11 10 -9.1% 1.10x (?)
LazilyFilteredArrayContains 22000 20300 -7.7% 1.08x (?)
 
Added MIN MAX MEAN MAX_RSS
SortArrayInClass 16017 16185 16096

Code size: -O

Regression OLD NEW DELTA RATIO
TestsUtils.o 25134 25670 +2.1% 0.98x

Performance: -Osize

Regression OLD NEW DELTA RATIO
FlattenListLoop 2167 2954 +36.3% 0.73x (?)
DataSubscriptSmall 13 17 +30.8% 0.76x
Dictionary4 166 207 +24.7% 0.80x (?)
ObjectiveCBridgeStubFromNSStringRef 83 102 +22.9% 0.81x (?)
RemoveWhereSwapInts 29 33 +13.8% 0.88x (?)
Array2D 3696 4192 +13.4% 0.88x
RandomShuffleLCG2 368 416 +13.0% 0.88x
ObjectiveCBridgeStubNSDateRefAccess 174 196 +12.6% 0.89x (?)
RemoveWhereMoveInts 17 19 +11.8% 0.89x (?)
MapReduce 217 238 +9.7% 0.91x (?)
RemoveWhereFilterInts 23 25 +8.7% 0.92x (?)
MapReduceAnyCollection 240 260 +8.3% 0.92x (?)
ArraySetElement 262 283 +8.0% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
DataCountSmall 17 13 -23.5% 1.31x
DataCreateEmpty 100 80 -20.0% 1.25x (?)
 
Added MIN MAX MEAN MAX_RSS
SortArrayInClass 15998 16030 16019

Code size: -Osize

Performance: -Onone

Regression OLD NEW DELTA RATIO
StringBuilderLong 980 1070 +9.2% 0.92x (?)
StringToDataEmpty 600 650 +8.3% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
ObjectiveCBridgeStubDateMutation 361 326 -9.7% 1.11x (?)
 
Added MIN MAX MEAN MAX_RSS
SortArrayInClass 21976 22181 22046

Code size: -swiftlibs

Benchmark Check Report
⛔️⏱ SortArrayInClass execution took at least 15585 μs.
Decrease the workload of SortArrayInClass by a factor of 16 (100), to be less than 1000 μs.
⚠️Ⓜ️ SortArrayInClass has very wide range of memory used between independent, repeated measurements.
SortArrayInClass mem_pages [i1, i2]: min=[7, 7] 𝚫=0 R=[68, 92]
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

Copy link
Collaborator

@compnerd compnerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the help in getting this added.

benchmark/single-source/SortArrayInClass.swift Outdated Show resolved Hide resolved
@jrose-apple
Copy link
Contributor

Isn't this correct behavior, since there could be a subclass that overrides array?

This currently copies the array each time it swaps elements. This
makes it 1500x slower than it should be to sort the array. The
benchmark now runs in 15ms but should be around 10us when fully
optimized.

This algorithm is an interesting optimization problem involving array
optimization, uniqueness, bounds checks, and exclusivity. But the
general first order problem is how to modify a CoW data structure
that's stored in a class property. As it stands, the property getter
retains the class property around the modify accesses that checks
uniqueness.
@atrick
Copy link
Member Author

atrick commented Aug 15, 2019

@swift-ci smoke test.

@atrick
Copy link
Member Author

atrick commented Aug 15, 2019

@swift-ci benchmark

@atrick
Copy link
Member Author

atrick commented Aug 16, 2019

@jrose-apple It's true that if the class were open and the variable was public, or if we didn't use wmo, then we wouldn't be able to inline the array accessor. But the benchmark class is internal and it's built with -O -wmo, so the array accessor does get inlined. That's beside the point though because I'm questioning whether the array should be copied even at -Onone.

From any normal programmers's point of view this code:

let t = array[i]
array[i] = array[j]
array[j] = t

Should be equivalent to:

let t = array[i]
let u = array[j]
array[i] = u
array[j] = t

The fact that we force an array copy here seems like an unintended artifact of SILGen to me rather than fundamental lanuage semantics. For some reason the scope of the assignment's RHS extends beyond the modification rather than ending after the RHS is fully evaluated. @rjmccall will remember why and know if there's a good reason we can't fix it.

@jrose-apple
Copy link
Contributor

Oh, yeah, sure, the problem is a real problem. My question was whether the benchmark made any sense, and you're right, it does.

@swift-ci
Copy link
Collaborator

Performance: -O

Regression OLD NEW DELTA RATIO
RandomShuffleLCG2 704 768 +9.1% 0.92x
FlattenListLoop 3977 4333 +9.0% 0.92x (?)
Array2D 6912 7520 +8.8% 0.92x (?)
ObjectiveCBridgeStubNSDateRefAccess 343 371 +8.2% 0.92x (?)
RemoveWhereSwapInts 62 67 +8.1% 0.93x
MapReduce 369 398 +7.9% 0.93x (?)
MapReduceAnyCollection 370 398 +7.6% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
NSStringConversion.UTF8 847 783 -7.6% 1.08x (?)
 
Added MIN MAX MEAN MAX_RSS
SortArrayInClass 36343 36641 36494

Code size: -O

Regression OLD NEW DELTA RATIO
TestsUtils.o 25134 25670 +2.1% 0.98x

Performance: -Osize

Regression OLD NEW DELTA RATIO
ObjectiveCBridgeStubDateAccess 228 257 +12.7% 0.89x
ObjectiveCBridgeStubFromNSStringRef 155 173 +11.6% 0.90x (?)
 
Improvement OLD NEW DELTA RATIO
CharacterLiteralsLarge 111 100 -9.9% 1.11x (?)
ObjectiveCBridgeStubDateMutation 285 257 -9.8% 1.11x
CharacterLiteralsSmall 345 322 -6.7% 1.07x (?)
 
Added MIN MAX MEAN MAX_RSS
SortArrayInClass 36156 36551 36371

Code size: -Osize

Performance: -Onone

Added MIN MAX MEAN MAX_RSS
SortArrayInClass 44472 44533 44509

Code size: -swiftlibs

Benchmark Check Report
⛔️⏱ SortArrayInClass execution took at least 35364 μs.
Decrease the workload of SortArrayInClass by a factor of 64 (100), to be less than 1000 μs.
⚠️Ⓜ️ SortArrayInClass has very wide range of memory used between independent, repeated measurements.
SortArrayInClass mem_pages [i1, i2]: min=[7, 7] 𝚫=0 R=[54, 46]
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@atrick atrick merged commit bdf0890 into apple:master Aug 23, 2019
@atrick atrick deleted the add-sortarrayinclass branch September 13, 2019 16:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants