New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Exclusivity] Access Enforcement Folding Optimization #16595
Conversation
This is a special SIL case that MemAccessUtils and DiagnoseStaticExclusivity diagnostics handle. Ensure that we have coverage.
@swift-ci test. |
@swift-ci test source compatibility. |
Build failed |
changed = true; | ||
// Insertion in DenseMap invalidates the iterator (if `this` == | ||
// `other`. Break out of the loop and start over. | ||
restart = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you test the impact of this change on compile time? I am concerned about the worst-case complexity of this change / data structure: we might have a lot of repeated / unnecessary work if we frequently restart this loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The set size should be tiny here. I just couldn't stand copying data to handle an exceedingly rare corner case. But, there's no sense having confusing code. Changed to...
// Insertion in DenseMap invalidates the iterator in the rare case of
// self-recursion (`this` == `other`) that passes accessed storage though an
// argument. Rather than complicate the code, make a temporary copy of the
// AccessedStorage.
SmallVector<std::pair<AccessedStorage, StorageAccessInfo>, 8> otherAccesses;
otherAccesses.reserve(other.storageAccessMap.size());
otherAccesses.append(other.storageAccessMap.begin(),
other.storageAccessMap.end());
bool changed = false;
for (auto &accessEntry : otherAccesses) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding compile time. This pass should not make a measurable difference. I'm sanity checking with some real-world projects.
My general philosophy, which drove the decisions in this pass and the upcoming whole module pass:
- Minimize the memory consumption and cost of the no-optimization case. Additional costs are proportional to the amount of optimization actually performed. Typically this means a single pass over the IR (possibly relying on DFS) doing an opcode check and a single hash table of optimizable things. This pass relies on interprocedural analysis before doing any optimization, so I was paranoid about its cost and size of the results.
- Worst-case, pathological behavior, which we never see in testing, is more important to guard against than it is to reduce typical overhead of the optimization.
@swift-ci test. |
1 similar comment
@swift-ci test. |
/// | ||
/// AccessFolding is a forward data flow analysis that tracks open accesses. If | ||
/// any path to an access' end of scope has a potentially conflicting access, | ||
/// then that access is marker as a nested conflict. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: marker instead of marked.
@swift-ci test compiler performance. |
@swift-ci smoke benchmark. |
Build failed |
Build comment file:Optimized (O)Improvement (53)
No Changes (382)
Hardware Overview
|
Nice!🎉🍾🚀🚢🌴 |
@swift-ci test OS X Platform. |
Build failed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@swift-ci smoke test. |
1 similar comment
@swift-ci smoke test. |
Use AccessedStorageAnalysis to find access markers with no nested conflicts. This optimization analyzes the scope of each access to determine whether it contains a potentially conflicting access. If not, then it can be demoted to an instantaneous check, which still catches conflicts on any enclosing outer scope. This removes up to half of the runtime calls associated with exclusivity checking.
There are ~100 significant benchmark regressions (of ~350) with -O -enforce-exclusivity=checked. This optimization roughly cuts the overhead in half for almost all of those regressions. These are the top 30 improvements with the optimization enabled. XorShift....................................................2.83x ReversedArray...............................................2.76x RangeIterationSigned........................................2.67x ExclusivityGlobal...........................................2.57x Random......................................................2.44x ReversedDictionary..........................................2.41x GeekbenchGEMM...............................................2.35x ArrayInClass................................................2.31x StringWalk..................................................2.29x Ary.........................................................2.25x Ary3........................................................2.25x Ary2........................................................2.21x MultiFileTogether...........................................2.17x MultiFileSeparate...........................................2.17x RecursiveOwnedParameter.....................................2.14x LevenshteinDistance.........................................2.04x HashTest....................................................1.97x Voronoi.....................................................1.94x NopDeinit...................................................1.92x Life........................................................1.89x Richards....................................................1.84x Rectangles..................................................1.74x MatMul......................................................1.71x LinkedList..................................................1.51x GeekbenchFFT................................................1.47x Xcbuild_OutputByteStreamPerfTests...........................1.39x ObjectAllocation............................................1.33x MapReduceLazyCollection.....................................1.30x Prims.......................................................1.28x CharIndexing_tweet_unicodeScalars_Backwards.................1.28x
@swift-ci smoke test. |
1 similar comment
@swift-ci smoke test. |
I verified no measurable impact on building SwiftNIO with -enforce-exclusivity=checked -O. |
nice! |
Use AccessedStorageAnalysis to find access markers with no nested conflicts.
This optimization analyzes the scope of each access to determine
whether it contains a potentially conflicting access. If not, then it
can be demoted to an instantaneous check, which still catches
conflicts on any enclosing outer scope.
This removes up to half of the runtime calls associated with
exclusivity checking.
There are about 100 significant benchmark regressions with -O
-enforce-exclusivity=checked.
This optimization roughly cuts the overhead in half. These are the top
30 improvements with the optimization enabled.
XorShift....................................................2.83x
ReversedArray...............................................2.76x
RangeIterationSigned........................................2.67x
ExclusivityGlobal...........................................2.57x
Random......................................................2.44x
ReversedDictionary..........................................2.41x
GeekbenchGEMM...............................................2.35x
ArrayInClass................................................2.31x
StringWalk..................................................2.29x
Ary.........................................................2.25x
Ary3........................................................2.25x
Ary2........................................................2.21x
MultiFileTogether...........................................2.17x
MultiFileSeparate...........................................2.17x
RecursiveOwnedParameter.....................................2.14x
LevenshteinDistance.........................................2.04x
HashTest....................................................1.97x
Voronoi.....................................................1.94x
NopDeinit...................................................1.92x
Life........................................................1.89x
Richards....................................................1.84x
Rectangles..................................................1.74x
MatMul......................................................1.71x
LinkedList..................................................1.51x
GeekbenchFFT................................................1.47x
Xcbuild_OutputByteStreamPerfTests...........................1.39x
ObjectAllocation............................................1.33x
MapReduceLazyCollection.....................................1.30x
Prims.......................................................1.28x
CharIndexing_tweet_unicodeScalars_Backwards.................1.28x