Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
AirAllocateStackByGraphColoring should use the optimized interference…
… graphs from AirAllocateRegistersByGraphColoring https://bugs.webkit.org/show_bug.cgi?id=226258 Reviewed by Phil Pizlo. Source/JavaScriptCore: The main change in this patch is that AirAllocateStackByGraphColoring is now using the optimized datastructures in wtf/InterferenceGraph.h. This required templating most of it over the interference graph used (Small/Large/Huge), but I tried keeping some common parts out of the templated class to minimize the impact on compile times and binary size. A consequence of that change is that coalescableMoves and remappedStackSlots now store indices instead of direct pointers to StackSlots, resulting in a memory reduction of about 3x as well. Another consequence is that I had to slightly alter the way that coalescing works: instead of carefully removing the interference edges of the killed slot, we simply use mayClear() which is not guaranteed to remove anything. I believe that this is sound, because every subsequent access to m_interference checks whether a slot has been coalesced first, so dropping these edges is purely a memory saving, but has no logical effect. The new code was tested in a few ways: - running on JetStream2 with asan - running on JetStream2 with TEST_OPTIMIZED_INTERFERENCE_GRAPH - running on JetStream2 and logging the frame sizes at the end of this phase, and comparing to the results of doing the same on ToT (same average frame size) The two functions where this code had the largest memory footprint in JetStream2 were both in tsf-wasm. One has 751 stack slots, and had an interference graph of 2.1MB and a coalescableMoves vector of 440kB The other has 673 stack slots, and had an interference graph of 1.9MB and a coalescableMoves vector of 421kB. With this patch, they respectively use 79kB+146kB and 67kB+140kB The effect on the rest of JetStream2 is less likely to matter as few functions used more than a few dozens of kB in this phase, but in percentages are just as huge. More importantly (and the reason I wrote this patch in the first place), I checked mruby-wasm.aotoki.dev which with a few other pages forced us to lower Options::maximumTmpsForGraphColoring because of jetsams. It has two massive functions that reach this phase if I increase Options::maximumTmpsForGraphColoring: - about 6k stack slots -> 215MB + 6MB (interference graph + coalescableMoves) - about 9k stack slots -> 395MB + 4MB After this patch, they respectively use 4.5MB+2MB and 9MB+1.5MB, or roughly a 40x improvement. Combined with the recent improvements to the register allocator, I hope to be able to increase Options::maximumTmpsForGraphColoring soon (in a different patch for easier bisection if either cause a perf regression). This would be helpful, since its lowering cratered our performance on some other wasm application by 8x. In terms of compile times, this patch lowered the time spent in AllocateStackByGraphColoring over the course of a run of JetStream2 from roughly 350ms to roughly 270ms. This is almost certainly negligible, but at least it guarantees that it did not regress. * b3/air/AirAllocateRegistersByGraphColoring.cpp: * b3/air/AirAllocateStackByGraphColoring.cpp: (JSC::B3::Air::allocateStackByGraphColoring): Source/WTF: I moved the interference graphs datastructures from AirAllocateRegistersByGraphColoring to their own wtf/InterferenceGraph.h file. There are three of them: - SmallInterferenceGraph, best for n < 400 - LargeInterferenceGraph, for n < 2**16 - HugeInterferenceGraph, for n up to 2**32 I also added "Iterable" versions of them, that have an operator[] method whose result you can iterate on to get all the indices which interfere with a given index. SmallIterableInterferenceGraph is the same as the non-iterable version, but the Large and Huge versions are a bit slower than their counterparts and use 2x memory. All of these were tested by running JetStream2 with the TEST_OPTIMIZED_INTERFERENCE_GRAPH set to 1. This flag makes the optimized datastructures run in parallel with a reference implementation, and their results are checked for equality on every method call. There is one small difference allowed: iteration is not guaranteed to go through elements in the same order. I also added a clear() method to LikelyDenseUnsignedIntegerSet, and added the NotNull flag to its various uses of placement new. * WTF.xcodeproj/project.pbxproj: * wtf/CMakeLists.txt: * wtf/HashSet.h: (WTF::W>::memoryUse const): * wtf/InterferenceGraph.h: Added. (WTF::InterferenceBitVector::contains): (WTF::InterferenceBitVector::addAndReturnIsNewEntry): (WTF::InterferenceBitVector::add): (WTF::InterferenceBitVector::clear): (WTF::InterferenceBitVector::mayClear): (WTF::InterferenceBitVector::setMaxIndex): (WTF::InterferenceBitVector::forEach): (WTF::InterferenceBitVector::size const): (WTF::InterferenceBitVector::memoryUse const): (WTF::InterferenceBitVector::dumpMemoryUseInKB const): (WTF::InterferenceBitVector::Iterable::iterator::operator++): (WTF::InterferenceBitVector::Iterable::iterator::operator* const): (WTF::InterferenceBitVector::Iterable::iterator::operator== const): (WTF::InterferenceBitVector::Iterable::iterator::operator!= const): (WTF::InterferenceBitVector::Iterable::begin const): (WTF::InterferenceBitVector::Iterable::end const): (WTF::InterferenceBitVector::operator[] const): (WTF::InterferenceBitVector::index const): (WTF::InterferenceVector::contains): (WTF::InterferenceVector::addAndReturnIsNewEntry): (WTF::InterferenceVector::add): (WTF::InterferenceVector::clear): (WTF::InterferenceVector::mayClear): (WTF::InterferenceVector::setMaxIndex): (WTF::InterferenceVector::forEach): (WTF::InterferenceVector::size const): (WTF::InterferenceVector::memoryUse const): (WTF::InterferenceVector::dumpMemoryUseInKB const): (WTF::InterferenceVector::Iterable::begin const): (WTF::InterferenceVector::Iterable::end const): (WTF::InterferenceVector::operator[] const): (WTF::UndirectedEdgesDuplicatingAdapter::contains): (WTF::UndirectedEdgesDuplicatingAdapter::addAndReturnIsNewEntry): (WTF::UndirectedEdgesDuplicatingAdapter::add): (WTF::UndirectedEdgesDuplicatingAdapter::clear): (WTF::UndirectedEdgesDuplicatingAdapter::mayClear): (WTF::UndirectedEdgesDuplicatingAdapter::setMaxIndex): (WTF::UndirectedEdgesDuplicatingAdapter::forEach): (WTF::UndirectedEdgesDuplicatingAdapter::size const): (WTF::UndirectedEdgesDuplicatingAdapter::memoryUse const): (WTF::UndirectedEdgesDuplicatingAdapter::dumpMemoryUseInKB const): (WTF::UndirectedEdgesDuplicatingAdapter::operator[] const): (WTF::UndirectedEdgesDedupAdapter::contains): (WTF::UndirectedEdgesDedupAdapter::addAndReturnIsNewEntry): (WTF::UndirectedEdgesDedupAdapter::add): (WTF::UndirectedEdgesDedupAdapter::clear): (WTF::UndirectedEdgesDedupAdapter::mayClear): (WTF::UndirectedEdgesDedupAdapter::setMaxIndex): (WTF::UndirectedEdgesDedupAdapter::forEach): (WTF::UndirectedEdgesDedupAdapter::size const): (WTF::UndirectedEdgesDedupAdapter::memoryUse const): (WTF::UndirectedEdgesDedupAdapter::dumpMemoryUseInKB const): (WTF::InterferenceHashSet::contains): (WTF::InterferenceHashSet::addAndReturnIsNewEntry): (WTF::InterferenceHashSet::add): (WTF::InterferenceHashSet::clear): (WTF::InterferenceHashSet::setMaxIndex): (WTF::InterferenceHashSet::forEach): (WTF::InterferenceHashSet::size const): (WTF::InterferenceHashSet::memoryUse const): (WTF::InterferenceHashSet::dumpMemoryUseInKB const): (WTF::InstrumentedInterferenceGraph::contains): (WTF::InstrumentedInterferenceGraph::addAndReturnIsNewEntry): (WTF::InstrumentedInterferenceGraph::add): (WTF::InstrumentedInterferenceGraph::clear): (WTF::InstrumentedInterferenceGraph::mayClear): (WTF::InstrumentedInterferenceGraph::setMaxIndex): (WTF::InstrumentedInterferenceGraph::forEach): (WTF::InstrumentedInterferenceGraph::size const): (WTF::InstrumentedInterferenceGraph::memoryUse const): (WTF::InstrumentedInterferenceGraph::dumpMemoryUseInKB const): (WTF::InstrumentedIterableInterferenceGraph::Iterable::Iterable): (WTF::InstrumentedIterableInterferenceGraph::Iterable::begin const): (WTF::InstrumentedIterableInterferenceGraph::Iterable::end const): (WTF::InstrumentedIterableInterferenceGraph::operator[] const): * wtf/LikelyDenseUnsignedIntegerSet.h: (WTF::LikelyDenseUnsignedIntegerSet::LikelyDenseUnsignedIntegerSet): (WTF::LikelyDenseUnsignedIntegerSet::clear): (WTF::LikelyDenseUnsignedIntegerSet::add): (WTF::LikelyDenseUnsignedIntegerSet::estimateHashSetSize): (WTF::LikelyDenseUnsignedIntegerSet::transitionToHashSet): (WTF::LikelyDenseUnsignedIntegerSet::transitionToBitVector): Canonical link: https://commits.webkit.org/238229@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@278186 268f45cc-cd09-0410-ab3c-d52691b4dbfc
- Loading branch information