IR: Optimize runtime of optimization passes #670

skmp · 2021-01-19T11:34:42Z

Overview

This makes some easy optimizations to the runtime of our optimization passes. In general there are no big algorithmic changes; the focus is on improving the data structures. RA is the exception: it is substantially changed.

Details

Reworked RA::CalculateNodeInterference to almost O(N) from O(N!)
Simplified and optimized data structures & initialization in RA
Reworked RCLSE::FindMemberInfo to be O(1) from O(N)
Merged the DSE passes, making them ~2.8x as fast.
Replaced some std::maps with std::unordered_maps. This was benchmarked, as it's not always a net win.
DSE directly looks for OP_JUMP/OP_CONDJUMP instead of looping
Removed the concept of Virtual Registers from RA
Optimized RA allocation from O(Regs * Interferences * Nodes) to O(Interferences * Nodes)
Optimized RA to stop allocating after first spill
Optimized RA to do compaction after spilling, as the input comes pre-compacted
Optimized IR Compaction to do fewer lookups, and to only memset for debug builds
Validated that IR output is exactly the same as before in all of Bytemark
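The interference and allocation items above can be illustrated with a small sketch. It is not the code from this PR: it assumes live ranges are half-open `[Begin, End)` intervals over node indices with `End > Begin`, and `LiveRange`, `CalculateInterference`, `AssignRegisters` and `NoReg` are hypothetical names. Interference is computed with a sweep line (sort by start, expire ranges that have ended, connect the new range only to the still-active ones), so the work is O(N log N + E) for E edges instead of comparing every pair. Allocation ORs the registers of already-assigned neighbours into a bitmask and takes the lowest free bit, which costs O(interferences) per node rather than O(regs * interferences), and it stops at the first node that cannot be given a register.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical live range over node indices: [Begin, End), assumed End > Begin.
struct LiveRange {
  uint32_t Begin;
  uint32_t End;
};

using InterferenceGraph = std::vector<std::vector<uint32_t>>;

// Sweep-line interference: only ranges that are live at the same time get an edge.
InterferenceGraph CalculateInterference(const std::vector<LiveRange>& Ranges) {
  InterferenceGraph Edges(Ranges.size());
  std::vector<uint32_t> Order(Ranges.size());
  for (uint32_t i = 0; i < Order.size(); ++i) Order[i] = i;
  std::sort(Order.begin(), Order.end(),
            [&](uint32_t A, uint32_t B) { return Ranges[A].Begin < Ranges[B].Begin; });

  std::vector<uint32_t> Active;
  for (uint32_t Node : Order) {
    // Expire ranges that ended before this one starts, before adding any edges.
    Active.erase(std::remove_if(Active.begin(), Active.end(),
                                [&](uint32_t A) { return Ranges[A].End <= Ranges[Node].Begin; }),
                 Active.end());
    for (uint32_t A : Active) {
      Edges[Node].push_back(A);
      Edges[A].push_back(Node);
    }
    Active.push_back(Node);
  }
  return Edges;
}

constexpr uint32_t NoReg = ~0u;

// Bitmask allocation, assuming NumRegs <= 64. FirstSpill receives the first node
// that found no free register (NoReg if everything was allocated).
std::vector<uint32_t> AssignRegisters(const InterferenceGraph& Edges, uint32_t NumRegs,
                                      uint32_t& FirstSpill) {
  std::vector<uint32_t> Reg(Edges.size(), NoReg);
  const uint64_t AllRegs = NumRegs >= 64 ? ~uint64_t{0} : (uint64_t{1} << NumRegs) - 1;
  FirstSpill = NoReg;
  for (uint32_t Node = 0; Node < Edges.size(); ++Node) {
    uint64_t Used = 0;
    for (uint32_t Other : Edges[Node])
      if (Reg[Other] != NoReg) Used |= uint64_t{1} << Reg[Other];
    const uint64_t Free = AllRegs & ~Used;
    if (Free == 0) {
      FirstSpill = Node;  // stop at the first spill; the caller spills and reruns
      break;
    }
    uint32_t R = 0;
    while (!(Free & (uint64_t{1} << R))) ++R;  // lowest free register
    Reg[Node] = R;
  }
  return Reg;
}
```

With ranges `{0,4}, {1,3}, {2,6}, {5,7}`, the first and last ranges never overlap, so three registers suffice and range 3 can reuse register 0; with only two registers the allocator stops at node 2.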

Data

The clang prompt takes 2.3s before this change and 0.90s after (with -n 4000 -m).
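Worked out from the two timings above (no other measurements are assumed), the change corresponds to roughly a 2.5x speedup:

```latex
\text{speedup} = \frac{2.3\,\text{s}}{0.90\,\text{s}} \approx 2.56\times,
\qquad
\text{time saved} = \frac{2.3\,\text{s} - 0.90\,\text{s}}{2.3\,\text{s}} \approx 61\%
```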

skmp · 2021-01-19T20:52:18Z

Found a major bug in the RA changes; converting to draft.


skmp · 2021-01-21T11:24:25Z

Added some more optimizations, fixed a few bugs that had crept in, and validated that the IR output is the same as before. This is ready for review now @Sonicadvance1

External/FEXCore/Source/Interface/IR/Passes/DeadStoreElimination.cpp

phire · 2021-01-21T12:41:44Z

External/FEXCore/Source/Interface/IR/Passes/RegisterAllocationPass.cpp

+      bool OptimizeSRA;
+      uint32_t SpillPointId;
+
+      #define INFO_MAKE(id, Class) ((id) | (Class << 24))


A bitfield struct might make more sense than these macros?

I initially had a bitfield struct, but for some reason I don't recall, I switched over to the macros. I'll investigate the struct again tomorrow and switch over if there's no blocker.
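For comparison, here is a minimal sketch of both encodings, assuming (from the `INFO_MAKE` macro in the diff) that the node id occupies the low 24 bits and the register class the top 8 bits of a 32-bit value. `RegInfo`, `InfoId` and `InfoClass` are hypothetical names, and `Class` is parenthesised in this copy of the macro for hygiene. One trade-off worth knowing: C++ leaves the allocation and bit order of bit-fields implementation-defined, so the struct fixes the field widths but not the exact bit positions, whereas the shift-based macro does.

```cpp
#include <cassert>
#include <cstdint>

// Shift-based encoding as in the diff: id in bits 0..23, class in bits 24..31.
#define INFO_MAKE(id, Class) ((id) | ((Class) << 24))

// Hypothetical accessors for the packed value.
constexpr uint32_t InfoId(uint32_t Info) { return Info & 0xFFFFFFu; }
constexpr uint32_t InfoClass(uint32_t Info) { return Info >> 24; }

// Hypothetical bitfield alternative: same widths, but the bit placement is
// chosen by the compiler/ABI rather than spelled out.
struct RegInfo {
  uint32_t Id : 24;
  uint32_t Class : 8;
};

// Holds on mainstream ABIs (GCC, Clang, MSVC); not guaranteed by the standard.
static_assert(sizeof(RegInfo) == sizeof(uint32_t), "expected RegInfo to pack into 32 bits");
```

The struct reads more naturally at use sites (`Info.Class` instead of `InfoClass(Info)`); the macro is the safer choice if the packed value is ever compared or hashed as a raw integer.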

phire · 2021-01-21T13:06:38Z

External/FEXCore/Source/Interface/IR/Passes/RegisterAllocationPass.cpp

+
+  constexpr uint64_t INVALID_REGCLASS = (((uint64_t)INVALID_CLASS) << 32) | (INVALID_REG);
+
+  template<unsigned _Size = 6, typename T = uint32_t>


Could do with a comment explaining why we have a bucketlist and how the constants have been chosen.

Good point. Will get to it tomorrow

Added the comment and removed the default size of 6 (it's explicitly set in all instances anyway).
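The idea behind a bucket list can be sketched as follows. This is a hypothetical illustration of the general technique, not FEXCore's `BucketList`: values are stored in fixed-size inline buckets, and a new bucket is chained on only when the current one is full. When most lists are short (as per-node interference lists tend to be), a small bucket size keeps the common case allocation-free and cache-friendly, while long lists still work. The template parameter is named `Size` rather than `_Size` here, since identifiers starting with an underscore and a capital letter are reserved in C++.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical bucket list: Size values inline, then a heap-allocated chain.
template<unsigned Size, typename T = uint32_t>
class BucketList {
public:
  // Walks to the first bucket with room (allocating one if needed) and appends.
  void Append(T Value) {
    BucketList* Cur = this;
    while (Cur->Count == Size) {
      if (!Cur->Next) Cur->Next = std::make_unique<BucketList>();
      Cur = Cur->Next.get();
    }
    Cur->Items[Cur->Count++] = Value;
  }

  // Visits every stored value in insertion order.
  template<typename Fn>
  void Iterate(Fn&& F) const {
    for (const BucketList* Cur = this; Cur; Cur = Cur->Next.get())
      for (unsigned i = 0; i < Cur->Count; ++i) F(Cur->Items[i]);
  }

private:
  T Items[Size]{};
  unsigned Count = 0;
  std::unique_ptr<BucketList> Next;
};
```

The bucket size is the tuning knob: it should cover the typical list length so that chaining is rare, without making every list carry a lot of unused inline storage.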

phire · 2021-01-21T13:12:46Z

External/FEXCore/Source/Interface/IR/Passes/DeadStoreElimination.cpp

+          auto& BlockInfo = InfoMap[BlockNode];
+
+          //// GPR ////
+          // We can't track through these


We probably could if we added a restriction that OP_STORECONTEXTINDEXED was only valid for x87, which is the only place we use it.

It's also used for the segment registers afaik. @Sonicadvance1 thoughts?

Hmm, we might.

Either way, as long as we can prove it's only used for a limited set of registers, we can avoid this optimisation killer.

It's only used for 32bit and LDT/GDT loading of segment register data. Very minor.

Created follow up #677
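Why an indexed context access is an "optimisation killer", and why restricting it to a known range helps, can be shown with a simplified model. This is a hypothetical sketch, not FEXCore's DSE: `Op`, `OpKind`, `Range` and `FindDeadStores` are illustrative names, and the `[64, 128)` range used below stands in for "x87 state" only as an assumption. Walking a block backwards, a store is dead if the same offset is stored again later with no read in between. An indexed access has an unknown offset, so the pass must conservatively assume it reads every slot it could reach; without a restriction that means forgetting everything tracked so far, while a known range only invalidates the slots inside it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified context operations within a single block.
enum class OpKind { Store, Load, IndexedAccess };
struct Op {
  OpKind Kind;
  uint32_t Offset;  // ignored for IndexedAccess (offset unknown at compile time)
};

// Hypothetical range that indexed accesses are restricted to: [Begin, End).
struct Range {
  uint32_t Begin, End;
};

std::vector<bool> FindDeadStores(const std::vector<Op>& Block, const Range* IndexedRange) {
  std::vector<bool> Dead(Block.size(), false);
  std::unordered_set<uint32_t> OverwrittenLater;  // offsets stored later with no read in between
  for (std::size_t i = Block.size(); i-- > 0;) {
    const Op& O = Block[i];
    switch (O.Kind) {
      case OpKind::Store:
        if (OverwrittenLater.count(O.Offset)) Dead[i] = true;
        else OverwrittenLater.insert(O.Offset);
        break;
      case OpKind::Load:
        OverwrittenLater.erase(O.Offset);
        break;
      case OpKind::IndexedAccess:
        if (!IndexedRange) {
          OverwrittenLater.clear();  // could touch anything: forget all tracking
        } else {
          // Only slots inside the restricted range might be touched.
          for (auto It = OverwrittenLater.begin(); It != OverwrittenLater.end();) {
            if (*It >= IndexedRange->Begin && *It < IndexedRange->End) It = OverwrittenLater.erase(It);
            else ++It;
          }
        }
        break;
    }
  }
  return Dead;
}
```

For the block "store offset 0, indexed access, store offset 0", the first store survives when indexed accesses are unrestricted, but is eliminated once they are known to stay inside a range that excludes offset 0.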

skmp force-pushed the skmp/faster-ra branch from f99dd8f to 192638e on January 19, 2021 17:43

skmp changed the title from "RA: rework ConstrainedRAPass::CalculateNodeInterference" to "IR: Optimize runtime of optimization passes" Jan 19, 2021

skmp marked this pull request as ready for review January 19, 2021 17:50

skmp requested a review from Sonicadvance1 January 19, 2021 17:50

skmp marked this pull request as draft January 19, 2021 20:52

Stefanos Kornilios Mitsis Poiitidis added 21 commits January 20, 2021 04:26

RA: rework ConstrainedRAPass::CalculateNodeInterference

e847f94

RA: Switch active set to also use BucketList

77d072e

RCLSE: Optimize FindMemberInfo to be O(1)

91ddc1d

RA: Optimize register conflict handling

6380ab6

RA: Simplify and optimize data structures

af94e24

ConstProp: Switch maps to unordred_maps

900fc6d

DSE: Merge Flag/GPR/FPR passes for perf

56ca731

DSE: Merge logic more for perf

997650b

DSE: Optimize map lookups

6bdf785

DSE: No need to loop to find branching op

f3e19ef

RA: Get OP_JUMP/CONDJUMP without loop in CalculatePrecessors

9c760ff

RA: Rename CalculatePrecessors to CalculatePredecessors

10b6fb0

ConstProp: Keep pools in heap

47ee9af

RA: Fix several bugs, get rid of virtual registers, remove unused complexity

c3682f9

RA: Cleanups

e7e6b66

JIT/x64: Fix VAddV

a9332e6

RA: Expire ending intervals before starting new ones

9f3ce47

RA: Make spans at least 1 offset long

d3fc85e

RA: Exit after first spill per iteration is found

8aef9cb

ConsProp: Revert to ordered set for identical codegen

9c462eb

IR: Sync Invalid class with RA

ac1036a

skmp force-pushed the skmp/faster-ra branch from 6042c8d to ac1036a on January 20, 2021 02:26

Stefanos Kornilios Mitsis Poiitidis added 2 commits January 21, 2021 01:01

RA: Run compation after spilling, not before

0e7db64

IRCompaction: Only memset in debug

00ca576

skmp marked this pull request as ready for review January 21, 2021 11:23

phire reviewed Jan 21, 2021


phire approved these changes Jan 22, 2021


Address review feedback

0546da7

skmp force-pushed the skmp/faster-ra branch from 3030517 to 0546da7 on January 22, 2021 09:02

Sonicadvance1 merged commit d4ffea5 into main Jan 22, 2021

Sonicadvance1 deleted the skmp/faster-ra branch January 22, 2021 12:54
