-
Notifications
You must be signed in to change notification settings - Fork 143
IR: Optimize runtime of optimization passes #670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Found a major bug in the RA changes, converting to draft |
Added some more optimizations, removed a few bugs that crept in, and validated IR output is the same as before. This is ready for review now @Sonicadvance1 |
bool OptimizeSRA; | ||
uint32_t SpillPointId; | ||
|
||
#define INFO_MAKE(id, Class) ((id) | (Class << 24)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bitfield struct might make more sense than these macros?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially had a bitfield struct, but for some reason I don't recall I switched over to the macros. I'll investigate the struct again tomorrow, and switch over if there's no blocker
|
||
constexpr uint64_t INVALID_REGCLASS = (((uint64_t)INVALID_CLASS) << 32) | (INVALID_REG); | ||
|
||
template<unsigned _Size = 6, typename T = uint32_t> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could do with a comment explaining why we have a bucketlist and how the constants have been chosen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Will get to it tomorrow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the comment and removed default size of 6 (it's explicitly set in all instances anyway)
auto& BlockInfo = InfoMap[BlockNode]; | ||
|
||
//// GPR //// | ||
// We can't track through these |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably could if we added a restriction that OP_STORECONTEXTINDEXED was only valid for x87, which is the only place we use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also used for the segment registers afaik. @Sonicadvance1 thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, we might.
Either way, as long as we can prove it's only used for for a limited set of registers we can avoid this optimisation killer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only used for 32bit and LDT/GDT loading of segment register data. Very minor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created follow up #677
Overview
This does some easy optimizations to the runtime of our optimization passes. In general, there are no big algorithmic changes, this focuses on improving the data structures. RA is substantially changed.
Details
RA::CalculateNodeInterference
to almost O(N) from O(N!)RCLSE::FindMemberInfo
to be O(1) from O(N)std::map
s withstd::unordred_maps
. This was benchmarked, as it's not always a net-win.Data
clang prompt takes 2.3s before this, and 0.90s after (with
-n 4000 -m
)