-
Notifications
You must be signed in to change notification settings - Fork 32
[AIE2P] Enable up-combining for VLD + VCONV #392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you be a bit more precise? I think this is based on regression figures for AIE2 at a very particular time, state of the combines, state of the benchmark suites, etc. If there's a specific feature of AIE2P or AIE2 that is driving this, please mention it. If we can derive it from specific instruction characteristics of the conv or the load instructions, consider testing for that specific instruction feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have this for other patterns in aie2, but not for this. I would prefer to not change the behavior for a target that we are not testing anymore, independent on the reason. In that time, a bunch of spill regression related to accumulator registers were found and I don't have all those results anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would reverse the condition of this nested if, having an early return false and no else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very general: parent checks are the cheapest one in town and should absolutely be done first. And I mean first-first, perhaps even before calling the select function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we could optimize this selection framework as a whole. I created a ticket to review all cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add 'by moving the VCONV up to the load.'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think advancing the conversion is in general better and easier. We usually want to schedule loads early due to long latency. Can we try it first?
c40983b to
aadb4c6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not correct, addressing register is defined after.
aadb4c6 to
2407fe3
Compare
|
QoR numbers: For |
2407fe3 to
cd33344
Compare
martien-de-jong
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Although the 'only for aie2p' condition remains mysterious.
This catches some missing opportunities in GEMM.
FYI: @martien-de-jong .