-
Notifications
You must be signed in to change notification settings - Fork 396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split warm and cold blocks #7300
Conversation
e9736ee
to
7134799
Compare
Fixed EOF. |
7134799
to
723a918
Compare
Fixed Windows and macOS builds. |
723a918
to
ae70563
Compare
Improved previous fix. |
macOS failure in TestJitBuilderAPIGenerator is very unlikely related. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are some review comments after an initial skim of the code.
The commit message describes the different pieces that are implemented here, but I think a brief bigger picture description of why this functionality is being provided is important.
@@ -391,6 +400,47 @@ void OMR::CodeGenerator::lowerTrees() | |||
TR_ASSERT(node->getVisitCount() != visitCount, "Code Gen: error in lowering trees"); | |||
TR_ASSERT(node->getReferenceCount() == 0, "Code Gen: error in lowering trees"); | |||
|
|||
if (node->getOpCodeValue() == TR::BBStart) | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these operations be guarded with
if (comp()->getOption(TR_SplitWarmAndColdBlocks) && !comp()->compileRelocatableCode())
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put all the code in OMRCodeGenerator.cpp into a separate function and protected with getOption(TR_SplitWarmAndColdBlocks). The code actually supposed to work even with AOT (I think).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's the case in OMR, but in J9, the cold and warm sections are only split if fej9()->needsContiguousCodeAndDataCacheAllocation()
is false; however, for relocatable compiles this returns true. I don't think J9's AOT supports discontiguous code sections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dsouzai. This is very good point. Somehow I was under impression that since some platforms already put some snippets of code (thunks) into the cold cache it should work. But, I can imagine splitting blocks might not be supported by AOT...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will use !comp()->compileRelocatableCode()
for this PR and we can discuss enabling with AOT in a separate issue.
@@ -406,6 +456,106 @@ void OMR::CodeGenerator::lowerTrees() | |||
} | |||
|
|||
self()->postLowerTrees(); | |||
|
|||
if (comp()->getOption(TR_SplitWarmAndColdBlocks) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this will add nearly 100 lines of code applicable only when warm/cold block splitting is enabled, I think it should be moved into its own function for readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I think I will move all the new code in this file into a new function which will have an extra pass through the trees and then some trees modification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
namespace OMR { class RegisterUsage; } | ||
namespace TR { class RegisterDependencyConditions; } | ||
|
||
// Hack markers | ||
#define CANT_REMATERIALIZE_ADDRESSES(cg) (cg->comp()->target().is64Bit()) // AMD64 produces a memref with an unassigned addressRegister | ||
#define OVER_ESTIMATATION 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: OVER_ESTIMATION
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
int32_t getAccumulatedInstructionLengthError() {return _accumulatedInstructionLengthError;} | ||
int32_t setAccumulatedInstructionLengthError(int32_t e) {return (_accumulatedInstructionLengthError = e);} | ||
int32_t addAccumulatedInstructionLengthError(int32_t e) {return (_accumulatedInstructionLengthError += e);} | ||
int64_t getAccumulatedInstructionLengthError() {return _accumulatedInstructionLengthError;} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the accumulated error need to be an int64_t
? The buffer length and snippet start offsets are still 32-bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setAccumulatedInstructionLengthError()
takes difference of addresses which is int64_t
. But you are right, it's better to cast the difference to int32_t
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
compiler/compile/OMRCompilation.cpp
Outdated
glRegDeps = oldBBStart->getChild(0); | ||
|
||
TR::CFG *cfg = getFlowGraph(); | ||
TR::Compilation *comp = cfg->comp(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed? This is already in the Compilation
class so self()
should do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
namespace OMR { class RegisterUsage; } | ||
namespace TR { class RegisterDependencyConditions; } | ||
|
||
// Hack markers | ||
#define CANT_REMATERIALIZE_ADDRESSES(cg) (cg->comp()->target().is64Bit()) // AMD64 produces a memref with an unassigned addressRegister | ||
#define OVER_ESTIMATATION 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment describing what this means and what it is used for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed. I will remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Of course. Should I put into the commit message or into the PR? |
Since it is only a single commit, please put it in the commit message. |
ae70563
to
d6098d8
Compare
Extended commit message. |
3f0304d
to
609cf8b
Compare
Fixed line endings. |
609cf8b
to
1d11b1e
Compare
Fixed Windows build. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor changes requested; also had some questions. I'll likely need to do a second pass after getting more familiar with the relevant code.
@@ -391,6 +400,47 @@ void OMR::CodeGenerator::lowerTrees() | |||
TR_ASSERT(node->getVisitCount() != visitCount, "Code Gen: error in lowering trees"); | |||
TR_ASSERT(node->getReferenceCount() == 0, "Code Gen: error in lowering trees"); | |||
|
|||
if (node->getOpCodeValue() == TR::BBStart) | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's the case in OMR, but in J9, the cold and warm sections are only split if fej9()->needsContiguousCodeAndDataCacheAllocation()
is false; however, for relocatable compiles this returns true. I don't think J9's AOT supports discontiguous code sections.
1d11b1e
to
a973b81
Compare
Addressed comments from @dsouzai above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, minor changes / questions requested.
TR::CFG *cfg = getFlowGraph(); | ||
TR::Block *newFirstBlock = TR::Block::createEmptyBlock(oldBBStart, self(), oldFirstBlock->getFrequency()); | ||
|
||
newFirstBlock->takeGlRegDeps(self(), glRegDeps); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I don't have any GlRegDeps experience, do we need to remove the glRegDeps node from the old BBStart, or does it not matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it actually works correctly. Here is an example of such transformation(we inserted empty block 3 in front of block 2, block 2 has not changed):
n102n BBStart <block_11> (freq 3) [0x7f4fe3065fa0] bci=[-1,0,104] rc=0 vc=301 vn=- li=- udi=- nc=1
n104n GlRegDeps () [0x7f4fe3066040] bci=[-1,0,104] rc=1 vc=301 vn=- li=2 udi=- nc=3 flg=0x20
n105n aRegLoad edx other<parm 2 Lsun/nio/fs/UnixPath;>[#424 Parm] [flags 0x40000107 0x0 ] [0x7f4fe3066090] bci=[-1,0,104] rc=2 vc=301 vn=- li=2 udi=- nc=0
n106n aRegLoad esi file<parm 1 Lsun/nio/fs/UnixPath;>[#423 Parm] [flags 0x40000107 0x0 ] (SeenRealReference ) [0x7f4fe30660e0] bci=[-1,0,104] rc=2 vc=301 vn=- li=2 udi=- nc=0 flg=0x8000
n107n aRegLoad eax this<'this' parm Lsun/nio/fs/UnixException;>[#422 Parm] [flags 0x40000107 0x0 ] [0x7f4fe3066130] bci=[-1,0,104] rc=2 vc=301 vn=- li=2 udi=- nc=0
n109n goto --> block_2 BBStart at n9n [0x7f4fe30661d0] bci=[-1,0,104] rc=0 vc=301 vn=- li=- udi=- nc=1
n108n GlRegDeps () [0x7f4fe3066180] bci=[-1,0,104] rc=1 vc=301 vn=- li=2 udi=- nc=3 flg=0x20
n105n ==>aRegLoad
n106n ==>aRegLoad
n107n ==>aRegLoad
n103n BBEnd </block_11> ===== [0x7f4fe3065ff0] bci=[-1,0,104] rc=0 vc=301 vn=- li=- udi=- nc=0
n9n BBStart <block_2> (freq 3) (cold) [0x7f4fe2f99a00] bci=[-1,0,104] rc=0 vc=301 vn=- li=2 udi=- nc=1
n58n GlRegDeps () [0x7f4fe30651e0] bci=[-1,0,104] rc=1 vc=301 vn=- li=2 udi=- nc=3 flg=0x20
n59n aRegLoad edx other<parm 2 Lsun/nio/fs/UnixPath;>[#424 Parm] [flags 0x40000107 0x0 ] [0x7f4fe3065230] bci=[-1,0,104] rc=3 vc=301 vn=- li=2 udi=- nc=0
n60n aRegLoad esi file<parm 1 Lsun/nio/fs/UnixPath;>[#423 Parm] [flags 0x40000107 0x0 ] (SeenRealReference ) [0x7f4fe3065280] bci=[-1,0,104] rc=3 vc=301 vn=- li=2 udi=- nc=0 flg=0x8000
n61n aRegLoad eax this<'this' parm Lsun/nio/fs/UnixException;>[#422 Parm] [flags 0x40000107 0x0 ] [0x7f4fe30652d0] bci=[-1,0,104] rc=3 vc=301 vn=- li=2 udi=- nc=0
n15n ifacmpne --> block_4 BBStart at n1n () [0x7f4fe2f99be0] bci=[-1,1,104] rc=0 vc=301 vn=- li=2 udi=- nc=3 flg=0x20
n60n ==>aRegLoad
n12n aconst NULL (X==0 ) [0x7f4fe2f99af0] bci=[-1,1,104] rc=1 vc=301 vn=- li=2 udi=- nc=0 flg=0x2
n62n GlRegDeps () [0x7f4fe3065320] bci=[-1,1,104] rc=1 vc=301 vn=- li=2 udi=- nc=3 flg=0x20
n59n ==>aRegLoad
n60n ==>aRegLoad
n61n ==>aRegLoad
n10n BBEnd </block_2> (cold) [0x7f4fe2f99a50] bci=[-1,1,104] rc=0 vc=301 vn=- li=2 udi=- nc=0
@vijaysun-omr could you please take a look and let us know what you think? It's a rare case, but it's better to get it right. We need to insert an empty block at the beginning of the method since all of its blocks are cold.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am starting to have some doubts. For the purpose of this PR, we need to insert a new EBB at the start of the CFG. On the other hand, we need to convey resister/parameter association for the original start EBB (block_2). If the block remains empty, most likely things will work. But if the same insertion method is used somewhere else and then some code is added into block_11 there might be issues...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I am assuming the goto
is present because we need to separate that first block from the rest of the method that is all marked cold (in warm vs cold code cache).
Assuming this is right and the goto
is needed, then the GlRegDeps
state shown seems correct to me. i.e. we don't need any more or any fewer GlRegDeps
and the commoning among the children also seems right to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the goto is needed to separate warm and cold blocks. And yes, I think I agree - there is no need for any commoning between these two blocks. And all the GlRegDeps are correct, even if some code gets inserted into that new block.
// Update CFG | ||
// | ||
TR::CFG *cfg = comp->getFlowGraph(); | ||
cfg->addEdge(lastWarmBlock, comp->getStartBlock()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this add an edge to targetTreeTop
, since the goto isn't always going to the start block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I made a mistake with the last fix. It's better not to insert any edge. It's only needed for an unreachable goto
so conceptually does not affect CFG. I think I will remove the code above.
a973b81
to
fb2c720
Compare
Addressed comments above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@0xdaryl if everything looks good to you as well, I can kick off testing |
compiler/compile/OMRCompilation.hpp
Outdated
* | ||
* \note | ||
* Inserts an empty block before the current first block | ||
* Moves glRegDeps from the current block to the new one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this say "Moves glRegDeps from the current first block to the new one" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -1779,6 +1795,10 @@ class OMR_EXTENSIBLE CodeGenerator | |||
void incOutOfLineColdPathNestedDepth(){_outOfLineColdPathNestedDepth++;} | |||
void decOutOfLineColdPathNestedDepth(){_outOfLineColdPathNestedDepth--;} | |||
|
|||
bool getIsInWarmCodeCache() {return _flags2.testAny(IsInWarmCodeCache);} | |||
void setIsInWarmCodeCache() {_flags2.set(IsInWarmCodeCache);} | |||
void resetIsInWarmCodeCache() {_flags2.reset(IsInWarmCodeCache);} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this flag used for? I don't see a use of it in this PR.
If it is needed, what does it mean exactly? That is, when used, someone would be asking the CodeGenerator object if "IsInWarmCodeCache" which doesn't make sense to me unless you supply a code address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I should document. It's used by instruction selection to identify the stage it is in. Can be used for inserting corresponding debug counters for example. Currently, only used by openj9.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Can you call this flag "InstructionSelectionInWarmCodeCache" instead then for improved clarity? Your API methods would then be: getInstructionSelectionInWarmCodeCache(), setInstructionSelectionInWarmCodeCache(), and resetInstructionSelectionInWarmCodeCache().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, agree. This name will be more descriptive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed as suggested above.
@@ -218,6 +218,9 @@ class OMR_EXTENSIBLE Instruction | |||
bool needsAOTRelocation() { return (_index & TO_MASK(NeedsAOTRelocation)) != 0; } | |||
void setNeedsAOTRelocation(bool v = true) { v ? _index |= TO_MASK(NeedsAOTRelocation) : _index &= ~TO_MASK(NeedsAOTRelocation); } | |||
|
|||
bool isLastWarmInstruction() { return (_index & TO_MASK(LastWarmInstruction)) != 0; } | |||
void setLastWarmInstruction(bool v = true) { v ? _index |= TO_MASK(LastWarmInstruction) : _index &= ~TO_MASK(LastWarmInstruction); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is LastWarmInstruction ever set to something non-NULL? I can't seem to find any code in this PR that does this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feature requires support in a sub-project. It has to identify last warm instruction during instruction selection.
I put it in the PR description:
- identify last warm instruction during instruction selection (currently, only in openj9)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this flag is only useful for a downstream project then it is possible to introduce this enum value in that project specifically.
The concern I have with introducing it in OMR without any references is that someone may innocuously use it to (seemingly) find the last instruction warm instruction in the method and never find it (because nothing ever sets it). If this does live in OMR then I think more documentation is necessary. I do have questions over its utilization, though. Will this flag be updated if new instructions are inserted after the current warm instruction, or is it only intended to be used very late in code generation when no more instructions will be added. Again, documentation will help here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The flag is used in OMR but can be set in a downstream project. If it is not set, OMR will not split warm and cold blocks. I think it's up to the code that sets it to maintain its correct value. I will try to document a bit more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments.
fb2c720
to
f54cf1b
Compare
Addressed comments above |
f54cf1b
to
838c20a
Compare
Fixed Windows build. |
- use the same heuristcs for code cache disclaim as for data cache - disclaim starting from the cold code - move stack overflow outline instructions into the warm area to increase disclaim efficiency Depends on: eclipse-omr/omr#7300 Depends on: eclipse-omr/omr#7324
838c20a
to
e3ea137
Compare
Addressed latest comments. |
e3ea137
to
079205c
Compare
Enabled cold block counting in |
079205c
to
d3fb233
Compare
Fixed Windows build. |
Can you squash the commits please? I'll launch final testing then. |
Since printing verbose "METHOD STATS" is already in added printing some more info about the cold cache (in a separate commit). |
Sorry, did not see your message and added another commit. But it's very minor: just printing some more stats about the cold code cache in the verbose output. I will squash everything now. |
OMR already provides warm and cold code cache and some platforms already place some code into the cold cache, but not cold blocks. The goal is to provide a capability of placing cold blocks into the cold cache. This can help with the future footprint reduction work: - find last warm block during tree lowering - identify last warm instruction during instruction selection (currently, only in openj9) - switch to the cold code cache after last warm instruction during binary encoding (currently, only on x platform) - the code is only enabled if -Xjit:splitWarmAndColdBlocks option is on - print cold code cache info in verbose #METHOD STATS
5365a79
to
9b184d4
Compare
Squashed all commits. |
Jenkins build all |
- use the same heuristcs for code cache disclaim as for data cache - disclaim starting from the cold code - move stack overflow outline instructions into the warm area to increase disclaim efficiency Depends on: eclipse-omr/omr#7300 Depends on: eclipse-omr/omr#7324
OMR already provides warm and cold code cache and some platforms already
place some code into the cold cache, but not cold blocks. The goal is to
provide a capability of placing cold blocks into the cold cache. This can
help with the future footprint reduction work: