Split warm and cold blocks #7300

gita-omr · 2024-04-04T23:34:02Z

OMR already provides warm and cold code caches, and some platforms already
place some code into the cold cache, but not cold blocks. The goal is to
provide the capability of placing cold blocks into the cold cache. This can
help with future footprint reduction work:

- find the last warm block during tree lowering
- identify the last warm instruction during instruction selection (currently, only in openj9)
- switch to the cold code cache after the last warm instruction during binary encoding (currently, only on the x platform)
- the code is only enabled if the -Xjit:splitWarmAndColdBlocks option is on
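The first step above, finding the last warm block, can be sketched roughly as follows. This is only an illustration under stated assumptions: `ToyBlock` and `findLastWarmBlock` are hypothetical names, and the real implementation walks OMR trees rather than a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a basic block; this is NOT the OMR TR::Block class.
struct ToyBlock
   {
   int  number;   // block number, for readability only
   bool isCold;   // true if the block was marked cold
   };

// Returns the index (in layout order) of the last warm block,
// or -1 if every block is cold. Everything after the returned
// index is a candidate for placement in the cold code cache.
inline int findLastWarmBlock(const std::vector<ToyBlock> &blocks)
   {
   int lastWarm = -1;
   for (std::size_t i = 0; i < blocks.size(); ++i)
      {
      if (!blocks[i].isCold)
         lastWarm = static_cast<int>(i);
      }
   return lastWarm;
   }
```

A result of -1 corresponds to the all-cold method case discussed later in this conversation, where an empty warm block has to be inserted first.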

gita-omr · 2024-04-04T23:38:30Z

Fixed EOF.

gita-omr · 2024-04-05T00:51:20Z

Fixed Windows and macOS builds.

gita-omr · 2024-04-05T01:03:36Z

Improved previous fix.

gita-omr · 2024-04-05T01:48:18Z

The macOS failure in TestJitBuilderAPIGenerator is very unlikely to be related.

0xdaryl

These are some review comments after an initial skim of the code.

The commit message describes the different pieces that are implemented here, but I think a brief bigger picture description of why this functionality is being provided is important.

compiler/codegen/OMRCodeGenerator.cpp

0xdaryl · 2024-04-05T12:38:12Z

compiler/codegen/OMRCodeGenerator.cpp

@@ -391,6 +400,47 @@ void OMR::CodeGenerator::lowerTrees()
      TR_ASSERT(node->getVisitCount() != visitCount, "Code Gen: error in lowering trees");
      TR_ASSERT(node->getReferenceCount() == 0, "Code Gen: error in lowering trees");

+      if (node->getOpCodeValue() == TR::BBStart)
+         {


Should these operations be guarded with
if (comp()->getOption(TR_SplitWarmAndColdBlocks) && !comp()->compileRelocatableCode()) ?

I put all the code in OMRCodeGenerator.cpp into a separate function, protected by getOption(TR_SplitWarmAndColdBlocks). The code is actually supposed to work even with AOT (I think).

I don't think it's the case in OMR, but in J9, the cold and warm sections are only split if fej9()->needsContiguousCodeAndDataCacheAllocation() is false; however, for relocatable compiles this returns true. I don't think J9's AOT supports discontiguous code sections.

Thanks @dsouzai. This is a very good point. Somehow I was under the impression that since some platforms already put some snippets of code (thunks) into the cold cache, it should work. But I can imagine that splitting blocks might not be supported by AOT...

Will use !comp()->compileRelocatableCode() for this PR and we can discuss enabling with AOT in a separate issue.
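The guard agreed on in this thread can be sketched as a small predicate. The struct below is a hypothetical model of the two queries mentioned above (`getOption(TR_SplitWarmAndColdBlocks)` and `compileRelocatableCode()`); it is not the OMR `TR::Compilation` class.

```cpp
#include <cassert>

// Hypothetical model of the two compilation queries discussed above.
struct ToyCompilation
   {
   bool splitWarmAndColdBlocksOption;  // models getOption(TR_SplitWarmAndColdBlocks)
   bool relocatableCompile;            // models compileRelocatableCode()
   };

// Splitting is attempted only when the option is on and the
// compilation is not relocatable (AOT), as proposed for this PR.
inline bool shouldSplitWarmAndColdBlocks(const ToyCompilation &comp)
   {
   return comp.splitWarmAndColdBlocksOption && !comp.relocatableCompile;
   }
```

Keeping the condition in one helper means that a later decision to enable splitting for AOT would need to touch only one place.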

compiler/codegen/OMRCodeGenerator.cpp

0xdaryl · 2024-04-05T12:40:45Z

compiler/codegen/OMRCodeGenerator.cpp

@@ -406,6 +456,106 @@ void OMR::CodeGenerator::lowerTrees()
      }

   self()->postLowerTrees();
+
+   if (comp()->getOption(TR_SplitWarmAndColdBlocks) &&


As this will add nearly 100 lines of code applicable only when warm/cold block splitting is enabled, I think it should be moved into its own function for readability.

Right, I think I will move all the new code in this file into a new function, which will make an extra pass through the trees and then apply some tree modifications.

compiler/compile/OMRCompilation.hpp

compiler/x/codegen/OMRCodeGenerator.cpp

0xdaryl · 2024-04-05T12:44:18Z

compiler/x/codegen/OMRCodeGenerator.cpp

 namespace OMR { class RegisterUsage; }
 namespace TR { class RegisterDependencyConditions; }

 // Hack markers
 #define CANT_REMATERIALIZE_ADDRESSES(cg) (cg->comp()->target().is64Bit()) // AMD64 produces a memref with an unassigned addressRegister
+#define OVER_ESTIMATATION 4


Typo: OVER_ESTIMATION

0xdaryl · 2024-04-05T12:46:54Z

compiler/codegen/OMRCodeGenerator.hpp

-   int32_t getAccumulatedInstructionLengthError() {return _accumulatedInstructionLengthError;}
-   int32_t setAccumulatedInstructionLengthError(int32_t e) {return (_accumulatedInstructionLengthError = e);}
-   int32_t addAccumulatedInstructionLengthError(int32_t e) {return (_accumulatedInstructionLengthError += e);}
+   int64_t getAccumulatedInstructionLengthError() {return _accumulatedInstructionLengthError;}


Why does the accumulated error need to be an int64_t? The buffer length and snippet start offsets are still 32-bit.

setAccumulatedInstructionLengthError() takes a difference of addresses, which is int64_t. But you are right, it's better to cast the difference to int32_t.
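A minimal sketch of the narrowing discussed here, assuming both addresses point into the same code buffer. The helper name `narrowLengthError` is hypothetical, and the range assertion is an added precaution rather than something the PR is known to do.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Computes the signed distance between two cursors in the same buffer
// and narrows it to 32 bits. The assertion documents the assumption
// that the distance always fits, since buffer lengths are 32-bit.
inline int32_t narrowLengthError(const uint8_t *estimated, const uint8_t *actual)
   {
   const int64_t diff = static_cast<int64_t>(estimated - actual);
   assert(diff >= std::numeric_limits<int32_t>::min() &&
          diff <= std::numeric_limits<int32_t>::max());
   return static_cast<int32_t>(diff);
   }
```

With the cast done at the call site, the accumulated-error accessors can stay `int32_t`, consistent with the 32-bit buffer length and snippet offsets.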

0xdaryl · 2024-04-05T12:48:26Z

compiler/compile/OMRCompilation.cpp

+      glRegDeps = oldBBStart->getChild(0);
+
+   TR::CFG *cfg = getFlowGraph();
+   TR::Compilation *comp = cfg->comp();


Is this needed? This is already in the Compilation class so self() should do.

0xdaryl · 2024-04-05T12:50:16Z

compiler/x/codegen/OMRCodeGenerator.cpp

 namespace OMR { class RegisterUsage; }
 namespace TR { class RegisterDependencyConditions; }

 // Hack markers
 #define CANT_REMATERIALIZE_ADDRESSES(cg) (cg->comp()->target().is64Bit()) // AMD64 produces a memref with an unassigned addressRegister
+#define OVER_ESTIMATATION 4


Please add a comment describing what this means and what it is used for.

Not needed. I will remove it.

gita-omr · 2024-04-05T14:37:44Z

These are some review comments after an initial skim of the code.

The commit message describes the different pieces that are implemented here, but I think a brief bigger picture description of why this functionality is being provided is important.

Of course. Should I put it in the commit message or in the PR?

0xdaryl · 2024-04-05T14:39:30Z

Of course. Should I put it in the commit message or in the PR?

Since it is only a single commit, please put it in the commit message.

gita-omr · 2024-04-05T22:00:14Z

Extended commit message.

gita-omr · 2024-04-06T00:50:45Z

Fixed line endings.

gita-omr · 2024-04-06T01:05:38Z

Fixed Windows build.

dsouzai

Some minor changes requested; also had some questions. I'll likely need to do a second pass after getting more familiar with the relevant code.

compiler/codegen/OMRCodeGenerator.cpp

compiler/compile/OMRCompilation.cpp

compiler/x/codegen/OMRCodeGenerator.cpp

dsouzai · 2024-04-09T20:03:49Z

compiler/codegen/OMRCodeGenerator.cpp

@@ -391,6 +400,47 @@ void OMR::CodeGenerator::lowerTrees()
      TR_ASSERT(node->getVisitCount() != visitCount, "Code Gen: error in lowering trees");
      TR_ASSERT(node->getReferenceCount() == 0, "Code Gen: error in lowering trees");

+      if (node->getOpCodeValue() == TR::BBStart)
+         {


I don't think it's the case in OMR, but in J9, the cold and warm sections are only split if fej9()->needsContiguousCodeAndDataCacheAllocation() is false; however, for relocatable compiles this returns true. I don't think J9's AOT supports discontiguous code sections.

gita-omr · 2024-04-12T21:43:11Z

Addressed comments from @dsouzai above.

dsouzai

Overall LGTM, minor changes / questions requested.

compiler/codegen/OMRCodeGenerator.cpp

dsouzai · 2024-04-15T13:54:54Z

compiler/compile/OMRCompilation.cpp

+   TR::CFG *cfg = getFlowGraph();
+   TR::Block *newFirstBlock = TR::Block::createEmptyBlock(oldBBStart, self(), oldFirstBlock->getFrequency());
+
+   newFirstBlock->takeGlRegDeps(self(), glRegDeps);


Because I don't have any GlRegDeps experience, do we need to remove the glRegDeps node from the old BBStart, or does it not matter?

I think it actually works correctly. Here is an example of such a transformation (we inserted the empty block_11 in front of block_2; block_2 has not changed):

n102n BBStart <block_11> (freq 3) [0x7f4fe3065fa0] bci=[-1,0,104] rc=0 vc=301 vn=- li=- udi=- nc=1
n104n   GlRegDeps () [0x7f4fe3066040] bci=[-1,0,104] rc=1 vc=301 vn=- li=2 udi=- nc=3 flg=0x20
n105n     aRegLoad edx other<parm 2 Lsun/nio/fs/UnixPath;>[#424 Parm] [flags 0x40000107 0x0 ] [0x7f4fe3066090] bci=[-1,0,104] rc=2 vc=301 vn=- li=2 udi=- nc=0
n106n     aRegLoad esi file<parm 1 Lsun/nio/fs/UnixPath;>[#423 Parm] [flags 0x40000107 0x0 ] (SeenRealReference ) [0x7f4fe30660e0] bci=[-1,0,104] rc=2 vc=301 vn=- li=2 udi=- nc=0 flg=0x8000
n107n     aRegLoad eax this<'this' parm Lsun/nio/fs/UnixException;>[#422 Parm] [flags 0x40000107 0x0 ] [0x7f4fe3066130] bci=[-1,0,104] rc=2 vc=301 vn=- li=2 udi=- nc=0
n109n goto --> block_2 BBStart at n9n [0x7f4fe30661d0] bci=[-1,0,104] rc=0 vc=301 vn=- li=- udi=- nc=1
n108n   GlRegDeps () [0x7f4fe3066180] bci=[-1,0,104] rc=1 vc=301 vn=- li=2 udi=- nc=3 flg=0x20
n105n     ==>aRegLoad
n106n     ==>aRegLoad
n107n     ==>aRegLoad
n103n BBEnd </block_11> ===== [0x7f4fe3065ff0] bci=[-1,0,104] rc=0 vc=301 vn=- li=- udi=- nc=0
n9n   BBStart <block_2> (freq 3) (cold) [0x7f4fe2f99a00] bci=[-1,0,104] rc=0 vc=301 vn=- li=2 udi=- nc=1
n58n    GlRegDeps () [0x7f4fe30651e0] bci=[-1,0,104] rc=1 vc=301 vn=- li=2 udi=- nc=3 flg=0x20
n59n      aRegLoad edx other<parm 2 Lsun/nio/fs/UnixPath;>[#424 Parm] [flags 0x40000107 0x0 ] [0x7f4fe3065230] bci=[-1,0,104] rc=3 vc=301 vn=- li=2 udi=- nc=0
n60n      aRegLoad esi file<parm 1 Lsun/nio/fs/UnixPath;>[#423 Parm] [flags 0x40000107 0x0 ] (SeenRealReference ) [0x7f4fe3065280] bci=[-1,0,104] rc=3 vc=301 vn=- li=2 udi=- nc=0 flg=0x8000
n61n      aRegLoad eax this<'this' parm Lsun/nio/fs/UnixException;>[#422 Parm] [flags 0x40000107 0x0 ] [0x7f4fe30652d0] bci=[-1,0,104] rc=3 vc=301 vn=- li=2 udi=- nc=0
n15n  ifacmpne --> block_4 BBStart at n1n () [0x7f4fe2f99be0] bci=[-1,1,104] rc=0 vc=301 vn=- li=2 udi=- nc=3 flg=0x20
n60n    ==>aRegLoad
n12n    aconst NULL (X==0 ) [0x7f4fe2f99af0] bci=[-1,1,104] rc=1 vc=301 vn=- li=2 udi=- nc=0 flg=0x2
n62n    GlRegDeps () [0x7f4fe3065320] bci=[-1,1,104] rc=1 vc=301 vn=- li=2 udi=- nc=3 flg=0x20
n59n      ==>aRegLoad
n60n      ==>aRegLoad
n61n      ==>aRegLoad
n10n  BBEnd </block_2> (cold) [0x7f4fe2f99a50] bci=[-1,1,104] rc=0 vc=301 vn=- li=2 udi=- nc=0

@vijaysun-omr could you please take a look and let us know what you think? It's a rare case, but it's better to get it right. We need to insert an empty block at the beginning of the method since all of its blocks are cold.

I am starting to have some doubts. For the purpose of this PR, we need to insert a new EBB at the start of the CFG. On the other hand, we need to convey the register/parameter association for the original start EBB (block_2). If the block remains empty, most likely things will work. But if the same insertion method is used somewhere else and some code is then added to block_11, there might be issues...

So, I am assuming the goto is present because we need to separate that first block from the rest of the method that is all marked cold (in warm vs cold code cache).

Assuming this is right and the goto is needed, then the GlRegDeps state shown seems correct to me. i.e. we don't need any more or any fewer GlRegDeps and the commoning among the children also seems right to me.

Yes, the goto is needed to separate warm and cold blocks. And yes, I think I agree - there is no need for any commoning between these two blocks. And all the GlRegDeps are correct, even if some code gets inserted into that new block.
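The shape of the transformation discussed in this thread (an all-cold method gets a new, empty warm first block that ends in a goto to the old first block) can be sketched on a toy block list. The types and the function name are hypothetical, and GlRegDeps handling is deliberately left out since it depends on OMR IL details.

```cpp
#include <cassert>
#include <vector>

// Hypothetical block model; NOT the OMR TR::Block class.
struct ToyCfgBlock
   {
   int  number;      // block number
   bool isCold;      // true if the block is marked cold
   int  gotoTarget;  // number of the block this one jumps to, or -1 for none
   };

// If every block in the method is cold, prepend an empty warm block
// whose only content is a goto to the old first block. Returns true
// if a block was inserted, false if the list was left unchanged.
inline bool prependEmptyWarmBlock(std::vector<ToyCfgBlock> &blocks, int newNumber)
   {
   if (blocks.empty())
      return false;
   for (const ToyCfgBlock &b : blocks)
      {
      if (!b.isCold)
         return false;   // a warm block already exists; nothing to do
      }
   const ToyCfgBlock entry = { newNumber, false, blocks.front().number };
   blocks.insert(blocks.begin(), entry);
   return true;
   }
```

In the trace above, the inserted block corresponds to block_11 and its goto target to block_2.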

compiler/compile/OMRCompilation.cpp

dsouzai · 2024-04-15T14:02:40Z

compiler/codegen/OMRCodeGenerator.cpp

+      // Update CFG
+      //
+      TR::CFG *cfg = comp->getFlowGraph();
+      cfg->addEdge(lastWarmBlock, comp->getStartBlock());


Shouldn't this add an edge to targetTreeTop, since the goto isn't always going to the start block?

I think I made a mistake with the last fix. It's better not to insert any edge. The edge would only be needed for an unreachable goto, so conceptually it does not affect the CFG. I think I will remove the code above.

compiler/codegen/OMRCodeGenerator.cpp

compiler/il/OMRBlock.hpp

compiler/x/codegen/OMRCodeGenerator.cpp

gita-omr · 2024-04-17T18:41:14Z

Addressed comments above.

dsouzai

LGTM

dsouzai · 2024-04-18T19:58:29Z

@0xdaryl if everything looks good to you as well, I can kick off testing

compiler/codegen/OMRCodeGenerator.cpp

compiler/x/codegen/OMRCodeGenerator.cpp

0xdaryl · 2024-04-23T12:00:49Z

compiler/compile/OMRCompilation.hpp

+   *
+   * \note
+   *    Inserts an empty block before the current first block
+   *    Moves glRegDeps from the current block to the new one


Should this say "Moves glRegDeps from the current first block to the new one" ?

0xdaryl · 2024-04-23T12:05:34Z

compiler/codegen/OMRCodeGenerator.hpp

@@ -1779,6 +1795,10 @@ class OMR_EXTENSIBLE CodeGenerator
   void incOutOfLineColdPathNestedDepth(){_outOfLineColdPathNestedDepth++;}
   void decOutOfLineColdPathNestedDepth(){_outOfLineColdPathNestedDepth--;}

+   bool getIsInWarmCodeCache() {return _flags2.testAny(IsInWarmCodeCache);}
+   void setIsInWarmCodeCache() {_flags2.set(IsInWarmCodeCache);}
+   void resetIsInWarmCodeCache() {_flags2.reset(IsInWarmCodeCache);}


What is this flag used for? I don't see a use of it in this PR.

If it is needed, what does it mean exactly? That is, when used, someone would be asking the CodeGenerator object if "IsInWarmCodeCache" which doesn't make sense to me unless you supply a code address.

Yeah, I should document it. It's used by instruction selection to identify the stage it is in. It can be used, for example, for inserting the corresponding debug counters. Currently, it is only used by openj9.

OK. Can you call this flag "InstructionSelectionInWarmCodeCache" instead then for improved clarity? Your API methods would then be: getInstructionSelectionInWarmCodeCache(), setInstructionSelectionInWarmCodeCache(), and resetInstructionSelectionInWarmCodeCache().

Yes, agree. This name will be more descriptive.

Renamed as suggested above.
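For illustration, the renamed accessors might look like the sketch below, which follows the pattern of the hunk quoted above. `ToyFlags32` and `ToyCodeGenerator` are hypothetical stand-ins, and the bit value chosen for the flag is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Toy flag word; a stand-in, not the OMR flags class.
class ToyFlags32
   {
   public:
   bool testAny(uint32_t mask) const { return (_bits & mask) != 0; }
   void set(uint32_t mask)           { _bits |= mask; }
   void reset(uint32_t mask)         { _bits &= ~mask; }
   private:
   uint32_t _bits = 0;
   };

// Toy code generator exposing the three accessors named in the review.
class ToyCodeGenerator
   {
   public:
   // True while instruction selection is still emitting warm code.
   bool getInstructionSelectionInWarmCodeCache() const { return _flags2.testAny(InstructionSelectionInWarmCodeCache); }
   void setInstructionSelectionInWarmCodeCache()       { _flags2.set(InstructionSelectionInWarmCodeCache); }
   void resetInstructionSelectionInWarmCodeCache()     { _flags2.reset(InstructionSelectionInWarmCodeCache); }
   private:
   enum : uint32_t { InstructionSelectionInWarmCodeCache = 0x00000001 }; // bit value is an assumption
   ToyFlags32 _flags2;
   };
```

The longer name makes it clear that the flag describes the current stage of instruction selection rather than a property of a particular code address.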

0xdaryl · 2024-04-23T12:12:11Z

compiler/codegen/OMRInstruction.hpp

@@ -218,6 +218,9 @@ class OMR_EXTENSIBLE Instruction
   bool needsAOTRelocation() { return (_index & TO_MASK(NeedsAOTRelocation)) != 0; }
   void setNeedsAOTRelocation(bool v = true) { v ? _index |= TO_MASK(NeedsAOTRelocation) : _index &= ~TO_MASK(NeedsAOTRelocation); }

+   bool isLastWarmInstruction() { return (_index & TO_MASK(LastWarmInstruction)) != 0; }
+   void setLastWarmInstruction(bool v = true) { v ? _index |= TO_MASK(LastWarmInstruction) : _index &= ~TO_MASK(LastWarmInstruction); }


Where is LastWarmInstruction ever set to something non-NULL? I can't seem to find any code in this PR that does this.

This feature requires support in a sub-project. It has to identify last warm instruction during instruction selection.

I put it in the PR description:

identify last warm instruction during instruction selection (currently, only in openj9)

If this flag is only useful for a downstream project then it is possible to introduce this enum value in that project specifically.

The concern I have with introducing it in OMR without any references is that someone may innocuously use it to (seemingly) find the last warm instruction in the method and never find it (because nothing ever sets it). If this does live in OMR then I think more documentation is necessary. I do have questions over its utilization, though. Will this flag be updated if new instructions are inserted after the current warm instruction, or is it only intended to be used very late in code generation when no more instructions will be added? Again, documentation will help here.

The flag is used in OMR but can be set in a downstream project. If it is not set, OMR will not split warm and cold blocks. I think it's up to the code that sets it to maintain its correct value. I will try to document a bit more.

Added comments.
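The intended contract (a downstream project marks the last warm instruction; binary encoding switches to the cold cache after it; with no mark, nothing is split) can be sketched as follows. All names here are hypothetical, and byte counts stand in for actual encoding.

```cpp
#include <cassert>
#include <vector>

// Hypothetical instruction model; NOT the OMR TR::Instruction class.
struct ToyInstruction
   {
   int  length;     // encoded length in bytes
   bool lastWarm;   // models isLastWarmInstruction()
   };

// Number of bytes that would land in each code cache.
struct ToyLayout
   {
   int warmBytes;
   int coldBytes;
   };

// Walks the instruction stream in order. Everything up to and including
// the instruction marked lastWarm is counted as warm; everything after
// it is counted as cold. If no instruction is marked, all code stays warm.
inline ToyLayout layOutInstructions(const std::vector<ToyInstruction> &instrs)
   {
   ToyLayout layout = { 0, 0 };
   bool inWarm = true;
   for (const ToyInstruction &instr : instrs)
      {
      if (inWarm)
         layout.warmBytes += instr.length;
      else
         layout.coldBytes += instr.length;
      if (instr.lastWarm)
         inWarm = false;   // switch caches after the marked instruction
      }
   return layout;
   }
```

This also illustrates the reviewer's concern: if instructions are appended after the marked one without the mark being moved, they silently end up on the cold side.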

gita-omr · 2024-04-25T05:22:05Z

Addressed comments above

gita-omr · 2024-04-25T05:51:07Z

Fixed Windows build.

- use the same heuristics for code cache disclaim as for data cache
- disclaim starting from the cold code
- move stack overflow outline instructions into the warm area to increase disclaim efficiency

Depends on: eclipse-omr/omr#7300
Depends on: eclipse-omr/omr#7324

gita-omr · 2024-04-29T17:26:30Z

Addressed latest comments.

gita-omr · 2024-04-30T01:11:43Z

Enabled cold block counting in OMR::CodeGenerator::getMethodStats()

gita-omr · 2024-04-30T17:25:45Z

Fixed Windows build.

0xdaryl · 2024-05-03T13:41:25Z

Can you squash the commits please? I'll launch final testing then.

gita-omr · 2024-05-04T01:49:04Z

Since printing of verbose "METHOD STATS" is already in, I added printing of some more info about the cold cache (in a separate commit).

gita-omr · 2024-05-04T01:52:29Z

Can you squash the commits please? I'll launch final testing then.

Sorry, did not see your message and added another commit. But it's very minor: just printing some more stats about the cold code cache in the verbose output. I will squash everything now.

OMR already provides warm and cold code caches, and some platforms already place some code into the cold cache, but not cold blocks. The goal is to provide the capability of placing cold blocks into the cold cache. This can help with future footprint reduction work:

- find the last warm block during tree lowering
- identify the last warm instruction during instruction selection (currently, only in openj9)
- switch to the cold code cache after the last warm instruction during binary encoding (currently, only on the x platform)
- the code is only enabled if the -Xjit:splitWarmAndColdBlocks option is on
- print cold code cache info in verbose #METHOD STATS

gita-omr · 2024-05-04T01:57:53Z

Squashed all commits.

0xdaryl · 2024-05-04T17:21:33Z

Jenkins build all

gita-omr requested review from vijaysun-omr, Leonardo2718, dsouzai, 0xdaryl and mstoodle as code owners April 4, 2024 23:34

gita-omr mentioned this pull request Apr 4, 2024

Split warm and cold blocks eclipse-openj9/openj9#19272

Merged

github-actions bot added arch:x86 comp:compiler labels Apr 4, 2024

gita-omr force-pushed the split_warm_cold_blocks branch from e9736ee to 7134799 Compare April 4, 2024 23:37

gita-omr force-pushed the split_warm_cold_blocks branch from 7134799 to 723a918 Compare April 5, 2024 00:50

gita-omr force-pushed the split_warm_cold_blocks branch from 723a918 to ae70563 Compare April 5, 2024 01:02

0xdaryl requested changes Apr 5, 2024

gita-omr force-pushed the split_warm_cold_blocks branch from ae70563 to d6098d8 Compare April 5, 2024 21:58

gita-omr force-pushed the split_warm_cold_blocks branch from 3f0304d to 609cf8b Compare April 6, 2024 00:50

gita-omr force-pushed the split_warm_cold_blocks branch from 609cf8b to 1d11b1e Compare April 6, 2024 01:05

dsouzai requested changes Apr 9, 2024

gita-omr force-pushed the split_warm_cold_blocks branch from 1d11b1e to a973b81 Compare April 12, 2024 21:41

dsouzai requested changes Apr 15, 2024

gita-omr force-pushed the split_warm_cold_blocks branch from a973b81 to fb2c720 Compare April 17, 2024 18:40

dsouzai approved these changes Apr 18, 2024

0xdaryl reviewed Apr 23, 2024

gita-omr force-pushed the split_warm_cold_blocks branch from fb2c720 to f54cf1b Compare April 25, 2024 05:21

gita-omr force-pushed the split_warm_cold_blocks branch from f54cf1b to 838c20a Compare April 25, 2024 05:50

gita-omr mentioned this pull request Apr 26, 2024

Disclaim Cold Code Cache eclipse-openj9/openj9#19397

Merged

gita-omr force-pushed the split_warm_cold_blocks branch from 838c20a to e3ea137 Compare April 29, 2024 17:25

gita-omr force-pushed the split_warm_cold_blocks branch from e3ea137 to 079205c Compare April 30, 2024 01:10

gita-omr force-pushed the split_warm_cold_blocks branch from 079205c to d3fb233 Compare April 30, 2024 17:25

gita-omr force-pushed the split_warm_cold_blocks branch from 5365a79 to 9b184d4 Compare May 4, 2024 01:56

0xdaryl self-assigned this May 5, 2024

0xdaryl approved these changes May 5, 2024

0xdaryl merged commit 24081c1 into eclipse-omr:master May 5, 2024
15 of 18 checks passed

pshipton mentioned this pull request May 31, 2024

(0.46) Disclaim Cold Code Cache eclipse-openj9/openj9#19581

Merged
