Z: Support NaN values w.r.t fmax/fmin/dmax/dmin on S390 #7196

Merged
merged 2 commits into from
Feb 14, 2024

@sarwat12 sarwat12 commented Dec 6, 2023

This PR adds support to the S390 MaxMin evaluator to handle NaN operands for [fd]min/max opcodes.

  • Add NaN checks and set the return register to NaN if either operand is NaN
  • Enable MaxMin tests for Float/Double on LoZ, and disable them on z/OS
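
For reference, the behaviour described in the list above can be sketched in plain C++. This is only an illustrative model of the rule "the result is NaN if either operand is NaN"; `nanAwareMax` and `nanAwareMin` are hypothetical names, not OMR functions, and the ordering of signed zeros is deliberately not modelled.

```cpp
#include <cmath>
#include <limits>

// Hypothetical reference model (not OMR code):
// the result is NaN whenever either operand is NaN.
template <typename T>
T nanAwareMax(T a, T b)
   {
   if (std::isnan(a) || std::isnan(b))
      return std::numeric_limits<T>::quiet_NaN();
   return (a > b) ? a : b;
   }

template <typename T>
T nanAwareMin(T a, T b)
   {
   if (std::isnan(a) || std::isnan(b))
      return std::numeric_limits<T>::quiet_NaN();
   return (a < b) ? a : b;
   }
```

A model like this can serve as an oracle when comparing the results of generated code for float and double inputs.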

Closes: #5157

Signed-off-by: Sarwat Shaheen <sarwat.shaheen@yahoo.com>

sarwat12 commented Dec 6, 2023

Tagging @r30shah and @Spencer-Comin for review.

sarwat12 commented Dec 6, 2023

Personal Build #19727 to test the changes.

@@ -334,23 +334,61 @@ xmaxxminHelper(TR::Node* node, TR::CodeGenerator* cg, TR::InstOpCode::Mnemonic c

TR::LabelSymbol* cFlowRegionStart = generateLabelSymbol(cg);
TR::LabelSymbol* cFlowRegionEnd = generateLabelSymbol(cg);
//Label Symbol for handling NaN operands
TR::LabelSymbol* nanLabel = generateLabelSymbol(cg);

Move the initialization of nanLabel closer to where it is used.


//Handle NaN case for doubles
if (node->getOpCode().isDouble())

Is this evaluator also used for non-floating-point min/max (integer, long, etc.)?
I believe we pass in the move op and compare op, so it will generate the appropriate instructions for those cases.

The current change in this PR would generate the NaN case for all types, whereas integer and long have no NaN.

Can you confirm? If so, we need to refactor and add checks so that only double and float generate the NaN case.

@sarwat12 sarwat12 Dec 7, 2023

Yes, you're correct, I confirm that this evaluator is also used in specific cases of int/long MaxMin calls. Since only double and float generate NaN cases, I could try refactoring out the double/float specific code to a separate helper function. Something like this at the start of xmaxxminHelper:

if (node->getOpCode().isDouble() || node->getOpCode().isFloat())
   {
   return fdmaxminHelper(node, cg, compareRROp, branchCond, moveRROp);
   }


Thanks for confirming. I think you can refactor the single function to handle this scenario. You need to guard the NaN changes with an isDouble/isFloat query. You can handle the common condition codes (CC 0/1/2) first, and generate the code for the NaN case last.


@sarwat12 You still need to work on this suggestion. The evaluator in its current state will also allocate a register to hold the NaN value for integral-type min and max nodes. Code for NaN should only be generated for float and double.

@sarwat12 sarwat12 force-pushed the NaN_fdMinMax branch 2 times, most recently from a11dcc2 to 20f4bbd Compare December 13, 2023 19:06
sarwat12 commented Dec 13, 2023

Personal Build #19902 to test new changes.

@sarwat12 sarwat12 marked this pull request as ready for review December 13, 2023 19:12
@sarwat12
Tagging @r30shah for review.

@r30shah r30shah left a comment

Please address the last comment.

@sarwat12 sarwat12 force-pushed the NaN_fdMinMax branch 2 times, most recently from 0a88bfa to 680672e Compare December 15, 2023 18:55

TR::RegisterDependencyConditions* deps = new (cg->trHeapMemory()) TR::RegisterDependencyConditions(0, 2, cg);
//Label Symbol for handling possible NaN operand case for float/double
TR::LabelSymbol* nanLabel;

Initialize nanLabel to NULL.

generateS390BranchInstruction(cg, TR::InstOpCode::BRC, TR::InstOpCode::COND_BRC, node, cFlowRegionEnd);

//Handle NaN case
generateS390LabelInstruction(cg, TR::InstOpCode::label, node, nanLabel);

This is incorrect. We only generate the nanLabel symbol for the double/float case; for every other case nanLabel would be uninitialized. This will fail while compiling.

genLoadLongConstant(cg, node, DOUBLE_NAN, nanReg, NULL, deps, NULL);
generateRRInstruction(cg, TR::InstOpCode::LDGR, node, lhsReg, nanReg);
}
else

If the node is not double, then it can be int, long, short, byte, or float. You cannot assume it will be float only.

@r30shah r30shah left a comment

Minor change, overall it is good.

{
//Load immediate DOUBLE_NAN value into GPR, then move it to result FPR
genLoadLongConstant(cg, node, DOUBLE_NAN, nanReg, NULL, deps, NULL);
generateRRInstruction(cg, TR::InstOpCode::LDGR, node, lhsReg, nanReg);

This instruction is common to both double and float, so it should move out of the if-else block.

generateRRInstruction(cg, TR::InstOpCode::LDGR, node, lhsReg, nanReg);
}
deps->addPostConditionIfNotAlreadyInserted(nanReg, TR::RealRegister::AssignAny);
cg->stopUsingRegister(nanReg);

Can you move this line to the end of the evaluator, after you attach the dependencies to the end-of-ICF label? (You just added the register to deps.)

@sarwat12 sarwat12 force-pushed the NaN_fdMinMax branch 2 times, most recently from 7284904 to c1f1ce1 Compare January 12, 2024 16:06
@r30shah r30shah left a comment

LGTM

r30shah commented Jan 12, 2024

jenkins build zos,zlinux

r30shah commented Jan 12, 2024

@sarwat12 seems like your changes are failing on z/OS. Can you investigate?

sarwat12 commented Feb 1, 2024

New instruction sequence generated from running DoubleMaxMin tests:

[     0x2aa2652bad0]                          proc
 [     0x2aa2652bbd0]                          fence   Relative [ 0x2aa2647b3e0 ] BBStart <block_2> (frequency 10000)
 [     0x2aa2652c110]                          LD      FPR_0017, Parm[Parm  0<parm 0 D>] ?+0(GPR15)
 [     0x2aa2654a4a0]                          LD      FPR_0019, Parm[Parm  1<parm 1 D>] ?+8(GPR15)
 [     0x2aa2654a610]                          Label L0016:     # (Start of internal control flow)
 [     0x2aa2654a6d0]                          CDBR    FPR_0017,FPR_0019
 [     0x2aa2654a780]                          BRC     BLR(0x2), Label L0017
 [     0x2aa2654a840]                          LTDBR   FPR_0017,FPR_0017
 [     0x2aa2654a8f0]                          BRC     MASK2(0x1), Label L0017
 [     0x2aa2654a9b0]                          LDR     FPR_0017,FPR_0019
 [     0x2aa2654b090]                          assocreg
 [     0x2aa2654aaf0]                          Label L0017:     # (End of internal control flow)
 POST:
 {AssignAny:FPR_0017:R} {AssignAny:FPR_0019:R}
 [     0x2aa2654b7a0]                          assocreg
 [     0x2aa2654b1f0]                          retn
 POST:
 {FPR0:FPR_0017:R}
 [     0x2aa2654bd20]                          fence   Relative [ 0x2aa2647b3e4 ] BBEnd </block_2>
 [     0x2aa2654c2d0]                          assocreg {FPR0:FPR_0017:R}

New instruction sequence generated from running FloatMaxMin tests:

[     0x2aa2c4baa70]                          proc
 [     0x2aa2c4bab70]                          fence   Relative [ 0x2aa2c47b460 ] BBStart <block_2> (frequency 10000)
 [     0x2aa2c4bb0b0]                          LDE     FPR_0017, Parm[Parm  0<parm 0 F>] ?+0(GPR15),0
 [     0x2aa2c50c160]                          LDE     FPR_0019, Parm[Parm  1<parm 1 F>] ?+8(GPR15),0
 [     0x2aa2c50c2d0]                          Label L0016:     # (Start of internal control flow)
 [     0x2aa2c50c390]                          CEBR    FPR_0017,FPR_0019
 [     0x2aa2c50c440]                          BRC     BLR(0x2), Label L0017
 [     0x2aa2c50c500]                          LTEBR   FPR_0017,FPR_0017
 [     0x2aa2c50c5b0]                          BRC     MASK2(0x1), Label L0017
 [     0x2aa2c50c670]                          LER     FPR_0017,FPR_0019
 [     0x2aa2c50cd50]                          assocreg
 [     0x2aa2c50c7b0]                          Label L0017:     # (End of internal control flow)
 POST:
 {AssignAny:FPR_0017:R} {AssignAny:FPR_0019:R}
 [     0x2aa2c50d460]                          assocreg
 [     0x2aa2c50ceb0]                          retn
 POST:
 {FPR0:FPR_0017:R}
 [     0x2aa2c50d9e0]                          fence   Relative [ 0x2aa2c47b464 ] BBEnd </block_2>
 [     0x2aa2c50df90]                          assocreg {FPR0:FPR_0017:R}

r30shah commented Feb 1, 2024

@sarwat12 Is the above instruction sequence generated for Min or Max? Looking at the branch after CEBR, it is Max.

sarwat12 commented Feb 1, 2024

Hi @r30shah , I was trying to take a look at the fminEvaluator which makes a call to xmaxxminHelper. It seems that fmin passes the branch condition BLR. The above instruction sequence could be for Min.

TR::Register*
OMR::Z::TreeEvaluator::fminEvaluator(TR::Node* node, TR::CodeGenerator* cg)
   {
   return xmaxxminHelper(node, cg, TR::InstOpCode::CEBR, TR::InstOpCode::COND_BLR, TR::InstOpCode::LER);
   }

In the example case above, assuming it is a call coming from the fminEvaluator:

CEBR    FPR_0017,FPR_0019 // First Operand low, so CC1 will be set
BRC     BLR(0x2), Label L0017 // Branch low taken to Label L0017
LTEBR   FPR_0017,FPR_0017 // This would be skipped
BRC     MASK2(0x1), Label L0017 // This would be skipped
LER     FPR_0017,FPR_0019 // This would be skipped
Label L0017: // Result Reg = FPR_0017

I could be wrong in the assumption of Min, please let me know if this seems incorrect, then I can try to verify if this may go wrong for specific corner cases.

r30shah commented Feb 1, 2024

Can you share the binary encoding ? I want to see the CC in the encoded instruction.

sarwat12 commented Feb 1, 2024

Binary encoding of the instructions sequence generated from running FloatMaxMin tests:

0x3ff9eb000f8 00000000 [     0x2aa24a97f20]                                                   Label L0032:
     0x3ff9eb000f8 00000000 [     0x2aa24a593a0]                                                   proc
     0x3ff9eb000f8 00000000 [     0x2aa24a98200] e3 f0 f0 78 00 24                                 STG     GPR15,#385 120(GPR15)
     0x3ff9eb000fe 00000006 [     0x2aa24a98370] e3 f0 ff 60 ff 71                                 LAY     GPR15,#386 -160(GPR15)
     0x3ff9eb00104 0000000c [     0x2aa24a98500] 70 00 f1 20                                       STE     FPR0,#387 288(GPR15)
     0x3ff9eb00108 00000010 [     0x2aa24a98670] 70 20 f1 28                                       STE     FPR2,#388 296(GPR15)
     0x3ff9eb0010c 00000014 [     0x2aa24a594a0]                                                   fence   Relative [ 0x2aa24a09f60 ] BBStart <block_2> (frequency 10000)
     0x3ff9eb0010c 00000014 [     0x2aa24a599e0] ed 00 f1 20 00 24                                 LDE     FPR0, Parm[Parm  0<parm 0 F>] 288(GPR15),0
     0x3ff9eb00112 0000001a [     0x2aa24a77d70] ed 10 f1 28 00 24                                 LDE     FPR1, Parm[Parm  1<parm 1 F>] 296(GPR15),0
     0x3ff9eb00118 00000020 [     0x2aa24a77ee0]                                                   Label L0016: # (Start of internal control flow)
     0x3ff9eb00118 00000020 [     0x2aa24a77fa0] b3 09 00 01                                       CEBR    FPR0,FPR1
     0x3ff9eb0011c 00000024 [     0x2aa24a78050] a7 24 00 25                                       BRC     BLR(0x2), Label L0017, labelTargetAddr=0x0x3ff9eb0012a
     0x3ff9eb00120 00000028 [     0x2aa24a78110] b3 02 00 00                                       LTEBR   FPR0,FPR0
     0x3ff9eb00124 0000002c [     0x2aa24a781c0] a7 14 00 20                                       BRC     MASK2(0x1), Label L0017, labelTargetAddr=0x0x3ff9eb0012a
     0x3ff9eb00128 00000030 [     0x2aa24a78280] 38 01                                             LER     FPR0,FPR1
     0x3ff9eb0012a 00000032 [     0x2aa24a783c0]                                                   Label L0017: # (End of internal control flow)
     0x3ff9eb0012a 00000032 [     0x2aa24a98780]                                                   Label L0034:
     0x3ff9eb0012a 00000032 [     0x2aa24a98900] e3 f0 f1 18 00 04                                 LG      GPR15,#389 280(GPR15)
     0x3ff9eb00130 00000038 [     0x2aa24a989b0] 07 fe                                             BCR     BER(mask=0xf), GPR14
     0x3ff9eb00132 0000003a [     0x2aa24a78ac0]                                                   retn
     0x3ff9eb00132 0000003a [     0x2aa24a795f0]                                                   fence   Relative [ 0x2aa24a09f64 ] BBEnd </block_2>
     0x3ff9eb00132 0000003a [     0x2aa24a98040]                                                   Label L0033:
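
The condition mask of each BRC can be read directly from the encoded bytes in the listing above. The following is a minimal sketch, assuming the usual BRC layout (opcode byte 0xA7, then the 4-bit mask in the high nibble and the opcode extension 0x4 in the low nibble of the second byte); `brcMask` and `maskSelectsCC` are hypothetical helper names, not part of OMR.

```cpp
#include <cstdint>

// Returns the 4-bit condition mask of an encoded BRC instruction,
// or -1 if the two leading bytes do not look like a BRC.
int brcMask(std::uint8_t byte0, std::uint8_t byte1)
   {
   if (byte0 != 0xA7 || (byte1 & 0x0F) != 0x04)
      return -1;
   return byte1 >> 4;
   }

// Mask bit 0x8 selects CC0, 0x4 selects CC1, 0x2 selects CC2, 0x1 selects CC3.
bool maskSelectsCC(int mask, int cc)
   {
   return (mask & (0x8 >> cc)) != 0;
   }
```

Applied to the two branches in the listing, `a7 24` decodes to mask 0x2 (CC2) and `a7 14` decodes to mask 0x1 (CC3).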

r30shah commented Feb 2, 2024

@sarwat12 Thanks for the binary encoding. I think I was bitten by the incorrect name, MASK2, printed in the instruction selection log. It is indeed MASK1 (CC3). I apologize for the confusion and have deleted the comment with the misleading analysis. Posting the correct analysis here:

FPR_0017 - Op1 - < 0
FPR_0019 - Op2 - < 0
FPR_0017 < FPR_0019

CEBR    FPR_0017,FPR_0019 // First Operand low, so CC1 will be set
BRC     BLR(0x2), Label L0017 // BRC checks CC2, but CC1 is set, so the branch is not taken
LTEBR   FPR_0017,FPR_0017 // Executed, since the branch above was not taken; sets CC3 only if FPR_0017 is NaN
BRC     MASK2(0x1), Label L0017 // CC3 is checked which is the case when FPR_0017 is NaN, Branch will not be taken
LER     FPR_0017,FPR_0019 // FPR_0017 is loaded with FPR_0019 which is higher 
Label L0017: // Result Reg = FPR_0017

There is one modification I would like to suggest, though; it can be made in the dmax/fmax/dmin/fmin evaluators. If the operands are equal, we would not have to check anything else and could skip the register-to-register load as well. Can I request that you make this change as part of this PR?
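
The analysis above can be checked with a small software model of the logged sequence. This is only a sketch under stated assumptions: condition codes follow the usual convention for compare (0 equal, 1 first operand low, 2 first operand high, 3 unordered) and for load-and-test (0 zero, 1 negative, 2 positive, 3 NaN), and the function names are hypothetical. It models the sequence as logged, before the equal-operands shortcut suggested above.

```cpp
#include <cmath>

// Condition code set by a floating-point compare (CEBR/CDBR) in this model.
static int compareCC(float a, float b)
   {
   if (std::isnan(a) || std::isnan(b)) return 3;
   if (a == b) return 0;
   return (a < b) ? 1 : 2;
   }

// Condition code set by load-and-test (LTEBR/LTDBR) in this model.
static int loadAndTestCC(float a)
   {
   if (std::isnan(a)) return 3;
   if (a == 0.0f) return 0;
   return (a < 0.0f) ? 1 : 2;
   }

// A 4-bit BRC mask selects condition codes: 0x8 -> CC0, 0x4 -> CC1, 0x2 -> CC2, 0x1 -> CC3.
static bool branchTaken(int mask, int cc)
   {
   return (mask & (0x8 >> cc)) != 0;
   }

// Hypothetical model of the sequence; compareMask is the mask of the first BRC.
float simulateMaxMin(float op1, float op2, int compareMask)
   {
   float result = op1;                          // result register holds the first operand
   if (branchTaken(compareMask, compareCC(result, op2)))
      return result;                            // first BRC: first operand already wins
   if (branchTaken(0x1, loadAndTestCC(result)))
      return result;                            // second BRC: first operand is NaN
   return op2;                                  // LER/LDR: take the second operand
   }
```

With `compareMask` 0x2 (the value in the encoded instruction) the model behaves as a max: `simulateMaxMin(-2.0f, -1.0f, 0x2)` yields -1, and a NaN in either operand yields NaN. Passing 0x4 instead models a min, on the assumption that the min evaluator branches on CC1.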

@sarwat12 sarwat12 force-pushed the NaN_fdMinMax branch 2 times, most recently from 4f789e3 to d499cd6 Compare February 6, 2024 19:47
Comment on lines 346 to 350
if (node->getOpCode().isDouble() || node->getOpCode().isFloat())
{
if (node->getOpCode().isDouble())
{
// If first operand is NaN, then we are done, otherwise fallthrough to move second operand as result
generateRREInstruction(cg, TR::InstOpCode::LTDBR, node, lhsReg, lhsReg);
generateS390BranchInstruction(cg, TR::InstOpCode::BRC, TR::InstOpCode::COND_CC3, node, cFlowRegionEnd);
}
else if (node->getOpCode().isFloat())
{
// If first operand is NaN, then we are done, otherwise fallthrough to move second operand as result
generateRREInstruction(cg, TR::InstOpCode::LTEBR, node, lhsReg, lhsReg);
generateS390BranchInstruction(cg, TR::InstOpCode::BRC, TR::InstOpCode::COND_CC3, node, cFlowRegionEnd);
}
}

I think you can simplify the code above as follows.

Suggested change
if (node->getOpCode().isFloatingPoint())
{
// If first operand is NaN, then we are done, otherwise fallthrough to move second operand as result
generateRREInstruction(cg, node->getOpCode().isDouble() ? TR::InstOpCode::LTDBR : TR::InstOpCode::LTEBR , node, lhsReg, lhsReg);
generateS390BranchInstruction(cg, TR::InstOpCode::BRC, TR::InstOpCode::COND_CC3, node, cFlowRegionEnd);
}
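
The shape of the sequence that this guard produces can be illustrated with a toy emitter. It is a sketch only: `DataType` and `emitMaxMin` are hypothetical stand-ins rather than OMR types, and instructions are represented as plain mnemonic strings. The point is that the load-and-test plus CC3 branch appears only for float and double.

```cpp
#include <string>
#include <vector>

enum class DataType { Int32, Int64, Float, Double };   // illustrative subset

// Hypothetical emitter: returns the mnemonics generated for one max/min node.
std::vector<std::string> emitMaxMin(DataType type, const std::string& compareOp, const std::string& moveOp)
   {
   std::vector<std::string> seq;
   seq.push_back(compareOp);                     // compare the two operands
   seq.push_back("BRC");                         // branch to the end if the first operand wins
   const bool isFloatingPoint = (type == DataType::Float || type == DataType::Double);
   if (isFloatingPoint)
      {
      // NaN check on the first operand, emitted for float/double only
      seq.push_back(type == DataType::Double ? "LTDBR" : "LTEBR");
      seq.push_back("BRC");                      // branch to the end on CC3 (NaN)
      }
   seq.push_back(moveOp);                        // otherwise move the second operand into the result
   return seq;
   }
```

For a float node this yields five entries with `LTEBR` in the third position, while an integral node yields only the compare, branch, and move.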

r30shah commented Feb 6, 2024

Jenkins build zos,zlinux

@r30shah r30shah left a comment

One last minor nitpick. As it is just for removing comment, please wait while current builds finish before making any changes, so that we do not need to launch builds again.

@@ -338,21 +338,29 @@ xmaxxminHelper(TR::Node* node, TR::CodeGenerator* cg, TR::InstOpCode::Mnemonic c
generateS390LabelInstruction(cg, TR::InstOpCode::label, node, cFlowRegionStart);
cFlowRegionStart->setStartInternalControlFlow();

//Checking common Condition Code and branching to cFlowRegionEnd

I do not see the relevance of this comment. It is checking for the condition passed into this common utility function. Please remove it.

sarwat12 commented Feb 6, 2024

@r30shah r30shah left a comment

LGTM. @sarwat12 Given that we still could not get this working on z/OS (and hence could not enable the tests there), can you confirm the tests are enabled on Linux on Z?
Also, please open an issue documenting what we observed on z/OS, so that we come back to those problems.

r30shah commented Feb 7, 2024

@sarwat12 The commit body for the second commit should adhere to the commit guidelines.

Also, please update the commit title to reflect both facts (enabling on LoZ and disabling on z/OS).

r30shah commented Feb 7, 2024

Jenkins build zos,zlinux

@r30shah r30shah left a comment

LGTM.

r30shah commented Feb 7, 2024

@hzongaro Can we request you to review and merge these changes?

@hzongaro hzongaro left a comment

I think the changes look correct. Just a small suggestion regarding comments.

Also, the commit guidelines recommend that the title of a commit be less than 70 characters, and that each line in the body of the commit be at most 72 characters, if possible. May I ask you to adjust the comments accordingly?

Finally, as tests are still failing on z/OS, please remove the "Closes: #5157" comment.
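
The length limits mentioned above are easy to check mechanically. The following is a rough sketch, not an official OMR tool: `overlongCommitLines` is a hypothetical helper that treats the first line as the title (fewer than 70 characters) and every later line as body text (at most 72 characters), counting bytes rather than display characters.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Returns the 1-based numbers of the lines that exceed the limits:
// the title must be shorter than 70 characters, body lines at most 72.
std::vector<int> overlongCommitLines(const std::string& message)
   {
   std::vector<int> bad;
   std::istringstream in(message);
   std::string line;
   int lineNo = 0;
   while (std::getline(in, line))
      {
      ++lineNo;
      const std::size_t limit = (lineNo == 1) ? 69 : 72;
      if (line.size() > limit)
         bad.push_back(lineNo);
      }
   return bad;
   }
```

An empty result means every line fits; otherwise the returned line numbers point at the lines to rewrap.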

compiler/z/codegen/ControlFlowEvaluator.cpp
- Enables the Float & Double MaxMin tests on Linux on Z
- Disables them on z/OS, since TRIL parser cannot handle NaN values

Closes: eclipse-omr#5157

Signed-off-by: Sarwat Shaheen <sarwat.shaheen@yahoo.com>
Adds support to the S390 MaxMin evaluator to handle NaN operands

- Check f/d operands for NaN and set return register to NaN
- Avoid extra load instructions for NaN, if operands are equal

Closes: eclipse-omr#5157

Signed-off-by: Sarwat Shaheen <sarwat.shaheen@yahoo.com>
@hzongaro hzongaro left a comment

Looks good. Thanks!

@hzongaro
Most recent changes only added some comments and updated commit message, so no need to rerun testing. Merging.

@hzongaro hzongaro merged commit 0966eb3 into eclipse-omr:master Feb 14, 2024
4 of 6 checks passed
Linked issue: Review fmin / fmax / dmin / dmax w.r.t NaN values on S390