Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x86: support Return SIMD8. #46899

Merged
merged 5 commits into from
Jan 19, 2021
Merged

Conversation

sandreenko
Copy link
Contributor

@sandreenko sandreenko commented Jan 13, 2021

Support trees like:

N002 (  4,  3) [000055] ------------              *  RETURN    simd8  $304
N001 (  3,  2) [000054] -------N----              \--*  LCL_VAR   simd8 <System.Numerics.Vector2> V09 tmp3         u:1 (last use) $302

that we return in EAX/EDX.

Also, fix a few non-functional bugs like wrong printing or wrong instruction attributes that are not read.
Unblocks #46238

@sandreenko sandreenko added the NO-REVIEW Experimental/testing PR, do NOT review it label Jan 13, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 13, 2021
@sandreenko
Copy link
Contributor Author

/azp list

@sandreenko
Copy link
Contributor Author

/azp run runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@sandreenko
Copy link
Contributor Author

as expected x86 does not use it currently and x64 uses only for simd16. The types were wrong for x64 but they were not used.

@sandreenko sandreenko marked this pull request as ready for review January 15, 2021 00:50
@sandreenko sandreenko removed the NO-REVIEW Experimental/testing PR, do NOT review it label Jan 15, 2021
@sandreenko sandreenko changed the title test x86: support Return SIMD8. Jan 15, 2021
@sandreenko
Copy link
Contributor Author

PTAL @dotnet/jit-contrib , cc @jkoritzinsky

Copy link
Member

@jkoritzinsky jkoritzinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM based on my knowledge. Just one comment to clarify.

// reg0 = opReg[31:0]
inst_RV_RV(ins_Copy(opReg, TYP_INT), reg0, opReg, TYP_INT);
// reg1 = opRef[63:32]
inst_RV_RV_IV(INS_pextrd, EA_4BYTE, reg1, opReg, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we guaranteed to be able to use an SSE4.1 instruction here?

Copy link
Member

@jkoritzinsky jkoritzinsky Jan 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not, we should be able to use PEXTRW twice along with a 16 bit shift to emulate it. (IIRC SSE2 is our minimum instruction set support for x86)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find 5.0 requirements for some reason, @echesakovMSFT or @tannergooding probably know better.

Copy link
Member

@jkoritzinsky jkoritzinsky Jan 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case SSE2 is our minimum, here's an instruction sequence that I think would be SSE2-compatible with equivalent results (given the return value is in xmm0) to save you time if needed:

pextrw  eax, xmm0, 3
shl     eax, 16
pextrw  edx, xmm0, 2
or      edx, eax
movd    eax, xmm0

Copy link
Contributor

@echesakov echesakov Jan 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree with @jkoritzinsky - you would need to check for compExactlyDependsOn(InstructionSet_SSE41) before emitting pextrd in a similar way as it is done here

if (!compExactlyDependsOn(InstructionSet_SSE41))

In addition, please make sure that we don't emit that instruction during crossgen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you want compOpportunisticallyDependsOn. The SSE2 vs SSE4.1 paths are equivalent, it's just that the SSE4.1 path is "faster" and should be preferred when it is available.

For crossgen1 SSE4.1 should be false, for crossgen2, it should depend on the ISA configuration (which I believe currently defaults to SSE4.1) and should flag the method as "uses SSE4.1" in which case it will be rejitted if false.

IIRC, there are some special considerations when it's all internal and therefore a IsSupported check doesn't exist, but @jkotas and @davidwrighton are the two most familiar with this (David wrote the original logic and Jan made some recent changes in the area).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you want compOpportunisticallyDependsOn

+1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compOpportunisticallyDependsOn is what you want and it will do exactly the right thing when using crossgen2, but I believe it is not safe for this kind of purpose when compiling S.P.C with crossgen1. As we haven't quite gotten rid of crossgen1, you'll need to find some way to not generate the SSE4.1 support when crossgening S.P.C with original crossgen, or at least, wait until #47019 is in and enabled for both X86 and X64.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to merge this PR as is with unsafe behavior for x86 old machines because:

  1. this code is emitted only for PInvoke stubs that we don't crossgen, so no real issue here;
  2. it could only affect old machines;
  3. it will automatically be fixed with a full switch to crossgen2;

the alternative is to merge the change with SSE2 and open an issue to return SSE4.1 code after crossgen1 deprecation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good news. Crossgen2 was put in place to replace compilation for S.P.C over the weekend. The only time that S.P.C should now be compiled with crossgen1 is in self-contained app build scenarios before we transition those over as well, and as you state there is no actual use of this feature in S.P.C.

In light of that, I believe this is ok to check in.

@sandreenko
Copy link
Contributor Author

Thanks, @tannergooding, the PR was updated based on your review.

This should probably go through inst_RV_TT_IV.

The jit does not currently have a clear contract when to call GetEmitter()->emitIns vs CodeGen-> inst_* and it is often confusing. I did not want to useinst_RV_TT_IV there because it takes a tree node when we already know the register but it is fine.

Copy link
Contributor

@echesakov echesakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sandreenko sandreenko merged commit 5a91b0c into dotnet:master Jan 19, 2021
@sandreenko sandreenko deleted the genSIMDSplitReturn branch January 19, 2021 23:35
@ghost ghost locked as resolved and limited conversation to collaborators Feb 19, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants