Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Support assignments of multi-reg values to enregistered promoted structs #34105

Closed
CarolEidt opened this issue Mar 26, 2020 · 8 comments · Fixed by #36862
Closed

JIT: Support assignments of multi-reg values to enregistered promoted structs #34105

CarolEidt opened this issue Mar 26, 2020 · 8 comments · Fixed by #36862
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Milestone

Comments

@CarolEidt
Copy link
Contributor

CarolEidt commented Mar 26, 2020

Currently, calls or other operations (such as MUL_LONG on 32-bit platforms) that return multiple registers must be stored (either to the stack or elsewhere). This means that promoted structs (whose fields have been promoted to allow them to be separately tracked, optimized and enregistered) will be forced to the stack.
This is the underlying issue in #5112 and #8571, but support for this would also enable MUL_LONG to be supported without forcing the destination to be in memory, as well as enabling hardware intrinsics such as Bmi2.MultiplyNoFlags to return a ValueTuple and allow the result to go directly to a pair of registers.

category:cq
theme:structs
skill-level:expert
cost:large

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Mar 26, 2020
@CarolEidt
Copy link
Contributor Author

@terrajobst - The idea of having an intrinsic return a ValueTuple was mentioned in the context of Bmi2.MultiplyNoFlags, but comments here: #11782 (comment) and here: #11782 (comment) indicated that it was not in keeping with API guidelines, yet in a recent API review I believe you indicated that it would be OK. Can you advise?

@terrajobst
Copy link
Member

Tuples: we can use tuples in APIs and as of recently we did in a few cases. The general guidance is: don't use them as parameters and only use them when the number of values if 4 or less and we're sure we never need to add more. For the most part, we prefer custom structs because we can add members.

@CarolEidt CarolEidt added the tenet-performance Performance related issue label Mar 26, 2020
@CarolEidt
Copy link
Contributor Author

Thanks @terrajobst. I'll proceed on the assumption that tuples would be a reasonable way to address the problem of returning multi-register values in intrinsics.

@tannergooding
Copy link
Member

A possible downside of tuples would be it takes the good name if we have a better type in the future.

For example, Multiply for x86 is int32 * int32 = int64, on so we don't need a tuple, but Multiply for x64 can be int64 * int64 = int128.
If we support it via a (long hi, ulong lo) tuple today, we will have to come up with an alternative name if/when we support a proper Int128/UInt128 type.

I don't think this should be a blocker, but it is something we might want to take into consideration when proposing these APIs.

@CarolEidt
Copy link
Contributor Author

@tannergooding - we already have that issue. I'm experimenting with supporting mulx with a tuple result, but we already have a version that takes two ulongs and returns a single ulong. So for the purposes of experimentation I've added a MultiplyNoFlags2. Just curious why the 32-bit version doesn't provide a ulong result - is it to be consistent with the 64-bit version, or was it due to concerns with longs being forced to memory on 32-bit targets?

@tannergooding
Copy link
Member

Just curious why the 32-bit version doesn't provide a ulong result - is it to be consistent with the 64-bit version, or was it due to concerns with longs being forced to memory on 32-bit targets?

I think it was to mirror the C/C++ API actually: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=mulx

@CarolEidt
Copy link
Contributor Author

The fundamental issue here is that we need to be able to take a result from a single node that produces multiple registers, and assign them to the corresponding fields of a promoted struct. This is supported reasonably well currently when the target is going to be in memory anyway (on stack or otherwise). However when the target is a local struct that otherwise could be in registers, we force it to memory.

Multi-reg localVars

  • The primary issue is how to represent a definition of a struct localVar that has been promoted, but which is being defined by a node that produces multiple registers. Uses don't have quite the same issues; we're already dealing with these as GT_FIELD_LISTs.

    • Option 1: (What I'm experimenting with now) is to continue to represent these as they are currently done, as a single lclVar node. This is already handled in liveness analysis - it updates the liveness for the fields when the whole struct is used or defined. Issues:

      • I had to disable SSA, and therefore value numbering for these; unlike liveness it doesn't support the definition of multiple fields with a single GT_LCL_VAR node.
      • It currently doesn't separately annotate last uses where applicable. This would require additional flags, as is currently done for the spill flags on multireg calls.
      • I haven't added the spill flags, but they could be added similarly
      • It currently only handles two registers, and we'd need 4 to handle HFAs on Arm/Arm64. I think we can handle up to 4 by using regNumberSmalls.
    • Option2: Add a new kind of lclVar node - e.g. GT_LCL_VAR_TUPLE that explicitly represents N locals that are assigned or used together.

      • I think this winds up looking a lot like Option 1. It might have less impact on the paths that don't use multi-reg vars, but I think it would require a lot more changes - we'd pretty much have to handle this new node in all the places we handle lclVars today. And this would duplicate (or require replacement of) some of the support already in liveness, for example, that handles promoted structs.
    • Option3: Add a parent node that has GT_STORE_LCL_VAR children, like GT_FIELD_LIST

      • I think this has similar issues to GT_ASG. It requires the children of a node to consume values produced by another child of that node.

@dotnet/jit-contrib - thoughts?

P.S. There may be a better place for this discussion (suggestions welcome).

@sandreenko
Copy link
Contributor

Option 1: (What I'm experimenting with now) is to continue to represent these as they are currently done, as a single lclVar node. This is already handled in liveness analysis - it updates the liveness for the fields when the whole struct is used or defined.

I am experimenting with it as well, but my case is simpler: ASG(struct LCL_VAR with 1 promoted field, call struct). The master branch is currently lucky if the field type is long (x64), because in this case we do retyping struct parent LCL_VAR->long parent LCL_FIELD -> long child LCL_VAR, but after my struct changes it won't be enregistered.

I had to disable SSA, and therefore value numbering for these; unlike liveness it doesn't support the definition of multiple fields with a single GT_LCL_VAR node.

I was trying to substitute all manipulations with the parent LCL_VAR (that should be defs only, all usages should use fields, otherwise it can't be promoted independently). It was straightforward in valuenum, harder in SSA.

So I prefer option 1 with my current perspective because options 2 and 3 won't solve the case that I see (especially option 2).

@BruceForstall BruceForstall removed the untriaged New issue has not been triaged by the area owner label Apr 4, 2020
@BruceForstall BruceForstall added this to the 5.0 milestone Apr 4, 2020
CarolEidt added a commit to CarolEidt/runtime that referenced this issue Jun 10, 2020
Allow struct lclVars that are returned in multiple registers to be
enregistered, as long as the fields are a match for the registers.

Fix dotnet#34105
CarolEidt added a commit that referenced this issue Jun 11, 2020
* Enregister multireg lclVars

Allow struct lclVars that are returned in multiple registers to be
enregistered, as long as the fields are a match for the registers.

Fix #34105
@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
sandreenko pushed a commit to sandreenko/runtime that referenced this issue Sep 2, 2021
Allow struct lclVars that are returned in multiple registers to be
enregistered, as long as the fields are a match for the registers.

Fix dotnet#34105
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants