-
Notifications
You must be signed in to change notification settings - Fork 5k
Using an argument as a Vector variable causes data to be copied back and forth #115075
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
Possibly a duplicate of #78883 |
Makes sense. Although I think aliasing issues notwithstanding, in vmovups ymmword ptr [rdx], ymm0
vmovups ymm0, ymmword ptr [rdx] Clearly the second Take another example public class TestClass
{
int z;
void M1(int y)
{
z *= y;
z *= y;
z *= y;
z *= y;
}
void M2(int y)
{
ref int x = ref z;
x *= y;
x *= y;
x *= y;
x *= y;
}
} https://godbolt.org/z/xYfsds798 TestClass:M1(int):this (FullOpts):
; Emitting BLENDED_CODE for X64 with AVX - Unix
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 this [V00,T00] ( 7, 7 ) ref -> rdi this class-hnd single-def <TestClass>
; V01 arg1 [V01,T01] ( 6, 6 ) int -> rsi single-def
;# V02 OutArgs [V02 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
; V03 cse0 [V03,T02] ( 3, 3 ) int -> rax "CSE #01: aggressive"
; V04 cse1 [V04,T03] ( 3, 3 ) int -> rax "CSE #02: aggressive"
; V05 cse2 [V05,T04] ( 3, 3 ) int -> rax "CSE #03: aggressive"
;
; Lcl frame size = 0
;; size=0 bbWeight=1 PerfScore 0.00
mov eax, esi
imul eax, dword ptr [rdi+0x08]
mov dword ptr [rdi+0x08], eax
imul eax, esi
mov dword ptr [rdi+0x08], eax
imul eax, esi
mov dword ptr [rdi+0x08], eax
imul eax, esi
mov dword ptr [rdi+0x08], eax
;; size=27 bbWeight=1 PerfScore 15.25
ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 28, prolog size 0, PerfScore 16.25, instruction count 10, allocated bytes for code 28 (MethodHash=0b0a3fd5) for method TestClass:M1(int):this (FullOpts)
; ============================================================
TestClass:M2(int):this (FullOpts):
; Emitting BLENDED_CODE for X64 with AVX - Unix
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
; V00 this [V00,T02] ( 3, 3 ) ref -> rdi this class-hnd single-def <TestClass>
; V01 arg1 [V01,T01] ( 6, 6 ) int -> rsi single-def
;# V02 OutArgs [V02 ] ( 1, 1 ) struct ( 0) [rsp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
; V03 tmp1 [V03,T00] ( 9, 18 ) byref -> rdi single-def "dup spill"
;
; Lcl frame size = 0
;; size=0 bbWeight=1 PerfScore 0.00
add rdi, 8
mov eax, esi
imul eax, dword ptr [rdi]
mov dword ptr [rdi], eax
mov eax, esi
imul eax, dword ptr [rdi]
mov dword ptr [rdi], eax
mov eax, esi
imul eax, dword ptr [rdi]
mov dword ptr [rdi], eax
imul esi, dword ptr [rdi]
mov dword ptr [rdi], esi
;; size=30 bbWeight=1 PerfScore 25.00
ret
;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code 31, prolog size 0, PerfScore 26.00, instruction count 13, allocated bytes for code 31 (MethodHash=8d69b9d6) for method TestClass:M2(int):this (FullOpts)
; ============================================================
|
This depends on the hardware and several other considerations. It isn't strictly a no-op as it zeros any "upper bits", such as on AVX-512 capable hardware. Likewise a load from memory requires presuming single-threaded state and no mutations may occur. The .NET memory model does allow some presumptions here, but it requires tracking of the relevant data, knowledge of whether it could alias or not, etc. For a real app calling |
Description
Consider
which generates
Note those unnecessary
No problem with local variables.
Configuration
.NET 9, also 10.0.100-preview.5.25224.4
Windows 11, x64
Cannot reproduce on https://godbolt.org/, possibly because they are using *nix.
Regression?
Not known
The text was updated successfully, but these errors were encountered: