Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Fix recent x86 SIMD regressions #8414

Merged
merged 1 commit into from
Dec 1, 2016

Conversation

BruceForstall
Copy link
Member

  1. Recent PUTARG_STK work didn't consider SIMD arguments.
  2. SSE3_4 work caused underestimation of instruction sizes for SSE4
    instructions (e.g., pmulld).

1. Recent PUTARG_STK work didn't consider SIMD arguments.
2. SSE3_4 work caused underestimation of instruction sizes for SSE4
instructions (e.g., pmulld).
@BruceForstall
Copy link
Member Author

@sivarv @CarolEidt PTAL
cc @dotnet/jit-contrib

@sivarv
Copy link
Member

sivarv commented Dec 1, 2016

Looks good to me.

#ifdef _TARGET_AMD64_
// If Byte 4 (which is 0xFF00) is non-zero, that's where the RM encoding goes.

// If Byte 4 (which is 0xFF00) is zero, that's where the RM encoding goes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oddly, I believe that the previous comment is actually correct, if confusing. The encoding for some reason puts byte 4 in byte 3 (though I would probably have zero-based the counting and called it byte3. Anyway, see instrsxarch.h near line 139 where there's a comment that talks about this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might consider wording it as, "If Byte 4, which is encoded in bits 15-8 of the mask (so we and with 0xFF00)," ...


In reply to: 90478740 [](ancestors = 90478740)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CarolEidt The original comment said that if Byte 4 (at mask 0xFF00) is zero, then the size is 5. But that's obviously the opposite of what the code does. My comment change actually reflects what the code does. And I think it reflects that comment in instrsxarch.h as well: in the RM encoding column, we leave the RM slot zero except for the case of 4-byte opcodes that need RM, where all the bytes are already filled with opcode. In that case, we need some special code to determine the RM version size.

It looks like this code is basically a hack for handling the case of a 5-byte opcode + RM in the table, where we don't actually have 5 bytes in the table.

Am I still confused?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you're not confused - I am. It's unfortunate that the comment begins with the opposite of the condition being checked. So NVM (unless you'd like to change the comment to "If Byte 4 (which is 0xFF00) is non-zero, then it will be placed after the 4 byte encoding, making the total 5 bytes. Otherwise, that byte is where the RM encoding goes." Sorry for my confusion.


In reply to: 90501469 [](ancestors = 90501469)

Copy link

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than a change to a comment, LGTM.

@BruceForstall BruceForstall merged commit eed8ab2 into dotnet:master Dec 1, 2016
@BruceForstall BruceForstall deleted the FixSimdRegressions branch December 1, 2016 19:04
@karelz karelz modified the milestone: 2.0.0 Aug 28, 2017
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…essions

Fix recent x86 SIMD regressions

Commit migrated from dotnet/coreclr@eed8ab2
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
5 participants