Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

try regMask to unsigned __int128 for linux/arm64 #94589

Closed
wants to merge 14 commits into from

Conversation

kunalspathak
Copy link
Member

Just to see the TP impact

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 9, 2023
@ghost ghost assigned kunalspathak Nov 9, 2023
@ghost
Copy link

ghost commented Nov 9, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Just to see the TP impact

Author: kunalspathak
Assignees: kunalspathak
Labels:

area-CodeGen-coreclr

Milestone: -

//
uint32_t BitOperations::PopCount(unsigned __int128 value)
{
return BitOperations::PopCount(static_cast<uint64_t>(value));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return BitOperations::PopCount(static_cast<uint64_t>(value));
return BitOperations::PopCount(static_cast<uint64_t>(value >> 64)) + BitOperations::PopCount(static_cast<uint64_t>(value));

Shouldn't this check all the bits?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should. Currently, since we just use 64-bits, I want to quickly prototype and see the TP impact of changing the regMaskTP data type to 128-bits. I will be cleaning few things here and other places (e.g. BitScanForward) before marking it "ready for review".

@kunalspathak
Copy link
Member Author

After spending lot of time investigating the failures, here is what I found:

  • There are no failures when the arm64 binaries are compiled natively on arm64 machine.
  • The failures only repros when cross compiled binary of arm64 i.e. libclrjit_universal_arm64_x64.so are used. We use them during compiling System.Private.CoreLib.dll using crossgen2 in the CI (and hence the failures) or when using libclrjit_universal_arm64_x64.so in tpdiff pipeline. The reason for failures is that __int128 assumes that the memory is 16-byte aligned and hence the compiler generates movaps instruction instead of movups. I tried to objdump libclrjit_universal_arm64_x64.so before / after changes and we can see the difference:
image When we segfault, I see the memory is not 16-byte aligned: image

So it seems that this might not be a possible option, but i will keep digging on alternatives.

On side note, if we decide to just enable something on linux without having a native windows msvc support, we will have to add leg in superpmi-replay (which verifies libclrjit.so and libclrjit_universal_arm64_x64.so). Currently, superpmi-replay pipeline only verifies on binaries cross-compiled by windows.

@jakobbotsch
Copy link
Member

When we segfault, I see the memory is not 16-byte aligned

This sounds like a JIT bug. The JIT's allocator needs to guarantee 16 byte alignment for types with alignof(x) == 16. Sounds like it doesn't do that.

@kunalspathak
Copy link
Member Author

kunalspathak commented Nov 17, 2023

When we segfault, I see the memory is not 16-byte aligned

This sounds like a JIT bug. The JIT's allocator needs to guarantee 16 byte alignment for types with alignof(x) == 16. Sounds like it doesn't do that.

yes, we have this today that could change.

    // Ensure that we always allocate in pointer sized increments.
    size = roundUp(size, sizeof(size_t));

But from what I am discovering, this will impact memory consumption a lot. E.g. GenTree has regMaskSmall field and hence its size on arm64 will increase from 88 bytes to 96 bytes and likewise for several other data structures. So I am reconsidering if this is a good solution or not.

@jakobbotsch
Copy link
Member

But from what I am discovering, this will impact memory consumption a lot. E.g. GenTree has regMaskSmall field and hence its size on arm64 will increase from 88 bytes to 96 bytes and likewise for several other data structures. So I am reconsidering if this is a good solution or not

Makes sense... Yeah, it does sound expensive. On a side note having such a mask in GenTree seems like a good place to try to regain some memory if we can move it somewhere else... Sounds like we're paying a lot of memory for something we only rarely use.

@kunalspathak
Copy link
Member Author

having such a mask in GenTree seems like a good place to tr

Exactly my thought when I noticed gtRsvdRegs. I will see if I can move it around or replace with something cheaper.

@kunalspathak
Copy link
Member Author

I will work on a different idea of splitting regMaskTP in regMaskInt, regMaskFloat, etc.

@github-actions github-actions bot locked and limited conversation to collaborators Dec 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants