Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x86 Processor Flags Refinement #4183

Merged
merged 3 commits into from Aug 27, 2019

Conversation

@dsouzai
Copy link
Contributor

commented Jul 31, 2019

This PR is opened to facilitate allowing relocatable compilations to be loaded on a machine different from the one it was generated on. The idea is to mask out all the features the compiler does not care about. This way, as long as the machine (on which one wishes to run relocatable code) has all the necessary features that the compiler was aware of when the code was generated, the code will be valid to execute on the machine.

To make this future proof, this PR also changes the way the processor flags are tested; an assert is added for flags that aren't part of the mask.

@dsouzai

This comment has been minimized.

Copy link
Contributor Author

commented Jul 31, 2019

@andrewcraik @0xdaryl could you guys take a look? I've yet to run more extensive testing, but I figure it's worth getting your opinions on this change.

@andrewcraik

This comment has been minimized.

Copy link
Contributor

commented Jul 31, 2019

I have to say I'm not a fan of the removals from the enums - the constants are described in the x86 architecture documents and having them accessible in the codegen etc make sense. I also don't think that removing the query APIs is the right thing to do. Wouldn't it make more sense to record the set of thing we care about and just let the codegen do its thing? I worry this will be fragile because people may add support for a flag without updating the mask function which is in a completely different part of the code.

@dsouzai

This comment has been minimized.

Copy link
Contributor Author

commented Jul 31, 2019

Wouldn't it make more sense to record the set of thing we care about and just let the codegen do its thing? I worry this will be fragile because people may add support for a flag without updating the mask function which is in a completely different part of the code.

The reason I figured removing them was safer is because if the API existed, someone could make use of an API and completely forget that there's a mask that needs to be updated. By removing the API, it forces the dev to uncomment the enum, at which point someone (either the dev or the committer) will hopefully see the comment at the top of each of those enums which says to update the mask.

Also the reason I couldn't add the mask in the .hpp file is because being a .hpp file, it complained about multiple definitions when I added this to the .hpp file:

   uint32_t featureFlagsMask =    TR_BuiltInFPU
                                | TR_CMPXCHG8BInstruction
                                | TR_CMOVInstructions
                                | TR_MMXInstructions
                                | TR_SSE
                                | TR_SSE2;

However, I'm open to ideas to address these concerns.

@andrewcraik

This comment has been minimized.

Copy link
Contributor

commented Aug 1, 2019

so we could have a static function in the header to return the mask so that the def could live in the header (if we can't come up with anything better). I would still prefer to see a mechanism whereby we keep track of the features queried and just keep the most restrictive set seen somewhere in the SCC / validation records. Each time we commit a compile we make sure the more restrictive of the current compile and the record in the SCC is updated...

Failing that then keeping the APIs but adding a fatal assert checking if we are doing an AOT compile and we ask for a feature not in the mask you get a helpful message telling you to add it to the mask...

@dsouzai

This comment has been minimized.

Copy link
Contributor Author

commented Aug 14, 2019

I would still prefer to see a mechanism whereby we keep track of the features queried and just keep the most restrictive set seen somewhere in the SCC / validation records. Each time we commit a compile we make sure the more restrictive of the current compile and the record in the SCC is updated...

That's definitely a good approach but I think that's quite a bit more work because it will require some coordination to ensure consistency on all platforms. I think for now to fix the existing problem quickly, I'll just add the static functions + an assert to ensure that the flag is added to the appropriate place.

Signed-off-by: Irwin D'Souza <dsouzai.gh@gmail.com>
@dsouzai dsouzai force-pushed the dsouzai:x86FeatureFlags branch from 271263f to 0b64e0f Aug 15, 2019
@@ -223,6 +224,12 @@ struct TR_X86ProcessorInfo
friend class OMR::X86::CodeGenerator;

void initialize();
void assertUnused()

This comment has been minimized.

Copy link
@andrewcraik

andrewcraik Aug 21, 2019

Contributor

Wouldn't a better name be something like ensureRelocatableCompileSupport or something like that? Could we just create functions to wrap the testAny for each of the feature flags and check the mask so we don't have to remove the assertUnused - we just have to add the flag to the mask?

In order to faciliate allowing relocatable compilations to be loaded on
a machine different from the one it was generated on, this commit masks
out all the features the compiler does not care about; this way, as long
as the machine (on which one wishes to run relocatable code) has all the
necessary features that the compiler was aware of when the code was
generated, the code will be valid to execute on the machine.

Signed-off-by: Irwin D'Souza <dsouzai.gh@gmail.com>
@dsouzai dsouzai force-pushed the dsouzai:x86FeatureFlags branch from 0b64e0f to 40f9d63 Aug 21, 2019
@dsouzai

This comment has been minimized.

Copy link
Contributor Author

commented Aug 21, 2019

@andrewcraik I haven't run the full suite of tests yet, but how does this update look to you?

This commit ensures adds a fatal assert to all the processor feature
queries that the compiler currently does not make use of. The purpose of
this is to ensure that if in the future, one of these queries is used,
the getFeatureFlagsMask (or variant) function is updated to ensure
correctness on relocatable compilations.

Signed-off-by: Irwin D'Souza <dsouzai.gh@gmail.com>
@dsouzai dsouzai force-pushed the dsouzai:x86FeatureFlags branch from 40f9d63 to 6fcbe9f Aug 21, 2019
@dsouzai dsouzai marked this pull request as ready for review Aug 23, 2019
@dsouzai dsouzai requested review from 0xdaryl and mstoodle as code owners Aug 23, 2019
@dsouzai

This comment has been minimized.

Copy link
Contributor Author

commented Aug 23, 2019

I ran personal tests just as a preemptive step and everything looked fine. I also verified that with this change, OpenJ9's AOT code validation isn't as conservative (while still be functionally correct).

@andrewcraik do you mind reviewing?

@andrewcraik

This comment has been minimized.

Copy link
Contributor

commented Aug 23, 2019

@dsouzai I think this is a reasonable compromise. One last thing you may want to check is that we don't ignore the HTM flag on some processors with known microcode bugs - for the Intel processors with bugs we disabled support I believe incase the CPUs had not had their microcode patched. Otherwise I think this makes sense. Should we also filter on vendor since things like NOP sequences differ between vendors in the code gen?

@andrewcraik

This comment has been minimized.

Copy link
Contributor

commented Aug 23, 2019

@genie-omr build all

@dsouzai

This comment has been minimized.

Copy link
Contributor Author

commented Aug 23, 2019

One last thing you may want to check is that we don't ignore the HTM flag on some processors with known microcode bugs - for the Intel processors with bugs we disabled support I believe incase the CPUs had not had their microcode patched.

The code as it stands doesn't ignore the HTM processor feature flag (the enum is TR_RTM). The compiler uses

   bool getSupportsTM() { return _flags4.testAny(SupportsTM);}
   void setSupportsTM() { _flags4.set(SupportsTM);}

which are TR::CodeGenerator methods. The compiler queries _targetProcessorInfo.supportsTM() first to see if the target CPU even has the HTM feature before deciding whether to disable it based on processor.

Should we also filter on vendor since things like NOP sequences differ between vendors in the code gen?

We could, but it shouldn't be functionally necessary as both AMD and Intel implement the same x86 ISA. If we did differentiate between the two, we might fail the load of a relocatable compile just because the NOP sequences are not optimal; it then becomes a question whether those NOP sequences are worth failing the load & doing a JIT compile instead. Either way though, that change deserves to be in its own PR.

@andrewcraik

This comment has been minimized.

Copy link
Contributor

commented Aug 23, 2019

Ok I'm fine with a separate PR - the AMD sequences are functionally different because we use multiple instructions rather than the single instruction on intel, but I'm fine to move that off to another place.

@dsouzai

This comment has been minimized.

Copy link
Contributor Author

commented Aug 23, 2019

the AMD sequences are functionally different because we use multiple instructions rather than the single instruction on intel

Does that mean if one was to generate (relocatable) code on Intel and run it on an AMD machine, the code wouldn't execute in an functionally equivalent way? Or is it just that the code sequence is no longer as optimal?

@andrewcraik

This comment has been minimized.

Copy link
Contributor

commented Aug 23, 2019

I believe it may not be functionally equivalent.

@dsouzai

This comment has been minimized.

Copy link
Contributor Author

commented Aug 23, 2019

I believe it may not be functionally equivalent.

Ok I'll open an issue on OpenJ9 to ensure we keep track of the vendor.

@dsouzai

This comment has been minimized.

Copy link
Contributor Author

commented Aug 26, 2019

Opened eclipse/openj9#6836 .

@andrewcraik this PR should be ok to merge now; all tests passed.

@andrewcraik

This comment has been minimized.

Copy link
Contributor

commented Aug 27, 2019

Given the follow-on issue this is indeed good to land.

@andrewcraik andrewcraik merged commit 4e1c98c into eclipse:master Aug 27, 2019
14 checks passed
14 checks passed
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/eclipse-omr/pr/aix_ppc-64 Build finished.
Details
continuous-integration/eclipse-omr/pr/linux_390-64 Build finished.
Details
continuous-integration/eclipse-omr/pr/linux_aarch64 Build finished.
Details
continuous-integration/eclipse-omr/pr/linux_arm Build finished.
Details
continuous-integration/eclipse-omr/pr/linux_ppc-64_le_gcc Build finished.
Details
continuous-integration/eclipse-omr/pr/linux_x86 Build finished.
Details
continuous-integration/eclipse-omr/pr/linux_x86-64 Build finished.
Details
continuous-integration/eclipse-omr/pr/linux_x86-64_cmprssptrs Build finished.
Details
continuous-integration/eclipse-omr/pr/osx_x86-64 Build finished.
Details
continuous-integration/eclipse-omr/pr/win_x86-64 Build finished.
Details
continuous-integration/eclipse-omr/pr/zos_390-64 Build finished.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
eclipsefdn/eca The author(s) of the pull request is covered by necessary legal agreements in order to proceed!
Details
@andrewcraik andrewcraik self-assigned this Aug 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.