Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent cmake settings for TLH_PREFETCH_FTA #18870

Open
knn-k opened this issue Feb 1, 2024 · 21 comments
Open

Inconsistent cmake settings for TLH_PREFETCH_FTA #18870

knn-k opened this issue Feb 1, 2024 · 21 comments

Comments

@knn-k
Copy link
Contributor

knn-k commented Feb 1, 2024

There are flags OMR_GC_TLH_PREFETCH_FTA and J9VM_GC_TLH_PREFETCH_FTA.
Settings of these flags in cmake files are complicated, and look inconsistent for some platforms.

runtime/cmake/caches/common.cmake enables J9VM_GC_TLH_PREFETCH_FTA. The file does not have an entry for OMR_GC_TLH_PREFETCH_FTA.

set(J9VM_GC_TLH_PREFETCH_FTA ON CACHE BOOL "")

Platform OMR_GC_TLH_PREFETCH_FTA J9VM_GC_TLH_PREFETCH_FTA
linux_x86-64 ON (explicit) ON (common)
aix_ppc-64 (no setting) OFF (explicit)
linux_ppc-64_le OFF (explicit) ON (common)
zos_390-64 OFF (explicit) OFF (explicit)
linux_390-64 (no setting) ON (common)
linux_aarch64 (no setting) ON (common)
@knn-k
Copy link
Contributor Author

knn-k commented Feb 1, 2024

It seems the OMR_GC_TLH_PREFETCH_FTA value in the generated omrcfg.h reflects the J9VM_GC_TLH_PREFETCH_FTA value, as far as I tried on AArch64 Linux turning the J9VM_ flag on and off.

What happens on linux_ppc-64_le, where J9VM_GC_TLH_PREFETCH_FTA is ON and OMR_GC_TLH_PREFETCH_FTA is OFF?
Can anybody look at OMR_GC_TLH_PREFETCH_FTA in the omrcfg.h for linux_ppc-64_le, please?

@knn-k
Copy link
Contributor Author

knn-k commented Feb 1, 2024

OMR_GC_TLH_PREFETCH_FTA is defined in the generated omrcfg.h when J9VM_GC_TLH_PREFETCH_FTA is ON and OMR_GC_TLH_PREFETCH_FTA is OFF in cmake files. J9VM_GC_TLH_PREFETCH_FTA setting seems to override OMR_GC_TLH_PREFETCH_FTA.

I think setting the OMR_GC_TLH_PREFETCH_FTA value in cmake files makes it confusing.
How about removing OMR_GC_TLH_PREFETCH_FTA lines from cmake files?

@knn-k
Copy link
Contributor Author

knn-k commented Feb 1, 2024

@zl-wang Do you have any thoughts on:

  • Why linux_ppc-64_le sets OMR_GC_TLH_PREFETCH_FTA to OFF while J9VM_GC_TLH_PREFETCH_FTA is ON
  • Why aix_ppc-64 sets J9VM_GC_TLH_PREFETCH_FTA to OFF

linux_ppc-64_le:

set(OMR_GC_TLH_PREFETCH_FTA OFF CACHE BOOL "")

aix_ppc-64:

set(J9VM_GC_TLH_PREFETCH_FTA OFF CACHE BOOL "")

@knn-k
Copy link
Contributor Author

knn-k commented Feb 2, 2024

AArch64 platforms (Linux and macOS): J9VM_GC_TLH_PREFETCH_FTA is enabled, but the JIT compiler does not use the tlhPrefetchFTA field in J9VMThread at all.

@zl-wang
Copy link
Contributor

zl-wang commented Feb 2, 2024

It looks to me that it is off in all cases: (my search results below)
FILE: /team/zlwang/repositories/openj9/runtime/cmake/caches/aix_ppc-64.cmake

set(J9VM_GC_TLH_PREFETCH_FTA OFF CACHE BOOL "")
FILE: /team/zlwang/repositories/openj9/runtime/cmake/caches/linux_ppc-64_le.cmake

set(OMR_GC_TLH_PREFETCH_FTA OFF CACHE BOOL "")

It is indeed confusing. Why it is even relevant to OMR for a prefetch scheme dependent on J9VMThread support. In my opinion, this should not be visible in OMR side.

@dmitripivkine
Copy link
Contributor

It is indeed confusing. Why it is even relevant to OMR for a prefetch scheme dependent on J9VMThread support. In my opinion, this should not be visible in OMR side.

I think it was an attempt to provide definition in OMR because GC TLH code lives there (in combination with desire to keep OMR independent upstream project). So, either we need to keep OMR_* and J9VM_* settings synchronized or use compromise and check J9VM_GC_TLH_PREFETCH_FTA in OMR code directly. Quick search shows we do have J9 definitions usage in OMR code already.

@knn-k
Copy link
Contributor Author

knn-k commented Feb 2, 2024

@zl-wang Note that the symbol in aix_ppc-64.cmake starts with J9VM_, and the one in linux_ppc-64_le.cmake starts with OMR_. They are different symbols.
Please also look at the definitions of those symbols in j9cfg.h and omrcfg.h that are generated during the build.

@knn-k
Copy link
Contributor Author

knn-k commented Feb 7, 2024

The following files in buildspecs set the gc_tlhPrefetchFTA flag:

  • linux_aarch64
  • linux_arm
  • linux_riscv64
  • linux_x86-64
  • linux_x86
  • osx_x86-64
  • win_x86-64
  • win_x86

linux_ppc-64_le.spec and linux_390-64.spec don't set the flag, which are inconsistent with the setting in common.cmake.

@knn-k
Copy link
Contributor Author

knn-k commented Feb 7, 2024

Looking back the histories of .cmake files:

Power

Z

  • zos_390-64.cmake: The two symbols are turned off by the initial cmake file for zos by CMake: Add initial zos caches #11994
  • linux_390-64.cmake: No specific settings for the two symbols. J9VM_GC_TLH_PREFETCH_FTA is enabled by common.cmake.

@knn-k
Copy link
Contributor Author

knn-k commented Feb 7, 2024

@joransiu FYI.

@knn-k
Copy link
Contributor Author

knn-k commented Feb 8, 2024

I ran personal builds to see if OMR_GC_TLH_PREFETCH_FTA is defined or not on Power and Z. AIX, z/OS, and Linux have different values as shown below.

  • ppc64_aix: Undefined
  • ppc64le_linux: Defined (The cmake file sets it to OFF, but that is ignored)
  • s390x_linux: Defined
  • s390x_zos: Undefined

(internal job: view/OpenJ9%20-%20Personal/job/Pipeline-Build-Test-Personal/20537/)

@zl-wang
Copy link
Contributor

zl-wang commented Feb 8, 2024

we don't need to set the macro(s) consistently across platforms, but of course setting them consistently on each single platform is a good step to take.

In order to realize the performance benefit of TLH-prefetch which has been dropped out of favour gradually in the past while, you need to walk a fine line anyway between micro-architecture and codegen. firstly, if you can gain benefit from TLH batch clearing which is default on p at least, TLH-prefetch is an overlapping action by and large. secondly, prefetch instructions with temporal characteristics were the ones having performance benefits from past experiences (since hardware prefetch does pretty much the same thing as explicit vanilla prefetch instructions and without consuming instruction cache capacity).

@knn-k
Copy link
Contributor Author

knn-k commented Feb 8, 2024

@zl-wang I can change PR #18918 not to disable J9VM_GC_TLH_PREFETCH_FTA in linux_ppc-64_le.cmake. Is that what you mean, as the first step?

@dmitripivkine
Copy link
Contributor

if you can gain benefit from TLH batch clearing which is default on p at least, TLH-prefetch is an overlapping action by and large

Good point. As I understand we are about to enable TLH batch clearing on X and ARM. So, relation between TLH batch clearing and TLH-prefetch should be considered.

@knn-k
Copy link
Contributor Author

knn-k commented Feb 9, 2024

I updated PR #18918. It does not change J9VM_GC_TLH_PREFETCH_FTA in linux_ppc-64_le.cmake and in linux_390-64.cmake now.

@knn-k
Copy link
Contributor Author

knn-k commented Feb 9, 2024

See J9VM_GC_TLH_PREFETCH_FTA settings after the change. The flag is turned off only on aix and zos.

Platform J9VM_GC_TLH_PREFETCH_FTA
x86 and x86-64 platforms ON (common)
aarch64 platforms ON (common)
aix_ppc-64 OFF
linux_ppc-64_le ON (common)
zos_390-64 OFF
linux_390-64 ON (common)

@knn-k
Copy link
Contributor Author

knn-k commented Feb 9, 2024

As I understand we are about to enable TLH batch clearing on X and ARM.

TLH batch clearing is enabled by default on AArch64.
common.cmake enables J9VM_GC_BATCH_CLEAR_TLH on all the platforms.

set(J9VM_GC_BATCH_CLEAR_TLH ON CACHE BOOL "")

On x86 platforms, you need to set the env variable TR_EnableBatchClear to allocate zeroed TLH pages. Other platforms have the env variable TR_DisableBatchClear.

#ifdef TR_HOST_X86
static char *enableBatchClear = feGetEnv2("TR_EnableBatchClear", (void *)vm);

@dmitripivkine
Copy link
Contributor

As I understand we are about to enable TLH batch clearing on X and ARM.

TLH batch clearing is enabled by default on AArch64. common.cmake enables J9VM_GC_BATCH_CLEAR_TLH on all the platforms.

set(J9VM_GC_BATCH_CLEAR_TLH ON CACHE BOOL "")

On x86 platforms, you need to set the env variable TR_EnableBatchClear to allocate zeroed TLH pages. Other platforms have the env variable TR_DisableBatchClear.

#ifdef TR_HOST_X86
static char *enableBatchClear = feGetEnv2("TR_EnableBatchClear", (void *)vm);

I believe Julian's point is: using TLH Batch clearing can make TLH Prefetching less effective significantly. Another words if we zero every TLH memory up front prefetching might be not necessary.

@knn-k
Copy link
Contributor Author

knn-k commented Feb 13, 2024

My points are as follows:

  • Setting OMR_GC_TLH_PREFETCH_FTA in .cmake files is useless and confusing. J9VM_GC_TLH_PREFETCH_FTA overriddes it during the build. PR Clean up GC_TLH_PREFETCH_FTA flag settings in cmake files #18918 fixes it.
  • Power platforms have different J9VM_GC_TLH_PREFETCH_FTA settings between Linux (ON) and AIX (OFF). Is there any reason for that?
  • Z platforms have different J9VM_GC_TLH_PREFETCH_FTA settings between Linux (ON) and z/OS (OFF). Is there any reason for that?
  • gc_tlhPrefetchFTA is disabled in UMA builds for pLinux and zLinux. They are inconsistent with CMake builds for those platforms.

I would like someone to review #18918.

@knn-k
Copy link
Contributor Author

knn-k commented Feb 15, 2024

Additional finding:
AIX JIT accesses J9VMThread->tlhPrefetchFTA, but the field is not updated by the GC side because J9VM_GC_TLH_PREFETCH_FTA is disabled on AIX.

p codegen accessing tlhPrefetchFTA: (There are some other locations)

iCursor = generateTrg1MemInstruction(cg,TR::InstOpCode::Op_load, node, temp1Reg,
TR::MemoryReference::createWithDisplacement(cg, metaReg, offsetof(J9VMThread, tlhPrefetchFTA), TR::Compiler->om.sizeofReferenceAddress()), iCursor);

GC code updating tlhPrefetchFTA: (There are some other locations)

#if defined(J9VM_GC_TLH_PREFETCH_FTA)
currentThread->tlhPrefetchFTA -= allocateSize;
#endif /* J9VM_GC_TLH_PREFETCH_FTA */

@knn-k
Copy link
Contributor Author

knn-k commented Feb 25, 2024

PR #18918 has been merged. J9VM_GC_TLH_PREFETCH_FTA and OMR_GC_TLH_PREFETCH_FTA have consistent values on every platform now.
It is up to Power and Z people to investigate the differences between AIX and pLinux, z/OS and zLinux, respectively.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants