Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support lambdas and other generated classes in JITServer AOT cache #19549

Merged
merged 1 commit into from
Jun 21, 2024

Conversation

AlexeyKhrabrov
Copy link
Contributor

@AlexeyKhrabrov AlexeyKhrabrov commented May 24, 2024

This PR implements support for correctly and reliably serializing and deserializing/loading JITServer AOT cache methods that are defined by or refer to classes generated at runtime.

The current list of recognized types of generated classes includes:

  • Lambda classes;
  • Dynamic proxy classes;
  • Generated reflection accessors.

In each of these cases, the generated class name consists of a deterministic prefix followed by a suffix that can vary across JVMs.

Generated classes are identified across client JVMs using the combination of: the class loader ID (name of its first loaded class), the deterministic class name prefix, and the deterministic ROMClass hash. The latter is computed after re-packing the ROMClass to truncate all the instances of the class name string down to the deterministic class name prefix. In order to lookup generated classes at the client JVM during AOT deserialization and load, we maintain a mapping between <loader, prefix, hash> and the RAMClass, which is populated in the class load JIT hook, and invalidated in the class unload and redefinition hooks.

This mechanism can be disabled by passing the command line option -Xjit:aotCacheDisableGeneratedClassSupport to both the JITServer and the client JVMs.

@mpirvu mpirvu added the comp:jitserver Artifacts related to JIT-as-a-Service project label Jun 13, 2024
@mpirvu mpirvu added this to In progress in JIT as a Service via automation Jun 13, 2024
@mpirvu mpirvu self-assigned this Jun 13, 2024
@mpirvu mpirvu added the comp:vm label Jun 13, 2024
@cjjdespres
Copy link
Contributor

Out of curiosity, what's the typical "class/class loader not found" rate for these runtime-generated classes during deserialization, or are those more or less gone now? Also, did you notice any increase in relocation failures?

runtime/compiler/control/JITServerHelpers.cpp Show resolved Hide resolved
runtime/compiler/runtime/JITServerROMClassHash.hpp Outdated Show resolved Hide resolved
runtime/compiler/control/JITServerHelpers.cpp Outdated Show resolved Hide resolved
runtime/compiler/control/JITServerHelpers.cpp Outdated Show resolved Hide resolved
runtime/compiler/runtime/JITServerAOTCache.cpp Outdated Show resolved Hide resolved
runtime/compiler/runtime/JITServerAOTCache.cpp Outdated Show resolved Hide resolved
runtime/compiler/runtime/JITServerAOTCache.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@mpirvu mpirvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, it looks good. I have some questions:

  1. How do we handle class redefinition?
  2. Is it possible to avoid computing the deterministic hash at the server by passing it from client to server and then cache it?
  3. Could you please add a high level "design" in the description of this PR?

runtime/compiler/control/JITServerHelpers.cpp Show resolved Hide resolved
runtime/compiler/runtime/JITServerAOTCache.cpp Outdated Show resolved Hide resolved
runtime/compiler/control/JITServerHelpers.cpp Show resolved Hide resolved
@mpirvu
Copy link
Contributor

mpirvu commented Jun 18, 2024

We should also have a way of disabling the entire mechanism in case there is something wrong with it in production.

@mpirvu
Copy link
Contributor

mpirvu commented Jun 18, 2024

A thought: is it possible for a generated ROMClass to refer to other generated ROMClasses (by name) and therefore encounter issues when computing the deterministic hash?

@AlexeyKhrabrov
Copy link
Contributor Author

Out of curiosity, what's the typical "class/class loader not found" rate for these runtime-generated classes during deserialization, or are those more or less gone now? Also, did you notice any increase in relocation failures?

If I remember correctly, both "class not found" and deserialization/relocation failures are reduced significantly. I'll collect some statistics.

@AlexeyKhrabrov
Copy link
Contributor Author

1. How do we handle class redefinition?

Good point. The idea is to handle redefinition transparently as a class unload followed by a new class load, but looking at HookedByTheJIT.cpp I'm not actually sure if the class load hook is called for the new version of the class after redefinition.

2. Is it possible to avoid computing the deterministic hash at the server by passing it from client to server and then cache it?

Possible and not too difficult, but with some added complexity. When the client sends class info to the server, it can include the deterministic hash (which will need to be associated with the RAMClass directly, which is not currently done). I think this optimization should be implemented in a separate PR rather than here.

@mpirvu
Copy link
Contributor

mpirvu commented Jun 18, 2024

For class redefinition we do not call the load/unload hooks, but there is a special hook.
From the server point of view, we treat the redefinition as an unload, and the server needs to delete everything that refers to the class being redefined. I think we can recompute the hash for the corresponding entry in GeneratedClassMap. The J9Class pointer stays the same as far as I know.

@mpirvu
Copy link
Contributor

mpirvu commented Jun 18, 2024

jenkins compile all jdk8

@mpirvu
Copy link
Contributor

mpirvu commented Jun 19, 2024

jenkins compile all jdk21

@mpirvu
Copy link
Contributor

mpirvu commented Jun 19, 2024

AcmeAir with JITServer AOT cache crashes during startup with the following stack trace:

_ZNK18TR_ClassLoaderInfo6equalsIL9TableKind1EEEbPKv+0x4 (0x00007EFDAE123634 [libj9jit29.so+0x210634])
_ZNK29TR_PersistentClassLoaderTable41lookupClassLoaderAssociatedWithClassChainEPv+0x73 (0x00007EFDAE123D43 [libj9jit29.so+0x210d43])
_ZN40TR_RelocationRecordProfiledInlinedMethod18preparePrivateDataEP20TR_RelocationRuntimeP19TR_RelocationTarget+0xd0 (0x00007EFDAE3E5EA0 [libj9jit29.so+0x4d2ea0])
_ZN24TR_RelocationRecordGroup16handleRelocationEP20TR_RelocationRuntimeP19TR_RelocationTargetP19TR_RelocationRecordPh+0x90 (0x00007EFDAE3DA8C0 [libj9jit29.so+0x4c78c0])
_ZN24TR_RelocationRecordGroup16applyRelocationsEP20TR_RelocationRuntimeP19TR_RelocationTargetPh+0xe0 (0x00007EFDAE3DB430 [libj9jit29.so+0x4c8430])
_ZN20TR_RelocationRuntime22relocateAOTCodeAndDataEPhS0_S0_S0_+0x25c (0x00007EFDAE3E78BC [libj9jit29.so+0x4d48bc])
_ZN20TR_RelocationRuntime29prepareRelocateAOTCodeAndDataEP10J9VMThreadP11TR_FrontEndPN2TR9CodeCacheEPK20J9JITDataCacheHeaderP8J9MethodbPNS4_7OptionsEPNS4_11CompilationEP17TR_ResolvedMethodPhP16TR_J9SharedCache+0x4c2 (0x00007EFDAE3E8102 [libj9jit29.so+0x4d5102])
_Z13remoteCompileP10J9VMThreadPN2TR11CompilationEP17TR_ResolvedMethodP8J9MethodRNS1_24IlGeneratorMethodDetailsEPNS1_28CompilationInfoPerThreadBaseE.localalias+0x2288 (0x00007EFDAE0D0858 [libj9jit29.so+0x1bd858])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadPNS_11CompilationEP17TR_ResolvedMethodR11TR_J9VMBaseP19TR_OptimizationPlanRKNS_16SegmentAllocatorE+0x8b9 (0x00007EFDAE08D7D9 [libj9jit29.so+0x17a7d9])

@AlexeyKhrabrov
Copy link
Contributor Author

A thought: is it possible for a generated ROMClass to refer to other generated ROMClasses (by name) and therefore encounter issues when computing the deterministic hash?

I think it is possible, although probably rare in practice. We miss out on AOT cache for such classes/methods, but there are no correctness issues (class lookups will simply fail). Supporting such scenario would add complexity and overhead (e.g. require checking whether any classes named in the constant pool are generated), and I'm not sure it's worth it.

@mpirvu
Copy link
Contributor

mpirvu commented Jun 19, 2024

   //NOTE: Current implementation does not support lambda forms since there can be multiple identical
   // (except for the non-deterministic part of the class name) classes loaded in the same JVM instance.

So we can have several lambda forms loaded by the same class loader and identical in content?

@AlexeyKhrabrov
Copy link
Contributor Author

So we can have several lambda forms loaded by the same class loader and identical in content?

Yes. The only difference is the class name suffix. But if I understand correctly, AOT currently doesn't support them anyway, so we don't need to handle lambda forms at this point.

@mpirvu
Copy link
Contributor

mpirvu commented Jun 19, 2024

@tajila Since this PR has some small VM changes, please review or delegate the review for those parts. Thanks

@tajila tajila requested a review from hangshao0 June 19, 2024 13:48
@AlexeyKhrabrov
Copy link
Contributor Author

For class redefinition we do not call the load/unload hooks, but there is a special hook. From the server point of view, we treat the redefinition as an unload, and the server needs to delete everything that refers to the class being redefined.

Redefined classes still get invalidated/purged from the deserializer caches including the generated classes map:

if (auto deserializer = compInfo->getJITServerAOTDeserializer())
deserializer->invalidateClass(currentThread, classPair->oldClass);

So we miss out on AOT cache for the new class after redefinition, but I don't think this can result in any correctness issues.

I think we can recompute the hash for the corresponding entry in GeneratedClassMap.

Yes, but I think this in an edge case that should be handled in a separate PR.

The J9Class pointer stays the same as far as I know.

I thought there were two different mechanisms, and only under one of them the RAMClass stays the same. If it does stay the same, we can recompute the hash, otherwise handle it as "class unload" (JITServerAOTDeserializer::invalidateClass()) followed by a "class load" (JITServerAOTDeserializer::onClassLoad()).

@mpirvu
Copy link
Contributor

mpirvu commented Jun 19, 2024

I thought there were two different mechanisms, and only under one of them the RAMClass stays the same.

static bool classesAreRedefinedInPlace()
   {
   if(TR::Options::getCmdLineOptions()->getOption(TR_EnableHCR))
      return true;
   else return false;
   }

static bool methodsAreRedefinedInPlace()
   {
   // NOTE: making this return "true" will require careful thought.
   // Don't expect callers to respond properly.  At the time this comment was
   // written, we had never tested that configuration.  Consider calls to this
   // function just to be markers for places in the code that may require attention.
   //
   return false;
   }

Copy link
Contributor

@hangshao0 hangshao0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The VM change looks good to me.

@hangshao0
Copy link
Contributor

FYI @ThanHenderson isLambdaClassName() and isLambdaFormClassName() is being updated here.

Copy link
Contributor

@ThanHenderson ThanHenderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than my comments, LGTM.

runtime/util/shchelp_j9.c Outdated Show resolved Hide resolved
runtime/bcutil/ROMClassBuilder.cpp Outdated Show resolved Hide resolved
runtime/util/shchelp_j9.c Outdated Show resolved Hide resolved
runtime/util/shchelp_j9.c Outdated Show resolved Hide resolved
runtime/oti/util_api.h Outdated Show resolved Hide resolved
@AlexeyKhrabrov
Copy link
Contributor Author

AlexeyKhrabrov commented Jun 19, 2024

Some stats from AcmeAir + JMeter on JDK17:

Before:
"Failed to find class" errors: 313

    compilationRelocationFailure           299
  aotCacheDeserializationFailure           313

After:
"Failed to find class" errors: 33

    compilationRelocationFailure            23
  aotCacheDeserializationFailure            34

@AlexeyKhrabrov
Copy link
Contributor Author

@mpirvu I still need to write a high level design description, but the code should be ready now.

@AlexeyKhrabrov AlexeyKhrabrov force-pushed the aotcache_lambdas branch 3 times, most recently from d96a40a to b920442 Compare June 19, 2024 18:04
@AlexeyKhrabrov AlexeyKhrabrov changed the title (WIP) Support lambdas and other runtime generated classes in JITServer AOT cache Support lambdas and other generated classes in JITServer AOT cache Jun 19, 2024
@mpirvu
Copy link
Contributor

mpirvu commented Jun 19, 2024

jenkins test sanity xlinuxjit,zlinuxjit jdk21

@AlexeyKhrabrov AlexeyKhrabrov marked this pull request as ready for review June 19, 2024 18:06
@mpirvu
Copy link
Contributor

mpirvu commented Jun 20, 2024

On zLinux the tests have actually passed.
On xlinux we hit an infra issue. Will retry.

@mpirvu
Copy link
Contributor

mpirvu commented Jun 20, 2024

jenkins test sanity xlinuxjit jdk21

@mpirvu
Copy link
Contributor

mpirvu commented Jun 20, 2024

Tests on xlinux seem to have passed, though we are getting an infra failure late.

For local tests I can run AcmeAirEE8 with Java17 outside containers.
If I use containers I get the following assert:

Assertion failed at /home/mpirvu/FullJava17/openj9-openjdk-jdk17/openj9/runtime/compiler/runtime/JITServerAOTDeserializer.cpp:293: ramClass->romClass == h_r.first->second->romClass
        Duplicate generated class 0x116d900 != 0x115d500 ROMClass mismatch: 0x7fadb404b868 != 0x7fadb404b6a8

This commit implements support for correctly and reliably
serializing and deserializing/loading JITServer AOT cache methods
that are defined by or refer to classes generated at runtime.

The current list of recognized types of generated classes includes:
- Lambda classes;
- Dynamic proxy classes;
- Generated reflection accessors.
In each of these cases, the generated class name consists of a
deterministic prefix followed by a suffix that can vary across JVMs.

Generated classes are identified across client JVMs using the
combination of: the class loader ID (name of its first loaded
class), the deterministic class name prefix, and the deterministic
ROMClass hash. The latter is computed after re-packing the
ROMClass to truncate all the instances of the class name string
down to the deterministic class name prefix. In order to lookup
generated classes at the client JVM during AOT deserialization
and load, we maintain a mapping between <loader, prefix, hash>
and the RAMClass, which is populated in the class load JIT hook,
and invalidated in the class unload and redefinition hooks.

This mechanism can be disabled by passing the command
line option `-Xjit:aotCacheDisableGeneratedClassSupport`
to both the JITServer and the client JVMs.

Signed-off-by: Alexey Khrabrov <khrabrov@cs.toronto.edu>
@AlexeyKhrabrov
Copy link
Contributor Author

For local tests I can run AcmeAirEE8 with Java17 outside containers. If I use containers I get the following assert:

Assertion failed at /home/mpirvu/FullJava17/openj9-openjdk-jdk17/openj9/runtime/compiler/runtime/JITServerAOTDeserializer.cpp:293: ramClass->romClass == h_r.first->second->romClass
        Duplicate generated class 0x116d900 != 0x115d500 ROMClass mismatch: 0x7fadb404b868 != 0x7fadb404b6a8

The new force-pushed version only logs the error (including the class name, which would be useful in this assertion) instead of asserting. Only the first matching RAMClass is associated with a given ROMClass hash.

@mpirvu
Copy link
Contributor

mpirvu commented Jun 20, 2024

jenkins test sanity zlinuxjit jdk21

@mpirvu
Copy link
Contributor

mpirvu commented Jun 21, 2024

Performance metrics seem to be stable after this PR. I will merge it.

@mpirvu mpirvu merged commit 332a5d3 into eclipse-openj9:master Jun 21, 2024
5 checks passed
JIT as a Service automation moved this from In progress to Done Jun 21, 2024
@AlexeyKhrabrov AlexeyKhrabrov deleted the aotcache_lambdas branch June 26, 2024 15:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:jitserver Artifacts related to JIT-as-a-Service project comp:vm
Projects
Development

Successfully merging this pull request may close these issues.

None yet

6 participants