Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable JitDump support on JITServer #12439

Merged
merged 1 commit into from
May 13, 2021

Conversation

dmitry-ten
Copy link
Contributor

This commit allows server to produce a JitDump log when a crash occurs
during a compilation.

Once the crash happens, the JitDump handler runs normally, producing
an IL log of the crashed thread until it reaches the point when
the problematic method needs to be recompiled. There, it sends
the client a compilationThreadCrashed message, asking it to schedule
a JitDump recompilation and blocks until the recompilation is done.

When the client receives the error message, it temporarily stops all
further compilations, purges the compilation queue and sends a request
to the server to recompile the method as a JitDump compilation.
The server will compile the method on the diagnostic thread, notify
the waiting crashed thread, allowing the JitDump handler to finish and
terminate the server.

Once the diagnostic compilation is complete, the client will resume
compilations locally until another server comes up.

Closes: #12394

@dmitry-ten dmitry-ten changed the title Enable JitDump support on JITServer WIP: Enable JitDump support on JITServer Apr 14, 2021
@dmitry-ten dmitry-ten force-pushed the jitdump-jitserver branch 2 times, most recently from 3606ca5 to 4025f13 Compare April 14, 2021 23:02
@dmitry-ten dmitry-ten changed the title WIP: Enable JitDump support on JITServer Enable JitDump support on JITServer Apr 14, 2021
@dmitry-ten
Copy link
Contributor Author

@mpirvu ready for review.
The new error messages on the client look like this:

#JITServer: Client sending compReq seqNo=9 to server for method java/util/HashMap.put(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; @ warm.
#FAILURE:  Server compilation thread crashed while compiling java/util/HashMap.put(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; (warm)
#JITServer: Requested JitDump recompilation of java/util/HashMap.put(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; (warm)
#JITServer: Client sending compReq seqNo=10 to server for method java/util/HashMap.put(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; @ warm.
#JITServer: remoteCompile: compilationFailure statusCode 1
#FAILURE:  JITServer::ServerCompilationFailure: errCode 1 for java/util/HashMap.put(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; @ warm
#FAILURE:  Server diagnostic thread crashed while compiling java/util/HashMap.put(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; (warm)
#JITServer: t=   995 client has lost connection, resetting activation policy to AGGRESSIVE
! (warm) java/util/HashMap.put(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; compThreadID=0 time=969176us <TRANSLATION FAILURE: Generic JITServer compilation failure> mem=[region=64 system=2048]
     KB

While on the server the jitdump part looks like this:

JVMDUMP032I JVM requested JIT dump using '/home/dmitry-ten/debug/jitdump.20210414.190128.30565.0004.dmp' in response to an event
JVMDUMP051I JIT dump occurred in 'JIT Compilation Thread-000' thread 0x0000000000018000
JVMDUMP049I JIT dump notified all waiting threads of the current method to be compiled
#DISPATCH: Starting to process queue entries. compThreadID=63 state=1 Q_SZ=0 Q_SZI=0 QW=0

JVMDUMP054I JIT dump is tracing the IL of the method on the crashed compilation thread
JVMDUMP048I JIT dump method being compiled is an ordinary method
JVMDUMP053I JIT dump is recompiling java/util/HashMap.put(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
#DISPATCH: Starting to process queue entries. compThreadID=1 state=1 Q_SZ=1 Q_SZI=1 QW=0

#DISPATCH: compThreadID=63 woke up after timeout on compMonitor

#JITServer: compThreadID=63 found clientSessionData=00007FBBA226D010 for clientUID=1137912726814822752 seqNo=10 (isCritical=0) (criticalSeqNo=9 lastProcessedCriticalReq=9)
#DISPATCH: Compilation thread executing compile(): j9method=00000000000562F8 isAotLoad=0 canDoRelocatableCompile=0 eligibleForRelocatableCompile=0 isRemoteCompReq=0 _doNotUseAotCodeFromSharedCache=0 AOTfe=0 isDLT=0
 (warm) Compiling java/util/HashMap.put(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;  JitDumpMethod j9m=00000000000562F8 t=2980 compThreadID=63 memLimit=524288 KB freePhysicalMemory=11109 MB
#DISPATCH: Compiling java/util/HashMap.put(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; @ warm
JIT: env var TR_FailOnPurpose is set to 1
Assertion failed at /home/dmitry-ten/src-11/openj9-openjdk-jdk11/openj9/runtime/compiler/optimizer/InterpreterEmulator.cpp:740: !feGetEnv("TR_FailOnPurpose")
VMState: 0x000501ff
        failing on purpose
compiling java/util/HashMap.put(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object; at level: warm
#0: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9jit29.so(+0x8a6f05) [0x7fbc7a888f05]
#1: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9jit29.so(+0x8b1960) [0x7fbc7a893960]
#2: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9jit29.so(+0x57f274) [0x7fbc7a561274]
#3: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9jit29.so(+0x580a2d) [0x7fbc7a562a2d]
#4: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9jit29.so(+0x38aaa8) [0x7fbc7a36caa8]
#5: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9jit29.so(+0x38c340) [0x7fbc7a36e340]
#6: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9jit29.so(+0x381a2a) [0x7fbc7a363a2a]
#7: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9jit29.so(+0x383fd2) [0x7fbc7a365fd2]
#8: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9jit29.so(+0x306362) [0x7fbc7a2e8362]
#9: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9jit29.so(+0x36ed39) [0x7fbc7a350d39]
#10: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9jit29.so(+0x377873) [0x7fbc7a359873]
#11: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9jit29.so(+0x62d64e) [0x7fbc7a60f64e]
#12: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9jit29.so(+0x36e1c6) [0x7fbc7a3501c6]
#13: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9jit29.so(+0x73b447) [0x7fbc7a71d447]
#14: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9jit29.so(+0x73cc9b) [0x7fbc7a71ec9b]
#15: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9jit29.so(+0x53f075) [0x7fbc7a521075]
#16: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9jit29.so(+0x1837fc) [0x7fbc7a1657fc]
#17: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9jit29.so(+0x184827) [0x7fbc7a166827]
#18: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9prt29.so(+0x28e63) [0x7fbc7c1b0e63]
#19: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9jit29.so(+0x18175a) [0x7fbc7a16375a]
#20: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9jit29.so(+0x1e3e25) [0x7fbc7a1c5e25]
#21: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9jit29.so(+0x180783) [0x7fbc7a162783]
#22: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9jit29.so(+0x180c62) [0x7fbc7a162c62]
#23: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9jit29.so(+0x180d0a) [0x7fbc7a162d0a]
#24: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9prt29.so(+0x28e63) [0x7fbc7c1b0e63]
#25: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9jit29.so(+0x181164) [0x7fbc7a163164]
#26: /home/dmitry-ten/src-11/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/default/libj9thr29.so(+0xe3a2) [0x7fbc7c9ae3a2]
#27: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7fbc7d1d86db]
#28: function clone+0x3f [0x7fbc7d75388f]

JVMDUMP052I JIT dump recursive crash occurred on diagnostic thread
#JITServer: compThreadID=63 has failed to compile: compErrCode 1 java/util/HashMap.put(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
#JITServer: compThreadID=63 MessageType::compilationFailure: statusCode 1
#DISPATCH: CompilationEnd returning startPC=0000000000000000  metadata=0000000000000000
JVMDUMP010I JIT dump written to /home/dmitry-ten/debug/jitdump.20210414.190128.30565.0004.dmp
JVMDUMP013I Processed dump event "gpf", detail ""

@mpirvu mpirvu self-assigned this Apr 15, 2021
@mpirvu mpirvu added the comp:jitserver Artifacts related to JIT-as-a-Service project label Apr 15, 2021
@mpirvu mpirvu added this to In progress in JIT as a Service via automation Apr 15, 2021
Copy link
Contributor

@mpirvu mpirvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some inline questions/comments

runtime/compiler/control/JitDump.cpp Show resolved Hide resolved
runtime/compiler/control/JITClientCompilationThread.cpp Outdated Show resolved Hide resolved
runtime/compiler/control/JITClientCompilationThread.cpp Outdated Show resolved Hide resolved
runtime/compiler/control/JITClientCompilationThread.cpp Outdated Show resolved Hide resolved
runtime/compiler/control/JITClientCompilationThread.cpp Outdated Show resolved Hide resolved
@mpirvu
Copy link
Contributor

mpirvu commented Apr 15, 2021

Ideally, no compilation will completely fail at the client just because the server crashes. Therefore, the client should not empty its compilation queue forcefully failing compilations. Rather, after seeing that the JITServer is in a crashing mode, it should perform the compilations locally for some time, and only after a while retry the connection.
The crashing server should refuse any new non-diagnostic compilations, either by sending an error code or by simply closing the socket (which will make the client use local compilations for a while).

@dmitry-ten
Copy link
Contributor Author

I agree. I checked, and setting setDisableFurtherCompilations(true) will prevent any compilations from being queued on the client, however I'm not sure what happens to the compilations that are already in the client's queue. I think the entries would still be processed and sent to the server. To mitigate that one of the solutions you mentioned above should be enough.

@dmitry-ten dmitry-ten changed the title Enable JitDump support on JITServer WIP: Enable JitDump support on JITServer Apr 15, 2021
@dmitry-ten dmitry-ten changed the title WIP: Enable JitDump support on JITServer Enable JitDump support on JITServer Apr 15, 2021
@dmitry-ten
Copy link
Contributor Author

@mpirvu ready for another review.

@dmitry-ten
Copy link
Contributor Author

Actually, just found a bug, will mark it WIP while I fix it.

@dmitry-ten dmitry-ten changed the title Enable JitDump support on JITServer WIP: Enable JitDump support on JITServer Apr 15, 2021
@dmitry-ten
Copy link
Contributor Author

@mpirvu fixed a bug should be ready for another review.
The problem was that a recursive call to remoteCompile was deallocating the client stream, which was then double freed by the caller. Fixed it by guarding deallocation when inside a jitdump compilation.

@dmitry-ten dmitry-ten changed the title WIP: Enable JitDump support on JITServer Enable JitDump support on JITServer Apr 16, 2021
Copy link
Contributor

@mpirvu mpirvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we prevent the client from continuing to send requests on sockets that are already open? Current changes modify CompilationInfo::requeueOutOfProcessEntry() to return abruptly once the flag _disableFurtherCompilation is true. However, the socket is still open and the client could send another compilation request which will likely timeout, because there is nobody to serve it. The identity of the open socket is also lost.
Maybe it's better to leave CompilationInfo::requeueOutOfProcessEntry() as is and at the point where requeueOutOfProcessEntry() is called, if _disableFurtherCompilation is true, then do not requeue, but instead close the socket connection, something of this sort:

         // Delete server stream
         stream->~ServerStream();
         TR::Compiler->persistentGlobalAllocator().deallocate(stream);
         entry._stream = NULL;

@dmitry-ten
Copy link
Contributor Author

I addressed the current comments, including the last one. I moved the checks for getDisableFurtherCompilations() and per-comp connection outside of requeueOutOfProcessEntry() and changed the code so that the stream is deleted if either of these conditions is satisfied. This should result in requeueOutOfProcessEntry() unconditionally requeuing the entry.

@mpirvu
Copy link
Contributor

mpirvu commented Apr 21, 2021

Another change that we should do is to move the setting of the disableFurtherCompilation flag into the critical section of compilation monitor, i.e. line https://github.com/eclipse/openj9/blob/master/runtime/compiler/control/JitDump.cpp#L319 moved at line https://github.com/eclipse/openj9/blob/master/runtime/compiler/control/JitDump.cpp#L336
It makes it easier to reason if the flag is set and read in critical sections protected by compilation monitor.

For the same reason we should probably change the code of J9CompileDispatcher::compile() to something like this

void J9CompileDispatcher::compile(JITServer::ServerStream *stream)
   {
   TR::CompilationInfo * compInfo = getCompilationInfo(_jitConfig);
   TR_MethodToBeCompiled *entry = NULL;
   bool disableFurtherCompilation = false;
   if (TR::Options::getVerboseOption(TR_VerboseJITServer))
      TR_VerboseLog::writeLineLocked(TR_Vlog_JITServer, "Server received request for stream %p", stream);
      {
      // Grab the compilation monitor to queue this entry and notify a compilation thread
      OMR::CriticalSection compilationMonitorLock(compInfo->getCompilationMonitor());
      if (!(disableFurtherCompilation = compInfo->getPersistentInfo()->getDisableFurtherCompilation())
         {
         if (compInfo->addOutOfProcessMethodToBeCompiled(stream))
            {
            // successfully queued the new entry, so notify a thread
            compInfo->getCompilationMonitor()->notifyAll();
            return;
            }
         }
      } // end critical section
      
   // If we reached this point, either there was a memory allocation failure
   // or compilation are disabled
   if (disableFurtherCompilation)
      {
      // Server disabled further compilations but client sent a compilation request anyway.
      if (TR::Options::getVerboseOption(TR_VerboseJITServer))
         TR_VerboseLog::writeLineLocked(TR_Vlog_JITServer, "Server rejected compilation request for stream %p because compilations are disabled", stream);
      stream->writeError(compilationStreamFailure);
      }
   else
      {
      if (TR::Options::getVerboseOption(TR_VerboseJITServer))
         TR_VerboseLog::writeLineLocked(TR_Vlog_JITServer, "Server rejected compilation request for stream %p because of lack of memory", stream);
      stream->writeError(compilationLowPhysicalMemory, (uint64_t) JITServer::ServerMemoryState::VERY_LOW);
      }
   }

@dmitry-ten dmitry-ten force-pushed the jitdump-jitserver branch 4 times, most recently from bd64b1f to 6144fae Compare May 6, 2021 18:08
@dmitry-ten
Copy link
Contributor Author

@mpirvu ready for another review. Could you check if you can get a dump with AcmeAir now? It seems to work for me.

@mpirvu
Copy link
Contributor

mpirvu commented May 10, 2021

Linter says:

clang++-3.8 -std=c++0x -w -fsyntax-only -ferror-limit=0  -Xclang -load -Xclang /home/runner/work/openj9/openj9/runtime/omr/tools/compiler/OMRChecker/OMRChecker.so -Xclang -add-plugin -Xclang omr-checker  -DBITVECTOR_BIT_NUMBERING_MSB -DJ9_PROJECT_SPECIFIC -DTR_HOST_X86 -DTR_HOST_64BIT -DBITVECTOR_64BIT -DLINUX -DTR_TARGET_X86 -DTR_TARGET_64BIT -DSUPPORTS_THREAD_LOCAL -D_LONG_LONG -DJ9HAMMER -I'/home/runner/work/openj9/openj9/runtime/omr/include_core' -I'/home/runner/work/openj9/openj9/runtime/omr/thread' -I'/home/runner/work/openj9/openj9/runtime/omr/gc/include' -I'../compiler/x/amd64' -I'../compiler/x' -I'../compiler' -I'../omr/compiler/x/amd64' -I'../omr/compiler/x' -I'../omr/compiler' -I'../omr' -I'..' -I'/home/runner/work/openj9/openj9/runtime/codert_vm' -I'/home/runner/work/openj9/openj9/runtime/gc_include' -I'/home/runner/work/openj9/openj9/runtime/include' -I'/home/runner/work/openj9/openj9/runtime/gc_glue_java' -I'/home/runner/work/openj9/openj9/runtime/jit_vm' -I'/home/runner/work/openj9/openj9/runtime/nls' -I'/home/runner/work/openj9/openj9/runtime/oti' -I'/home/runner/work/openj9/openj9/runtime/util' -I'/home/runner/work/openj9/openj9/build/runtime' -I'/home/runner/work/openj9/openj9/build/runtime/omr' -I'/home/runner/work/openj9/openj9/build/runtime/nls' -I'/home/runner/work/openj9/openj9/build/runtime/include' -I'/home/runner/work/openj9/openj9/build/runtime/compiler' -o ../compiler/env/CpuUtilization.cpp.linted -c ../compiler/env/CpuUtilization.cpp
../compiler/control/JitDump.cpp:376:45: error: no type named 'CompilationInfoPerThreadRemote' in namespace 'TR'; did you mean 'CompilationInfoPerThreadBase'?
      auto serverCompInfo = static_cast<TR::CompilationInfoPerThreadRemote *>(threadCompInfo);
                                        ~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                            CompilationInfoPerThreadBase
../compiler/control/CompilationThread.hpp:138:7: note: 'CompilationInfoPerThreadBase' declared here
class CompilationInfoPerThreadBase
      ^
../compiler/control/JitDump.cpp:377:30: error: no member named 'getClassUnloadReadMutexDepth' in 'TR::CompilationInfoPerThreadBase'
      while (serverCompInfo->getClassUnloadReadMutexDepth() > 0)
             ~~~~~~~~~~~~~~  ^
../compiler/control/JitDump.cpp:379:26: error: no member named 'getClientData' in 'TR::CompilationInfoPerThreadBase'
         serverCompInfo->getClientData()->readReleaseClassUnloadRWMutex(serverCompInfo);
         ~~~~~~~~~~~~~~  ^
3 errors generated.
make: *** [../compiler/control/JitDump.cpp.linted] Error 1

@dmitry-ten
Copy link
Contributor Author

Forgot an #ifdef

@mpirvu
Copy link
Contributor

mpirvu commented May 10, 2021

I ran liberty start-up and induced a crashed at the server. The JIT dump contains:

<?xml version="1.0" standalone="no"?>
<jitDump>

=== EXCEPTION THROWN (JITServer I/O error: read error) ===
</jitDump>

So there was some exception that prevented the dump from being generated. I cannot tell whether the client received the message to print the IL, but that did not happen. The client finished operating without a problem.

Lines from the client:

 (warm) Compiling java/lang/reflect/Array.newInstance(Ljava/lang/Class;I)Ljava/lang/Object;  OrdinaryMethod j9m=00000000001CC2F8 t=950 compThreadID=4 memLimit=262144 KB freePhysicalMemory=13332 MB
#JITServer: Client sending compReq seqNo=553 to server for method java/lang/reflect/Array.newInstance(Ljava/lang/Class;I)Ljava/lang/Object; @ warm.
...
#JITSTATE:  t=  2520 JIT changed state from STARTUP   to IDLE      cSmpl=  3 iSmpl= 19 comp=119 recomp=  0, Q_SZ= 17 CLP=OFF jvmCPU=0%
#JITServer: t=  2982 client has lost connection, resetting activation policy to AGGRESSIVE
#FAILURE:  JITServer::StreamFailure: JITServer I/O error: read error for java/lang/reflect/Array.newInstance(Ljava/lang/Class;I)Ljava/lang/Object; @ warm
#JITServer: compThreadID=4 JITServer StreamFailure: Generic stream failure
! (warm) java/lang/reflect/Array.newInstance(Ljava/lang/Class;I)Ljava/lang/Object; time=2079065us compilationStreamFailure memLimit=262144 KB freePhysicalMemory=13304 MB mem=[region=64 system=2048]KB

It's possible that the default timeout at the client is too small (2 sec) and because during the coredump generation the server does not send anything to the client, the client just closes the connection.

@mpirvu
Copy link
Contributor

mpirvu commented May 10, 2021

With a larger timeout at the client, the jitdump recompilation log is produced normally.

@mpirvu
Copy link
Contributor

mpirvu commented May 11, 2021

Looking at the client, in my tests the diagnostic recompilation at the server succeeded, which is not unrealistic.
I see the following messages:

#FAILURE:  Server compilation thread crashed while compiling org/eclipse/equinox/metatype/impl/ExtendableHelper.load(Lorg/eclipse/equinox/metatype/impl/Persistence$Reader;)Lorg/eclipse/equinox/metatype/impl/ExtendableHelper; (warm)
#JITServer: Requested JitDump recompilation of org/eclipse/equinox/metatype/impl/ExtendableHelper.load(Lorg/eclipse/equinox/metatype/impl/Persistence$Reader;)Lorg/eclipse/equinox/metatype/impl/ExtendableHelper; (warm)
#JITServer: Client sending compReq seqNo=864 to server for method org/eclipse/equinox/metatype/impl/ExtendableHelper.load(Lorg/eclipse/equinox/metatype/impl/Persistence$Reader;)Lorg/eclipse/equinox/metatype/impl/ExtendableHelper; @ warm.
#JITServer: Client successfully loaded method org/eclipse/equinox/metatype/impl/ExtendableHelper.load(Lorg/eclipse/equinox/metatype/impl/Persistence$Reader;)Lorg/eclipse/equinox/metatype/impl/ExtendableHelper; @ warm following compilation request. [metaData=00007FB865E51738, startPC=00007FB8672DC940]
#JITServer: t=  7900 client has lost connection, resetting activation policy to AGGRESSIVE
! (warm) org/eclipse/equinox/metatype/impl/ExtendableHelper.load(Lorg/eclipse/equinox/metatype/impl/Persistence$Reader;)Lorg/eclipse/equinox/metatype/impl/ExtendableHelper; time=7015902us compilationOK memLimit=262144 KB freePhysicalMemory=13010 MB mem=[region=128 system=2048]KB

What I think may not be ok is #JITServer: Client successfully loaded method
Somehow we still aborted the compilation, but we registered runtime assumptions. Ideally, we would not commit the effects of a diagnostic compilation at the client.

@mpirvu mpirvu mentioned this pull request May 11, 2021
54 tasks
@mpirvu
Copy link
Contributor

mpirvu commented May 11, 2021

Another example, where the diagnostic recompilation crashes due to a different cause. The jitDump is incomplete.

JVMDUMP010I Snap dump written to /home/mpirvu/FullJava16/Snap.20210511.110350.13656.0003.trc
JVMDUMP032I JVM requested JIT dump using '/home/mpirvu/FullJava16/jitdump.20210511.110350.13656.0004.dmp' in response to an event
JVMDUMP051I JIT dump occurred in 'JIT Compilation Thread-001' thread 0x000000000001BC00
JVMDUMP049I JIT dump notified all waiting threads of the current method to be compiled
JVMDUMP054I JIT dump is tracing the IL of the method on the crashed compilation thread
+ (warm) sun/nio/ch/SocketChannelImpl.beginRead(Z)V @ 00007F44124A6060-00007F44124A62E2 OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=0 j9m=0000000000FAC988 bcsz=37 OSR time=2278us mem=[region=1408 system=16384]KB compThreadID=6 CpuLoad=738%(92%avg) JvmCpu=360%
#JITServer: compThreadID=6 has successfully compiled sun/nio/ch/SocketChannelImpl.beginRead(Z)V memoryState=2
+ (warm) com/ibm/ws/tcpchannel/internal/NioSocketIOChannel.attemptReadFromSocketUsingNIO(Lcom/ibm/ws/tcpchannel/internal/TCPReadRequestContextImpl;[Lcom/ibm/wsspi/bytebuffer/WsByteBuffer;)J @ 00007F44123A6068-00007F44123A68D1 OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=0 j9m=0000000000EC8C48 bcsz=148 OSR time=17064us mem=[region=2432 system=16384]KB compThreadID=3 CpuLoad=738%(92%avg) JvmCpu=360%
#JITServer: compThreadID=3 has successfully compiled com/ibm/ws/tcpchannel/internal/NioSocketIOChannel.attemptReadFromSocketUsingNIO(Lcom/ibm/ws/tcpchannel/internal/TCPReadRequestContextImpl;[Lcom/ibm/wsspi/bytebuffer/WsByteBuffer;)J memoryState=2
+ (warm) com/ibm/tx/ltc/embeddable/impl/EmbeddableLocalTranCurrentImpl.begin(Z)V @ 00007F44124A6060-00007F44124A802F OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=0 j9m=0000000000747C58 bcsz=216 OSR time=60699us mem=[region=5184 system=16384]KB compThreadID=0 CpuLoad=738%(92%avg) JvmCpu=360%
#JITServer: compThreadID=0 has successfully compiled com/ibm/tx/ltc/embeddable/impl/EmbeddableLocalTranCurrentImpl.begin(Z)V memoryState=2
+ (warm) com/ibm/ws/tcpchannel/internal/SocketIOChannel.attemptReadFromSocket(Lcom/ibm/ws/tcpchannel/internal/TCPBaseRequestContext;Z)Lcom/ibm/ws/tcpchannel/internal/SocketIOChannel$IOResult; @ 00007F44124A6068-00007F44124A82FC OrdinaryMethod - Q_SZ=0 Q_SZI=0 QW=0 j9m=0000000000EC8060 bcsz=411 OSR time=42635us mem=[region=7168 system=16384]KB compThreadID=4 CpuLoad=738%(92%avg) JvmCpu=360%
#JITServer: compThreadID=4 has successfully compiled com/ibm/ws/tcpchannel/internal/SocketIOChannel.attemptReadFromSocket(Lcom/ibm/ws/tcpchannel/internal/TCPBaseRequestContext;Z)Lcom/ibm/ws/tcpchannel/internal/SocketIOChannel$IOResult; memoryState=2
JVMDUMP048I JIT dump method being compiled is an ordinary method
JVMDUMP053I JIT dump is recompiling com/ibm/wsspi/kernel/service/utils/ParserUtils.decode(Ljava/lang/String;)Ljava/lang/String;
#JITServer: compThreadID=63 found clientSessionData=00007F43135B7010 for clientUID=12419677184960038951 seqNo=4431 (isCritical=0) (criticalSeqNo=4431 lastProcessedCriticalReq=4424)
 (cold) Compiling com/ibm/wsspi/kernel/service/utils/ParserUtils.decode(Ljava/lang/String;)Ljava/lang/String;  JitDumpMethod j9m=00000000009E3620 t=20852 compThreadID=63 memLimit=4194303 KB freePhysicalMemory=12443 MB
Assertion failed at /home/mpirvu/FullJava16/openj9-openjdk-jdk16/openj9/runtime/compiler/env/J9KnownObjectTable.cpp:399: (numOfEntries == endIndex)
VMState: 0x00050080
        The client table size 2 is different from the server table size 1
compiling com/ibm/wsspi/kernel/service/utils/ParserUtils.decode(Ljava/lang/String;)Ljava/lang/String; at level: cold
#0: /home/mpirvu/FullJava16/openj9-openjdk-jdk16/build/linux-x86_64-server-release/images/jdk/lib/default/libj9jit29.so(+0x893875) [0x7f442c670875]
#1: /home/mpirvu/FullJava16/openj9-openjdk-jdk16/build/linux-x86_64-server-release/images/jdk/lib/default/libj9jit29.so(+0x89e2d0) [0x7f442c67b2d0]
...
JVMDUMP052I JIT dump recursive crash occurred on diagnostic thread
#JITServer: compThreadID=63 has failed to compile: compErrCode 1 com/ibm/wsspi/kernel/service/utils/ParserUtils.decode(Ljava/lang/String;)Ljava/lang/String;
#JITServer: compThreadID=63 MessageType::compilationFailure: statusCode 1
JVMDUMP010I JIT dump written to /home/mpirvu/FullJava16/jitdump.20210511.110350.13656.0004.dmp

@dmitry-ten
Copy link
Contributor Author

@mpirvu pushed the latest changes. One thing I changed besides what we discussed is that the server will now ignore any CH table updates for a diagnostic recompilation, because it might have missed previous updates, which will lead to an assert triggered in processEntry which results in a deadlock.

@dmitry-ten
Copy link
Contributor Author

Pushed the last updates as a new commit.

Copy link
Contributor

@mpirvu mpirvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Before triggering jenkins testing could you please rebase to make sure we test the latest code? Thanks

@mpirvu
Copy link
Contributor

mpirvu commented May 12, 2021

Also, I think we could squash the 3 commits into one.

This commit allows server to produce a JitDump log when a crash occurs
during a compilation.

Once the crash happens, the JitDump handler runs normally, producing
an IL log of the crashed thread until it reaches the point when
the problematic method needs to be recompiled. There, it sends
the client a `compilationThreadCrashed` message, asking it to schedule
a JitDump recompilation and blocks until the recompilation is done.

When the client receives the error message, it temporarily stops all
further compilations, purges the compilation queue and sends a request
to the server to recompile the method as a JitDump compilation.
The server will compile the method on the diagnostic thread, notify
the waiting crashed thread, allowing the JitDump handler to finish and
terminate the server.

Once the diagnostic compilation is complete, the client will resume
compilations locally until another server comes up.

Closes: eclipse-openj9#12394

Signed-off-by: Dmitry Ten <Dmitry.Ten@ibm.com>
@dmitry-ten
Copy link
Contributor Author

Squashed and rebased.

@mpirvu
Copy link
Contributor

mpirvu commented May 12, 2021

jenkins test sanity plinuxjit,xlinuxjit,zlinuxjit jdk11

@mpirvu mpirvu merged commit c897f1a into eclipse-openj9:master May 13, 2021
JIT as a Service automation moved this from In progress to Done May 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:jitserver Artifacts related to JIT-as-a-Service project
Projects
Development

Successfully merging this pull request may close these issues.

Enable JitDump support on JITServer
2 participants