Intermittent z/OS crash in libhealthenter.so when stopping while invoking JNI calls #102

Closed
yathamravali opened this issue Feb 28, 2024 · 1 comment · Fixed by #103

@yathamravali

Stack trace of the abort:

27       JNIEnv_::GetStaticMethodID(_jclass*,const char*,const char*) 
                      +00000054              *PATHNAM                  
 28       ibmras::monitoring::plugins::j9::getMXBean(JNIEnv_*,_jclass* 
                      +00000250              *PATHNAM                  
 29       ibmras::monitoring::plugins::j9::cpu::CpuPlugin::pullInt()   
                      +0000080C              *PATHNAM                  
 30       ibmras::monitoring::plugins::j9::cpu::pullWrapper()          
                      +00000054              *PATHNAM                  
 31       ibmras::monitoring::agent::threads::WorkerThread::processLoop 
                      +000001EE              *PATHNAM                  
 32       ibmras::monitoring::agent::threads::WorkerThread::threadEntry 

Assembly decoding:

0x2f971cf0 {}{} +0               eb6947100024 stmg      %r6, %r9, 0x710(%r4)
0x2f971cf6 {}{} +6               a74bff00     aghi      %r4, -0x100
0x2f971cfa {}{} +10              e39004b80017 llgt      %r9, 0x4b8
0x2f971d00 {}{} +16              e39090580004 lg        %r9, 0x58(%r9)
0x2f971d06 {}{} +22              e39090100017 llgt      %r9, 0x10(%r9)
0x2f971d0c {}{} +28              e31049800024 stg       %r1, 0x980(%r4)  0x521c1ff4c0: 0x0000000000000000  
0x2f971d12 {}{} +34              e32049880024 stg       %r2, 0x988(%r4)  0x521c1ff4c8: 0x000000525ebc9530 :  00000000800E0318 : // java/lang/Class : Class name: java/lang/management/ManagementFactory
0x2f971d18 {}{} +40              e33049900024 stg       %r3, 0x990(%r4)  0x521c1ff4d0: 0x00000050332c5130 :  6765744F70657261 74696E6753797374 656D4D584265616E 00DDDDDDDDDDDDDD [ getOperatingSystemMXBean........]
0x2f971d1e {}{} +46              44009010     ex        %r0, 0x10(%r9)
0x2f971d22 {}{} +50              4400900c     ex        %r0, 0xc(%r9)
0x2f971d26 {}{} +54              e32049880004 lg        %r2, 0x988(%r4)  0x521c1ff4c8: 0x000000525ebc9530 : 00000000800E0318 : // java/lang/Class : Class name: java/lang/management/ManagementFactory
0x2f971d2c {}{} +60              e33049900004 lg        %r3, 0x990(%r4)  0x521c1ff4d0: 0x00000050332c5130  :  getOperatingSystemMXBean
0x2f971d32 {}{} +66              e30049980004 lg        %r0, 0x998(%r4)  0x521c1ff4d8: 0x000000501aca6430 :  ()Ljava/lang/management/OperatingSystemMXBean;
0x2f971d38 {}{} +72              e31049800004 lg        %r1, 0x980(%r4)  0x521c1ff4c0: 0x0000000000000000
0x2f971d3e {}{} +78              e36010000004 lg        %r6, 0(%r1)  <- This instruction ran
0x2f971d44 {} +84      e36063880004 lg   %r6, 0x388(%r6) <- This instruction did not (failing instruction)

jmethodID method = env->GetStaticMethodID(*mgtBean, mxb, sig);
r1 was env
r2 was *mgtBean (the class object java/lang/management/ManagementFactory)
r3 was mxb (the method name, getOperatingSystemMXBean)
0x998(%r4) was sig (()Ljava/lang/management/OperatingSystemMXBean;)

So the problem is that env (the J9VMThread) is NULL: the slot at 0x980(%r4) that r1 is loaded from holds 0x0000000000000000.
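
As a cross-check on the disassembly, the displacement in the failing instruction can be converted into a JNI function table index. This is a minimal sketch that assumes a 64-bit JVM, so each function table slot is 8 bytes; `kSlotSize`, `kFailingDisplacement` and `slotIndex` are illustrative names and not part of the agent.

```cpp
#include <cstddef>

// Assumption: 64-bit JVM, so every JNI function table slot is an 8-byte pointer.
constexpr std::size_t kSlotSize = 8;

// Displacement taken from the failing instruction: lg %r6, 0x388(%r6)
constexpr std::size_t kFailingDisplacement = 0x388;

// Convert a byte displacement into the function table into a slot index.
constexpr std::size_t slotIndex(std::size_t displacement) {
    return displacement / kSlotSize;
}

// 0x388 = 904 bytes, and 904 / 8 = 113.
static_assert(slotIndex(kFailingDisplacement) == 113,
              "displacement 0x388 should map to slot 113");
```

Slot 113 is the position of GetStaticMethodID in the function table listed in the JNI specification, so the failing load matches the GetStaticMethodID frame at the top of the stack trace: the table pointer was fetched through a NULL env and the lookup at offset 0x388 then faulted.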

@yathamravali

The problem was caused by ThreadPool::stopAll destroying the WorkerThread while it was still running in processLoop. This implicitly detached the thread from the VM, causing aborts during shutdown.

The solution is to call source->complete(NULL) directly only on platforms other than Windows and z/OS. On Windows and z/OS it is enough that the stop function already sets running=false: WorkerThread::processLoop breaks out the next time it comes back to check the flag, and it calls source->complete(NULL) itself at the end of WorkerThread::processLoop.
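
The shutdown ordering described above can be sketched roughly as follows. This is not the omr-agentcore code: `SketchWorker`, `requestStop`, `stopAndJoin` and `completions` are hypothetical names, and `std::thread` stands in for the agent's own thread wrapper. The sketch only shows the idea that the stop request flips a flag, the worker loop performs its own completion step after its last unit of work, and the owner waits for the loop to exit before the object goes away.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for the agent's worker; not the omr-agentcore class.
class SketchWorker {
public:
    // Never destroy the worker while its loop may still be running.
    ~SketchWorker() { stopAndJoin(); }

    void start() {
        running.store(true);
        worker = std::thread([this] { processLoop(); });
    }

    // Only request the stop; completion is left to the worker thread itself.
    void requestStop() { running.store(false); }

    // Request the stop, then wait for the loop to finish.
    void stopAndJoin() {
        requestStop();
        if (worker.joinable()) {
            worker.join();
        }
    }

    // Number of times the completion step has run.
    int completions() const { return completeCalls.load(); }

private:
    void processLoop() {
        while (running.load()) {
            // Placeholder for the pull callback (the JNI work in the real agent).
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        // Completion runs on the worker thread, after its last unit of work.
        complete();
    }

    void complete() { completeCalls.fetch_add(1); }

    std::atomic<bool> running{false};
    std::atomic<int> completeCalls{0};
    std::thread worker;
};
```

With this shape, a caller would use `start()`, later `stopAndJoin()`, and only then let the object be destroyed, so no in-flight work can observe a torn-down worker. Calling `stopAndJoin()` a second time is harmless because the thread is no longer joinable.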

yathamravali added a commit to yathamravali/omr-agentcore that referenced this issue Feb 28, 2024
on zos and windows to avoid intermittent crashes.

Fixes RuntimeTools#102

Signed-off-by: Ravali Yatham <rayatha1@in.ibm.com>
BrijeshNekkare pushed a commit that referenced this issue Aug 28, 2024
on zos and windows to avoid intermittent crashes.

Fixes #102

Signed-off-by: Ravali Yatham <rayatha1@in.ibm.com>