Intermittent z/OS crash in libhealthcenter.so when stopping #99

Open
kgibm opened this issue Jan 26, 2021 · 3 comments

kgibm commented Jan 26, 2021

The IPCS command ip verbx ledata 'nthreads(*)' shows a crash in WorkerThread::processLoop:

18    abort       HLE77C0:edcabort.c                                     
19    masterSynchSignalHandler                                           
                  j20200901                                              
20    __zerro     HLE77C0:edczerro.c                                     
21    __zerros    HLE77C0:edczerro.c                                     
27    ** NoName **.......................c..F....-....WorkerThread.cpp...
28    ** NoName **.......................c..F....-....WorkerThread.cpp...
29    ** NoName **.......................c..F....-.&..Thread.cpp...UI4349

Or Semaphore::open:

18    abort       HLE77C0:edcabort.c                                     
19    mainSynchSignalHandler                                             
                  j20201102                                              
20    __zerro     HLE77C0:edczerro.c                                     
21    __zerros    HLE77C0:edczerro.c                                     
27    ibmras::common::port::Semaphore::open(int*)                        
                  .......................c..F.b..-... ....Thread.cpp...D2
28    ibmras::common::port::Semaphore::wait(unsigned int)                
                  .......................c..F.b..-... ....Thread.cpp...D2
29    ibmras::monitoring::agent::threads::WorkerThread::processLoo       
                  .......................c..F.b..-... .-..WorkerThread.cpp
30    ibmras::monitoring::agent::threads::WorkerThread::threadEntr       
                  .......................c..F.b..-... .-..WorkerThread.cpp

kgibm commented Jan 26, 2021

The crash was caused by ThreadPool::stopAll destroying the WorkerThread while its thread was still running in processLoop. Destroying the WorkerThread implicitly destroyed its Semaphore, which in turn destroyed the Semaphore's fields, such as name. The still-running processLoop then used the destroyed Semaphore, which is undefined behavior and produced the crash.

The fix is to comment out the stopped=true assignment in WorkerThread::stop. That function already sets running=false, so WorkerThread::processLoop breaks out of its loop the next time it returns from the semaphore wait and then sets stopped=true itself, as its last step.
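
The handshake described above can be sketched roughly as follows. This is a minimal, portable illustration and not the agent's actual code: the Semaphore here is a hypothetical stand-in built on std::condition_variable, stopWorkerSafely is an invented helper, and the polling on isStopped() before the object is destroyed is an assumption about how the owner (ThreadPool::stopAll in the real code) decides that destruction is safe.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for ibmras::common::port::Semaphore.
class Semaphore {
public:
    // Waits up to `timeout` for a post; returns true if one was consumed.
    bool wait(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_);
        if (!cv_.wait_for(lock, timeout, [this] { return count_ > 0; })) {
            return false;
        }
        --count_;
        return true;
    }
    void inc() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++count_;
        }
        cv_.notify_one();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_ = 0;
};

class WorkerThread {
public:
    // Destroying a worker whose loop is still running is the bug in this issue.
    ~WorkerThread() { assert(!thread_.joinable()); }

    void start() {
        thread_ = std::thread([this] { processLoop(); });
    }

    // Only request shutdown. stopped_ is deliberately NOT set here, because
    // the loop may still be inside semaphore_.wait() at this point.
    void stop() {
        running_ = false;
        semaphore_.inc();  // wake the loop so it notices the request promptly
    }

    bool isStopped() const { return stopped_; }

    void join() {
        if (thread_.joinable()) thread_.join();
    }

private:
    void processLoop() {
        while (running_) {
            semaphore_.wait(std::chrono::milliseconds(50));
        }
        stopped_ = true;  // the worker's last touch of member state
    }

    Semaphore semaphore_;
    std::atomic<bool> running_{true};
    std::atomic<bool> stopped_{false};
    std::thread thread_;
};

// Assumed owner-side logic: destroy the worker only after it reports stopped.
bool stopWorkerSafely() {
    WorkerThread worker;
    worker.start();
    worker.stop();
    for (int i = 0; i < 200 && !worker.isStopped(); ++i) {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    const bool exited = worker.isStopped();
    worker.join();
    return exited;  // worker is destroyed here, after its loop has finished
}
```

Because only processLoop sets stopped_, an owner that checks isStopped() cannot see true while the loop is still using the semaphore. If stop() set the flag itself, the owner could destroy the object while the worker was still blocked in wait().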

I was able to consistently reproduce the problem before, but after commenting out stopped=true in WorkerThread::stop, I can no longer reproduce the problem.


kgibm commented Jan 26, 2021

Created PR #100


kgibm commented Feb 16, 2021

An additional symptom: in jdmpview, the top of the crash stack shows frames like the following (note in particular the WorkerThread symbol):

bp: 0x000000517faff180 pc: 0x000000003465a940 /prd/link/wlp/wlp/E4_BMIS/lib/native/zos/s390x/../../../../java/8.0/lib/s390x/libhealthcenter.so::threadEntry__Q5_6ibmras10monitoring5agent7threadsEI12WorkerThreadFPQ4_6ibmras6common4port10ThreadData+0x20
bp: 0x000000517faff280 pc: 0x000000003460bd50 /prd/link/wlp/wlp/E4_BMIS/lib/native/zos/s390x/../../../../java/8.0/lib/s390x/libhealthcenter.so::wrapper+0x60

mattcolegate pushed a commit that referenced this issue Feb 22, 2021
…readPool destructing a WorkerThread while it's in processLoop