A random failure can occur when issuing the ARCHMODE and NUMCPU commands #542

Closed
wrljet opened this issue Feb 4, 2023 · 2 comments
Labels
BUG The issue describes likely incorrect product functionality that likely needs corrected.


wrljet (Member) commented Feb 4, 2023

A random failure can occur when issuing the ARCHMODE and NUMCPU commands.

While running Hercules over and over from a shell script, trying to track down a different issue, I started noticing the following errors in the log, which later caused the IPL to fail:

HHC02389E CPUs must be offline or stopped
or
HHC02253E All CPU's must be stopped to switch architectures

The following simple Hercules .cnf can be used to reproduce the problem:

ARCHLVL z/Arch
FACILITY DISABLE 050_CONSTR_TRANSACT
FACILITY DISABLE 073_TRANSACT_EXEC
NUMCPU 2
MAXCPU 2

Depending on the host system's CPU architecture, OS, etc., the problem may trigger quickly, perhaps once in ten tries (NetBSD on x86_64 and Sun UltraSPARC), or it may never show up at all (Debian on an ARM-based Raspberry Pi, and macOS on an Apple M1). Windows and Debian on x86_64 fail fairly regularly for me.

After endless fiddling I managed to catch it in the Visual Studio debugger (VS2019 in a Windows 10 VM) and found the two threads involved, which show the "what" of the issue:

Worker Thread	impl_thread	hengine.dll!maxcpu_cmd

>	hengine.dll!maxcpu_cmd(int argc, char * * argv, char * cmdline) Line 3811	C
 	hengine.dll!CallHercCmd(int argc, char * * argv, char * cmdline) Line 362	C
 	hengine.dll!process_config(const char * cfg_name) Line 424	C
 	hengine.dll!build_config(const char * hercules_cnf) Line 118	C
 	hengine.dll!impl(int argc, char * * argv) Line 1340	C
 	hercules.exe!main(int ac, char * * av) Line 305	C


Worker Thread	Processor CP01	hutil.dll!LeaveFT_MUTEX
        
 	hutil.dll!LeaveFT_MUTEX(_tagFT_MUTEX * pFT_MUTEX) Line 292	C
 	hutil.dll!fthread_mutex_unlock(_tagFTU_MUTEX * pFTUSER_MUTEX) Line 1459	C
 	hutil.dll!hthread_release_lock(LOCK * plk, const char * release_loc) Line 545	C
 	hengine.dll!Release_Interrupt_Lock(REGS * regs, const char * location) Line 450	C
>	hengine.dll!z900_run_cpu(int cpu, REGS * oldregs) Line 1996	C
 	hengine.dll!cpu_thread(void * ptr) Line 2355	C
 	hutil.dll!hthread_func(void * arg2) Line 1055	C
 	hutil.dll!FTWin32ThreadFunc(void * pMyArgs) Line 809	C

Some of the relevant code:

cpu.c:1926
    memset(regs, 0, sizeof(REGS));

    if (cpu_init (cpu, regs, NULL))
        return NULL;

...

cpu.c:1991
    RELEASE_INTLOCK(regs);

    /* Establish longjmp destination for program check or
       RETURN_INTCHECK, or SIE_INTERCEPT, or longjmp, etc.
    */
    if (setjmp( regs->progjmp ) && sysblk.ipled)
    {
        ...

---

In cpu_init():

cpu.c:
    if (!hostregs)
    {
        /* regs points to host regs */
        regs->cpustate = CPUSTATE_STOPPING;
        ON_IC_INTERRUPT(regs);
        ...



This bug goes back at least two years in the git commit history.

I reported this bug to Fish privately and worked with him to reproduce it. I have tested his proposed fix, which is forthcoming.

Bill

Fish-Git added a commit that referenced this issue Feb 5, 2023
Fish-Git (Member) commented Feb 5, 2023

Fixed by commit f83880b.

Closing.

Fish-Git closed this as completed Feb 5, 2023
Fish-Git self-assigned this Feb 5, 2023
Fish-Git added the BUG label Feb 5, 2023
wrljet (Member, Author) commented Feb 6, 2023

Fish, excellent work!
