Workload never starts (FahCore returned: INTERRUPTED (102 = 0x66)) #1602

Open
smiba opened this issue Nov 30, 2020 · 3 comments
Labels
1.Type - Defect Reported issue is a defect. 3.Component - GROMACS Core Reported issue relates to FahCore_a7. 4.OS - Debian Reported issue occurs on Debian based OS (Debian, Mint, Ubuntu).

Comments

@smiba

smiba commented Nov 30, 2020


Your Environment

  • F@H Software version: 7.6.21 and 7.4.4
  • Operating System: Debian 9
  • Browser: N/A (FAHControl)

Expected Behavior

The work unit progresses and the core starts.


Current Behavior

The core appears to crash and never starts; the client keeps logging "FahCore returned: INTERRUPTED (102 = 0x66)".
The client doesn't drop the work unit and gets stuck waiting for the core to start, retrying every minute.

09:26:11:WU01:FS00:Starting
09:26:11:WU01:FS00:Removing old file 'work/01/logfile_01-20201130-085702.txt'
09:26:11:WU01:FS00:Running FahCore: /usr/bin/FAHCoreWrapper /var/lib/fahclient/cores/cores.foldingathome.org/lin/64bit-avx-256/a8-0.0.9/Core_a8.fah/FahCore_a8 -dir 01 -suffix 01 -version 706 -lifeline 14344 -checkpoint 15 -np 8
09:26:11:WU01:FS00:Started FahCore on PID 14414
09:26:11:WU01:FS00:Core PID:14418
09:26:11:WU01:FS00:FahCore 0xa8 started
09:26:12:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
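To confirm the retry loop from a saved client log, a minimal Python sketch like the one below could help. It assumes only the line format shown in the excerpt above (`HH:MM:SS:WUxx:FSxx:FahCore returned: STATUS (dec = 0xhex)`); `core_returns` and `interrupted_retries` are hypothetical helpers, not part of FAHClient.

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical pattern, assuming lines look like:
# 09:26:12:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)
RETURN_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}):(WU\d+):(FS\d+):"
    r"FahCore returned: (\w+) \((\d+) = 0x([0-9a-fA-F]+)\)"
)


def core_returns(lines):
    """Yield (time, wu, slot, status, code) for each 'FahCore returned' line."""
    for line in lines:
        m = RETURN_RE.match(line.strip())
        if not m:
            continue  # skip every other log line
        t, wu, slot, status, dec, hexcode = m.groups()
        code = int(dec)
        # The log prints the same code in decimal and hex; they should agree.
        if code != int(hexcode, 16):
            raise ValueError(f"inconsistent return code in: {line!r}")
        yield datetime.strptime(t, "%H:%M:%S"), wu, slot, status, code


def interrupted_retries(lines):
    """Group INTERRUPTED return times by (work unit, slot), in log order."""
    retries = defaultdict(list)
    for t, wu, slot, status, _code in core_returns(lines):
        if status == "INTERRUPTED":
            retries[(wu, slot)].append(t)
    return dict(retries)


if __name__ == "__main__":
    log = [
        "09:26:11:WU01:FS00:FahCore 0xa8 started",
        "09:26:12:WU01:FS00:FahCore returned: INTERRUPTED (102 = 0x66)",
    ]
    for (wu, slot), times in interrupted_retries(log).items():
        print(wu, slot, len(times), "INTERRUPTED return(s)")
```

Feeding it a longer log should show one entry per minute for the stuck work unit. The time stamps carry no date, so gaps that cross midnight would need extra handling.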

This used to work, but that may have been with FahCore_a7 or a different work unit/project.
I have two Linux machines stuck on this project (Project: 16926, PRCG: 16926 (78, 786, 4) and PRCG: 16926 (29, 636, 5)).


Steps To Reproduce

  1. Install FAH; either version 7.4.4 or the latest 7.6.21 will do
  2. Have a CPU folding slot
  3. At some point it will stop working; I'm unsure whether this is due to this specific work unit or to the newer FahCore

Context

Because of this issue the system is now idle, "wasting" CPU cycles. At the moment the system also partially heats my room, so I'm colder :)


@smiba
Author

smiba commented Nov 30, 2020

Possible duplicate of #1570; however, my issue is about CPU folding.

@smiba
Author

smiba commented Nov 30, 2020

Dropped the work unit and received a new one from project 16926 that is working without issues.
Still on FahCore a8; no idea why the old work unit had problems. I think it's likely the issue will come back at some point.

EDIT: As expected, I got another work unit with the same issue: PRCG 16926 (59, 832, 1).

@PantherX
Copy link
Contributor

Hiya @smiba

Not sure what to make of it, since your machine does have 8 CPUs, so in theory the Project should work fine. Since it only happens on a single Project, I will ask around and see what happens 😄

@PantherX PantherX added 1.Type - Defect Reported issue is a defect. 3.Component - GROMACS Core Reported issue relates to FahCore_a7. 4.OS - Debian Reported issue occurs on Debian based OS (Debian, Mint, Ubuntu). labels Dec 28, 2020

2 participants