FATAL: J state unsupported #238

Closed

GabeAl opened this issue Apr 24, 2021 · 7 comments

Comments

GabeAl commented Apr 24, 2021

I've compiled v3.3.2 of the source code with native compiler flags for the Zen 2 architecture, and I've also added -ffast-math.

I get this occasionally while running:
FATAL: J state unsupported
Fatal exception (source file p7_trace.c, line 163):
realloc for size 0 failed
sh: line 1: 3642453 Aborted (core dumped) hmmalign --outformat Pfam /tmpus/b26dfdff-eb39-47f3-8950-3ae0f3af697c tempg_2/out/storage/aai_qa/SGB-03449/PF03710.10.unaligned.faa > tempg_2/out/storage/aai_qa/SGB-03449/PF03710.10.aligned.faa

A number of other "FATAL: J state unsupported" messages are sprinkled throughout the logs as well.

What is a J state, and why is it failing? The code that prints this message is:
src/tracealign.c: case p7T_J: p7_Die("J state unsupported");

Other info:
256-thread CPU system
2 TB RAM

Running in the prokka pipeline on representative genomes from the SGB set (Pasolli et al.).

cryptogenomicon (Member) commented
HMMER depends on IEEE754-compliant floating point arithmetic, and -ffast-math allows the compiler to make unsafe, noncompliant optimizations. Why did you use -ffast-math? What happens if you just compile the code normally, and what is the result of make check with and without your custom compiler options?
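
For concreteness, a minimal sketch of that comparison, run from the top of the HMMER 3.3.2 source tree and assuming the usual autoconf targets (configure, make check, make distclean); -march=znver2 is an assumption standing in for "native Zen 2 flags":

```sh
./configure CFLAGS="-O3 -march=znver2"              # IEEE754-safe build
make -j && make check                               # baseline: test suite should pass

make distclean                                       # start over with the suspect flag added
./configure CFLAGS="-O3 -march=znver2 -ffast-math"  # -ffast-math relaxes IEEE754 guarantees
make -j && make check                               # failures here would point at -ffast-math
```

If the suite only fails in the -ffast-math build, that isolates the flag as the cause.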

GabeAl (Author) commented Apr 27, 2021

Thanks! This is a helpful explanation. I'd used it because it afforded a small 1-2% speedup (I'm trying to squeeze as much performance as I can out of it, since it is a core component of a few QC and annotation pipelines I'm kicking the tires on).

Indeed, once -ffast-math is removed, the error disappears. Interestingly, every other combination of compiler optimization flags I've tried, including code-profiling options and even the Intel compiler, proceeds without this issue. In practice I've rarely come across code that depends so intimately on IEEE754 semantics, but such cases clearly exist; glibc is another prominent example, and its build actually stops in its tracks if -ffast-math is detected. Feel free to close this.

The only other interesting problem I've had is when HMMER uses more than 128 threads on the system, split across instances. I get strange, non-deterministic segfaults even with the standard conda build whenever more than 128 threads are in use in total (across concurrent hmmsearch runs, not within a single instance of hmmsearch). But I can't fathom what threading model could possibly lead to this behavior (a signed char as the thread count? Or a uint8_t that doesn't account for the I/O thread?), and I haven't seen other reports of this with the standard conda build, so I'm hesitant to report it without clear confirmation that it is indeed an issue with HMMER and not some other part of the pipeline/stack.

npcarter (Member) commented Apr 27, 2021 via email

GabeAl (Author) commented Apr 27, 2021

Interesting; roughly what kind of RAM use are we talking about here? My system has 2 TB of RAM and is memory-defragmented before any HPC run (all caches are dropped, and no other processes or ramdisks are loaded beyond the basic Fedora OS). Working disk space is also large, with ~90 TB free, in case of large temporary files. On laptops with 4 GB of RAM running 8 instances (on a 4C/8T CPU) of the prokka and checkm pipelines I use this for, I haven't run into this issue. This new rig has 500x the memory for 32x the number of processes. I should try with 129 vs. 128 processes and see if there is a definite break there.
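
A sketch of that 128-vs-129 test; Pfam-A.hmm and genome_*.faa are hypothetical placeholder inputs. It launches exactly N single-threaded hmmsearch jobs and counts abnormal exits:

```sh
N=129                                  # repeat with N=128 for comparison
for i in $(seq 1 "$N"); do
    hmmsearch --cpu 1 Pfam-A.hmm "genome_${i}.faa" > "out_${i}.txt" &
done

fail=0
for pid in $(jobs -p); do
    wait "$pid" || fail=$((fail + 1))  # a SIGSEGV surfaces as a non-zero exit status (139)
done
echo "N=$N: $fail job(s) exited abnormally"
```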

npcarter (Member) commented Apr 27, 2021 via email

GabeAl (Author) commented Apr 28, 2021

Thanks, this is great context. Yes, I had been running up to 256 separate instances of hmmsearch --cpu 1. (This way it multi-threads quite well indeed!)

But since there is another hidden I/O thread running anyway, and I can stagger other processes to run concurrently, I now limit runs to 128 instances of hmmsearch --cpu 1. Because everything runs asynchronously and is aggregated in the background, there is minimal (or even negative!) performance loss from dropping the hyperthreading.
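
As an illustration, one way to cap the concurrency like this, assuming GNU xargs and placeholder paths (proteins/, results/, and Pfam-A.hmm are hypothetical):

```sh
mkdir -p results
find proteins -name '*.faa' -print0 |
  xargs -0 -P 128 -I{} sh -c \
    'hmmsearch --cpu 1 --tblout "results/$(basename "{}" .faa).tbl" Pfam-A.hmm "{}" > /dev/null'
```

xargs -P keeps at most 128 jobs running and starts the next one as soon as a slot frees up, which matches the asynchronous, staggered pattern described above.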

GabeAl (Author) commented Apr 29, 2021

> there is minimal (or even negative!) performance loss from dropping the hyperthreading

Actually, this led to some head-scratching, so I decided to investigate what was going on with my hmmscan threading, where running fewer instances would yield higher performance... and then I spotted it. I've opened a new discussion: #240.

GabeAl closed this as completed May 2, 2021