
ERROR /home/ec2-user/claire3/output-final/tmp/phase_output/phase_bam/.bam not found #24

Closed
fidibidi opened this issue Jun 8, 2021 · 2 comments


@fidibidi

fidibidi commented Jun 8, 2021

Hello!

I've been enjoying playing around with this software and hadn't had any issues until running it on an ONT Flongle dataset.
I noticed that this issue is very similar to the one right before mine; however, the suggestions there didn't seem to resolve it.

I ran the following command:
./run_clair3.sh --bam_fn=${INPUT_DIR}/A0035.bam --ref_fn=${INPUT_DIR}/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna --threads="8" --sample_name="A0035" --platform="ont" --model_path=`pwd`"/models/ont" --output=${OUTPUT_DIR}

It gets to step 6/7, where it fails with the following error:

```
[INFO] 6/7 Calling variants using Full Alignment
[ERROR] file /home/ec2-user/claire3/output-final/tmp/phase_output/phase_bam/.bam not found
parallel: This job failed:
python3 /home/ec2-user/Clair3/scripts/../clair3.py CallVarBam --chkpnt_fn /home/ec2-user/Clair3/models/ont/full_alignment --bam_fn /home/ec2-user/claire3/output-final/tmp/phase_output/phase_bam/''.bam --call_fn /home/ec2-user/claire3/output-final/tmp/full_alignment_output/full_alignment_''.vcf --sampleName A0035 --vcf_fn EMPTY --ref_fn /home/ec2-user/claire3/input/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna --full_aln_regions '' --ctgName '' --add_indel_length --phasing_info_in_bam --gvcf False --python python3 --pypy pypy3 --samtools samtools --platform ont
[ERROR] file /home/ec2-user/claire3/output-final/tmp/phase_output/phase_bam/.bam not found
parallel: This job failed:
python3 /home/ec2-user/Clair3/scripts/../clair3.py CallVarBam --chkpnt_fn /home/ec2-user/Clair3/models/ont/full_alignment --bam_fn /home/ec2-user/claire3/output-final/tmp/phase_output/phase_bam/''.bam --call_fn /home/ec2-user/claire3/output-final/tmp/full_alignment_output/full_alignment_''.vcf --sampleName A0035 --vcf_fn EMPTY --ref_fn /home/ec2-user/claire3/input/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna --full_aln_regions '' --ctgName '' --add_indel_length --phasing_info_in_bam --gvcf False --python python3 --pypy pypy3 --samtools samtools --platform ont

real 0m0.188s
user 0m0.767s
sys 0m0.175s
cat: /home/ec2-user/claire3/output-final/tmp/full_alignment_output/full_alignment_*.vcf: No such file or directory
[ERROR] No vcf file found, please check the setting
```

I've uploaded the output to Google Drive in case it helps diagnose the issue:
https://drive.google.com/drive/folders/1NHXW76whGZCwRbzvcfBr5n8hYt8uafiO?usp=sharing

@aquaskyline
Member

aquaskyline commented Jun 9, 2021

Some jobs failed because Clair3 requested more processes than the user environment allows (`ulimit -u`). In v0.1-r3, we have added more runtime environment checks and automatic retries.
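To see how close an environment is to this limit, a quick check like the one below can help. It is a minimal sketch, not something shipped with Clair3, and it assumes a Linux system with procps `ps` (standard on Ubuntu and CentOS).

```shell
#!/usr/bin/env bash
# Minimal sketch (not part of Clair3): compare the per-user process limit
# with what the current user is already running. On Linux, threads count
# toward RLIMIT_NPROC, the limit reported by `ulimit -u`.

# Soft limit on user processes: a number or "unlimited".
limit=$(ulimit -u)

# Threads owned by the current user. Assumes procps `ps`:
# -L prints one line per thread, -u selects by (numeric) user ID.
used=$(ps -L -u "$(id -u)" --no-headers | wc -l)

echo "ulimit -u: ${limit}"
echo "threads owned by UID $(id -u): ${used}"
if [ "$limit" != "unlimited" ]; then
  echo "headroom: $(( limit - used ))"
fi
```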

Clair3 uses TensorFlow and PyPy, and these libraries open quite a few threads in each running instance. The THREADS parameter controls how many Clair3 instances run concurrently, but each instance, as we've observed, consumes up to 40-50 processes at peak. The number of processes a user can create is limited; the limit can be checked with `ulimit -u` (it is also listed in the output of `ulimit -a`). On Ubuntu, the limit is usually above 10k (unless it has been lowered), so it is not a problem. But on RedHat or CentOS, which are commonly used in grids and institutions, the limit is usually 1024 or 2048, so setting THREADS above 20 would hit the limit at some point (with a limit of 1024, 20 instances × 50 processes already comes to 1,000). Raising `ulimit -u` can solve the problem, but raising it beyond the hard limit requires root privileges (or a blessing from the system admin team).
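The arithmetic behind the "above 20" figure can be sketched as a small shell function. It assumes a peak of 50 processes per instance (the upper end of the 40-50 estimate); `max_safe_threads` and `PROCS_PER_INSTANCE` are hypothetical names, not Clair3 options.

```shell
#!/usr/bin/env bash
# Sketch: largest THREADS value whose peak process count stays within
# `ulimit -u`. PROCS_PER_INSTANCE is an assumed figure, not a Clair3 setting.
PROCS_PER_INSTANCE=50

# Usage: max_safe_threads <nproc_limit> [procs_per_instance]
max_safe_threads() {
  local limit=$1
  local per=${2:-$PROCS_PER_INSTANCE}
  if [ "$limit" = "unlimited" ]; then
    echo "unlimited"
    return
  fi
  local n=$(( limit / per ))
  # Always allow at least one instance.
  if (( n < 1 )); then n=1; fi
  echo "$n"
}

max_safe_threads 1024            # typical RedHat/CentOS limit
max_safe_threads 2048
max_safe_threads "$(ulimit -u)"  # the current environment
```

With a limit of 1024 this gives 20, and with 2048 it gives 40. In practice it is worth leaving headroom for the shell, `parallel`, and any other processes already running under the same user, so a value somewhat below the computed maximum is safer.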

In v0.1-r3, we check `ulimit -u` and lower THREADS accordingly. We also added automatic retries for failed jobs before reporting the failures to the user.
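The retry idea can be sketched with a small bash function that re-runs a failing command a few times before giving up. This is an illustration only: `retry` and `flaky_job` are hypothetical, and Clair3's own retry logic may be implemented differently.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper, not Clair3's code): re-run a failed job a few
# times before reporting the failure. A transient failure, such as hitting
# `ulimit -u`, may succeed once other jobs have released their processes.

# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    if "$@"; then
      return 0
    fi
    echo "[WARNING] attempt ${attempt}/${max_attempts} failed: $*" >&2
    # Wait before the next attempt (no wait after the last one).
    if (( attempt < max_attempts )); then sleep "$delay"; fi
  done
  echo "[ERROR] giving up after ${max_attempts} attempts: $*" >&2
  return 1
}

# Example: a job that fails twice and then succeeds.
calls=0
flaky_job() {
  calls=$(( calls + 1 ))
  (( calls >= 3 ))
}
retry 5 0 flaky_job && echo "flaky_job succeeded after ${calls} calls"
```

GNU parallel, which appears in the log above, also has a `--retries` option for re-running failed jobs; whether Clair3 uses it is not stated in this thread.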

@fidibidi
Author

fidibidi commented Jun 9, 2021

Thank you! These updates appear to have worked!

I still see the error messages, but the program resolves them as it goes.
