ClinSV script stops at the first step # Create sample info file from bam files ... #23

Closed
jordimaggi opened this issue Mar 15, 2022 · 11 comments

@jordimaggi

Hi,

I am testing ClinSV on an Ubuntu 20.04 VM. I pulled the Docker image and tried to run the following command:

sudo docker run kccg/clinsv -r all -i $PWD/WGS/*.bam -ref $PWD/WGS/Reference_hg19/hg19.fa -p $PWD/test_run

The script seems to start correctly, but stops right away at the first task. This is the console output I get:

##############################################
####                ClinSV                ####
##############################################
# 15/03/2022 08:40:29

# clinsv dir: /app/clinsv
# projectDir: /media/analyst/Data/test_run
# sampleInfoFile: /media/analyst/Data/test_run/sampleInfo.txt 
# name stem: test_run
# lumpyBatchSize: 15
# genome reference: /media/analyst/Data/WGS/Reference_hg19/hg19.fa
# run steps: all
# number input bams: 1

# Create sample info file from bam files ...
ln -s  /media/analyst/Data/test_run/alignments//.bam

Any idea where the problem may lie?

Thanks for your help.

@halessi

halessi commented Mar 15, 2022

This is my exact problem as well; identical output using Singularity.

The cluster reports the job as having finished. PLEASE let's figure this out.

NOTE that if you try to run it again, it will work UNTIL a later step, when it looks for the BAM file that should have been linked into alignments.

I think it's something to do with the formatting of our BAM headers?

##############################################
####                ClinSV                ####
##############################################
# 15/03/2022 09:33:04

# clinsv dir: /opt/clinsv
# projectDir: /data/LAB_FOLDER/project_folder_using_separate_data_input
# sampleInfoFile: /data/LAB_FOLDER/project_folder_using_separate_data_input/sampleInfo.txt 
# name stem: project_folder_using_separate_data_input
# lumpyBatchSize: 15
# genome reference: /data/LAB_FOLDER/clinsv/refdata-b37
# run steps: all
# number input bams: 44

# Create sample info file from bam files ...
ln -s /vf/users/LAB_FOLDER/BAMs/bqsr-cleaned-SAMPLE.bam /data/LAB_FOLDER/project_folder_using_separate_data_input/alignments/SAMPLE/SAMPLE.bam

I went and tried running the ln -s command manually to see if it worked; the file was already linked, so it ran successfully and then just quit. I don't know what is going on.

@halessi

halessi commented Mar 15, 2022

@drmjc Any chance you have any insight on this? I think both of us are trying v1.0 (not GRCh38), but your input would be appreciated.

Thanks!!

@drmjc
Member

drmjc commented Mar 15, 2022 via email

@halessi

halessi commented Mar 15, 2022

Thank you for the reply.

This would make sense: if the BAM headers or something else are formatted differently with hg19, it would follow that ClinSV fails to link the files (assuming that data is needed, or that it ignores improperly formatted input).

So, in order to use hg19 I will need to wait for v1.1, is that correct?

Thanks again!

@drmjc
Member

drmjc commented Mar 15, 2022 via email

@halessi

halessi commented Mar 21, 2022

Update: I was able to fix the linking issue at the start of ClinSV by fixing my .bam.bai files. I had the .bam.bai files soft-linked to the .bai files, which ClinSV didn't like. By creating hard links from the .bai to the .bam.bai files, I was able to resolve this issue.
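
For reference, a minimal sketch of that workaround, assuming a hypothetical sample.bam whose index was written as sample.bai in the same directory (file names here are placeholders, not the ones from this run):

rm sample.bam.bai                  # remove the old soft link that ClinSV didn't like
ln sample.bai sample.bam.bai       # hard-link the existing .bai under the .bam.bai name
ls -li sample.bai sample.bam.bai   # both names should now share the same inode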

@drmjc
Member

drmjc commented Mar 22, 2022

How intriguing, thanks for the update.

@J-Bradlee, please note this; we should test with both index layouts:

  1. test.bam + test.bai
  2. test.bam + test.bam.bai

Both forms of naming the .bai index file are acceptable in practice (even though the SAM spec doesn't define this).
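
As a rough sketch of how the two index layouts could be generated for such a test (assuming a hypothetical test.bam and samtools on the PATH; samtools index accepts an optional output name as a second argument):

samtools index test.bam            # writes the index as test.bam.bai
samtools index test.bam test.bai   # writes the same index under the name test.bai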

@halessi

halessi commented Apr 10, 2022

@drmjc, just a quick question: does annotation often take upwards of four days? For 45 BAMs, my annotation phase has been running for 4.5 days at this point. Not sure if that's expected or not (200 GB RAM, 32 CPUs).

Thank you!

Hugh

@J-Bradlee
Collaborator

Hi @halessi, thought I would jump in here and say that for a single 72 GB BAM file it took at least 24 hours to run through all of ClinSV's steps on a machine with similar specs to yours. It also took around 6 hours to finish all the steps for a single 6 GB BAM file. Roughly what is the total size of all 45 of your BAM files?

@halessi

halessi commented Apr 11, 2022

@J-Bradlee Thanks so much for your reply.

I would guess the total size of all the BAM files is about 650 GB. Maybe this was too large a run? I would estimate the total running time for all steps, at this point, to be in the 10-day range, so perhaps I should have split this up more effectively...

Anyway, it sounds like this amount of time isn't crazy. But I'm a little worried it's going to take something like 20 days at this point...

Can you speak a bit more about the distribution of time? I.e., for your 72 GB BAM run, was the majority of it spent in Lumpy/CNVnator?

Note that I originally gave ClinSV even more resources (64 CPUs and, I think, 400 GB of RAM?), but the job was killed due to a cluster error, and ClinSV didn't seem to be using anywhere near that much, so I cut it back when resuming the job.

Thank you!

Hugh

@J-Bradlee
Collaborator

J-Bradlee commented Apr 11, 2022

No problem @halessi.

Most of the time is spent on the bigwig step, followed by the annotation and then CNVnator steps. Below is my output from a successful run on a subsampled 6 GB BAM file. Hopefully it gives you a rough idea of how long it would take for your BAM files.

Note that this was run with ClinSV v1.0 and the b38 reference genome; however, I think the durations should be similar for v0.9 with the b37 reference genome.

##############################################
####                ClinSV                ####
##############################################
# 28/03/2022 18:25:00

# clinsv dir: /app/clinsv
# projectDir: /app/project_folder
# sampleInfoFile: /app/project_folder/sampleInfo.txt 
# name stem: project_folder
# lumpyBatchSize: 5
# genome reference: /app/ref-data/refdata-b38
# run steps: all
# number input bams: 1

# Create sample info file from bam files ...
ln -s /app/input/NA12878.grch38.subsampled.bam /app/project_folder/alignments/FR05812606/FR05812606.bam
ln -s /app/input/NA12878.grch38.subsampled.bam.bai /app/project_folder/alignments/FR05812606/FR05812606.bam.bai
# Read Sample Info from /app/project_folder/sampleInfo.txt
# use: FR05812606       H7LH3CCXX_6             /app/input/NA12878.grch38.subsampled.bam
# 1 samples to process
# If not, please exit make a copy of sampleInfo.txt, modify it and rerun with -s sampleInfo_mod.txt pointing to the new sample info file. 

###### Generate the commands and scripts ######

# bigwig

# lumpy

# cnvnator

# annotate

# prioritize

# qc

###### Run jobs ######

 ### executing: sh /app/project_folder/alignments/FR05812606/bw/sh/bigwig.createWigs.FR05812606.sh &> /app/project_folder/alignments/FR05812606/bw/sh/bigwig.createWigs.FR05812606.e  ...  

 ### finished after (hh:mm:ss): 01:31:33
 ### exist status: 0

 ### executing: sh /app/project_folder/alignments/FR05812606/bw/sh/bigwig.q0.FR05812606.sh &> /app/project_folder/alignments/FR05812606/bw/sh/bigwig.q0.FR05812606.e  ...  

 ### finished after (hh:mm:ss): 00:37:20
 ### exist status: 0

 ### executing: sh /app/project_folder/alignments/FR05812606/bw/sh/bigwig.q20.FR05812606.sh &> /app/project_folder/alignments/FR05812606/bw/sh/bigwig.q20.FR05812606.e  ...  

 ### finished after (hh:mm:ss): 00:36:05
 ### exist status: 0

 ### executing: sh /app/project_folder/alignments/FR05812606/bw/sh/bigwig.mq.FR05812606.sh &> /app/project_folder/alignments/FR05812606/bw/sh/bigwig.mq.FR05812606.e  ...  

 ### finished after (hh:mm:ss): 00:37:10
 ### exist status: 0

 ### executing: sh /app/project_folder/SVs/FR05812606/lumpy/sh/lumpy.preproc.FR05812606.sh &> /app/project_folder/SVs/FR05812606/lumpy/sh/lumpy.preproc.FR05812606.e  ...  

 ### finished after (hh:mm:ss): 00:12:51
 ### exist status: 0

 ### executing: sh /app/project_folder/SVs/joined/lumpy/sh/lumpy.caller.joined.sh &> /app/project_folder/SVs/joined/lumpy/sh/lumpy.caller.joined.e  ...  

 ### finished after (hh:mm:ss): 00:26:51
 ### exist status: 0

 ### executing: sh /app/project_folder/SVs/joined/lumpy/sh/lumpy.depth.joined.sh &> /app/project_folder/SVs/joined/lumpy/sh/lumpy.depth.joined.e  ...  

 ### finished after (hh:mm:ss): 00:00:54
 ### exist status: 0

 ### executing: sh /app/project_folder/SVs/FR05812606/cnvnator/sh/cnvnator.caller.FR05812606.sh &> /app/project_folder/SVs/FR05812606/cnvnator/sh/cnvnator.caller.FR05812606.e  ...  

 ### finished after (hh:mm:ss): 00:56:31
 ### exist status: 0

 ### executing: sh /app/project_folder/SVs/joined/sh/annotate.main.joined.sh &> /app/project_folder/SVs/joined/sh/annotate.main.joined.e  ...  

 ### finished after (hh:mm:ss): 01:27:03
 ### exist status: 0

 ### executing: sh /app/project_folder/SVs/joined/sh/prioritize.main.joined.sh &> /app/project_folder/SVs/joined/sh/prioritize.main.joined.e  ...  

 ### finished after (hh:mm:ss): 00:00:07
 ### exist status: 0

 ### executing: sh /app/project_folder/SVs/qc/sh/qc.main.joined.sh &> /app/project_folder/SVs/qc/sh/qc.main.joined.e  ...  

 ### finished after (hh:mm:ss): 00:00:48
 ### exist status: 0

# 29/03/2022 00:52:13 Project project_folder project_folder | Total jobs 11 | Remaining jobs 0 | Remaining steps bigwig,lumpy,cnvnator,annotate,prioritize,qc  11 | Total time: 386 min

# 29/03/2022 00:52:13 Project project_folder project_folder | Total jobs 11 | Remaining jobs 0 | Remaining steps   0 | Total time: 386 min

# Everything done! Exit

# writing igv session files...

xml file: /app/project_folder/igv/FR05812606.xml

I also want to add that you may experience even slower times for the CNVnator section, as its job resources are hard-coded to 16 CPUs and 30 GB of memory (see the source code line here), so it is not using all the resources available to it.
