problem with sambamba #232

CapitanFlint · 2019-07-29T18:13:42Z

Hi,

I've installed the new 2.5.0 version. When I ran the .sh script, it told me that sambamba wasn't in my PATH.
So I installed sambamba through bioconda and made a new environment for it.
I ran the .sh script again and got this:

lun jul 29 14:00:59 -04 2019 Running GRIDSS. The full log is in ./gridss.full.20190729_140059.PC-1.1006.log
lun jul 29 14:00:59 -04 2019 Start pre-processing /home/bromero/2019/data/bamfiles/clean_normalized_bam/costa/Ccost2-8_clean.bam
lun jul 29 14:00:59 -04 2019 CollectInsertSizeMetrics /home/bromero/2019/data/bamfiles/clean_normalized_bam/costa/Ccost2-8_clean.bam first 10000000 records
lun jul 29 14:02:50 -04 2019 CollectGridssMetricsAndExtractSVReads|sambamba /home/bromero/2019/data/bamfiles/clean_normalized_bam/costa/Ccost2-8_clean.bam

Then nothing more happened; it just stopped.
Apparently there is something wrong with sambamba, or I'm missing something.
Or maybe the Anaconda version of sambamba is old and there is some incompatibility?

Thank you, D. Cameron, for answering my previous question. The new version is way better than the 2.1.0 that is stored in the Anaconda cloud.

Greetings,

Bruno.

EDIT: I checked the bioconda sambamba version and it is old (0.6.6 vs 0.7.0). The problem is that installing sambamba myself is a hard task, so if there is a simpler solution than compiling sambamba from the source package, please let me know, assuming that sambamba is the problem.

d-cameron · 2019-07-30T07:20:11Z

> jul 29 14:00:59 -04 2019 Running GRIDSS. The full log is in ./gridss.full.20190729_140059.PC-1.1006.log

Did it actually stop? If you've got a large BAM file then each step from that step onward will take hours. I've had feedback from other users about how spammy the full log file is so I opted for the new driver script to only output the very high-level progress. Do you see any progress if you tail -f ./gridss.full.20190729_140059.PC-1.1006.log? Is this new behaviour confusing, or just unexpected because you're used to the spammy output of gridss 2.1.0?
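
The progress message prints the log path relative to the current directory (`./gridss.full.*.log`), so the file is presumably written wherever the script was started. A small sketch for locating the newest such log, assuming a shell that supports the `-nt` file test (bash and dash both do); `latest_gridss_log` is a hypothetical helper name, not part of GRIDSS:

```shell
# Print the path of the most recently modified gridss.full.*.log in a directory.
# Usage: latest_gridss_log [dir]   (dir defaults to the current directory)
latest_gridss_log() {
    dir="${1:-.}"
    newest=""
    for f in "$dir"/gridss.full.*.log; do
        [ -e "$f" ] || continue            # the glob matched nothing
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest="$f"
        fi
    done
    if [ -n "$newest" ]; then
        printf '%s\n' "$newest"
    else
        echo "no gridss.full.*.log found in $dir" >&2
        return 1
    fi
}
```

Running `tail -f "$(latest_gridss_log)"` from the directory where the script was launched should then show the detailed per-step progress.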

d-cameron · 2019-07-30T07:21:53Z

> Then nothing more happened, it just stopped.

Just to be clear: that is expected behaviour for the driver script. The status messages every 1,000,000 reads are now in the full log file. Do I need to add more text to the default output to make this clearer?

CapitanFlint · 2019-08-01T14:52:12Z

Thank you, Cameron, for your very quick response.
Two days ago, I deleted every GRIDSS-related file, including the environment. Then I downloaded the new version again (just the jar-with-dependencies and gridss.sh files), created a new environment for GRIDSS, and moved the files onto the server (important: I do not have sudo) into a single directory.
I modified gridss.sh by adding these variables:

workingdir="/2019/data/bamfiles/prueba_20x/working_data"
reference="/2019/data/gen_ref/Cistanthe_genoma_v3.1a.fasta"
output_vcf="/2019/data/vcf_files/output_gridss/"
assembly="/2019/data/vcf_files/output_gridss/"
threads=$(nproc)
gridss_jar="~/programas/gridss/gridss-2.5.0-gridss-jar-with-dependencies.jar"
jvmheap="28g"
blacklist=""
metricsrecords=10000000
steps="all"
config_file=""
maxcoverage=50000
labels=""

I ran gridss again with this command:

$ bash gridss_20x.sh --reference ~/2019/data/gen_ref/Cistanthe_genoma_v3.1a.fasta --output ~~/2019/data/vcf_files/output_gridss/Cist3-5_gridss.vcf --assembly ~/2019/data/vcf_files/output_gridss/Cist3-5_gridss.assembly.bam --threads 8 --jar ~/programas/gridss/gridss-2.5.0-gridss-jar-with-dependencies.jar ~/2019/data/bamfiles/prueba_20x/Cist3-5_clean.bam

It told me that 'sambamba is not in my PATH'. I assumed that sambamba must be inside the .jar file (with all dependencies), so in my home directory I opened ./.bashrc with nano and added this as the last line:

export PATH="~/programas/gridss/gridss-2.5.0-gridss-jar-with-dependencies.jar:$PATH"

But it doesn't work. I don't know what to do; my knowledge of bioinformatics is very limited.

Given that, I installed sambamba (in the same environment) with:

conda install -c bioconda sambamba

Then I ran it again with the same command line, and it stopped making progress when the sambamba step came up in the output. That was two days ago and it is still in the same step, but every time I check with htop, it is never using any memory or processing any data.
I checked whether there was any progress with tail -f in the working directory, but it told me that nothing was there.
Thank you again for everything, and sorry if this is too verbose. I really want to use GRIDSS because of this: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1720-5/tables/1

Greetings,

Bruno

I ran everything with 8 cores (2.22 GHz) for one sample, and the server has 250 GB of RAM.

d-cameron · 2019-08-02T01:38:16Z

> Given that, i installed sambamba (in the same environment) with:
> conda install -c bioconda sambamba

It appears that there's still something wrong with your environment. What happens when you run sambamba --version on the command line? Maybe your bioconda environment isn't loaded.
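
A minimal sketch of that check, extended to the other tools named in this thread and assuming a POSIX shell. `check_tool` is a hypothetical helper name, and `Rscript` as the name of the R executable is an assumption:

```shell
# Report whether an executable can be resolved on the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 is NOT on PATH"
        return 1
    fi
}

# Tools mentioned in this thread; Rscript as the R executable name is an assumption.
for tool in sambamba bwa Rscript java; do
    check_tool "$tool" || true    # keep going so every missing tool is listed
done
```

If `sambamba` is reported as missing, activating the conda environment it was installed into (`conda activate` followed by that environment's name) in the same shell session is the first thing to try before re-running.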

If your attempts to get sambamba running in your environment fail, then you're still able to run GRIDSS by calling it directly (it'll just run a bit slower).

The following command line should work for you:

java -ea -Xmx31g \
	-Dsamjdk.create_index=true \
	-Dsamjdk.use_async_io_read_samtools=true \
	-Dsamjdk.use_async_io_write_samtools=true \
	-Dsamjdk.use_async_io_write_tribble=true \
	-Dgridss.gridss.output_to_temp_file=true \
	-cp ~/programas/gridss/gridss-2.5.0-gridss-jar-with-dependencies.jar gridss.CallVariants \
	TMP_DIR=~/2019/data/vcf_files/output_gridss/ \
	WORKING_DIR=~/2019/data/vcf_files/output_gridss/ \
	REFERENCE_SEQUENCE=~/2019/data/gen_ref/Cistanthe_genoma_v3.1a.fasta \
	INPUT=~/2019/data/bamfiles/prueba_20x/Cist3-5_clean.bam \
	OUTPUT=~/2019/data/vcf_files/output_gridss/Cist3-5_gridss.vcf \
	ASSEMBLY=~/2019/data/vcf_files/output_gridss/Cist3-5_gridss.assembly.bam \
	WORKER_THREADS=16 \
	2>&1 | tee -a gridss.$HOSTNAME.$$.log

d-cameron · 2019-08-02T01:42:12Z

> I modified the gridss.sh by adding this variables:

With the latest gridss.sh, there's no need to modify the script itself; you can do it all from the command line, for example by adding --jvmheap 28g.
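
As an illustration, the invocation quoted earlier in this thread could carry the heap size as a flag. This sketch only uses options that already appear in the thread (`--reference`, `--output`, `--assembly`, `--threads`, `--jar`, `--jvmheap`); it assumes the `~~` in the quoted `--output` path was a typo for the home directory, and `print_gridss_command` is a hypothetical helper that prints the command for review rather than running it:

```shell
# Print a gridss.sh invocation equivalent to the one quoted earlier in this thread,
# with the heap size passed as --jvmheap instead of being edited into the script.
# $HOME is used instead of ~ because ~ is not expanded inside double quotes.
print_gridss_command() {
    data="$HOME/2019/data"
    jar="$HOME/programas/gridss/gridss-2.5.0-gridss-jar-with-dependencies.jar"
    printf '%s ' bash gridss.sh \
        --reference "$data/gen_ref/Cistanthe_genoma_v3.1a.fasta" \
        --output "$data/vcf_files/output_gridss/Cist3-5_gridss.vcf" \
        --assembly "$data/vcf_files/output_gridss/Cist3-5_gridss.assembly.bam" \
        --threads 8 \
        --jvmheap 28g \
        --jar "$jar" \
        "$data/bamfiles/prueba_20x/Cist3-5_clean.bam"
    printf '\n'
}

# Review the printed line, then paste it into the shell to start the run.
print_gridss_command
```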

d-cameron · 2019-08-02T01:43:30Z

Calling GRIDSS directly still requires R and bwa to be on the PATH, but not sambamba.
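
Being on the PATH means that the directory containing the executable is listed in `PATH`; pointing `PATH` at a file such as the .jar does not make anything runnable. A sketch of adding a directory, assuming a POSIX shell; `add_to_path` and the `$HOME/tools/sambamba` location are hypothetical names used only for illustration:

```shell
# Prepend a directory to PATH if it exists and is not already listed.
# PATH entries must be directories that contain executables, not files such as a .jar.
add_to_path() {
    if [ ! -d "$1" ]; then
        echo "skipping $1: not a directory" >&2
        return 1
    fi
    case ":$PATH:" in
        *":$1:"*) ;;                 # already on PATH, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Hypothetical location of an unpacked sambamba binary; substitute the real directory.
add_to_path "$HOME/tools/sambamba" || true
```

Placing the function and the call in ~/.bashrc would make the change apply to new shell sessions.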

CapitanFlint · 2019-08-06T21:11:18Z

Thank you so much, Cameron. I wish you the best. The command line worked nicely; then I used variantannotation.R and got DELs, DUPs and so on.
I'm closing this.

Bruno

d-cameron added question and removed question labels Jul 30, 2019

CapitanFlint closed this as completed Aug 6, 2019

d-cameron pushed a commit that referenced this issue Aug 7, 2019

3958686: #232 using "sambamba -n" for sorting for compatibility with older versions of sambamba. Outputting version numbers of all software dependencies. Default JVM heap size reduced to 25g.
