Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

problem with tigmint-cut #48

Closed
Dharmendra-G-1 opened this issue Jan 7, 2019 · 5 comments
Closed

problem with tigmint-cut #48

Dharmendra-G-1 opened this issue Jan 7, 2019 · 5 comments
Assignees

Comments

@Dharmendra-G-1
Copy link

Dharmendra-G-1 commented Jan 7, 2019

Hello,
I am running arcs with arch-make arcs-tigmint

I am running the command:
./arcs-master/Examples/arcs-make arcs-tigmint draft=draft reads=reads m=40

which I believe run tigmint-make and arch-make, and this with -n are running as follows:

For ./arcs-master/Examples/arcs-make arcs-tigmint -n draft=draft reads=reads m=40

ln -s draft.fasta draft.fa
./tigmint/bin/tigmint tigmint draft=draft reads=reads minsize=2000 as=0.65 nm=5 dist=50000 mapq=0 trim=0 span=20 window=1000 t=40
touch empty.fof
perl -ne 'chomp; if(/>/){$ct+=1; print ">$ct\n";}else{print "$_\n";} ' < draft.tigmint.fa > draft.tigmint.renamed.fa
bwa index draft.tigmint.renamed.fa
sh -c 'bwa mem -t40 -C -p draft.tigmint.renamed.fa reads.fq.gz | /mid_large_2t/important/samtools-1.6/samtools view -Sb - | /mid_large_2t/important/samtools-1.6/samtools sort -@40 -n - -o draft.tigmint.sorted.bam'
echo draft.tigmint.sorted.bam > draft.tigmint_bamfiles.fof
./arcs-master/Arcs/arcs --bx -v -f draft.tigmint.renamed.fa -a draft.tigmint_bamfiles.fof -c 5 -m 40 -s 98 -r 0.05 -e 30000 -z 500 -d 0 --gap 100 -b draft.tigmint_c5_m40_s98_r0.05_e30000_z500
python /mid_large_2t/running_arcs/arcs-master/Examples/makeTSVfile.py draft.tigmint_c5_m40_s98_r0.05_e30000_z500_original.gv draft.tigmint_c5_m40_s98_r0.05_e30000_z500.tigpair_checkpoint.tsv draft.tigmint.renamed.fa
ln -s draft.tigmint_c5_m40_s98_r0.05_e30000_z500.tigpair_checkpoint.tsv draft.tigmint_c5_m40_s98_r0.05_e30000_z500_l5_a0.3.tigpair_checkpoint.tsv
/opt/links_v1.8.6/LINKS -f draft.tigmint.renamed.fa -s empty.fof -b draft.tigmint_c5_m40_s98_r0.05_e30000_z500_l5_a0.3 -l 5 -a 0.3 -z 500
rm draft.tigmint_c5_m40_s98_r0.05_e30000_z500_l5_a0.3.tigpair_checkpoint.tsv

./tigmint/bin/tigmint tigmint -n draft=draft reads=reads minsize=2000 as=0.65 nm=5 dist=50000 mapq=0 trim=0 span=20 window=1000 t=40

bwa index draft.fa
bwa mem -t40 -pC draft.fa reads.fq.gz | /mid_large_2t/important/samtools-1.6/samtools view -u -F4 | /mid_large_2t/important/samtools-1.6/samtools sort -@40 -tBX -m 2G -T tmp/ -o draft.reads.sortbx.bam
tigmint/bin/tigmint-molecule -a0.65 -n5 -q0 -d50000 -s2000 draft.reads.sortbx.bam | sort -k1,1 -k2,2n -k3,3n >draft.reads.as0.65.nm5.molecule.size2000.bed
/mid_large_2t/important/samtools-1.6/samtools faidx draft.fa
tigmint/bin/tigmint-cut -p40 -w1000 -n20 -t0 -o draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa draft.fa draft.reads.as0.65.nm5.molecule.size2000.bed
ln -sf draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa draft.tigmint.fa

The arch with tigmint only run until tigmint-molecule and then in the tigmint-cut step it only produces
.fa and .bed but .fa is blank and fa.bed is as follows:

-rw-rw-r-- 1 ubuntu ubuntu 697296771 Jan 7 03:20 draft.reads.as0.65.nm5.molecule.size2000.bed
-rw-rw-r-- 1 ubuntu ubuntu 0 Jan 7 04:00 draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa
-rw-rw-r-- 1 ubuntu ubuntu 342869 Jan 7 04:00 draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa.bed

DETAIL OF fa.bed file out of tigmint-cut
head draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa.bed
0 0 15517657 0-1
0 15517657 15517670 0-2
0 15517670 17147778 0-3
0 17147778 17147785 0-4
0 17147785 23376074 0-5
0 23376074 23376100 0-6
0 23376100 29696924 0-7
0 29696924 29697326 0-8
0 29697326 29697665 0-9
0 29697665 29697943 0-10

tail draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa.bed
204375 0 33106 204375
204415 0 57665 204415
204459 0 25333 204459
204499 0 64113 204499
204541 0 58322 204541
204581 0 3400 204581
204621 0 53185 204621
204661 0 20990 204661
204701 0 60883 204701
204741 0 113995 204741

If you don't mind can you please let me know what I may be doing wrong that I am unable to get .fa file after tigmint-cut but only getting .fa.bed file. Is anything I am missing in this command and as this draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa is input for arcs-tigmint all downstream are empty.

-rw-rw-r-- 1 ubuntu ubuntu 0 Jan 5 19:17 draft.tigmint.renamed.fa
-rw-rw-r-- 1 ubuntu ubuntu 0 Jan 5 19:17 draft.tigmint.renamed.fa.pac
-rw-rw-r-- 1 ubuntu ubuntu 0 Jan 5 19:17 empty.fof

Thanks,

Dharm

@lcoombe
Copy link
Member

lcoombe commented Jan 7, 2019

Hi @Dharmendra-G-1 -
Could you please enclose copy-and-paste blocks in triple back ticks? Using that formatting will make your pasted commands much easier to read.
(https://help.github.com/articles/basic-writing-and-formatting-syntax/#quoting-code)

Also, can you include the standard output from your the tigmint command? It looks like something went wrong in the tigmint-cut command.

@lcoombe lcoombe self-assigned this Jan 7, 2019
@Dharmendra-G-1
Copy link
Author

Hello Lauren,
Thanks for your response, Sorry for confusion with format: As per your request here are details:

I am using
draft.fasta and rename linked reads as reads.fa.gz

The main error I am getting is from tigmint-cut:


ubuntu@ip-10-68-102-172:/mid_large_2t/running_arcs$ python3 tigmint/bin/tigmint-cut -p 40 -w 1000 -n 20 -t 0 draft.fa draft.reads.as0.65.nm5.molecule.size2000.bed --fastaout draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa
Started at: 2019-01-06 18:04:53.506702
Reading contig lengths...
Finding breakpoints...
Cutting assembly at breakpoints...

Tool:    bedtools getfasta (aka fastaFromBed)
Version: v2.25.0
Summary: Extract DNA sequences into a fasta file based on feature coordinates.

Usage:   bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf> -fo <fasta> 

Options: 
	-fi	Input FASTA file
	-bed	BED/GFF/VCF file of ranges to extract from -fi
	-fo	Output file (can be FASTA or TAB-delimited)
	-name	Use the name field for the FASTA header
	-split	given BED12 fmt., extract and concatenate the sequencesfrom the BED "blocks" (e.g., exons)
	-tab	Write output in TAB delimited format.
		- Default is FASTA format.

	-s	Force strandedness. If the feature occupies the antisense,
		strand, the sequence will be reverse complemented.
		- By default, strand information is ignored.

	-fullHeader	Use full fasta header.
		- By default, only the word before the first space or tab is used.

DONE!
Ended at: 2019-01-06 18:16:46.167144

The initial command I ran are as follows:

./arcs-master/Examples/arcs-make arcs-tigmint draft=draft reads=reads m=40

which ran this 
./arcs-master/Examples/arcs-make arcs-tigmint -n draft=draft reads=reads m=40
ln -s draft.fasta draft.fa
./tigmint/bin/tigmint tigmint draft=draft reads=reads minsize=2000 as=0.65 nm=5 dist=50000 mapq=0 trim=0 span=20 window=1000 t=40 
touch empty.fof
perl -ne 'chomp; if(/>/){$ct+=1; print ">$ct\n";}else{print "$_\n";} ' < draft.tigmint.fa > draft.tigmint.renamed.fa 
bwa index draft.tigmint.renamed.fa 
sh -c 'bwa mem -t40 -C -p draft.tigmint.renamed.fa reads.fq.gz | /mid_large_2t/important/samtools-1.6/samtools view -Sb - | /mid_large_2t/important/samtools-1.6/samtools sort -@40 -n - -o draft.tigmint.sorted.bam' 
echo draft.tigmint.sorted.bam > draft.tigmint_bamfiles.fof
./arcs-master/Arcs/arcs --bx -v -f draft.tigmint.renamed.fa -a draft.tigmint_bamfiles.fof -c 5 -m 40 -s 98 -r 0.05 -e 30000 -z 500 -d 0 --gap 100 -b draft.tigmint_c5_m40_s98_r0.05_e30000_z500
python /mid_large_2t/running_arcs/arcs-master/Examples/makeTSVfile.py draft.tigmint_c5_m40_s98_r0.05_e30000_z500_original.gv draft.tigmint_c5_m40_s98_r0.05_e30000_z500.tigpair_checkpoint.tsv draft.tigmint.renamed.fa
ln -s draft.tigmint_c5_m40_s98_r0.05_e30000_z500.tigpair_checkpoint.tsv draft.tigmint_c5_m40_s98_r0.05_e30000_z500_l5_a0.3.tigpair_checkpoint.tsv
/opt/links_v1.8.6/LINKS -f draft.tigmint.renamed.fa -s empty.fof -b draft.tigmint_c5_m40_s98_r0.05_e30000_z500_l5_a0.3 -l 5 -a 0.3 -z 500 
rm draft.tigmint_c5_m40_s98_r0.05_e30000_z500_l5_a0.3.tigpair_checkpoint.tsv
I believe the Tigmint tigmint-make will run as follows from the starting part of arcs-tigment
./tigmint tigmint draft=draft reads=reads minsize=2000 as=0.65 nm=5 dist=50000 mapq=0 trim=0 span=20 window=1000 t=40
./tigmint/bin/tigmint tigmint -n draft=draft reads=reads minsize=2000 as=0.65 nm=5 dist=50000 mapq=0 trim=0 span=20 window=1000 t=40
bwa index draft.fa
bwa mem -t40 -pC draft.fa reads.fq.gz | /mid_large_2t/important/samtools-1.6/samtools view -u -F4 | /mid_large_2t/important/samtools-1.6/samtools sort -@40 -tBX -m 2G -T tmp/ -o draft.reads.sortbx.bam
tigmint/bin/tigmint-molecule -a0.65 -n5 -q0 -d50000 -s2000 draft.reads.sortbx.bam | sort -k1,1 -k2,2n -k3,3n >draft.reads.as0.65.nm5.molecule.size2000.bed
/mid_large_2t/important/samtools-1.6/samtools faidx draft.fa
tigmint/bin/tigmint-cut -p40 -w1000 -n20 -t0 -o draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa draft.fa draft.reads.as0.65.nm5.molecule.size2000.bed
ln -sf draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa draft.tigmint.fa

The tail end of error message I got is as follows:

[M::mem_pestat] mean and std.dev: (242.76, 127.77)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 849)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 2619274 reads in 1927.504 CPU sec, 50.692 real sec
[main] Version: 0.7.15-r1140
[main] CMD: bwa mem -t40 -pC draft.fa reads.fq.gz
[main] Real time: 52106.558 sec; CPU: 1897517.768 sec
[bam_sort_core] merging from 440 files and 40 in-memory blocks...
tigmint/bin/tigmint-molecule -a0.65 -n5 -q0 -d50000 -s2000 draft.reads.sortbx.bam | sort -k1,1 -k2,2n -k3,3n >draft.reads.as0.65.nm5.molecule.size2000.bed
/mid_large_2t/important/samtools-1.6/samtools faidx draft.fa
tigmint/bin/tigmint-cut -p40 -w1000 -n20 -t0 -o draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa draft.fa draft.reads.as0.65.nm5.molecule.size2000.bed
Started at: 2019-01-05 19:06:03.814077
Reading contig lengths...
Finding breakpoints...

Tool:    bedtools getfasta (aka fastaFromBed)
Version: v2.25.0
Summary: Extract DNA sequences into a fasta file based on feature coordinates.

Usage:   bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf> -fo <fasta> 

Options: 
	-fi	Input FASTA file
	-bed	BED/GFF/VCF file of ranges to extract from -fi
	-fo	Output file (can be FASTA or TAB-delimited)
	-name	Use the name field for the FASTA header
	-split	given BED12 fmt., extract and concatenate the sequencesfrom the BED "blocks" (e.g., exons)
	-tab	Write output in TAB delimited format.
		- Default is FASTA format.

	-s	Force strandedness. If the feature occupies the antisense,
		strand, the sequence will be reverse complemented.
		- By default, strand information is ignored.

	-fullHeader	Use full fasta header.
		- By default, only the word before the first space or tab is used.

Cutting assembly at breakpoints...
DONE!
Ended at: 2019-01-05 19:17:49.508013
ln -sf draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa draft.tigmint.fa
make[1]: Leaving directory '/mid_large_2t/running_arcs'
touch empty.fof
perl -ne 'chomp; if(/>/){$ct+=1; print ">$ct\n";}else{print "$_\n";} ' < draft.tigmint.fa > draft.tigmint.renamed.fa 
bwa index draft.tigmint.renamed.fa 
[bwa_index] Pack FASTA... [bns_fasta2bntseq] Failed to allocate 0 bytes at bntseq.c line 303: Success
arcs-master/Examples/arcs-make:157: recipe for target 'draft.tigmint.renamed.fa.bwt' failed
make: *** [draft.tigmint.renamed.fa.bwt] Error 1

the output file I am getting are as follows:

wxrwxrwx  1 ubuntu ubuntu           11 Jan  4 19:03 draft.fa -> draft.fasta
-rw-rw-r--  1 ubuntu ubuntu       406757 Jan  4 19:40 draft.fa.amb
-rw-rw-r--  1 ubuntu ubuntu      1411478 Jan  4 19:40 draft.fa.ann
-rw-rw-r--  1 ubuntu ubuntu   2494337664 Jan  4 19:40 draft.fa.bwt
-rw-rw-r--  1 ubuntu ubuntu       440254 Jan  5 19:06 draft.fa.fai
-rw-rw-r--  1 ubuntu ubuntu    623584396 Jan  4 19:40 draft.fa.pac
-rw-rw-r--  1 ubuntu ubuntu   1247168840 Jan  4 19:51 draft.fa.sa
-rw-rw-r--  1 ubuntu ubuntu   2526643067 Jan  3 04:49 draft.fasta
-rw-rw-r--  1 ubuntu ubuntu    697296771 Jan  7 03:20 draft.reads.as0.65.nm5.molecule.size2000.bed
-rw-rw-r--  1 ubuntu ubuntu            0 Jan  7 04:00 draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa
-rw-rw-r--  1 ubuntu ubuntu       342869 Jan  7 04:00 draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa.bed
-rw-rw-r--  1 ubuntu ubuntu 261106244339 Jan  5 12:02 draft.reads.sortbx.bam
lrwxrwxrwx  1 ubuntu ubuntu           77 Jan  5 19:17 draft.tigmint.fa -> draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa
-rw-rw-r--  1 ubuntu ubuntu            0 Jan  5 19:17 draft.tigmint.renamed.fa
-rw-rw-r--  1 ubuntu ubuntu            0 Jan  5 19:17 draft.tigmint.renamed.fa.pac
-rw-rw-r--  1 ubuntu ubuntu            0 Jan  5 19:17 empty.fof

So after the tigmint-molecule I get the bed file " draft.reads.as0.65.nm5.molecule.size2000.bed" which has 16813449 lines but when it ran the tigmint-cut we get .bed file but no .fa file. draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa file get created but completely empty, so no problem with permission as such. The bed file output are as follows :


head draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa.bed
0	0	15517657	0-1
0	15517657	15517670	0-2
0	15517670	17147778	0-3
0	17147778	17147785	0-4
0	17147785	23376074	0-5
0	23376074	23376100	0-6
0	23376100	29696924	0-7
0	29696924	29697326	0-8
0	29697326	29697665	0-9
0	29697665	29697943	0-10

tail draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa.bed
204375	0	33106	204375
204415	0	57665	204415
204459	0	25333	204459
204499	0	64113	204499
204541	0	58322	204541
204581	0	3400	204581
204621	0	53185	204621
204661	0	20990	204661
204701	0	60883	204701
204741	0	113995	204741

Thanks

@lcoombe
Copy link
Member

lcoombe commented Jan 7, 2019

Thanks @Dharmendra-G-1 -
Please try using a newer version of bedtools. The issue is that the more recent versions (I am using 2.27.1) output the fasta to standard out by default, whereas the older version you are using requires an output fasta file specified by -fo. You can see that the help page is being printed in standard out, indicating that the bedtools getfasta command isn't running properly as a result.

Take a look at the different help pages:

Tool:    bedtools getfasta (aka fastaFromBed)
Version: v2.25.0
Summary: Extract DNA sequences into a fasta file based on feature coordinates.

Usage:   bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf> -fo <fasta> 

Options: 
	-fi	Input FASTA file
	-bed	BED/GFF/VCF file of ranges to extract from -fi
	-fo	Output file (can be FASTA or TAB-delimited)
	-name	Use the name field for the FASTA header
	-split	given BED12 fmt., extract and concatenate the sequencesfrom the BED "blocks" (e.g., exons)
	-tab	Write output in TAB delimited format.
		- Default is FASTA format.

	-s	Force strandedness. If the feature occupies the antisense,
		strand, the sequence will be reverse complemented.
		- By default, strand information is ignored.

	-fullHeader	Use full fasta header.
		- By default, only the word before the first space or tab is used.

Tool:    bedtools getfasta (aka fastaFromBed)
Version: v2.27.1
Summary: Extract DNA sequences from a fasta file based on feature coordinates.

Usage:   bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf>

Options: 
	-fi	Input FASTA file
	-fo	Output file (opt., default is STDOUT
	-bed	BED/GFF/VCF file of ranges to extract from -fi
	-name	Use the name field for the FASTA header
	-name+	Use the name field and coordinates for the FASTA header
	-split	given BED12 fmt., extract and concatenate the sequences
		from the BED "blocks" (e.g., exons)
	-tab	Write output in TAB delimited format.
		- Default is FASTA format.

	-s	Force strandedness. If the feature occupies the antisense,
		strand, the sequence will be reverse complemented.
		- By default, strand information is ignored.

	-fullHeader	Use full fasta header.
		- By default, only the word before the first space or tab 
		is used.

Hope that helps!
Lauren

@Dharmendra-G-1
Copy link
Author

Hello Lauren,
Thanks for pointing to the bedtools Version: v2.27.1, after updating from Version: v2.25.0 to this newer version (v2.27.1) now I have the .fa and .bed output after running tigmint-cut.
Thanks for your help !!

@lcoombe
Copy link
Member

lcoombe commented Jan 7, 2019

Excellent - I'm glad you got it working!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants