This repository has been archived by the owner on Jul 19, 2021. It is now read-only.

bbmapskimmer.sh returns inconsistent results #16

Open
dgg32 opened this issue Sep 5, 2018 · 0 comments

dgg32 commented Sep 5, 2018

Hi developers. I tried to use bbmapskimmer.sh to map some primer sequences onto my PacBio reads. It seems that the same read gets different results depending on whether it is the only sequence in the reference FASTA or one of several sequences in a multi-FASTA.

The command I ran:

bbmapskimmer.sh in=primer.fasta out=samout.sam ref=$STR idfilter=0.1 k=8 noheader=t threads=4 ambiguous=all nodisk

primer.fasta:

>ssu_1
AGAGTTTGATCATGGCTCAG
>ssu_2
AGAGTTTGATCCTGGCTCAG
>lsu_1
GGGTTCCCCCATTCGG
>lsu_2
GGGTTCCCCCATTCAG
>lsu_3
GGGTTTCCCCATTCGG
>lsu_4
GGGTTTCCCCATTCAG
>lsu_5
GGGTTGCCCCATTCGG
>lsu_6
GGGTTGCCCCATTCAG

The reference sequence in question:

>problematic_seq
AGAGTTGATCCTGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGGATGAAGGGAGCTTGCTCCTGGATTCAGCGGCGGACGGGTGAGTAATGCCTAGGAATCTGCCTGGTAGTGGGGGATAACGTCCGGAAACGGGCGCTAATACCGCATACGTCCTGAGGGAGAAAGTGGGGGATCTTCGGACCTCACGCTATCAGATGAGCCTAGGTCGGATTAGCTAGTTGGTGGGGTAAAGGCCTACCAAGGCGACGATCCGTAACTGGTCTGAGAGGATGATCAGTCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCGTGTGTGAAGAAGGTCTTCGGATTGTAAAGCACTTTAAGTTGGGAGGAAGGGCAGTAAGTTAATACCTTGCTGTTTTGACGTTACCAACAGAATAAGCACCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGTGGTTCAGCAAGTTGGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCCAAAACTACTGAGCTAGAGTACGGTAGAGGGTGGTGGAATTTCCTGTGTAGCGGTGAAATGCGTAGATATAGGAAGGAACACCAGTGGCGAAGGCGACCACCTGGACTGATACTGACACTGAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCGACTAGCCGTTGGGATCCTTGAGATCTTAGTGGCGCAGCTAACGCGATAAGTCGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCTGGCCTTGACATGCTGAGAACTTTCCAGAGATGGATTGGTGCCTTCGGGAACTCAGACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGTAACGAGCGCAACCCTTGTCCTTAGTTACCAGCACCTCGGGTGGGCACTCTAAGGAGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGGCCAGGGCTACACACGTGCTACAATGGTCGGTACAAAGGGTTGCCAAGCCGCGAGGTGGAGCTAATCCCATAAAACCGATCGTAGTCCGGATCGCAGTCTGCAACTCGACTGCGTGAAGTCGGAATCGCTAGTAATCGTGAATCAGAATGTCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTGCTCCAGAAGTAGCTAGTCTAACCGCAAGGGGGACGGTTACCACGGAGTGATTCATGACTGGGGTGAAGTCGTAACAAGGTAGCCGTAGGGGAACCTGCGGCTGGATCACCTCCTTAATCGAAGATCTCAGCTTCTTCATAAGCTCCCACACGAATTGCTTGATTCACTGGTTAGACGATTGGGTCTGTAGCTCAGTTGGTTAGAGCGCACCCCTGATAAGGGTGAGGTCGGCAGTTCGAATCTGCCCAGACCCACCAATTGTTGGTGTGCTGCGTGATCCGATACGGGGCCATAGCTCAGCTGGGAGAGCGCCTGCTTTGCACGCAGGAGGTCAGGAGTTCGATCCTCCTTGGCTCCACCATCTAAAACAATCGTCGAAAGCTCAGAAATGAATGTTCGTAGATGAACATTGATTTCTGGTCTTTGCACCAGAACTGTTCTTTAAAAATTCGGGTATGTGATAGAAGTAAGACTGAATGATCTCTTTCACTGGTGATCATTCAAGTCAAGGTAAAATTTGCGAGTTCAAGCGCGAATTTTCGGCGAATGTCGTCTTCACAGTATAACCAGATTGCTTGGGGTTATATG
GTCAAGTGAAGAAGCGCATACGGTGGATGCCTTGGCAGTCAGAGGCGATGAAAGACGTGGTAGCCTGCGAAAAGCTTCGGGGAGTCGGCAAACAGACTTTGATCCGGAGATCTCTGAATGGGGAACCC

When I run the command with this sequence alone as the reference, I get:

ssu_1	4	*	0	0	*	*	0	0	AGAGTTTGATCATGGCTCAG	*
ssu_2	0	problematic_seq	1	28	4=1I15=	*	0	0	AGAGTTTGATCCTGGCTCAG	*	NM:i:1	AM:i:28	NH:i:1
lsu_1	16	problematic_seq	2114	13	1=1X4=1I9=	*	0	0	CCGAATGGGGGAACCC	*	NM:i:2	AM:i:13	NH:i:1
lsu_2	16	problematic_seq	2114	26	6=1I9=	*	0	0	CTGAATGGGGGAACCC	*	NM:i:1	AM:i:26	NH:i:1
lsu_3	16	problematic_seq	2114	13	1=1X8=1I5=	*	0	0	CCGAATGGGGAAACCC	*	NM:i:2	AM:i:13	NH:i:1
lsu_4	16	problematic_seq	2114	25	10=1I5=	*	0	0	CTGAATGGGGAAACCC	*	NM:i:1	AM:i:25	NH:i:1
lsu_5	4	*	0	0	*	*	0	0	GGGTTGCCCCATTCGG	*
lsu_6	16	problematic_seq	2114	25	10=1I5=	*	0	0	CTGAATGGGGCAACCC	*	NM:i:1	AM:i:25	NH:i:1

Notice that "ssu_2" has flag 0 (mapped, forward strand), and I am certain that, apart from a single inserted "T" in the primer (CIGAR 4=1I15=), ssu_2 matches the start of "problematic_seq".
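The flag and CIGAR values in the ssu_2 record can be checked with a short Python sketch. It is only an illustration: the function names are made up for this example, the flag table covers just the three SAM bits that occur in this issue (0x4, 0x10, 0x100), and the CIGAR handling is limited to the =, X, M and I operations. It drops the inserted primer base and compares what is left with the first 19 bases of "problematic_seq".

```python
import re

# SAM flag bits that occur in the output of this issue (per the SAM spec).
FLAG_BITS = {0x4: "unmapped", 0x10: "reverse strand", 0x100: "secondary alignment"}


def decode_flag(flag):
    """Return the names of the set bits; flag 0 means mapped, forward, primary."""
    names = [name for bit, name in FLAG_BITS.items() if flag & bit]
    return names or ["mapped, forward strand, primary"]


def parse_cigar(cigar):
    """Split an extended CIGAR such as '4=1I15=' into (length, op) pairs."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def reference_side(query, cigar):
    """Drop inserted query bases so the rest can be compared with the reference.

    Simplification: only =, X, M and I are supported.
    """
    kept, pos = [], 0
    for length, op in parse_cigar(cigar):
        if op in "=XM":
            kept.append(query[pos:pos + length])
        elif op != "I":
            raise ValueError(f"unsupported CIGAR operation: {op}")
        pos += length  # =, X, M and I all consume query bases
    return "".join(kept)


ssu_2 = "AGAGTTTGATCCTGGCTCAG"
ref_prefix = "AGAGTTGATCCTGGCTCAG"  # first 19 bases of problematic_seq

print(decode_flag(0))
print(decode_flag(272))
print(reference_side(ssu_2, "4=1I15=") == ref_prefix)  # True
```

The last line prints True, which agrees with the claim that ssu_2 differs from the start of "problematic_seq" by one extra "T".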

However, if I add two other sequences to the reference file:

>m54122_180320_131917/45548132/ccs,2130
AGAGTTTGATCCTGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGGATGAAGGGAGCTTGCTCCTGGATTCAGCGGCGGACGGGTGAGTAATGCCTAGGAATCTGCCTGGTAGTGGGGGATAACGTCCGGAAACGGGCGCTAATACCGCATACGTCCTGAGGGAGAAAGTGGGGGATCTTCGGACCTCACGCTATCAGATGAGCCTAGGTCGGATTAGCTAGTTGGTGGGGTAAAGGCCTACCAAGGCGACGATCCGTAACTGGTCTGAGAGGATGATCAGTCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCGTGTGTGAAGAAGGTCTTCGGATTGTAAAGCACTTTAAGTTGGGAGGAAGGGCAGTAAGTTAATACCTTGCTGTTTTGACGTTACCAACAGAATAAGCACCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGTGGTTCAGCAAGTTGGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCCAAAACTACTGAGCTAGAGTACGGTAGAGGGTGGTGGAATTTCCTGTGTAGCGGTGAAATGCGTAGATATAGGAAGGAACACCAGTGGCGAAGGCGACCACCTGGACTGATACTGACACTGAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCGACTAGCCGTTGGGATCCTTGAGATCTTAGTGGCGCAGCTAACGCGATAAGTCGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCTGGCCTTGACATGCTGAGAACTTTCCAGAGATGGATTGGTGCCTTCGGGAACTCAGACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGTAACGAGCGCAACCCTTGTCCTTAGTTACCAGCACCTCGGGTGGGCACTCTAAGGAGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGGCCAGGGCTACACACGTGCTACAATGGTCGGTACAAAGGGTTGCCAAGCCGCGAGGTGGAGCTAATCCCATAAAACCGATCGTAGTCCGGATCGCAGTCTGCAACTCGACTGCGTGAAGTCGGAATCGCTAGTAATCGTGAATCAGAATGTCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTGCTCCAGAAGTAGCTAGTCTAACCGCAAGGGGGACGGTTACCACGGAGTGATTCATGACTGGGGTGAAGTCGTAACAAGGTAGCCGTAGGGGAACCTGCGGCTGGATCACCTCCTTAATCGAAGATCTCAGCTTCTTCATAAGCTCCCACACGAATTGCTTGATTCACTGGTTAGACGATTGGGTCTGTAGCTCAGTTGGTTAGAGCGCACCCCTGATAAGGGTGAGGTCGGCAGTTCGAATCTGCCCAGACCCACCAATTGTTGGTGTGCTGCGTGATCCGATACGGGGCCATAGCTCAGCTGGGAGAGCGCCTGCTTTGCACGCAGGAGGTCAGGAGTTCGATCCTCCTTGGCTCCACCATCTAAAACAATCGTCGAAAGCTCAGAAATGAATGTTCGTAGATGAACATTGATTTCTGGTCTTTGCACCAGAACTGTTCTTTAAAATTCGGGTATGTGATAGAAGTAAGACTGAATGATCTCTTTCACTGGTGATCATTCAAGTCAAGGTAAAATTTGCGAGTTCAAGCGCGAATTTTCGGCGAATGTCGTCTTCACAGTATAACCAGATTGCTTGGGGTTATATG
GTCAAGTGAAGAAGCGCATACGGTGGATGCCTTGGCAGTCAGAGGCGATGAAAGACGTGGTAGCCTGCGAAAAGCTTCGGGGAGTCGGCAAACAGACTTTGATCCGGAGATCTCCTGAATGGGGCAACCC
>m54122_180320_131917/44957937/ccs,2130
AGAGTTTGATCCTGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGGATGAAGGGAGCTTGCTCCTGGATTCAGCGGCGGACGGGTGAGTAATGCCTAGGAATCTGCCTGGTAGTGGGGGATAACGTCCGGAAACGGGCGCTAATACCGCATACGTCCTGAGGGAGAAAGTGGGGGATCTTCGGACCTCACGCTATCAGATGAGCCTAGGTCGGATTAGCTAGTTGGTGGGGTAAAGGCCTACCAAGGCGACGATCCGTAACTGGTCTGAGAGGATGATCAGTCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCGTGTGTGAAGAAGGTCTTCGGATTGTAAAGCACTTTAAGTTGGGAGGAAGGGCAGTAAGTTAATACCTTGCTGTTTTGACGTTACCAACAGAATAAGCACCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGTGGTTCAGCAAGTTGGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCCAAAACTACTGAGCTAGAGTACGGTAGAGGGTGGTGGAATTTCCTGTGTAGCGGTGAAATGCGTAGATATAGGAAGGAACACCAGTGGCGAAGGCGACCACCTGGACTGATACTGACACTGAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCGACTAGCCGTTGGGATCCTTGAGATCTTAGTGGCGCAGCTAACGCGATAAGTCGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCTGGCCTTGACATGCTGAGAACTTTCCAGAGATGGATTGGTGCCTTCGGGAACTCAGACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGTAACGAGCGCAACCCTTGTCCTTAGTTACCAGCACCTCGGGTGGGCACTCTAAGGAGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGGCCAGGGCTACACACGTGCTACAATGGTCGGTACAAAGGGTTGCCAAGCCGCGAGGTGGAGCTAATCCCATAAAACCGATCGTAGTCCGGATCGCAGTCTGCAACTCGACTGCGTGAAGTCGGAATCGCTAGTAATCGTGAATCAGAATGTCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTGCTCCAGAAGTAGCTAGTCTAACCGCAAGGGGGACGGTTACCACGGAGTGATTCATGACTGGGGTGAAGTCGTAACAAGGTAGCCGTAGGGGAACCTGCGGCTGGATCACCTCCTTAATCGAAGATCTCAGCTTCTTCATAAGCTCCCACACGAATTGCTTGATTCACTGGTTAGACGATTGGGTCTGTAGCTCAGTTGGTTAGAGCGCACCCCTGATAAGGGTGAGGTCGGCAGTTCGAATCTGCCCAGACCCACCAATTGTTGGTGTGCTGCGTGATCCGATACGGGGCCATAGCTCAGCTGGGAGAGCGCCTGCTTTGCACGCAGGAGGTCAGGAGTTCGATCCTCCTTGGCTCCACCATCTAAAACAATCGTCGAAAGCTCAGAAATGAATGTTCGTAGATGAACATTGATTTCTGGTCTTTGCACCAGAACTGTTCTTTAAAAATTCGGGTATGTGATAGAAGTAAGACTGAATGATCTCTTTCACTGGTGATCATTCAAGTCAAGGTAAAATTTGCGAGTTCAAGCGCGAATTTTCGGCGAATGTCGTCTTCACAGTATAACCAGATTGCTTGGGGTTATAT
GGTCAAGTGAAGAAGCGCATACGGTGGATGCCTTGGCAGTCAGAGGCGATGAAAGACGTGGTAGCCTGCGAAAAGCTTCGGGGAGTCGGCAAACAGACTTTGATCCGGAGATCTCCGAATGGGGCAACCC

I get:

ssu_1	0	m54122_180320_131917/45548132/ccs,2130	1	3	11=1X8=	*	0	0	AGAGTTTGATCATGGCTCAG	*	XT:A:R	NM:i:1	AM:i:3	NH:i:2
ssu_1	256	m54122_180320_131917/44957937/ccs,2130	1	3	11=1X8=	*	0	0	*	*	NM:i:1	AM:i:3	NH:i:2
ssu_2	0	m54122_180320_131917/45548132/ccs,2130	1	3	20=	*	0	0	AGAGTTTGATCCTGGCTCAG	*	XT:A:R	NM:i:0	AM:i:3	NH:i:2
ssu_2	256	m54122_180320_131917/44957937/ccs,2130	1	3	20=	*	0	0	*	*	NM:i:0	AM:i:3	NH:i:2
lsu_1	16	m54122_180320_131917/44957937/ccs,2130	2115	28	10=1X5=	*	0	0	CCGAATGGGGGAACCC	*	NM:i:1	AM:i:28	NH:i:2
lsu_1	272	problematic_seq	2114	14	1=1X4=1I9=	*	0	0	*	*	NM:i:2	AM:i:14	NH:i:2
lsu_2	16	m54122_180320_131917/45548132/ccs,2130	2115	2	10=1X5=	*	0	0	CTGAATGGGGGAACCC	*	XT:A:R	NM:i:1	AM:i:2	NH:i:2
lsu_2	272	problematic_seq	2114	2	6=1I9=	*	0	0	*	*	NM:i:1	AM:i:2	NH:i:2
lsu_3	16	m54122_180320_131917/44957937/ccs,2130	2115	28	10=1X5=	*	0	0	CCGAATGGGGAAACCC	*	NM:i:1	AM:i:28	NH:i:2
lsu_3	272	problematic_seq	2114	14	1=1X8=1I5=	*	0	0	*	*	NM:i:2	AM:i:14	NH:i:2
lsu_4	16	m54122_180320_131917/45548132/ccs,2130	2115	2	10=1X5=	*	0	0	CTGAATGGGGAAACCC	*	XT:A:R	NM:i:1	AM:i:2	NH:i:2
lsu_4	272	problematic_seq	2114	2	10=1I5=	*	0	0	*	*	NM:i:1	AM:i:2	NH:i:2
lsu_5	16	m54122_180320_131917/44957937/ccs,2130	2115	40	16=	*	0	0	CCGAATGGGGCAACCC	*	NM:i:0	AM:i:40	NH:i:2
lsu_5	272	m54122_180320_131917/45548132/ccs,2130	2115	29	1=1X14=	*	0	0	*	*	NM:i:1	AM:i:29	NH:i:2
lsu_6	16	m54122_180320_131917/45548132/ccs,2130	2115	40	16=	*	0	0	CTGAATGGGGCAACCC	*	NM:i:0	AM:i:40	NH:i:2
lsu_6	272	m54122_180320_131917/44957937/ccs,2130	2115	29	1=1X14=	*	0	0	*	*	NM:i:1	AM:i:29	NH:i:2

So "problematic_seq" no longer gets the ssu_2 hit that it had in my first attempt.
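One way to make the difference between the two runs explicit is to collect, for each primer, the set of reference names it was aligned to, and then compare the sets. The following Python sketch does that. The helper name `hits_by_primer` is hypothetical, and the records are the ssu_2 lines from the two outputs above, cut down to the first six SAM columns to keep the example short.

```python
def hits_by_primer(sam_text):
    """Map each query name to the set of reference names it aligned to."""
    hits = {}
    for line in sam_text.splitlines():
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        targets = hits.setdefault(qname, set())  # unmapped primers keep an empty set
        if not flag & 0x4:  # 0x4 = unmapped
            targets.add(rname)
    return hits


# ssu_2 records from the two runs, first six columns only.
single_run = "ssu_2\t0\tproblematic_seq\t1\t28\t4=1I15="
multi_run = "\n".join([
    "ssu_2\t0\tm54122_180320_131917/45548132/ccs,2130\t1\t3\t20=",
    "ssu_2\t256\tm54122_180320_131917/44957937/ccs,2130\t1\t3\t20=",
])

lost = hits_by_primer(single_run)["ssu_2"] - hits_by_primer(multi_run)["ssu_2"]
print(lost)  # {'problematic_seq'}
```

Run on the full SAM files, the same comparison would list every primer and reference pair that appears in one run but not in the other.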

Please help! Thank you!
