Simple assembly question #1657

Open
standage opened this issue Mar 14, 2017 · 0 comments

I have 13 reads that form the following graph structure at k=31 (reads here).

[Screenshot: graph structure formed by the 13 reads at k=31]

With some pretty vanilla parameters, the 13 reads align very well and have nearly perfect consensus.

mafft \
    --clustalout \
    --reorder \
    --adjustdirection \
    AAAGTTTTCTTAAAAACATATATGGCCGGGCGCGGTGGCTC.reads.fa
CLUSTAL format alignment by MAFFT FFT-NS-2 (v7.305b)


ERR899711.82875 cggggtttccccgtgttagccaggatggtctcgatctcctgacctcgtgatccgcccgcc
ERR899711.22833 -----------cgtgttagccaggatggtctcgatctcctgacctcgtgatccgcccgcc
ERR894724.61176 ------------------gccaggatggtctcgatctcctgacctcgtgatccgcccgcc
ERR899711.20356 -------------------ccaggatggtctcgatctcctgacctcgtgatccgcccgcc
ERR899709.43552 --------------------------------gatctcctgacctcgtgatccgcccgcc
ERR899709.86863 ---------------------------------------tgacctcgtgatccgcccgcc
_R_ERR899711.11 ------------------------------------------------------------
_R_ERR894723.11 ------------------------------------------------------------
_R_ERR894723.58 ------------------------------------------------------------
_R_ERR894724.22 -----------------------------------------------------------c
_R_ERR894724.17 ------------------------------------------------------------
_R_ERR899711.21 ------------------------------------------------------------
_R_ERR894724.13 ------------------------------------------------------------
                                                                            

ERR899711.82875 tcggcctcccaaagtgctgggattacaggcgtgagccaccgcgcccggccatatatgttt
ERR899711.22833 tcggcctcccaaagtgctgggattacaggcgtgagccaccgcgcccggccatatatgttt
ERR894724.61176 tcggcctcccaaagtgctgggattacaggcgtgagccaccgcgcccggccatatatgttt
ERR899711.20356 tcggcctcccaaagtgctgggattacaggcgtgagccaccgcgcccggccatatatgttt
ERR899709.43552 tcggcctcccaaagtgctgggattacaggcgtgagccaccgcgcccggccatatatgttt
ERR899709.86863 tcggcctcccaaagtgctgggattacaggcgcgagccaccgcgcccggccatatatgttt
_R_ERR899711.11 --------ccaaagtgctgggattacaggcgtgagccaccgcgcccggccatatatgttt
_R_ERR894723.11 ------tcccaaagtgctgggattacaggcgtgagccaccgcgcccggccatatatgttt
_R_ERR894723.58 ----cctcccaaagtgctgggattacaggcgtgagccaccgcgcccggccatatatgttt
_R_ERR894724.22 tcggcctcccaaagtgctgggattacaggcgtgagccaccgcgcccggccatatatgttt
_R_ERR894724.17 ---gcctcccaaagtgctgggattacaggcgtgagcaaccgcgcccggccatatatgttt
_R_ERR899711.21 ------------------------------------caccgcgcccggccatatatgttt
_R_ERR894724.13 ---------------------------ggcgtgagccaccgcgcccggccatatatgttt
                                                     ***********************

ERR899711.82875 ttaaga------------------------------------------------------
ERR899711.22833 ttaagaaaacttttttt-------------------------------------------
ERR894724.61176 ttaagaaaactttttttggatgcc------------------------------------
ERR899711.20356 ttaagaaaactttctttggatgccc-----------------------------------
ERR899709.43552 ttaagaaaactttttttggatgcccaggccgacagatc----------------------
ERR899709.86863 ttaagaaaactttttttggatgcccaggccgacagatcgctttga---------------
_R_ERR899711.11 ttaagaaaactttttttggatgcccaggccgacagatcgctttgagctcaggagtttgag
_R_ERR894723.11 ttaagaaaactttttttggatgcccaggccgacagatcgctttgagctcaggagtttgag
_R_ERR894723.58 ttaagaaaacttttcttggctgcccaggccgacagatcgctttgagctcaggagtttgag
_R_ERR894724.22 ttaagaaaactttttttggatgcccaggccgacagatcgctttgagctcaggagtttgag
_R_ERR894724.17 ttaagaaaactttttttggatgcccaggtcgacagatcgctttgagctcaggagtttgag
_R_ERR899711.21 ttaagaaaactttttttggatgcccaggccgacagatcgctttgagctcaggagtttgag
_R_ERR894724.13 ttaagaaaactttttatggttgcccaggccgtcagatcgctttgtgctcaggagttttag
                ******                                                      

ERR899711.82875 ------------------------------------------
ERR899711.22833 ------------------------------------------
ERR894724.61176 ------------------------------------------
ERR899711.20356 ------------------------------------------
ERR899709.43552 ------------------------------------------
ERR899709.86863 ------------------------------------------
_R_ERR899711.11 accagcctgggcaa----------------------------
_R_ERR894723.11 accagcctgggc------------------------------
_R_ERR894723.58 accagcctgg--------------------------------
_R_ERR894724.22 accag-------------------------------------
_R_ERR894724.17 tctagcctg---------------------------------
_R_ERR899711.21 accagcctgggcaatatggcaaaaccctgtctctacaaaaaa
_R_ERR894724.13 accagcctgtgcaatatggcaaaacccagtctt---------
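
The agreement visible in the alignment above can be checked with a column-wise majority vote. This is a minimal sketch in plain Python, not part of MAFFT or khmer: the function name `majority_consensus` and the toy rows are hypothetical, and it assumes the aligned rows have already been parsed into equal-length strings with `-` as the gap character.

```python
from collections import Counter


def majority_consensus(rows, gap="-"):
    """Column-wise majority vote over equal-length aligned rows, ignoring gaps."""
    if not rows:
        return ""
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("aligned rows must all have the same length")
    consensus = []
    for column in zip(*rows):
        # Count only real bases; a column made entirely of gaps stays a gap.
        counts = Counter(base for base in column if base != gap)
        consensus.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(consensus)


# Toy rows for illustration only (not the reads from this issue).
rows = [
    "acgt--",
    "-cgtta",
    "--gata",
]
print(majority_consensus(rows))  # prints "acgtta"
```

Isolated substitutions are outvoted as long as most reads covering a column agree, which is why the alignment looks clean even though individual reads contain errors.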

The consensus between these reads is clear, but I'm having trouble assembling the entire contig. A few sequencing errors (especially in the T-rich region following the 13-read exact consensus) are truncating the assembly. The linear path assembler isn't designed to handle this case, but the junction count assembler is also producing very small contigs.
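
To illustrate why a single sequencing error can truncate a linear path, here is a minimal sketch using a toy k-mer graph in plain Python. It does not use khmer's API. The helper names (`build_graph`, `linear_path`), the toy sequences, and k=5 are all assumptions chosen for illustration, and the traversal is forward-only, ignoring in-degree and reverse complements.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return all k-mers of seq in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_graph(reads, k):
    """Map each k-mer to the set of k-mers that follow it in some read."""
    successors = defaultdict(set)
    for read in reads:
        ks = kmers(read, k)
        for a, b in zip(ks, ks[1:]):
            successors[a].add(b)
    return successors


def linear_path(successors, start):
    """Walk forward from start while each k-mer has exactly one successor."""
    path = [start]
    seen = {start}
    node = start
    while len(successors.get(node, ())) == 1:
        (nxt,) = successors[node]
        if nxt in seen:  # stop on a cycle rather than loop forever
            break
        path.append(nxt)
        seen.add(nxt)
        node = nxt
    # Spell the contig: the first k-mer plus the last base of each later k-mer.
    return path[0] + "".join(kmer[-1] for kmer in path[1:])


true_seq = "ACGTTGCATGGACCTA"
error_read = "ACGTTGCATGTACCTA"  # one substitution (G -> T) at index 10

clean = build_graph([true_seq], k=5)
noisy = build_graph([true_seq, error_read], k=5)

print(linear_path(clean, "ACGTT"))  # prints "ACGTTGCATGGACCTA"
print(linear_path(noisy, "ACGTT"))  # prints "ACGTTGCATG"
```

In the noisy graph the k-mer `GCATG` has two successors, so the walk stops there even though all but one base of the two reads agree. A substitution in the middle of a read creates up to k erroneous k-mers, so at k=31 a handful of errors is enough to introduce several such branch points.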

My thought was to try the SimpleLabeledAssembler to label across high-degree nodes, but this requires passing in a sequence when finding high-degree nodes and labeling. What should this sequence be? In the general case I won't have the consensus sequence beforehand: this is what I'm trying to assemble. :)
