
MikhailBazhenov/Consensus

Consensus

This program can be used for reference-guided de novo assembly of short DNA sequences (PCR amplicons or individual genes).

The program builds a consensus DNA sequence from contigs that have been aligned to a reference sequence with the CAP3 Assembly Program (http://seq.cs.iastate.edu/cap3.html). The benefit is that the consensus sequence is built after the reference has been removed from the alignment, so the reference guides the assembly but does not contribute to the called bases. The consensus is made by calling the most frequent nucleotide at each position of the alignment. If two or more nucleotides are equally frequent, the consensus nucleotide is designated as 'N'.
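The voting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of 'consensus.py': the function name `call_consensus`, the dictionary input format, and the decision to ignore gap characters and to skip columns that no contig covers are all hypothetical choices made for the example.

```python
from collections import Counter


def call_consensus(rows, reference_name):
    """Majority-vote consensus over aligned rows, excluding the reference.

    rows: dict mapping sequence name -> aligned sequence (equal lengths,
    '-' marks a gap). This input format is an assumption for the sketch.
    """
    # Drop the reference so that it does not take part in the vote.
    contigs = [seq.upper() for name, seq in rows.items() if name != reference_name]
    consensus = []
    for column in zip(*contigs):
        # Assumption: only A, C, G and T are counted; gaps are ignored.
        counts = Counter(base for base in column if base in "ACGT")
        if not counts:
            continue  # no contig covers this column
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            consensus.append("N")  # equal frequencies -> ambiguous base
        else:
            consensus.append(ranked[0][0])
    return "".join(consensus)


alignment = {
    "ref":     "ACGTACGT",
    "contig1": "ACGAAC--",
    "contig2": "ACGAACGT",
    "contig3": "-CTTACGA",
}
print(call_consensus(alignment, "ref"))  # ACGAACGN
```

In this toy alignment the fourth column of the result is 'A' even though the reference has 'T' there, because the reference row is excluded from the vote, and the last column is 'N' because 'T' and 'A' each occur once among the contigs.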

To perform the assembly, first place a single reference sequence in FASTA format in a file named 'reference.fa'. Then concatenate it with a file containing several contigs or Sanger reads in FASTA format.

cat reference.fa contigs.fasta > ref-plus-cont.fasta

Then use the CAP3 program to assemble the sequences, and redirect the output to a file named 'out.txt'. Mild (less stringent) CAP3 parameters should be used if the reference sequence differs considerably from the newly sequenced one.

cap3 ref-plus-cont.fasta -m 40 -p 70 -g 1 > out.txt

Then run the program 'consensus.py' from a directory containing both the 'reference.fa' and 'out.txt' files. The file 'reference.fa' is used to extract the name of the reference sequence, which must differ from the names of all contigs.

consensus.py

The file 'res.txt' generated by the program will contain the resulting consensus sequence.
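Because the reference is identified by its name, it can help to check before running CAP3 that the name in 'reference.fa' does not also occur among the contigs. The following is a minimal sketch of such a check, not part of 'consensus.py'. The helper names `fasta_names` and `reference_name` are hypothetical, and the sketch assumes that a sequence name is the first whitespace-delimited token after '>' in a FASTA header line.

```python
def fasta_names(path):
    """Return the sequence names found in a FASTA file.

    Assumption: the name is the first whitespace-delimited token
    that follows '>' on a header line.
    """
    names = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                names.append(fields[0] if fields else "")
    return names


def reference_name(reference_path, contigs_path):
    """Return the reference name, or raise if the inputs look unsuitable."""
    ref_names = fasta_names(reference_path)
    if len(ref_names) != 1:
        raise ValueError("expected exactly one sequence in the reference file")
    if ref_names[0] in fasta_names(contigs_path):
        raise ValueError("reference name also occurs among the contigs")
    return ref_names[0]


# Example call, using the file names from the steps above:
# name = reference_name("reference.fa", "contigs.fasta")
```

If the check raises an error, renaming the reference header in 'reference.fa' before concatenating the files is one way to avoid the clash.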

About

A program for making a consensus sequence from CAP3 output
