Consensus

This program can be used for reference-guided de novo assembly of short DNA sequences (PCR amplicons or individual genes).

The program makes a consensus DNA sequence from contigs aligned to a reference sequence with the CAP3 Assembly Program (http://seq.cs.iastate.edu/cap3.html). The benefit is that the consensus sequence is built after the reference has been removed from the alignment, so the reference guides the assembly without contributing to the result. The consensus is made by calling the most frequent nucleotide at each position of the alignment. If the highest frequencies are equal, the consensus nucleotide is designated 'N'.
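The calling rule described above can be sketched in a few lines of Python. This is only an illustration, not the code of 'consensus.py': the function names `consensus_column` and `consensus` are hypothetical, the input is assumed to be a list of equal-length aligned rows with the reference already removed, and ignoring gap characters ('-') when counting is an assumption rather than documented behaviour.

```python
from collections import Counter


def consensus_column(column, gap="-"):
    """Return the most frequent base in one alignment column, or 'N' on a tie."""
    # Assumption: gap characters do not take part in the vote.
    counts = Counter(base.upper() for base in column if base != gap)
    if not counts:
        return ""  # the column holds only gaps, so it adds nothing
    ranked = counts.most_common(2)
    # A tie between the two most frequent bases is reported as 'N'.
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return "N"
    return ranked[0][0]


def consensus(rows, gap="-"):
    """Build a consensus from equal-length aligned rows (reference removed)."""
    return "".join(consensus_column(col, gap) for col in zip(*rows))


print(consensus(["ACGT", "ACGA", "ACTA"]))  # ACGA
print(consensus(["AC", "AG"]))              # AN
```

In the second call the two rows disagree at the last position with equal frequency, so that position is reported as 'N'.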

To perform the assembly, first place a single reference sequence in FASTA format into a file named 'reference.fa'. Then concatenate it with a file containing the contigs or Sanger reads in FASTA format (here 'contigs.fasta').

cat reference.fa contigs.fasta > ref-plus-cont.fasta
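If 'reference.fa' does not end with a newline, plain concatenation would join its last sequence line to the first header of the next file. The Python sketch below performs the same concatenation while guarding against that case. The helper name `concatenate_fasta` is hypothetical and is not part of this program.

```python
def concatenate_fasta(paths, out_path):
    """Concatenate FASTA files, making sure each one ends with a newline."""
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                text = handle.read()
            out.write(text)
            # Keep the next file's '>' header on a line of its own.
            if text and not text.endswith("\n"):
                out.write("\n")


# Example call, assuming both input files exist in the current directory:
# concatenate_fasta(["reference.fa", "contigs.fasta"], "ref-plus-cont.fasta")
```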

Then use CAP3 to assemble the sequences, and redirect the output to a file named 'out.txt'. Relaxed CAP3 parameters should be used if the reference sequence differs substantially from the newly sequenced one.

cap3 ref-plus-cont.fasta -m 40 -p 70 -g 1 > out.txt

Then run this program, 'consensus.py', from a directory containing both the 'reference.fa' and 'out.txt' files. The file 'reference.fa' is used to extract the name of the reference sequence (which must differ from the name of every contig).

consensus.py
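To illustrate how a reference name can be read from 'reference.fa', here is a minimal Python sketch. It assumes the name is the first whitespace-delimited word after '>' on the first header line. The function name `reference_name` is hypothetical, and 'consensus.py' itself may parse the header differently.

```python
def reference_name(path="reference.fa"):
    """Return the first word of the first FASTA header line in the file."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                # Assumption: the name ends at the first whitespace character.
                fields = line[1:].split()
                return fields[0] if fields else ""
    raise ValueError(f"no FASTA header found in {path}")
```

Because the reference is identified by this name, a contig with the same name could not be told apart from it, which is why the names must differ.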

The file 'res.txt' generated by the program will contain the resulting consensus sequence.