Skip to content


Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time

ADDA is a method to find protein domains in protein sequences.
Briefly, ADDA attempts to split domains into segments that 
correspond as closely as possible to all-on-all pairwise
alignments. A detailed description of the method can be found

Heger A, Holm L. (2003)
Exhaustive enumeration of protein domain families.
J Mol Biol. 2003 May 2;328(3):749-67.
PMID: 12706730 


ADDA is controlled with the script The script expects a file adda.ini 
with configuration options in the directory from which it is called. An example 
is in the directory ./test.


ADDA requires three input files.

   1. sequences in fasta format

   2. the results from an all-on-all sequence comparison (sequence alignment graph)

   3. domain assignments from a reference domain assignment

ADDA proceeds in stages. Each stage corresponds to a command to
the script To run all stages, run as

   python --steps=all

The specific stages are:

1. Pre-processing of the input. These steps can be performed in

   1. indexing the sequence database - "index"

   2. building sequence profiles - "profiles"

   3. formatting and filtering the alignment graph - "graph"

   4. indexing the alignment graph - "index"

   5. estimating the error parameters - "fit"

2. Decomposing sequences into domains - "optimise"

3. Convert sequence alignment graph to domain alignment graph - "convert"

4. Build minimum spanning tree of domains - "mst"

5. Align domains - "align"


A toy example can be found at
The files are:

nrdb.fasta.gz: a file with protein sequences in fasta format. These have been 
filtered to be less than 40% identical (Park et al. 2000).

pairsdb.links.gz: a list of pairwise alignments. These have been obtained by
running BLASTP (Altschul et al. 1997) all-on-all and parsed into a tab-separated 
table. The columns are:

1. query: identifier of the query sequence
2. sbjct: identifier of the sbjct sequence
3. evalue: natural log of the E-Value
4-6. query_start, query_end, query_ali: alignment of the query
7-9. sbjct_start, sbjct_end, sbjct_ali: alignment of the sbjct
10. alignment score (not used)
11. percent identity (not used)

The alignment coordinates are inclusive/exclusive 0-based coordinates "[)".
The alignment is stored in compressed form as alternating integer numbers
with the prefix "+" and "-". Positive numbers signify character emissions 
and negative numbers insertions. For example, "+3-3+2" with the sequence 
ABCDE will result in "ABC---DE". a list of domains for a subset of sequences. The
domains in this file were derived from structural domain definitions in
SCOP (Andreeva et al. 2008).


Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ.
(1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Nucleic Acids Res. Sep 1;25(17):3389-402. Review.

Park J, Holm L, Heger A, Chothia C. (2000) RSDB: representative protein 
sequence databases have high information content. Bioinformatics. May;16(5):458-64.

Andreeva A, Howorth D, Chandonia JM, Brenner SE, Hubbard TJ, Chothia C, Murzin AG.
(2008) Data growth and its impact on the SCOP database: new developments.
Nucleic Acids Res. Jan;36(Database issue):D419-25. Epub 2007 Nov 13.


1. auto-calibrate the alignment score threshold 
2. speed up the initial graph parsing using cython


Automated detection and delineation of protein domains







No releases published


No packages published