-
Notifications
You must be signed in to change notification settings - Fork 0
/
README
127 lines (87 loc) · 4.23 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
Copyright (c) David Powell <david@drp.id.au>
M-align aligns two sequences using the M-alignment algorithm.
M-align is provided under he GNU Public License v2 and
comes with ABSOLUTELY NO WARRANTY, or details see file COPYRIGHT
Please cite:
L. Allison, D. R. Powell and T. I. Dix
"Compression and Approximate Matching"
The Computer Journal, 1999 42:1 pp1-10
COMPILING
---------
This code is known to compile under Linux using gcc versions
3.x and 2.95. It should be as simple as running 'make'.
The programs in this package were specifically written for
DNA sequences. It should be a relatively simple to modify
the code to handle different alphabets. Simply edit
'common.h' and change 'ALPHA_SIZE' to the number of characters
in the alphabet, and change the array 'alphabet' to the
actual characters in the alphabet.
Good luck!
THE ALGORITHM
-------------
This implementation of the M-alignment algorithm is specifically for
testing the relatedness of two DNA sequences. It also differs from
the description abovementioned paper in the following ways:
- It performs local alignment.
- Uses affine gap costs.
The key idea to m-alignment is to incorporate a model of each sequence
into the alignment process. This program comes with code to model
sequence by a fixed order Markov Model. But it is possible to use any
sequence model that gives a probabilistic prediction for each
character in the sequences.
This implementation sums over all possible alignments to calculate
how "related" the sequences are. So, it does not produce an actual
alignment. The algorithm is iterated to find a (possibly local)
optimum for the alignment parameters.
----------------------------------------------------------------------
PROGRAM: m-align
USAGE
------
Usage: m-align [-mfiabh] <seqA> <seqB>
Where <seqA> and <seqB> are files in FASTA format.
-m <n> Use a n'th order Markov Model
-f <file> Read the Markov Model parameters from <file>
-s <file> Save the Markov Model parameters to <file> after fitting the sequences
-i <n> Set max iterations to <n>. Use -1 for unlimited
-a <file> Read character prediction probabilites for seqA from <file>
-b <file> Read character prediction probabilites for seqB from <file>
-h This help
The program takes two files on the commandline. These two files
each contain one sequence. The two sequences are to be
tested for relatedness and must be in FASTA format.
By default m-align fits a fixed order Markov Model to the sequences.
The order of this Markov Model may be set using the '-m' option, the
default value is 0. To use a uniform model (ie. 2 bits per DNA
character), specify the Markov Model order as -1.
The Markov Model parameters that are calculated by fitting the the
sequences can be saved using the '-s' option. This parameters file
can be loaded later using the '-f' option (be sure to specify the
correct Markov Model order using '-m').
The maximum number of iterations may be specified with the '-i'
option. A value of -1 means unlimited iterations, that is, iterate to
convergence.
----------------------------------------------------------------------
PROGRAM: markov_pred
This program is a utility to fit a fixed order markov model
to a sequence. It will outputs a probabilistic prediction
for each alphabet character at every position along the
sequence. The format of the output is directly readable by
the 'm-align' program using the '-a' or '-b' options.
USAGE
------
Usage: ./markov_pred [-mfh] <seq>
Where <seq> is a sequence file in FASTA format.
-m <n> Use a n'th order Markov Model
-f <file> Read the Markov Model parameters from <file>
-s <file> Save the Markov Model parameters to <file>
-h This help
The sequence must be in the file <seq> yin FASTA format.
The desired order of the markov model can be specified using
the '-m' option.
Instead of fitting the markov model to the sequence, the
markov model counts can be loaded from a file using the '-f'
option.
After fitting the markov model to the sequence, it is
possible to save the markov model counts to a file using the
'-s' option. This file can then be read by either the
'markov_pred' or 'm-align' programs with the '-f' option.