Permalink
Fetching contributors…
Cannot retrieve contributors at this time
333 lines (264 sloc) 16.5 KB
********************************************************************************
MAST - Motif Alignment and Search Tool
********************************************************************************
MAST version 3.0 (Release date: 2004/08/18 09:07:01)
For further information on how to interpret these results or to get
a copy of the MAST software please access http://meme.sdsc.edu.
********************************************************************************
********************************************************************************
REFERENCE
********************************************************************************
If you use this program in your research, please cite:
Timothy L. Bailey and Michael Gribskov,
"Combining evidence using p-values: application to sequence homology
searches", Bioinformatics, 14(48-54), 1998.
********************************************************************************
********************************************************************************
DATABASE AND MOTIFS
********************************************************************************
DATABASE farntrans5.s (peptide)
Last updated on Mon Aug 16 21:19:59 2004
Database contains 5 sequences, 1900 residues
MOTIFS meme.farntrans5.tcm.txt (peptide)
MOTIF WIDTH BEST POSSIBLE MATCH
----- ----- -------------------
1 30 GGFQGRPNKEVHTCYTYWALAALAILNKLH
2 14 INKEKLIQWIKSCQ
PAIRWISE MOTIF CORRELATIONS:
MOTIF 1
----- -----
2 0.22
No overly similar pairs (correlation > 0.60) found.
Random model letter frequencies (from non-redundant database):
A 0.073 C 0.018 D 0.052 E 0.062 F 0.040 G 0.069 H 0.022 I 0.056 K 0.058
L 0.092 M 0.023 N 0.046 P 0.051 Q 0.041 R 0.052 S 0.074 T 0.059 V 0.064
W 0.013 Y 0.033
********************************************************************************
********************************************************************************
SECTION I: HIGH-SCORING SEQUENCES
********************************************************************************
- Each of the following 5 sequences has E-value less than 10.
- The E-value of a sequence is the expected number of sequences
in a random database of the same size that would match the motifs as
well as the sequence does and is equal to the combined p-value of the
sequence times the number of sequences in the database.
- The combined p-value of a sequence measures the strength of the
match of the sequence to all the motifs and is calculated by
o finding the score of the single best match of each motif
to the sequence (best matches may overlap),
o calculating the sequence p-value of each score,
o forming the product of the p-values,
o taking the p-value of the product.
- The sequence p-value of a score is defined as the
probability of a random sequence of the same length containing
some match with as good or better a score.
- The score for the match of a position in a sequence to a motif
is computed by by summing the appropriate entry from each column of
the position-dependent scoring matrix that represents the motif.
- Sequences shorter than one or more of the motifs are skipped.
- The table is sorted by increasing E-value.
********************************************************************************
SEQUENCE NAME DESCRIPTION E-VALUE LENGTH
------------- ----------- -------- ------
BET2_YEAST YPT1/SEC4 PROTEINS GERANY... 2.9e-27 325
RATRABGERB Rat rab geranylgeranyl tr... 1.4e-25 331
CAL1_YEAST RAS PROTEINS GERANYLGERAN... 9.7e-22 376
PFTB_RAT PROTEIN FARNESYLTRANSFERA... 7.6e-21 437
RAM1_YEAST PROTEIN FARNESYLTRANSFERA... 6.2e-20 431
********************************************************************************
********************************************************************************
SECTION II: MOTIF DIAGRAMS
********************************************************************************
- The ordering and spacing of all non-overlapping motif occurrences
are shown for each high-scoring sequence listed in Section I.
- A motif occurrence is defined as a position in the sequence whose
match to the motif has POSITION p-value less than 0.0001.
- The POSITION p-value of a match is the probability of
a single random subsequence of the length of the motif
scoring at least as well as the observed match.
- For each sequence, all motif occurrences are shown unless there
are overlaps. In that case, a motif occurrence is shown only if its
p-value is less than the product of the p-values of the other
(lower-numbered) motif occurrences that it overlaps.
- The table also shows the E-value of each sequence.
- Spacers and motif occurences are indicated by
o -d- `d' residues separate the end of the preceding motif
occurrence and the start of the following motif occurrence
o [n] occurrence of motif `n' with p-value less than 0.0001.
********************************************************************************
SEQUENCE NAME E-VALUE MOTIF DIAGRAM
------------- -------- -------------
BET2_YEAST 2.9e-27 6_[2]_3_[1]_1_[2]_4_[1]_4_[2]_
3_[1]_1_[2]_3_[1]_21_[1]_1_[2]_
4_[1]_24
RATRABGERB 1.4e-25 65_[2]_3_[1]_1_[2]_3_[1]_1_[2]_
3_[1]_18_[1]_1_[2]_4_[1]_26
CAL1_YEAST 9.7e-22 125_[2]_50_[2]_1_[1]_4_[2]_22_
[1]_22_[1]_5_[2]_1
PFTB_RAT 7.6e-21 120_[2]_3_[1]_4_[2]_3_[1]_1_[2]_
3_[1]_1_[2]_4_[1]_14_[2]_4_[1]_60
RAM1_YEAST 6.2e-20 144_[1]_5_[2]_4_[1]_1_[2]_4_[1]_
1_[2]_4_[1]_4_[2]_5_[1]_35_[2]_4
********************************************************************************
********************************************************************************
SECTION III: ANNOTATED SEQUENCES
********************************************************************************
- The positions and p-values of the non-overlapping motif occurrences
are shown above the actual sequence for each of the high-scoring
sequences from Section I.
- A motif occurrence is defined as a position in the sequence whose
match to the motif has POSITION p-value less than 0.0001 as
defined in Section II.
- For each sequence, the first line specifies the name of the sequence.
- The second (and possibly more) lines give a description of the
sequence.
- Following the description line(s) is a line giving the length,
combined p-value, and E-value of the sequence as defined in Section I.
- The next line reproduces the motif diagram from Section II.
- The entire sequence is printed on the following lines.
- Motif occurrences are indicated directly above their positions in the
sequence on lines showing
o the motif number of the occurrence,
o the position p-value of the occurrence,
o the best possible match to the motif, and
o columns whose match to the motif has a positive score (indicated
by a plus sign).
********************************************************************************
BET2_YEAST
YPT1/SEC4 PROTEINS GERANYLGERANYLTRANSFERASE BETA SUBUNIT (EC 2.
LENGTH = 325 COMBINED P-VALUE = 5.77e-28 E-VALUE = 2.9e-27
DIAGRAM: 6_[2]_3_[1]_1_[2]_4_[1]_4_[2]_3_[1]_1_[2]_3_[1]_21_[1]_1_[2]_4_[1]_24
[2] [1] [2] [1]
5.2e-05 2.7e-10 6.6e-10 5.9
INKEKLIQWIKSCQ GGFQGRPNKEVHTCYTYWALAALAILNKLH INKEKLIQWIKSCQ GGF
+++ +++++++ ++ + +++ +++ +++++ +++++++ + ++++++++++++++ + +
1 MSGSLTLLKEKHIRYIESLDTNKHNFEYWLTEHLRLNGIYWGLTALCVLDSPETFVKEEVISFVLSCWDDKYGAF
[2] [1]
e-14 4.8e-07 2.3e-17
QGRPNKEVHTCYTYWALAALAILNKLH INKEKLIQWIKSCQ GGFQGRPNKEVHTCYTYWALAALAILN
+ +++++++ + ++++++ +++++ +++ +++++++ ++ + ++++ +++++++ +++++++++++
76 APFPRHDAHLLTTLSAVQILATYDALDVLGKDRKVRLISFIRGNQLEDGSFQGDRFGEVDTRFVYTALSALSILG
[2] [1] [1]
5.1e-07 1.4e-18 4.6
KLH INKEKLIQWIKSCQ GGFQGRPNKEVHTCYTYWALAALAILNKLH GGF
+++ ++++ ++++++++ ++++ ++++++++++++++++++++++++ +++
151 ELTSEVVDPAVDFVLKCYNFDGGFGLCPNAESHAAQAFTCLGALAIANKLDMLSDDQLEEIGWWLCERQLPEGGL
[2] [1]
e-22 2.0e-13 3.8e-17
QGRPNKEVHTCYTYWALAALAILNKLH INKEKLIQWIKSCQ GGFQGRPNKEVHTCYTYWALAALAILNKL
++++ ++++++++++++++++++++++ ++ +++++++++++ ++++++++++++++++ ++++++++++ +
226 NGRPSKLPDVCYSWWVLSSLAIIGRLDWINYEKLTEFILKCQDEKKGGISDRPENEVDVFHTVFGVAGLSLMGYD
H
+
301 NLVPIDPIYCMPKSVTSKFKKYPYK
RATRABGERB
Rat rab geranylgeranyl transferase beta-subunit
LENGTH = 331 COMBINED P-VALUE = 2.83e-26 E-VALUE = 1.4e-25
DIAGRAM: 65_[2]_3_[1]_1_[2]_3_[1]_1_[2]_3_[1]_18_[1]_1_[2]_4_[1]_26
[2]
1.0e-11
INKEKLIQWI
+++++++ ++
1 MGTQQKDVTIKSDAPDTLLLEKHADYIASYGSKKDDYEYCMSEYLRMSGVYWGLTVMDLMGQLHRMNKEEILVFI
[1] [2] [1]
1.6e-14 1.4e-09 5.4e-19
KSCQ GGFQGRPNKEVHTCYTYWALAALAILNKLH INKEKLIQWIKSCQ GGFQGRPNKEVHTCYTYWAL
++++ ++ + +++++++ ++ ++++++++++++ +++++++ ++++++ + ++++++++++++++++++
76 KSCQHECGGVSASIGHDPHLLYTLSAVQILTLYDSIHVINVDKVVAYVQSLQKEDGSFAGDIWGEIDTRFSFCAV
[2] [1]
3.8e-12 4.8e-19
AALAILNKLH INKEKLIQWIKSCQ GGFQGRPNKEVHTCYTYWALAALAILNKLH
++++++++++ ++++++++++++++ ++++++++ ++++++++++++ ++++++++
151 ATLALLGKLDAINVEKAIEFVLSCMNFDGGFGCRPGSESHAGQIYCCTGFLAITSQLHQVNSDLLGWWLCERQLP
[1] [2] [1]
3.9e-21 1.2e-12 9.6e-18
GGFQGRPNKEVHTCYTYWALAALAILNKLH INKEKLIQWIKSCQ GGFQGRPNKEVHTCYTYWALAALAI
+++++++++++++++++++++++ ++++++ ++++++++++++++ ++++++++++++ +++ ++++++++
226 SGGLNGRPEKLPDVCYSWWVLASLKIIGRLHWIDREKLRSFILACQDEETGGFADRPGDMVDPFHTLFGIAGLSL
LNKLH
+++++
301 LGEEQIKPVSPVFCMPEEVLQRVNVQPELVS
CAL1_YEAST
RAS PROTEINS GERANYLGERANYLTRANSFERASE (EC 2.5.1.-) (PROTEIN GER
LENGTH = 376 COMBINED P-VALUE = 1.94e-22 E-VALUE = 9.7e-22
DIAGRAM: 125_[2]_50_[2]_1_[1]_4_[2]_22_[1]_22_[1]_5_[2]_1
[2]
1.8e-08
INKEKLIQWIKSCQ
+++++++++++++
76 LDDTENTVISGFVGSLVMNIPHATTINLPNTLFALLSMIMLRDYEYFETILDKRSLARFVSKCQRPDRGSFVSCL
[2] [1]
4.8e-10 8.7e-14
INKEKLIQWIKSCQ GGFQGRPNKEVHTCYTYWALA
+++++++ ++++++ + + + +++++ +++ ++++
151 DYKTNCGSSVDSDDLRFCYIAVAILYICGCRSKEDFDEYIDTEKLLGYIMSQQCYNGAFGAHNEPHSGYTSCALS
[2] [1]
5.9e-08 5.9e-20
ALAILNKLH INKEKLIQWIKSCQ GGFQGRPNKEVHTCYTYWALAALAIL
+++++++++ ++++++++++++ +++++++++ +++++++++++++ ++
226 TLALLSSLEKLSDKFKEDTITWLLHRQVSSHGCMKFESELNASYDQSDDGGFQGRENKFADTCYAFWCLNSLHLL
[1] [2]
4.0e-13 2.1e-07
NKLH GGFQGRPNKEVHTCYTYWALAALAILNKLH INKEKLIQWIKSCQ
++++ ++++ + ++++++++++ + +++++++ + ++ +++++++++
301 TKDWKMLCQTELVTNYLLDRTQKTLTGGFSKNDEEDADLYHSCLGSAALALIEGKFNGELCIPQEIFNDFSKRCC
PFTB_RAT
PROTEIN FARNESYLTRANSFERASE BETA SUBUNIT (EC 2.5.1.-) (CAAX FARNES
LENGTH = 437 COMBINED P-VALUE = 1.53e-21 E-VALUE = 7.6e-21
DIAGRAM: 120_[2]_3_[1]_4_[2]_3_[1]_1_[2]_3_[1]_1_[2]_4_[1]_14_[2]_4_[1]_60
[2] [1]
1.3e-07 2.8e-19
INKEKLIQWIKSCQ GGFQGRPNKEVHT
++ ++++++++ ++ +++++++++ +++
76 EKHFHYLKRGLRQLTDAYECLDASRPWLCYWILHSLELLDEPIPQIVATDVCQFLELCQSPDGGFGGGPGQYPHL
[2] [1] [2]
2.3e-09 2.1e-14 1.8e-0
CYTYWALAALAILNKLH INKEKLIQWIKSCQ GGFQGRPNKEVHTCYTYWALAALAILNKLH INKEKL
+ +++++++++++++++ ++++++++++ +++ + + ++ +++++++ +++++++++++++++ + +++
151 APTYAAVNALCIIGTEEAYNVINREKLLQYLYSLKQPDGSFLMHVGGEVDVRSAYCAASVASLTNIITPDLFEGT
[1] [2] [1]
8 7.4e-20 1.8e-08 2.2e-16
IQWIKSCQ GGFQGRPNKEVHTCYTYWALAALAILNKLH INKEKLIQWIKSCQ GGFQGRPNKEVHTCY
++++ +++ +++++ +++++++++++++++++ ++++++ + +++++++++++ ++++++ ++++++++
226 AEWIARCQNWEGGIGGVPGMEAHGGYTFCGLAALVILKKERSLNLKSLLQWVTSRQMRFEGGFQGRCNKLVDGCY
[2] [1]
5.0e-08 3.1e-15
TYWALAALAILNKLH INKEKLIQWIKSCQ GGFQGRPNKEVHTCYTYWALAALAILNK
++++++ + ++++ ++++++++++++++ +++ +++++ +++++++++++++++++
301 SFWQAGLLPLLHRALHAQGDPALSMSHWMFHQQALQEYILMCCQCPAGGLLDKPGKSRDFYHTCYCLSGLSIAQH
LH
+
376 FGSGAMLHDVVMGVPENVLQPTHPVYNIGPDKVIQATTHFLQKPVPGFEECEDAVTSDPATD
RAM1_YEAST
PROTEIN FARNESYLTRANSFERASE BETA SUBUNIT (EC 2.5.1.-) (CAAX FARN
LENGTH = 431 COMBINED P-VALUE = 1.24e-20 E-VALUE = 6.2e-20
DIAGRAM: 144_[1]_5_[2]_4_[1]_1_[2]_4_[1]_1_[2]_4_[1]_4_[2]_5_[1]_35_[2]_4
[1]
8.8e-1
GGFQGR
+ ++++
76 PALTKEFHKMYLDVAFEISLPPQMTALDASQPWMLYWIANSLKVMDRDWLSDDTKRKIVVKLFTISPSGGPFGGG
[2] [1]
7 6.4e-07 1.0e-13
PNKEVHTCYTYWALAALAILNKLH INKEKLIQWIKSCQ GGFQGRPNKEVHTCYTYWALAALAILNK
++++++++ ++++++++++ ++++ ++++++++++ +++ + ++ ++++++++ +++++++++++++
151 PGQLSHLASTYAAINALSLCDNIDGCWDRIDRKGIYQWLISLKEPNGGFKTCLEVGEVDTRGIYCALSIATLLNI
[2] [1] [2] [1]
2.5e-08 3.1e-17 4.7e-11 2.4e-
LH INKEKLIQWIKSCQ GGFQGRPNKEVHTCYTYWALAALAILNKLH INKEKLIQWIKSCQ GGFQG
++ + ++++++++++++ + ++ +++++++++++++++++++++++ ++++++++++++++ ++ +
226 LTEELTEGVLNYLKNCQNYEGGFGSCPHVDEAHGGYTFCATASLAILRSMDQINVEKLLEWSSARQLQEERGFCG
[2] [1]
16 4.9e-09 2.7e-13
RPNKEVHTCYTYWALAALAILNKLH INKEKLIQWIKSCQ GGFQGRPNKEVHTCYTYWALAALAILN
+ ++++++++++++ +++++++++ +++++++++++ ++ +++++++++++++++ +++ ++++++
301 RSNKLVDGCYSFWVGGSAAILEAFGYGQCFNKHALRDYILYCCQEKEQPGLRDKPGAHSDFYHTNYCLLGLAVAE
[2]
9.8e-05
KLH INKEKLIQWIKSCQ
+ +++++++ + +++
376 SSYSCTPNDSPHNIKCTPDRLIGSSKLTDVNPVYGLPIENVRKIIHYFKSNLSSPS
********************************************************************************
CPU: pmgm2
Time 0.130000 secs.
mast meme.farntrans5.tcm.txt -text -stdout