-
Notifications
You must be signed in to change notification settings - Fork 1
/
IDEAS
20 lines (14 loc) · 991 Bytes
/
IDEAS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
1) Instead of doing q-gram search / match for every position in the haystack, upon a hit, move |q-gram| and start from there. Anything inbetween will be added during the seed-extension phase.
Useful algorithms / problems
1) longest common extension
2) k-mismatch: - P, T, fixed number of k errors, a k-mismatch of P is: a substring of T that matches P with at most k mismatches. O(k*|T|)
- speedup for DNA: generate all possible P' by changing k chars of P and search for each P' in the ST/SA of T
=== TO TEST ===
Y 1) several tts sequences in input file
2) different c-s (consecutive errors, not yet impl.?)
Y 3) forward reverse stuff
=== WARNING !!! ===
1. if mismatch occurs at the ends the initial seed, handle it! (although it's discarded at the end, possibly)
2. TODO!! keep mad of max seeds as these are quite redundant, check before extending them
=== OPTIMIZATIONS TO DO ===
1. keep offset mismatch arrays for each tts-tfo pair after initially computing it