Skip to content

ambimorph/sequences

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 

Repository files navigation

Missing Sequence Finder

Finds subsequences of up to length n not present in a sequence (s) made from an alphabet.

Gist

We pass through the sequence s up to n times, where n is the length of subsequence we want to find that is not in s.

invariant:

At the beginning of the ith pass, the dictionary d holds all |alphabet|^(i-1) keys, mapped to empty sets. E.g. if the alphabet is {A,C,G,T}, then at the beginning of the third pass, d holds the 16 keys AA, AC, AG,... TT.

During the ith pass, we look at all subsequences w in s of length i. We insert the ith character of w in d[w[:-1]] (the prefix). If w[:-1] has size |alphabet|, we insert the |alphabet| new keys, e.g. w[:-1]A, w[:-1]C, w[:-1]G, w[:-1]T into d, and delete the key w[:-1].

If we get to the end of the ith pass and the size of d is less than |alphabet|^i, we now have at least one prefix of a subsequence not in s.

From there we generate all possible words of length n containing the subsequences.

Usage:

$ python missing_subsequence_finder.py filename n

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages