## MotifEnumeration
Implement MotifEnumeration. A k-mer is a (k, d)-motif if it appears in every string of Dna with at most d mismatches.

    Input: Integers k and d, followed by a collection of strings Dna.
    Output: All (k, d)-motifs in Dna.

---
**Sample Input:**

3 1

ATTTGGC

TGCCTTA

CGGTATC

GAAAATT

---
**Sample Output:**

ATA ATT GTT TTT

In [1]:
def HammingDistance(p, q):
    num = 0
    for i in range(len(p)):
        if p[i] != q[i]:
            num += 1
    return num
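
As a quick check of the Hamming distance, here is a minimal sketch of an equivalent `zip`-based formulation. The name `hamming_distance` is hypothetical (it is not part of the notebook), and the length check is an added assumption: the loop above does not validate that both strings have the same length.

```python
def hamming_distance(p, q):
    """Count the positions at which two equal-length strings differ."""
    # Assumption: unequal lengths are treated as an error.
    if len(p) != len(q):
        raise ValueError("strings must have the same length")
    return sum(a != b for a, b in zip(p, q))


print(hamming_distance('GGGCCGTTGGT', 'GGACCGTTGAC'))  # 3
```

The two strings differ at the third, tenth, and eleventh positions, hence 3.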

In [10]:
def ApproximatePatternCount(Text, Pattern, d):
    count = 0
    for i in range(len(Text) - len(Pattern) + 1):
        if HammingDistance(Text[i:i + len(Pattern)], Pattern) <= d:
            count += 1
    return count

In [2]:
def Neighbors(Pattern, d):
    # Return the set of all strings within Hamming distance d of Pattern.
    if d == 0:
        return {Pattern}
    if len(Pattern) == 1:
        return {'A', 'C', 'G', 'T'}
    Neighborhood = set()
    suff = Pattern[1:]
    SuffixNeighbors = Neighbors(suff, d)
    for neib in SuffixNeighbors:
        if HammingDistance(neib, suff) < d:
            # Mismatch budget left: any first letter stays within d.
            for x in ['A', 'C', 'G', 'T']:
                Neighborhood.add(x + neib)
        else:
            # Budget used up: the first letter must match Pattern.
            Neighborhood.add(Pattern[0] + neib)
    return Neighborhood
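
One way to sanity-check the recursive neighborhood is to compare it against a brute-force enumeration. The sketch below is illustrative only: `brute_force_neighbors` is a hypothetical helper that scans all 4^k strings, so it is practical only for small k. For d = 1 a k-mer has 1 + 3k neighbors (itself plus three substitutions at each position), which gives 10 for k = 3.

```python
from itertools import product


def brute_force_neighbors(pattern, d):
    """Return every string over ACGT within d mismatches of pattern."""
    result = set()
    for letters in product('ACGT', repeat=len(pattern)):
        candidate = ''.join(letters)
        mismatches = sum(a != b for a, b in zip(candidate, pattern))
        if mismatches <= d:
            result.add(candidate)
    return result


neighbors = brute_force_neighbors('ACG', 1)
print(len(neighbors))  # 10
```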

In [12]:
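def MotifEnumeration(dna, k, d):
    Patterns = set()
    # A (k, d)-motif lies within d mismatches of some k-mer of every string,
    # so the neighbors of the k-mers of the first string cover all candidates.
    first = dna[0]
    for i in range(len(first) - k + 1):
        for pat in Neighbors(first[i:i + k], d):
            if all(ApproximatePatternCount(x, pat, d) > 0 for x in dna):
                Patterns.add(pat)
    return list(Patterns)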

In [13]:
k = 3
d = 1
dna = ['ATTTGGC', 'TGCCTTA', 'CGGTATC', 'GAAAATT']
MotifEnumeration(dna, k, d)

['ATA', 'GTT', 'TTT', 'ATT']
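
The result can be cross-checked with a brute-force sketch that tests every one of the 4^k possible k-mers directly against the definition. The helper names `occurs_with_mismatches` and `brute_force_motifs` are hypothetical, and this approach is only feasible for small k.

```python
from itertools import product


def occurs_with_mismatches(pattern, text, d):
    """Return True if some window of text is within d mismatches of pattern."""
    k = len(pattern)
    return any(
        sum(a != b for a, b in zip(pattern, text[i:i + k])) <= d
        for i in range(len(text) - k + 1)
    )


def brute_force_motifs(dna, k, d):
    """Return all (k, d)-motifs of dna in lexicographic order."""
    motifs = []
    for letters in product('ACGT', repeat=k):
        pattern = ''.join(letters)
        if all(occurs_with_mismatches(pattern, text, d) for text in dna):
            motifs.append(pattern)
    return motifs


dna = ['ATTTGGC', 'TGCCTTA', 'CGGTATC', 'GAAAATT']
print(' '.join(brute_force_motifs(dna, 3, 1)))  # ATA ATT GTT TTT
```

Because `any` stops at the first matching window, this check avoids counting every approximate occurrence when only existence matters.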

In [23]:
with open('dataset_156_8.txt', 'r') as f:
    k, d = map(int, f.readline().split())
    # One DNA string per line; skip blank lines such as a trailing newline.
    dna = [line.strip() for line in f if line.strip()]
print(' '.join(MotifEnumeration(dna, k, d)))

GGACA GGACT GGACC GACAA GGACG GACCA CAATT ACAAT TGGAC


## MedianString

    Input: An integer k, followed by a collection of strings Dna.
    Output: A k-mer Pattern that minimizes d(Pattern, Dna) among all k-mers Pattern. (If there are multiple such strings Pattern, then you may return any one.)
     
---
**Sample Input:**

3

AAATTGACGCAT

GACGACCACGTT

CGTCAGCGCCTG

GCTGAGCACCGG

AGTACGGGACAG

---
**Sample Output:**

ACG
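
The problem statement uses d(Pattern, Dna) without defining it. A standard definition, and the one the `Distance` function below computes, takes the smallest Hamming distance between Pattern and any k-mer of a string, then sums that quantity over the strings of Dna. The subscript notation Dna_1, ..., Dna_t for the individual strings is an assumption of this note:

$$
d(\text{Pattern}, \text{Text}) = \min_{k\text{-mers } \text{Pattern}' \text{ in } \text{Text}} \text{HammingDistance}(\text{Pattern}, \text{Pattern}')
$$

$$
d(\text{Pattern}, \text{Dna}) = \sum_{i=1}^{t} d(\text{Pattern}, \text{Dna}_i)
$$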

In [30]:
def NumberToPattern(num, k):
    # Read num as a k-digit base-4 number with digits 0..3 mapped to A, C, G, T.
    slovar = {0: "A", 1: "C", 2: "G", 3: "T"}
    pattern = ''
    for _ in range(k):
        # Digits come out least significant first, so reverse at the end.
        pattern += slovar[num % 4]
        num = num // 4
    return pattern[::-1]
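
The number-to-k-mer mapping is a base-4 conversion, so index i should correspond to the i-th k-mer in lexicographic order. Here is a minimal sketch of that idea, cross-checked against `itertools.product`. The name `number_to_pattern` is hypothetical and stands apart from the notebook's function.

```python
from itertools import product


def number_to_pattern(num, k):
    """Convert an integer in [0, 4**k) to the k-mer with that base-4 value."""
    letters = 'ACGT'
    digits = []
    for _ in range(k):
        num, digit = divmod(num, 4)  # peel off the least significant digit
        digits.append(letters[digit])
    return ''.join(reversed(digits))


# product() yields k-mers in lexicographic order, matching the numbering.
all_3mers = [''.join(p) for p in product('ACGT', repeat=3)]
assert all(number_to_pattern(i, 3) == kmer for i, kmer in enumerate(all_3mers))

print(number_to_pattern(1, 3), number_to_pattern(4, 3), number_to_pattern(63, 3))
# AAC ACA TTT
```

Small inputs such as 1, 2, and 3 are worth testing explicitly, since they exercise the zero-padding of the leading digits.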

In [28]:
def Distance(Pattern, Dna):
    # Sum, over all strings, of the smallest Hamming distance between
    # Pattern and any k-mer of that string.
    k = len(Pattern)
    dist = 0
    for string in Dna:
        dist_s = len(string)  # upper bound on any Hamming distance here
        for i in range(len(string) - k + 1):
            h = HammingDistance(Pattern, string[i:i + k])
            if h < dist_s:
                dist_s = h
        dist += dist_s
    return dist
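
As a worked example, the sketch below computes the per-string minimum distances for the pattern ACG on the sample input. The helpers `hamming_distance` and `min_distance` are hypothetical names introduced only for this illustration.

```python
def hamming_distance(p, q):
    """Count mismatching positions between two equal-length strings."""
    return sum(a != b for a, b in zip(p, q))


def min_distance(pattern, text):
    """Smallest Hamming distance between pattern and any window of text."""
    k = len(pattern)
    return min(
        hamming_distance(pattern, text[i:i + k])
        for i in range(len(text) - k + 1)
    )


dna = ['AAATTGACGCAT', 'GACGACCACGTT', 'CGTCAGCGCCTG',
       'GCTGAGCACCGG', 'AGTACGGGACAG']
per_string = [min_distance('ACG', text) for text in dna]
print(per_string)       # [0, 0, 1, 1, 0]
print(sum(per_string))  # 2
```

ACG occurs exactly in the first, second, and fifth strings, and is one mismatch away from GCG in the third and from ACC in the fourth, so the total distance is 2.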

In [37]:
def MedianString(Dna, k):
    # Start from infinity so the first pattern tested always becomes the best.
    best_distance = float('inf')
    med = ''
    for i in range(4 ** k):
        patt = NumberToPattern(i, k)
        d = Distance(patt, Dna)
        if d < best_distance:
            best_distance = d
            med = patt
    return med

In [38]:
k = 3
dna = ['AAATTGACGCAT','GACGACCACGTT','CGTCAGCGCCTG','GCTGAGCACCGG','AGTACGGGACAG']
MedianString(dna, k)

'ACG'
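
Since the problem allows any minimizer to be returned, it can be useful to list all of them. The sketch below is a brute-force variant under that assumption: `distance` and `all_median_strings` are hypothetical names, and the search over 4^k patterns is only practical for small k.

```python
from itertools import product


def distance(pattern, dna):
    """Sum over the strings of the minimum Hamming distance to any window."""
    k = len(pattern)
    return sum(
        min(
            sum(a != b for a, b in zip(pattern, text[i:i + k]))
            for i in range(len(text) - k + 1)
        )
        for text in dna
    )


def all_median_strings(dna, k):
    """Return the minimum distance and every k-mer that attains it."""
    scores = {}
    for letters in product('ACGT', repeat=k):
        pattern = ''.join(letters)
        scores[pattern] = distance(pattern, dna)
    best = min(scores.values())
    return best, [p for p, score in scores.items() if score == best]


dna = ['AAATTGACGCAT', 'GACGACCACGTT', 'CGTCAGCGCCTG',
       'GCTGAGCACCGG', 'AGTACGGGACAG']
best, medians = all_median_strings(dna, 3)
print(best)              # 2
print('ACG' in medians)  # True
```

A function that keeps only the first minimizer in lexicographic order reports ACG here, but other k-mers in `medians` are equally valid answers.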

In [41]:
with open('dataset_158_9.txt', 'r') as f:
    k = int(f.readline().strip())
    # One DNA string per line; skip blank lines such as a trailing newline.
    dna = [line.strip() for line in f if line.strip()]
MedianString(dna, k)

'CATGAA'