# All $k$-nucleotides

Today's challenge is to implement the function `get_k_nucleotides` from CME 211 quiz 1.

## Description of problem from quiz

One common technique in DNA analysis is comparing the frequencies of $k$-nucleotide sequences. A $k$-nucleotide is a sequence of $k$ DNA nucleotides. To perform any analysis of $k$-nucleotides, it is useful to have a container with all $k$-nucleotides.

Write the function `get_k_nucleotides(k)`, which takes the nucleotide sequence length `k` as its input argument and returns a list containing all length-`k` nucleotide sequences as Python strings.

Remember the DNA "letters" are A, C, G, and T.

In [3]:
# example function
def get_k_nucleotides(k):
    pass

In [8]:
# solution 1
def get_k_nucleotides(k):
    # return empty list for k == 0
    if k == 0:
        return []
    # list of nucleotides
    nts = ['A', 'C', 'G', 'T']
    # list of k-nucleotides, that will be expanded
    knts = ['A', 'C', 'G', 'T']
    for i in range(k-1):
        temp_knts = []
        for nt in nts: # for each nucleotide
            for knt in knts:
                # extend sequence in knt with current nucleotide (nt)
                temp_knts.append(knt + nt)
        # reassign to knts
        knts = temp_knts
    return knts

In [9]:
# example 0
get_k_nucleotides(0)

[]

In [10]:
# example 1
get_k_nucleotides(1)

['A', 'C', 'G', 'T']

In [11]:
# example 2
get_k_nucleotides(2)

['AA',
 'CA',
 'GA',
 'TA',
 'AC',
 'CC',
 'GC',
 'TC',
 'AG',
 'CG',
 'GG',
 'TG',
 'AT',
 'CT',
 'GT',
 'TT']

## Solution notes

This is direct enumeration of a combinatorial sequence.  It is often useful to do this in practice for small values of `k`.  Of course, the output list has $4^k$ entries, so its size is exponential in `k` and this is impractical for large `k`.
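
To make the growth concrete, here is a small sketch that counts the $k$-nucleotides for a few values of `k` and compares the count against $4^k$. The helper name `count_k_nucleotides` is made up for this illustration, and it uses `itertools.product` rather than the loop-based solution above.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

def count_k_nucleotides(k):
    # hypothetical helper: enumerate every length-k sequence and count them
    return sum(1 for _ in product(NUCLEOTIDES, repeat=k))

for k in range(1, 7):
    # four choices per position, k positions
    assert count_k_nucleotides(k) == 4 ** k
    print(k, 4 ** k)
```

Each extra position multiplies the count by 4, so the list gets large quickly as `k` grows.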

We can also implement this with recursion:

In [13]:
# recursive solution
def recursive_get_k_nuc(k):
    # base case: k == 0
    if k == 0:
        return []
    if k == 1:
        return ['A', 'C', 'G', 'T']
    nts = ['A', 'C', 'G', 'T']
    # get k-1 nucleotides
    km1nts = recursive_get_k_nuc(k-1)
    # construct k nucleotides
    knts = []
    for nt in nts:
        for km1nt in km1nts:
            knts.append(km1nt + nt)
    return knts

In [14]:
recursive_get_k_nuc(0)

[]

In [15]:
recursive_get_k_nuc(1)

['A', 'C', 'G', 'T']

In [16]:
recursive_get_k_nuc(2)

['AA',
 'CA',
 'GA',
 'TA',
 'AC',
 'CC',
 'GC',
 'TC',
 'AG',
 'CG',
 'GG',
 'TG',
 'AT',
 'CT',
 'GT',
 'TT']

## List comprehension

The following operation is quite common in Python, where `xs` is a list and `f` is a function:

```
ys = []
for x in xs:
    ys.append(f(x))
```

This can be accomplished in a single line of code with:

```
ys = [f(x) for x in xs]
```

Pretty sweet!


In [17]:
# example of a list comprehension
names = ['nick', 'jane', 'bob', 'sally']
greetings = ['hi ' + name for name in names]
print(greetings)

['hi nick', 'hi jane', 'hi bob', 'hi sally']


In [20]:
# recursive solution using list comprehensions
def recursive_get_k_nuc_2(k):
    # base case: k == 0
    if k == 0:
        return []
    if k == 1:
        return ['A', 'C', 'G', 'T']
    nts = ['A', 'C', 'G', 'T']
    # get k-1 nucleotides
    km1nts = recursive_get_k_nuc_2(k-1)
    # construct k nucleotides
    knts = []
    for nt in nts:
        knts.extend([km1nt + nt for km1nt in km1nts])
    return knts
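
The outer loop with `extend` can likely be folded into the comprehension as well, since a list comprehension may have more than one `for` clause. A minimal sketch, where the function name `comprehension_get_k_nuc` is just a placeholder for this example:

```python
def comprehension_get_k_nuc(k):
    # base case: k == 0
    if k == 0:
        return []
    nts = ['A', 'C', 'G', 'T']
    if k == 1:
        return list(nts)
    # get k-1 nucleotides
    km1nts = comprehension_get_k_nuc(k - 1)
    # the for clauses read left to right like nested loops:
    # nt is the outer loop, km1nt is the inner loop
    return [km1nt + nt for nt in nts for km1nt in km1nts]

print(comprehension_get_k_nuc(2))
```

The clause order matters for the order of the output: with `nt` as the outer loop, the result comes out in the same order as the loop-based solutions above.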

In [29]:
# use itertools.product
from itertools import product
def recursive_get_k_nuc_3(k):
    # base case: k == 0
    if k == 0:
        return []
    if k == 1:
        return ['A', 'C', 'G', 'T']
    nts = ['A', 'C', 'G', 'T']
    # get k-1 nucleotides
    km1nts = recursive_get_k_nuc_3(k-1)
    # construct k nucleotides
    knts = [km1nt + nt for km1nt, nt in product(km1nts, nts)]
    return knts

In [52]:
# use python magic
from itertools import product
def get_k_nuc_4(k):
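    nts = ['A', 'C', 'G', 'T']
    # product(nts, repeat=k) yields every length-k tuple of nucleotides
    # note: for k == 0 this returns [''] rather than [] like the solutions above
    knts = [''.join(x) for x in product(nts, repeat=k)]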
    return knts

In [32]:
nts = ['A', 'C', 'G', 'T']

In [60]:
# one liner
k = 2
knts = [''.join(x) for x in product(*[['A', 'C', 'G', 'T']]*k)]
print(knts)

['AA', 'AC', 'AG', 'AT', 'CA', 'CC', 'CG', 'CT', 'GA', 'GC', 'GG', 'GT', 'TA', 'TC', 'TG', 'TT']
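
Since the full list is impractical for large `k`, one option is to generate the sequences lazily instead of storing them all. This is only a sketch: the generator name `iter_k_nucleotides` is made up here, and it relies on `itertools.product` producing its tuples one at a time.

```python
from itertools import islice, product

def iter_k_nucleotides(k):
    # yield k-nucleotides one at a time instead of building the whole list
    for letters in product("ACGT", repeat=k):
        yield ''.join(letters)

# look at the first few 20-nucleotides without materializing all 4**20 of them
first = list(islice(iter_k_nucleotides(20), 5))
print(first)
```

This keeps memory use small, but looping over every sequence still takes time proportional to $4^k$.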
