# W3D2 Lists revisited

## Exercise 1: SUBS

Find **every** match of a target string in a DNA sequence:

`GATATATGCATATACTT
 ATAT
   ATAT
         ATAT
`

Finding a solution - how? 
Let's start with a string method... which one might be appropriate?
Again, just ASK the object what you can do with it, you might find some clues...
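
One way of "asking" is sketched below, assuming a plain Python session: `dir()` returns the names of all attributes of an object, and dropping the names that start with an underscore leaves the public string methods. The variable names `public_methods` and `candidates` are only illustrative, and the shortlist of search-related names is hand-picked for this example.

```python
seq = 'GATATATGCATATACTT'

# dir() lists every attribute name of the object; names starting with an
# underscore are internal, so we skip those.
public_methods = [name for name in dir(seq) if not name.startswith('_')]

# Hand-picked shortlist of methods whose names hint at searching or counting
candidates = [name for name in public_methods
              if name in ('count', 'find', 'index', 'rfind', 'rindex')]
print(candidates)
```

Any of these can then be inspected further with `help(seq.find)` and the like.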

In [None]:
seq = 'GATATATGCATATACTT'
target = 'ATAT'

seq.find(target)

So, 'str.find' does not give a perfect solution: it only returns the index of the first match. What about a regular expression? We could use 're', the regular expression library of Python.

In [None]:
import re
re.findall(target,seq)

A bit better, but we only find two out of three matches. This is the same issue we encountered before when trying regular expressions: re matches cannot overlap. If they do, you have to apply a bit of trickery. That trickery is 'slicing': chopping your string up into smaller pieces, e.g. by 'sliding' a window over it. For instance:

In [None]:
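#seq = raw_input('s = ')
#target = raw_input('t = ')

seq = 'GATATATGCATATACTT'
target = 'ATAT'
n = len(target)
locs = list()
# +1 so the last possible window (starting at len(seq)-n) is also checked
for i in range(len(seq)-n+1):
    #print(seq[i:i+n])
    if seq[i:i+n] == target:
        locs += [i+1] #1-based position; alternatively: locs.append(i+1)
        #print(locs)
# print all locations on one line, separated by spaces
print(' '.join(str(loc) for loc in locs))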
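
The sliding-window idea can also be packaged as a reusable function. The sketch below is one possible way to do that; the function name `find_overlapping` is made up for this example and is not part of any library. It reports 1-based positions and includes the final window, so a match at the very end of the sequence is not missed.

```python
def find_overlapping(seq, target):
    """Return the 1-based start positions of every (possibly overlapping)
    occurrence of target in seq. Hypothetical helper for illustration."""
    n = len(target)
    # len(seq) - n is the last index where a window of length n still fits,
    # so the range has to run up to and including it.
    return [i + 1 for i in range(len(seq) - n + 1) if seq[i:i + n] == target]


locs = find_overlapping('GATATATGCATATACTT', 'ATAT')
print(' '.join(str(loc) for loc in locs))  # 2 4 10
```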


Ok, so that works! Is this the only way it would work then? No, as usual there are many ways to do it. Even the 'str.find()' method can be made to work. Let's see what hidden features the 'str.find()' method holds...

In [None]:
seq.find?

The help text shows that 'find' accepts optional 'start' and 'end' arguments. This means you can in fact pass an index value from which the 'find' method will start searching in the string. 
`GATATATGCATATACTT
 ATAT
   ATAT
         ATAT
`
This way you can give each 'find' operation a different starting point, beginning at the start of the sequence and sliding one DNA base further on every pass through the loop. This way you can also find the overlapping matches. But you do find the same location multiple times this way, and we're interested only in the unique ones. For this we can apply the 'set()' function, which returns the set of unique values from a list. And if there is no match from the given starting point onwards, the method will return '-1'. That value we can ignore. 

In [None]:
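locs = list()
# +1 so that a match at the very end of seq is not skipped
for i in range(len(seq)-len(target)+1):
    loc = seq.find(target,i)
    print(loc)
    if loc > -1:
        locs.append(loc+1)
print('complete list: %s' % locs)
# a set has no guaranteed order, so sort it for display
print('applying the set function: %s' % sorted(set(locs)))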
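
Two further variants, sketched here as possible alternatives rather than as the intended solution. The first lets 'str.find()' jump straight to the next hit by restarting the search one position after the previous match, so each location is found exactly once and no 'set()' is needed. The second goes back to the 're' module and wraps the target in a zero-width lookahead `(?=...)`, which matches without consuming characters and therefore allows overlaps. Both function names are invented for this example.

```python
import re


def find_with_restart(seq, target):
    """1-based positions of all occurrences, using str.find() repeatedly."""
    locs = []
    loc = seq.find(target)
    while loc != -1:
        locs.append(loc + 1)
        # Restart one base after the previous hit so overlaps are kept
        loc = seq.find(target, loc + 1)
    return locs


def find_with_lookahead(seq, target):
    """1-based positions of all occurrences, using a regex lookahead."""
    # re.escape guards against characters that are special in a pattern
    pattern = '(?=' + re.escape(target) + ')'
    return [m.start() + 1 for m in re.finditer(pattern, seq)]


seq = 'GATATATGCATATACTT'
target = 'ATAT'
print(find_with_restart(seq, target))
print(find_with_lookahead(seq, target))
```

Both calls should report the same three locations as the sliding-window approach.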

## Exercise 5: Fibonacci - rabbits

![Rabbits](Rabbits_explained.jpg)
The solution is to 'grow' a list based on its previous two elements. You simply take k times the second-to-last element plus the last element.

In [None]:
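n = int(raw_input('n = '))  # Python 2; on Python 3 use input() instead
k = int(raw_input('k = '))

# seq[0] is a placeholder so that seq[i] holds the value for month i
seq = [0, 1, 1]

for i in range(3, n+1):
    seq_i = seq[i-1] + k * seq[i-2]
    seq += [seq_i]
    print(seq)

print('answer: %d' % seq[n])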
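
The same recurrence, $F_n = F_{n-1} + k \cdot F_{n-2}$ with $F_1 = F_2 = 1$, can also be sketched as a function that keeps only the last two values instead of the whole list. This is a minimal sketch: the name `rabbit_pairs` is hypothetical, it assumes `n >= 1`, and it assumes (as in the usual statement of this problem) that `k` is the number of offspring pairs each mature pair produces per month.

```python
def rabbit_pairs(n, k):
    """Value of the recurrence F_n = F_(n-1) + k * F_(n-2), F_1 = F_2 = 1.
    Hypothetical helper; assumes n >= 1."""
    previous, current = 1, 1  # months 1 and 2
    for _ in range(3, n + 1):
        # New total = last month's total + k times the total of two months ago
        previous, current = current, current + k * previous
    return current


print(rabbit_pairs(5, 3))  # 19
```

With `k = 1` this reduces to the ordinary Fibonacci numbers, which makes for an easy sanity check.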
