# Python elements for exercises on "Decisions and Loops"

## Finding a motif in DNA (SUBS)
Here we have to find locations of a given substring `t` in a given string `s`.  
The result for Rosalind has to be a line of numbers separated by spaces.  
For practising lists, we first build a list of numbers (positions).  
Suppose we have found list `locs`.

In [None]:
locs = [2, 4, 10]

Write a small loop that prints this list as  
`2 4 10`  
(of course it should work for any `locs`).

In [None]:
print locs # too crude; Rosalind won't accept this format
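
One possible shape for such a loop, as a minimal sketch: convert each number to a string, collect the strings, and join them with spaces. The names `parts` and `line` are illustrative only.

```python
locs = [2, 4, 10]

# Collect the string form of every position.
parts = []
for loc in locs:
    parts.append(str(loc))

# Glue the pieces together with single spaces in between.
line = ' '.join(parts)
print(line)  # prints: 2 4 10
```

Joining strings avoids the trailing space that printing the numbers one by one would leave behind.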

Now, finding `locs`: we don't know how many elements the list will have, so we build the list on the fly:

In [44]:
locs = []
# inside loop:
locs += [2]
# or:
locs.append(4)
# or:
locs.extend([10])
#after the loop:
print locs

[2, 4, 10]


If you don't see yet what the code does, do some more experiments. Create new code cells as you need them.

Finally, finding the motifs (substrings).  
Make a loop that visits each potential starting position `p`.  
Inside the loop, use slicing in `s` and compare the result to `t`.  
And then, conditionally, add an element to the list of locations (as above).  
Use as many code cells as you want to create the program parts.

In [45]:
seq = raw_input('what is the source sequence?')
sub_seq = raw_input('what is the sub sequence?')

pos_list = []
length = len(seq)
length_sub = len(sub_seq)

for start_point in range(length - length_sub + 1):  # + 1 so that a match at the very end is not skipped
    end_point = start_point + length_sub
    small_seq = seq[start_point:end_point]
    if small_seq == sub_seq:
        pos = start_point + 1
        pos_str = str(pos)
        pos_list.append(pos_str)
positions = ' '.join(pos_list)
print positions

what is the source sequence?GATATATGCATATACTT
what is the sub sequence?ATAT
2 4 10
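
As an alternative to slicing at every position, the built-in string method `find` can do the searching. The following is only a sketch of that variant; the function name `find_all` is made up for this example. Restarting the search one position after each hit keeps overlapping matches.

```python
def find_all(s, t):
    """Return the 1-based start positions of t in s, overlaps included."""
    locs = []
    p = s.find(t)              # index of the first match, or -1 if none
    while p != -1:
        locs.append(p + 1)     # Rosalind counts positions from 1
        p = s.find(t, p + 1)   # continue one step further to allow overlaps
    return locs

print(find_all('GATATATGCATATACTT', 'ATAT'))  # prints: [2, 4, 10]
print(find_all('AAGT', 'GT'))                 # prints: [3]
```

The second call checks a motif that sits at the very end of the sequence, a case that is easy to miss with an off-by-one loop bound.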


Put it all together into one script. Here you can copy the input from Rosalind into the input fields of `raw_input`.

In [None]:
t = raw_input('... ') # fill in the dots
s = raw_input('... ') # fill in the dots
# ... your code
print 'answer' # to be replaced by your code

Now copy this script into a file of its own and save it with file type `.py`.  
Then run it from a terminal with `python <filename>` (where `<filename>` is the name of your Python script).  
[Or include the shebang line in the script and make the file executable.]

## Binomial distribution

First let's try the random generator. It is defined in module `random`.

In [18]:
import random # needed only once at the start

Function `random()` (in module `random`) gives a number between 0 (inclusive) and 1 (exclusive).  
Each time the function is called it gives another random number, or perhaps very rarely the same number twice.  
Because of the underlying algorithms, the sequence of numbers generated will have very good statistical properties.

In [19]:
print random.random()
print random.random()
print random.random()
# and try it again, and again, ...

0.261214686953
0.744018408776
0.633866850179
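
While experimenting it can help to get the same "random" numbers on every run. A small sketch using `random.seed`; the seed value 42 is an arbitrary choice for this example.

```python
import random

random.seed(42)   # assumption: 42 is just an example seed
first = [random.random() for i in range(3)]

random.seed(42)   # the same seed restarts the same sequence
second = [random.random() for i in range(3)]

print(first == second)  # prints: True
```

Without a call to `random.seed`, every run starts from a different state, which is what you normally want for the exercises.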


Let's use the random generator for deciding something with probability `p`.

In [55]:
p = 0.2
if p > random.random():
    print 'Y'
else:
    print 'n'

n


How do we see that the probability is actually equal to `p`?  
Let's repeat these lines to show a series of results.  
With ten entries per line we expect about two `Y`s per line (for `p=0.2`), sometimes a bit more or less.  
Occasionally it's much more. In fact, the number of `Y`s per line follows exactly the binomial distribution.

In [21]:
for x in range(5): # five lines
    for i in range(10): # ten entries per line
        if p > random.random():
            print 'Y', # comma means no newline
        else:
            print 'n',
    print # finally a newline

n Y Y n Y n n n Y n
n n n n n Y n n n n
n n n n Y n n n n n
Y n n n n n n n n n
n Y n n n Y n n n n
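
A more direct check, sketched here with illustrative names (`trials`, `hits`, `fraction`): repeat the decision many times and see which fraction of the outcomes is a hit. The fixed seed is only there to make the run repeatable.

```python
import random

random.seed(1)        # assumption: fixed seed so a rerun gives the same result
p = 0.2
trials = 100000       # number of decisions to simulate

hits = 0
for i in range(trials):
    if p > random.random():
        hits += 1

fraction = hits / float(trials)   # float() keeps this a true division
print(fraction)
```

With this many trials the printed fraction should be close to `p`; with only a handful of trials it can be far off.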


Now, we want to keep counts instead of printing all single choices.  
Let's say we have done 100 series of 10 with probability 0.2.  
The counts could be: 12x 0, 30x 1, 34x 2, 14x 3, 5x 4, 4x 5, 1x 6, 0x 7, ... 0x 10.   
(this is actual output of an experiment with the binomial distribution).

In [22]:
counts = [12, 30, 34, 14, 5, 4, 1, 0, 0, 0, 0]
print 'total', sum(counts)
for c in counts:
    print c

total 100
12
30
34
14
5
4
1
0
0
0
0


Of course, you can do better than that. Make it into a nice table.  
[Tooltip: use string formatting.]

In [4]:
# print nice table
counts = [12, 30, 34, 14, 5, 4, 1, 0, 0, 0, 0]
num = range(len(counts))  # 0..10, so the last entry is included too
for c in num:
    print('%d\t%d' % (c,counts[c]))


0	12
1	30
2	34
3	14
4	5
5	4
6	1
7	0
8	0
9	0
10	0
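
For comparison, the counts expected on average can be computed from the binomial formula: the probability of `k` hits in `n` tries is C(n, k) · p^k · (1 − p)^(n − k). The sketch below multiplies that probability by the number of series; `binomial_pmf`, `series` and `expected` are names chosen for this example.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k hits in n tries with hit probability p."""
    coeff = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))
    return coeff * p ** k * (1 - p) ** (n - k)

n, p, series = 10, 0.2, 100
expected = [series * binomial_pmf(k, n, p) for k in range(n + 1)]

# print k and the expected number of series with k hits
for k in range(n + 1):
    print('%d\t%.1f' % (k, expected[k]))
```

For these values the first three expected counts are roughly 10.7, 26.8 and 30.2, in the same range as the experimental 12, 30 and 34.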


What remains is creating and filling list `counts`.  
In this case, we can create the list as `n+1` zeros (0..`n` including `n`)

In [None]:
# create a list of n+1 zeros (positions 0..n)
counts = [0] * (n + 1)

In [None]:
# fill the list by a loop like the one that printed repeated decisions

Put it all together into one script.  
Maybe the script can actually ask for values of `n` and `p`.

In [13]:
# your complete script
import random
p = input('p: ')
n = input('n: ')
num = int(n) + 1
p_flo = float(p)
k = range(0,num)
re_list = []
counts = []
for x in k:
    for y in range(0,x):
        if p_flo > random.random():
            result = 'Y'
            re_list.append(result)
        else:
            result = 'n'
            re_list.append(result)
        count = re_list.count('Y')
        counts.append(count)

for c in k:
    print('%d\t%d' % (c,counts[c]))


p: 0.5
n: 20
0	1
1	1
2	1
3	1
4	1
5	1
6	1
7	1
8	2
9	2
10	3
11	3
12	3
13	3
14	3
15	4
16	5
17	6
18	7
19	7
20	7
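
In the table above the second column never decreases, which hints that the cell keeps a running total of `'Y'` results instead of one tally per series. The following is a minimal sketch of the tally described in this section, not a definitive solution. It assumes an extra parameter `series` (the number of repetitions), which the cell above does not ask for; the function name `tally_successes` is also made up.

```python
import random

def tally_successes(n, p, series):
    """Run `series` series of n decisions; counts[k] = number of series with k hits."""
    counts = [0] * (n + 1)           # one slot for each possible result 0..n
    for s in range(series):
        hits = 0
        for i in range(n):
            if p > random.random():
                hits += 1
        counts[hits] += 1            # book this series under its number of hits
    return counts

random.seed(0)                       # assumption: fixed seed for a repeatable run
counts = tally_successes(20, 0.5, 1000)
for k in range(len(counts)):
    print('%d\t%d' % (k, counts[k]))
```

Here the counts add up to the number of series, and for `p = 0.5` they should peak around `k = 10`.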


Again, save and run your script also outside this notebook.

## Weighted chances (WoF)

This list of chances corresponds to the picture in the assignment text.

In [15]:
chances = [100.0, 50.0, 50.0, 75.0, 75.0, 150.0]

The total of these chances is not 1.0 or 100.

In [16]:
sum(chances) # or store it in a variable for later reference

500.0

Find the (relative) distance as a random number between 0 and this total.  
[Tooltip: multiply the outcome of the random number generator by this total.]

In [17]:
rand_n = random.random()
print(rand_n)
total_chances = sum(chances)
distance = total_chances * rand_n
print(distance)

0.7755189782523502
387.7594891261751


For each chance in the list, subtract it from the distance.  
Meanwhile, for each element of the list, check whether this is the position where distance will become negative.  
Store that position in a variable. After the loop print its value. (Oh, and you might have to initialize the variable before the loop.)

In [18]:
# initialize result position
position = None
remaining = distance  # work on a copy, so distance keeps its value
# loop and adjust distance, meanwhile checking position
for index, chance in enumerate(chances):
    remaining -= chance
    if remaining < 0:  # this chance made the distance negative
        position = index
        break
# print result position
print(position)

5


In [9]:
# total script; when copying to file, also include import random
chances = [100.0, 50.0, 50.0, 75.0, 75.0, 150.0]
rand_n = random.random()
total_chances = sum(chances)
distance = total_chances * rand_n
distance_start = 0
distance_end = 0
pos_ind = 0
for chance in chances:
        distance_end = distance_start + chance
        if distance > distance_start and distance < distance_end:
            position.append(pos_ind)
        distance_start += chance
        pos_ind += 1
position = pos_ind
print(position)

6
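
Because `pos_ind` is incremented once per entry and only printed after the loop has finished, the cell above always reports the number of entries (6 here) rather than the chosen one. A hedged sketch of the selection as a function follows; `pick_index` and its parameter `r` (a random number between 0 and 1) are names invented for this example.

```python
import random

def pick_index(chances, r):
    """Return the index of the entry hit by r, where 0 <= r < 1."""
    distance = r * sum(chances)
    for index, chance in enumerate(chances):
        distance -= chance
        if distance < 0:           # this entry made the distance negative
            return index
    return len(chances) - 1        # safety net against rounding errors

chances = [100.0, 50.0, 50.0, 75.0, 75.0, 150.0]
print(pick_index(chances, random.random()))
```

Passing the random number in as an argument makes the function easy to check by hand: `pick_index(chances, 0.25)` corresponds to a distance of 125, which falls in the second entry (index 1).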


## Testing the Wheel of Fortune
In fact, almost all elements have been done in the previous two exercises.  
When copying code from the last exercise into a loop, you have to increase the indentation level.

In [8]:
# we leave any remaining details to you
# again copy the script into its own file and run it
import random
chances = [100.0, 50.0, 50.0, 75.0, 150.0]
names = ['A','B','C','D','E']
n = input('how many times you want to run? ')
total_num = int(n) + 1
range_list = range(0,total_num)
position = []
for num in range_list:
    rand_n = random.random()
    total_chances = sum(chances)
    distance = total_chances * rand_n
    distance_start = 0
    distance_end = 0
    pos_ind = 0
    for chance in chances:
        distance_end = distance_start + chance
        # half-open interval, so a distance exactly on a boundary is not lost
        if distance_start <= distance < distance_end:
            position.append(pos_ind)
        distance_start += chance
        pos_ind += 1
# count the hits per entry of the wheel (one count per element of chances)
counts_list = []
for pos_index in range(len(chances)):
    count = position.count(pos_index)
    counts_list.append(count)

print('%s\t%s\t%s' % ('student','chance','count'))
for ind in range(len(chances)):
    print('%s\t%1.1f\t%d' % (names[ind],chances[ind],counts_list[ind]))

how many times you want to run? 50000
student	chance	count
A	100.0	11634
B	50.0	6006
C	50.0	6048
D	75.0	8840
E	150.0	17473
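
To judge the test, the observed fractions can be set against the expected ones, `chance / sum(chances)`. The sketch below uses the counts printed above; the variable names are illustrative. Note that these counts add up to 50001, one more than requested, because `range(0, int(n) + 1)` performs `n + 1` runs.

```python
chances = [100.0, 50.0, 50.0, 75.0, 150.0]
counts = [11634, 6006, 6048, 8840, 17473]   # taken from the output above

total_chances = sum(chances)
total_counts = sum(counts)

differences = []
for i in range(len(chances)):
    expected = chances[i] / total_chances
    observed = counts[i] / float(total_counts)
    differences.append(abs(observed - expected))
    print('%.4f\t%.4f' % (expected, observed))
```

Every observed fraction lies within 0.01 of the expected one, which is what you would hope for after some 50000 runs.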


## Rabbits and Recurrence Relations (FIB)

First get `n` and `k` from the 'user'.  
(Blank code cells mean you have to do it yourself.)

Create a list of results and set it to `[0, 1, 1]` as given in the assignment text.  
So we have the values for `i=0 i=1 i=2`.

For filling the rest of the list we need a loop. (actual filling: see below)  
The first entry to fill is for `i=3`; the last one for `i=n`.  
[Tooltip: function `range(x,y)` goes from `x` inclusive to `y` exclusive.]

For computing the next entry, you have to refer to the preceding two entries.  
If `i` is the position of the entry to fill, the preceding entries have positions `i-2` and `i-1`.  
Inside the loop, write a line of code that computes the next entry and then expands the list with that new entry.

In [None]:
# copy the loop and add the necessary code.
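
The steps above can be sketched as follows. The recurrence itself is not spelled out in this notebook; the sketch assumes the one from the Rosalind FIB problem, in which every pair of reproductive age produces `k` new pairs, so entry `i` equals entry `i-1` plus `k` times entry `i-2`. The function name `rabbit_pairs` is made up.

```python
def rabbit_pairs(n, k):
    """Number of rabbit pairs after n months with k new pairs per litter."""
    results = [0, 1, 1]                    # values for i=0, i=1, i=2
    for i in range(3, n + 1):              # fills i=3 up to and including i=n
        # assumed recurrence: last month's pairs plus k offspring pairs
        # for every pair that existed two months ago
        results.append(results[i - 1] + k * results[i - 2])
    return results[n]

print(rabbit_pairs(5, 3))  # prints: 19
```

For `n = 5` and `k = 3` the list grows to `[0, 1, 1, 4, 7, 19]`.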

Print the result and it's all done.

Finally, put everything together again, etc.