# Midterm exam 1

There are 4 questions, each worth 25 points. Write Python code to solve each question.

Points will be deducted for:

- Functions or classes without `docstrings`
- Grossly inefficient or redundant code
- Excessively verbose code
- Use of *magic* numbers

Partial credit may be given for incomplete or incorrect answers, but not for questions you do not attempt.

You should only have this notebook tab open during the exam and stay on the same notebook throughout. You may use built-in help, accessed via `?foo`, `foo?` or `help(foo)`.

**IMPORTANT**

- This is a **closed book** exam meant to evaluate fluency in Python
- Use a stopwatch and **honestly** record, in the cell below, the number of minutes you took to complete the exam. 1 point will be deducted for every 2 minutes beyond 75 minutes. For example, if you take 90 minutes to complete the exam, you are 15 minutes over and 8 points will be deducted.
- Upload the notebook to Sakai when done

**Honor Code**: You agree to follow the Duke Honor code when taking this exam.

**Time taken**

Time: xx mins

**1**. (25 points)

Create a DataFrame showing the counts of all possible transitions between the letters A, C, T in the file `seq.txt`. It should have shape (3, 3), and each cell should contain the number of times that transition (e.g. $A \to C$) occurs. The rows and columns of the DataFrame should be labeled with the letters A, C, T.

- Convert this to a stochastic matrix - i.e. one where each *row* sums to 1


**Hint**: For the sequence `AAATAT` the transition counts would be

- `AA` = 2
- `AT` = 2
- `TA` = 1
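The counts in the hint can be reproduced by pairing each letter with its successor. The following is a minimal sketch using only a plain dictionary; the helper name `count_transitions` is hypothetical and is not something the exam provides.

```python
def count_transitions(seq):
    """Count adjacent-letter transitions in seq using a plain dict."""
    counts = {}
    # zip(seq, seq[1:]) yields each letter together with the one after it
    for a, b in zip(seq, seq[1:]):
        pair = a + b
        counts[pair] = counts.get(pair, 0) + 1
    return counts

hint_counts = count_transitions('AAATAT')
print(hint_counts)  # {'AA': 2, 'AT': 2, 'TA': 1}
```

Pairs that never occur (such as `TT` here) are simply absent from the dictionary, so a lookup with a default of 0 is needed when filling the full table.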

In [None]:
with open('seq.txt') as f:
    seq = f.read()

In [None]:
# Count each adjacent pair of characters, e.g. 'AC' for A followed by C
d = {}
for x in zip(seq, seq[1:]):
    k = ''.join(x)
    d[k] = d.get(k, 0) + 1

In [None]:
import numpy as np

In [None]:
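# Arrange the counts in a square array: rows are the first letter, columns the second
letters = 'ACT'
m = np.zeros((len(letters), len(letters)), dtype='int')
for i, x in enumerate(letters):
    for j, y in enumerate(letters):
        m[i, j] = d.get(x + y, 0)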

In [None]:
m

In [None]:
import pandas as pd

In [None]:
# Divide each row by its sum so that every row sums to 1
pd.DataFrame(m / m.sum(axis=1)[:, None], index=list('ACT'), columns=list('ACT'))
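The row normalization can also be written with labeled pandas operations. The sketch below uses hypothetical counts chosen only for illustration (they are not taken from `seq.txt`), and it assumes every row has at least one transition; a row of zeros would produce `NaN` entries after the division.

```python
import pandas as pd

letters = list('ACT')
# Hypothetical transition counts, for illustration only
counts = pd.DataFrame([[2, 1, 1],
                       [1, 0, 1],
                       [3, 0, 1]], index=letters, columns=letters)

# Divide each row by its own total; axis=0 aligns the row sums with the row labels
probs = counts.div(counts.sum(axis=1), axis=0)
print(probs)
print(probs.sum(axis=1))  # each row should sum to 1
```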

**2**. (25 points)

Using only `map` and `reduce` (from `functools`) and anonymous functions, convert the strings given into a generator of lower case words. Find the most frequently occurring word using only a Python dictionary.

In [None]:
s1 = 'The quick brown fox jumps over the lazy brown dog'
s2 = 'How now brown cow'
s3 = 'Jack and Jill went up the hill'

In [None]:
ss = [s1, s2, s3]

In [None]:
from functools import reduce

In [None]:
# Split each string into lower-case words, concatenate the lists, then tally with a dict
counter = {}
for word in reduce(lambda x, y: x + y,
                   map(lambda x: x.lower().split(), ss)):
    counter[word] = counter.get(word, 0) + 1

In [None]:
# Report every word tied for the highest count
n = max(counter.values())
{(k, v) for k, v in counter.items() if v == n}
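As a cross-check of the dictionary tally, the same word list can be counted with `collections.Counter`. This is only a verification sketch: the question asks for a plain dictionary, so `Counter` would not be an acceptable exam answer. The strings are repeated here so that the block runs on its own.

```python
from collections import Counter
from functools import reduce

ss = ['The quick brown fox jumps over the lazy brown dog',
      'How now brown cow',
      'Jack and Jill went up the hill']

# Same map/reduce pipeline as above: one flat list of lower-case words
words = reduce(lambda x, y: x + y, map(lambda s: s.lower().split(), ss))

check = Counter(words)
top = max(check.values())
most_common = {w for w, c in check.items() if c == top}
print(top, sorted(most_common))  # 3 ['brown', 'the']
```

Two words tie for the highest count, which is why the answer reports a set of words and not a single one.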

**3**. (25 points)

Define a function that returns `True` if a given integer $n$ is prime and `False` otherwise. Do this as efficiently as possible.

- Count the number of primes in `nums.txt`

In [None]:
def is_prime(n):
    """Check if n is prime."""
    
    if n == 2:
        return True
    elif n < 2 or n % 2 == 0:
        return False
    else:
        for i in range(3, int(np.sqrt(n))+1, 2):
            if n % i == 0:
                return False
    return True

In [None]:
nums = np.loadtxt('nums.txt', dtype='int')

In [None]:
[n for n in range(-5, 20) if is_prime(n)]

In [None]:
len(nums)

In [None]:
len(list(filter(is_prime, nums)))
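One possible variation on the trial-division idea replaces the floating-point `np.sqrt` with `math.isqrt` (available from Python 3.8), which returns an exact integer square root and so avoids rounding concerns for very large $n$. The name `is_prime_isqrt` is hypothetical, and this is a sketch, not the reference solution.

```python
import math

def is_prime_isqrt(n):
    """Check if n is prime using trial division up to the integer square root."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    # Only odd candidate divisors up to floor(sqrt(n)) need to be tested
    for i in range(3, math.isqrt(n) + 1, 2):
        if n % i == 0:
            return False
    return True

small_primes = [n for n in range(-5, 20) if is_prime_isqrt(n)]
print(small_primes)  # [2, 3, 5, 7, 11, 13, 17, 19]
```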

**4**. (25 points)

Consider the following function $f(x) = rx(1-x)$. 

For a particular value of $r$, iteratively evaluate $f$ 100 times, each time using the output as the next input $x$. 

- Let $r$ take values from the sequence [0, 0.01, ..., 4]
- For each value of $r$, find the final value $y$ returned by $f$ for each of $m = 50$ different random starting values of $x$ drawn from the standard uniform distribution

For example, if $r=2$ and $x=0.1$, the iterations would return the values (rounded to one decimal place)

$0.1 \to 0.2 \to 0.3 \to 0.4 \to 0.5 \to 0.5 \to \ldots \to 0.5$

and the value recorded as $y$ would be 0.5.
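The trajectory in this example can be traced directly. The sketch below prints the first few iterates for $r=2$, $x=0.1$ to four decimal places; rounding them to one decimal place gives the sequence shown in the example. The names `logistic_step`, `trace` and `n_steps` are illustrative only.

```python
def logistic_step(x, r):
    """One application of f(x) = r * x * (1 - x)."""
    return r * x * (1 - x)

r_demo, x_demo = 2, 0.1
n_steps = 6  # enough steps to see the value settle near 0.5

trace = [x_demo]
for _ in range(n_steps):
    x_demo = logistic_step(x_demo, r_demo)
    trace.append(x_demo)

print([round(v, 4) for v in trace])
```

The first step gives $2 \times 0.1 \times 0.9 = 0.18$, and later iterates approach the fixed point $0.5$, where $f(0.5) = 2 \times 0.5 \times 0.5 = 0.5$.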

- Make a scatter plot of $y$ against $r$, using `s=1` for the marker size

Your figure should look like this:

![img](bif.png)

In [None]:
def f(x, r):
    """Logistic map: one step of x -> r*x*(1-x)."""
    
    return r*x*(1-x)

def fn(x, r, n):
    """Apply the logistic map n times, starting from x."""
    
    for _ in range(n):
        x = f(x, r)
    return x

In [None]:
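n = 100  # iterations of f per starting point
m = 50   # number of random starting points

# 401 evenly spaced values of r: 0, 0.01, ..., 4
r = np.linspace(0, 4, 401)
xs = np.random.rand(m)

# fn broadcasts over the array r, so each entry of ys holds the final y for every r
ys = [fn(x, r, n) for x in xs]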

In [None]:
import matplotlib.pyplot as plt

In [None]:
# np.tile repeats r once per starting point, matching the row-by-row order of ys
plt.scatter(np.tile(r, m), np.ravel(ys), s=1)
pass