# Midterm exam 1

There are 4 questions, each worth 25 points. Write Python code to solve each question.

Points will be deducted for:

- Functions or classes without `docstrings`
- Grossly inefficient or redundant code
- Excessively verbose code
- Use of *magic* numbers

Partial credit may be given for incomplete or wrong answers, but not for questions you do not attempt.

You should only have this notebook tab open during the exam and stay on the same notebook throughout. You may use built-in help, accessed via `?foo`, `foo?` or `help(foo)`.

**IMPORTANT**

- This is a **closed book** exam meant to evaluate fluency in Python
- Use a stopwatch and **honestly** record in the cell below the number of minutes you took to complete the exam. 1 point will be deducted for every 2 minutes beyond 75 minutes. So if you take 90 minutes to complete the exam (15 minutes over, rounded up), 8 points will be deducted.
- Upload the notebook to Sakai when done

**Honor Code**: You agree to follow the Duke Honor code when taking this exam.

**Time taken**

Time: xx mins

**1**. (25 points)

Find the number of times `CATCAT` appears in the file `seq.txt`.

- Count overlapping occurrences - i.e. `CATCATCAT` should count as 2 occurrences.

In [1]:
with open('seq.txt') as f:
    seq = f.read()

In [2]:
import re

In [3]:
len(re.findall('(?=(CATCAT))', seq))

141

In [4]:
count = 0
for x in zip(seq, seq[1:], seq[2:], seq[3:], seq[4:], seq[5:]):
    if ''.join(x) == 'CATCAT':
        count += 1
count

141
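
As a cross-check on the two approaches above, overlapping matches can also be counted with plain string search by restarting the search one character after each hit. This is only a sketch; the helper name `count_overlapping` is hypothetical and not part of the original solution.

```python
def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Resume one character after the last hit so overlaps are found.
        start = text.find(pattern, start + 1)
    return count


example = count_overlapping('CATCATCAT', 'CATCAT')  # 2, as in the question
```

Calling `count_overlapping(seq, 'CATCAT')` on the contents of `seq.txt` should agree with the regular expression and `zip` versions.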

**2**. (25 points)

Suppose you have two decks of cards, each numbered from 1 to 1,000. We define a *match* to occur if the cards in the same position in both decks have the same number. For example, if deck 1 is [1,3,2,4] and deck 2 is [3,4,2,1], there is a single match at position 3 for the card with value 2.

Assuming the cards in each set are randomly shuffled, use 100,000 simulations to estimate

- the expected number of matches (round to the nearest integer)
- the probability of finding at least one match

Hint: You can use `np.random.permutation`

In [5]:
import numpy as np

In [6]:
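n = 1000       # cards per deck, as stated in the question
reps = 100000  # number of simulations

s = 0  # running total of matches
t = 0  # number of simulations with at least one match
for r in range(reps):
    x = np.random.permutation(n)
    y = np.random.permutation(n)
    s += np.sum(x == y)
    t += np.any(x == y)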

In [7]:
int(round(s/reps))

1

In [8]:
t/reps

0.6308
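
The simulated values can be compared against exact ones. By linearity of expectation, each position matches with probability $1/n$, so the expected number of matches is exactly 1 for any deck size. The probability of no match is the derangement sum $\sum_{k=0}^{n} (-1)^k / k!$, which approaches $1/e$. The sketch below evaluates that sum; the function name `prob_at_least_one_match` is hypothetical and is offered only as a sanity check, not as the expected exam answer.

```python
import math


def prob_at_least_one_match(n):
    """Exact P(at least one match) for two shuffled decks of n cards."""
    # P(no match) = sum_{k=0}^{n} (-1)^k / k!  (derangement formula).
    # Build each term from the previous one to avoid huge factorials.
    term = 1.0
    p_none = 1.0
    for k in range(1, n + 1):
        term *= -1.0 / k
        p_none += term
    return 1.0 - p_none


p_exact = prob_at_least_one_match(1000)
p_limit = 1.0 - math.exp(-1)  # limiting value, about 0.632
```

The simulated estimate should be close to both `p_exact` and `p_limit`.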

**3**. (25 points)

One way to find a root (zero) of a function between two points $(a, b)$ is to bisect $(a, b)$ (find the midpoint $c$), identify whether the root is now in $(a, c)$ or $(c, b)$, and repeat the bisection until the function value is sufficiently close to zero.

Write a bisection function with signature `bisect(f, a, b, tol)` and use it to find the square root of 2 given $a=0, b=2$. Stop when the function evaluated at the bisected point is within $10^{-6}$ of 0.

- Hint 1: There is a root between $a$ and $b$ if $f(a)$ and $f(b)$ have opposite signs
- Hint 2: Think about what the function $f$ should be

In [9]:
def bisect(f, a, b, tol=1e-6):
    """Bisection to find a root of f given the bracket (a, b)."""
    
    c = (a + b) / 2
    while np.abs(f(c)) > tol:
        if f(a) * f(c) < 0:
            b = c
        else:
            a = c
        c = (a + b) / 2
    return c

In [10]:
bisect(lambda x: 2-x**2, 0, 2)

1.4142136573791504
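
A slightly more defensive variant is sketched below. It applies Hint 1 by checking that the bracket actually contains a sign change, reuses the stored value of $f(a)$ instead of re-evaluating it on every pass, and caps the number of iterations. The name `bisect_checked` and the `max_iter` parameter are assumptions introduced for illustration; the question itself only asks for `bisect(f, a, b, tol)`.

```python
def bisect_checked(f, a, b, tol=1e-6, max_iter=100):
    """Find a root of f in (a, b) by bisection, validating the bracket."""
    fa, fb = f(a), f(b)
    # An endpoint may already be close enough to a root.
    if abs(fa) <= tol:
        return a
    if abs(fb) <= tol:
        return b
    if fa * fb > 0:
        raise ValueError('f(a) and f(b) must have opposite signs')
    for _ in range(max_iter):
        c = (a + b) / 2
        fc = f(c)
        if abs(fc) <= tol:
            return c
        if fa * fc < 0:
            b = c            # root lies in (a, c)
        else:
            a, fa = c, fc    # root lies in (c, b)
    raise RuntimeError('bisection did not converge within max_iter steps')


root = bisect_checked(lambda x: x**2 - 2, 0, 2)
```

Here `root` should agree with $\sqrt{2}$ to roughly six decimal places, since the stopping rule is on $|f(c)|$ rather than on the interval width.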

**4**. (25 points)

In a coin tossing example, you count the number of tosses until one of the following sequences appears:

- Seq 1: `HT`
- Seq 2: `HH`

For example, when stopping at Seq 2, `HTTHH` has a run length of 5.

Simulate 10,000 coin tossing experiments of the following kind:

- Expt 1: Stop when Seq 1 is observed
- Expt 2: Stop when Seq 2 is observed
- Expt 3: Stop when Seq 1 *or* Seq 2 is observed

Report the average run length of experiments 1, 2 and 3, rounding to the nearest integer.

In [11]:
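def runs(n, stop):
    """Simulate n experiments and return the run length of each.

    stop is a predicate on the last two tosses that ends an experiment.
    """
    lengths = np.zeros(n, dtype='int')
    for i in range(n):
        seq = ''
        while True:
            seq += np.random.choice(['H', 'T'])
            if stop(seq[-2:]):
                lengths[i] = len(seq)
                break
    return lengths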

In [12]:
seq1 = 'HT'
seq2 = 'HH'
tosses = 10000  # number of experiments to simulate (not individual tosses)

e1 = runs(tosses, lambda x: x==seq1)
e2 = runs(tosses, lambda x: x==seq2)
e3 = runs(tosses, lambda x: (x == seq1) or (x == seq2))

In [13]:
[int(np.round(np.mean(e))) for e in [e1,e2,e3]]

[4, 6, 3]
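
The rounded averages match the theoretical expected run lengths of 4, 6 and 3 tosses, which follow from conditioning on the first toss. As an independent check, the sketch below repeats the simulation with only the standard library, tracking just the previous toss instead of the whole sequence. The function name `mean_run_length`, its `seed` parameter, and the restriction to two-toss stop patterns are assumptions made for this illustration.

```python
import random


def mean_run_length(stop_patterns, n_expts=10_000, seed=0):
    """Average tosses until any two-toss pattern in stop_patterns appears."""
    rng = random.Random(seed)  # seeded for reproducibility
    total = 0
    for _ in range(n_expts):
        prev, tosses = '', 0
        while True:
            cur = rng.choice('HT')
            tosses += 1
            # Only the last two tosses matter for a two-toss pattern.
            if prev + cur in stop_patterns:
                break
            prev = cur
        total += tosses
    return total / n_expts


means = [mean_run_length(p) for p in ({'HT'}, {'HH'}, {'HT', 'HH'})]
```

The three entries of `means` should land close to 4, 6 and 3 respectively, up to simulation noise.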