**Runs Assignment**

**Important - read carefully**

- Failure to follow instructions will lead to point penalties.

- You may use numpy but **no other packages** in this assignment.

- When you are done with this assignment, evaluation of all cells should produce no errors.

- When asked to write a function, **use exactly the function name requested** and 
do not define any other function with the same name anywhere else in the notebook.

- When asked to make a **literal assignment** to a variable, you should 
    - use exactly the name of the variable requested on the left hand side
of the assignment
    - do **not** assign a value to that variable name anywhere else in the notebook
    - use **no functions, no variables, and no arithmetic operations on the right hand side (you can use a different cell to print out a number and then copy the number** to the right hand side of the assignment.
    - when asked for a literal floating point value in an assignment, for example
        - x = 67.5 **is** ok
        - x = "67.5" is **not** ok (that's a literal **string** assignment!)

**Execute the following cell**

In [1]:
import numpy as np

**Problem 1 (5 points):  Compute run data from list**

Given a list $L,$ we say a *run* of the value $v$ starts at position $i$ in the list if either

- $i=0$ and $L[i]=v$ or
- $i>0,$ $L[i]=v$ and $L[i-1]\neq v.$

when a run of the value $v$ starts at position $i$ we define the **length** of that run to be the **maximum** value of $m$ such that
$$
L[j]=v \mbox{ for } j=i,i+1,\ldots,i+m-1
$$

So, for example, in the list [1,1,1,2,3,3,1,1,2,1,2,5,5] if we consider runs of the value 1, there are 3 of them:

- starting at 0 and having length 3
- starting at 6 and having length 2
- starting at 9 and having length 1

Write a function *get_run_data* that takes as input the following arguments

- a list or 1-d numpy array **L**
- a value **v**

and whose output is a 

- a **list** of 2-tuples giving all pairs of the form $(p,m)$ where $p$ is the starting position of a run, and $m$ is its length.

If there are no runs of the value v the output should be an empty list.

In the example above, 

> get_run_data([1,1,1,2,3,3,1,1,2,1,2,5,5])

should return 

> [(0,3),(6,2),(9,1)].

We will refer to the length of the list you get in the output of the get_run_data program as the **number of runs** of the value $v$ in the input list.


**Important instructions**
- Your function should make use of Python **CONTROL FLOW** methods and **not any numpy functions.**
- Your code should work if the input is a list or a 1d numpy array
- Your code should be **self-contained** so if your code requires any extra user-defined functions put them **inside** the body of the get_run_data definition.
- Make sure you test your code on examples to ensure your code works correctly.
- An error in your code wil affect subsequent answers that rely on this code.
- The function definition for **get_run_data** should not appear in any other part of this notebook.

**Put your code for the function in the following cell.**

In [2]:
def get_run_data(L,v):

    #init
    res = []
    res_pair = tuple()
    point = 0
    num = 0
    #iterate
    for index in range(0, len(L) - 1):
        if L[index] == v:
            num += 1
        else:
            if L[index - 1] == v:
                point = abs(num - index)
                res_pair = (point, num)
                res.append(res_pair)
                num = 0
            else:
                continue  
    return res
    
get_run_data([1,1,1,2,3,3,1,1,2,1,2,5,5],1)
            
            

[(0, 3), (6, 2), (9, 1)]

**Problem 2 (5 points):**
    
If a biased coin with the probability of heads being 2/3 is flipped independently 250 times, let $N$ denote the number of runs of heads that occur. Use Monte-Carlo simulation with 100,000 trials and your get_run_data code to provide estimates for

- $E[N],$ the expected number of runs that occur, and call this **ENest**.
- $\sqrt{\mbox{Var}(N)},$ the standard deviation of $N$ and call this **SDNest**.
- a 95% confidence interval for the expected value of $N$ and call the lower and upper confidence bounds for this interval **LowerConfidenceBoundEN** and **UpperConfidenceBoundEN.**

Note - you should estimate the <u>standard deviation</u> of the number of runs using the <u>sample standard deviation</u>, which for a sequence of sample values $N_1,\ldots,N_n$ is defined as the quantity $S$ where

$$S^2=\sum_{i=0}^{n-1} (N_i - \overline{N})^2/(n-1) = {\sum_{i=0}^{n-1} N_i^2 - n \overline{N}^2 \over n-1}
$$

Be careful. This is not the same as the <u> standard error </u> of the estimator $\overline{N}$ of $E[N],$ which we can write as $S/\sqrt{n}.$ 
The standard error is an estimator of the quantity 

$\sqrt{\mbox{Var}(\overline{N})}$ 

Recall from basic statistics, that

$\sqrt{\mbox{Var}(\overline{N})}= \sqrt{\mbox{Var}(N)}/\sqrt{n}$ 

so while typically, $S$ gets closer and closer to the standard deviation
$\sqrt{\mbox{Var}(N)}$ as $n\rightarrow \infty,$ (typically a positive quantity)
the standard error approahes zero 
as $n\rightarrow \infty.$

Use the following cell for your code for this problem.

In [3]:
import random

#init
flip_times = 250
trials = 100000
ENest = 0
ENest_sum = 0
std_list = []

#biased coin simulate func
def biasedcoin():
    return random.choice(['H','T','H'])

#iterate
for i in range(trials):
    temp_list = []   
    for i in range(flip_times):
        res = biasedcoin()
        temp_list.append(res)
    #call the predefined func, return the length instead
    N_in_one_set = len(get_run_data(temp_list, 'H'))
    ENest_sum += N_in_one_set
    std_list.append(N_in_one_set)
ENest = ENest_sum / trials

#std
SDNest = np.std(std_list)

# confidence interval
LowerConfidenceBoundEN = ENest - 1.96*SDNest
UpperConfidenceBoundEN = ENest + 1.96*SDNest

print(ENest_sum)
print("ENest: = {0:7.5f}".format(ENest))   
print("SDNest: = {0:7.5f}".format(SDNest))   
print("95% Confidence Interval: = ({0:7.5f},{1:7.5f})".format(LowerConfidenceBoundEN,UpperConfidenceBoundEN))
            

5533001
ENest: = 55.33001
SDNest: = 4.30511
95% Confidence Interval: = (46.89200,63.76802)


**Answers for Problem 2**

In the following cell, assign **literal float values** to the variables 

- ENest
- SDNest
- LowerConfidenceBoundEN
- UpperConfidenceBoundEN

**Additional instructions - these apply to <u> all subsequent questions</u> below**

- Given that these are estimates you should round your answers. As a general rule of thumb **througout this notebook and throughout this course,** only include digits that you are **somewhat certain to be correct** (and please don't ask me to define what this means since I want you to use some judgement). 
- You should **not** assign values to these variables anywhere else in your notebook.
- In the work cell above you can print out values of variables using different names and copy those *literal** values in the right hand side below.

In [4]:
ENest=55.33001
SDNest=4.30511
LowerConfidenceBoundEN=46.89200
UpperConfidenceBoundEN=63.76802

**Problem 3 (5 points)**
    
If a biased coin with the probability of heads being 2/3 is flipped independently 250 times, let $M$ denote the length of the longest run of heads that occurs. Use Monte-Carlo simulation with 100,000 trials and your get_run_data code to provide estimates for

- $E[M],$ the expected length of the longest run that occur, and call this *EMest*.
- $\sqrt{\mbox{Var}(M)},$ the standard deviation of the longest run and call this *SDMest*.
- a 95% confidence interval for the expected value of $M$ and call the lower and upper confidence bounds for this interval *LowerConfidenceBoundEM* and *UpperConfidenceBoundEM.*

**Put your code for this problem in the following cell.**

In [5]:
#init
flip_times = 250
trials = 100000
EMest = 0
EMest_sum = 0
std_list = []

#biased coin simulate func
def biasedcoin():
    return random.choice(['H','T','H'])

#iterate
for i in range(trials):
    temp_list = []   
    for i in range(flip_times):
        res = biasedcoin()
        temp_list.append(res)
        #call the predefined func, return the length instead
    temp_res = get_run_data(temp_list, 'H')
    #extract the longest run for each 250 times set
    temp_longest_run_list = []
    for i in range(0, len(temp_res) - 1):
        temp_longest_run_list.append(temp_res[i][1])
    longest_run = max(temp_longest_run_list)
    #aggregation
    EMest_sum += longest_run
    std_list.append(longest_run)
EMest = EMest_sum / trials
#std
SDMest = np.std(std_list)
# confidence interval
LowerConfidenceBoundEN = EMest - 1.96*SDMest
UpperConfidenceBoundEN = EMest + 1.96*SDMest

print("EMest: = {0:7.5f}".format(EMest))   
print("SDMest: = {0:7.5f}".format(SDMest))   
print("95% Confidence Interval: = ({0:7.5f},{1:7.5f})".format(LowerConfidenceBoundEN,UpperConfidenceBoundEN))
            

1176890
EMest: = 11.76890
SDMest: = 3.07912
95% Confidence Interval: = (5.73383,17.80397)


**Answers for Problem 3**

In the following cell, assign **literal float values** to the variables 
- EMest
- SDMest
- LowerConfidenceBoundEM
- UpperConfidenceBoundEM

In [14]:
EMest=11.76890
SDMest=3.07912
LowerConfidenceBoundEM=5.73383
UpperConfidenceBoundEM=17.80397

**Problem 4 (5 points):**

Let $M$ denote the size of the longest run when 250 **fair** coins are flipped. Estimate the probability that $M \geq 10$ and get a 95% confidence interval for this probability based on 100,000 trials.

- call the estimate of the probability **pfairest** 
- call the confidence bounds **pfairlower** and **pfairupper**


**Put your code for this problem in the following cell**

In [7]:
#init
flip_times = 250
trials = 100000
pfairest = 0
M_10_count = 0
std_list = []


#biased coin simulate func
def biasedcoin():
    return random.choice(['H','T'])

#iterate
for i in range(trials):
    temp_list = []   
    for i in range(flip_times):
        res = biasedcoin()
        temp_list.append(res)
        #call the predefined func, return the length instead
    temp_res = get_run_data(temp_list, 'H')
    #extract the longest run for each 250 times set
    temp_longest_run_list = []
    for i in range(0, len(temp_res) - 1):
        temp_longest_run_list.append(temp_res[i][1])
    longest_run = max(temp_longest_run_list)
    #aggregation
    if longest_run >= 10:
        M_10_count += 1
        std_list.append(longest_run)
#prob
pfairest = M_10_count / trials

#std
M_std = np.std(std_list)
M_stderr = M_std / np.sqrt(len(std_list))

pfairlower = pfairest - 1.96*M_stderr
pfairupper = pfairest + 1.96*M_stderr
 
print("pfairest: = {0:7.5f}".format(pfairest))   
print("95% Confidence Interval: = ({0:7.5f},{1:7.5f})".format(pfairlower,pfairupper))
            

pfairest: = 0.10873
95% Confidence Interval: = (0.08152,0.13594)


**Answers to Problem 4**

In the following cell, assign **literal float values** to the variables

- pfairest
- pairlower
- pfairupper

In [13]:
pfairest=0.10873
pfairlower=0.08152
pfairupper=0.13594

**Problem 5 (5 points):**
Let $M$ denote the size of the longest run when 250 **biased** coins with the probability of heads being 2/3 are flipped. Estimate the probability that $M \geq 10$ and get a 95% confidence interval for this probability based on 100,000 trials.

- call the estimate of the probability **pbiasedest** 
- call the confidence bounds **pbiasedlower** and **pbiasedupper**


**Put your code for this problem in the following cell**

In [9]:
#init
flip_times = 250
trials = 100000
pfairest = 0
M_10_count = 0
std_list = []


#biased coin simulate func
def biasedcoin():
    return random.choice(['H', 'T', 'H'])

#iterate
for i in range(trials):
    temp_list = []   
    for i in range(flip_times):
        res = biasedcoin()
        temp_list.append(res)
        #call the predefined func, return the length instead
    temp_res = get_run_data(temp_list, 'H')
    #extract the longest run for each 250 times set
    temp_longest_run_list = []
    for i in range(0, len(temp_res) - 1):
        temp_longest_run_list.append(temp_res[i][1])
    longest_run = max(temp_longest_run_list)
    #aggregation
    if longest_run >= 10:
        M_10_count += 1
        std_list.append(longest_run)
#prob
pfairest = M_10_count / trials

#std
M_std = np.std(std_list)
M_stderr = M_std / np.sqrt(len(std_list))

pfairlower = pfairest - 1.96*M_stderr
pfairupper = pfairest + 1.96*M_stderr
 
print("pfairest: = {0:7.5f}".format(pfairest))   
print("95% Confidence Interval: = ({0:7.5f},{1:7.5f})".format(pfairlower,pfairupper))
            

pfairest: = 0.76194
95% Confidence Interval: = (0.74230,0.78158)


**Answers to Problem 5**

In the following cell, assign **literal float values** to the variables

- pbiasedest
- pbiasedlower
- pbiasedupper

In [12]:
pbiasedest=0.76194
pbiasedlower=0.74230
pbiasedupper=0.078158

**Final Instructions:**
1) save your notebook befor submitting it in Blackboard
2) do not zip your notebook 
3) you needn't change the name of your notebook as your jhed id is automatically attached to the name when you upload it
4) make sure you followed the rules about only assigning values to variables asked for at most a single time.