In [1]:
"Library Import"

import numpy as np
import pandas as pd 
import xarray as xr
import matplotlib.pyplot as plt

### 1) Basic Statistics

#### a) Bayes Theorem. Assume background rates of COVID are 90% negative, 10% positive AND COVID tests are accurate 80% of the time, but fail 20% of the time. Your friend goes and gets a COVID test. Your friend tests negative. What is the probability that your friend is actually negative? Explain to your friend how you are using Bayes theorem to inform your thinking. Hint: Review Lecture #1 and Section 1.2.2.2 of the Barnes Notes. (10 points)

From the lecture, Bayes' theorem takes <b><i>Pr(A|B)</i></b>, the probability of the test result given your true state (the test is negative <u>given that you are actually negative</u>), and turns it into <b><i>Pr(B|A)</i></b>, the probability of your true state given the test result (you are actually negative <u>given that the test is negative</u>).
    
In the context of this question, the background and test reliability information is 
    
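- P(N): fraction of the population that is negative, 90%
- P(P): fraction of the population that is positive, 10%
- P(T|N): probability of a negative test given that you are negative (accurate test), 80%
- P(T|P): probability of a negative test given that you are positive (failed test), 20%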

We want to know the probability of actually being negative given that the test says you are negative, P(N|T).

Applying Bayes' theorem gives the expression for <b><i>P(N|T)</i></b> in the context of this test:

$$P(N|T) = \frac{P(T|N)P(N)}{P(T|N)P(N)+P(T|P)P(P)}$$

where the numerator is the probability that the test is negative and correct, and the denominator is the total probability of a negative test, whether correct or not.
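As a worked check of the formula, plugging in the rates from the problem statement (no new assumptions) gives:

$$P(N|T) = \frac{(0.80)(0.90)}{(0.80)(0.90)+(0.20)(0.10)} = \frac{0.72}{0.74} \approx 0.973$$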

In [12]:
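def Bayes(P_E_given_H, P_H, P_E_given_notH, P_notH):
    """
    Calculates the posterior P(H|E) with Bayes theorem for a hypothesis H
    and its complement, given some evidence E.
    Input:
        - P_E_given_H: P(E|H), the probability of the evidence if H is true
        - P_H: P(H), the prior probability that H is true
        - P_E_given_notH: P(E|not H), the probability of the evidence if H is false
        - P_notH: P(not H), the prior probability that H is false
    """
    # Evidence observed and hypothesis true (here: negative test, actually negative)
    evidence_and_H = P_E_given_H*P_H
    # Every way the evidence can be observed (true or false hypothesis)
    all_poss_outcomes = evidence_and_H+(P_E_given_notH*P_notH)
    return evidence_and_H/all_poss_outcomes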

#Background Information
P_N = 0.9 # Percent of negative population
P_P = 0.1 # Percent of positive population

#COVID Test Reliability
P_TN = 0.80 # Probability test is accurate (test is negative and you are negative)
P_TP = 0.20 # Probability test is incorrect (test is negative but you are positive)

P_NT = Bayes(P_TN,P_N, P_TP, P_P)
print('The probability that this friend is actually negative is: ' + str(np.round(P_NT,2)*100)+'%')

The probability that this friend is actually negative is: 97.0%


To explain this to a friend, I would first list the background rates and the COVID test reliability. A realistic reading of a test result depends on both: how many people do or do not have COVID, and how accurate the test itself is. I would compare Bayes theorem to the simpler probability calculations we did in grade school (ex: the likelihood of drawing a specific card from a deck, followed by a second specific card), where the first outcome changes the odds of the second. Here, the background rate and the test reliability together determine how likely it is that a negative result is true. Because 90% of the population is negative to begin with, a negative result is more trustworthy than the 80% test accuracy alone suggests: the test raises the probability of being negative from 90% to about 97%. Since the accuracy of the test stays fixed, I would further mention that a change in how common COVID is in the population changes how much a negative result can be trusted: the more widespread COVID is, the greater the chance of exposure, and the less reassuring a negative test becomes.
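The dependence on the background rate can be illustrated with a minimal sketch. The function name `posterior_negative` and the alternative background rates (70% and 50% negative) are hypothetical, chosen only for illustration; the 80% accuracy is the value from the problem, and the test is assumed to be equally accurate for positive and negative people.

```python
def posterior_negative(prior_negative, accuracy):
    """P(actually negative | negative test), assuming a symmetric test accuracy."""
    prior_positive = 1.0 - prior_negative
    true_negative = accuracy * prior_negative            # negative test, actually negative
    false_negative = (1.0 - accuracy) * prior_positive   # negative test, actually positive
    return true_negative / (true_negative + false_negative)

# 0.9 is the background rate from the problem; 0.7 and 0.5 are hypothetical.
for prior in (0.9, 0.7, 0.5):
    print(f"P(N) = {prior:.0%} -> P(N|T) = {posterior_negative(prior, 0.80):.1%}")
```

With the accuracy held at 80%, the probability that a negative result is true falls from about 97% to 80% as the background negative rate drops from 90% to 50%.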

#### b) Explain how to test whether a sample mean is significantly different than zero at the 95% confidence level and the 99% confidence level. State each of the 5 steps in hypothesis testing that you are using. For step 4, calculate the specific critical value assuming a two-tailed test. Contrast your approach for a sample with 15 independent observations (N=15) and a sample with 1000 independent observations (N=1000). (15 points)

For N<30 observations, we need to use a t-test, and for N>30 observations, we should use a z-test. Before implementing a z-test, we need to look at the distribution of the observations to ensure that it appears normal, so that it can be standardized. We should note that with large enough N, the z-test and t-test converge. By defining a variable, $z$:


$$z = \frac{x-\mu}{\sigma}$$

where $\mu$ is the mean, $\sigma = 1$, and $x$ is an evenly spaced range of values to plot against. The standard normal distribution allows us to determine the probability of a sample mean occurring within specific intervals defined by the standard deviation (SD). There is a 68.27% chance that a sample mean falls within +/- 1SD of the population mean, and therefore a 31.73% chance that z falls outside of that interval. In other words, +/- 1SD is the 68.27% confidence interval for where a sample mean is expected to fall around the population mean:

<i> Confidence Intervals </i>
- +/- 1SD: 68.27%
- +/- 2SD: 95.45%
- +/- 3SD: 99.73%
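These percentages can be reproduced from the standard normal distribution. The sketch below uses only the Python standard library; the helper name `coverage_within` is hypothetical.

```python
import math

def coverage_within(k):
    """Probability that a standard normal variable lies within +/- k SD of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"+/- {k}SD: {coverage_within(k):.2%}")
```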

<b>The five steps in hypothesis testing are as follows: </b>
1) State the significance level, $\alpha$

- We will do two tests, at the 95% and 99% confidence levels ($\alpha = 0.05$ and $\alpha = 0.01$).
    
2) State the null and alternative hypothesis.

- $H_0$: The sample mean <b>does not differ</b> from the population mean, which is zero in this test.

- $H_1$: The sample mean <b>differs</b> from the population mean of zero.
    
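3) State the statistic to be used, and the assumptions required to use it

- The statistic is the t-statistic for N=15 and, in a separate test, the z-statistic for N=1000. Both are computed from randomly generated independent observations and assume the observations are approximately normally distributed.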
    
4) State the critical region

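- For the T-Test (N=15), we calculate the degrees of freedom:
    $$
    \nu = N-1 = 15-1 = 14
    $$
    
- For the Z-Test (N>30), the critical values come from the standard normal distribution and do not depend on N: for a two-tailed test, $z_c = \pm 1.96$ at the 95% level and $z_c = \pm 2.58$ at the 99% level.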


Using a two-sided test (before looking at the data, we do not know whether the sample mean will fall above or below zero), the test statistic must exceed the critical value in magnitude to reject the null hypothesis.
    
    With knowledge that N=15, $\nu = 15-1 = 14$
    
    The critical value can be found via a lookup table or with a Python function.


5) Evaluate the statistic and state conclusion.
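The critical values for step 4 can be computed rather than looked up. This is a minimal sketch that assumes SciPy is installed (it is not among the notebook's imports); the helper name `two_tailed_critical_value` is hypothetical.

```python
from scipy import stats

def two_tailed_critical_value(alpha, n=None):
    """Critical value for a two-tailed test at significance level alpha.

    Uses the t-distribution with n-1 degrees of freedom when n is given,
    and the standard normal distribution otherwise.
    """
    upper_tail = 1.0 - alpha / 2.0  # split alpha evenly between the two tails
    if n is None:
        return stats.norm.ppf(upper_tail)
    return stats.t.ppf(upper_tail, df=n - 1)

for alpha in (0.05, 0.01):
    t_15 = two_tailed_critical_value(alpha, n=15)
    t_1000 = two_tailed_critical_value(alpha, n=1000)
    z_c = two_tailed_critical_value(alpha)
    print(f"alpha = {alpha}: t(N=15) = {t_15:.3f}, t(N=1000) = {t_1000:.3f}, z = {z_c:.3f}")
```

For N=15 the thresholds are about 2.145 (95%) and 2.977 (99%), noticeably larger than the z values of 1.960 and 2.576. For N=1000 the t and z thresholds agree to within about 0.01, which is why the two tests converge for large N.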

### Z-Test

#### STEP 1: For the Z-Test, we first need to calculate the sample mean and sample standard error:
$$
X: N=1000 \\
\mu_{\bar{x}}=\mu = mean \\
\sigma_{\bar{x}}=\frac{\sigma}{\sqrt{N}}
$$

#### STEP 2: Next we will calculate the z-statistic, which is the number of standard errors by which the sample mean deviates from the population mean, using the expression below for z:

$$
z = \frac{{\bar{X}}-{\mu_{\bar{X}}}}{\sigma_{\bar{X}}} = \frac{{\bar{X}}-{\mu}}{\frac{\sigma}{\sqrt{N}}} \\
\bar{X}: \text{sample mean}, \quad \mu: \text{population mean}
$$
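The two steps above can be sketched in code as follows. The data here are hypothetical (1000 seeded random draws from a standard normal distribution), the helper name `z_statistic` is an assumption, and the sample standard deviation stands in for $\sigma$, which is reasonable only for large N.

```python
import numpy as np

def z_statistic(sample, mu0=0.0):
    """Number of standard errors between the sample mean and mu0."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    standard_error = sample.std(ddof=1) / np.sqrt(n)  # sample SD used in place of sigma
    return (sample.mean() - mu0) / standard_error

# Hypothetical data: 1000 independent draws with a true mean of zero.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=1000)

z = z_statistic(sample)
for level, z_crit in ((95, 1.96), (99, 2.576)):
    decision = "reject" if abs(z) > z_crit else "fail to reject"
    print(f"{level}% level: z = {z:.2f}, critical value = {z_crit} -> {decision} H0")
```

Comparing $|z|$ against the two-tailed critical value is step 5 of the hypothesis test: the null hypothesis is rejected only when the sample mean lies more than that many standard errors from zero.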


In [14]:
100-95.45

4.549999999999997