Power & Bayesian Inference

1) Power (Frequentist Hypothesis Testing)
* What is Power?
    * The **power** of a hypothesis test is the probability that the test *correctly rejects the null hypothesis $(H_0)$* when $H_0$ is false
        * which test do we like better?
            * test 1: $\alpha=0.05,$ power = $0.8$
            * test 2: $\alpha=0.05,$ power = $0.3$
    * $$\begin{aligned} \text{Power}
       & = P(\text{Reject } H_0 \mid H_0 \text{ is false}) \\
       & = P(\text{Reject } H_0 \mid H_1 \text{ is true}) \\
       & = 1 - P(\text{Type II error}) \\
       & = 1 - \beta
    \end{aligned}$$
    * what's the chance of rejecting the null when the null hypothesis is false?
        * in other words, what's the probability of detecting a real effect?
    * Illustrating power based on hypothesis test
![power_in_hypo](http://pip.ucalgary.ca/psyc-312/repeated-measures/within-subjects-designs/images/Power.png)
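        1. Draw two curves: the sampling distribution under $H_0$ (centered at $\mu_0$) and the one suggested by the sample data (centered at $\bar{x}$, with spread based on $s$)
        2. Define the rejection region: shade the tail of the $H_0$ curve whose area equals $\alpha$; its edge is the decision boundary
        3. Calculate the Type II error, $\beta$, as the area under the alternative curve to the left of the decision boundary (for an upper-tailed test)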
        4. Power is $1-\beta$
    * Factors influencing Power - what happens to power as we **increase** each of the following factors?
        1. $\uparrow \alpha$ 
        2. $\uparrow$ effect size (distance between the means under $H_0$ and $H_1$)
        3. $\uparrow$ standard deviation, $s$
        4. $\uparrow$ sample size, $n$
    * Influence of Power analysis on Experimental Design
        * usually need to decide what *sample size* to collect
        * especially important when the number of samples must be kept small because collecting them is:
            * painful or risky (e.g. a new surgical procedure)
            * costly (e.g. paying test subjects)
        * *power analysis can allow you to determine the sample size needed to detect a particular effect*
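To make the factor questions above concrete, here is a minimal sketch, assuming an upper-tailed one-sample $z$-test with a known standard deviation and Python's standard-library `statistics.NormalDist` (Python 3.8+). The helper `power_one_sided` and the baseline numbers are hypothetical, chosen only for illustration; the code changes one factor at a time and prints the resulting power.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def power_one_sided(mu0, mu1, sigma, n, alpha=0.05):
    """Power of an upper-tailed one-sample z-test of H0: mu = mu0 against mu = mu1 > mu0."""
    se = sigma / n ** 0.5                              # standard error of the sample mean
    x_star = mu0 + STD_NORMAL.inv_cdf(1 - alpha) * se  # decision boundary under H0
    return 1 - STD_NORMAL.cdf((x_star - mu1) / se)     # area beyond x_star under H1


# Hypothetical baseline: effect size 0.5 sd, n = 20, alpha = 0.05.
baseline = {"mu0": 0.0, "mu1": 0.5, "sigma": 1.0, "n": 20, "alpha": 0.05}
print(f"baseline: power = {power_one_sided(**baseline):.3f}")

# Increase one factor at a time (raising mu1 = larger effect size).
for name, value in [("alpha", 0.10), ("mu1", 0.8), ("sigma", 1.5), ("n", 40)]:
    changed = {**baseline, name: value}
    print(f"increase {name} to {value}: power = {power_one_sided(**changed):.3f}")
```

Under these assumptions, raising $\alpha$, the effect size, or $n$ increases power, while raising the standard deviation decreases it.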
* Calculating Power
    * example: One-sample Test of Mean
        * $H_0 : \mu = \mu_0$
        * $H_1 : \mu = \mu_1(> \mu_0)$
        * $\text{Power} = P(\text{Reject } H_0 \mid  H_0 \text{ is false})$
    * find the critical value, under $H_0$, beyond which we would reject $H_0$:
        * $X^* = \mu_0 + Z^* \frac{s}{\sqrt{n}}$
    * then, find the corresponding probability under $H_1$
        * $\text{Power} = P(\bar{X} > X^* \mid H_1) = P\left(Z > \frac{X^*-\mu_1}{s/\sqrt{n}}\right)$
    * power calculation steps:
![power_in_hypo_2](https://i.stack.imgur.com/R0ncP.png)
        1. decide the critical value for the test statistic (at $\alpha = 0.05$)
            * $Z^* = \pm 1.96$ for a two-sided test
            * $Z^* = +1.645$ (upper-tailed) or $-1.645$ (lower-tailed) for a one-sided test
        2. calculate the corresponding value under the null distribution ($X^*$)
        3. find the tail probability beyond $X^*$ under the alternative distribution; this is the power
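The three power-calculation steps can be sketched in Python as follows. This is a minimal sketch, assuming a one-sample $z$-test (known or well-estimated $s$) and the standard library's `statistics.NormalDist`; the function name `power_z_test` and the example numbers are hypothetical. For a two-sided test it also adds the (usually tiny) probability of rejecting in the opposite tail.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_z_test(mu0, mu1, s, n, alpha=0.05, two_sided=False):
    """Power of a one-sample z-test of H0: mu = mu0 when the true mean is mu1 (> mu0)."""
    se = s / n ** 0.5
    # Step 1: critical value of the test statistic (1.645 one-sided, 1.96 two-sided at 5%).
    z_star = STD_NORMAL.inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    # Step 2: corresponding value of the sample mean under H0, X* = mu0 + Z* s / sqrt(n).
    x_star = mu0 + z_star * se
    # Step 3: tail probability beyond X* under the alternative distribution N(mu1, se^2).
    alt = NormalDist(mu1, se)
    power = 1 - alt.cdf(x_star)
    if two_sided:
        # Rejections in the lower tail also count, although they are rare when mu1 > mu0.
        power += alt.cdf(mu0 - z_star * se)
    return power


# Hypothetical example: mu0 = 100, true mean 105, s = 15, n = 50.
print(f"one-sided power: {power_z_test(100, 105, 15, 50):.3f}")
print(f"two-sided power: {power_z_test(100, 105, 15, 50, two_sided=True):.3f}")
```

With these numbers the one-sided test has power of roughly 0.76 and the two-sided test roughly 0.65, because the two-sided test uses a stricter boundary in the upper tail.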
* Calculating the sample size, $n$, for a given level of Power
    * what happens when we increase the sample size?
    * what if we do not know the true mean and want to collect a larger sample for the test?
    * requirements:
        * the mean under the null hypothesis, $\mu_0$
        * an estimate of the population mean, $\mu_1$
        * an estimate $s$ of the population standard deviation $\sigma$
        * a fixed significance level ($\alpha$, often 5%)
        * a desired power level ($1-\beta$, often 80%)
    * derive the value for $n$ from the power calculation:
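        * $n = \left((Z_{1-\text{power}}+Z^*)\frac{s}{\mu_1-\mu_0}\right)^2$, rounded up to the next integer
        * $Z_{1-\text{power}}$ is the standard normal value with upper-tail area $1-\text{power}$ (about 0.84 for 80% power); both $Z$ values must have the same sign (both positive for an upper-tailed test)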
    * sample size calculation steps:
        1. obtain an initial estimate of the parameter/effect we are trying to test (e.g. from a pilot experiment)
        2. decide on the desired power of the test (e.g. power = 0.8)
        3. calculate the sample size using the initial estimate and the desired power
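The sample-size formula translates directly into code. This is a minimal sketch, assuming an upper-tailed one-sample $z$-test and pilot estimates for $\mu_1$ and $s$; the function name `sample_size_one_sided` and the pilot numbers are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sample_size_one_sided(mu0, mu1, s, power=0.80, alpha=0.05):
    """Smallest n giving at least the target power for an upper-tailed one-sample z-test."""
    z_star = STD_NORMAL.inv_cdf(1 - alpha)  # Z*: upper-tail area alpha (1.645 for 5%)
    z_power = STD_NORMAL.inv_cdf(power)     # Z_{1-power}: upper-tail area 1 - power (0.84 for 80%)
    n = ((z_power + z_star) * s / (mu1 - mu0)) ** 2
    return math.ceil(n)  # round up so the achieved power is not below the target


# Hypothetical pilot estimates: mu0 = 100, mu1 = 105, s = 15.
n_needed = sample_size_one_sided(100, 105, 15)
print(f"required sample size: {n_needed}")
```

Under these assumptions the answer is 56. A larger target power or a smaller expected effect $\mu_1-\mu_0$ pushes $n$ up quickly, since $n$ grows with the square of $(Z_{1-\text{power}}+Z^*)\,s/(\mu_1-\mu_0)$.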
* Relationship to A/B Testing
    * A/B testing is essentially two-sample hypothesis testing: compare an outcome in group A (control) with the same outcome in group B (treatment)
    * in practice, we often conduct a small pilot experiment to estimate the sample size for a given power
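As a rough illustration of sizing an A/B test, the sketch below uses the common normal-approximation formula for a two-sided, two-sample $z$-test with equal group sizes and a common standard deviation, $n_{\text{per group}} = 2\left((Z_{1-\text{power}} + Z^*)\frac{\sigma}{\delta}\right)^2$ with $Z^* = 1.96$, where $\delta$ is the smallest difference in means worth detecting. The choice of formula, the function name `ab_sample_size_per_group`, and the pilot numbers are assumptions for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def ab_sample_size_per_group(delta, sigma, power=0.80, alpha=0.05):
    """Per-group n for a two-sided two-sample z-test with equal n and a common sigma."""
    z_star = STD_NORMAL.inv_cdf(1 - alpha / 2)  # two-sided critical value (1.96 for 5%)
    z_power = STD_NORMAL.inv_cdf(power)         # 0.84 for 80% power
    n = 2 * ((z_power + z_star) * sigma / delta) ** 2
    return math.ceil(n)


# Hypothetical pilot: metric sd of about 10, smallest lift worth detecting = 2.
print(ab_sample_size_per_group(delta=2, sigma=10))
```

Doubling the detectable difference cuts the required sample size per group by roughly a factor of four.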

2) Bayesian Inference
* Frequentist vs. Bayesian
    * Probability differences:
        * **Frequentist Probability** - "Long Run" frequency of an outcome
        * **Subjective Probability** - A measure of degree of belief
        * **Bayesian Probability** - combination of Frequentist and Subjective probability
    * Interpretation of experimental results:
        * Experiment 1: a fine classical musician says he is able to distinguish Haydn from Mozart. Small excerpts are selected at random and played for the musician. He makes 10 correct guesses in exactly 10 trials
            * Frequentist: very skilled musician; we have confidence in the musician's ability to distinguish Haydn from Mozart
            * Bayesian: not certain; I weigh the result against my prior confidence in the musician's ability
        * Experiment 2: a drunk man says he can correctly guess which face a coin will land on while it is still in mid-air. Coins are tossed and the drunk man shouts out his guesses while the coins are mid-air. He correctly guesses the outcomes of all 10 tosses
            * Frequentist: very skilled drunk man; we have confidence in the drunk man's ability to predict coin tosses
            * Bayesian: not certain; I weigh the result against my prior confidence in the drunk man's ability
        * Experiment 3: a detector checks whether the sun has gone nova. Before answering, it rolls two dice: if both come up six it lies, otherwise it tells the truth. The detector reports that the sun has exploded
            * Frequentist: the probability of this result happening by chance is $\frac{1}{36} \approx 0.028 < 0.05$ (the $p$-value), so we reject $H_0$ and conclude that the sun has exploded at the 95% confidence level
            * Bayesian: I don't believe that happened, given my prior confidence that the sun has not exploded
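Bayes' rule makes the Bayesian answer to Experiment 3 concrete. This minimal sketch assumes a hypothetical prior probability of $10^{-6}$ that the sun has exploded (any very small value leads to the same conclusion) and computes both the frequentist $p$-value and the posterior probability of a nova given that the detector said yes. The function name `posterior_nova` is illustrative.

```python
from fractions import Fraction

P_LIE = Fraction(1, 36)  # the detector lies when both dice show six


def posterior_nova(prior):
    """P(sun exploded | detector says yes) via Bayes' rule."""
    yes_if_nova = 1 - P_LIE   # it tells the truth
    yes_if_fine = P_LIE       # it lies
    numerator = yes_if_nova * prior
    return numerator / (numerator + yes_if_fine * (1 - prior))


# Frequentist: P(detector says yes | sun is fine) is the p-value.
p_value = P_LIE
print(f"p-value = {float(p_value):.4f}")

# Bayesian: hypothetical prior of one in a million that the sun has just exploded.
posterior = posterior_nova(Fraction(1, 10**6))
print(f"posterior P(nova | yes) = {float(posterior):.6f}")
```

Even though the result is "significant" at the 5% level, the posterior probability of a nova stays tiny because the prior is tiny. With a 50-50 prior, the same code would give $35/36$.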
* Prior, Likelihood, and Posterior Distributions
    * Update beliefs after considering new evidence
    * Probability as measure of believability in event
    * **A priori** - a prior belief distribution for outcomes
        * e.g. prior belief of musician's ability to distinguish Haydn from Mozart
    * The posterior distribution is Bayes' rule applied to distributions
        * Bayes Rule: $P(A\mid B) = \frac{P(B\mid A)P(A)}{P(B\mid A)P(A)+P(B\mid A^c)P(A^c)}$
        * Posterior Distribution: $\pi(\theta\mid x) = \frac{f(x\mid \theta)\pi(\theta)}{\int f(x\mid \bar{\theta})\pi(\bar{\theta})\,d\bar{\theta}}$
    * Bayesian Inference equation parts:
        * **Prior distribution** - describes our current (prior) knowledge about $\theta$ (or $A$); can be subjective
        * **Likelihood** - distribution of the data given the parameter, viewed as a function of the parameter
        * **Posterior distribution** - our updated knowledge about $\theta$ (or $A$) after seeing the data 
![bayesian_update](bayesian_updating_coin_toss.png)
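The updating picture above can be reproduced numerically with a grid approximation: on a grid of $\theta$ values the posterior is prior × likelihood divided by their sum, a discrete stand-in for the integral in the denominator. This is a minimal sketch, assuming a flat prior on the probability of heads and a hypothetical sequence of tosses; the grid size and the helper name `update` are illustrative choices.

```python
def update(grid, prior, heads):
    """One Bayesian update for a single coin toss on a grid of theta values."""
    likelihood = [theta if heads else 1 - theta for theta in grid]
    unnormalized = [p * lik for p, lik in zip(prior, likelihood)]
    evidence = sum(unnormalized)  # discrete version of the integral in the denominator
    return [u / evidence for u in unnormalized]


grid = [i / 100 for i in range(101)]     # candidate values of theta = P(heads)
posterior = [1 / len(grid)] * len(grid)  # flat prior
tosses = [1, 1, 0, 1, 1, 1, 0, 1]        # hypothetical data: 1 = heads, 0 = tails

for k, toss in enumerate(tosses, start=1):
    posterior = update(grid, posterior, toss)  # this posterior is the prior for the next toss
    mean = sum(t * p for t, p in zip(grid, posterior))
    mode = grid[max(range(len(grid)), key=posterior.__getitem__)]
    print(f"after {k} tosses: posterior mean {mean:.3f}, mode {mode:.2f}")
```

With a flat prior, the exact posterior after 6 heads and 2 tails is Beta(7, 3), whose mean is 0.7 and mode 0.75; the grid values match these closely.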
* **Maximum a posteriori (MAP)** - mode of posterior distribution
    * Unlike MLE, MAP assumes a prior $g$ over $\Theta$ and goes one step further: it computes the posterior distribution with Bayes' rule and takes its mode
    * MLE: $\hat{\theta}_{\text{MLE}} = \arg\max_{\theta \in \Theta} f(x\mid \theta) = \arg\max_{\theta \in \Theta} \log L(\theta\mid x_1,\dots,x_n)$
    * Posterior: $f(\theta\mid x)=\frac{f(x\mid \theta)g(\theta)}{\int_{\upsilon \in \Theta}f(x\mid \upsilon)g(\upsilon)\,d\upsilon}$
    * MAP: $\hat{\theta}_{\text{MAP}} = \arg\max_{\theta \in \Theta} \frac{f(x\mid \theta)g(\theta)}{\int_{\upsilon \in \Theta}f(x\mid \upsilon)g(\upsilon)\,d\upsilon} = \arg\max_{\theta \in \Theta} f(x\mid \theta)g(\theta)$, since the denominator does not depend on $\theta$
![map](https://www.probabilitycourse.com/images/chapter9/MAP.png)
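To see the difference between the two estimators, the sketch below maximizes $f(x\mid\theta)$ and $f(x\mid\theta)g(\theta)$ over a fine grid for a coin-style experiment with 10 successes in 10 trials (as in the musician example). It assumes a hypothetical Beta(2, 2) prior for $g$ and drops the prior's normalizing constant, which, like the denominator, does not change the argmax. For this conjugate setup the closed forms are $\hat\theta_{\text{MLE}} = k/n$ and $\hat\theta_{\text{MAP}} = (k+a-1)/(n+a+b-2)$.

```python
def likelihood(theta, k, n):
    """Binomial likelihood up to a constant: k successes in n trials."""
    return theta ** k * (1 - theta) ** (n - k)


def beta_prior(theta, a, b):
    """Beta(a, b) density up to its normalizing constant."""
    return theta ** (a - 1) * (1 - theta) ** (b - 1)


grid = [i / 1000 for i in range(1001)]
k, n = 10, 10   # 10 correct guesses in 10 trials
a, b = 2, 2     # hypothetical prior: mild belief that theta is near 0.5

theta_mle = max(grid, key=lambda t: likelihood(t, k, n))
theta_map = max(grid, key=lambda t: likelihood(t, k, n) * beta_prior(t, a, b))

print(f"MLE: {theta_mle:.3f} (closed form {k / n:.3f})")
print(f"MAP: {theta_map:.3f} (closed form {(k + a - 1) / (n + a + b - 2):.3f})")
```

The MLE takes 10/10 at face value and says $\theta = 1$, while the prior pulls the MAP estimate back toward 0.5, to about 0.917.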
* Famous examples of Bayesian Inference:
    1. Monty Hall Problem - Assume that a room is equipped with three doors. Behind two are goats, and behind the third is a shiny new car. You are asked to pick a door, and will win whatever is behind it. Let's say you pick door 1. Before the door is opened, however, someone who knows what's behind the doors (Monty Hall) opens one of the other two doors, revealing a goat, and asks you if you wish to change your selection to the third door (i.e., the door which you neither picked nor he opened). The Monty Hall problem is deciding whether you should switch
        * Two choices are 50-50 only when you know nothing about them; here Monty knows where the car is, so his choice carries information
        * Monty helps us by “filtering” the bad choices on the other side: switching gets you the better of the two doors you did not pick (the “Champ door”), which wins with probability 2/3, while staying keeps your original random guess, which wins with probability 1/3
        * In general, more information means you should re-evaluate your choices
        * Similar problem: a Bayesian spam filter improves as it gets more information about whether messages are spam or not; you don't want it to stay static with its initial training set of data
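The Monty Hall answer can be checked two ways: exactly, with Bayes' rule, and by simulation. This minimal sketch assumes the standard rules: the car is placed uniformly at random, Monty always opens a door that is neither your pick nor the car, and he chooses uniformly when two such doors are available. The function names `monty_posterior` and `simulate` are illustrative.

```python
import random
from fractions import Fraction

DOORS = (1, 2, 3)


def monty_posterior(picked=1, opened=3):
    """P(car behind each door | you picked `picked` and Monty opened `opened`)."""
    prior = Fraction(1, len(DOORS))
    joint = {}
    for car in DOORS:
        # Doors Monty is allowed to open if the car is behind `car`.
        allowed = [d for d in DOORS if d not in (picked, car)]
        p_open = Fraction(1, len(allowed)) if opened in allowed else Fraction(0)
        joint[car] = prior * p_open  # likelihood x prior
    evidence = sum(joint.values())
    return {car: p / evidence for car, p in joint.items()}


def simulate(trials=100_000, seed=0):
    """Estimate win rates for staying and for switching."""
    rng = random.Random(seed)
    stay_wins = switch_wins = 0
    for _ in range(trials):
        car, pick = rng.choice(DOORS), rng.choice(DOORS)
        opened = rng.choice([d for d in DOORS if d not in (pick, car)])
        switch_to = next(d for d in DOORS if d not in (pick, opened))
        stay_wins += pick == car
        switch_wins += switch_to == car
    return stay_wins / trials, switch_wins / trials


print(monty_posterior())  # door 1 -> 1/3, door 2 -> 2/3, door 3 -> 0
print(simulate())
```

Both approaches agree: staying wins about one third of the time and switching about two thirds.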
    