# 5.1

When making probabilistic statements about a distribution, we usually want one of four things:
- Density (pdf) at a particular value (dname)
- Distribution function (cdf) at a particular value (pname)
- Quantile value corresponding to a particular probability (qname)
- Random draw of values from a particular distribution (rname)

Here name in the R functions above stands for the name of the distribution (e.g. dnorm for the normal distribution). To calculate the value of the pdf at $x = 3$ (the height of the curve at $x = 3$) use:

In [1]:
dnorm( x = 3, mean = 2, sd = 5 )

To calculate the value of the cdf at $x = 3$, that is $P(X \le 3)$, the probability that $X$ is less than or equal to $3$, use:

In [2]:
pnorm( q = 3, mean = 2, sd = 5 )

Or to calculate the quantile for probability $0.975$ use:

In [4]:
qnorm( p = 0.975, mean = 2, sd = 5 )

To generate a random sample of size $n = 10$ use:

In [5]:
rnorm( n = 10, mean = 2, sd = 5 )
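
The four function families are tied together: `q*` inverts `p*`, and `p*` is the area under `d*`. The following is a minimal sketch of those relationships, reusing the same mean and standard deviation as above. The seed value 42 is arbitrary and is added only so the random draw can be repeated; `area`, `draw_1`, and `draw_2` are names introduced here for illustration.

```r
# q* inverts p*: feeding the cdf value back into qnorm recovers x = 3
p <- pnorm( q = 3, mean = 2, sd = 5 )
qnorm( p = p, mean = 2, sd = 5 )

# p* is the area under d* up to x: integrate the density from -Inf to 3
area <- integrate( dnorm, lower = -Inf, upper = 3, mean = 2, sd = 5 )$value
all.equal( area, p, tolerance = 1e-4 )

# r* draws are random; resetting the same seed reproduces the same sample
set.seed( 42 )
draw_1 <- rnorm( n = 10, mean = 2, sd = 5 )
set.seed( 42 )
draw_2 <- rnorm( n = 10, mean = 2, sd = 5 )
identical( draw_1, draw_2 )
```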

Other distribution names beyond norm include:
- *binom
- *t
- *pois
- *f
- *chisq

where the * can be d, p, q, or r. For example, to get the probability of flipping a coin $10$ times and seeing exactly $6$ heads, given that the probability of heads is $0.75$, use:

In [6]:
dbinom( x = 6, size = 10, prob = 0.75 )

Binomial Distribution Formula:

$$P( Y = y ) = {N \choose y}\ \theta^y\ ( 1 - \theta )^{ N - y }$$

Formally, the call above computes $P( Y = 6 )$ for $Y \sim b( n = 10, p = 0.75 )$, so $N = 10$, $y = 6$, and $\theta = 0.75$ in the formula.

In [27]:
# Can also do choose( 10, 6 )
n_choose_y <- ( 10 * 9 * 8 * 7 * 6 * 5 ) / ( 6 * 5 * 4 * 3 * 2 * 1 )
theta      <- 0.75
n          <- 10
y          <- 6

n_choose_y * theta ^ y * ( 1 - theta ) ^ ( n - y )
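
As a quick check, the hand calculation should agree with `dbinom()`, and the binomial coefficient with the built-in `choose()`. This sketch uses only base R; the name `by_hand` is introduced here for illustration.

```r
theta <- 0.75
n     <- 10
y     <- 6

# Binomial pmf written out from the formula, using choose() for the coefficient
by_hand <- choose( n, y ) * theta ^ y * ( 1 - theta ) ^ ( n - y )

# Should match the built-in density function
all.equal( by_hand, dbinom( x = y, size = n, prob = theta ) )

# The pmf summed over all possible outcomes 0..n equals 1
sum( dbinom( x = 0:n, size = n, prob = theta ) )
```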

# 5.2

#### Hypothesis Testing

##### One Sample t-Test

Suppose $x_i \sim N( \mu, \sigma^2 )$ and we want to test $H_0: \mu = \mu_0$ versus $H_1: \mu \neq \mu_0$. If we assume $\sigma$ is unknown, we use the one-sample $t$ statistic:

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \sim t_{n - 1}$$

where

$$\bar{x} = \frac{\sum_{i = 1}^n x_i}{n}$$

and

$$s = \sqrt{\frac{1}{n - 1} \sum_{i = 1}^n (x_i - \bar{x})^2}$$

A $100(1 - \alpha)\%$ confidence interval for $\mu$ is given by:

$$\bar{x} \pm t_{n - 1}( \frac{\alpha}{2} ) \frac{s}{\sqrt{n}}$$

where $t_{n - 1}( \frac{\alpha}{2} )$ is the critical value such that:

$$P( t > t_{n - 1}( \frac{\alpha}{2} ) ) = \frac{\alpha}{2}$$

for $n - 1$ degrees of freedom.
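
The test statistic and confidence interval can be wrapped in a small function. This is only a sketch: `one_sample_t` is a hypothetical helper written for this note, not part of base R, and the data passed to it are made up for illustration.

```r
# Hypothetical helper: one-sample t statistic and two-sided confidence interval
one_sample_t <- function( x, mu_0, alpha = 0.05 ) {
    n      <- length( x )
    x_bar  <- mean( x )
    se     <- sd( x ) / sqrt( n )               # s / sqrt(n); sd() divides by n - 1
    t_crit <- qt( 1 - alpha / 2, df = n - 1 )   # upper alpha/2 critical value
    list(
        statistic = ( x_bar - mu_0 ) / se,
        df        = n - 1,
        conf_int  = c( x_bar - t_crit * se, x_bar + t_crit * se )
    )
}

# Made-up data purely for illustration
one_sample_t( x = c( 4.1, 5.3, 4.8, 5.0, 4.6 ), mu_0 = 5 )
```

Note that `qt( 1 - alpha / 2, ... )` is used because `qt()` takes a lower-tail probability, while $t_{n - 1}( \frac{\alpha}{2} )$ is defined through the upper tail.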

#### Example

Suppose a grocery store sells 16-ounce boxes of cereal. A random sample of 9 boxes is taken and weighed:

In [28]:
cereal <- data.frame( weight = c( 15.5, 16.2, 16.1, 15.8, 15.6, 16.0, 15.8, 15.9, 16.2 ) )

The claim is that a box weighs at least 16 ounces. Assume the weight is normally distributed and use a $0.05$ level of significance to test the claim. So:  
- $H_0: \mu \ge 16$
- $H_1: \mu < 16$

In [29]:
x_bar <- mean( cereal$weight )
s     <- sd( cereal$weight )
mu_0  <- 16
n     <- 9

t <- ( x_bar - mu_0 ) / ( s / sqrt( n ) )
t

Under the null hypothesis the test statistic has a $t$ distribution with $n - 1$ degrees of freedom, which is 8 in this case. Let's get the p-value of the test. Since this is a one-sided test with a less-than alternative, we need the area to the left of $-1.2$ for a $t$ distribution with $8$ degrees of freedom:

$$P( t_8 < -1.2 )$$

In [30]:
pt( t, df = n - 1 )

The p-value is greater than our significance level of $0.05$, so we fail to reject the null hypothesis. A more condensed way to run the test in R is as follows:

In [31]:
t.test( x = cereal, mu = 16, alternative = c('less'), conf.level = 0.95 )


	One Sample t-test

data:  cereal
t = -1.2, df = 8, p-value = 0.1322
alternative hypothesis: true mean is less than 16
95 percent confidence interval:
     -Inf 16.05496
sample estimates:
mean of x 
     15.9 
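
An equivalent way to reach the same decision is to compare the test statistic with a critical value instead of computing a p-value. The sketch below redefines the cereal weights as a plain vector so it runs on its own; `weight`, `t_stat`, and `t_crit` are names introduced here.

```r
weight <- c( 15.5, 16.2, 16.1, 15.8, 15.6, 16.0, 15.8, 15.9, 16.2 )
n      <- length( weight )
alpha  <- 0.05

t_stat <- ( mean( weight ) - 16 ) / ( sd( weight ) / sqrt( n ) )

# Lower-tail critical value: reject H0 only if t_stat falls below it
t_crit <- qt( alpha, df = n - 1 )

c( t_stat = t_stat, t_crit = t_crit )
t_stat < t_crit    # FALSE means we fail to reject H0
```

Both approaches always agree: the p-value is below $\alpha$ exactly when the statistic lies beyond the critical value.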


In [33]:
# For Two Sided Test
cereal_results <- t.test( x = cereal, mu = 16, alternative = c('two.sided'), conf.level = 0.95 )

In [34]:
names( cereal_results )

In [35]:
cereal_results$conf.int

Let us check this confidence interval by hand, but first we need the critical value:

$$t_{n - 1}(\frac{\alpha}{2} ) = t_8( 0.025 )$$ 

In [36]:
qt( 0.975, df = 8 )

Now plug into the formula:

$$\bar{x} \pm t_{n - 1}( \frac{\alpha}{2} ) \frac{s}{\sqrt{n}}$$

In [42]:
c(
    mean( cereal$weight ) - qt( 0.975, df = 8 ) * sd( cereal$weight ) / sqrt(9),
    mean( cereal$weight ) + qt( 0.975, df = 8 ) * sd( cereal$weight ) / sqrt(9)
)
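
The same interval can be computed with the margin of error stored once and then compared against `t.test()`. This sketch redefines the cereal weights as a plain vector so it is self-contained; `weight`, `margin`, `by_hand`, and `from_t_test` are names introduced here. Note that a trailing comma after the last argument of `c()` is an error in R, so the final element must not be followed by one.

```r
weight <- c( 15.5, 16.2, 16.1, 15.8, 15.6, 16.0, 15.8, 15.9, 16.2 )
n      <- length( weight )

# Margin of error: critical value times the standard error of the mean
margin  <- qt( 0.975, df = n - 1 ) * sd( weight ) / sqrt( n )
by_hand <- c( mean( weight ) - margin, mean( weight ) + margin )

# Two-sided interval reported by t.test()
from_t_test <- t.test( x = weight, mu = 16, conf.level = 0.95 )$conf.int

# as.numeric() drops the conf.level attribute before comparing
all.equal( by_hand, as.numeric( from_t_test ) )
```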
