# Confidence Interval

## Confidence Interval (for a Mean)

$$ 
ME = z^\star \frac{s}{\sqrt{n}}
$$

In [40]:
# 95% CI = mu +/- 1.96 se
qnorm((1-0.95)/2)

In [41]:
# 98% CI = mu +/- 2.33 se
qnorm((1-0.98)/2)

In [42]:
# 99% CI = mu +/- 2.58 se
qnorm((1-0.99)/2)
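
The cells above return the lower-tail quantile, which is negative; the critical value $z^\star$ is its absolute value. As a minimal sketch, the lookup can be wrapped in a small helper. The name `z_star` is hypothetical and is not used elsewhere in this notebook.

```r
# Hypothetical helper: two-sided critical value for a given confidence level
z_star <- function(conf_level) {
  # The upper-tail quantile is the positive critical value
  qnorm(1 - (1 - conf_level) / 2)
}

# Critical values for the commonly used confidence levels
round(z_star(c(0.90, 0.95, 0.98, 0.99)), 2)
# 1.64 1.96 2.33 2.58
```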

## Accuracy vs Precision
* Commonly used confidence levels are 90%, 95%, 98%, and 99%.
* A wider interval (higher confidence level) is more likely to capture the true population parameter, which increases accuracy but decreases precision.
* The way to get both higher precision and higher accuracy is to increase the sample size, which shrinks the standard error and therefore the margin of error.
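
The trade-off can be illustrated numerically. This is a sketch only: the function name `margin_of_error` and the standard deviation of 10 are assumptions chosen for illustration, not values from the notebook.

```r
# Hypothetical helper: margin of error for a mean
margin_of_error <- function(s, n, conf_level) {
  z <- qnorm(1 - (1 - conf_level) / 2)  # two-sided critical value
  z * s / sqrt(n)
}

s_demo <- 10  # assumed sample standard deviation, for illustration only

# Higher confidence level -> larger margin of error (n fixed at 100)
me_by_level <- margin_of_error(s_demo, n = 100, conf_level = c(0.90, 0.95, 0.99))
me_by_level

# Larger sample -> smaller margin of error (confidence level fixed at 95%)
me_by_n <- margin_of_error(s_demo, n = c(100, 400, 1600), conf_level = 0.95)
me_by_n
```

Each fourfold increase in `n` halves the margin of error, because the standard error scales with $1/\sqrt{n}$.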

**Example**
* The General Social Survey (GSS) is a sociological survey used to collect data on demographic characteristics and attitudes of residents of the United States. 
* In 2010, the survey collected responses from 1,154 US residents. Based on the survey results, a 95% confidence interval for the average number of hours Americans have to relax or pursue activities that they enjoy after an average work day is 3.53 to 3.83 hours.

In [43]:
# sample mean
3.53 + (3.83-3.53)/2

In [44]:
# standard error
(3.83-3.53)/2/1.96

In [45]:
# margin of error
(3.83-3.53)/2
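
The three calculations above can be combined, and the sample standard deviation implied by the interval can be recovered from $SE = s/\sqrt{n}$. This is a sketch; the variable names with a `gss_` prefix are hypothetical, and the implied standard deviation is a back-calculation, not a figure reported in the example.

```r
gss_lower <- 3.53   # lower bound of the 95% CI
gss_upper <- 3.83   # upper bound of the 95% CI
gss_n     <- 1154   # number of respondents

gss_mean <- (gss_lower + gss_upper) / 2   # midpoint of the interval
gss_me   <- (gss_upper - gss_lower) / 2   # half-width of the interval
gss_se   <- gss_me / qnorm(0.975)         # ME = z* x SE, so SE = ME / z*
gss_sd   <- gss_se * sqrt(gss_n)          # implied sample SD, since SE = s / sqrt(n)

round(c(mean = gss_mean, me = gss_me, se = gss_se, sd = gss_sd), 3)
```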

## Required Sample Size for Margin of Error (ME)
* All else held constant, as sample size increases, the margin of error decreases.

$$ 
n = \left( \frac{z^\star s}{ME} \right)^2
$$

**Example**

* Suppose a group of researchers want to test the possible effect of an epilepsy medication taken by pregnant mothers on the cognitive development of their children. As evidence, they want to estimate the IQs of three-year-old children born to mothers who were on this medication during their pregnancy.
* Previous studies suggest that the standard deviation of IQ scores of three-year-old children is 18 points. 

_How many such children should the researchers sample in order to obtain a 90% confidence interval with a margin of error less than or equal to four points?_

In [58]:
me <- 4  
ci <- 0.9  
sd <- 18 
z <- abs(qnorm((1-ci)/2))  # critical value for 90% confidence, about 1.645

(n <- ((z * sd)/me)^2)
ceiling(n)

_How would the required sample size change if we want to further decrease the margin of error, to two points?_

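$$
\begin{aligned}
\frac{1}{x} ME &= z^\star \frac{s}{\sqrt{n}} \cdot \frac{1}{x} \\
&= z^\star \frac{s}{\sqrt{n x^2}}
\end{aligned}
$$

Shrinking the margin of error by a factor of $x$ therefore requires $x^2$ times the sample size: halving it from four points to two quadruples $n$.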

In [60]:
me <- 2
# z carries over from the previous cell; its sign does not matter once squared
(n <- ((z * sd)/me)^2)
ceiling(n)
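
The two sample-size calculations can be expressed as one reusable function. This is a minimal sketch; `required_n` is a hypothetical name, and it uses the unrounded critical value from `qnorm()`, so results can differ slightly from a hand calculation with a rounded $z^\star$.

```r
# Hypothetical helper: smallest n whose margin of error is at most `me`
required_n <- function(s, me, conf_level) {
  z <- qnorm(1 - (1 - conf_level) / 2)  # two-sided critical value
  ceiling((z * s / me)^2)               # round up to a whole observation
}

required_n(s = 18, me = 4, conf_level = 0.90)  # 55
required_n(s = 18, me = 2, conf_level = 0.90)  # 220
```

Rounding up rather than to the nearest integer guarantees that the margin of error does not exceed the target.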

**Example**

* A sample of 50 college students was asked how many exclusive relationships they had been in so far.
* The students in the sample had an average of 3.2 exclusive relationships, with a standard deviation of 1.74.
* In addition, the sample distribution was only slightly skewed to the right.

_Estimate the true average number of exclusive relationships based on this sample using a 95% confidence interval._

In [77]:
n <- 50     # sample size  
mu <- 3.2   # sample mean  
sd <- 1.74  # sample standard deviation  

ci <- 0.95
z <- abs(round(qnorm((1-ci)/2), 2))

se <- sd/sqrt(n)

me <- z * se

# 1.96 * 1.74/sqrt(50)

mu - me
mu + me

_What is the correct calculation of the 98% confidence interval for the average number of exclusive relationships college students have been in?_

In [78]:
ci <- 0.98
z <- abs(round(qnorm((1-ci)/2), 2))
me <- z * se

# 2.33 * 1.74/sqrt(50)

mu - me
mu + me
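
Both intervals follow the same recipe, so they can be computed by a single function of the summary statistics. This is a sketch under the same normal approximation used above; `ci_mean` is a hypothetical name, and it keeps the unrounded critical value, so the bounds may differ in the third decimal place from the cells that round $z^\star$ to two digits.

```r
# Hypothetical helper: z-based confidence interval for a mean from summary statistics
ci_mean <- function(x_bar, s, n, conf_level = 0.95) {
  z  <- qnorm(1 - (1 - conf_level) / 2)  # two-sided critical value
  me <- z * s / sqrt(n)                  # margin of error
  c(lower = x_bar - me, upper = x_bar + me)
}

ci_95 <- ci_mean(x_bar = 3.2, s = 1.74, n = 50, conf_level = 0.95)
ci_98 <- ci_mean(x_bar = 3.2, s = 1.74, n = 50, conf_level = 0.98)
round(ci_95, 2)
round(ci_98, 2)
```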