In [4]:
library(MASS)

In [5]:
head(survey)

Sex,Wr.Hnd,NW.Hnd,W.Hnd,Fold,Pulse,Clap,Exer,Smoke,Height,M.I,Age
Female,18.5,18.0,Right,R on L,92.0,Left,Some,Never,173.0,Metric,18.25
Male,19.5,20.5,Left,R on L,104.0,Left,,Regul,177.8,Imperial,17.583
Male,18.0,13.3,Right,L on R,87.0,Neither,,Occas,,,16.917
Male,18.8,18.9,Right,R on L,,Neither,,Never,160.0,Metric,20.333
Male,20.0,20.0,Right,Neither,35.0,Right,Some,Never,165.0,Metric,23.667
Female,18.0,17.7,Right,L on R,64.0,Right,Some,Never,172.72,Imperial,21.0


# Point Estimate of Population Mean

For any particular random sample, we can always compute its sample mean. Although it rarely equals the actual population mean exactly, it serves as a good _point estimate_.

In [6]:
height.survey = survey$Height

In [7]:
mean(height.survey, na.rm=TRUE)

# Interval Estimate of Population Mean with Known Variance

After finding the point estimate of the population mean, we need to quantify its accuracy. Here, we discuss the case where the population variance $\sigma^2$ is assumed known.

Let us denote the $100(1-\alpha/2)$ percentile of the standard normal distribution as $z_{\alpha/2}$. For a random sample of sufficiently large size, the end points of the _interval estimate_ at the $(1-\alpha)$ confidence level are given as follows:

$$ \bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt n} $$

We first filter out missing values in `survey$Height` with the `na.omit` function.

In [9]:
height.response = na.omit(survey$Height)

Then we compute the standard error of the mean.

In [10]:
n = length(height.response)
sigma = 9.48
sem = sigma/sqrt(n)

Since the normal distribution has two tails, a 95% confidence level leaves 2.5% in each tail, so the critical value is the 97.5th percentile of the standard normal distribution. Therefore, $z_{\alpha/2}$ is given by `qnorm(.975)`. We multiply it by the standard error of the mean `sem` to get the margin of error.
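The same reasoning applies to other confidence levels. The following is a small illustrative sketch; `conf.levels`, `alpha`, and `z` are names introduced here and are not part of the original notebook.

```r
# Critical values z_{alpha/2} for several confidence levels.
# conf.levels, alpha and z are illustrative names, not from the notebook.
conf.levels = c(0.90, 0.95, 0.99)
alpha = 1 - conf.levels

# Each tail holds alpha/2, so we need the 100(1 - alpha/2) percentile.
z = qnorm(1 - alpha/2)
round(z, 4)
```

A higher confidence level gives a larger critical value and therefore a wider interval for the same standard error.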

In [12]:
E = qnorm(.975)*sem

We then add it to and subtract it from the sample mean to obtain the confidence interval.

In [15]:
xbar = mean(height.response)
xbar + c(-E, E)

Assuming the population standard deviation $\sigma$ is $9.48$, the margin of error for the student height survey at the $95\%$ confidence level is $1.2852$ centimeters. The confidence interval is between $171.10$ and $173.67$ centimeters.
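The steps of this section can be collected into one function. This is only a sketch: `z.interval` and its arguments are hypothetical names introduced here, and the value `sigma = 9.48` is the same assumed population standard deviation used in this section.

```r
library(MASS)  # provides the survey data set

# Hypothetical helper: z-based confidence interval for a mean
# when the population standard deviation sigma is assumed known.
z.interval = function(x, sigma, conf.level = 0.95) {
    x = na.omit(x)                              # drop missing responses
    n = length(x)
    sem = sigma / sqrt(n)                       # standard error of the mean
    E = qnorm(1 - (1 - conf.level)/2) * sem     # margin of error
    c(lower = mean(x) - E, upper = mean(x) + E)
}

ci = z.interval(survey$Height, sigma = 9.48)
ci
```

Passing a different `conf.level`, such as `0.99`, widens the interval without changing the point estimate.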

# Interval Estimate of Population Mean with Unknown Variance

In [57]:
height.response = na.omit(survey$Height)

In [58]:
n = length(height.response)
s = sd(height.response)
SE = s/sqrt(n)

In [59]:
E = qt(.975, df=n-1)*SE

In [60]:
xbar = mean(height.response)
xbar + c(-E, E)

In [61]:
t.test(height.response)


	One Sample t-test

data:  height.response
t = 253.07, df = 208, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 171.0380 173.7237
sample estimates:
mean of x 
 172.3809 
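When $\sigma$ is unknown, the sample standard deviation $s$ takes its place and the Student t distribution with $n-1$ degrees of freedom replaces the standard normal, so the end points become $\bar{x} \pm t_{\alpha/2}\, s/\sqrt{n}$. As a sanity check, the manual interval can be compared with the `conf.int` component of the object returned by `t.test`. The names `manual.ci` and `builtin.ci` are introduced here for illustration.

```r
library(MASS)  # provides the survey data set

height.response = na.omit(survey$Height)
n = length(height.response)
SE = sd(height.response) / sqrt(n)      # estimated standard error
E = qt(.975, df = n - 1) * SE           # margin of error from the t distribution

manual.ci = mean(height.response) + c(-E, E)
builtin.ci = t.test(height.response)$conf.int   # interval computed by t.test

# as.numeric drops the conf.level attribute before comparing
all.equal(manual.ci, as.numeric(builtin.ci))
```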


# Sample Size for Population Mean

In [62]:
zstar = qnorm(.975)

In [63]:
sigma = 9.48

In [64]:
E = 1.2

In [65]:
zstar^2 * sigma^2 / E^2
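The expression above comes from solving the margin of error formula $E = z_{\alpha/2}\,\sigma/\sqrt{n}$ for $n$, which gives $n = z_{\alpha/2}^2\,\sigma^2/E^2$. The result is generally not a whole number, so it is rounded up to guarantee the target margin of error. A minimal sketch, where `n.required` is a name introduced here:

```r
zstar = qnorm(.975)   # critical value for 95% confidence
sigma = 9.48          # assumed population standard deviation
E = 1.2               # target margin of error

# Round up: a smaller n would give a margin of error larger than E.
n.required = ceiling(zstar^2 * sigma^2 / E^2)
n.required
```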

# Point Estimate of Population Proportion

In [66]:
gender.response = na.omit(survey$Sex)

In [67]:
n = length(gender.response)

In [68]:
k = sum(gender.response == "Female")

In [69]:
pbar = k/n

In [70]:
pbar
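Because R treats `TRUE` as 1 and `FALSE` as 0, the same proportion can also be obtained as the mean of the logical vector. A short sketch of this equivalent form, where `pbar.alt` is a name introduced here:

```r
library(MASS)  # provides the survey data set

gender.response = na.omit(survey$Sex)

# The mean of a logical vector is the fraction of TRUE values.
pbar.alt = mean(gender.response == "Female")
pbar.alt
```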

# Interval Estimate of Population Proportion

In [71]:
SE = sqrt(pbar*(1-pbar)/n)

In [72]:
SE

In [73]:
E = qnorm(.975)*SE

In [74]:
pbar + c(-E, E)

In [75]:
prop.test(k, n)


	1-sample proportions test without continuity correction

data:  k out of n, null probability 0.5
X-squared = 0, df = 1, p-value = 1
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.4367215 0.5632785
sample estimates:
  p 
0.5 
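The interval reported by `prop.test` differs slightly from `pbar + c(-E, E)`. The hand computation is the normal-approximation (Wald) interval, whereas `prop.test` reports a Wilson score interval. The sketch below reproduces that interval by hand; `centre`, `half`, and `wilson.ci` are names introduced here, and `correct = FALSE` is passed explicitly so that no continuity correction is involved.

```r
library(MASS)  # provides the survey data set

gender.response = na.omit(survey$Sex)
n = length(gender.response)
k = sum(gender.response == "Female")
pbar = k/n
z = qnorm(.975)

# Wilson score interval: centre and half-width
centre = (pbar + z^2/(2*n)) / (1 + z^2/n)
half = z * sqrt(pbar*(1 - pbar)/n + z^2/(4*n^2)) / (1 + z^2/n)
wilson.ci = centre + c(-half, half)

# Compare with the interval stored in the prop.test result
all.equal(wilson.ci, as.numeric(prop.test(k, n, correct = FALSE)$conf.int))
```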


# Sample Size for Population Proportion

In [76]:
zstar = qnorm(.975)
p = 0.5
E = 0.05
zstar^2 * p * (1-p) / E^2
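The planning value `p = 0.5` is the conservative choice: the product $p(1-p)$ is largest at $p = 0.5$, so it yields the largest required sample size. As with the mean, the result is rounded up. A minimal sketch, where `n.required`, `p.grid`, and `n.grid` are names introduced here:

```r
zstar = qnorm(.975)   # critical value for 95% confidence
E = 0.05              # target margin of error
p = 0.5               # planned proportion estimate

# Round up to the next whole respondent.
n.required = ceiling(zstar^2 * p * (1 - p) / E^2)
n.required

# Required sample size for a range of planned proportions;
# the maximum occurs at p = 0.5.
p.grid = seq(0.1, 0.9, by = 0.1)
n.grid = ceiling(zstar^2 * p.grid * (1 - p.grid) / E^2)
n.grid
```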