# Interval Estimation
A common task is to estimate population parameters from simple random sample data. The R tutorials in this section demonstrate how to compute such estimates. The steps are illustrated with a built-in data frame named survey, which records the responses of Statistics students surveyed at an Australian university.

The data set belongs to the MASS package, which has to be loaded into the R workspace before use.

In [2]:
library(MASS)      # load the MASS package 
head(survey) 

| Sex | Wr.Hnd | NW.Hnd | W.Hnd | Fold | Pulse | Clap | Exer | Smoke | Height | M.I | Age |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Female | 18.5 | 18.0 | Right | R on L | 92.0 | Left | Some | Never | 173.0 | Metric | 18.25 |
| Male | 19.5 | 20.5 | Left | R on L | 104.0 | Left | | Regul | 177.8 | Imperial | 17.583 |
| Male | 18.0 | 13.3 | Right | L on R | 87.0 | Neither | | Occas | | | 16.917 |
| Male | 18.8 | 18.9 | Right | R on L | | Neither | | Never | 160.0 | Metric | 20.333 |
| Male | 20.0 | 20.0 | Right | Neither | 35.0 | Right | Some | Never | 165.0 | Metric | 23.667 |
| Female | 18.0 | 17.7 | Right | L on R | 64.0 | Right | Some | Never | 172.72 | Imperial | 21.0 |


## Point Estimate of Population Mean
For any particular random sample, we can always compute its sample mean. Although most often it is not the actual population mean, it does serve as a good **point estimate**. For example, in the data set survey, the survey is performed on a sample of the student population. We can compute the sample mean and use it as an estimate of the corresponding population parameter.

**Problem**

Find a point estimate of mean university student height with the sample data from survey.

**Solution**

For convenience, we begin by saving the survey data of student heights in a variable height.survey.

In [3]:
height.survey = survey$Height


It turns out that not all students answered the question, so we must filter out the missing values. Hence we apply the mean function with the "na.rm" argument set to TRUE.

In [4]:
mean(height.survey, na.rm=TRUE)  # skip missing values 

**Answer**

A point estimate of the mean student height is 172.38 centimeters.
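
To see why the "na.rm" argument matters, the short sketch below counts the missing heights and compares the two calls to mean. The variable name h is used only for illustration and is not part of the tutorial.

```r
library(MASS)                 # provides the survey data frame

h <- survey$Height            # illustrative name; same column as height.survey
sum(is.na(h))                 # number of students who gave no height
mean(h)                       # NA: any missing value makes the plain mean NA
mean(h, na.rm = TRUE)         # missing values are dropped before averaging
```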

## Interval Estimate of Population Mean with Known Variance

**Problem**

Assume the population standard deviation σ of the student height in survey is 9.48. Find the margin of error and interval estimate at 95% confidence level.

**Solution**

We first filter out missing values in survey$Height with the na.omit function, and save it in height.response.

In [5]:
height.response = na.omit(survey$Height)

Then we compute the standard error of the mean.

In [7]:
n = length(height.response)
sigma = 9.48                   # population standard deviation 
sem = sigma/sqrt(n); sem       # standard error of the mean 

A 95% confidence level leaves 5% of the probability to be split between the two tails of the normal distribution, 2.5% in each, so the critical value is the 97.5th percentile. Therefore, z<sub>α∕2</sub> is given by qnorm(.975). We multiply it by the standard error of the mean sem to get the margin of error.

In [10]:
E = qnorm(.975)*sem
E                      # margin of error
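
The same percentile rule applies to other confidence levels. As a minimal sketch (the names conf.level, alpha and zstar are illustrative), the critical value for a two-sided interval is the quantile at 1 - α∕2:

```r
conf.level <- c(0.90, 0.95, 0.99)     # confidence levels to compare
alpha <- 1 - conf.level               # total probability left in the two tails
zstar <- qnorm(1 - alpha/2)           # upper-tail critical value for each level
round(zstar, 4)
```

A higher confidence level gives a larger critical value and therefore a wider interval for the same standard error.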

We then subtract the margin of error from the sample mean and add it to the sample mean, which gives the two ends of the confidence interval.

In [14]:
xbar = mean(height.response)   # sample mean 
xbar + c(-E, E)

**Answer**

Assuming the population standard deviation σ is 9.48, the margin of error for the student height survey at 95% confidence level is 1.2852 centimeters. The confidence interval is between 171.10 and 173.67 centimeters.
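
The steps of this solution can be collected into one function. The sketch below is only an illustration: z.interval is a hypothetical helper name, not a function from base R or from any package, and it assumes σ is known.

```r
library(MASS)                                  # provides the survey data frame

# Hypothetical helper: z-based confidence interval for a mean, sigma known.
z.interval <- function(x, sigma, conf.level = 0.95) {
    x <- x[!is.na(x)]                          # drop missing responses
    n <- length(x)                             # number of valid responses
    sem <- sigma / sqrt(n)                     # standard error of the mean
    E <- qnorm(1 - (1 - conf.level)/2) * sem   # margin of error
    c(lower = mean(x) - E, upper = mean(x) + E, margin = E)
}

res <- z.interval(survey$Height, sigma = 9.48)
res
```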

**Alternative Solution**

Instead of using the textbook formula, we can apply the z.test function in the TeachingDemos package. It is not a core R package, and must be installed and loaded into the workspace beforehand.

In [16]:
library(TeachingDemos)     # load TeachingDemos package 
z.test(height.response, sd=sigma)


	One Sample z-test

data:  height.response
z = 262.88, n = 209.00000, Std. Dev. = 9.48000, Std. Dev. of the sample
mean = 0.65575, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 171.0956 173.6661
sample estimates:
mean of height.response 
               172.3809 


## Interval Estimate of Population Mean with Unknown Variance

**Problem**

Without assuming the population standard deviation of the student height in survey, find the margin of error and interval estimate at 95% confidence level.

**Solution**

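As in the previous section, we first filter out missing values in survey$Height with the na.omit function and save the result in height.response; n is again the number of valid responses.

Then we compute the sample standard deviation and, from it, the standard error estimate.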

In [17]:
s = sd(height.response)        # sample standard deviation 
SE = s/sqrt(n); SE             # standard error estimate 

Since there are two tails of the Student t distribution, the 95% confidence level would imply the 97.5th percentile of the Student t distribution at the upper tail. Therefore, t<sub>α∕2</sub> is given by qt(.975, df=n-1). We multiply it by the standard error estimate SE to get the margin of error.

In [26]:
E = qt(.975, df=n-1)*SE
E                            # margin of error 

We then subtract the margin of error from the sample mean and add it to the sample mean, which gives the confidence interval.

In [30]:
xbar = mean(height.response)   # sample mean 
xbar + c(-E,E)

**Answer**

Without assuming a value for the population standard deviation, the margin of error for the student height survey at 95% confidence level is 1.3429 centimeters. The confidence interval is between 171.04 and 173.72 centimeters.
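
Part of the reason this margin of error is larger than the one obtained with a known σ is that the Student t critical value exceeds the normal one. The sketch below compares the two. It takes n = 209 from the test outputs in this section, and the names tstar and zstar are illustrative.

```r
n <- 209                          # valid height responses (df = 208 in the t-test output)
tstar <- qt(.975, df = n - 1)     # Student t critical value
zstar <- qnorm(.975)              # normal critical value
c(t = tstar, z = zstar, ratio = tstar/zstar)
```

With this many observations the two critical values are close, so most of the difference between the two margins comes from the sample standard deviation replacing the assumed σ.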

**Alternative Solution**

Instead of using the textbook formula, we can apply the t.test function in the built-in stats package.

In [31]:
t.test(height.response)


	One Sample t-test

data:  height.response
t = 253.07, df = 208, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 171.0380 173.7237
sample estimates:
mean of x 
 172.3809 
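
The interval can also be pulled out of the test result instead of being read off the printout. A minimal sketch, using the conf.int component of the object that t.test returns (the names result and ci are illustrative):

```r
library(MASS)                                   # provides the survey data frame
height.response <- na.omit(survey$Height)

result <- t.test(height.response, conf.level = 0.95)
ci <- result$conf.int                           # lower and upper bounds
ci
diff(ci) / 2                                    # half-width, i.e. the margin of error
```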


## Sample Size for Population Mean

**Problem**

Assume the population standard deviation σ of the student height in survey is 9.48. Find the sample size needed to achieve a 1.2 centimeters margin of error at 95% confidence level.

**Solution**

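Since there are two tails of the normal distribution, the 95% confidence level would imply the 97.5th percentile of the normal distribution at the upper tail. Therefore, z<sub>α∕2</sub> is given by qnorm(.975). Solving the margin of error formula E = z<sub>α∕2</sub>σ∕√n for n gives n = (z<sub>α∕2</sub>)<sup>2</sup>σ<sup>2</sup>∕E<sup>2</sup>.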

In [3]:
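zstar = qnorm(.975)        # critical value at 95% confidence
sigma = 9.48               # population standard deviation
E = 1.2                    # target margin of error
zstar^2 * sigma^2 / E^2    # required sample size, before rounding up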

**Answer**

Based on the assumption that the population standard deviation is 9.48, the formula gives about 239.7, so a sample size of 240 is needed to achieve a 1.2 centimeters margin of error at 95% confidence level.
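
The rounding step can be made explicit with ceiling. In the sketch below, sample.size.mean is a hypothetical helper name used for illustration. The last line checks that the rounded sample size really keeps the margin of error within the target.

```r
# Hypothetical helper: smallest n whose margin of error is at most E, sigma known.
sample.size.mean <- function(sigma, E, conf.level = 0.95) {
    zstar <- qnorm(1 - (1 - conf.level)/2)   # critical value
    ceiling(zstar^2 * sigma^2 / E^2)         # round up to a whole respondent
}

n.needed <- sample.size.mean(sigma = 9.48, E = 1.2)
n.needed
achieved <- qnorm(.975) * 9.48 / sqrt(n.needed)
achieved                                     # margin of error actually achieved
```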

## Point Estimate of Population Proportion
Multiple choice questionnaires in a survey are often used to determine the proportion of a population with a certain characteristic. For example, we can estimate the proportion of female students in the university based on the result in the sample data set *survey*.

**Problem**

Find a point estimate of the female student proportion from survey.

**Solution**

We first filter out missing values in survey$Sex with the na.omit function, and save the result in gender.response.

In [1]:
library(MASS)                  # load the MASS package 
gender.response = na.omit(survey$Sex) 
n = length(gender.response)    # valid responses count

To find out the number of female students, we compare gender.response with the factor level "Female", and sum the resulting TRUE values. Dividing the sum by n gives the female student proportion in the sample survey.

In [2]:
k = sum(gender.response == "Female") 
pbar = k/n; pbar

**Answer**

The point estimate of the female student proportion in survey is 50%.
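
The same proportion can be obtained in one step, because the mean of a logical vector is the fraction of TRUE values. A short sketch follows. The call to prop.table is an extra, showing the proportion for every level of the factor.

```r
library(MASS)                                  # provides the survey data frame
gender.response <- na.omit(survey$Sex)

p.female <- mean(gender.response == "Female")  # fraction of TRUE values
p.female
prop.table(table(gender.response))             # proportions for all levels
```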

## Interval Estimate of Population Proportion
After finding a point estimate of the population proportion, we need to **estimate its confidence interval**.

**Problem**

Compute the margin of error and interval estimate for the female student proportion in survey at 95% confidence level.

**Solution**

We first determine the proportion point estimate pbar and the number of valid responses n. Further details can be found in the previous tutorial.

Then we estimate the standard error.

In [4]:
SE = sqrt(pbar*(1-pbar)/n)
SE                    # standard error 

Since there are two tails of the normal distribution, the 95% confidence level would imply the 97.5th percentile of the normal distribution at the upper tail. Therefore, z<sub>α∕2</sub> is given by qnorm(.975). Hence we multiply it by the standard error estimate SE to compute the margin of error.

In [7]:
E = qnorm(.975)*SE
E              # margin of error 

Combining it with the sample proportion, we obtain the confidence interval.

In [9]:
pbar + c(-E,E)

**Answer**

At 95% confidence level, the proportion of female university students is estimated to be between 43.6% and 56.4%, and the margin of error is 6.4%.

**Alternative Solution**

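Instead of using the textbook formula, we can apply the prop.test function in the built-in stats package. Its interval is computed by a different method, the Wilson score interval, so the bounds differ slightly from those of the textbook formula.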

In [10]:
prop.test(k, n)


	1-sample proportions test without continuity correction

data:  k out of n, null probability 0.5
X-squared = 0, df = 1, p-value = 1
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.4367215 0.5632785
sample estimates:
  p 
0.5 
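
The Wilson score interval reported by prop.test can be reproduced by hand. The sketch below is an illustration under that assumption. The names centre, half and wilson are not part of the tutorial, and correct = FALSE is passed explicitly so that no continuity correction is applied.

```r
library(MASS)                                  # provides the survey data frame
gender.response <- na.omit(survey$Sex)
n <- length(gender.response)                   # valid responses count
k <- sum(gender.response == "Female")          # number of female students
pbar <- k/n
z <- qnorm(.975)

# Wilson score interval, without continuity correction
centre <- (pbar + z^2/(2*n)) / (1 + z^2/n)
half <- z * sqrt(pbar*(1 - pbar)/n + z^2/(4*n^2)) / (1 + z^2/n)
wilson <- centre + c(-half, half)
wilson
prop.test(k, n, correct = FALSE)$conf.int      # compare with the hand calculation
```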


## Sample Size for Population Proportion

**Problem**

Using a 50% planned proportion estimate, find the sample size needed to achieve a 5% margin of error for the female student survey at 95% confidence level.

**Solution**

Since there are two tails of the normal distribution, the 95% confidence level would imply the 97.5th percentile of the normal distribution at the upper tail. Therefore, z<sub>α∕2</sub> is given by qnorm(.975).
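
A minimal sketch of the calculation, assuming the usual sample size formula for a proportion, n = (z<sub>α∕2</sub>)<sup>2</sup>p(1-p)∕E<sup>2</sup>, which follows from solving E = z<sub>α∕2</sub>√(p(1-p)∕n) for n. The names p and n.raw are illustrative.

```r
zstar <- qnorm(.975)                     # critical value at 95% confidence
p <- 0.5                                 # planned proportion estimate
E <- 0.05                                # target margin of error
n.raw <- zstar^2 * p * (1 - p) / E^2     # required sample size, before rounding
n.raw
ceiling(n.raw)                           # round up to a whole respondent
```

A planned proportion of 50% maximizes p(1-p), so it gives the most conservative sample size.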