# One- and Two-Sample Estimation of the Mean

## Estimation Problems

### Problem 1

9.10) A random sample of 12 graduates of a certain secretarial school typed an average of 79.3 words per minute with a standard deviation of 7.8 words per minute. Assuming a normal distribution for the number of words typed per minute, find a 95% confidence interval for the average number of words typed by all graduates of this school.

In [30]:
n <- 12 # sample size (number of graduates)
dof <- n - 1 # degrees of freedom
sample_mean <- 79.3 # sample mean "x bar", in words per minute
s <- 7.8 # sample standard deviation "s" (estimates the unknown sigma)
confidence_level <- 0.95 # 95% confidence level

# Significance level implied by the confidence level
alpha <- (1 - confidence_level)

# Calculate the critical t-value
t_value <- qt(1 - alpha/2, dof)
cat("t-value: ", t_value, "\n")

# Calculate Margin of Error with the formula for the confidence interval
margin_of_error <- t_value * (s / sqrt(n))
cat("Margin of Error: ", margin_of_error, "\n")
cat("95% Confidence Interval : [ ", sample_mean - margin_of_error, " , ", sample_mean + margin_of_error , " ]")

t-value:  2.200985 
Margin of Error:  4.955884 
95% Confidence Interval : [  74.34412  ,  84.25588  ]

### Problem 2

9.13) A random sample of 12 shearing pins is taken in a study of the Rockwell hardness of the pin head. Measurements on the Rockwell hardness are made for each of the 12, yielding an average value of 48.50 with a sample standard deviation of 1.5. Assuming the measurements to be normally distributed, construct a 90% confidence interval for the mean Rockwell hardness.

In [8]:
n <- 12 # sample size (number of shearing pins)
dof <- n - 1 # degrees of freedom
sample_mean <- 48.50 # sample mean Rockwell hardness
s <- 1.5 # sample standard deviation "s" (estimates the unknown sigma)
confidence_level <- 0.90 # 90% confidence level

# Significance level implied by the confidence level
alpha <- (1 - confidence_level)

# Calculate the critical t-value
t_value <- qt(1 - alpha/2, dof)
cat("t-value: ", t_value, "\n")

# Calculate Margin of Error with the formula for the confidence interval
margin_of_error <- t_value * (s / sqrt(n))
cat("Margin of Error: ", margin_of_error, "\n")
cat("90% Confidence Interval : [ ", sample_mean - margin_of_error, " , ", sample_mean + margin_of_error , " ]")

t-value:  1.795885 
Margin of Error:  0.7776409 
90% Confidence Interval : [  47.72236  ,  49.27764  ]
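
Problems 1 and 2 apply the same one-sample t interval, so the calculation can be wrapped in a small helper. The following is a minimal sketch: the function name `t_interval` and its argument names are illustrative choices, not part of the textbook or of base R.

```r
# Hypothetical helper: two-sided t confidence interval from summary statistics
t_interval <- function(sample_mean, s, n, confidence_level = 0.95) {
  alpha <- 1 - confidence_level
  t_value <- qt(1 - alpha / 2, df = n - 1)   # critical t-value
  margin_of_error <- t_value * s / sqrt(n)
  c(lower = sample_mean - margin_of_error,
    upper = sample_mean + margin_of_error)
}

ci_1 <- t_interval(79.3, 7.8, 12, confidence_level = 0.95)   # Problem 1
ci_2 <- t_interval(48.50, 1.5, 12, confidence_level = 0.90)  # Problem 2
print(ci_1)
print(ci_2)
```

Both calls should reproduce the intervals printed for Problems 1 and 2.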

### Problem 3

9.19) A random sample of 25 tablets of buffered aspirin contains, on average, 325.05 mg of aspirin per tablet, with a standard deviation of 0.5 mg. Find the 95% tolerance limits that will contain 90% of the tablet contents for this brand of buffered aspirin. Assume that the aspirin content is normally distributed.

In [9]:
n <- 25 # sample size (tablets of buffered aspirin)
dof <- n - 1 # degrees of freedom
sample_mean <- 325.05 # sample mean, in mg of aspirin per tablet
s <- 0.5 # sample standard deviation "s", in mg
lambda <- 0.95 # confidence attached to the tolerance limits
p <- 0.90 # proportion of the population to be covered

# Lower-tail area used for the chi-square quantile
alpha <- (1 - lambda)
# The limits are two-sided, so the central proportion p leaves (1 - p)/2 in each tail
z_value <- qnorm((1 + p) / 2)
cat("z_value: ", z_value, "\n")

# Lower-tail chi-square quantile with n - 1 degrees of freedom (area alpha to the left)
chi_squared <- qchisq(alpha, dof)
cat("Chi-Squared value: ", round(chi_squared, 3), "\n")

z_value:  1.644854 
Chi-Squared value:  13.848 


In [10]:
# k = tolerance factor (Howe's approximation for two-sided normal tolerance limits)
tolerance_factor <- (z_value * sqrt(1 + (1/n))) / sqrt(chi_squared / dof)
cat("Tolerance factor k: ", round(tolerance_factor, 3), "\n")

# Tolerance limits: sample mean +/- k * s
lower_bound <- sample_mean - (tolerance_factor * s)
upper_bound <- sample_mean + (tolerance_factor * s)

cat("95% Tolerance Limits for 90% of tablets : [ ", lower_bound, " , ", upper_bound , " ]")

Tolerance factor k:  2.208 
95% Tolerance Limits for 90% of tablets : [  323.9459  ,  326.1541  ]
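
What the two percentages in a tolerance statement mean can be checked by simulation. The sketch below assumes the two-sided tolerance factor is computed with Howe's approximation. It draws many samples of size 25 from a standard normal population, an arbitrary choice because the factor does not depend on the population mean or variance. It then counts how often the limits x bar ± k·s capture at least 90% of the population. If the factor is appropriate, that fraction should be close to 95%. The variable names, seed, and number of simulated samples are illustrative.

```r
set.seed(1)
n <- 25        # sample size
p <- 0.90      # proportion of the population to cover
conf <- 0.95   # confidence attached to the limits

# Two-sided tolerance factor (Howe's approximation)
k <- qnorm((1 + p) / 2) * sqrt((n - 1) * (1 + 1 / n) / qchisq(1 - conf, n - 1))

n_sim <- 20000
# Sampling distributions of x bar and s for a N(0, 1) population
xbar <- rnorm(n_sim, mean = 0, sd = 1 / sqrt(n))
s <- sqrt(rchisq(n_sim, df = n - 1) / (n - 1))

# Share of the N(0, 1) population falling inside each simulated pair of limits
coverage <- pnorm(xbar + k * s) - pnorm(xbar - k * s)

# Fraction of samples whose limits cover at least p of the population
prop_ok <- mean(coverage >= p)
cat("Tolerance factor k: ", k, "\n")
cat("Share of samples covering at least 90%: ", prop_ok, "\n")
```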

### Problem 4

9.28) In Section 9.3, we emphasized the notion of “most efficient estimator” by comparing the variance of two unbiased estimators $\hat{\Theta}_1$ and $\hat{\Theta}_2$. However, this does not take into account bias in case one or both estimators are not unbiased. Consider the quantity

> $\mathrm{MSE} = E(\hat{\Theta} - \theta)^2,$

where MSE denotes mean squared error. The MSE is often used to compare two estimators $\hat{\Theta}_1$ and $\hat{\Theta}_2$ of $\theta$ when either or both is unbiased because (i) it is intuitively reasonable and (ii) it accounts for bias. Show that MSE can be written

> $\mathrm{MSE} = E[\hat{\Theta} - E(\hat{\Theta})]^2 + [E(\hat{\Theta} - \theta)]^2$
>
> $= \mathrm{Var}(\hat{\Theta}) + [\mathrm{Bias}(\hat{\Theta})]^2$


**Solution:**

Notation: $\hat{\Theta}$ is the estimator, $\theta$ is the parameter being estimated, and $E$ denotes expected value.

Step 1: Definition of mean squared error, <b>$\mathrm{MSE}(\hat{\Theta}) = E[(\hat{\Theta} - \theta)^2]$</b>

Step 2: Add and subtract the expected value of the estimator, $E(\hat{\Theta})$, inside the squared term
> $\mathrm{MSE}(\hat{\Theta}) = E[(\hat{\Theta} - E(\hat{\Theta}) + E(\hat{\Theta}) - \theta)^2]$

Step 3: Expand the squared term
> $(\hat{\Theta} - \theta)^2 = (\hat{\Theta} - E(\hat{\Theta}) + E(\hat{\Theta}) - \theta)^2$
> 
> $(\hat{\Theta} - \theta)^2 = (\hat{\Theta} - E(\hat{\Theta}))^2 + 2(\hat{\Theta} - E(\hat{\Theta}))(E(\hat{\Theta}) - \theta) + (E(\hat{\Theta}) - \theta)^2$

Step 4: Take the expectation of each term separately (linearity of expectation)
> $E[(\hat{\Theta} - \theta)^2] = E[(\hat{\Theta} - E(\hat{\Theta}))^2] + E[2(\hat{\Theta} - E(\hat{\Theta}))(E(\hat{\Theta}) - \theta)] + E[(E(\hat{\Theta}) - \theta)^2]$

Step 5: Simplify each term
> First term: $E[(\hat{\Theta} - E(\hat{\Theta}))^2] = \mathrm{Var}(\hat{\Theta})$
> 
> Second term: $E[2(\hat{\Theta} - E(\hat{\Theta}))(E(\hat{\Theta}) - \theta)] = 0$
> 
> > <b>Justification:</b> $E(\hat{\Theta}) - \theta = k$ is a constant, so the term equals $2k\,E[\hat{\Theta} - E(\hat{\Theta})]$, and $E[\hat{\Theta} - E(\hat{\Theta})] = E(\hat{\Theta}) - E(\hat{\Theta}) = 0$
> > 
> Third term: since $E(\hat{\Theta}) - \theta = k$ is a constant, $E[(E(\hat{\Theta}) - \theta)^2] = (E(\hat{\Theta}) - \theta)^2$

Step 6: Combine the results
> $\mathrm{MSE}(\hat{\Theta}) = \mathrm{Var}(\hat{\Theta}) + (E(\hat{\Theta}) - \theta)^2$

Conclusion
> $\mathrm{MSE}(\hat{\Theta}) = \mathrm{Var}(\hat{\Theta}) + [\mathrm{Bias}(\hat{\Theta})]^2$
> 
> where the bias of $\hat{\Theta}$ is defined as $\mathrm{Bias}(\hat{\Theta}) = E(\hat{\Theta}) - \theta$
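
The decomposition can also be checked numerically. The sketch below is an illustration under stated assumptions, not part of the textbook solution. The estimator is the variance estimate that divides by n instead of n − 1, which is biased, and the true variance, sample size, seed, and number of replications are arbitrary choices. The empirical MSE of the simulated estimates is compared with their empirical variance plus squared bias.

```r
set.seed(42)
sigma2 <- 4      # true parameter theta (population variance), chosen for illustration
n <- 10          # sample size per replication
n_sim <- 20000   # number of simulated samples

# Biased estimator of the variance: divide by n instead of n - 1
estimates <- replicate(n_sim, {
  x <- rnorm(n, mean = 0, sd = sqrt(sigma2))
  mean((x - mean(x))^2)
})

mse      <- mean((estimates - sigma2)^2)            # E[(estimator - theta)^2]
variance <- mean((estimates - mean(estimates))^2)   # divisor n_sim, matching E[.]
bias     <- mean(estimates) - sigma2                # E[estimator] - theta

cat("MSE:          ", mse, "\n")
cat("Var + Bias^2: ", variance + bias^2, "\n")
```

The two printed values agree up to floating-point error because the identity holds for empirical moments as well as for expectations.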

### Problem 5

The following data represent the length of time, in days, to recovery for patients randomly treated with one of two medications to clear up severe bladder infections:

![image.png](attachment:3b272b92-4b06-40aa-8ccb-3509a384ff38.png)


- Medication 1: $n_1 = 14$, $\bar{x}_1 = 17$, $s_1^2 = 1.5$
- Medication 2: $n_2 = 16$, $\bar{x}_2 = 19$, $s_2^2 = 1.8$

Find a 99% confidence interval for the difference μ2−μ1 in the mean recovery times for the two medications, assuming normal populations with equal variances.

In [11]:
# Medication 1
n_1 <- 14 # sample size n_1
sample_mean_1 <- 17 # sample mean x bar_1
s_1 <- 1.5 # sample variance s_1^2 (a variance, not a standard deviation)

# Medication 2
n_2 <- 16 # sample size n_2
sample_mean_2 <- 19 # sample mean x bar_2
s_2 <- 1.8 # sample variance s_2^2 (a variance, not a standard deviation)

In [12]:
# Pooled Variance calculation
pooled_variance <- (((n_1 - 1) * s_1 ) + ((n_2 - 1) * s_2)) / (n_1 + n_2 - 2) # = s squared sub p
cat("Pooled Variance: ", pooled_variance, "\n")

# Standard error (SE) of the difference in sample means
standard_error <- sqrt(pooled_variance * ((1 / n_1) + (1 / n_2)))
cat("Standard Error (SE): ", standard_error, "\n")

# Margin of Error, (ME) = (t sub alpha/2) * SE 
alpha <- 0.01 # since we want the 99% confidence interval
dof <- n_1 + n_2 - 2 # degrees of freedom 
t_value <- qt(1 - alpha/2, dof) # t critical value: population variances unknown but assumed equal

margin_of_error <- t_value * standard_error

cat("Critical T Value: ", t_value, "\n")
cat("Margin of Error (ME): ", margin_of_error, "\n")

Pooled Variance:  1.660714 
Standard Error (SE):  0.4716112 
Critical T Value:  2.763262 
Margin of Error (ME):  1.303185 


In [13]:
# Confidence interval for mu_2 - mu_1: (x bar_2 - x bar_1) +/- ME
mean_difference <- sample_mean_2 - sample_mean_1

lower_bound <- mean_difference - margin_of_error
upper_bound <- mean_difference + margin_of_error
cat("99% Confidence Interval : [ ", lower_bound, " , ", upper_bound , " ]")

99% Confidence Interval : [  0.6968146  ,  3.303185  ]
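
For reference, the pooled-variance interval used in this problem can be written out with the numbers substituted. This is a worked restatement of the calculation, rounded to four decimal places.

```latex
\begin{align*}
s_p^2 &= \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}
       = \frac{13(1.5) + 15(1.8)}{28} \approx 1.6607 \\[4pt]
(\bar{x}_2 - \bar{x}_1) \pm t_{\alpha/2,\; n_1 + n_2 - 2}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
      &= 2 \pm 2.7633 \sqrt{1.6607 \left(\frac{1}{14} + \frac{1}{16}\right)} \\
      &\approx 2 \pm 1.3032
\end{align*}
```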

## R Built-in Datasets: Confidence Interval for the Mean

In [1]:
# enable built-in R datasets for analysis
library(datasets)

In [11]:
# List the available built-in datasets to pick one (uncomment to run)
# data()

In [14]:
head(USArrests)

| | Murder `<dbl>` | Assault `<int>` | UrbanPop `<int>` | Rape `<dbl>` |
|---|---|---|---|---|
| Alabama | 13.2 | 236 | 58 | 21.2 |
| Alaska | 10.0 | 263 | 48 | 44.5 |
| Arizona | 8.1 | 294 | 80 | 31.0 |
| Arkansas | 8.8 | 190 | 50 | 19.5 |
| California | 9.0 | 276 | 91 | 40.6 |
| Colorado | 7.9 | 204 | 78 | 38.7 |


In [22]:
# identify dataframe characteristics
names(USArrests)
nrow(USArrests)
attach(USArrests) # attach the columns so they can be referenced by name (e.g. Murder)

In [39]:
n <- length(Murder) # sample size
dof <- n - 1 # degrees of freedom
sample_mean <- mean(Murder) # sample mean

confidence_level <- 0.95
alpha <- (1 - confidence_level)

# sigma is unknown, so the critical value comes from the t distribution
t_value <- qt(1 - alpha/2, dof)
standard_error <- sd(Murder) / sqrt(n)
margin_of_error <- t_value * standard_error

cat("t value: ",t_value,"\n")
cat("Standard Error: ",standard_error,"\n")
cat("95% Confidence Interval : [ ", sample_mean - margin_of_error, " , ", sample_mean + margin_of_error , " ]")

t value:  2.009575 
Standard Error:  0.6159621 
95% Confidence Interval : [  6.550178  ,  9.025822  ]
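
As a cross-check, base R's `t.test()` reports the same kind of interval directly from the data. The sketch below assumes only the `datasets` and `stats` packages that ship with R, and it refers to the column as `USArrests$Murder` so that it does not depend on `attach()`.

```r
library(datasets)

# One-sample t-test; conf.int holds the two-sided confidence interval for the mean
result <- t.test(USArrests$Murder, conf.level = 0.95)
ci <- result$conf.int
cat("95% Confidence Interval : [ ", ci[1], " , ", ci[2], " ]\n")
```

The bounds should match the manually computed interval for the mean murder rate.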