# Mod5/L2 Confidence Intervals for Variances

## Introduction
In this video, we construct confidence intervals for the difference between two population proportions and for a population variance $(\sigma^2)$, using the tools developed so far.

## Example: Difference Between Two Population Proportions
We have two random samples from different regions of a country undergoing a national election. 

In the first sample of 500 people, 320 preferred Candidate A. In the second sample of 420 people, 302 preferred Candidate A. 

We want to construct a confidence interval for the difference between the true proportions $(P_1 - P_2)$.

### Steps
1. **Estimator**: Use the difference in sample proportions $(\hat{P}_1 - \hat{P}_2)$.
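2. **Distribution**: By the Central Limit Theorem, each sample proportion is approximately normally distributed for large samples, so the difference of the two independent sample proportions is approximately normal as well.
3. **Standardization**: Standardize the difference by subtracting its mean $(P_1 - P_2)$ and dividing by its standard deviation, estimated by replacing the unknown proportions with the sample proportions.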
4. **Critical Values**: Use the critical values for the standard normal distribution to construct the confidence interval.
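
The steps above can be summarized in one formula. This is a sketch of the standard large-sample result; the symbol $z_{\alpha/2}$ (the standard normal critical value with upper-tail area $\alpha/2$) is notation introduced here, not taken from the video.

$$
\frac{(\hat{P}_1 - \hat{P}_2) - (P_1 - P_2)}{\sqrt{\dfrac{\hat{P}_1(1-\hat{P}_1)}{n_1} + \dfrac{\hat{P}_2(1-\hat{P}_2)}{n_2}}} \;\overset{\text{approx.}}{\sim}\; N(0, 1)
$$

$$
(\hat{P}_1 - \hat{P}_2) \;\pm\; z_{\alpha/2} \sqrt{\dfrac{\hat{P}_1(1-\hat{P}_1)}{n_1} + \dfrac{\hat{P}_2(1-\hat{P}_2)}{n_2}}
$$

For a 90% interval, $\alpha = 0.10$, so the critical value is $z_{0.05}$, which in R is `qnorm(0.95)`.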

### R Example


In [3]:
# Sample data
n1 <- 500
p1_hat <- 320 / 500
n2 <- 420
p2_hat <- 302 / 420

# Estimated standard deviation (standard error) of the difference in sample proportions
std_dev_diff <- sqrt((p1_hat * (1 - p1_hat) / n1) + (p2_hat * (1 - p2_hat) / n2))

# Critical value for 90% confidence interval
z_critical <- qnorm(0.95)

# Confidence interval
lower_bound <- (p1_hat - p2_hat) - z_critical * std_dev_diff
upper_bound <- (p1_hat - p2_hat) + z_critical * std_dev_diff

# Output the confidence interval
cat("90% Confidence Interval for the difference in proportions: [", lower_bound, ", ", upper_bound, "]\n")

90% Confidence Interval for the difference in proportions: [ -0.129526 ,  -0.02856922 ]
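
As a cross-check (not part of the original notebook), base R's `prop.test()` should give the same interval when the continuity correction is switched off with `correct = FALSE`. A minimal sketch, assuming the same counts as in the example:

```r
# Cross-check using prop.test() from the stats package (loaded by default).
# correct = FALSE disables the continuity correction so the interval uses
# the same unpooled standard error as the manual calculation.
res <- prop.test(x = c(320, 302), n = c(500, 420),
                 conf.level = 0.90, correct = FALSE)

# Extract the interval for P1 - P2 as a plain numeric vector
ci <- as.numeric(res$conf.int)
cat("90% CI from prop.test: [", ci[1], ", ", ci[2], "]\n")
```

The two bounds should agree with the manually computed interval up to rounding.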


## Interpretation
The interval lies entirely below zero, which is evidence that the two true proportions differ:

"I get negative 0.129 to negative 0.029. Reading this as an interval of plausible values for the difference P1 minus P2, because the interval doesn't contain zero and is completely in the negatives, it suggests that the true proportions are not equal, based on this particular sample.

It does seem like P2 hat is larger than P1 hat, but previously we weren't quite sure how big a difference we needed to say they were statistically different."
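
To illustrate what "90% confidence" means here, the following sketch simulates many pairs of samples and counts how often the interval captures the true difference. The "true" proportions `p1` and `p2`, the seed, and the number of repetitions are hypothetical values chosen for illustration only; they are not from the video.

```r
set.seed(1)                     # arbitrary seed, for reproducibility only
n1 <- 500; n2 <- 420            # sample sizes from the example
p1 <- 0.64; p2 <- 0.72          # hypothetical true proportions (assumption)
reps <- 2000                    # number of simulated pairs of samples
z <- qnorm(0.95)                # critical value for a 90% interval

# Simulated sample proportions for each repetition
p1_hat_sim <- rbinom(reps, n1, p1) / n1
p2_hat_sim <- rbinom(reps, n2, p2) / n2

# Interval endpoints for each repetition
se <- sqrt(p1_hat_sim * (1 - p1_hat_sim) / n1 +
           p2_hat_sim * (1 - p2_hat_sim) / n2)
lower <- (p1_hat_sim - p2_hat_sim) - z * se
upper <- (p1_hat_sim - p2_hat_sim) + z * se

# Fraction of intervals that contain the true difference
coverage <- mean(lower <= (p1 - p2) & (p1 - p2) <= upper)
cat("Estimated coverage:", coverage, "\n")
```

The estimated coverage should come out close to 0.90, which is the sense in which the procedure is a 90% confidence interval.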

## Example: Variance for a Single Population
A potato chip manufacturer wants to ensure that the variance in the weight of 10-ounce bags is small. We take a random sample of 20 bags and find a sample variance of 0.52 (in squared ounces). Assuming the weights are normally distributed, we construct a 95% confidence interval for the true standard deviation $(\sigma)$.

### Steps
1. **Estimator**: Use the sample variance $(s^2)$; its square root estimates $(\sigma)$.
2. **Distribution**: For a random sample from a normal distribution, $((n-1)s^2/\sigma^2)$ follows a chi-square distribution with $(n-1)$ degrees of freedom.
3. **Critical Values**: Use chi-square critical values to construct the confidence interval for $(\sigma^2)$, then take square roots of the endpoints to get the interval for $(\sigma)$.
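
The interval follows from inverting the chi-square statement. In this sketch, $\chi^2_{q,\,n-1}$ is notation introduced here for the $q$ quantile (lower-tail area $q$) of the chi-square distribution with $n-1$ degrees of freedom, which is what `qchisq(q, df = n - 1)` returns in R.

$$
P\!\left(\chi^2_{\alpha/2,\,n-1} \;\le\; \frac{(n-1)s^2}{\sigma^2} \;\le\; \chi^2_{1-\alpha/2,\,n-1}\right) = 1 - \alpha
$$

Solving the inequalities for $\sigma^2$ flips them, so the upper critical value ends up in the lower endpoint:

$$
\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}} \;\le\; \sigma^2 \;\le\; \frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}}
$$

Taking square roots of both endpoints gives the interval for $\sigma$.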

### R Example

In [13]:
# Sample data
n <- 20
sample_variance <- 0.52

# Chi-square critical values for 95% confidence interval
chi2_lower <- qchisq(0.025, df = n - 1)
chi2_upper <- qchisq(0.975, df = n - 1)

# Critical values for a one-sided 95% interval with an upper bound only (all 5% in the lower chi-square tail)
chi2_lower2 <- qchisq(0.05, df = n - 1)
chi2_upper2 <- qchisq(1.0, df = n - 1)

# Critical values for a one-sided 95% interval with a lower bound only (all 5% in the upper chi-square tail)
chi2_lower3 <- qchisq(0.0, df = n - 1)
chi2_upper3 <- qchisq(0.95, df = n - 1)

# Output the critical values
cat("Chi-square critical values for 95% confidence interval: ", chi2_lower, ", ", chi2_upper, "\n")

# Output the alternative critical values
cat("Chi-square critical values for first alternative 95% confidence interval: ", chi2_lower2, ", ", chi2_upper2, "\n")

# Output the alternative critical values
cat("Chi-square critical values for second alternative 95% confidence interval: ", chi2_lower3, ", ", chi2_upper3, "\n")

# Confidence interval for variance
lower_bound_var <- (n - 1) * sample_variance / chi2_upper
upper_bound_var <- (n - 1) * sample_variance / chi2_lower

# Confidence interval for standard deviation
lower_bound_sd <- sqrt(lower_bound_var)
upper_bound_sd <- sqrt(upper_bound_var)

# Output the confidence interval
cat("95% Confidence Interval for the standard deviation: [", lower_bound_sd, ", ", upper_bound_sd, "]\n")

# First alternative: one-sided interval with an upper bound only (the lower limit is 0)
lower_bound_var2 <- (n - 1) * sample_variance / chi2_upper2
upper_bound_var2 <- (n - 1) * sample_variance / chi2_lower2

lower_bound_sd2 <- sqrt(lower_bound_var2)
upper_bound_sd2 <- sqrt(upper_bound_var2)

cat("First Alternative 95% Confidence Interval for the standard deviation: [", lower_bound_sd2, ", ", upper_bound_sd2, "]\n")

# Second alternative: one-sided interval with a lower bound only (the upper limit is Inf)
lower_bound_var3 <- (n - 1) * sample_variance / chi2_upper3
upper_bound_var3 <- (n - 1) * sample_variance / chi2_lower3

lower_bound_sd3 <- sqrt(lower_bound_var3)
upper_bound_sd3 <- sqrt(upper_bound_var3)

cat("Second Alternative 95% Confidence Interval for the standard deviation: [", lower_bound_sd3, ", ", upper_bound_sd3, "]\n")


Chi-square critical values for 95% confidence interval:  8.906516 ,  32.85233 
Chi-square critical values for first alternative 95% confidence interval:  10.11701 ,  Inf 
Chi-square critical values for second alternative 95% confidence interval:  0 ,  30.14353 
95% Confidence Interval for the standard deviation: [ 0.5483974 ,  1.053233 ]
First Alternative 95% Confidence Interval for the standard deviation: [ 0 ,  0.988217 ]
Second Alternative 95% Confidence Interval for the standard deviation: [ 0.5725078 ,  Inf ]
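
The three intervals above differ only in which chi-square quantiles are used, so the calculation can be wrapped in one function. The helper `var_ci()` and its argument names are hypothetical, introduced here as a sketch; they are not part of the original notebook or of any R package.

```r
# Hypothetical helper: confidence interval for the variance and standard
# deviation of a normal population, given the sample variance s2 and size n.
var_ci <- function(s2, n, conf = 0.95,
                   type = c("two.sided", "upper", "lower")) {
  type <- match.arg(type)
  alpha <- 1 - conf
  df <- n - 1

  # Lower-tail probabilities for the two chi-square critical values
  probs <- switch(type,
    two.sided = c(alpha / 2, 1 - alpha / 2),
    upper     = c(alpha, 1),        # upper bound only: interval [0, U]
    lower     = c(0, 1 - alpha)     # lower bound only: interval [L, Inf)
  )
  crit <- qchisq(probs, df = df)

  # The larger critical value gives the lower endpoint, hence rev()
  var_int <- df * s2 / rev(crit)
  list(variance = var_int, sd = sqrt(var_int))
}

res_two   <- var_ci(0.52, 20)                  # two-sided interval
res_upper <- var_ci(0.52, 20, type = "upper")  # upper bound only
res_lower <- var_ci(0.52, 20, type = "lower")  # lower bound only
cat("Two-sided 95% CI for sigma: [", res_two$sd[1], ", ", res_two$sd[2], "]\n")
```

With the potato chip data, the three calls should reproduce the three intervals printed above.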


## Conclusion
We have constructed confidence intervals for the difference between two population proportions and for the variance and standard deviation of a single normal population. In the next video (refer to [mod5_summarytranscript_L3_CIs_RatioOfVariances.ipynb](mod5_summarytranscript_L3_CIs_RatioOfVariances.ipynb)), we will compare the true variances for two different populations using a new distribution.