# 📊 PLS 120 Lab 6: Confidence Intervals and T-Tests

**Binder Environment Developer:** Mohammadreza Narimani  
**Lab Content Developer:** Parastoo Farajpoor  
**Date:** November 5, 2024  
**Course:** PLS 120 - Applied Statistics in Agricultural Sciences  
**Institution:** UC Davis

In [None]:
# Load required libraries
suppressPackageStartupMessages({
  library(ggplot2)
  library(dplyr)
  library(knitr)
  library(tigerstats)
})

## 🖨️ Printing Values in R

Before we start, here is a quick review of four ways to print the value of an object in R.

In [None]:
# 1. Auto-printing: type the name of an object and R displays the value it holds.
x <- 42
x

In [None]:
#2. Basic Printing with print() function: The print() function is the standard way to display basic data types and objects in R. It outputs the content to the console.
x <- 2
print(x)

In [None]:
x <- c(1,2,3)
print(x)

In [None]:
x <- c("Alice", "Bob")
print(x)

In [None]:
# 3. cat() function: cat() converts its arguments to text, joins them with a space (or another separator given by sep), and prints the result to the console. It does not return the text.

x <- c(4,6,8)
cat("The values of x are:", x)

In [None]:
x <- c(4,20,21)
cat("The three values of x are:", x[1], x[2], x[3])

In [None]:
## cat() only prints; it returns NULL, so storing its result in an object stores NULL rather than the text
statement <- cat("The three values of x are:", x[1], x[2], x[3])
print(statement)

In [None]:
# 4. paste() function: paste() converts its arguments to character strings and joins them, separated by a space or another specified character. Unlike cat(), it returns the resulting string.
x <- c(1,2,3)
paste("The three values of x are:", x[1], x[2], x[3])

In [None]:
## You can store the result of paste() in an object and print it later.
statement <- paste("The three values of x are:", x[1], x[2], x[3])
print(statement)
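
The difference between `cat()` and `paste()` matters most when you want to reuse or round a value before reporting it. The following is a minimal sketch using made-up numbers (not lab data); the object names `result_cat`, `result_paste`, and `result_sprintf` are illustrative only.

```r
# Illustrative values only; these are not taken from the lab data
x <- c(4.123456, 6.5, 8)

# cat() prints to the console and returns NULL, so result_cat holds no text
result_cat <- cat("Rounded values:", round(x, 2), "\n")

# paste() returns a character string that can be stored and reused
result_paste <- paste("Rounded values:", paste(round(x, 2), collapse = ", "))
print(result_paste)

# sprintf() gives explicit control over the number of decimal places
result_sprintf <- sprintf("First value: %.2f", x[1])
print(result_sprintf)
```
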

## 📊 Confidence Intervals and Statistical Inference

For this lab, we're going to learn how to construct confidence intervals and compare means between groups using the iris dataset.

## 📂 Loading and Exploring Data

In [None]:
#Next, we'll load the data.
data <- iris

str(data)

In [None]:
# You can see that the variable "Species" is already a factor. If it were not, we could use as.factor() to convert it, as shown below (here the call leaves the data unchanged).
data$Species <- as.factor(data$Species)
str(data)

## 🔢 Sample Sizes

In [None]:
# Let's see how many observations we have for each level of the factor (each species).
table(data$Species)

## 📈 Data Visualization

In [None]:
#Let's visualize sepal length differences between species
ggplot(data, aes(x=Species, y=Sepal.Length, fill=Species))+geom_boxplot()

In [None]:
ggplot(data, aes(x=Species, y=Sepal.Length, fill=Species))+geom_boxplot()+geom_point()

In [None]:
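# geom_jitter() adds a small random offset to each point so that overlapping observations are easier to see.
# Setting height = 0 keeps every point at its true Sepal.Length value; only the horizontal position is jittered.
ggplot(data, aes(x=Species, y=Sepal.Length, fill=Species))+geom_boxplot()+geom_jitter(width = 0.2, height = 0)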

## 🔄 Data Filtering

In [None]:
#Create separate datasets for each species
setosa_df <- data %>% filter(Species == "setosa")
versicolor_df <- data %>% filter(Species == "versicolor")
virginica_df <- data %>% filter(Species == "virginica")

## 📊 Sample Means

**Formula:** $\bar{x} = \frac{\sum x_i}{n}$

Calculate the sample mean for each species:

In [None]:
mean_setosa <- mean(setosa_df$Sepal.Length)
mean_versicolor <- mean(versicolor_df$Sepal.Length)
mean_virginica <- mean(virginica_df$Sepal.Length)


#Let's create a vector containing all of these 3 means and print the values.
all_means <- c(mean_setosa, mean_versicolor, mean_virginica)
print(all_means)
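
To connect the formula $\bar{x} = \frac{\sum x_i}{n}$ to the built-in `mean()` function, here is a small sketch that applies the formula by hand to the setosa sepal lengths. The names `x` and `manual_mean` are introduced only for this illustration.

```r
# Sepal lengths for setosa, taken directly from the built-in iris data
x <- iris$Sepal.Length[iris$Species == "setosa"]

# Apply the formula: sum of the observations divided by the sample size
manual_mean <- sum(x) / length(x)
print(manual_mean)

# Compare with mean(); all.equal() tolerates tiny floating-point differences
print(all.equal(manual_mean, mean(x)))
```
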

## 🎯 Confidence Intervals

Sample means are estimates of population means. To quantify uncertainty, we construct confidence intervals.

## 📐 Standard Deviations

**Formula:** $s = \sqrt{\frac{\sum(x_i - \bar{x})^2}{n-1}}$

In [None]:
sd_setosa <- sd(setosa_df$Sepal.Length)
sd_versicolor <- sd(versicolor_df$Sepal.Length)
sd_virginica <- sd(virginica_df$Sepal.Length)
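
The sample standard deviation formula can be checked against `sd()` in the same way. This is a sketch for the setosa group only; `x`, `n_obs`, and `manual_sd` are illustrative names.

```r
# Sepal lengths for setosa, taken directly from the built-in iris data
x <- iris$Sepal.Length[iris$Species == "setosa"]
n_obs <- length(x)

# Sum of squared deviations from the mean, divided by n - 1, then square root
manual_sd <- sqrt(sum((x - mean(x))^2) / (n_obs - 1))
print(manual_sd)

# sd() uses the same n - 1 denominator, so the two values should agree
print(all.equal(manual_sd, sd(x)))
```
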

In [None]:
#you can use cat() function to concatenate strings and variables into a single output.
cat("The standard deviation of Setosa is", sd_setosa,
    ", Versicolor is", sd_versicolor,
    ", and Virginica is", sd_virginica)

In [None]:
#you can also use paste function for this purpose.
paste("The standard deviation of Setosa is", sd_setosa,
    ", Versicolor is", sd_versicolor,
    ", and Virginica is", sd_virginica)

## 📏 Standard Error

**Formula:** $SE = \frac{s}{\sqrt{n}}$

In [None]:
n <- nrow(setosa_df)  # 50 observations; every species has the same sample size (see table(data$Species))

SE_setosa <- sd_setosa/sqrt(n)
SE_versicolor <- sd_versicolor/sqrt(n)
SE_virginica <- sd_virginica/sqrt(n)
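
As an alternative to filtering each species into its own data frame, the sample size, mean, standard deviation, and standard error can be computed for all groups at once with `dplyr`. This is one possible sketch, not the lab's required approach; `species_summary` and its column names (`mean_len`, `sd_len`, `se_len`) are illustrative.

```r
suppressPackageStartupMessages(library(dplyr))

# One row per species, with n taken from the data instead of being typed in
species_summary <- iris %>%
  group_by(Species) %>%
  summarise(
    n        = n(),                  # number of observations in the group
    mean_len = mean(Sepal.Length),   # sample mean
    sd_len   = sd(Sepal.Length),     # sample standard deviation
    se_len   = sd_len / sqrt(n)      # standard error of the mean
  )

print(species_summary)
```

Reading `n` from the data means the calculation stays correct even if the groups have different sizes.
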

## 📊 T-Distribution Parameters

**Degrees of Freedom:** $df = n - 1$

Since the population variance is unknown and must be estimated from the sample, we use the t-distribution instead of the z-distribution (standard normal).

In [None]:
a <- 0.05  # significance level (alpha); corresponds to a 95% confidence level
DoF <- n-1

## 🎯 Critical T-Value

**Formula:** $t_{\alpha/2, df}$ where $\alpha = 0.05$

In [None]:
t_score <- qt(1-a/2, DoF)
t_score
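
To see what using the t-distribution changes in practice, the sketch below compares the t critical value with the corresponding z critical value from the standard normal distribution. The names `alpha`, `dof`, `t_crit`, and `z_crit` are illustrative; `dof` is set to 49 on the assumption of 50 observations per group, as in this lab.

```r
alpha <- 0.05
dof <- 49  # n - 1, assuming a sample of 50 observations

t_crit <- qt(1 - alpha / 2, dof)   # critical value from the t-distribution
z_crit <- qnorm(1 - alpha / 2)     # critical value from the standard normal

cat("t critical value:", round(t_crit, 4), "\n")
cat("z critical value:", round(z_crit, 4), "\n")
```

The t critical value is slightly larger, so the resulting interval is slightly wider; the gap shrinks as the degrees of freedom increase.
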

## 🌸 Setosa Confidence Interval

**Margin of Error:** $ME = t \times SE$

**Confidence Interval:** $\bar{x} \pm ME$

In [None]:
ME_setosa <- t_score*SE_setosa

LB_setosa<- mean_setosa - ME_setosa
UB_setosa<- mean_setosa + ME_setosa

CI_setosa <- c(LB_setosa, mean_setosa, UB_setosa)
CI_setosa

## 🌺 Versicolor Confidence Interval

In [None]:
ME_versicolor <- t_score * SE_versicolor

LB_versicolor <- mean_versicolor - ME_versicolor
UB_versicolor <- mean_versicolor + ME_versicolor

CI_versicolor <- c(LB_versicolor, mean_versicolor, UB_versicolor)
CI_versicolor

## 🌷 Virginica Confidence Interval

In [None]:
ME_virginica <- t_score * SE_virginica

LB_virginica <- mean_virginica - ME_virginica
UB_virginica <- mean_virginica + ME_virginica

CI_virginica <- c(LB_virginica, mean_virginica, UB_virginica)
CI_virginica
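
The same three steps (standard error, margin of error, bounds) are repeated for each species, so they can be wrapped in a small function. The sketch below defines a hypothetical helper, `ci_mean()`, which is not part of base R or any package loaded in this lab, and applies it to each species.

```r
# Hypothetical helper: t-based confidence interval for the mean of a numeric vector
ci_mean <- function(x, conf_level = 0.95) {
  n <- length(x)
  se <- sd(x) / sqrt(n)                                # standard error
  t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)   # critical t-value
  me <- t_crit * se                                    # margin of error
  c(lower = mean(x) - me, mean = mean(x), upper = mean(x) + me)
}

# Split sepal lengths by species and apply the helper to each group;
# the result is a matrix with one column per species
ci_by_species <- sapply(split(iris$Sepal.Length, iris$Species), ci_mean)
print(round(ci_by_species, 3))
```
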

## 🔍 Comparing Confidence Intervals

Now we can compare the confidence intervals to assess whether the population means differ significantly.

In [None]:
#Let's take a look at the confidence intervals all at once. Run all three lines at the same time. 
CI_setosa
CI_versicolor
CI_virginica

## 📈 Results Interpretation

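**Key Finding:** None of the three 95% confidence intervals overlap, indicating that the species' mean sepal lengths are significantly different.

**Rule:** Non-overlapping confidence intervals suggest statistically significant differences between group means. The reverse does not hold: overlapping intervals do not, by themselves, show that the means are equal.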
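
Comparing intervals by eye is an informal check. As one possible cross-check, base R's `t.test()` reports a confidence interval for a single mean and can also test the difference between two group means directly. The sketch below compares setosa and versicolor; the object names are illustrative, and the two-sample call uses `t.test()`'s default Welch test, which does not assume equal variances.

```r
setosa_len <- iris$Sepal.Length[iris$Species == "setosa"]
versicolor_len <- iris$Sepal.Length[iris$Species == "versicolor"]

# One-sample t.test(): its conf.int is the t-based 95% CI for the setosa mean
one_sample <- t.test(setosa_len, conf.level = 0.95)
print(one_sample$conf.int)

# Two-sample t-test of the difference between the two species means
two_sample <- t.test(setosa_len, versicolor_len)
print(two_sample$p.value)    # p-value for the null hypothesis of equal means
print(two_sample$conf.int)   # 95% CI for the difference in means
```

If the confidence interval for the difference excludes zero, the two means differ significantly at the 5% level.
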

---

## 📧 Need Help?

**Mohammadreza Narimani** (Teaching Assistant)  
📧 mnarimani@ucdavis.edu  
🏫 Department of Biological and Agricultural Engineering, UC Davis  
⏰ Office Hours: Thursdays 10 AM - 12 PM (Zoom)  
🔗 [Join Zoom Office Hours](https://ucdavis.zoom.us/j/99533096447)

---

*Week 6 - Confidence Intervals and T-Tests | PLS 120 | UC Davis*