# 🎲 PLS 120 Week 4: Probability and Sampling
**Applied Statistics in Agriculture**

**Instructor:** Parastoo Farajpoor  
**Binder Developer:** Mohammadreza Narimani  
**Date:** October 22, 2025

---

## 🎯 Learning Objectives

By the end of this lab, you will be able to:

✅ **Understand Logical Variables** - Work with TRUE/FALSE values and logical operations  
✅ **Convert Data Types** - Transform between numeric, character, factor, and logical types  
✅ **Perform Random Sampling** - Use the sample() function for population sampling  
✅ **Simulate Probability** - Model coin tosses and dice rolls programmatically  
✅ **Work with Normal Distributions** - Use rnorm(), pnorm(), and qnorm() functions  
✅ **Visualize Probability Distributions** - Create bar plots and histograms of outcomes  

---

## 📚 Setup: Load Required Libraries

Let's start by loading the libraries we'll need for this lab:

In [None]:
# Load required libraries
suppressPackageStartupMessages({
  library(ggplot2)
  library(tigerstats)
})

cat("✅ Libraries loaded successfully!")

---

## 🔍 Part 1: Logical Variables

First, let's take a quick peek into logical variables. Logical variables in R are variables that can only take the values TRUE, FALSE, or NA (missing value). They are the result of logical conditions and are fundamental in decision-making processes in programming. Below are some examples to illustrate the use of logical variables:

In [None]:
# Example 1: Basic logical comparison
is_five_greater_than_two <- 5 > 2
is_five_greater_than_two
# This will return TRUE because 5 is indeed greater than 2.

In [None]:
# Example 2: Equality check
is_five_equal_to_five <- 5 == 5
is_five_equal_to_five
# This will return TRUE because 5 is equal to 5.

In [None]:
# Example 3: Logical negation
is_not_equal <- !(1 == 2)
is_not_equal
# This uses the logical negation operator (!) to invert the result of 1 == 2, which will return TRUE because 1 is not equal to 2.

In [None]:
# Example 4: Checking vector equality
vector_comparison <- c(1, 2, 3) == c(3, 2, 1)
vector_comparison
# This compares each element in the two vectors pairwise, resulting in FALSE, TRUE, FALSE.

In [None]:
# Example 5: Combining logical operations
age <- 25
height <- 160
is_young_and_tall <- (age < 30) & (height > 180)
is_young_and_tall
# This checks whether the person is younger than 30 AND taller than 180 cm. The & operator returns TRUE only when both conditions are TRUE. Here age is 25 (first condition is TRUE) but height is 160 (second condition is FALSE), so the result is FALSE.
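
Logical vectors are also handy for counting and filtering. The following is a minimal sketch, assuming a made-up vector of plant heights (`plant_height` is hypothetical, not part of the lab data); it also shows how the NA value mentioned above behaves in a comparison.

```r
# Hypothetical plant heights in cm; NA marks a missing measurement
plant_height <- c(152, 181, NA, 175, 190)

# Comparing against NA gives NA, so the third element is NA
is_tall <- plant_height > 180
is_tall

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values
n_tall <- sum(is_tall, na.rm = TRUE)
n_tall

# which() returns the positions of the TRUE values and skips NA
tall_plants <- plant_height[which(is_tall)]
tall_plants

# Test for missing values with is.na() rather than == NA
is.na(plant_height)
```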

---

## 🔄 Part 2: Data Type Conversion

Next, let's look at how to convert data from one type to another. This matters because some functions accept only certain data types, so we need to know how to convert our data into a type that a given function accepts.

In [None]:
# 1. Converting to numeric: The as.numeric() function converts data to numeric type, which is useful when dealing with numerical operations.

#Example: Convert a character vector to numeric
char_vector <- c("1", "2", "3")
numeric_vector <- as.numeric(char_vector)
print(numeric_vector)
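
One thing to watch for: text that does not look like a number cannot be converted. A small sketch, using a made-up vector (`mixed_vector` is hypothetical), shows that as.numeric() returns NA for such entries and raises a warning, which is silenced here only to keep the output clean.

```r
# "two" is not a valid number, so it becomes NA
mixed_vector <- c("1", "two", "3")
converted <- suppressWarnings(as.numeric(mixed_vector))
converted

# Find which entries failed to convert
is.na(converted)
```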

In [None]:
# 2. Converting to integer: The as.integer() function converts data to integer type. This can be crucial when you need to work with integer-specific functions or operations.

#Example:  Convert a numeric vector to integer
numeric_vector <- c(1.5, 2.5, 3.5)
integer_vector <- as.integer(numeric_vector)  # Note: this will truncate the decimal part
print(integer_vector)
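
Because as.integer() truncates rather than rounds, it can help to compare it with R's rounding functions. This is a brief sketch with made-up values (`values` is hypothetical).

```r
values <- c(1.5, 2.5, -1.5, 2.9)

as.integer(values)  # drops the fractional part (truncates toward zero)
round(values)       # rounds to the nearest integer; exact halves go to the even number
floor(values)       # always rounds down
ceiling(values)     # always rounds up
```

If you need conventional rounding before converting, one option is to call round() first and then as.integer().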

In [None]:
# 3. Converting to character: The as.character() function is used to convert data to character type, often necessary for text processing or when preparing data for output.

#Example: Convert a numeric vector to character
numeric_vector <- c(1, 2, 3)
char_vector <- as.character(numeric_vector)
print(char_vector)

In [None]:
# 4. Converting to factor: The as.factor() function converts data to a factor, which is used in statistical modeling to handle categorical variables.

#Example: Convert a character vector to factor
char_vector <- c("yes", "no", "yes", "maybe")
factor_vector <- as.factor(char_vector)
print(factor_vector)
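
A common pitfall when converting a factor back to numbers is that as.numeric() returns the internal level codes, not the original values. The sketch below uses a made-up factor (`yield_factor` is hypothetical) to show the difference.

```r
# A factor whose labels happen to be numbers
yield_factor <- factor(c("10", "20", "10", "30"))
levels(yield_factor)

# Returns the level codes (1, 2, 1, 3), not the values
codes <- as.numeric(yield_factor)
codes

# Convert to character first to recover the values (10, 20, 10, 30)
yield_values <- as.numeric(as.character(yield_factor))
yield_values
```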

In [None]:
# 5. Converting to data frame: The data.frame() function is used to convert data into a data frame, a crucial data type for handling datasets in R.

#Example: Convert two vectors into a data frame
names_vector <- c("Alice", "Bob", "Charlie")
age_vector <- c(25, 30, 35)
data_frame <- data.frame(Name = names_vector, Age = age_vector)
print(data_frame)

In [None]:
# 6. Converting to logical: The as.logical() function converts data to logical type, which is helpful for conditions and logical operations.

#Example 1: Convert a numeric vector to logical (0 is FALSE, all other numbers are TRUE)
numeric_vector <- c(0, 1, 2)
logical_vector <- as.logical(numeric_vector)
print(logical_vector)

# Example 2: Convert a numeric vector of scores to logical, where non-zero scores are considered "passing" (TRUE).
scores <- c(0, 15, -10, 20, 0, 5)
logical_scores <- as.logical(scores)
print(logical_scores)
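
Note that as.logical() treats the negative score (-10) as TRUE as well, because it is non-zero. If "passing" should instead mean a score above some cutoff, a comparison states that rule explicitly. This is a sketch that assumes a cutoff of 0, chosen only for illustration.

```r
scores <- c(0, 15, -10, 20, 0, 5)

# Non-zero means TRUE, so -10 is TRUE here
as.logical(scores)

# An explicit rule: passing means a score greater than 0
is_passing <- scores > 0
is_passing

# Number of passing scores
sum(is_passing)
```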

---

## 🎯 Part 3: Random Sampling

Before exploring the probability of selecting a sample, it's essential to understand how to perform random sampling from a population in R. We accomplish this with the sample() function, which facilitates drawing random samples from a specified dataset.

The sample() function takes three main arguments:  
1- **x**: the population to sample from. This can be a vector of values or, as in the example below, a single positive number n, in which case R samples from the numbers 1 to n  
2- **size**: the sample size (number of items to draw). If omitted, it defaults to the size of the population  
3- **replace**: determines whether the sampling is done with replacement (TRUE) or without replacement (FALSE, the default). If we set "replace = FALSE", each row can be selected at most once, and if we set "replace = TRUE", each row might be selected more than once.

In this example, we'll randomly select 30 rows from the iris (flower) dataset without replacement (no row is selected more than once).

In [None]:
data <- iris

#nrow() function gives us the number of rows (observations) in a data frame. We need to know the number of rows in our dataset because that would be our population that we want to take our samples from.
nrow(data)

In [None]:
# Randomly take 30 samples (rows) without replacement. This code prints the row indices of randomly selected samples.
sampled_indices <- sample(nrow(data), size = 30, replace = FALSE)
sampled_indices

In [None]:
#As you can see, sampled_indices is a vector that contains the row numbers of the selected samples. To access the samples themselves, we need to extract the selected rows from our data frame.

# Use the indices to get the sampled rows from our dataset. This code produces a data frame that contains only the selected rows (samples).
dataset_subset <- data[sampled_indices,]
head(dataset_subset)
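
Every run of sample() gives a different result. If you want the same sample each time (for example, to compare answers with a classmate), you can set a random seed first. The sketch below assumes an arbitrary seed value of 120 and also contrasts sampling with and without replacement.

```r
# Setting the same seed before sampling reproduces the same draw
set.seed(120)
idx_a <- sample(nrow(iris), size = 30, replace = FALSE)
set.seed(120)
idx_b <- sample(nrow(iris), size = 30, replace = FALSE)
identical(idx_a, idx_b)

# Without replacement, no row index is repeated (anyDuplicated() returns 0)
anyDuplicated(idx_a)

# With replacement we can draw more rows than the data set has (150),
# so some row indices must repeat
idx_rep <- sample(nrow(iris), size = 200, replace = TRUE)
anyDuplicated(idx_rep) > 0
```

Asking for more than 150 rows with replace = FALSE would stop with an error, because the population would run out.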

---

## 🪙 Part 4: Coin Toss Simulation

Dice rolls and coin tosses are simple ways to simulate probabilities and probability distributions. However, we won't be using physical dice. Instead, we will be writing code that is the equivalent of flipping coins and rolling dice.

In [None]:
#First, we need to create a vector to represent 2 sides of a coin. This vector will contain two elements: "H" for heads and "T" for tails.
coin <- c("H","T")
coin

In [None]:
#Now that we have a "coin", we perform the toss by using the sample function. In this example, we will toss the coin 20 times. Setting replace=TRUE allows each toss to be independent, representing an actual coin toss where each outcome (heads or tails) has an equal chance of occurring every time.
toss <- sample(coin, size=20, replace=TRUE)
print(toss)
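
By default, sample() gives every element the same chance of being drawn. As a sketch of how an unfair coin could be simulated, the prob argument of sample() sets a probability for each element; the 70/30 split and the seed below are assumptions made up for this illustration.

```r
coin <- c("H", "T")

# A hypothetical biased coin: 70% heads, 30% tails
set.seed(120)
biased_toss <- sample(coin, size = 20, replace = TRUE, prob = c(0.7, 0.3))
biased_toss

# Count how often each side came up
table(biased_toss)
```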

In [None]:
#Now that we have the results of our 20 coin tosses stored in the toss vector, we can analyze the outcomes by counting the number of heads and tails. We achieve this by checking each element of the toss vector to see whether it is "H" (heads) or "T" (tails), and then summing up the results to get the total counts for each.

# Create a logical vector indicating whether each toss resulted in heads. This line checks each element in the 'toss' vector to see if it matches "H". If the condition is met, it returns TRUE; otherwise, it returns FALSE.
heads_vector <- toss == "H"
heads_vector

In [None]:
# Count the number of heads in the coin tosses. This line sums all the instances where the condition (toss == "H") is met (TRUE).
heads <- sum(toss == "H")
heads

#Let's do the same for tails
tails_vector <- toss=="T"
tails_vector

tails <- sum(toss=="T")
tails

In [None]:
#To estimate the probability of an event from a simulation, we divide the number of times the event occurs by the total number of trials. For a fair coin, we expect the probabilities of heads and tails to be 50/50, or 0.5 each. Let's calculate the observed probability for heads and tails.
prob_heads <- heads/20
prob_heads

#probability for tails can also be achieved by 1-prob_heads.
prob_tails <- tails/20
prob_tails
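
With only 20 tosses, the observed proportion of heads often lands some distance from 0.5. A quick way to explore this is to repeat the simulation with more tosses. The sketch below assumes an arbitrary seed and arbitrary toss counts, and uses mean() on a logical vector as a shortcut for "number of heads divided by number of tosses".

```r
set.seed(120)
coin <- c("H", "T")

# Numbers of tosses to try (chosen only for illustration)
n_tosses <- c(20, 200, 2000, 20000)

# For each n, toss the coin n times and record the proportion of heads
prop_heads <- sapply(n_tosses, function(n) {
  toss_n <- sample(coin, size = n, replace = TRUE)
  mean(toss_n == "H")
})

data.frame(n_tosses, prop_heads)
```

The proportions for the larger runs should tend to sit closer to 0.5 than those for the smaller runs.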

In [None]:
#As you can see, the observed probabilities for heads and tails are usually not exactly 50%. Let's explore this further by looking at how the outcomes of our coin tosses are distributed between heads and tails. We'll start by turning our coin tosses from a vector into a table. This transformation lets us view the data as a frequency table, which gives a clearer summary of the outcomes. The table displays how many times each outcome (heads and tails) occurred.
toss_table<- table(toss)
toss_table

#Next, we'll determine the probability of each outcome by dividing the frequency of each result by the total number of tosses. This approach converts the frequency data into a probability distribution, which helps in understanding how frequently each outcome appears relative to the total number of tosses (in our case 20 tosses).
toss_prob_distribution <- toss_table/sum(toss_table)
toss_prob_distribution
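
Two small variations on the table approach can be useful. The sketch below uses a made-up run of tosses in which tails never appears (`toss` here is hypothetical) to show that table() silently drops outcomes that never occurred, that converting to a factor with explicit levels keeps them, and that prop.table() is a built-in way to turn counts into proportions.

```r
coin <- c("H", "T")

# Hypothetical short run in which tails never came up
toss <- c("H", "H", "H")

# A plain table only lists outcomes that occurred
table(toss)

# Declaring the possible outcomes as factor levels keeps "T" with a count of 0
toss_table_full <- table(factor(toss, levels = coin))
toss_table_full

# prop.table() divides each count by the total
toss_prop_full <- prop.table(toss_table_full)
toss_prop_full
```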

In [None]:
#Now let's make a bar plot to visualize the probability distribution table. For this purpose, we can either use ggplot() function or barplot() function. Let's explore both methods:

# 1. barplot() function:

## 1.1: This gives you the bar plot that shows the frequency in y axis
barplot(toss_table)

## 1.2: This gives you the bar plot that shows the probability in y axis
barplot(toss_prob_distribution)

In [None]:
#We can make our barplot cooler.
barplot(toss_prob_distribution, main="Probability Distribution of Coin Toss", 
        ylab="Probability", xlab="Outcome", col="blue",
        names.arg=c("Heads", "Tails")) # names.arg sets the bar labels (table order is H, then T)

In [None]:
# 2. ggplot() function: To use ggplot() for plotting data directly from a vector like toss, you'll need to slightly adjust how the data is passed into ggplot(). The ggplot2 library generally expects data in a data frame format, so even if you're working with a simple vector or table, it's a good practice to convert it to a data frame or use another method to structure the data appropriately.

## 2.1: This gives you the bar plot that shows the frequency in y axis
toss_df <- data.frame(toss_table)
ggplot(toss_df, aes(x=toss, y = Freq))+geom_bar(stat = "identity") #x and y should be the column names in the toss_df data frame (which is toss and Freq).

In [None]:
## 2.2: This gives you the bar plot that shows the probability in y axis
toss_prob_df <- data.frame(toss_prob_distribution)
ggplot(toss_prob_df, aes(x=toss, y = Freq))+geom_bar(stat = "identity")  # 'identity' to use the y values directly

#The geom_bar(stat = "identity") is crucial here. It tells ggplot to use the values in the Freq column (here, the probabilities) directly as the bar heights. By default, geom_bar() instead counts the rows for each x value. Because this data frame has exactly one row per outcome, that count would be 1 for both heads and tails rather than the probability.

---

## 🎲 Part 5: Dice Roll Simulation

Now that we are done with coin flips, let's take a look into dice rolls. The process is almost exactly the same, except now we are going to create a sample vector with six possibilities instead of two.

In [None]:
#For this example, we will roll the die 100 times.

# Define a vector representing the six faces of a die
dice <- c(1:6) #you can also write it as dice <- seq(1,6,1)
dice

# Roll the dice 100 times using the sample() function
roll <- sample(dice, size=100, replace=TRUE)
head(roll, 20)  # Show first 20 rolls

In [None]:
# Count the occurrences of each face of the dice
roll_counts <- table(roll)
roll_counts

# Calculate the observed (experimental) probability of each face. For a fair die, the expected probability of each face is 1/6 (about 0.167).
exp_prob_dice <- roll_counts / sum(roll_counts)
exp_prob_dice
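
To see how far a simulation lands from a fair die, the observed proportions can be placed next to the theoretical value of 1/6. This is a minimal sketch that re-creates the rolls with an arbitrary seed; the names `observed_prob`, `theoretical_prob`, and `comparison` are made up for the illustration.

```r
set.seed(120)
dice <- 1:6
roll <- sample(dice, size = 100, replace = TRUE)

# Observed proportion of each face (factor levels keep faces that never appeared)
observed_prob <- table(factor(roll, levels = dice)) / length(roll)

# Theoretical probability of each face for a fair die
theoretical_prob <- rep(1 / 6, 6)

comparison <- data.frame(
  face = dice,
  observed = as.numeric(observed_prob),
  theoretical = theoretical_prob
)
comparison$difference <- comparison$observed - comparison$theoretical
comparison
```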

### 🎲🎲 Two Dice Simulation

In [None]:
#Now what do you think would happen if we roll 2 dice? Let's examine this example with 2 scenarios: 40 rolls and 1000 rolls.

# Roll two dice 40 times
roll2_40 <- sample(dice, size=40, replace=TRUE) + sample(dice, size=40, replace=TRUE)
head(roll2_40, 10)  # Show first 10 sums

exp_prob2_40 <- table(roll2_40) / 40
exp_prob2_40

In [None]:
barplot(exp_prob2_40, main="40 Rolls of Two Dice", xlab="Sum of 2 Dice", ylab="Probability", col="blue")

In [None]:
# Roll two dice 1000 times
roll2_1000 <- sample(dice, size=1000, replace=TRUE) + sample(dice, size=1000, replace=TRUE)
head(roll2_1000, 10)  # Show first 10 sums

exp_prob2_1000 <- table(roll2_1000) / 1000 
exp_prob2_1000
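
For comparison with the simulated proportions, the exact probabilities for the sum of two fair dice can be worked out by listing all 36 equally likely combinations. The sketch below does this with outer(); the names `all_sums` and `theoretical_prob2` are made up for the illustration.

```r
dice <- 1:6

# 6 x 6 grid holding the sum for every pair of faces
all_sums <- outer(dice, dice, "+")

# Each of the 36 combinations is equally likely
theoretical_prob2 <- table(all_sums) / length(all_sums)
theoretical_prob2

# The most likely sum is 7, with probability 6/36
theoretical_prob2["7"]
```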

In [None]:
barplot(exp_prob2_1000, main="1000 Rolls of Two Dice", xlab="Sum of 2 Dice", ylab="Probability", col="red")

In [None]:
# To compare the plots side by side, you can create a two-panel plot of the distributions from 40 rolls vs 1000 rolls using the mfrow graphical parameter, which is set through the par() function. mfrow takes a vector of two integers: the first is the number of rows in the plot layout and the second is the number of columns. Calling par(mfrow=c(1,2)) sets up the graphics display for side-by-side plots, so when you then create two plots, R places the first plot in the first column and the second plot in the second column of the same row.
### Run these lines together:
par(mfrow=c(1,2))
barplot(exp_prob2_40, main="40 Rolls of Two Dice", xlab="Sum of 2 Dice", ylab="Probability", col="blue")
barplot(exp_prob2_1000, main="1000 Rolls of Two Dice", xlab="Sum of 2 Dice", ylab="Probability", col="red")
par(mfrow=c(1,1)) # reset the layout to a single plot

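#What happens as the number of rolls increases? As the number of rolls increases from 40 to 1000, the observed proportions settle closer to the true probabilities of each sum (the law of large numbers). For two dice, those true probabilities form a symmetric, triangle-shaped distribution that peaks at 7. The bell-like shape comes from adding dice together: the Central Limit Theorem states that the distribution of the sum (or average) of a large number of independent random variables, each with finite mean and variance, will approximate a normal distribution, regardless of the underlying distribution of the variables. So it is adding more dice to each roll, not making more rolls, that moves the shape toward a normal curve.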
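
The Central Limit Theorem concerns sums of many random variables, so one way to see it in action is to keep the number of rolls fixed and increase the number of dice added together in each roll. The following is a sketch under assumed settings (10000 rolls, an arbitrary seed, and a made-up helper function `sum_of_dice()`).

```r
set.seed(120)
n_rolls <- 10000

# Hypothetical helper: roll k dice n_rolls times and return the sum for each roll
sum_of_dice <- function(k) {
  rolls <- matrix(sample(1:6, size = n_rolls * k, replace = TRUE), ncol = k)
  rowSums(rolls)
}

sums_1 <- sum_of_dice(1)
sums_2 <- sum_of_dice(2)
sums_10 <- sum_of_dice(10)

# Three panels side by side; the previous layout is restored afterwards
old_par <- par(mfrow = c(1, 3))
barplot(table(sums_1) / n_rolls, main = "1 die", xlab = "Sum", ylab = "Probability")
barplot(table(sums_2) / n_rolls, main = "2 dice", xlab = "Sum", ylab = "Probability")
barplot(table(sums_10) / n_rolls, main = "10 dice", xlab = "Sum", ylab = "Probability")
par(old_par)

# The mean of the 10-dice sums should be near 10 * 3.5 = 35
mean(sums_10)
```

One die gives a roughly flat plot, two dice give a triangle, and ten dice should look close to a bell curve.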

---

## 📊 Part 6: Normal Distribution Functions

We've already looked at how to simulate dice rolls and coin tosses to see how probabilities work. Now, we're going to use functions like rnorm(), pnorm(), and qnorm() that help us work with normal distributions. These functions allow us to simulate data, calculate probabilities, and determine quantiles based on specified means and standard deviations. These functions will be very useful as we move forward and use them to perform hypothesis testing.

In [None]:
# The rnorm() function generates a random series of numbers that fit a normal distribution. You will need to specify how many numbers to generate, the mean and the standard deviation of the generated numbers. Let's generate 100 numbers with a normal distribution, the mean of 50 and standard deviation of 25:
normal <- rnorm(100, mean=50, sd=25)
head(normal, 10)  # Show first 10 values
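
Because the numbers are random, the sample mean and standard deviation will be close to, but not exactly, the values passed to rnorm(). The sketch below checks this and shows that setting a seed (120 is an arbitrary choice) makes the draw repeatable; `normal_sample` is a name made up for the illustration.

```r
set.seed(120)
normal_sample <- rnorm(100, mean = 50, sd = 25)

# Sample statistics: near 50 and 25, but not exact
mean(normal_sample)
sd(normal_sample)

# Resetting the seed reproduces exactly the same numbers
set.seed(120)
normal_again <- rnorm(100, mean = 50, sd = 25)
identical(normal_sample, normal_again)
```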

In [None]:
#Let's look at the distribution of the numbers to see what their density plot and histogram look like. First, we need to put the numbers in a data frame so we can use them in the ggplot() function.
df_normal <- data.frame(normal)
ggplot(df_normal,aes(x=normal))+geom_density()

In [None]:
ggplot(df_normal,aes(x=normal))+geom_histogram()

In [None]:
# The pnorm() function can be used to determine the probability that a random draw from a normal distribution is below a certain value. If you have a normal distribution with a mean of 50 and a standard deviation of 25, and you want to know the probability of drawing a number less than 82 from this distribution, pnorm() gives you that probability (area under the curve before 82)
prob_less_than_82 <- pnorm(82, mean = 50, sd = 25)
print(prob_less_than_82)
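
pnorm() always starts from the area to the left of a value, but the same function can answer "greater than" and "between" questions. This is a short sketch using the same mean and standard deviation; the variable names are made up for the illustration.

```r
# Area to the left of 82
p_below_82 <- pnorm(82, mean = 50, sd = 25)

# Area to the right of 82: ask for the upper tail (same as 1 - p_below_82)
p_above_82 <- pnorm(82, mean = 50, sd = 25, lower.tail = FALSE)
p_above_82

# Probability of a value between 25 and 75 (within one sd of the mean)
p_between <- pnorm(75, mean = 50, sd = 25) - pnorm(25, mean = 50, sd = 25)
p_between

# The same left-tail probability from the z-score and the standard normal
z <- (82 - 50) / 25
pnorm(z)
```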

In [None]:
# Visualize this using pnormGC() function inside tigerstats package for better understanding (no need to go into details for this for now)
pnormGC(82, mean = 50, sd = 25, graph = TRUE)

In [None]:
# The qnorm() function is the inverse of pnorm(): given a probability, it returns the value (quantile) below which that proportion of a normal distribution falls. To test this, let's take the probability that we got in the previous example and see what happens when we put it into the qnorm() function.
# Example: use qnorm() to find the data point below which 89.97% of values fall (roughly the 90th percentile) in a normal population with a mean of 50 and sd of 25. The result should be very close to 82.
data_point_at_90th_percentile <- qnorm(0.8997, mean = 50, sd = 25)
print(data_point_at_90th_percentile)
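
The inverse relationship can be checked directly by feeding the unrounded output of pnorm() into qnorm(). The sketch below also uses qnorm() to find the range that holds the middle 95% of the distribution; the variable names are made up for the illustration.

```r
# Round trip: probability below 82, then back to the value
p <- pnorm(82, mean = 50, sd = 25)
round_trip <- qnorm(p, mean = 50, sd = 25)
round_trip

# Bounds of the middle 95%: the 2.5th and 97.5th percentiles
middle_95 <- qnorm(c(0.025, 0.975), mean = 50, sd = 25)
middle_95
```

The two bounds sit symmetrically around the mean of 50, about 1.96 standard deviations on each side.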

In [None]:
# Visualize this using qnormGC() function inside tigerstats package for better understanding (no need to go into details for this for now)
qnormGC(0.8997, mean = 50, sd = 25, graph = TRUE)

---

## 🎯 Summary

In this lab, we covered:

✅ **Logical Variables**: TRUE/FALSE operations and comparisons  
✅ **Data Type Conversion**: Converting between different R data types  
✅ **Random Sampling**: Using sample() function for population sampling  
✅ **Probability Simulation**: Coin tosses and dice rolls  
✅ **Normal Distributions**: rnorm(), pnorm(), and qnorm() functions  
✅ **Visualization**: Bar plots and histograms for probability distributions  

These concepts form the foundation for understanding probability theory and statistical inference in agricultural research!

---

## 📧 Need Help?

If you have questions about this lab or need help with R programming, please contact:

**Mohammadreza Narimani**  
📧 mnarimani@ucdavis.edu  
🏫 Department of Biological and Agricultural Engineering, UC Davis

---

*PLS 120 - Applied Statistics in Agricultural Sciences | UC Davis | October 2025*