# 🌾 Week 2: Descriptive Statistics and Central Tendencies
**PLS 120 - Applied Statistics in Agriculture**

**Binder Developer:** Mohammadreza Narimani  
**Lab Content Developer:** PLS 120 TAs  
**Date:** 2024-09-10

In this lab we will look at measures of central tendency (mean, median, mode) and of spread (variance, standard deviation, quantiles). These statistics summarize a dataset in a few numbers and give a first impression of its typical values and how much they vary. R offers many functions for calculating them and for producing summaries of data.

## 📊 Loading the Iris Dataset

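We'll start by loading R's built-in `iris` dataset and taking a first look at it. In RStudio you could open it in a spreadsheet-style viewer with `View()`; in Jupyter we use `head()` instead.

**Expected Output:** The first six rows of the iris dataset (four sepal and petal measurements plus the species).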

In [None]:
# Load the iris data set
data <- iris

# To examine data, you can use the View() function
# Note: View() opens in a separate window in RStudio, but in Jupyter we can use head() instead
head(data)

## 📈 Getting Data Summaries

Next, let's examine the dataset and get a summary. There are a few ways to do this:

- The **`summary()`** function reports the minimum, first quartile, median, mean, third quartile and maximum for each numeric variable (and the count in each level for a factor such as `Species`). You can apply it to the whole data frame or to a single column. Each of these statistics can also be calculated individually with functions such as `min()`, `max()`, `mean()` and `median()`.

- The **`head()`** function will give you the first six rows of the data set. This can be useful for checking on large data sets to see if they have loaded correctly.

- To calculate the **variance** and **standard deviation**, use the `var()` and `sd()` functions.

**Expected Output:** For each numeric variable, the min, quartiles, median, mean and max; for `Species`, the number of flowers in each species. This is followed by the first six rows of the data.

In [None]:
# Use the summary and head functions to get an overview of the iris data
# Hint: summary(data)

summary(data)

# Hint: head(data)
head(data)
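
As the list above notes, `summary()` bundles several statistics that can also be computed one at a time. Below is a minimal sketch for the sepal width column; `sw`, `sw_summary` and `by_hand` are illustrative names introduced here, not part of the lab.

```r
# Work with one column; `sw` is just a short illustrative name
sw <- iris$Sepal.Width

# summary() on a single numeric vector
sw_summary <- summary(sw)
sw_summary

# The same statistics computed one at a time
by_hand <- c(
  Min    = min(sw),
  Median = median(sw),
  Mean   = mean(sw),
  Max    = max(sw)
)
by_hand

# Spread statistics are not part of summary(), so compute them separately
c(variance = var(sw), sd = sd(sw))
```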

## 🎯 Your Turn: Calculate Central Tendencies

Now it's your turn to calculate some statistics! Complete the code below to calculate the mean and median for sepal width.

**Question:** How do they compare with the mean and median in summary?

**Expected Output:** You should see numerical values for mean and median of sepal width.

In [None]:
# Calculate the mean and median for sepal width
# How do they compare with the mean and median in summary?

# Hint: mean.sepal.width <- mean(data$Sepal.Width)
mean.sepal.width <- mean(data$Sepal.Width)

# Hint: median.sepal.width <- median(data$Sepal.Width)
median.sepal.width <- median(data$Sepal.Width)

# Print the values
mean.sepal.width
median.sepal.width
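
To see what `mean()` and `median()` are doing, here is a sketch that computes both from their definitions: the mean is the sum divided by the count, and the median is the middle of the sorted values (the average of the two middle values when the count is even). The names `sw`, `mean_by_hand` and `median_by_hand` are illustrative only.

```r
sw <- iris$Sepal.Width
n <- length(sw)              # number of observations

# Mean: add everything up and divide by the number of values
mean_by_hand <- sum(sw) / n

# Median: sort the values and take the middle one.
# With an even n there are two middle values, so average them.
sorted_sw <- sort(sw)
if (n %% 2 == 0) {
  median_by_hand <- (sorted_sw[n / 2] + sorted_sw[n / 2 + 1]) / 2
} else {
  median_by_hand <- sorted_sw[(n + 1) / 2]
}

c(mean = mean_by_hand, median = median_by_hand)
```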

## 📊 Calculating Variance and Standard Deviation

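Variance and standard deviation measure how spread out your data are around the mean. The standard deviation is the square root of the variance, so it is expressed in the same units as the data (centimeters for iris), while the variance is in squared units.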

**Question:** How do these two values compare to one another?

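**Expected Output:** You'll see two numerical values. Because the variance of sepal width is less than 1, its square root (the standard deviation) is actually larger than the variance here; the variance is only the larger of the two when it exceeds 1.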

In [None]:
# Calculate the variance and standard deviation for sepal width
# How do these two values compare to one another?

# Hint: var.sepal.width <- var(data$Sepal.Width)
var.sepal.width <- var(data$Sepal.Width)

# Hint: sd.sepal.width <- sd(data$Sepal.Width)
sd.sepal.width <- sd(data$Sepal.Width)

# Print the values
var.sepal.width
sd.sepal.width
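
R's `var()` and `sd()` compute the *sample* variance and standard deviation, which divide the sum of squared deviations from the mean by n - 1 rather than n. The sketch below rebuilds both from that formula, s² = Σ(xᵢ - x̄)² / (n - 1), and checks the result against the built-in functions; `var_by_hand` and `sd_by_hand` are illustrative names.

```r
sw <- iris$Sepal.Width
n <- length(sw)

# Sample variance: squared deviations from the mean, divided by n - 1
deviations <- sw - mean(sw)
var_by_hand <- sum(deviations^2) / (n - 1)

# Standard deviation: square root of the variance (back in the data's units)
sd_by_hand <- sqrt(var_by_hand)

c(variance = var_by_hand, sd = sd_by_hand)
```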

## 📐 Calculating Quantiles

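You can also calculate quantiles using `quantile()`. Quantiles are cut points that split the sorted data into groups containing roughly equal numbers of observations; by default, `quantile()` returns the quartiles, which split the data into four parts.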

**Question:** How do they compare to the quantiles from the summary?

**Expected Output:** You'll see the 0%, 25%, 50%, 75%, and 100% quantiles (min, Q1, median, Q3, max).

In [None]:
# You can also calculate quantiles using quantile()
# How do they compare to the quantiles from the summary?

# Hint: quantiles.sepal.width <- quantile(data$Sepal.Width)
quantiles.sepal.width <- quantile(data$Sepal.Width)

# Print the values
quantiles.sepal.width
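
The sketch below compares the default output of `quantile()` with the quartiles reported by `summary()` (which also inserts the mean between the median and the third quartile), then uses the `probs` argument to request other cut points. The names `quartiles` and `sw_summary` are illustrative.

```r
sw <- iris$Sepal.Width

# Default: the five quartile cut points (0%, 25%, 50%, 75%, 100%)
quartiles <- quantile(sw)
quartiles

# summary() reports the same cut points plus the mean (position 4),
# so drop the mean to line the two up
sw_summary <- summary(sw)
as.numeric(sw_summary)[c(1, 2, 3, 5, 6)]

# Any other cut points can be requested with the probs argument,
# here the 10th and 90th percentiles
quantile(sw, probs = c(0.10, 0.90))
```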

## ✅ Checking Your Work

If everything worked correctly, the objects you created are now stored in the global environment (in RStudio they appear in the Environment pane). You can print any of them by typing its name in a code cell.

**Expected Output:** All your calculated statistics will be displayed together.

In [None]:
# Print all calculated values together
mean.sepal.width  
median.sepal.width
var.sepal.width
sd.sepal.width
quantiles.sepal.width
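
Jupyter has no Environment pane, so a quick way to confirm which objects exist is `ls()`, which lists the names of objects in the current environment, and `exists()`, which checks for one name. This sketch recreates one lab object so it runs on its own; `objects_here` is an illustrative name.

```r
# Create one of the objects from this lab so the example runs on its own
mean.sepal.width <- mean(iris$Sepal.Width)

# ls() lists the names of objects stored in the current (here, global) environment
objects_here <- ls()
objects_here

# exists() checks for a single object by name
exists("mean.sepal.width")
```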

## 🔢 Calculating Mode (Most Frequent Value)

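R does not have a built-in function for calculating the statistical mode (the most frequently observed value). Base R's `mode()` function does something different: it returns an object's storage mode, such as `"numeric"`. Instead, you can write your own function in R.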

**Expected Output:** You'll see the most frequently occurring value in the Sepal.Width column.

In [None]:
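# Create a custom Mode function
Mode <- function(x) {
  ux <- unique(x)                        # each distinct value, listed once
  # match() maps every value to its position in ux, tabulate() counts each position,
  # and which.max() picks the most frequent (the first one seen, if there is a tie)
  ux[which.max(tabulate(match(x, ux)))]
}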

# Calculate mode for sepal width
# Hint: mode.sepal.width <- Mode(data$Sepal.Width)
mode.sepal.width <- Mode(data$Sepal.Width)
mode.sepal.width
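
To check that `Mode()` behaves as expected, the sketch below runs it on a small made-up vector (`toy`, a hypothetical example) where the answer can be counted by eye, cross-checks the sepal width result against a frequency table from `table()`, and shows what happens with a tie. The function is repeated so the cell runs on its own.

```r
# Same Mode function as in the lab, repeated so this cell is self-contained
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# A tiny example: 5 appears three times, more than any other value
toy <- c(2, 5, 5, 7, 5, 2)
Mode(toy)

# Cross-check on sepal width: the most frequent entry of a frequency table
sw_counts <- table(iris$Sepal.Width)
sw_counts[which.max(sw_counts)]

# Ties: 3 and 1 both appear twice; the value seen first in the data (3) is returned
Mode(c(3, 1, 1, 3))
```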

## ➕ Calculating Sum

The `sum()` function adds up all the values you pass to it.

**Expected Output:** You'll see the total sum of all sepal width values.

In [None]:
# Calculate the sum of sepal width
# Hint: sum_sepal_width <- sum(data$Sepal.Width)
sum_sepal_width <- sum(data$Sepal.Width)
sum_sepal_width
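
Two behaviors of `sum()` are worth knowing: if any value is missing (`NA`), the result is `NA` unless you set `na.rm = TRUE`, and when applied to `TRUE`/`FALSE` values it counts the `TRUE`s. A short sketch, where `with_missing` is a hypothetical vector made by appending an `NA` to the sepal widths:

```r
sw <- iris$Sepal.Width
total <- sum(sw)

# A missing value makes the whole sum NA...
with_missing <- c(sw, NA)
sum(with_missing)

# ...unless missing values are removed first
sum(with_missing, na.rm = TRUE)

# TRUE counts as 1 and FALSE as 0, so this counts flowers wider than 3 cm
sum(sw > 3)
```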

## 📏 Getting Data Dimensions

We can get the number of rows and columns in our data frame using the `nrow()` and `ncol()` functions.

**Expected Output:** You'll see the number of rows (150) and columns (5) in the iris dataset.

In [None]:
# Get number of rows
# Hint: nrows_data <- nrow(data)
nrows_data <- nrow(data)
nrows_data

# Let's double check it with str()
str(data)

In [None]:
# Get number of columns
# Hint: ncols_data <- ncol(data)
ncols_data <- ncol(data)
ncols_data

# Let's double check it with str()
str(data)
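
The function `dim()` returns both dimensions at once as a vector of (rows, columns). The sketch below checks it against `nrow()` and `ncol()` and shows a common surprise: for a data frame, `length()` counts columns, not rows. `dims` is an illustrative name.

```r
data <- iris

# dim() returns rows and columns together
dims <- dim(data)
dims

# Each piece matches nrow() and ncol()
dims[1] == nrow(data)
dims[2] == ncol(data)

# For a data frame, length() counts columns, not rows
length(data)
```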

## 🎉 Congratulations!

You've completed Week 2 of PLS 120! You've learned:

✅ **`summary()`** - Get comprehensive statistical summaries  
✅ **`head()`** - View the first few rows of data  
✅ **`mean()`** - Calculate the average  
✅ **`median()`** - Find the middle value  
✅ **`var()`** - Calculate variance  
✅ **`sd()`** - Calculate standard deviation  
✅ **`quantile()`** - Find data quartiles  
✅ **Custom Mode function** - Find most frequent value  
✅ **`sum()`** - Calculate total sum  
✅ **`nrow()` and `ncol()`** - Get data dimensions  

---

## 📧 Questions?

If you have more questions about this lab or need help with R programming, please contact:

**Mohammadreza Narimani**  
📧 mnarimani@ucdavis.edu  
🏫 Department of Biological and Agricultural Engineering, UC Davis

---

*Next week: We'll explore more advanced statistical concepts and visualizations!* 🚀