# 🌾 Week 1: Introduction to R
**PLS 120 - Applied Statistics in Agriculture**

**Binder Developer:** Mohammadreza Narimani  
**Lab Content Developer:** Parastoo Farajpoor  
**Date:** 2025-10-01

This lab will cover the basics of R, including loading data, creating vectors, data frames and tables, and making simple plots. Please follow along with the code chunks provided.

## 🔢 Data Types in R

In R, data can be stored in different types, depending on the kind of information it represents. The basic data types in R are:

1. **Integer**: Whole numbers without decimals, such as 1 or 355. Numbers like 2.5 and 34.7 are NOT integers.
2. **Numeric**: Numbers (both integers and decimals).
3. **Character**: Text or string data.
4. **Logical**: TRUE or FALSE values.
5. **Factor**: Categorical data with levels.

We can find out the data type of a variable by using the `class()` function. The output of this function is one word, specifying the data type or class.

**Note:** In R, `<-` is the assignment operator. For simple assignments like the ones in this lab, `=` does the same job, but `<-` is the conventional choice.

**Expected Output:** You'll see words like `"numeric"`, `"integer"`, `"character"`, `"logical"`

In [None]:
# Example of a whole number typed without a suffix
# Notice how R stores this variable as numeric, not integer
count <- 10
class(count)

In [None]:
# To define 'count' as integer, you can add 'L' after the value of the variable.
count <- 10L
class(count)

In [None]:
# Example of a numeric variable
x <- 3.14
class(x)  # Returns the type of the object

In [None]:
# If we add 'L' to a value that has a decimal part, R gives a warning (not an error)
# and stores the value as numeric instead of integer.
x <- 3.14L
class(x)  # Returns "numeric" because 3.14 cannot be stored as an integer

In [None]:
# Example of a character variable
name <- "Statistics"
class(name)

In [None]:
# Example of a logical variable
is_student <- TRUE
class(is_student)
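
The list above also mentions factors, which have no example yet. Here is a minimal sketch of one; the treatment labels are made up purely for illustration and do not come from any lab dataset.

```r
# Hypothetical treatment labels, used only for illustration
treatment <- factor(c("control", "fertilizer", "control", "irrigation"))

class(treatment)   # the class of the object is "factor"
levels(treatment)  # the distinct categories, sorted alphabetically
```

A factor stores each distinct label once as a level, which is why repeated values such as "control" appear only once in `levels()`.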

## 📊 Working with Vectors

A vector is a one-dimensional series of values that all share the same data type. You can picture it as a single row with several entries. In this lab, our vectors hold numbers. Let's define a vector and assign it to a variable called `vector_1`:

**Mathematical notation:** A vector **v** = (v₁, v₂, v₃, ..., vₙ)

**Expected Output:** You'll see numbers displayed like: 0 5 6 3 6 9 3

In [None]:
vector_1 <- c(0, 5, 6, 3, 6, 9, 3) # here we make a vector containing an arbitrary series of numbers
vector_1

### 🎯 Your Turn!
Make another vector that contains seven numbers of your choice:

In [None]:
# make another vector with seven numbers of your choice
example_1 <- c(8, 9, 20, 8, 9, 20, 8)  # Example numbers - you can change these!
example_1

### 🔢 Sequential Vectors

We can use the `seq()` function to create a vector of sequential numbers with a specified range and increment.

**Formula:** `seq(from, to, by)`, where `from` is the starting value, `to` is the largest value allowed, and `by` is the increment.

**Expected Output:** You'll see evenly spaced numbers like: 4.0 4.2 4.4 4.6 4.8 5.0 5.2 5.4 5.6 5.8 6.0

In [None]:
# If we want to create a vector of numbers from 4 to 6 with 0.2 increment, we write:
vector_2 <- seq(4, 6, 0.2)
vector_2
?seq # putting ? before a function name opens its help page

In [None]:
example_2 <- seq(0, 10, 0.15)
example_2
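
As a small sketch of two ways to call `seq()`, the example below names the arguments explicitly. The variable names `by_step` and `by_count` are chosen here for illustration only.

```r
# Fix the step size: values go from 4 to 6 in steps of 0.2
by_step <- seq(from = 4, to = 6, by = 0.2)

# Fix the number of values instead: 5 evenly spaced values from 4 to 6
by_count <- seq(from = 4, to = 6, length.out = 5)

by_step
by_count
length(by_step)  # how many values the first sequence contains
```

Use `by` when the spacing matters and `length.out` when the number of values matters.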

### 🔗 Combining Vectors

**Expected Output:** You'll see a matrix with 2 rows showing both vectors

In [None]:
# now that we have two vectors of the same length, we can stack them with rbind() ("row bind")
# the result is a matrix with one row per vector, not a data frame
combined <- rbind(vector_1, example_1)
combined
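
To see the difference between stacking vectors with `rbind()` and building a data frame with `data.frame()`, here is a minimal sketch. The `heights` and `weights` values are hypothetical and only serve as an illustration.

```r
# Hypothetical measurements, used only for illustration
heights <- c(10, 12, 15)
weights <- c(2.1, 2.5, 3.0)

as_rows <- rbind(heights, weights)       # each vector becomes a row
as_cols <- data.frame(heights, weights)  # each vector becomes a column

is.matrix(as_rows)      # TRUE: rbind() on vectors gives a matrix
is.data.frame(as_cols)  # TRUE
dim(as_rows)            # 2 rows, 3 columns
dim(as_cols)            # 3 rows, 2 columns
```

Data frames are usually the more convenient choice for statistics, because each column is a variable and each row is an observation.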

## 📋 Data Frames: Working with 2D Data

We often work with data frames, which are in two dimensions. It means that they have multiple rows and multiple columns.

To work with data frames, we first need some data. R ships with several built-in data sets for practice. For the labs, we will most often use the **iris** data set, which contains measurements of flowers from three iris species.

**Expected Output:** You'll see a structured summary of the iris dataset with 150 observations and 5 variables

In [None]:
flower <- iris # here we assign the built-in iris data to an object called flower

# if you are working in RStudio, the object now appears in the 'Environment' tab (top right corner)
str(flower) # now use str() (short for "structure") to look at the data
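
Besides `str()`, a few other base R functions give a quick look at a data frame. The sketch below uses the built-in `iris` data directly.

```r
dim(iris)      # number of rows and number of columns
names(iris)    # column names
head(iris, 3)  # first three rows

# The $ operator pulls out a single column as a vector
mean(iris$Sepal.Length)
```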

### 📁 Loading External Data

Data can also be loaded using a comma separated value (csv) file. We've included a sample dataset for you!

**Expected Output:** You'll see the structure of the LA_Data dataset

In [None]:
LA_Data <- read.csv("LA_Data.csv") # read the csv file into a data frame called LA_Data
?read.csv # open the help page for read.csv

str(LA_Data)
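
If you want to try `read.csv()` without depending on a particular file, the sketch below writes a small, made-up data frame to a temporary csv file and reads it back. The names `demo`, `demo_path`, and `demo_read`, and the values inside, are hypothetical.

```r
# Hypothetical data, written to a temporary file so the example is self-contained
demo <- data.frame(plot = c("A", "B", "C"), yield = c(4.2, 3.9, 5.1))
demo_path <- tempfile(fileext = ".csv")
write.csv(demo, demo_path, row.names = FALSE)

# Read the file back in and inspect it
demo_read <- read.csv(demo_path)
str(demo_read)

# A file name without a folder, such as "LA_Data.csv", is looked up in this folder
getwd()
```

If `read.csv()` reports that it cannot open a file, checking `getwd()` is a good first step.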

### 📖 Understanding Data Types

The `str()` function tells you about data types:

| Type | Description | Example |
|------|-------------|----------|
| **num** | numeric values | 3.14, 2.5 |
| **chr** | character values | "hello", "data" |
| **Factor** | categorical values with a fixed set of levels | species, treatment |
| **int** | whole integer values | 1, 2, 100 |
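
To see these labels in practice, here is a minimal sketch that builds a small data frame with one column of each type. The data frame `types_demo` and its values are invented for illustration.

```r
# Hypothetical data frame with one column per data type
types_demo <- data.frame(
  height = c(1.5, 2.25, 3.0),        # numeric
  count  = c(1L, 2L, 3L),            # integer
  label  = c("a", "b", "c"),         # character
  group  = factor(c("x", "y", "x")), # factor
  stringsAsFactors = FALSE           # keep 'label' as character in older R versions
)

str(types_demo)            # compact overview with the type labels from the table above
sapply(types_demo, class)  # full class name of each column
```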

## 🧮 R as a Calculator

R doesn't always need data to be useful. It also works just like a calculator!

**Mathematical Operations:** `+` (add), `-` (subtract), `*` (multiply), `/` (divide)

**Expected Output:** You'll see the numerical results: `7`, `3`, `18`, `4`

In [None]:
# addition
3 + 4

In [None]:
# subtraction
5 - 2

In [None]:
# multiplication
3 * 6

In [None]:
# division
8 / 2

### 🎯 Your Turn!
Calculate: what is 22 times 56 plus 8 minus 200?

In [None]:
# what is 22 times 56 plus 8 minus 200?
22 * 56 + 8 - 200
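
R follows the usual order of operations, so multiplication happens before addition unless parentheses say otherwise. The short sketch below shows this along with a few other operators that are handy in calculator mode.

```r
2 + 3 * 4    # multiplication first: 14
(2 + 3) * 4  # parentheses first: 20
2^3          # exponent: 8
7 %/% 2      # integer division: 3
7 %% 2       # remainder: 1
sqrt(16)     # square root: 4
```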

## 📊 Data Visualization: Getting the Big Picture

Whenever looking at data, a good first step is to take a big picture approach, by looking at the distribution of the data points, and through simple visualization. It's often easier to look at a picture than it is to look at a bunch of numbers.

We will start with a **frequency table**. This is a table that shows how often each value, or each combination of values, occurs in the data. This can also be described as looking at the "counts" of the data points. Using the iris data, we will first count the flowers in each species, and then look at how sepal length is distributed across species.

### 📚 Required Libraries

**Good news!** We've already installed the `tidyverse` package for you in this Binder environment. No installation needed!

In [None]:
# install.packages("tidyverse") # Already installed for you in Binder!
suppressPackageStartupMessages({
  library(tidyverse) # load the tidyverse collection of packages for this session
})
cat("✅ Tidyverse loaded successfully!")

### 🔢 Simple Frequency Tables

**Expected Output:** You'll see a clean table showing counts for each species

In [None]:
# How many samples (rows) belong to each species? The Species column is selected with $
species_count <- table(iris$Species)
knitr::kable(as.data.frame(species_count), col.names = c("Species", "Count"))
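
Counts can also be expressed as proportions. As a small sketch, `prop.table()` converts a table of counts into shares of the total; the names `counts_by_species` and `species_share` are chosen here for illustration.

```r
counts_by_species <- table(iris$Species)

# Divide each count by the total number of rows
species_share <- prop.table(counts_by_species)
species_share

# The same shares as percentages, rounded to one decimal place
round(100 * species_share, 1)
```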

### 📋 Two-Way Frequency Tables

**Expected Output:** You'll see a table showing combinations of species and sepal lengths (this will be quite large!)

In [None]:
# When you pass two variables to table(), it will count the combinations of values between the two variables. 
# For example, if we want to see how many times each species has a specific Sepal Length, we write:
two_way_table <- table(iris$Species, iris$Sepal.Length)
# Note that this might be impractical if Sepal.Length has many unique values, so you might want to group them into categories by using cut() which converts continuous data into categorical data based on specified breaks.

# Display first few columns to show the concept
knitr::kable(two_way_table[, 1:8], caption = "First 8 Sepal Length values by Species")

### 📊 Grouped Frequency Tables

**Expected Output:** You'll see a cleaner table with grouped ranges

In [None]:
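# Here we build frequency_table with table(). The first argument is the grouping variable (Species).
# The second argument uses cut() to split Sepal.Length into bins whose edges come from seq().
# Sepal lengths above 6 fall outside these bins, so they are not counted in this table.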

frequency_table <- table(iris$Species, cut(iris$Sepal.Length, seq(4, 6, 0.2)))
knitr::kable(frequency_table, caption = "Species by Sepal Length Groups (4.0-6.0)")

In [None]:
frequency_table2 <- table(iris$Species, cut(iris$Sepal.Length, seq(10, 12, 0.2)))
knitr::kable(frequency_table2, caption = "Species by Sepal Length Groups (10.0-12.0) - Notice all zeros!")
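
A quick way to choose sensible breaks is to check the range of the column first. The sketch below does that for sepal length and then picks breaks that cover every value; the name `length_group` is chosen here for illustration.

```r
# Smallest and largest sepal length in the data
range(iris$Sepal.Length)

# Breaks from 4 to 8 in steps of 1 cover the whole range
length_group <- cut(iris$Sepal.Length, breaks = seq(4, 8, 1))
levels(length_group)  # the bin labels created by cut()

table(iris$Species, length_group)

# Values outside the breaks become NA; here there are none
sum(is.na(length_group))
```

Breaks from 10 to 12 lie entirely above the largest sepal length, which is why every count in that table is zero.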

### 🎯 Your Turn!
How would you remake this table for a different data column in the iris data? Try it below!

In [None]:
# how would you remake this table for a different data column in the iris data?
# Hint: Try iris$Petal.Length or iris$Petal.Width
# Example solution:
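# include.lowest = TRUE keeps any petal length equal to the lowest break (1), which cut() would otherwise drop
petal_table <- table(iris$Species, cut(iris$Petal.Length, seq(1, 7, 1), include.lowest = TRUE))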
knitr::kable(petal_table, caption = "Species by Petal Length Groups")

## 📈 Histograms: Visualizing Data Distribution

This table is useful, but it can be difficult to interpret. However, you can make a simple graph that visualizes this information. A **histogram** shows:
- Range of data
- Location with the highest concentration of measurements
- Shape of distribution (symmetric or skewed)

We will make a histogram, which will show this distribution of the counts. Histograms are also useful because they only require a single function to make!

**Expected Output:** You'll see a bar chart showing the frequency distribution of sepal lengths

In [None]:
# the function hist() will make a histogram using some vector of data. Here, we use the Sepal Length column in the iris data using $
hist(iris$Sepal.Length)

### 🎛️ Adjusting Histogram Bins

**Expected Output:** You'll see histograms with different levels of detail (more bars vs fewer bars)

In [None]:
# now that we have a histogram, we can also adjust how many "bins" are made, or how the counts are distributed.
hist(iris$Sepal.Length, breaks = 15)
# Here, breaks = 15 asks for roughly 15 equal-width bins. R treats a single number as a suggestion
# and rounds to "pretty" break points, so the actual number of bins may differ.

In [None]:
# now we have increased the number of bins. What happens when we decrease the number of bins?
hist(iris$Sepal.Length, breaks = 2)
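
One way to check which bins R actually used is to ask `hist()` for its calculations without drawing the plot. This is a minimal sketch; the object name `h` is chosen here for illustration.

```r
# plot = FALSE returns the histogram calculations instead of drawing the plot
h <- hist(iris$Sepal.Length, breaks = 15, plot = FALSE)

h$breaks          # the bin edges R chose
h$counts          # number of flowers in each bin
length(h$counts)  # how many bins were actually created
sum(h$counts)     # total count, equal to the number of rows in iris
```

Comparing `length(h$counts)` with the value passed to `breaks` shows how closely R followed the request.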

## 🎉 Congratulations!

You've completed Week 1 of PLS 120! You've learned:

✅ **Data types** in R (integer, numeric, character, logical, factor)  
✅ **Assignment operator** `<-` (equivalent to `=`)  
✅ **Vectors** and how to create them  
✅ **Sequential vectors** using `seq()`  
✅ **Data frames** and the `str()` function  
✅ **Loading external data** with `read.csv()`  
✅ **R as a calculator** for basic arithmetic  
✅ **Frequency tables** with `table()`  
✅ **Histograms** with `hist()`  

---

## 📧 Questions?

If you have more questions about this lab or need help with R programming, please contact:

**Mohammadreza Narimani**  
📧 mnarimani@ucdavis.edu  
🏫 Department of Biological and Agricultural Engineering, UC Davis

---

*Next week: We'll dive deeper into descriptive statistics and advanced data visualization!* 🚀