# Week 1: Introduction to Statistics and R 🌾

**PLS 120 - Applied Statistics in Agriculture**

This lab will cover the basics of R, including loading data, creating vectors, data frames and tables, and making simple plots. Please follow along with the code chunks provided.

## Required Libraries 📚

We've already installed these libraries for you in this Binder environment:
- **tidyverse**: Collection of data science packages (ggplot2, dplyr, etc.)
- **readr**: For reading data files (part of the tidyverse)
- **knitr**: For creating reports

Let's load the tidyverse package:

In [None]:
# Load the tidyverse package (already installed for you)
library(tidyverse)

## Data Types in R 📊

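In R, data can be stored as different types, depending on the kind of information they represent. The most common data types in R are:

1. **Integer**: Whole numbers stored as integers; R only does this when you add an `L` suffix (e.g. `1L`, `355L`)
2. **Numeric**: Numbers with or without decimals (e.g. `10`, `3.14`); whole numbers typed without `L` are stored as numeric by default
3. **Character**: Text or string data, written in quotes (e.g. `"Statistics"`)
4. **Logical**: `TRUE` or `FALSE` values
5. **Factor**: Categorical data with a fixed set of levels (e.g. species or treatment)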

We can find out the data type of a variable by using the `class()` function:

In [None]:
# Example of an integer variable
# Notice that R stores a whole number typed without 'L' as numeric, not integer
count <- 10
class(count)

In [None]:
# To define 'count' as integer, you can add 'L' after the value
count <- 10L
class(count)
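To see the difference between integer and numeric more directly, `typeof()` reports how R stores a value internally. This short sketch also shows that integers still count as numeric, and that mixing an integer with a decimal gives a numeric result:

```r
# typeof() shows the internal storage type
typeof(10)        # "double": plain numbers are stored as double-precision numerics
typeof(10L)       # "integer": the L suffix makes an integer
is.numeric(10L)   # TRUE: integers also count as numeric
# Mixing an integer with a decimal produces a numeric (double) result
class(10L + 0.5)  # "numeric"
```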

In [None]:
# Example of a numeric variable
x <- 3.14
class(x)  # Returns the type of the object

In [None]:
# Example of a character variable
name <- "Statistics"
class(name)

In [None]:
# Example of a logical variable
is_student <- TRUE
class(is_student)
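The list above also mentions factors, so here is a small example. The treatment labels are hypothetical, made up only for illustration. By default, `factor()` sorts the levels alphabetically; the `levels` argument lets you set the order yourself:

```r
# Hypothetical treatment labels for six field plots (illustrative values only)
treatment <- factor(c("control", "low_N", "high_N", "control", "low_N", "high_N"))
class(treatment)    # "factor"
levels(treatment)   # sorted alphabetically: "control" "high_N" "low_N"
nlevels(treatment)  # 3 distinct categories

# Set the level order explicitly (e.g. from lowest to highest nitrogen)
treatment <- factor(treatment, levels = c("control", "low_N", "high_N"))
levels(treatment)
```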

## Working with Vectors 📈

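Vectors are one-dimensional sequences of values that all share the same type (for example, all numbers or all text). Unlike a data frame, a vector has no rows or columns, only a length. Let's create a vector with `c()` (short for "combine") and assign it to a variable: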

In [None]:
# Create a vector containing an arbitrary series of numbers
vector_1 <- c(0, 5, 6, 3, 6, 9, 3)
vector_1
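Once a vector exists, you can ask for its length, pull out elements by position with square brackets, and do arithmetic on every element at once. The last two lines show what happens when you mix numbers and text: R converts everything to one common type (character). A quick sketch:

```r
vector_1 <- c(0, 5, 6, 3, 6, 9, 3)
length(vector_1)   # 7 elements
vector_1[3]        # third element: 6
vector_1[2:4]      # elements 2 to 4: 5 6 3
vector_1 * 2       # arithmetic applies to each element: 0 10 12 6 12 18 6

# All elements must share one type; mixing numbers and text coerces to character
mixed <- c(1, "two", 3)
class(mixed)       # "character"
```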

In [None]:
# Your turn: make another vector with seven different numbers
example_1 <- c(2, 8, 4, 7, 1, 9, 5)  # Example answer: replace with your own numbers
example_1

In [None]:
# Use the seq() function to create a sequence of evenly spaced numbers
# seq(from, to, by): start value, end value, step size
vector_2 <- seq(4, 6, 0.2)
vector_2
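`seq()` has a few other handy forms. Writing the argument names makes the call easier to read, `length.out` asks for a fixed number of values instead of a step size, and `:` is a shortcut for steps of 1. The last line shows that when the step does not divide the range evenly, the sequence stops before the end value:

```r
# Named arguments: from, to, by
vector_2 <- seq(from = 4, to = 6, by = 0.2)
length(vector_2)                        # 11 values: 4.0, 4.2, ..., 6.0

# length.out gives a fixed number of evenly spaced values
seq(from = 0, to = 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00

# The colon operator counts up in steps of 1
1:5

# If 'by' does not divide the range evenly, seq() stops before 'to'
max(seq(0, 10, 0.15))                   # 9.9, not 10
```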

In [None]:
# Create another sequence
example_2 <- seq(0, 10, 0.15)
example_2

In [None]:
# rbind() stacks the vectors as rows of a matrix (not a data frame)
# as.data.frame() then converts that matrix into a data frame
df <- as.data.frame(rbind(vector_1, example_1))
df
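Another way to combine vectors is `data.frame()`, which places each vector in its own named column. This gives one row per observation, which is usually the layout you want for analysis. A short sketch using the same two vectors:

```r
vector_1  <- c(0, 5, 6, 3, 6, 9, 3)
example_1 <- c(2, 8, 4, 7, 1, 9, 5)

# Each vector becomes a column; each position becomes a row
df_cols <- data.frame(vector_1, example_1)
df_cols
class(df_cols)   # "data.frame"
dim(df_cols)     # 7 rows, 2 columns
```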

## R as a Calculator 🧮

R doesn't need a dataset to do math; you can type arithmetic directly and it works just like a calculator:

In [None]:
# Addition
3 + 4

In [None]:
# Subtraction
5 - 2

In [None]:
# Multiplication
3 * 6

In [None]:
# Division
8 / 2

In [None]:
# Your turn: what is 22 times 56 plus 8 minus 200?
22 * 56 + 8 - 200
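R follows the usual order of operations (parentheses, then exponents, then multiplication and division, then addition and subtraction), and it has a few more operators worth knowing. Results can also be stored in a variable and reused. The hectare conversion at the end is just an illustrative example:

```r
(3 + 4) * 2   # parentheses first: 14
3 + 4 * 2     # multiplication before addition: 11
2^3           # exponent: 8
sqrt(16)      # square root: 4
17 %/% 5      # integer division: 3
17 %% 5       # remainder (modulo): 2

# Store a result and reuse it
area_ha <- 2.5
area_ha * 10000   # hectares to square metres: 25000
```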

## Working with Data Frames 📋

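Data frames are two-dimensional: they have rows (usually one per observation) and columns (one per variable), and each column can hold a different data type. Let's work with the built-in **iris** dataset, which contains measurements of 150 iris flowers from three species: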

In [None]:
# Assign the iris data to an object called flower
flower <- iris

# Look at the structure of the data
str(flower)
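Besides `str()`, a few other functions help you get oriented in a data frame: `dim()` for its size, `names()` for its column names, `head()` for the first rows, and `$` to pull out a single column as a vector. A quick sketch on the iris data:

```r
flower <- iris
dim(flower)                # 150 rows, 5 columns
names(flower)              # the column names
head(flower, 3)            # first three rows
flower$Sepal.Length[1:3]   # $ extracts one column as a vector: 5.1 4.9 4.7
summary(flower$Species)    # counts for each level of the factor column
```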

In [None]:
# Load our agricultural dataset (the CSV file must be in the working directory)
crop_data <- read.csv("sample_crop_data.csv")
str(crop_data)

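The `str()` function shows the data type of each column:
- **num**: numeric values (numbers with or without decimals)
- **chr**: character (text) values  
- **Factor**: categorical variables with fixed levels, such as species or the treatments in an experiment
- **int**: integer (whole number) values

Since R 4.0, `read.csv()` reads text columns as `chr` rather than `Factor` by default.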
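To see all of these types side by side, here is a tiny data frame built by hand. The plot numbers, varieties, yields, and irrigation values are hypothetical, made up for illustration (not taken from `sample_crop_data.csv`). It also shows `logi`, which `str()` uses for logical columns, and how `factor()` turns a text column into a categorical one:

```r
# Hypothetical field-trial data (illustrative values only)
plots <- data.frame(
  plot      = 1:4,                          # int
  variety   = c("A", "B", "A", "B"),        # chr
  yield     = c(3.2, 4.1, 3.8, 4.5),        # num
  irrigated = c(TRUE, FALSE, TRUE, FALSE)   # logi
)
str(plots)

# Convert the text column to a factor so R treats it as categorical
plots$variety <- factor(plots$variety)
str(plots)
```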

## Data Visualization: Frequency Tables 📊

When looking at data, a good first step is to get the big picture by examining how the data points are distributed. Let's start with frequency tables, which count how many observations fall into each category:

In [None]:
# How many samples belong to each species?
table(iris$Species)
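Counts are often easier to compare as proportions. `prop.table()` divides each count by the total, so the iris species (50 flowers each out of 150) each come out to about one third:

```r
species_counts <- table(iris$Species)
species_counts
prop.table(species_counts)                  # proportions: about 0.333 each
round(100 * prop.table(species_counts), 1)  # as percentages: 33.3 each
```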

In [None]:
# Two-way frequency table: species by Sepal.Length
# Sepal.Length has many distinct values, so cut() groups them into 0.5 cm intervals
frequency_table <- table(iris$Species, cut(iris$Sepal.Length, seq(4, 8, 0.5)))
frequency_table
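It helps to see what `cut()` does on its own. By default each interval is closed on the right, so `(4,4.5]` means "greater than 4 and at most 4.5". Adding up each row of the two-way table recovers the 50 flowers per species, which confirms that every value landed in some interval:

```r
# cut() assigns each value to an interval
binned <- cut(c(4.3, 5.0, 5.6), breaks = seq(4, 8, 0.5))
binned           # (4,4.5]  (4.5,5]  (5.5,6]
levels(binned)   # all 8 intervals, from (4,4.5] to (7.5,8]

# Row totals of the two-way table give the counts per species
frequency_table <- table(iris$Species, cut(iris$Sepal.Length, seq(4, 8, 0.5)))
rowSums(frequency_table)   # 50 for each species
```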

In [None]:
# Your turn: How would you remake this table for Petal.Width?
petal_table <- table(iris$Species, cut(iris$Petal.Width, seq(0, 3, 0.5)))
petal_table

## Data Visualization: Histograms 📈

Histograms show the range of the data, where the measurements are most concentrated, and the shape of the distribution (symmetric or skewed). They're easy to make with just one function:

In [None]:
# Create a histogram of Sepal Length
hist(iris$Sepal.Length,
     main = "Distribution of Sepal Length",
     xlab = "Sepal Length (cm)",
     col = "lightblue")

In [None]:
# Suggest more bins with 'breaks' (R treats a single number as a suggestion only)
hist(iris$Sepal.Length, 
     breaks = 15,
     main = "Sepal Length - More Bins",
     xlab = "Sepal Length (cm)",
     col = "lightgreen")

In [None]:
# What happens when we decrease the number of bins?
hist(iris$Sepal.Length, 
     breaks = 5,
     main = "Sepal Length - Fewer Bins",
     xlab = "Sepal Length (cm)",
     col = "lightcoral")
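When `breaks` is a single number, R only uses it as a suggestion and picks "nice" break points itself. Setting `plot = FALSE` returns the bins and counts without drawing anything, so you can check what R actually chose. Passing a vector of break points fixes the bins exactly:

```r
# plot = FALSE returns the bins and counts without drawing the histogram
h <- hist(iris$Sepal.Length, breaks = 15, plot = FALSE)
h$breaks           # the break points R actually chose
length(h$counts)   # number of bins actually used
sum(h$counts)      # all 150 flowers fall into some bin

# A vector of break points sets the bins exactly
h2 <- hist(iris$Sepal.Length, breaks = seq(4, 8, by = 0.25), plot = FALSE)
length(h2$counts)  # 16 bins of width 0.25
```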

## Agricultural Data Analysis 🌾

Let's apply what we learned to our agricultural dataset:

In [None]:
# Look at the first few rows of our crop data
head(crop_data)

In [None]:
# Create a frequency table for crop types
table(crop_data$crop_type)
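A frequency table can also be drawn as a bar chart with `barplot()`, and `sort()` puts the most common category first. Because the exact contents of `sample_crop_data.csv` are not shown here, this sketch uses a short hypothetical vector of crop labels:

```r
# Hypothetical crop labels (illustrative only; not from sample_crop_data.csv)
crop_types <- c("maize", "wheat", "maize", "rice", "wheat", "maize")
crop_counts <- table(crop_types)
crop_counts                            # maize 3, rice 1, wheat 2
sort(crop_counts, decreasing = TRUE)   # most common crop first

# Bar chart of the counts
barplot(crop_counts,
        main = "Number of Plots per Crop Type",
        ylab = "Count",
        col = "darkseagreen")
```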

In [None]:
# Create a histogram of crop yields
hist(crop_data$yield,
     main = "Distribution of Crop Yields",
     xlab = "Yield (tons/ha)",
     col = "gold",
     breaks = 10)

In [None]:
# Calculate basic statistics
mean(crop_data$yield)     # Average yield
median(crop_data$yield)   # Median yield
max(crop_data$yield)      # Maximum yield
min(crop_data$yield)      # Minimum yield
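Real field data often have missing values, which R stores as `NA`. We don't know whether the crop dataset has any, but if it does, `mean()` and the other functions above return `NA` unless you add `na.rm = TRUE`. A sketch with hypothetical yield values:

```r
# Hypothetical yields with one missing value (NA)
yields <- c(3.2, 4.1, NA, 3.8, 4.5)
mean(yields)                  # NA: one missing value makes the result missing
mean(yields, na.rm = TRUE)    # 3.9, computed from the four recorded values
range(yields, na.rm = TRUE)   # minimum and maximum together: 3.2 4.5
summary(yields)               # min, quartiles, median, mean, max, and NA count
```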

## Practice Exercises 🎯

Try these exercises to test your understanding:

In [None]:
# Exercise 1: Create a vector of plant heights
plant_heights <- c(25, 30, 28, 32, 27, 29, 31, 26)

# Calculate the mean height
mean_height <- mean(plant_heights)
cat("Average plant height:", mean_height, "cm\n")
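To extend Exercise 1, you can compare the mean with the median and range of the same heights. For these eight plants the mean and median happen to be equal (28.5 cm), which fits a fairly symmetric set of values:

```r
plant_heights <- c(25, 30, 28, 32, 27, 29, 31, 26)
median(plant_heights)    # 28.5: the middle of the sorted values
range(plant_heights)     # 25 32: shortest and tallest plant
summary(plant_heights)   # several summaries in one call
```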

In [None]:
# Exercise 2: Create a histogram of plant heights
hist(plant_heights,
     main = "Distribution of Plant Heights",
     xlab = "Height (cm)",
     col = "lightgreen",
     breaks = 5)

In [None]:
# Exercise 3: Explore the relationship between rainfall and yield
plot(crop_data$rainfall, crop_data$yield,
     main = "Rainfall vs Crop Yield",
     xlab = "Rainfall (mm)",
     ylab = "Yield (tons/ha)",
     col = "blue",
     pch = 16)
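Scatter plots become more informative when points are colored by group. Because the column names of the crop dataset are only partly known, this sketch uses the built-in iris data: the species factor is converted to integer codes (1, 2, 3), which `plot()` uses as colors from the default palette, and `legend()` labels them:

```r
# Convert species to integer codes 1, 2, 3 to use as point colors
species_code <- as.integer(iris$Species)

plot(iris$Petal.Length, iris$Petal.Width,
     main = "Petal Length vs Petal Width",
     xlab = "Petal Length (cm)",
     ylab = "Petal Width (cm)",
     col = species_code,
     pch = 16)

# Match legend colors to the same codes
legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 16)
```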

## Summary 🎉

Congratulations! You've completed Week 1. You learned:

✅ **Data types** in R (integer, numeric, character, logical, factor)  
✅ **Vectors** and how to create them  
✅ **Basic arithmetic** operations  
✅ **Data frames** and structure exploration  
✅ **Frequency tables** for categorical data  
✅ **Histograms** for visualizing distributions  
✅ **Agricultural data analysis** with a sample crop dataset  

**Next week:** We'll dive deeper into descriptive statistics and advanced data visualization!