# Data Manipulation with Tidyverse, Part I

Use this Jupyter Notebook to take notes during the lecture. As you watch the video, add your own text, code, and comments. Running the code yourself gives you hands-on experience, and the finished notebook will serve as notes for future reference.

In this lecture and the next, we're going to go through a lot of different concepts. Remember, you can use the Complete Notebook as a reference for all that we will cover in this lecture. 

The goals of this lecture and the next are:

1) Learn about packages and the tidyverse
2) Learn how to manipulate and clean data with the tidyverse

# Why is data manipulation important?

Presumably, we are all planning to do Environmental Data Science:

- We will have a hypothesis of how the world works
- We will want to construct a model that approximates that 
- We will need data from the real world to build that model
- The data we have may not be set up to be plugged into the model we'd like to run
- However, it could be *manipulated* so that we can use the data we have with the model we want

My personal example: American Time Use Survey and Travel Cost Model, Cell Phone Data and green space

# What are packages and how can I get them?

**What are they:**

- A package contains a bunch of pre-built functions 
- Anyone can load and use them
- Saves you a ton of time because someone already figured out how to do it

**Tidyverse**

- Collection of R packages 
- All are meant for data science
- Have shared syntax
- Makes it easier to import, tidy, transform, visualize, and model data in R
- Shout out to Hadley Wickham and co 

**Installing a package vs. loading a package** 

- You only need to install a package on your local computer once (your lab has all the packages pre-installed)
- You then "load" that package with the `library()` function in each script where you want to use it

For reference, the code to install a package on your local computer is below. We do not need to run this code on each of our local machines. 

```r
# installing packages that are a part of the tidyverse using r code
install.packages("dplyr")
install.packages("tidyr")
install.packages("ggplot2")
```

Once a package is installed, you can load it with the `library()` function.

```r
# load dplyr, the package we will focus on today
library(dplyr)
```
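
The install-once, load-every-time distinction can be sketched as follows. This is a minimal sketch, not something you need to run in the lab (where packages are pre-installed): it assumes base R's `requireNamespace()` is used to check whether dplyr is already installed, and only calls `install.packages()` if it is missing.

```r
# Install dplyr only if it is not already installed (a one-time step per computer)
if (!requireNamespace("dplyr", quietly = TRUE)) {
    install.packages("dplyr")
}

# Load dplyr (needed in every new R session or script that uses it)
library(dplyr)

# Once loaded, the package shows up on R's search path
"package:dplyr" %in% search()
```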

# Manipulating and cleaning with dplyr 

- We are going to introduce the primary functions in the dplyr package
    - `mutate()`
    - `if_else()`
    - `filter()`
    - `select()`
    - `group_by()`
    - `summarise()`


```r
# Build our trusty dataset, but let's call this one myBase
myBase <- data.frame(
    name = c("Andie", "Bridger", "Scott"),
    gender = c("Female", "non-binary", "Male"),
    male = c(FALSE, FALSE, TRUE),
    income_cat = c("middle", "poor", "rich"),
    park_dist = c(1, 0.5, 0.1)
)
myBase

# begin inserting here!
```

# `mutate()`
*insert here*
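
As a starting point for your notes, here is a minimal sketch of `mutate()`, which adds a new column computed from existing ones. It assumes `park_dist` is measured in miles; the column name `park_dist_km` and the object name `myBase_km` are hypothetical, chosen only for illustration.

```r
library(dplyr)

# Same myBase as above, repeated so this block runs on its own
myBase <- data.frame(
    name = c("Andie", "Bridger", "Scott"),
    gender = c("Female", "non-binary", "Male"),
    male = c(FALSE, FALSE, TRUE),
    income_cat = c("middle", "poor", "rich"),
    park_dist = c(1, 0.5, 0.1)
)

# Assumption: park_dist is in miles; convert it to kilometers in a new column
myBase_km <- myBase %>%
    mutate(park_dist_km = park_dist * 1.609344)
myBase_km
```

Note that `mutate()` returns a new data frame; `myBase` itself is unchanged unless you assign the result back to it.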

# `if_else()`

Last lecture, we said that women and non-binary people had their distances from parks recorded incorrectly: non-male people actually lived a quarter mile closer to the park than originally recorded.

```r
# goes through each row and changes distance if someone is not male
for (i in seq_along(myBase$male)) {
    if (!myBase$male[i]) { # check if someone is "not male"
        myBase$park_dist_correct[i] <- myBase$park_dist[i] - 0.25 # adjust
    } else {
        myBase$park_dist_correct[i] <- myBase$park_dist[i]
    }
}

# begin inserting here
```
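
The loop above can be replaced by a single vectorized call. The following is a minimal sketch, assuming the same `myBase` data: `if_else(condition, true, false)` checks the condition for every row at once, and `mutate()` stores the result. The object name `myBase_corrected` is hypothetical.

```r
library(dplyr)

# Same myBase as above, repeated so this block runs on its own
myBase <- data.frame(
    name = c("Andie", "Bridger", "Scott"),
    gender = c("Female", "non-binary", "Male"),
    male = c(FALSE, FALSE, TRUE),
    income_cat = c("middle", "poor", "rich"),
    park_dist = c(1, 0.5, 0.1)
)

# If male is TRUE keep park_dist, otherwise subtract a quarter mile
myBase_corrected <- myBase %>%
    mutate(park_dist_correct = if_else(male, park_dist, park_dist - 0.25))
myBase_corrected
```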

# `filter()`

*insert here*
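
As a starting point for your notes, here is a minimal sketch of `filter()`, which keeps only the rows that satisfy a condition. The cutoff of 1 and the object names `close_to_park` and `not_male_close` are hypothetical, chosen only for illustration.

```r
library(dplyr)

# Same myBase as above, repeated so this block runs on its own
myBase <- data.frame(
    name = c("Andie", "Bridger", "Scott"),
    gender = c("Female", "non-binary", "Male"),
    male = c(FALSE, FALSE, TRUE),
    income_cat = c("middle", "poor", "rich"),
    park_dist = c(1, 0.5, 0.1)
)

# Keep only people whose recorded park distance is less than 1
close_to_park <- myBase %>%
    filter(park_dist < 1)
close_to_park

# Conditions separated by commas must all be true for a row to be kept
not_male_close <- myBase %>%
    filter(!male, park_dist < 1)
not_male_close
```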

# `select()`
*insert here*
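
As a starting point for your notes, here is a minimal sketch of `select()`, which keeps or drops columns by name (where `filter()` works on rows). The object names `name_and_dist` and `no_gender` are hypothetical.

```r
library(dplyr)

# Same myBase as above, repeated so this block runs on its own
myBase <- data.frame(
    name = c("Andie", "Bridger", "Scott"),
    gender = c("Female", "non-binary", "Male"),
    male = c(FALSE, FALSE, TRUE),
    income_cat = c("middle", "poor", "rich"),
    park_dist = c(1, 0.5, 0.1)
)

# Keep only the name and park_dist columns
name_and_dist <- myBase %>%
    select(name, park_dist)
name_and_dist

# A minus sign drops a column and keeps everything else
no_gender <- myBase %>%
    select(-gender)
no_gender
```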

# `group_by()`

*insert here*

```r
# create an environmental dataset (don't clear this!)
df_env_long <- data.frame(
    ecosystem = c("Forest", "Desert", "Wetland", "Grassland", "Urban", "Forest", "Desert", "Wetland", "Grassland", "Urban"),
    species_richness = c(120, 45, 80, 60, 30, 110, 50, 85, 65, 35),
    pollution_level = c("Low", "High", "Medium", "Low", "High", "Low", "High", "Medium", "Low", "High")
)
```
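
With the environmental dataset in hand, here is a minimal sketch of `group_by()`. It does not change the data itself; it only marks which rows belong together so that later steps operate within each group. The data frame is rebuilt here with `rep()` (same values as above) so the block runs on its own, and the object name `df_grouped` is hypothetical.

```r
library(dplyr)

# Same values as df_env_long above, written compactly with rep()
df_env_long <- data.frame(
    ecosystem = rep(c("Forest", "Desert", "Wetland", "Grassland", "Urban"), 2),
    species_richness = c(120, 45, 80, 60, 30, 110, 50, 85, 65, 35),
    pollution_level = rep(c("Low", "High", "Medium", "Low", "High"), 2)
)

# Mark rows as belonging to groups defined by ecosystem
df_grouped <- df_env_long %>%
    group_by(ecosystem)

# The rows are untouched, but the data frame now knows about its groups
nrow(df_grouped)
n_groups(df_grouped)
group_keys(df_grouped)
```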


# `summarise()` 
*insert here*
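
As a starting point for your notes, here is a minimal sketch of `summarise()` combined with `group_by()`: it collapses each group down to one row of summary values. The data frame is rebuilt with `rep()` (same values as `df_env_long` above) so the block runs on its own; the names `richness_summary`, `mean_richness`, and `n_obs` are hypothetical.

```r
library(dplyr)

# Same values as df_env_long above, written compactly with rep()
df_env_long <- data.frame(
    ecosystem = rep(c("Forest", "Desert", "Wetland", "Grassland", "Urban"), 2),
    species_richness = c(120, 45, 80, 60, 30, 110, 50, 85, 65, 35),
    pollution_level = rep(c("Low", "High", "Medium", "Low", "High"), 2)
)

# One row per ecosystem: average species richness and number of observations
richness_summary <- df_env_long %>%
    group_by(ecosystem) %>%
    summarise(
        mean_richness = mean(species_richness),
        n_obs = n()
    )
richness_summary
```

Without the `group_by()` step, `summarise()` would return a single row for the whole dataset.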