# Data Manipulation with Tidyverse, Part II

In the Data Foundations module, we covered Part I of Data Manipulation with the Tidyverse and began learning how to clean datasets using packages in the Tidyverse. In this lecture, we will cover a few more advanced data manipulation techniques. 

Remember, **data manipulation is essential in environmental data science** because real-world data often isn’t in a format that fits the models we want to use. Transforming the data into a suitable shape lets us build models that better approximate how the world works.

In Part I, we used the following functions from the `dplyr` package:

  - `mutate()`
  - `if_else()`
  - `filter()`
  - `select()`
  - `group_by()`
  - `summarise()`

In this Part II lecture, we will learn about the join functions in the `dplyr` package and the pivot functions in the `tidyr` package:

  - `left_join()`
  - `pivot_longer()`
  - `pivot_wider()`


First things first, let's load our packages.
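
The loading cell in this notebook is empty, so the following is only a minimal sketch. It assumes that `dplyr` and `tidyr` are already installed and attaches just those two packages, since they provide the functions named in this lecture. Running `library(tidyverse)` instead would attach both, along with the rest of the Tidyverse.

```r
# Assumption: both packages were installed beforehand,
# e.g. with install.packages(c("dplyr", "tidyr")).
library(dplyr)  # left_join() and the Part I verbs
library(tidyr)  # pivot_longer() and pivot_wider()
```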


# Summary 

- `left_join(x, y, by = "key")`: Add columns from `y` to `x` by matching rows on a shared column (called a **key**). Every row of `x` is kept; rows with no match in `y` get `NA` in the added columns.
- `pivot_longer(cols, names_to, values_to)`: Turn multiple columns into **key-value pairs**, where the original column names become values in a new "key" column, and the original cell values become values in a new "value" column.

  **What’s a key-value pair?**  
  A **key** is a label that identifies what kind of data a value represents, and the **value** is the data itself.  
  For example, if you have columns named `Jan`, `Feb`, and `Mar`, each holding temperature values, `pivot_longer()` with `names_to = "Month"` and `values_to = "Temperature"` will turn those columns into:
  
  | location | Month | Temperature |
  |----------|-------|-------------|
  | Forest   | Jan   | 5           |
  | Forest   | Feb   | 6           |
  | Forest   | Mar   | 10          |

  Here, `Month` is the **key**, and `Temperature` is the **value**.

- `pivot_wider(names_from, values_from)`: Spread key-value pairs back into multiple columns.

  **Example of key-value pairs for `pivot_wider()`:**  
  Suppose you have a dataset in long format like this:

  | station | day       | rainfall_mm |
  |---------|-----------|-------------|
  | A       | Monday    | 5           |
  | A       | Tuesday   | 10          |
  | A       | Wednesday | 3           |
  | B       | Monday    | 0           |
  | B       | Tuesday   | 0           |
  | B       | Wednesday | 12          |

  Using `pivot_wider(names_from = day, values_from = rainfall_mm)`, the data is reshaped into:

  | station | Monday | Tuesday | Wednesday |
  |---------|--------|---------|-----------|
  | A       | 5      | 10      | 3         |
  | B       | 0      | 0       | 12        |

  Here, `day` is the **key**, and `rainfall_mm` is the **value**.
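
The three functions summarized above can be sketched together using the same small tables. This is an illustrative sketch, not the lecture's own code: the object names (`temps_wide`, `rain_long`, and so on) are made up here, and the `station_info` lookup table with its `habitat` column is hypothetical data added only to give `left_join()` something to match against.

```r
library(dplyr)
library(tidyr)

# Wide temperature table from the pivot_longer() example above.
temps_wide <- tibble(location = "Forest", Jan = 5, Feb = 6, Mar = 10)

# Stack the month columns: column names go to `Month`, cell values to `Temperature`.
temps_long <- temps_wide %>%
  pivot_longer(cols = c(Jan, Feb, Mar),
               names_to = "Month",
               values_to = "Temperature")

# Long rainfall table from the pivot_wider() example above.
rain_long <- tibble(
  station     = c("A", "A", "A", "B", "B", "B"),
  day         = rep(c("Monday", "Tuesday", "Wednesday"), times = 2),
  rainfall_mm = c(5, 10, 3, 0, 0, 12)
)

# Spread `day` into one column per day, filled from `rainfall_mm`.
rain_wide <- rain_long %>%
  pivot_wider(names_from = day, values_from = rainfall_mm)

# Hypothetical lookup table (not from the lecture); station B is omitted on purpose.
station_info <- tibble(station = "A", habitat = "wetland")

# left_join() keeps every row of rain_wide; station B has no match, so habitat is NA.
rain_joined <- left_join(rain_wide, station_info, by = "station")
```

Printing `temps_long`, `rain_wide`, and `rain_joined` shows each reshaped table. Leaving station B out of `station_info` is a deliberate choice to make the "keep all rows of `x`" behavior of `left_join()` visible.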