

# 04 Data Wrangling


* Load packages
* Visualization
* Tidy data
* Data wrangling


---

**In this session we will start using packages, focus on visualization, and get a better understanding of how to manage our data tables more efficiently.**

![](https://raw.githubusercontent.com/GC-alex/QM/master/figs/header_sized_small.jpg)


---

# Packages

A **package** is a collection of ready-made functions, usually built around a specific task.
The R community provides thousands of them, each with its own purpose and all free for you to use.

If you take a careful look at our previous sessions (01 Introduction to R), you will see that we already used some packages for the more advanced plots. You can spot them by the `library()` function that loads them.

Some packages we already used:
**ggplot2**,
**maps**,
**dplyr**

Most packages need to be installed on your local machine before you can use them. Simply use the `install.packages("name_of_package")` function for this purpose. Once a package is installed, you can load its functions in any session with the `library()` function. In our **Jupyter Notebooks**, please refrain from installing packages and simply load the ones that are already installed. On your local R, feel free to install as many packages as you like.
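
As a minimal sketch of the install-once, load-every-session pattern described above (using `ggplot2` as the example package, and assuming it is already installed):

```r
# Install once per machine. Left commented out on purpose:
# in the course notebooks the packages are already installed.
# install.packages("ggplot2")

# Load the package at the start of every new R session
library(ggplot2)

# Check that the package is now attached to the session
"package:ggplot2" %in% search()
```

If `library()` reports that there is no package with that name, the package has not been installed on that machine yet.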

---
**Note:**

*But which packages are installed here? Simply check the **install.R** file in the repository - please don't try to make any changes to this file.*

---



# Visualization

In the first session you already made your first plot. Now we want to dig further into R's capabilities.
R has some nice built-in functions to visualize our data. Let's try the following functions on the `airquality` dataset.
Just pick a variable of the dataset that you want to explore in more depth.

`hist()` - take one variable of your dataset!

`plot()` - take one variable for the x-axis and a second for the y-axis!

`boxplot()` - take one variable and the Months!
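
One possible way to call these three functions is sketched below. The choice of `Ozone` and `Temp` is only an example, and the object names `ozone_hist` and `ozone_box` are made up for illustration; any other numeric column works the same way.

```r
# airquality ships with base R, so no package is needed here
head(airquality)

# Histogram of a single variable (missing values are dropped automatically)
ozone_hist <- hist(airquality$Ozone, main = "Ozone", xlab = "Ozone")

# Scatterplot: first argument on the x-axis, second on the y-axis
plot(airquality$Ozone, airquality$Temp, xlab = "Ozone", ylab = "Temp")

# Boxplot: the formula reads "Ozone grouped by Month"
ozone_box <- boxplot(Ozone ~ Month, data = airquality)
```

Storing the results of `hist()` and `boxplot()` is optional; the plots are drawn either way.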



---
**Note**
* *You forgot which variables there are in the airquality dataset? Use the `head()` function!*
* *The boxplot needs a `~` instead of a comma to separate the variable and the month!*
---

Now let's try a package called `ggplot2`. It creates much nicer plots than the standard R ones, but it takes more effort to make them.
Use the `library()` function with the name of our package.

Now let's create a histogram with **ggplot2**:

`ggplot(data = airquality, aes(x = Ozone)) +
  geom_histogram()`

`ggplot()` is our function, but it needs more specific inputs than the standard R functions we used before.

`data =` names the dataset to use. `aes()` stands for *aesthetics* and specifies things such as colours, sizes or, in this case, simply which column to plot. `geom_histogram()` is added with a `+` and specifies your *diagram type*.
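
To make the role of each part more visible, the same histogram can be built in two steps. This is only a sketch: the object names `p` and `p_hist` and the `binwidth = 10` setting are arbitrary choices for illustration.

```r
library(ggplot2)

# Step 1: data and aesthetics only; printing p would show an empty panel
p <- ggplot(data = airquality, aes(x = Ozone))

# Step 2: add the diagram type with `+`
# binwidth = 10 is an arbitrary choice for illustration
p_hist <- p + geom_histogram(binwidth = 10)

p_hist
```

ggplot2 will warn that some rows were removed, because `Ozone` contains missing values.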

Please try a scatterplot with the `geom_point()` *diagram type* and `y = Temp` as an additional *aesthetic*.

Let's try a boxplot now! This time we are using `x = factor(Month)` and `y = Ozone` as our *aesthetics*, and `geom_boxplot()` as the *diagram type*.
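
If you get stuck, the two plots could look roughly like this. It is one possible solution, and the object names `p_scatter` and `p_box` are made up for illustration.

```r
library(ggplot2)

# Scatterplot: Ozone on the x-axis, Temp on the y-axis
p_scatter <- ggplot(data = airquality, aes(x = Ozone, y = Temp)) +
  geom_point()

# Boxplot: factor(Month) turns the numeric month into discrete groups,
# so ggplot2 draws one box per month
p_box <- ggplot(data = airquality, aes(x = factor(Month), y = Ozone)) +
  geom_boxplot()

p_scatter
p_box
```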

Compare all the plots that you just created!

You can also use factor variables to change aesthetics of the `geom_point` and `geom_line` layers.

Try the following!
`ggplot(data = airquality, aes(x = Ozone, y = Temp, col = factor(Month))) + geom_point()`

Can you find out which part of the code you have to replace to get `shape`s instead of colours?

We can do the same for `geom_line` diagram types. For lines, the colour is still set with `col` (`fill` has no effect on a line), and `linetype` takes the place of `shape`.

---
**Note**

You want to create a `geom_line` plot? Don't forget to use the right data for that: you can try `Day` and `Temp`!

---
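
A line plot along these lines might look as follows. This is a sketch under the assumption that we want one line per month; in ggplot2, `geom_line()` takes its colour from `col` and its dash pattern from `linetype`. The object name `p_line` is made up for illustration.

```r
library(ggplot2)

# One line per month: Day on the x-axis, Temp on the y-axis.
# col and linetype are both mapped to the month as a factor.
p_line <- ggplot(data = airquality,
                 aes(x = Day, y = Temp,
                     col = factor(Month), linetype = factor(Month))) +
  geom_line()

p_line
```

Without a factor aesthetic, all months would be connected into a single zigzag line, because each `Day` value occurs once per month.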


Let's try to plot a `geom_point` with `x = Ozone`, `y = Temp`, `col = Wind` and `size = Solar.R`!
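
One way to write this plot is sketched below; the object name `p_bubble` is made up for illustration.

```r
library(ggplot2)

# Wind and Solar.R are numeric, so ggplot2 uses a continuous colour
# gradient and a continuous size scale instead of discrete groups
p_bubble <- ggplot(data = airquality,
                   aes(x = Ozone, y = Temp, col = Wind, size = Solar.R)) +
  geom_point()

p_bubble
```

Compare the legend with the earlier `col = factor(Month)` plot: a numeric variable gets a colour bar, a factor gets one legend entry per level.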


We have now created many plots, but none of them looks really nice. Let's change that and add a title and axis labels.

`ggplot(data = airquality, aes(x = Ozone, y = Temp, col = factor(Month))) +
  geom_point() +
  labs(x = "Ozone concentration [ppb]", 
       y = "Temperature [°F]", 
       title = "Relationship between Ozone and Temperature in New York")`

# Challenge

Please do **Challenge 04** to practice!

# Summary

*


# List of Functions Used
**Load Packages**

`install.packages()`

`library()`

**Visualization:**

R:
`hist()`
`plot()`
`boxplot()`

ggplot2:
`ggplot()`
`aes()`
`geom_histogram()`
`geom_point()`
`geom_boxplot()`
`geom_line()`
`labs()`
