# Lesson 04 - Data Visualization with `ggplot2`
* [xkcd: Decline](https://xkcd.com/523/)
![xkcd: Decline](http://imgs.xkcd.com/comics/decline.png "'There's also a spike on the Fourier transform at the one-month mark where --' 'You want to stop talking right now.'")

## Load Data and Packages

In [None]:
library(datasets)
delays07 <- read.csv('data/2007_ORD.csv')

Like `dplyr`, `ggplot2` is a package that must be loaded with `library()` before use:

In [None]:
library("ggplot2")
library("dplyr")

## Simple Scatter Plot

In [None]:
plot(x = airquality$Temp, y = airquality$Ozone, main="Ozone(Temp)")

## Simple Plot with `ggplot2`
* Clean the data before calling `ggplot()`: keep only `Temp` and `Ozone`, and drop the rows where `Ozone` is `NA`

In [None]:
tempOzone <- airquality %>% select(Temp, Ozone) %>% filter(!is.na(Ozone))
head(tempOzone)

* The `ggplot()` command creates an empty plot from a data frame
  * Map columns to the axes with `aes(x = X_data, y = Y_data)`; the axes are then labeled with the column names
  * Explicitly specify how the data will be displayed by adding a `geom_*()` layer
    * For example: `geom_point()`
    * More options: http://docs.ggplot2.org/current/
  * Optionally, add a title with `ggtitle()`

In [None]:
ggplot(tempOzone)
ggplot(tempOzone) + aes(x = Temp, y = Ozone)

In [None]:
ggplot(tempOzone) + aes(x = Temp, y = Ozone) + geom_point(alpha = 0.5, color = "red") + ggtitle("Ozone(Temp)")
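
Because layers are combined with `+`, a plot can also be stored in a variable and built up step by step. The following is a minimal sketch of that idea, assuming the variable name `p` and the label text (both chosen here for illustration); it also uses `labs()` to replace the default axis labels, which `aes()` takes from the column names.

```r
library(ggplot2)

# Same cleaning as above, written with base R so the block stands alone
tempOzone <- airquality[!is.na(airquality$Ozone), c("Temp", "Ozone")]

# Start from the data and the aesthetic mapping
p <- ggplot(tempOzone) + aes(x = Temp, y = Ozone)

# Add the point layer, then custom axis labels and a title
p <- p + geom_point(alpha = 0.5, color = "red")
p <- p + labs(x = "Temperature (degrees F)", y = "Ozone (ppb)", title = "Ozone(Temp)")

# Printing the object draws the plot
print(p)
```

Storing the intermediate plot makes it easy to try several `geom_*()` layers on the same base without retyping the mapping.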

## Boxplot
* Example with `geom_boxplot()`
  * `Month` must be converted to a factor so that each month gets its own box

In [None]:
monthTemp <- airquality %>% select(Month, Temp)
monthTemp$Month <- as.factor(monthTemp$Month)
summary(monthTemp$Month)

In [None]:
ggplot(monthTemp) + aes(x = Month, y = Temp) + geom_boxplot()

In [None]:
ggplot(monthTemp) + aes(x = Month, y = Temp) + geom_boxplot(alpha = 0) + geom_point(alpha = 0.3, color = "red")

## Multi-line Plots
* In `aes()`, group entries by one column, typically a factor: `group = colName`
  * Still in `aes()`, each group can be given its own color with `colour = colName`
    * Factors get one distinct color per level, taken from a default palette
    * Numeric values get a continuous color gradient
* Use `geom_line()` to display data as lines

In [None]:
monthTempOzone <- airquality %>% select(Month, Temp, Ozone) %>% filter(!is.na(Ozone))
monthTempOzone$Month <- as.factor(monthTempOzone$Month)

ggplot(monthTempOzone) + aes(x = Temp, y = Ozone, group = Month, colour = Month) + geom_line()
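
To see how the type of the colored column changes the result, the sketch below draws the same points twice: once with `Month` left as a number and once with `Month` converted to a factor. The variable names `ozone`, `pNumeric` and `pFactor` are assumptions made for this illustration.

```r
library(ggplot2)

# Drop rows with missing Ozone; Month is still numeric here
ozone <- airquality[!is.na(airquality$Ozone), ]

# Numeric Month: points are colored along a continuous gradient
pNumeric <- ggplot(ozone) + aes(x = Temp, y = Ozone, colour = Month) + geom_point()
print(pNumeric)

# Factor Month: each level gets its own discrete color and legend entry
pFactor <- ggplot(ozone) + aes(x = Temp, y = Ozone, colour = factor(Month)) + geom_point()
print(pFactor)
```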

## Exercise 1 - Plots of Delays vs (DayOfWeek, PeriodOfDay)
* Start from the solution to Lesson 03 - Exercise 3 (reproduced below)
  * Convert `DayOfWeek` and `PeriodOfDay` to factors
  * Create a plot where x is `DayOfWeek`, y is `mDelay`, with one line per `PeriodOfDay`
  * Create a plot where x is `PeriodOfDay`, y is `mDelay`, with one line per `DayOfWeek`

In [None]:
 dayTimeDelay <-      delays07 %>% select(DayOfWeek, CRSDepTime, DepDelay) %>% filter(!is.na(DepDelay))
  withPeriods <-  dayTimeDelay %>% mutate(PeriodOfDay = CRSDepTime %/% 600)
byPeriodOfDay <-   withPeriods %>% group_by(DayOfWeek, PeriodOfDay)
         summ <- byPeriodOfDay %>% summarize(mDelay = mean(DepDelay, na.rm=TRUE))

## Exercise 2 - Data from 2007 and 2008
* Compute average `DepDelay` per `Month` for 2007 and 2008
* Plot both years where x is the `Month`, y is the average `DepDelay`, and group entries by `Year`
* Use `rbind()` to concatenate rows together
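
As a hint for the `rbind()` step, here is a minimal sketch on made-up numbers. The data frames `avg07` and `avg08` and their values are hypothetical and do not come from the delay files; only the shape of the operation matters.

```r
# Hypothetical toy summaries, NOT real delay data
avg07 <- data.frame(Year = 2007, Month = 1:3, mDelay = c(12.5, 10.0, 14.2))
avg08 <- data.frame(Year = 2008, Month = 1:3, mDelay = c(11.1, 13.4, 9.8))

# rbind() stacks the rows; both data frames must have the same column names
both <- rbind(avg07, avg08)

# Year as a factor, so it can be used to group and color lines
both$Year <- as.factor(both$Year)
print(both)
```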

In [None]:
delays08 <- read.csv('data/2008_ORD.csv')