# Appendix C Data 


## C.3 Pipe operator `%>%`


Under the hood, `x %>% f(y)` turns into `f(x, y)`, and `x %>% f(y) %>% g(z)` turns into `g(f(x, y), z)`, and so on. Functions in the `tidyverse` are almost always easy to write and read using `%>%`, but we can use `%>%` with any function, not just those defined in the `tidyverse`. Here is an example of printing "hello world" using the pipe.
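A minimal sketch of the idea, assuming the pipe is loaded from `magrittr` (loading `dplyr` or `tidyverse` would make it available as well). The `round()` call is only an illustration of how extra arguments stay in place.

```{r}
library(magrittr)  # provides %>%; dplyr also re-exports it

# The pipe passes the left-hand side as the first argument of the call on the right,
# so these two lines do the same thing.
print("hello world")
"hello world" %>% print()

# Extra arguments stay where they are: x %>% f(y) is f(x, y).
piped  <- c(1.234, 5.678) %>% round(1)
direct <- round(c(1.234, 5.678), 1)
identical(piped, direct)
```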

We will now manipulate the `tibble` object with the pipe operator `%>%`. The `dplyr` package (part of the `tidyverse`) offers the `mutate()` and `transmute()` functions to add new variables to data tibbles. The syntax is:
```{r, eval=FALSE}
<tibble> %>% mutate(<new variable> = <formula for new variable>, <other new variable> = <other formula>)
```
This returns a copy of `<tibble>` with the new variables added on. `transmute()` does the same thing as `mutate()` but only keeps the new variables.
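To make the difference concrete, here is a small sketch on a toy tibble. The tibble `toy` and its values are made up for this illustration only and are not part of the course data.

```{r}
library(dplyr)

# Hypothetical toy data, invented for this illustration
toy <- tibble(x = c(1, 2, 3), y = c(10, 20, 30))

# mutate() keeps x and y and appends the new columns
toy_mutated <- toy %>% mutate(total = x + y, ratio = y / x)

# transmute() computes the same columns but drops x and y
toy_transmuted <- toy %>% transmute(total = x + y, ratio = y / x)

names(toy_mutated)
names(toy_transmuted)
```

Note that `toy` itself is unchanged in both cases; each call returns a new tibble.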

Let us zoom in on a few variables of interest.
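One way to do this is with `select()`. The sketch below assumes that the tibble is `flights` from the `nycflights13` package, which matches the column names used in this section, and that the variables of interest are `dep_time`, `arr_time`, and `air_time`; both are assumptions.

```{r}
library(dplyr)
library(nycflights13)  # assumption: the data are the `flights` tibble from this package

# Keep only three time-related columns (this choice of columns is an assumption)
flights_small <- select(flights, dep_time, arr_time, air_time)
flights_small
```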

Use `%>%` to create the table above. 
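A possible solution, again assuming the `flights` tibble from `nycflights13` and an illustrative choice of columns:

```{r}
library(dplyr)
library(nycflights13)  # assumption: the data are the `flights` tibble from this package

# flights %>% select(...) is the same as select(flights, ...)
flights_piped <- flights %>% select(dep_time, arr_time, air_time)
flights_piped
```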

Additional variables can be added using the `mutate()` function. We already have an `air_time` variable. Let us compute the total time for the flight by subtracting the time of departure `dep_time` from the time of arrival `arr_time`.
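A sketch of this step, assuming the `flights` tibble from `nycflights13`. The intermediate name `flights_small` is introduced here for illustration.

```{r}
library(dplyr)
library(nycflights13)  # assumption: the data are the `flights` tibble from this package

# First keep the columns of interest, then add the difference as a new column
flights_small <- select(flights, dep_time, arr_time, air_time)
flights_total <- mutate(flights_small, total_time = arr_time - dep_time)
flights_total
```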

Another way to do the same thing is by using `%>%` twice.
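With two pipes, the intermediate tibble is no longer needed. As before, this sketch assumes the `flights` tibble from `nycflights13` and an illustrative choice of columns.

```{r}
library(dplyr)
library(nycflights13)  # assumption: the data are the `flights` tibble from this package

# Read left to right: take flights, then select columns, then add total_time
flights_total <- flights %>%
  select(dep_time, arr_time, air_time) %>%
  mutate(total_time = arr_time - dep_time)
flights_total
```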

We notice something odd, though. Subtracting 5h 17m from 8h 30m should give 3h 13m, i.e., 193 minutes, but the computed `total_time` is 313.

The issue is that `dep_time` and `arr_time` are in hour-minute notation (517 means 5:17), so we cannot add and subtract them like regular numbers. We should first convert these times into the number of minutes elapsed since midnight.
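The arithmetic for this one flight can be checked by hand. The variable names `naive` and `correct` are introduced here only for illustration.

```{r}
# Naive subtraction treats 830 and 517 as ordinary numbers
naive <- 830 - 517
naive      # 313

# Converting each time to minutes since midnight first gives the real difference
correct <- (8 * 60 + 30) - (5 * 60 + 17)
correct    # 193
```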

We want to add two new variables, `new_dep` and `new_arr`, but we first need to write a function that can do the conversion. The function is given below; we'll learn how it works later in the semester. For now, just think of it as a black box that converts times from one format to another.
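The following is one possible implementation of `hourmin2min`, written as a sketch. The body is an assumption based on the description above and may differ from the definition used in the course.

```{r}
# One possible implementation (an assumption, not necessarily the course's definition).
# Converts hour-minute notation such as 530 (meaning 5:30) into minutes since midnight.
hourmin2min <- function(hourmin) {
  hours <- hourmin %/% 100     # integer division keeps the hour part: 530 %/% 100 is 5
  minutes <- hourmin %% 100    # the remainder keeps the minute part: 530 %% 100 is 30
  60 * hours + minutes
}
```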

Let us test the function on 530. That's 5h 30min, i.e., 330 minutes since midnight.
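A quick check, using a compact version of the conversion. The one-line definition is an assumed implementation, repeated here so that the chunk runs on its own.

```{r}
# Assumed implementation: hours times 60, plus the remaining minutes
hourmin2min <- function(hourmin) 60 * (hourmin %/% 100) + hourmin %% 100

result <- hourmin2min(530)
result    # 330
```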

The `hourmin2min` function is _vectorized_: given a vector, it outputs a vector.
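For example, with the same assumed one-line implementation, passing three times at once returns three converted values. This works because `%/%`, `%%`, `*`, and `+` all operate element by element.

```{r}
# Assumed implementation: hours times 60, plus the remaining minutes
hourmin2min <- function(hourmin) 60 * (hourmin %/% 100) + hourmin %% 100

converted <- hourmin2min(c(517, 830, 2359))
converted    # 317 510 1439
```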

Let us now create two new variables obtained from `arr_time` and `dep_time` by converting them into minutes since midnight. In the same command, we can also create a new `total_time` column containing their difference.
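A sketch of this command, assuming the `flights` tibble from `nycflights13` and the assumed implementation of `hourmin2min` repeated below. Inside a single `mutate()` call, later expressions can refer to columns created earlier in the same call, which is why `total_time` can use `new_arr` and `new_dep`.

```{r}
library(dplyr)
library(nycflights13)  # assumption: the data are the `flights` tibble from this package

# Assumed implementation: hours times 60, plus the remaining minutes
hourmin2min <- function(hourmin) 60 * (hourmin %/% 100) + hourmin %% 100

flights_minutes <- mutate(
  flights,
  new_dep = hourmin2min(dep_time),   # departure, in minutes since midnight
  new_arr = hourmin2min(arr_time),   # arrival, in minutes since midnight
  total_time = new_arr - new_dep     # uses the two columns created just above
)
flights_minutes
```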

Now we can subtract the departure time `new_dep` from the arrival time `new_arr` to get a new variable `total_time`.

Let us do the same thing with the pipe in a single line of code. Here we keep only the following variables: `dep_time`, `arr_time`, `new_dep`, `new_arr`, and `total_time`.
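One way to write this as a single piped expression, split across lines for readability. It assumes the `flights` tibble from `nycflights13` and the assumed implementation of `hourmin2min` repeated below.

```{r}
library(dplyr)
library(nycflights13)  # assumption: the data are the `flights` tibble from this package

# Assumed implementation: hours times 60, plus the remaining minutes
hourmin2min <- function(hourmin) 60 * (hourmin %/% 100) + hourmin %% 100

flights_times <- flights %>%
  mutate(new_dep = hourmin2min(dep_time),
         new_arr = hourmin2min(arr_time),
         total_time = new_arr - new_dep) %>%
  select(dep_time, arr_time, new_dep, new_arr, total_time)
flights_times
```

Since only two of the original columns are kept, `transmute()` would be an alternative, at the cost of having to list `dep_time` and `arr_time` explicitly inside it.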

We can even plot the data by adding a `ggplot()` call at the end of the pipeline, after manipulating the data.
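As an illustration, the sketch below ends the pipeline with a histogram of `total_time`. The choice of a histogram, the bin width, and the axis label are assumptions, as are the `flights` tibble from `nycflights13` and the implementation of `hourmin2min`. Note that layers are added to a plot with `+`, not with `%>%`.

```{r}
library(dplyr)
library(ggplot2)
library(nycflights13)  # assumption: the data are the `flights` tibble from this package

# Assumed implementation: hours times 60, plus the remaining minutes
hourmin2min <- function(hourmin) 60 * (hourmin %/% 100) + hourmin %% 100

p <- flights %>%
  mutate(total_time = hourmin2min(arr_time) - hourmin2min(dep_time)) %>%
  filter(!is.na(total_time)) %>%        # drop flights with missing times
  ggplot(aes(x = total_time)) +         # the piped tibble becomes the plot's data
  geom_histogram(binwidth = 30) +       # 30-minute bins (an arbitrary choice)
  labs(x = "Arrival minus departure (minutes)")
p
```

Flights that land after midnight have an arrival clock time earlier than their departure clock time, so some values of `total_time` are negative.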

Let us create a table with the mean delay time for each month, then plot it as a bar chart with one bar per month.
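A possible solution is sketched below. It assumes the `flights` tibble from `nycflights13` and takes "delay" to mean the departure delay `dep_delay`; using `arr_delay` instead would work the same way.

```{r}
library(dplyr)
library(ggplot2)
library(nycflights13)  # assumption: the data are the `flights` tibble from this package

# One row per month; na.rm = TRUE ignores flights with a missing delay
monthly_delay <- flights %>%
  group_by(month) %>%
  summarize(mean_delay = mean(dep_delay, na.rm = TRUE))
monthly_delay

# geom_col() draws bars whose heights are the values already in the table
delay_plot <- monthly_delay %>%
  ggplot(aes(x = factor(month), y = mean_delay)) +
  geom_col() +
  labs(x = "Month", y = "Mean departure delay (minutes)")
delay_plot
```

Here `geom_col()` is used rather than `geom_bar()` because the bar heights are precomputed means, not counts of rows.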