# Pipe Operator

The pipe operator __%>%__ is very useful when working with the dplyr library. It lets us chain together multiple operations or functions on a data set.

Let's look at the motivation for using the pipe operator.

First, load the __dplyr__ library and assign the built-in data frame mtcars to the variable df.

In [1]:
library(dplyr)
df <- mtcars

"package 'dplyr' was built under R version 3.6.3"
Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union



Suppose we want to apply several operations to our data, one after another. Without the pipe operator, we need to nest the function calls.

Let's start by filtering our data frame for cars that get more than 20 MPG:

In [2]:
filter(df,mpg>20)

mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4
21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
21.4,6,258.0,110,3.08,3.215,19.44,1,0,3,1
24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2
22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
32.4,4,78.7,66,4.08,2.2,19.47,1,1,4,1
30.4,4,75.7,52,4.93,1.615,18.52,1,1,4,2
33.9,4,71.1,65,4.22,1.835,19.9,1,1,4,1
21.5,4,120.1,97,3.7,2.465,20.01,1,0,3,1


Next, we want a random sample of five cars from those with more than 20 MPG. We can use __sample_n()__ for that by wrapping it around the __filter()__ call:

In [3]:
sample_n(filter(df,mpg>20),size=5)

mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4
27.3,4,79.0,66,4.08,1.935,18.9,1,1,4,1
24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2
30.4,4,95.1,113,3.77,1.513,16.9,1,1,5,2
22.8,4,108.0,93,3.85,2.32,18.61,1,1,4,1
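
Because __sample_n()__ samples at random, rerunning the cell above will generally return different rows. Below is a minimal sketch of one way to make the draw repeatable, assuming base R's set.seed() is called right before sampling. The seed value 42 and the names first_draw and second_draw are arbitrary choices for illustration and are not part of the original walkthrough.

```r
library(dplyr)

df <- mtcars

# Fix the random number generator state so the sample is repeatable.
# The seed value 42 is arbitrary; any integer works.
set.seed(42)
first_draw <- sample_n(filter(df, mpg > 20), size = 5)

# Resetting the same seed before sampling again reproduces the same rows.
set.seed(42)
second_draw <- sample_n(filter(df, mpg > 20), size = 5)

# Compare the two samples.
identical(first_draw, second_draw)
```

Without the set.seed() calls, the two draws would usually differ, which is why the outputs in this walkthrough do not show the same cars from one cell to the next.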


Now let's arrange our random sample from highest MPG to lowest. We can use the __arrange()__ function for that, together with the __desc()__ function to indicate that we want the values in descending order. Because __sample_n()__ draws a new random sample on every call, the rows differ from the previous output.

In [4]:
arrange(sample_n(filter(df,mpg>20),size=5),desc(mpg))

mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
27.3,4,79.0,66,4.08,1.935,18.9,1,1,4,1
24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2
21.5,4,120.1,97,3.7,2.465,20.01,1,0,3,1
21.4,4,121.0,109,4.11,2.78,18.6,1,1,4,2
21.0,6,160.0,110,3.9,2.875,17.02,0,1,4,4


That's our data selection. With so many nested functions it is hard to read, though, because the calls have to be read from the inside out. We can use multiple assignments to make it more readable:

In [5]:
# Multiple Assignments

a <- filter(df,mpg>20)
b <- sample_n(a,size = 5)
result <- arrange(b,desc(mpg))

print(result)

   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
2 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
3 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
4 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
5 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4


This is much more readable, but we created two intermediate variables, a and b, that we don't need afterwards. They clutter the workspace and stay in memory until they are removed. This is where the pipe operator comes in handy.

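The pipe operator, __%>%__, chains functions in a more readable format. It passes the result on its left as the first argument of the function on its right, so the steps read from left to right in the order they run.

 - newdf <- data %>% operation1() %>% operation2() %>% operation3()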
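
To make the pattern above concrete, here is a small sketch comparing nested calls with their piped equivalents. It leaves out the random sampling step so that both forms are deterministic and can be compared directly. The variable names nested, piped, nested_sorted and piped_sorted are made up for this illustration.

```r
library(dplyr)

df <- mtcars

# Nested call: the data frame is the first argument of filter().
nested <- filter(df, mpg > 20)

# Piped call: %>% supplies df as the first argument of filter().
piped <- df %>% filter(mpg > 20)

# Both forms should produce the same data frame.
identical(nested, piped)

# The same holds for a longer chain: read the nested version inside-out,
# and the piped version left to right.
nested_sorted <- arrange(filter(df, mpg > 20), desc(mpg))
piped_sorted <- df %>% filter(mpg > 20) %>% arrange(desc(mpg))

identical(nested_sorted, piped_sorted)
```

The piped version does the same work as the nested one. The difference is only in how the code reads.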

In [6]:
# Pipe Operator

result <- df %>%
  filter(mpg > 20) %>%
  sample_n(size = 5) %>%
  arrange(desc(mpg))

result

mpg,cyl,disp,hp,drat,wt,qsec,vs,am,gear,carb
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
33.9,4,71.1,65,4.22,1.835,19.9,1,1,4,1
26.0,4,120.3,91,4.43,2.14,16.7,0,1,5,2
24.4,4,146.7,62,3.69,3.19,20.0,1,0,4,2
21.4,4,121.0,109,4.11,2.78,18.6,1,1,4,2
21.0,6,160.0,110,3.9,2.62,16.46,0,1,4,4


And that's the pipe operator! Pretty easy!