# Data Wrangling

<h5>

**Wrangling** /ˈræŋ.ɡəl.ɪŋ/

the activity of taking care of, controlling, or moving animals, especially large animals such as cows or horses

</h5>

([Cambridge Dictionary](https://dictionary.cambridge.org/dictionary/english/wrangling))

![Cattle Wrangler - image from https://commons.wikimedia.org/wiki/File:Pioneer_Day_Wrangler.jpg](https://upload.wikimedia.org/wikipedia/commons/thumb/8/83/Pioneer_Day_Wrangler.jpg/320px-Pioneer_Day_Wrangler.jpg)

**[Data wrangling](https://en.wikipedia.org/wiki/Data_wrangling)** commonly refers to the transformation of data from one "input" format (e.g., `.csv` files from an experiment), to a different format (e.g., a tidy dataframe) that is more appropriate to the needs of an analysis. In the context of the ExPra experiments, you will use data wrangling techniques to implement the transformations and data cleaning steps specified in your preregistrations.

## Setup

### Setup Part 1: Install Packages

We will use [`tidyverse`](https://www.tidyverse.org/) packages to implement our data wrangling. The "tidyverse" is a collection of packages which share a philosophy based around code and data structures that are (a) tidy, and (b) readable. These packages are frequently based around tidy dataframes, known as "[tibbles](https://tibble.tidyverse.org/)". You can install the tidyverse packages like so:

```
install.packages("tidyverse")
```

This includes many packages that we won't be using today, but which will be useful in other parts of the course (e.g., on Data Visualisation).

Remember, you should install packages in the console, never in a script that you share with others. Otherwise, your script will go to the effort of reinstalling the package *every time* it is run!
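If a shared script needs to make sure its packages are available, one option is to check for them without installing anything. The following is a minimal sketch of that idea, not part of the course materials; the vector name `needed` and the wording of the error message are made up for illustration.

```r
# Packages this script relies on (illustrative list).
needed <- c("dplyr", "tidyr")

# requireNamespace() reports TRUE/FALSE without installing anything.
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

# Stop early with a helpful message rather than installing from the script.
if (!all(is_installed)) {
  stop(
    "Please install these packages from the console first: ",
    paste(needed[!is_installed], collapse = ", ")
  )
}
```

This keeps the script free of `install.packages()` calls while still telling collaborators what they are missing.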

### Setup Part 2: Check the Packages Load

Now, we can test that the packages we will be using today actually load. You should be able to run this code without any errors:

In [1]:
options(repr.plot.width=3.5, repr.plot.height=3, repr.matrix.max.rows=10)

In [2]:
library(dplyr)
library(tidyr)


Attaching package: 'dplyr'


The following objects are masked from 'package:stats':

    filter, lag


The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union




The [`dplyr`](https://dplyr.tidyverse.org/) package has useful functions for manipulating tibbles (dataframes), such as sorting, filtering, and editing columns. The [`tidyr`](https://tidyr.tidyverse.org/) package has functions that help us tidy or reformat dataframes.
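The start-up messages printed when loading `dplyr` mention that `filter()` and `lag()` are "masked" from the `stats` package. As a rough sketch of what that means in practice: after `library(dplyr)`, a bare `filter()` call uses the `dplyr` version, while the `stats` version remains reachable through the `::` prefix. The vector `x` below is made up for illustration.

```r
library(dplyr)

# After library(dplyr), a bare filter() refers to dplyr's version.
luke <- filter(starwars, name == "Luke Skywalker")

# The package-qualified form always calls dplyr's filter().
luke_explicit <- dplyr::filter(starwars, name == "Luke Skywalker")

# The masked stats version is still available with its own prefix.
# Here it applies a 2-point moving average to a made-up vector.
x <- c(1, 2, 3, 4)
moving_avg <- stats::filter(x, rep(1 / 2, 2))

print(identical(luke, luke_explicit))
print(moving_avg)
```

Writing `dplyr::filter()` explicitly can be a useful habit in longer scripts where several loaded packages define functions with the same name.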

<img src="https://www.tidyverse.org/css/images/hex/dplyr.png" width=138>
<img src="https://www.tidyverse.org/css/images/hex/tidyr.png" width=138>

### Setup Part 3: Check the Data Loads

Finally, check that you can access the dataset we'll be using in this session. The `starwars` dataset, which is bundled with the `dplyr` package, contains details of characters from the Star Wars films:

In [3]:
print(starwars)

    # A tibble: 87 x 14
       name     height  mass hair_color skin_color eye_color birth_year sex   gender
       <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
     1 Luke Sk~    172    77 blond      fair       blue            19   male  mascu~
     2 C-3PO       167    75 NA         gold       yellow         112   none  mascu~
     3 R2-D2        96    32 NA         white, bl~ red             33   none  mascu~
     4 Darth V~    202   136 none       white      yellow          41.9 male  mascu~
     5 Leia Or~    150    49 brown      light      brown           19   fema~ femin~
     6 Owen La~    178   120 brown, gr~ light      blue            52   male  mascu~
     7 Beru Wh~    165    75 brown      light      blue          

This snapshot shows an example of tidy data: a way of organising data such that each observation (here, each *character*) has its own row, and each variable describing that observation has its own column.
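To make the idea of tidy data more concrete, here is a small sketch of reshaping a "wide" table into a tidy one with `tidyr::pivot_longer()`. The tibble `wide`, its column names, and its values are entirely made up for illustration; they are not from the course data.

```r
library(dplyr)
library(tidyr)

# Hypothetical "wide" data: one row per participant, one column per condition.
wide <- tibble(
  participant    = c("p01", "p02"),
  rt_congruent   = c(512, 498),
  rt_incongruent = c(587, 560)
)

# Reshape so that each row is one observation (participant x condition).
long <- pivot_longer(
  wide,
  cols         = c(rt_congruent, rt_incongruent),
  names_to     = "condition",
  names_prefix = "rt_",
  values_to    = "rt"
)

print(long)
```

In the reshaped version, `condition` and `rt` are each a single column, so the `dplyr` verbs introduced in this session can sort or filter on them directly.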

Now that we're all set up, let's start a-wrangling...

## `arrange()`

We can use the `arrange()` function to sort by variables in the dataframe. For example, we can arrange all characters in order of height (shortest to tallest) like so:

In [4]:
arrange(starwars, height)

name,height,mass,hair_color,skin_color,eye_color,birth_year,sex,gender,homeworld,species,films,vehicles,starships
<chr>,<int>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<list>,<list>,<list>
Yoda,66,17,white,green,brown,896,male,masculine,,Yoda's species,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi",,
Ratts Tyerell,79,15,none,"grey, blue",unknown,,male,masculine,Aleen Minor,Aleena,The Phantom Menace,,
Wicket Systri Warrick,88,20,brown,brown,brown,8,male,masculine,Endor,Ewok,Return of the Jedi,,
Dud Bolt,94,45,none,"blue, grey",yellow,,male,masculine,Vulpter,Vulptereen,The Phantom Menace,,
R2-D2,96,32,,"white, blue",red,33,none,masculine,Naboo,Droid,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens",,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
Finn,,,black,dark,dark,,male,masculine,,Human,The Force Awakens,,
Rey,,,brown,light,hazel,,female,feminine,,Human,The Force Awakens,,
Poe Dameron,,,brown,light,brown,,male,masculine,,Human,The Force Awakens,,T-70 X-wing fighter
BB8,,,none,none,black,,none,masculine,,Droid,The Force Awakens,,


In [5]:
arrange(starwars, height) |> print()

    # A tibble: 87 x 14
       name     height  mass hair_color skin_color eye_color birth_year sex   gender
       <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
     1 Yoda         66    17 white      green      brown            896 male  mascu~
     2 Ratts T~     79    15 none       grey, blue unknown           NA male  mascu~
     3 Wicket ~     88    20 brown      brown      brown              8 male  mascu~
     4 Dud Bolt     94    45 none       blue, grey yellow            NA male  mascu~
     5 R2-D2        96    32 NA         white, bl~ red               33 none  mascu~
     6 R4-P17       96    NA none       silver, r~ red, blue         NA none  femin~
     7 R5-D4        97    32 N

The `arrange()` function, like most `dplyr` verb functions, takes the dataframe (`starwars`) as its first argument, and the variable names (e.g., `height`) as subsequent arguments.
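Because the dataframe is always the first argument, `dplyr` verbs work naturally with the pipe (`|>`), which passes whatever is on its left as the first argument of the function on its right. The sketch below shows that the two styles give the same result; the object names are chosen only for illustration.

```r
library(dplyr)

# Direct call: the dataframe is the first argument.
by_height_direct <- arrange(starwars, height)

# Piped call: |> passes starwars in as the first argument of arrange().
by_height_piped <- starwars |> arrange(height)

print(identical(by_height_direct, by_height_piped))
print(head(by_height_piped$name, 3))
```

The piped style becomes more useful once several verbs are chained one after another.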

In height order, we can see that Yoda is the shortest character, at 66 cm, with podracer [Ratts Tyerell](https://starwars.fandom.com/wiki/Ratts_Tyerell) next shortest, at 79 cm.

We can also arrange the dataframe in *descending* order, with the *`desc()`* function.

In [6]:
arrange(starwars, desc(height))

name,height,mass,hair_color,skin_color,eye_color,birth_year,sex,gender,homeworld,species,films,vehicles,starships
<chr>,<int>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<list>,<list>,<list>
Yarael Poof,264,,none,white,yellow,,male,masculine,Quermia,Quermian,The Phantom Menace,,
Tarfful,234,136,brown,brown,blue,,male,masculine,Kashyyyk,Wookiee,Revenge of the Sith,,
Lama Su,229,88,none,grey,black,,male,masculine,Kamino,Kaminoan,Attack of the Clones,,
Chewbacca,228,112,brown,unknown,blue,200,male,masculine,Kashyyyk,Wookiee,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens",AT-ST,"Millennium Falcon, Imperial shuttle"
Roos Tarpals,224,82,none,grey,orange,,male,masculine,Naboo,Gungan,The Phantom Menace,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
Finn,,,black,dark,dark,,male,masculine,,Human,The Force Awakens,,
Rey,,,brown,light,hazel,,female,feminine,,Human,The Force Awakens,,
Poe Dameron,,,brown,light,brown,,male,masculine,,Human,The Force Awakens,,T-70 X-wing fighter
BB8,,,none,none,black,,none,masculine,,Droid,The Force Awakens,,


In [7]:
arrange(starwars, desc(height)) |> print()

    # A tibble: 87 x 14
       name     height  mass hair_color skin_color eye_color birth_year sex   gender
       <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
     1 Yarael ~    264    NA none       white      yellow          NA   male  mascu~
     2 Tarfful     234   136 brown      brown      blue            NA   male  mascu~
     3 Lama Su     229    88 none       grey       black           NA   male  mascu~
     4 Chewbac~    228   112 brown      unknown    blue           200   male  mascu~
     5 Roos Ta~    224    82 none       grey       orange          NA   male  mascu~
     6 Grievous    216   159 none       brown, wh~ green, y~       NA   male  mascu~
     7 Taun We     213   

This shows us that long-necked Jedi, [Yarael Poof](https://starwars.fandom.com/wiki/Yarael_Poof), is the tallest character, at 264 cm.
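Notice that characters with no recorded height (such as Finn and Rey) appear at the bottom in both the ascending and the descending output: `arrange()` places missing values (`NA`) last, even when the variable is wrapped in `desc()`. The following sketch checks this; the object names are illustrative.

```r
library(dplyr)

ascending  <- arrange(starwars, height)
descending <- arrange(starwars, desc(height))

# Count the characters with no recorded height.
n_missing <- sum(is.na(starwars$height))

# In both orderings, the last n_missing rows are the ones with NA height.
print(all(is.na(tail(ascending$height, n_missing))))
print(all(is.na(tail(descending$height, n_missing))))
```

This matters when you read off "the first few rows" of a sorted dataframe: missing values will not appear at the top in either direction.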

As well as sorting by numeric variables, we can also sort by character variables. For instance, if we sort by hair colour, the rows are ordered alphabetically, from "auburn" to "white".

In [8]:
arrange(starwars, hair_color)

name,height,mass,hair_color,skin_color,eye_color,birth_year,sex,gender,homeworld,species,films,vehicles,starships
<chr>,<int>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<list>,<list>,<list>
Mon Mothma,150,,auburn,fair,blue,48.0,female,feminine,Chandrila,Human,Return of the Jedi,,
Wilhuff Tarkin,180,,"auburn, grey",fair,blue,64.0,male,masculine,Eriadu,Human,"Revenge of the Sith, A New Hope",,
Obi-Wan Kenobi,182,77.0,"auburn, white",fair,blue-gray,57.0,male,masculine,Stewjon,Human,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope",Tribubble bongo,"Jedi starfighter , Trade Federation cruiser, Naboo star skiff , Jedi Interceptor , Belbullab-22 starfighter"
Biggs Darklighter,183,84.0,black,light,brown,24.0,male,masculine,Tatooine,Human,A New Hope,,X-wing
Boba Fett,183,78.2,black,fair,brown,31.5,male,masculine,Kamino,Human,"The Empire Strikes Back, Attack of the Clones , Return of the Jedi",,Slave 1
...,...,...,...,...,...,...,...,...,...,...,...,...,...
C-3PO,167,75,,gold,yellow,112,none,masculine,Tatooine,Droid,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope",,
R2-D2,96,32,,"white, blue",red,33,none,masculine,Naboo,Droid,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens",,
R5-D4,97,32,,"white, red",red,,none,masculine,Tatooine,Droid,A New Hope,,
Greedo,173,74,,green,black,44,male,masculine,Rodia,Rodian,A New Hope,,


In [9]:
arrange(starwars, hair_color) |> print()

    # A tibble: 87 x 14
       name     height  mass hair_color skin_color eye_color birth_year sex   gender
       <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
     1 Mon Mot~    150  NA   auburn     fair       blue            48   fema~ femin~
     2 Wilhuff~    180  NA   auburn, g~ fair       blue            64   male  mascu~
     3 Obi-Wan~    182  77   auburn, w~ fair       blue-gray       57   male  mascu~
     4 Biggs D~    183  84   black      light      brown           24   male  mascu~
     5 Boba Fe~    183  78.2 black      fair       brown           31.5 male  mascu~
     6 Lando C~    177  79   black      dark       brown           31   male  mascu~
     7 Watto       137  NA   black      blue, grey yell

Finally, we can sort by multiple variables at once by passing them as comma-separated arguments. For instance, we may want to sort by hair colour alphabetically, and then by height in descending order:

In [10]:
arrange(starwars, hair_color, desc(height))

name,height,mass,hair_color,skin_color,eye_color,birth_year,sex,gender,homeworld,species,films,vehicles,starships
<chr>,<int>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<list>,<list>,<list>
Mon Mothma,150,,auburn,fair,blue,48,female,feminine,Chandrila,Human,Return of the Jedi,,
Wilhuff Tarkin,180,,"auburn, grey",fair,blue,64,male,masculine,Eriadu,Human,"Revenge of the Sith, A New Hope",,
Obi-Wan Kenobi,182,77,"auburn, white",fair,blue-gray,57,male,masculine,Stewjon,Human,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope",Tribubble bongo,"Jedi starfighter , Trade Federation cruiser, Naboo star skiff , Jedi Interceptor , Belbullab-22 starfighter"
Bail Prestor Organa,191,,black,tan,brown,67,male,masculine,Alderaan,Human,"Attack of the Clones, Revenge of the Sith",,
Gregar Typho,185,85,black,dark,brown,,male,masculine,Naboo,Human,Attack of the Clones,,Naboo fighter
...,...,...,...,...,...,...,...,...,...,...,...,...,...
Jabba Desilijic Tiure,175,1358,,"green-tan, brown",orange,600,hermaphroditic,masculine,Nal Hutta,Hutt,"The Phantom Menace, Return of the Jedi, A New Hope",,
Greedo,173,74,,green,black,44,male,masculine,Rodia,Rodian,A New Hope,,
C-3PO,167,75,,gold,yellow,112,none,masculine,Tatooine,Droid,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope",,
R5-D4,97,32,,"white, red",red,,none,masculine,Tatooine,Droid,A New Hope,,
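When sorting by several variables, later variables only break ties in the earlier ones. This is easier to see on a tiny example than on the full dataset, so here is a sketch using a made-up tibble (`pets` and its values are hypothetical):

```r
library(dplyr)

# A small made-up tibble with ties in the first sort key.
pets <- tibble(
  name   = c("Rex", "Tom", "Fido", "Felix"),
  kind   = c("dog", "cat", "dog", "cat"),
  weight = c(30, 4, 12, 5)
)

# Sort by kind (A-Z); within each kind, heaviest first.
sorted <- arrange(pets, kind, desc(weight))
print(sorted)
```

Both cats come before both dogs, and `weight` only decides the order within each `kind`.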


In [11]:
arrange(starwars, hair_color, desc(height)) |> print()

    # A tibble: 87 x 14
       name     height  mass hair_color skin_color eye_color birth_year sex   gender
       <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
     1 Mon Mot~    150  NA   auburn     fair       blue            48   fema~ femin~
     2 Wilhuff~    180  NA   auburn, g~ fair       blue            64   male  mascu~
     3 Obi-Wan~    182  77   auburn, w~ fair       blue-gray       57   male  mascu~
     4 Bail Pr~    191  NA   black      tan        brown           67   male  mascu~
     5 Gregar ~    185  85   black      dark       brown           NA   male  mascu~
     6 Biggs D~    183  84   black      light      brown           24   male  mascu~
     7 Boba Fe~    183  78.2 black      fair 

### Check your Knowledge!

Try coming up with code to solve the following:

1. Sort by mass in descending order, such that the most massive character comes first.

2. Sort by gender, eye colour, and then height. Which character is the first observation? Why?

## `filter()`

The `filter()` function is useful for subsetting data. For example, we can easily filter our dataset to find all our data on Darth Vader. The following code says that we should *filter* the dataframe called *starwars* to only include rows where the variable called `name` has the value `"Darth Vader"`:

In [12]:
filter(starwars, name=="Darth Vader")

name,height,mass,hair_color,skin_color,eye_color,birth_year,sex,gender,homeworld,species,films,vehicles,starships
<chr>,<int>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<list>,<list>,<list>
Darth Vader,202,136,none,white,yellow,41.9,male,masculine,Tatooine,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope",,TIE Advanced x1


In [13]:
filter(starwars, name=="Darth Vader") |> print()

    # A tibble: 1 x 14
      name      height  mass hair_color skin_color eye_color birth_year sex   gender
      <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
    1 Darth Va~    202   136 none       white      yellow          41.9 male  mascu~
    # i 5 more variables: homeworld <chr>, species <chr>, films <list>,
    #   vehicles <list>, starships <list>


<img src="https://media1.giphy.com/media/Uu4WP50jNo1uZeor4t/giphy.gif" width=250>

<sub><sup>[via giphy](https://media.giphy.com/media/Uu4WP50jNo1uZeor4t/giphy.gif)</sup></sub>

To find all characters that are *not* Darth Vader, we can use `!=`, which stands for "does not equal." This returns all rows in the dataframe where the `name` column does not have the value `"Darth Vader"`.

In [14]:
filter(starwars, name!="Darth Vader")

name,height,mass,hair_color,skin_color,eye_color,birth_year,sex,gender,homeworld,species,films,vehicles,starships
<chr>,<int>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<list>,<list>,<list>
Luke Skywalker,172,77,blond,fair,blue,19,male,masculine,Tatooine,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens","Snowspeeder , Imperial Speeder Bike","X-wing , Imperial shuttle"
C-3PO,167,75,,gold,yellow,112,none,masculine,Tatooine,Droid,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope",,
R2-D2,96,32,,"white, blue",red,33,none,masculine,Naboo,Droid,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens",,
Leia Organa,150,49,brown,light,brown,19,female,feminine,Alderaan,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens",Imperial Speeder Bike,
Owen Lars,178,120,"brown, grey",light,blue,52,male,masculine,Tatooine,Human,"Attack of the Clones, Revenge of the Sith , A New Hope",,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
Rey,,,brown,light,hazel,,female,feminine,,Human,The Force Awakens,,
Poe Dameron,,,brown,light,brown,,male,masculine,,Human,The Force Awakens,,T-70 X-wing fighter
BB8,,,none,none,black,,none,masculine,,Droid,The Force Awakens,,
Captain Phasma,,,unknown,unknown,unknown,,,,,,The Force Awakens,,


In [15]:
filter(starwars, name!="Darth Vader") |> print()

    # A tibble: 86 x 14
       name     height  mass hair_color skin_color eye_color birth_year sex   gender
       <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
     1 Luke Sk~    172    77 blond      fair       blue            19   male  mascu~
     2 C-3PO       167    75 NA         gold       yellow         112   none  mascu~
     3 R2-D2        96    32 NA         white, bl~ red             33   none  mascu~
     4 Leia Or~    150    49 brown      light      brown           19   fema~ femin~
     5 Owen La~    178   120 brown, gr~ light      blue            52   male  mascu~
     6 Beru Wh~    165    75 brown      light      blue            47   fema~ femin~
     7 R5-D4        97    32 NA         white, red red 

We can also filter to include a list of characters. To do this, we first define a character vector of characters we wish to keep. We can then filter to only include characters whose `name` is in (`%in%`) that vector:

In [16]:
cool_droids <- c("C-3PO", "R2-D2", "IG-88")
filter(starwars, name %in% cool_droids)

name,height,mass,hair_color,skin_color,eye_color,birth_year,sex,gender,homeworld,species,films,vehicles,starships
<chr>,<int>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<list>,<list>,<list>
C-3PO,167,75,,gold,yellow,112,none,masculine,Tatooine,Droid,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope",,
R2-D2,96,32,,"white, blue",red,33,none,masculine,Naboo,Droid,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens",,
IG-88,200,140,none,metal,red,15,none,masculine,,Droid,The Empire Strikes Back,,


In [17]:
filter(starwars, name %in% cool_droids) |> print()

    # A tibble: 3 x 14
      name  height  mass hair_color skin_color  eye_color birth_year sex   gender   
      <chr>  <int> <dbl> <chr>      <chr>       <chr>          <dbl> <chr> <chr>    
    1 C-3PO    167    75 NA         gold        yellow           112 none  masculine
    2 R2-D2     96    32 NA         white, blue red               33 none  masculine
    3 IG-88    200   140 none       metal       red               15 none  masculine
    # i 5 more variables: homeworld <chr>, species <chr>, films <list>,
    #   vehicles <list>, starships <list>


As with `==` and `!=`, we can invert `%in%` to only include characters who are *not* in the list. To do this, we put an exclamation mark (`!`) at the *start* of the statement:

In [18]:
filter(starwars, !(name %in% cool_droids))

name,height,mass,hair_color,skin_color,eye_color,birth_year,sex,gender,homeworld,species,films,vehicles,starships
<chr>,<int>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<list>,<list>,<list>
Luke Skywalker,172,77,blond,fair,blue,19.0,male,masculine,Tatooine,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens","Snowspeeder , Imperial Speeder Bike","X-wing , Imperial shuttle"
Darth Vader,202,136,none,white,yellow,41.9,male,masculine,Tatooine,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope",,TIE Advanced x1
Leia Organa,150,49,brown,light,brown,19.0,female,feminine,Alderaan,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens",Imperial Speeder Bike,
Owen Lars,178,120,"brown, grey",light,blue,52.0,male,masculine,Tatooine,Human,"Attack of the Clones, Revenge of the Sith , A New Hope",,
Beru Whitesun lars,165,75,brown,light,blue,47.0,female,feminine,Tatooine,Human,"Attack of the Clones, Revenge of the Sith , A New Hope",,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
Rey,,,brown,light,hazel,,female,feminine,,Human,The Force Awakens,,
Poe Dameron,,,brown,light,brown,,male,masculine,,Human,The Force Awakens,,T-70 X-wing fighter
BB8,,,none,none,black,,none,masculine,,Droid,The Force Awakens,,
Captain Phasma,,,unknown,unknown,unknown,,,,,,The Force Awakens,,
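To see why the `!` goes in front of the whole expression, it can help to look at what `%in%` returns on its own: one `TRUE` or `FALSE` per element on its left-hand side, which `!` then flips. The sketch below uses a made-up vector `some_names` for illustration.

```r
library(dplyr)

cool_droids <- c("C-3PO", "R2-D2", "IG-88")
some_names  <- c("R2-D2", "Yoda", "IG-88")

# %in% returns one TRUE/FALSE per element on its left-hand side.
print(some_names %in% cool_droids)

# Prefixing the expression with ! flips each value.
print(!(some_names %in% cool_droids))

# The kept and dropped rows together account for every row in starwars.
kept    <- filter(starwars, name %in% cool_droids)
dropped <- filter(starwars, !(name %in% cool_droids))
print(nrow(kept) + nrow(dropped) == nrow(starwars))
```

The parentheses around `name %in% cool_droids` are not strictly required, but they make it clear that the negation applies to the result of `%in%`.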


In [19]:
filter(starwars, !(name %in% cool_droids)) |> print()

    # A tibble: 84 x 14
       name     height  mass hair_color skin_color eye_color birth_year sex   gender
       <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
     1 Luke Sk~    172    77 blond      fair       blue            19   male  mascu~
     2 Darth V~    202   136 none       white      yellow          41.9 male  mascu~
     3 Leia Or~    150    49 brown      light      brown           19   fema~ femin~
     4 Owen La~    178   120 brown, gr~ light      blue            52   male  mascu~
     5 Beru Wh~    165    75 brown      light      blue            47   fema~ femin~
     6 R5-D4        97    32 NA         white, red red             NA   none  mascu~
     7 Biggs D~    183    84 black      light      brown         

The `filter()` function can also deal with numeric values. For instance, we can filter to only include characters who are shorter than, or are exactly, 96 cm tall. To do this we use `<=`, which stands for "less than or equal to", or "≤".

In [20]:
filter(starwars, height<=96)

name,height,mass,hair_color,skin_color,eye_color,birth_year,sex,gender,homeworld,species,films,vehicles,starships
<chr>,<int>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<list>,<list>,<list>
R2-D2,96,32.0,,"white, blue",red,33.0,none,masculine,Naboo,Droid,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens",,
Yoda,66,17.0,white,green,brown,896.0,male,masculine,,Yoda's species,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi",,
Wicket Systri Warrick,88,20.0,brown,brown,brown,8.0,male,masculine,Endor,Ewok,Return of the Jedi,,
Dud Bolt,94,45.0,none,"blue, grey",yellow,,male,masculine,Vulpter,Vulptereen,The Phantom Menace,,
Ratts Tyerell,79,15.0,none,"grey, blue",unknown,,male,masculine,Aleen Minor,Aleena,The Phantom Menace,,
R4-P17,96,,none,"silver, red","red, blue",,none,feminine,,Droid,"Attack of the Clones, Revenge of the Sith",,
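One thing to keep in mind with numeric comparisons is missing data. A comparison with `NA` evaluates to `NA`, and `filter()` only keeps rows where the condition is `TRUE`, so characters with no recorded height are dropped by both `height <= 96` and `height > 96`. The sketch below illustrates this; the object names are chosen for illustration.

```r
library(dplyr)

short   <- filter(starwars, height <= 96)
tall    <- filter(starwars, height > 96)
unknown <- filter(starwars, is.na(height))

# Rows with NA height match neither comparison, so they need is.na().
print(c(short = nrow(short), tall = nrow(tall), unknown = nrow(unknown)))

# Only with the NA rows included do the three groups cover the whole dataset.
print(nrow(short) + nrow(tall) + nrow(unknown) == nrow(starwars))
```

If your preregistration specifies an exclusion rule based on a numeric cutoff, it is worth checking how many rows are lost to missing values rather than to the cutoff itself.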


In [21]:
filter(starwars, height<=96) |> print()

    # A tibble: 6 x 14
      name      height  mass hair_color skin_color eye_color birth_year sex   gender
      <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
    1 R2-D2         96    32 NA         white, bl~ red               33 none  mascu~
    2 Yoda          66    17 white      green      brown            896 male  mascu~
    3 Wicket S~     88    20 brown      brown      brown              8 male  mascu~
    4 Dud Bolt      94    45 none       blue, grey yellow            NA male  mascu~
    5 Ratts Ty~     79    15 none       grey, blue unknown           NA male  mascu~
    6 R4-P17        96    NA none       silver, r~ red, blue         NA none  femin~
    # i 5 more variables: homeworld <chr>

Finally, as in other `dplyr` functions, we can combine multiple comma-separated statements in one use of the `filter()` function. A row is kept only if *all* of the conditions are true. Here we filter to only include characters who:
* come from Tatooine
* are at least 100 cm tall
* are human

In [22]:
filter(starwars, homeworld=="Tatooine", height>=100, species=="Human")

name,height,mass,hair_color,skin_color,eye_color,birth_year,sex,gender,homeworld,species,films,vehicles,starships
<chr>,<int>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<list>,<list>,<list>
Luke Skywalker,172,77.0,blond,fair,blue,19.0,male,masculine,Tatooine,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens","Snowspeeder , Imperial Speeder Bike","X-wing , Imperial shuttle"
Darth Vader,202,136.0,none,white,yellow,41.9,male,masculine,Tatooine,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope",,TIE Advanced x1
Owen Lars,178,120.0,"brown, grey",light,blue,52.0,male,masculine,Tatooine,Human,"Attack of the Clones, Revenge of the Sith , A New Hope",,
Beru Whitesun lars,165,75.0,brown,light,blue,47.0,female,feminine,Tatooine,Human,"Attack of the Clones, Revenge of the Sith , A New Hope",,
Biggs Darklighter,183,84.0,black,light,brown,24.0,male,masculine,Tatooine,Human,A New Hope,,X-wing
Anakin Skywalker,188,84.0,blond,fair,blue,41.9,male,masculine,Tatooine,Human,"Attack of the Clones, The Phantom Menace , Revenge of the Sith","Zephyr-G swoop bike, XJ-6 airspeeder","Trade Federation cruiser, Jedi Interceptor , Naboo fighter"
Shmi Skywalker,163,,black,fair,brown,72.0,female,feminine,Tatooine,Human,"Attack of the Clones, The Phantom Menace",,
Cliegg Lars,183,,brown,fair,blue,82.0,male,masculine,Tatooine,Human,Attack of the Clones,,
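Comma-separated conditions in `filter()` behave like a logical AND, so the same filter can be written with `&`. To keep rows that match *at least one* condition instead, the conditions can be joined with `|` (logical OR). The following sketch compares the forms; the object names, and the OR example itself, are illustrative additions.

```r
library(dplyr)

# Comma-separated conditions must all be TRUE for a row to be kept...
with_commas <- filter(starwars, homeworld == "Tatooine", height >= 100, species == "Human")

# ...which is the same as joining them with & (logical AND).
with_and <- filter(starwars, homeworld == "Tatooine" & height >= 100 & species == "Human")

# Use | (logical OR) to keep rows matching at least one condition.
either <- filter(starwars, homeworld == "Tatooine" | species == "Droid")

print(identical(with_commas, with_and))
print(nrow(either))
```

The comma form is usually easier to read when every condition must hold; `|` has to be written out explicitly because there is no comma equivalent for OR.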


In [23]:
filter(starwars, homeworld=="Tatooine", height>=100, species=="Human") |> print()

    # A tibble: 8 x 14
      name      height  mass hair_color skin_color eye_color birth_year sex   gender
      <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
    1 Luke Sky~    172    77 blond      fair       blue            19   male  mascu~
    2 Darth Va~    202   136 none       white      yellow          41.9 male  mascu~
    3 Owen Lars    178   120 brown, gr~ light      blue            52   male  mascu~
    4 Beru Whi~    165    75 brown      light      blue            47   fema~ femin~
    5 Biggs Da~    183    84 black      light      brown           24   male  mascu~
    6 Anakin S~    188    84 blond      fair       blue            41.9 male  mascu~
    7 Shmi Sky~    163    NA black      fair       brown           72   fema