# Data Wrangling 1

<h5>

**Wrangling** /ˈræŋ.ɡəl.ɪŋ/

the activity of taking care of, controlling, or moving animals, especially large animals such as cows or horses

</h5>

([Cambridge Dictionary](https://dictionary.cambridge.org/dictionary/english/wrangling))

![Cattle Wrangler - image from https://commons.wikimedia.org/wiki/File:Pioneer_Day_Wrangler.jpg](https://upload.wikimedia.org/wikipedia/commons/thumb/8/83/Pioneer_Day_Wrangler.jpg/320px-Pioneer_Day_Wrangler.jpg)

**[Data wrangling](https://en.wikipedia.org/wiki/Data_wrangling)** commonly refers to the transformation of data from one "input" format (e.g., `.csv` files from an experiment) to a different format (e.g., a tidy dataframe) that better suits the needs of an analysis. In the context of the ExPra experiments, you will use data wrangling techniques to implement the transformations and data cleaning steps specified in your preregistrations.

We will use [`tidyverse`](https://www.tidyverse.org/) packages to implement our data wrangling. The "tidyverse" is a collection of packages that share a philosophy based around code and data structures (e.g., "[tibbles](https://tibble.tidyverse.org/)") that are (a) tidy, and (b) readable.

The most useful tidyverse packages for wrangling data are `dplyr` and `tidyr`. The [`dplyr`](https://dplyr.tidyverse.org/) package has useful functions for manipulating tibbles (dataframes), such as sorting, filtering, and editing columns. The [`tidyr`](https://tidyr.tidyverse.org/) package has functions that help us tidy or reformat dataframes.

<img src="https://www.tidyverse.org/css/images/hex/dplyr.png" width=138>
<img src="https://www.tidyverse.org/css/images/hex/tidyr.png" width=138>

<br>

In this session, we will cover how to use 6 very handy functions from the `dplyr` package:

* [`arrange()`](#arrange)
* [`filter()`](#filter)
* [`select()`](#select)
* [`pull()`](#pull)
* [`mutate()`](#mutate)
* [`summarise()`](#summarise)

## Setup

### Setup Part 1: Install Packages

You can install the tidyverse packages like so:

```
install.packages("tidyverse")
```

This includes many packages that we won't be using today, but which will be useful in other parts of the course (e.g., we will use `ggplot2` for Data Visualisation).

````{margin}
```{warning}
Remember, you should install packages in the console, never in a script that you share with others. Otherwise, your script will go to the effort of reinstalling the package *every time* it is run!
```
````

### Setup Part 2: Check `dplyr` Loads

Now, we can test that the package we will be using today actually loads. You should be able to run this code without any errors (the messages about masked objects are expected and are not errors):

In [2]:
options(repr.plot.width=3.5, repr.plot.height=3, repr.matrix.max.rows=10)

In [3]:
library(dplyr)


Attaching package: 'dplyr'


The following objects are masked from 'package:stats':

    filter, lag


The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union




### Setup Part 3: Check the Data Loads

Finally, check that you can access the dataset we'll be using in this session. The `starwars` dataset is a dataset built into `dplyr` that contains details of characters from the Star Wars films:

In [4]:
print(starwars)

# A tibble: 87 x 14
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>
 1 Luke Sk~    172    77 blond      fair       blue            19   male  mascu~
 2 C-3PO       167    75 NA         gold       yellow         112   none  mascu~
 3 R2-D2        96    32 NA         white, bl~ red             33   none  mascu~
 4 Darth V~    202   136 none       white      yellow          41.9 male  mascu~
 5 Leia Or~    150    49 brown      light      brown           19   fema~ femin~
 6 Owen La~    178   120 brown, gr~ light      blue            52   male  mascu~
 7 Beru Wh~    165    75 brown      light      blue

This snapshot shows an example of tidy data: a way of organising data such that each observation (here, each *character*) has its own row, and each variable describing that observation has its own column.

Now that we're all set up, let's start a-wrangling...

## `arrange()`

We can use the `arrange()` function to sort by variables in the dataframe.

### Numeric Variables

For example, we can arrange all characters in order of height (shortest to tallest) like so:

In [5]:
arrange(starwars, height)

name,height,mass,hair_color,skin_color,eye_color,birth_year,sex,gender,homeworld,species,films,vehicles,starships
<chr>,<int>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<list>,<list>,<list>
Yoda,66,17,white,green,brown,896,male,masculine,,Yoda's species,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi",,
Ratts Tyerell,79,15,none,"grey, blue",unknown,,male,masculine,Aleen Minor,Aleena,The Phantom Menace,,
Wicket Systri Warrick,88,20,brown,brown,brown,8,male,masculine,Endor,Ewok,Return of the Jedi,,
Dud Bolt,94,45,none,"blue, grey",yellow,,male,masculine,Vulpter,Vulptereen,The Phantom Menace,,
R2-D2,96,32,,"white, blue",red,33,none,masculine,Naboo,Droid,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens",,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
Finn,,,black,dark,dark,,male,masculine,,Human,The Force Awakens,,
Rey,,,brown,light,hazel,,female,feminine,,Human,The Force Awakens,,
Poe Dameron,,,brown,light,brown,,male,masculine,,Human,The Force Awakens,,T-70 X-wing fighter
BB8,,,none,none,black,,none,masculine,,Droid,The Force Awakens,,


In [6]:
arrange(starwars, height) |> print()

# A tibble: 87 x 14
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>
 1 Yoda         66    17 white      green      brown            896 male  mascu~
 2 Ratts T~     79    15 none       grey, blue unknown           NA male  mascu~
 3 Wicket ~     88    20 brown      brown      brown              8 male  mascu~
 4 Dud Bolt     94    45 none       blue, grey yellow            NA male  mascu~
 5 R2-D2        96    32 NA         white, bl~ red               33 none  mascu~
 6 R4-P17       96    NA none       silver, r~ red, blue         NA none  femin~
 7 R5-D4        97    32 N

The `arrange()` function, like most `dplyr` verb functions, takes the dataframe (`starwars`) as its first argument, and the variable names (e.g., `height`) as subsequent arguments.

In height order, we can see that Yoda is the shortest character, at 66 cm, with podracer [Ratts Tyerell](https://starwars.fandom.com/wiki/Ratts_Tyerell) next shortest, at 79 cm.
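
Because the dataframe is always the first argument, the same call can also be written with R's native pipe (`|>`), which this session already uses for `print()`. The sketch below simply checks that the two spellings give the same result. The object names `nested` and `piped` are made up for illustration, and the native pipe assumes R 4.1 or later.

```r
library(dplyr)

# Standard call: the dataframe is passed as the first argument
nested <- arrange(starwars, height)

# Piped call: `|>` passes `starwars` in as the first argument of arrange()
piped <- starwars |> arrange(height)

# Both spellings should give exactly the same tibble
identical(nested, piped)
```

Which style to use is largely a matter of readability; the piped form tends to pay off once several verbs are chained together.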

<br><br>
We can also arrange the dataframe in *descending* order, with the *`desc()`* function.

In [7]:
arrange(starwars, desc(height))

name,height,mass,hair_color,skin_color,eye_color,birth_year,sex,gender,homeworld,species,films,vehicles,starships
<chr>,<int>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<list>,<list>,<list>
Yarael Poof,264,,none,white,yellow,,male,masculine,Quermia,Quermian,The Phantom Menace,,
Tarfful,234,136,brown,brown,blue,,male,masculine,Kashyyyk,Wookiee,Revenge of the Sith,,
Lama Su,229,88,none,grey,black,,male,masculine,Kamino,Kaminoan,Attack of the Clones,,
Chewbacca,228,112,brown,unknown,blue,200,male,masculine,Kashyyyk,Wookiee,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens",AT-ST,"Millennium Falcon, Imperial shuttle"
Roos Tarpals,224,82,none,grey,orange,,male,masculine,Naboo,Gungan,The Phantom Menace,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
Finn,,,black,dark,dark,,male,masculine,,Human,The Force Awakens,,
Rey,,,brown,light,hazel,,female,feminine,,Human,The Force Awakens,,
Poe Dameron,,,brown,light,brown,,male,masculine,,Human,The Force Awakens,,T-70 X-wing fighter
BB8,,,none,none,black,,none,masculine,,Droid,The Force Awakens,,


In [8]:
arrange(starwars, desc(height)) |> print()

# A tibble: 87 x 14
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>
 1 Yarael ~    264    NA none       white      yellow          NA   male  mascu~
 2 Tarfful     234   136 brown      brown      blue            NA   male  mascu~
 3 Lama Su     229    88 none       grey       black           NA   male  mascu~
 4 Chewbac~    228   112 brown      unknown    blue           200   male  mascu~
 5 Roos Ta~    224    82 none       grey       orange          NA   male  mascu~
 6 Grievous    216   159 none       brown, wh~ green, y~       NA   male  mascu~
 7 Taun We     213

This shows us that the long-necked Jedi [Yarael Poof](https://starwars.fandom.com/wiki/Yarael_Poof) is the tallest character, at 264 cm.

<br>
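
In both the ascending and descending outputs above, characters with no recorded height (e.g., Finn and Rey) appear at the bottom. This reflects how `arrange()` treats missing values (`NA`): they are placed last, even inside `desc()`. Here is a minimal sketch using a made-up tibble (`toy` and its values are hypothetical, not part of the session data):

```r
library(dplyr)

# Hypothetical toy data: character "B" has no recorded height
toy <- tibble(
  name   = c("A", "B", "C"),
  height = c(150, NA, 200)
)

# Ascending order: 150, 200, then the missing value
asc <- arrange(toy, height)

# Descending order: 200, 150, and the missing value still comes last
dsc <- arrange(toy, desc(height))

print(asc)
print(dsc)
```

So if the rows you care about seem to be missing from the top of a sorted dataframe, it is worth checking whether they have `NA` in the sorting variable.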

### Character Variables

As well as sorting by numeric variables, we can also sort by character variables. For instance, we can sort by hair colour. Doing so sorts the hair colours alphabetically, from "auburn" to "white", with missing values (`NA`) placed last.

In [9]:
arrange(starwars, hair_color)

name,height,mass,hair_color,skin_color,eye_color,birth_year,sex,gender,homeworld,species,films,vehicles,starships
<chr>,<int>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<list>,<list>,<list>
Mon Mothma,150,,auburn,fair,blue,48.0,female,feminine,Chandrila,Human,Return of the Jedi,,
Wilhuff Tarkin,180,,"auburn, grey",fair,blue,64.0,male,masculine,Eriadu,Human,"Revenge of the Sith, A New Hope",,
Obi-Wan Kenobi,182,77.0,"auburn, white",fair,blue-gray,57.0,male,masculine,Stewjon,Human,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope",Tribubble bongo,"Jedi starfighter , Trade Federation cruiser, Naboo star skiff , Jedi Interceptor , Belbullab-22 starfighter"
Biggs Darklighter,183,84.0,black,light,brown,24.0,male,masculine,Tatooine,Human,A New Hope,,X-wing
Boba Fett,183,78.2,black,fair,brown,31.5,male,masculine,Kamino,Human,"The Empire Strikes Back, Attack of the Clones , Return of the Jedi",,Slave 1
...,...,...,...,...,...,...,...,...,...,...,...,...,...
C-3PO,167,75,,gold,yellow,112,none,masculine,Tatooine,Droid,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope",,
R2-D2,96,32,,"white, blue",red,33,none,masculine,Naboo,Droid,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens",,
R5-D4,97,32,,"white, red",red,,none,masculine,Tatooine,Droid,A New Hope,,
Greedo,173,74,,green,black,44,male,masculine,Rodia,Rodian,A New Hope,,


In [10]:
arrange(starwars, hair_color) |> print()

# A tibble: 87 x 14
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>
 1 Mon Mot~    150  NA   auburn     fair       blue            48   fema~ femin~
 2 Wilhuff~    180  NA   auburn, g~ fair       blue            64   male  mascu~
 3 Obi-Wan~    182  77   auburn, w~ fair       blue-gray       57   male  mascu~
 4 Biggs D~    183  84   black      light      brown           24   male  mascu~
 5 Boba Fe~    183  78.2 black      fair       brown           31.5 male  mascu~
 6 Lando C~    177  79   black      dark       brown           31   male  mascu~
 7 Watto       137  NA   black      blue, grey yell

<br>

### Combining Statements

Finally, we can sort by multiple variables at once with comma-separated statements. For instance, we may want to sort alphabetically by hair colour, and then in descending order of height within each hair colour:

In [11]:
arrange(starwars, hair_color, desc(height))

name,height,mass,hair_color,skin_color,eye_color,birth_year,sex,gender,homeworld,species,films,vehicles,starships
<chr>,<int>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<list>,<list>,<list>
Mon Mothma,150,,auburn,fair,blue,48,female,feminine,Chandrila,Human,Return of the Jedi,,
Wilhuff Tarkin,180,,"auburn, grey",fair,blue,64,male,masculine,Eriadu,Human,"Revenge of the Sith, A New Hope",,
Obi-Wan Kenobi,182,77,"auburn, white",fair,blue-gray,57,male,masculine,Stewjon,Human,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope",Tribubble bongo,"Jedi starfighter , Trade Federation cruiser, Naboo star skiff , Jedi Interceptor , Belbullab-22 starfighter"
Bail Prestor Organa,191,,black,tan,brown,67,male,masculine,Alderaan,Human,"Attack of the Clones, Revenge of the Sith",,
Gregar Typho,185,85,black,dark,brown,,male,masculine,Naboo,Human,Attack of the Clones,,Naboo fighter
...,...,...,...,...,...,...,...,...,...,...,...,...,...
Jabba Desilijic Tiure,175,1358,,"green-tan, brown",orange,600,hermaphroditic,masculine,Nal Hutta,Hutt,"The Phantom Menace, Return of the Jedi, A New Hope",,
Greedo,173,74,,green,black,44,male,masculine,Rodia,Rodian,A New Hope,,
C-3PO,167,75,,gold,yellow,112,none,masculine,Tatooine,Droid,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope",,
R5-D4,97,32,,"white, red",red,,none,masculine,Tatooine,Droid,A New Hope,,


In [12]:
arrange(starwars, hair_color, desc(height)) |> print()

# A tibble: 87 x 14
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>
 1 Mon Mot~    150  NA   auburn     fair       blue            48   fema~ femin~
 2 Wilhuff~    180  NA   auburn, g~ fair       blue            64   male  mascu~
 3 Obi-Wan~    182  77   auburn, w~ fair       blue-gray       57   male  mascu~
 4 Bail Pr~    191  NA   black      tan        brown           67   male  mascu~
 5 Gregar ~    185  85   black      dark       brown           NA   male  mascu~
 6 Biggs D~    183  84   black      light      brown           24   male  mascu~
 7 Boba Fe~    183  78.2 black      fair

<br>
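
When several variables are given, later variables only break ties in the earlier ones. The sketch below illustrates this on a small made-up tibble (`toy` and its values are hypothetical), where two characters share each hair colour:

```r
library(dplyr)

# Hypothetical toy data with two characters per hair colour
toy <- tibble(
  name       = c("A", "B", "C", "D"),
  hair_color = c("brown", "black", "brown", "black"),
  height     = c(160, 170, 180, 190)
)

# Sort by hair colour first; within each hair colour, tallest first
sorted <- arrange(toy, hair_color, desc(height))
print(sorted)
```

Here the two "black" rows come first (D, then B), followed by the two "brown" rows (C, then A). Swapping the order of the arguments would instead sort the whole tibble by height and use hair colour only to break ties.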

### Check your Knowledge!

Try coming up with code to solve the following:

1. Sort by mass in descending order, such that the most massive character comes first.

2. Sort by gender, eye colour, and then height. Which character is the first observation? Why?

## `filter()`

The `filter()` function is useful for subsetting rows of data.

<br>

### Character Variables

For example, we can easily filter our dataset to find all our data on Darth Vader. The following code says that we should *filter* the dataframe called *starwars* to only include rows where the variable called `name` has the value `"Darth Vader"`:

In [13]:
filter(starwars, name=="Darth Vader")

name,height,mass,hair_color,skin_color,eye_color,birth_year,sex,gender,homeworld,species,films,vehicles,starships
<chr>,<int>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<list>,<list>,<list>
Darth Vader,202,136,none,white,yellow,41.9,male,masculine,Tatooine,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope",,TIE Advanced x1


In [14]:
filter(starwars, name=="Darth Vader") |> print()

# A tibble: 1 x 14
  name      height  mass hair_color skin_color eye_color birth_year sex   gender
  <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>
1 Darth Va~    202   136 none       white      yellow          41.9 male  mascu~
# i 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>


<img src="https://media1.giphy.com/media/Uu4WP50jNo1uZeor4t/giphy.gif" width=250>

<sub><sup>[via giphy](https://media.giphy.com/media/Uu4WP50jNo1uZeor4t/giphy.gif)</sup></sub>

<br><br>
To find all characters that are *not* Darth Vader, we can use `!=`, which stands for "does not equal." This returns all rows in the dataframe where the `name` column does not have the value `"Darth Vader"`.

In [15]:
filter(starwars, name!="Darth Vader")

name,height,mass,hair_color,skin_color,eye_color,birth_year,sex,gender,homeworld,species,films,vehicles,starships
<chr>,<int>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<list>,<list>,<list>
Luke Skywalker,172,77,blond,fair,blue,19,male,masculine,Tatooine,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens","Snowspeeder , Imperial Speeder Bike","X-wing , Imperial shuttle"
C-3PO,167,75,,gold,yellow,112,none,masculine,Tatooine,Droid,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope",,
R2-D2,96,32,,"white, blue",red,33,none,masculine,Naboo,Droid,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens",,
Leia Organa,150,49,brown,light,brown,19,female,feminine,Alderaan,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens",Imperial Speeder Bike,
Owen Lars,178,120,"brown, grey",light,blue,52,male,masculine,Tatooine,Human,"Attack of the Clones, Revenge of the Sith , A New Hope",,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
Rey,,,brown,light,hazel,,female,feminine,,Human,The Force Awakens,,
Poe Dameron,,,brown,light,brown,,male,masculine,,Human,The Force Awakens,,T-70 X-wing fighter
BB8,,,none,none,black,,none,masculine,,Droid,The Force Awakens,,
Captain Phasma,,,unknown,unknown,unknown,,,,,,The Force Awakens,,


In [16]:
filter(starwars, name!="Darth Vader") |> print()

# A tibble: 86 x 14
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>
 1 Luke Sk~    172    77 blond      fair       blue            19   male  mascu~
 2 C-3PO       167    75 NA         gold       yellow         112   none  mascu~
 3 R2-D2        96    32 NA         white, bl~ red             33   none  mascu~
 4 Leia Or~    150    49 brown      light      brown           19   fema~ femin~
 5 Owen La~    178   120 brown, gr~ light      blue            52   male  mascu~
 6 Beru Wh~    165    75 brown      light      blue            47   fema~ femin~
 7 R5-D4        97    32 NA         white, red red


<br><br>
We can also filter to include several characters at once. To do this, we first define a character vector containing the names of the characters we wish to keep. We can then filter to only include characters whose `name` is in (`%in%`) that vector:

In [17]:
cool_droids <- c("C-3PO", "R2-D2", "IG-88")
filter(starwars, name %in% cool_droids)

name,height,mass,hair_color,skin_color,eye_color,birth_year,sex,gender,homeworld,species,films,vehicles,starships
<chr>,<int>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<list>,<list>,<list>
C-3PO,167,75,,gold,yellow,112,none,masculine,Tatooine,Droid,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope",,
R2-D2,96,32,,"white, blue",red,33,none,masculine,Naboo,Droid,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens",,
IG-88,200,140,none,metal,red,15,none,masculine,,Droid,The Empire Strikes Back,,


In [18]:
filter(starwars, name %in% cool_droids) |> print()

# A tibble: 3 x 14
  name  height  mass hair_color skin_color  eye_color birth_year sex   gender
  <chr>  <int> <dbl> <chr>      <chr>       <chr>          <dbl> <chr> <chr>
1 C-3PO    167    75 NA         gold        yellow           112 none  masculine
2 R2-D2     96    32 NA         white, blue red               33 none  masculine
3 IG-88    200   140 none       metal       red               15 none  masculine
# i 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>



<br><br>
As with `==` and `!=`, we can invert `%in%` to only include characters who are *not* in the list. To do this, we put an exclamation mark (`!`) at the *start* of the statement:

In [19]:
filter(starwars, !(name %in% cool_droids))

name,height,mass,hair_color,skin_color,eye_color,birth_year,sex,gender,homeworld,species,films,vehicles,starships
<chr>,<int>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<list>,<list>,<list>
Luke Skywalker,172,77,blond,fair,blue,19.0,male,masculine,Tatooine,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens","Snowspeeder , Imperial Speeder Bike","X-wing , Imperial shuttle"
Darth Vader,202,136,none,white,yellow,41.9,male,masculine,Tatooine,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope",,TIE Advanced x1
Leia Organa,150,49,brown,light,brown,19.0,female,feminine,Alderaan,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens",Imperial Speeder Bike,
Owen Lars,178,120,"brown, grey",light,blue,52.0,male,masculine,Tatooine,Human,"Attack of the Clones, Revenge of the Sith , A New Hope",,
Beru Whitesun lars,165,75,brown,light,blue,47.0,female,feminine,Tatooine,Human,"Attack of the Clones, Revenge of the Sith , A New Hope",,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
Rey,,,brown,light,hazel,,female,feminine,,Human,The Force Awakens,,
Poe Dameron,,,brown,light,brown,,male,masculine,,Human,The Force Awakens,,T-70 X-wing fighter
BB8,,,none,none,black,,none,masculine,,Droid,The Force Awakens,,
Captain Phasma,,,unknown,unknown,unknown,,,,,,The Force Awakens,,


In [20]:
filter(starwars, !(name %in% cool_droids)) |> print()

# A tibble: 84 x 14
   name     height  mass hair_color skin_color eye_color birth_year sex   gender
   <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>
 1 Luke Sk~    172    77 blond      fair       blue            19   male  mascu~
 2 Darth V~    202   136 none       white      yellow          41.9 male  mascu~
 3 Leia Or~    150    49 brown      light      brown           19   fema~ femin~
 4 Owen La~    178   120 brown, gr~ light      blue            52   male  mascu~
 5 Beru Wh~    165    75 brown      light      blue            47   fema~ femin~
 6 R5-D4        97    32 NA         white, red red             NA   none  mascu~
 7 Biggs D~    183    84 black      light      brown

<br>
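
One subtlety worth knowing when filtering on character variables: columns such as `hair_color` contain missing values (`NA`), and `filter()` only keeps rows where the condition is `TRUE`. A comparison like `!=` returns `NA` for a missing value, so that row is dropped, whereas `%in%` returns `FALSE` for a missing value, so its negation keeps the row. The sketch below shows the difference on a made-up tibble (`toy` and the result names are hypothetical):

```r
library(dplyr)

# Hypothetical toy data: character "C" has a missing hair colour
toy <- tibble(
  name       = c("A", "B", "C"),
  hair_color = c("brown", "none", NA)
)

# `NA != "none"` evaluates to NA, so filter() drops row "C" as well as "B"
with_neq <- filter(toy, hair_color != "none")

# `NA %in% "none"` evaluates to FALSE, so negating it keeps row "C"
with_not_in <- filter(toy, !(hair_color %in% "none"))

# Being explicit about missing values with is.na() also keeps row "C"
with_is_na <- filter(toy, hair_color != "none" | is.na(hair_color))

print(with_neq)
print(with_not_in)
print(with_is_na)
```

Neither behaviour is wrong, but it is a good idea to decide deliberately whether rows with missing values should survive a filter.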

### Numeric Variables

The `filter()` function can also deal with numeric values. For instance, we can filter to only include characters who are 96 cm tall or shorter. To do this we use `<=`, which stands for "less than or equal to", or "≤".

In [21]:
filter(starwars, height<=96)

name,height,mass,hair_color,skin_color,eye_color,birth_year,sex,gender,homeworld,species,films,vehicles,starships
<chr>,<int>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<list>,<list>,<list>
R2-D2,96,32.0,,"white, blue",red,33.0,none,masculine,Naboo,Droid,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens",,
Yoda,66,17.0,white,green,brown,896.0,male,masculine,,Yoda's species,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi",,
Wicket Systri Warrick,88,20.0,brown,brown,brown,8.0,male,masculine,Endor,Ewok,Return of the Jedi,,
Dud Bolt,94,45.0,none,"blue, grey",yellow,,male,masculine,Vulpter,Vulptereen,The Phantom Menace,,
Ratts Tyerell,79,15.0,none,"grey, blue",unknown,,male,masculine,Aleen Minor,Aleena,The Phantom Menace,,
R4-P17,96,,none,"silver, red","red, blue",,none,feminine,,Droid,"Attack of the Clones, Revenge of the Sith",,


In [22]:
filter(starwars, height<=96) |> print()

# A tibble: 6 x 14
  name      height  mass hair_color skin_color eye_color birth_year sex   gender
  <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>
1 R2-D2         96    32 NA         white, bl~ red               33 none  mascu~
2 Yoda          66    17 white      green      brown            896 male  mascu~
3 Wicket S~     88    20 brown      brown      brown              8 male  mascu~
4 Dud Bolt      94    45 none       blue, grey yellow            NA male  mascu~
5 Ratts Ty~     79    15 none       grey, blue unknown           NA male  mascu~
6 R4-P17        96    NA none       silver, r~ red, blue         NA none  femin~
# i 5 more variables: homeworld <chr>

<br>
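
The other comparison operators (`<`, `>`, `>=`) work in the same way as `<=`. The sketch below compares a few of them on a made-up tibble (`toy` and the result names are hypothetical). It also uses `between()`, a `dplyr` helper not otherwise covered in this session, which keeps values inside a range and includes both end points.

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(
  name   = c("A", "B", "C", "D"),
  height = c(66, 96, 150, 202)
)

# Less than or equal to 96: keeps A and B
at_most_96 <- filter(toy, height <= 96)

# Strictly less than 96: keeps only A
under_96 <- filter(toy, height < 96)

# From 96 to 150, including both end points: keeps B and C
mid_range <- filter(toy, between(height, 96, 150))

print(at_most_96)
print(under_96)
print(mid_range)
```

The only difference between `<=` and `<` is whether a value exactly on the boundary (here, 96) is kept.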

### Combining Statements

Finally, as in other `dplyr` functions, we can combine multiple comma-separated statements in one use of the `filter()` function. A row is kept only if it satisfies *all* of the statements. Here we filter to only include characters who:
* come from Tatooine
* are at least 100 cm tall
* are human

In [23]:
filter(starwars, homeworld=="Tatooine", height>=100, species=="Human")

name,height,mass,hair_color,skin_color,eye_color,birth_year,sex,gender,homeworld,species,films,vehicles,starships
<chr>,<int>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<list>,<list>,<list>
Luke Skywalker,172,77.0,blond,fair,blue,19.0,male,masculine,Tatooine,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens","Snowspeeder , Imperial Speeder Bike","X-wing , Imperial shuttle"
Darth Vader,202,136.0,none,white,yellow,41.9,male,masculine,Tatooine,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope",,TIE Advanced x1
Owen Lars,178,120.0,"brown, grey",light,blue,52.0,male,masculine,Tatooine,Human,"Attack of the Clones, Revenge of the Sith , A New Hope",,
Beru Whitesun lars,165,75.0,brown,light,blue,47.0,female,feminine,Tatooine,Human,"Attack of the Clones, Revenge of the Sith , A New Hope",,
Biggs Darklighter,183,84.0,black,light,brown,24.0,male,masculine,Tatooine,Human,A New Hope,,X-wing
Anakin Skywalker,188,84.0,blond,fair,blue,41.9,male,masculine,Tatooine,Human,"Attack of the Clones, The Phantom Menace , Revenge of the Sith","Zephyr-G swoop bike, XJ-6 airspeeder","Trade Federation cruiser, Jedi Interceptor , Naboo fighter"
Shmi Skywalker,163,,black,fair,brown,72.0,female,feminine,Tatooine,Human,"Attack of the Clones, The Phantom Menace",,
Cliegg Lars,183,,brown,fair,blue,82.0,male,masculine,Tatooine,Human,Attack of the Clones,,


In [24]:
filter(starwars, homeworld=="Tatooine", height>=100, species=="Human") |> print()

# A tibble: 8 x 14
  name      height  mass hair_color skin_color eye_color birth_year sex   gender
  <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>
1 Luke Sky~    172    77 blond      fair       blue            19   male  mascu~
2 Darth Va~    202   136 none       white      yellow          41.9 male  mascu~
3 Owen Lars    178   120 brown, gr~ light      blue            52   male  mascu~
4 Beru Whi~    165    75 brown      light      blue            47   fema~ femin~
5 Biggs Da~    183    84 black      light      brown           24   male  mascu~
6 Anakin S~    188    84 blond      fair       blue            41.9 male  mascu~
7 Shmi Sky~    163    NA black      fair       brown           72   fema

## `select()`

The `select()` function is useful for subsetting *columns*.

### Selecting Specific Columns

For instance, if we only want to select the `name` and `height` columns, we can do this like so:

In [25]:
select(starwars, name, height)

name,height
<chr>,<int>
Luke Skywalker,172
C-3PO,167
R2-D2,96
Darth Vader,202
Leia Organa,150
...,...
Rey,
Poe Dameron,
BB8,
Captain Phasma,


In [26]:
select(starwars, name, height) |> print()

# A tibble: 87 x 2
   name               height
   <chr>               <int>
 1 Luke Skywalker        172
 2 C-3PO                 167
 3 R2-D2                  96
 4 Darth Vader           202
 5 Leia Organa           150
 6 Owen Lars             178
 7 Beru Whitesun lars    165
 8 R5-D4                  97
 9 Biggs Darklighter     183
10 Obi-Wan Kenobi        182
# i 77 more rows


<br><br>

To include all columns *except* a given column, we can use the exclamation mark (`!`) or a minus (`-`).

In [27]:
select(starwars, !height)

name,mass,hair_color,skin_color,eye_color,birth_year,sex,gender,homeworld,species,films,vehicles,starships
<chr>,<dbl>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<list>,<list>,<list>
Luke Skywalker,77,blond,fair,blue,19.0,male,masculine,Tatooine,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens","Snowspeeder , Imperial Speeder Bike","X-wing , Imperial shuttle"
C-3PO,75,,gold,yellow,112.0,none,masculine,Tatooine,Droid,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope",,
R2-D2,32,,"white, blue",red,33.0,none,masculine,Naboo,Droid,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens",,
Darth Vader,136,none,white,yellow,41.9,male,masculine,Tatooine,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope",,TIE Advanced x1
Leia Organa,49,brown,light,brown,19.0,female,feminine,Alderaan,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens",Imperial Speeder Bike,
...,...,...,...,...,...,...,...,...,...,...,...,...
Rey,,brown,light,hazel,,female,feminine,,Human,The Force Awakens,,
Poe Dameron,,brown,light,brown,,male,masculine,,Human,The Force Awakens,,T-70 X-wing fighter
BB8,,none,none,black,,none,masculine,,Droid,The Force Awakens,,
Captain Phasma,,unknown,unknown,unknown,,,,,,The Force Awakens,,


In [31]:
select(starwars, !height) |> print()

# A tibble: 87 x 13
   name   mass hair_color skin_color eye_color birth_year sex   gender homeworld
   <chr> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>  <chr>
 1 Luke~    77 blond      fair       blue            19   male  mascu~ Tatooine
 2 C-3PO    75 NA         gold       yellow         112   none  mascu~ Tatooine
 3 R2-D2    32 NA         white, bl~ red             33   none  mascu~ Naboo
 4 Dart~   136 none       white      yellow          41.9 male  mascu~ Tatooine
 5 Leia~    49 brown      light      brown           19   fema~ femin~ Alderaan
 6 Owen~   120 brown, gr~ light      blue            52   male  mascu~ Tatooine
 7 Beru~    75 brown      light      blue            47   fem

<br>
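The cells above only demonstrate `!`. As a minimal sketch of the alternative, the following compares `!` with `-` and drops two columns at once by wrapping the names in `c()`. The object names (`no_height_bang`, `no_height_minus`, `no_size`) are made up for illustration.

```r
library(dplyr)

# `!` and `-` both drop the named column and keep everything else
no_height_bang  <- select(starwars, !height)
no_height_minus <- select(starwars, -height)

# Both calls should leave the same columns in the same order
identical(names(no_height_bang), names(no_height_minus))

# Wrap several names in c() to drop more than one column at once
no_size <- select(starwars, !c(height, mass))
names(no_size)
```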

### Reordering Columns

When we use `select()`, we can also specify the order of columns. For instance, we can select the columns `name`, `gender`, and `homeworld` in a new order.

In [32]:
select(starwars, homeworld, name, gender)

homeworld,name,gender
<chr>,<chr>,<chr>
Tatooine,Luke Skywalker,masculine
Tatooine,C-3PO,masculine
Naboo,R2-D2,masculine
Tatooine,Darth Vader,masculine
Alderaan,Leia Organa,feminine
...,...,...
,Rey,feminine
,Poe Dameron,masculine
,BB8,masculine
,Captain Phasma,


In [33]:
select(starwars, homeworld, name, gender) |> print()

# A tibble: 87 x 3
   homeworld name               gender
   <chr>     <chr>              <chr>
 1 Tatooine  Luke Skywalker     masculine
 2 Tatooine  C-3PO              masculine
 3 Naboo     R2-D2              masculine
 4 Tatooine  Darth Vader        masculine
 5 Alderaan  Leia Organa        feminine
 6 Tatooine  Owen Lars          masculine
 7 Tatooine  Beru Whitesun lars feminine
 8 Tatooine  R5-D4              masculine
 9 Tatooine  Biggs Darklighter  masculine
10 Stewjon   Obi-Wan Kenobi     masculine
# i 77 more rows


<br>

### Selecting a Range of Columns

We can select a contiguous range of columns (e.g., from `name` through `hair_color`, inclusive) with a colon (`:`).

In [29]:
select(starwars, name:hair_color)

name,height,mass,hair_color
<chr>,<int>,<dbl>,<chr>
Luke Skywalker,172,77,blond
C-3PO,167,75,
R2-D2,96,32,
Darth Vader,202,136,none
Leia Organa,150,49,brown
...,...,...,...
Rey,,,brown
Poe Dameron,,,brown
BB8,,,none
Captain Phasma,,,unknown


In [30]:
select(starwars, name:hair_color) |> print()

# A tibble: 87 x 4
   name               height  mass hair_color
   <chr>               <int> <dbl> <chr>
 1 Luke Skywalker        172    77 blond
 2 C-3PO                 167    75 NA
 3 R2-D2                  96    32 NA
 4 Darth Vader           202   136 none
 5 Leia Organa           150    49 brown
 6 Owen Lars             178   120 brown, grey
 7 Beru Whitesun lars    165    75 brown
 8 R5-D4                  97    32 NA
 9 Biggs Darklighter     183    84 black
10 Obi-Wan Kenobi        182    77 auburn, white
# i 77 more rows


<br>
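A range can also be combined with other selections or negated as a whole. The sketch below shows both; the object names `range_plus` and `range_dropped` are illustrative only.

```r
library(dplyr)

# A range plus one extra column: name, height, mass, then homeworld
range_plus <- select(starwars, name:mass, homeworld)
names(range_plus)

# Negating a range (note the parentheses) drops every column
# from height through birth_year and keeps the rest
range_dropped <- select(starwars, !(height:birth_year))
names(range_dropped)
```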

### Selection Helpers

The tidyverse also supports a range of [selection helpers](https://dplyr.tidyverse.org/reference/dplyr_tidy_select.html). For example, we can use `everything()` to select everything not already selected. This can be useful if we want to set a new first column (e.g., `homeworld`), but keep the other columns in their existing order:

In [None]:
select(starwars, homeworld, everything())

In [None]:
select(starwars, homeworld, everything()) |> print()
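A quick way to check the effect of `everything()` is to inspect the column names of the result. The following is a small sketch; `reordered` is a name chosen here for illustration.

```r
library(dplyr)

reordered <- select(starwars, homeworld, everything())

# homeworld now comes first; the remaining columns keep their original order
names(reordered)

# No columns are lost or duplicated by the reordering
ncol(reordered) == ncol(starwars)
```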


<br><br>

We can also use selection helpers to select variables that *start* (`starts_with()`), *end* (`ends_with()`), or *contain* (`contains()`) a given string.

Here we select every variable that meets at least one of these conditions:
* Contains "color"
* Ends with the character "s"

In [34]:
select(starwars, contains("color"), ends_with("s"))

hair_color,skin_color,eye_color,mass,species,films,vehicles,starships
<chr>,<chr>,<chr>,<dbl>,<chr>,<list>,<list>,<list>
blond,fair,blue,77,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens","Snowspeeder , Imperial Speeder Bike","X-wing , Imperial shuttle"
,gold,yellow,75,Droid,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope",,
,"white, blue",red,32,Droid,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens",,
none,white,yellow,136,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope",,TIE Advanced x1
brown,light,brown,49,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens",Imperial Speeder Bike,
...,...,...,...,...,...,...,...
brown,light,hazel,,Human,The Force Awakens,,
brown,light,brown,,Human,The Force Awakens,,T-70 X-wing fighter
none,none,black,,Droid,The Force Awakens,,
unknown,unknown,unknown,,,The Force Awakens,,


In [35]:
select(starwars, contains("color"), ends_with("s")) |> print()

# A tibble: 87 x 8
   hair_color    skin_color  eye_color  mass species films  vehicles  starships
   <chr>         <chr>       <chr>     <dbl> <chr>   <list> <list>    <list>
 1 blond         fair        blue         77 Human   <chr>  <chr [2]> <chr [2]>
 2 NA            gold        yellow       75 Droid   <chr>  <chr [0]> <chr [0]>
 3 NA            white, blue red          32 Droid   <chr>  <chr [0]> <chr [0]>
 4 none          white       yellow      136 Human   <chr>  <chr [0]> <chr [1]>
 5 brown         light       brown        49 Human   <chr>  <chr [1]> <chr [0]>
 6 brown, grey   light

## `pull()`

While `select()` subsets columns in our dataframe, its output is still a dataframe, even when we select a single column. Sometimes we instead want the values of one column as a plain vector. The `pull()` function makes this easy.

In [38]:
pull(starwars, name)

<br>

This is the tidyverse equivalent of the base R `$` operator. For instance, another way of writing this would be:

In [39]:
starwars$name

<br>
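To make the difference between `select()` and `pull()` concrete, this short sketch compares the type of object each one returns. The names `name_tbl` and `name_vec` are illustrative.

```r
library(dplyr)

name_tbl <- select(starwars, name)  # a tibble with a single column
name_vec <- pull(starwars, name)    # a plain character vector

class(name_tbl)
class(name_vec)

# pull() and $ extract the same vector
identical(name_vec, starwars$name)
```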

Extracting variables in this way can be useful for passing the output to more standard R functions. In this example, we use the `unique()` function to find all distinct values of the `homeworld` variable, and `sort()` to put them in alphabetical order.

In [41]:
hws <- pull(starwars, homeworld)
sort(unique(hws))
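The same steps can be written as a single pipeline with `|>`. One detail to be aware of: by default, `sort()` drops missing values, so `NA` does not appear in the sorted result even though some characters have no recorded homeworld. The sketch below shows both the default and `na.last = TRUE`, which keeps `NA` at the end; the object names are illustrative.

```r
library(dplyr)

# Same result as sort(unique(hws)), written as a pipeline
homeworlds <- starwars |>
  pull(homeworld) |>
  unique() |>
  sort()

# na.last = TRUE keeps the missing value and places it last
homeworlds_with_na <- starwars |>
  pull(homeworld) |>
  unique() |>
  sort(na.last = TRUE)

length(homeworlds)
length(homeworlds_with_na)
```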

## `mutate()`

The `mutate()` function allows us to edit columns and add new columns in dataframes.

### Editing Existing Columns

Suppose we want to convert all values in the `species` column to lowercase. We can do this with the `tolower()` function:

In [None]:
mutate(starwars, species=tolower(species))

In [None]:
mutate(starwars, species=tolower(species)) |> print()
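Note that `mutate()` returns a modified copy rather than changing `starwars` itself, so the result has to be assigned to a name if we want to keep it. A minimal sketch, with `starwars_lower` as a made-up name:

```r
library(dplyr)

# Store the edited dataframe under a new name
starwars_lower <- mutate(starwars, species = tolower(species))

# The new object has lowercase species values...
head(pull(starwars_lower, species))

# ...while the original dataframe is unchanged
head(pull(starwars, species))
```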