In [6]:
library(tidyverse)

# Subset columns using their names and types

## Overview of selection features

Tidyverse selections implement a dialect of R where operators make it easy to select variables:

* `:` for selecting a range of consecutive variables.

* `!` for taking the complement of a set of variables.

* `&` and `|` for selecting the intersection or the union of two sets of variables.

* `c()` for combining selections.

In addition, you can use **selection helpers**. Some helpers select specific columns:

* `everything()`: Matches all variables.

* `last_col()`: Selects the last variable, possibly with an offset.
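
As a minimal sketch of these two helpers, the snippet below uses the built-in `iris` data frame. The variable names (`last_name`, `second_last`, `reordered`) are only illustrative.

```R
library(dplyr)

# last_col() picks the final column of the data frame
last_name <- names(select(iris, last_col()))
last_name
# "Species"

# offset counts back from the end: offset = 1 is the second-to-last column
second_last <- names(select(iris, last_col(offset = 1)))
second_last
# "Petal.Width"

# everything() matches all columns; listing Species first moves it to the front
reordered <- names(select(iris, Species, everything()))
reordered
# "Species" "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
```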

These helpers select variables by matching patterns in their names:

* `starts_with()`: Starts with a prefix.

* `ends_with()`: Ends with a suffix.

* `contains()`: Contains a literal string.

* `matches()`: Matches a regular expression.

* `num_range()`: Matches a numerical range like x01, x02, x03.
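
The pattern helpers above might be compared side by side as follows. This is a sketch only: the tibble `df` and its column names are invented for illustration, and the rest uses the built-in `iris` data.

```R
library(dplyr)

# Hypothetical data frame with zero-padded column names (invented for this sketch)
df <- tibble(id = 1:2, x01 = 0, x02 = 0, x03 = 0, y_total = 0)

# num_range() builds names from a prefix and a numeric range;
# width = 2 pads the numbers with zeros, giving x01 and x02
padded <- names(select(df, num_range("x", 1:2, width = 2)))
padded
# "x01" "x02"

# starts_with() matches a literal prefix
petal <- names(select(iris, starts_with("Petal")))
petal
# "Petal.Length" "Petal.Width"

# contains() treats "." as a literal dot, so Species is not selected
with_dot <- names(select(iris, contains(".")))

# matches() treats "." as a regular expression (any character),
# so every column name matches
any_char <- names(select(iris, matches(".")))
```

The last two calls show why the distinction between a literal string and a regular expression matters: the same argument selects four columns with `contains()` and all five with `matches()`.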

These helpers select variables from a **character vector**:

* `all_of()`: Matches variable names in a character vector. All names must be present, otherwise an out-of-bounds error is thrown.

* `any_of()`: Same as `all_of()`, except that no error is thrown for names that don't exist.
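
The difference between the two helpers can be seen with a character vector that contains one name missing from the data. The vector `wanted` and the name `not_a_column` are made up for this sketch.

```R
library(dplyr)

# Character vector of column names; "not_a_column" does not exist in iris
wanted <- c("Species", "Sepal.Length", "not_a_column")

# any_of() silently skips names that are absent
found <- names(select(iris, any_of(wanted)))
found
# "Species" "Sepal.Length"

# all_of() requires every name to be present, so this call raises an error;
# tryCatch() turns the error into a marker value so the script keeps running
strict <- tryCatch(
  select(iris, all_of(wanted)),
  error = function(e) "error"
)
strict
# "error"
```

`all_of()` is the safer choice when a missing column should stop the analysis; `any_of()` suits cases where some columns are optional.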

This helper selects variables with a function:

* `where()`: Applies a function to all variables and selects those for which the function returns TRUE.

```R
select(.data, ...)
```

# Examples

In [11]:
head(starwars)

name,height,mass,hair_color,skin_color,eye_color,birth_year,sex,gender,homeworld,species,films,vehicles,starships
Luke Skywalker,172,77,blond,fair,blue,19.0,male,masculine,Tatooine,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens","Snowspeeder , Imperial Speeder Bike","X-wing , Imperial shuttle"
C-3PO,167,75,,gold,yellow,112.0,none,masculine,Tatooine,Droid,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope",,
R2-D2,96,32,,"white, blue",red,33.0,none,masculine,Naboo,Droid,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens",,
Darth Vader,202,136,none,white,yellow,41.9,male,masculine,Tatooine,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope",,TIE Advanced x1
Leia Organa,150,49,brown,light,brown,19.0,female,feminine,Alderaan,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens",Imperial Speeder Bike,
Owen Lars,178,120,"brown, grey",light,blue,52.0,male,masculine,Tatooine,Human,"Attack of the Clones, Revenge of the Sith , A New Hope",,


Select variables by name:

In [10]:
# Select the columns Sepal.Length and Sepal.Width
iris %>% select(Sepal.Length, Sepal.Width) %>% head()

Sepal.Length,Sepal.Width
5.1,3.5
4.9,3.0
4.7,3.2
4.6,3.1
5.0,3.6
5.4,3.9


In [13]:
iris %>% pivot_longer(Sepal.Length) %>% head()

Sepal.Width,Petal.Length,Petal.Width,Species,name,value
3.5,1.4,0.2,setosa,Sepal.Length,5.1
3.0,1.4,0.2,setosa,Sepal.Length,4.9
3.2,1.3,0.2,setosa,Sepal.Length,4.7
3.1,1.5,0.2,setosa,Sepal.Length,4.6
3.6,1.4,0.2,setosa,Sepal.Length,5.0
3.9,1.7,0.4,setosa,Sepal.Length,5.4


In [14]:
iris %>% pivot_longer(c(Sepal.Length, Sepal.Width)) %>% head()

Petal.Length,Petal.Width,Species,name,value
1.4,0.2,setosa,Sepal.Length,5.1
1.4,0.2,setosa,Sepal.Width,3.5
1.4,0.2,setosa,Sepal.Length,4.9
1.4,0.2,setosa,Sepal.Width,3.0
1.3,0.2,setosa,Sepal.Length,4.7
1.3,0.2,setosa,Sepal.Width,3.2
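
The two `pivot_longer()` calls above suggest that the same selection syntax works outside `select()`. As a sketch under that assumption, a selection helper can replace the explicit column names; the output column names `measure` and `cm` are chosen here only for illustration.

```R
library(dplyr)
library(tidyr)

# starts_with("Sepal") selects Sepal.Length and Sepal.Width,
# the same two columns named explicitly with c() above
long <- iris %>%
  pivot_longer(starts_with("Sepal"), names_to = "measure", values_to = "cm")

names(long)
# "Petal.Length" "Petal.Width" "Species" "measure" "cm"

# Two pivoted columns times 150 rows gives 300 rows
nrow(long)
# 300
```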


### Operators

The `:` operator selects a range of consecutive variables:

In [16]:
#select columns from `height` to `sex`
starwars %>% select(height:sex) %>% head()

height,mass,hair_color,skin_color,eye_color,birth_year,sex
172,77,blond,fair,blue,19.0,male
167,75,,gold,yellow,112.0,none
96,32,,"white, blue",red,33.0,none
202,136,none,white,yellow,41.9,male
150,49,brown,light,brown,19.0,female
178,120,"brown, grey",light,blue,52.0,male


The `!` operator negates a selection:

In [17]:
#select columns that are not from `height` to `sex`
starwars %>% select(!height:sex) %>% head()

name,gender,homeworld,species,films,vehicles,starships
Luke Skywalker,masculine,Tatooine,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens","Snowspeeder , Imperial Speeder Bike","X-wing , Imperial shuttle"
C-3PO,masculine,Tatooine,Droid,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope",,
R2-D2,masculine,Naboo,Droid,"The Empire Strikes Back, Attack of the Clones , The Phantom Menace , Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens",,
Darth Vader,masculine,Tatooine,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope",,TIE Advanced x1
Leia Organa,feminine,Alderaan,Human,"The Empire Strikes Back, Revenge of the Sith , Return of the Jedi , A New Hope , The Force Awakens",Imperial Speeder Bike,
Owen Lars,masculine,Tatooine,Human,"Attack of the Clones, Revenge of the Sith , A New Hope",,


In [19]:
# Select all columns except Sepal.Width and Petal.Width
iris %>% select(!c(Sepal.Width, Petal.Width)) %>% head()

Sepal.Length,Petal.Length,Species
5.1,1.4,setosa
4.9,1.4,setosa
4.7,1.3,setosa
4.6,1.5,setosa
5.0,1.4,setosa
5.4,1.7,setosa


In [20]:
# Select columns whose names do not end with 'Width'
iris %>% select(!ends_with('Width')) %>% head()

Sepal.Length,Petal.Length,Species
5.1,1.4,setosa
4.9,1.4,setosa
4.7,1.3,setosa
4.6,1.5,setosa
5.0,1.4,setosa
5.4,1.7,setosa


`&` and `|` take the intersection or the union of two selections:

In [22]:
# Select columns whose names start with 'Sepal' and end with 'Length'
iris %>% select(starts_with('Sepal') & ends_with('Length')) %>% head()

Sepal.Length
5.1
4.9
4.7
4.6
5.0
5.4
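
The example above covers the intersection. For completeness, a short sketch of the union with `|`, and of `&` combined with `!`, on the same `iris` data:

```R
library(dplyr)

# Union: columns that start with 'Sepal' OR end with 'Width'
either <- iris %>%
  select(starts_with("Sepal") | ends_with("Width")) %>%
  names()
either
# "Sepal.Length" "Sepal.Width" "Petal.Width"

# Intersection with a negated selection:
# columns that start with 'Petal' but do NOT end with 'Width'
petal_not_width <- iris %>%
  select(starts_with("Petal") & !ends_with("Width")) %>%
  names()
petal_not_width
# "Petal.Length"
```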


In [4]:
# where() keeps the columns for which the function returns TRUE; here, the numeric columns
iris %>% select(where(is.numeric)) %>% head()

Sepal.Length,Sepal.Width,Petal.Length,Petal.Width
5.1,3.5,1.4,0.2
4.9,3.0,1.4,0.2
4.7,3.2,1.3,0.2
4.6,3.1,1.5,0.2
5.0,3.6,1.4,0.2
5.4,3.9,1.7,0.4
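
`where()` is not limited to built-in predicates such as `is.numeric`. A minimal sketch with a user-written function follows; the threshold of 3.5 is arbitrary and chosen only to split the `iris` columns.

```R
library(dplyr)

# Keep numeric columns whose mean exceeds 3.5.
# is.numeric(x) is checked first so that mean() is never called on the factor column.
large_mean <- iris %>%
  select(where(function(x) is.numeric(x) && mean(x) > 3.5)) %>%
  names()
large_mean
# "Sepal.Length" "Petal.Length"

# Any function returning a single TRUE or FALSE works, for example is.factor
factors <- iris %>% select(where(is.factor)) %>% names()
factors
# "Species"
```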


# Arguments

### `...`

`<tidy-select>` One or more unquoted expressions separated by commas. Variable names can be used as if they were positions in the data frame, so expressions like `x:y` can be used to select a range of variables.
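
To illustrate the point about names behaving like positions, the sketch below selects the same range of `iris` columns once by name and once by integer position, then compares the results.

```R
library(dplyr)

# Range written with column names
by_name <- iris %>% select(Sepal.Length:Petal.Length)

# The same range written with positions 1 to 3
by_position <- iris %>% select(1:3)

# Both selections return the same data frame
identical(by_name, by_position)
# TRUE
```

Selecting by name is usually more robust, because positions shift when columns are added or reordered.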

In [8]:
# Select columns whose names end with 'Length', plus the Species column
iris %>% select(ends_with('Length'), Species) %>% head()

Sepal.Length,Petal.Length,Species
5.1,1.4,setosa
4.9,1.4,setosa
4.7,1.3,setosa
4.6,1.5,setosa
5.0,1.4,setosa
5.4,1.7,setosa
