In [3]:
library(tidyverse)

* **`all_of()`**
* **`any_of()`**
* **`contains()`**
* **`ends_with()`**
* **`everything()`**
* **`last_col()`**
* **`matches()`**
* **`num_range()`**
* **`one_of()`**
* **`starts_with()`**
* **`where()`**

# Select variables from character vectors

```r
all_of(x)

any_of(x, ..., vars = NULL)
```

### all_of()

In [2]:
# `vars` is a character vector of column names stored outside the data frame.
# To refer to it inside a selecting function, use all_of():

vars <- c('Sepal.Length', 'Sepal.Width')
# Passing the vector directly still works, but prints a message
iris %>% select(vars) %>% head()

Note: Using an external vector in selections is ambiguous.
i Use `all_of(vars)` instead of `vars` to silence this message.
i See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
This message is displayed once per session.


Sepal.Length,Sepal.Width
5.1,3.5
4.9,3.0
4.7,3.2
4.6,3.1
5.0,3.6
5.4,3.9


In [3]:
#Use this instead
iris %>% select(all_of(vars)) %>% head()

Sepal.Length,Sepal.Width
5.1,3.5
4.9,3.0
4.7,3.2
4.6,3.1
5.0,3.6
5.4,3.9


If any of the variables is missing from the data frame, `all_of()` raises an error:

In [6]:
# Left commented out because it stops with an error: starwars has no Sepal.* columns
# starwars %>% select(all_of(vars))
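
The failure can be observed without stopping the session by wrapping the call in `tryCatch()`. This is a minimal sketch that assumes only dplyr is loaded (it provides `starwars`, `select()`, and the pipe); the `outcome` name and the returned strings are illustrative, not part of tidyselect.

```r
library(dplyr)

vars <- c("Sepal.Length", "Sepal.Width")

# starwars has no Sepal.* columns, so all_of() is expected to fail.
# tryCatch() turns the error into a plain string instead of halting.
outcome <- tryCatch(
  {
    starwars %>% select(all_of(vars))
    "selected"
  },
  error = function(e) "error: column(s) not found"
)

outcome
```

Here `outcome` ends up holding the error string, because the selection never succeeds.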

### any_of()

Use `any_of()` to allow missing variables:

In [7]:
vars <- c('cyl', 'Sepal.Length', 'Sepal.Width', 'displ')

iris %>% select(any_of(vars)) %>% head()

Sepal.Length,Sepal.Width
5.1,3.5
4.9,3.0
4.7,3.2
4.6,3.1
5.0,3.6
5.4,3.9


`any_of()` is especially useful for removing variables from a data frame, because removing the same variables a second time does not cause an error:

In [9]:
iris %>% select(-any_of(vars)) %>% head()
# NOTE: iris %>% select(!vars) raises an error, because 'cyl' and 'displ' are not columns of iris

Petal.Length,Petal.Width,Species
1.4,0.2,setosa
1.4,0.2,setosa
1.3,0.2,setosa
1.5,0.2,setosa
1.4,0.2,setosa
1.7,0.4,setosa
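
To make the "calling it again" point concrete, the sketch below removes the same columns twice. It assumes dplyr is loaded; `drop_vars`, `once`, and `twice` are illustrative names.

```r
library(dplyr)

drop_vars <- c("Species", "Petal.Width")

# First removal drops both columns
once <- iris %>% select(-any_of(drop_vars))

# Second removal finds nothing left to drop, but does not fail
twice <- once %>% select(-any_of(drop_vars))

names(once)
identical(names(once), names(twice))
```

With `all_of()` in place of `any_of()`, the second call would stop with an error, because the columns no longer exist.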


# Select all variables or the last variable

* **`everything()`** selects all variables. It is also useful in combination with other tidyselect operators.

* **`last_col()`** selects the last variable.

### everything()

In [14]:
iris %>% select(everything()) %>% head()

Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
5.0,3.6,1.4,0.2,setosa
5.4,3.9,1.7,0.4,setosa


In [2]:
#First select Species, then select the rest
iris %>% select(Species, everything()) %>% head()

Species,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width
setosa,5.1,3.5,1.4,0.2
setosa,4.9,3.0,1.4,0.2
setosa,4.7,3.2,1.3,0.2
setosa,4.6,3.1,1.5,0.2
setosa,5.0,3.6,1.4,0.2
setosa,5.4,3.9,1.7,0.4


In [15]:
mtcars %>% pivot_longer(everything())

name,value
mpg,21.000
cyl,6.000
disp,160.000
hp,110.000
drat,3.900
wt,2.620
qsec,16.460
vs,0.000
am,1.000
gear,4.000


### last_col()

In [6]:
# Inspect the arguments of last_col()
args(last_col)

In [16]:
#Select last column
iris %>% select(last_col())

Species
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa


In [7]:
# offset = 1 selects the second-to-last column
# The default, offset = 0, selects the last column
iris %>% select(last_col(offset = 1))

Petal.Width
0.2
0.2
0.2
0.2
0.2
0.4
0.3
0.2
0.2
0.1
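
A larger `offset` counts further back from the end, and `last_col()` can also close a range built with `:`. The sketch below assumes dplyr is loaded and works on column names only to keep the output short; `third_from_end` and `last_three` are illustrative names.

```r
library(dplyr)

# iris columns: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species

# offset = 2 counts two columns back from the last one
third_from_end <- iris %>% select(last_col(offset = 2)) %>% names()

# A range from the third-to-last column through the last column
last_three <- iris %>% select(last_col(offset = 2):last_col()) %>% names()

third_from_end
last_three
```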


In [17]:
iris %>% pivot_longer(last_col())

Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,name,value
5.1,3.5,1.4,0.2,Species,setosa
4.9,3.0,1.4,0.2,Species,setosa
4.7,3.2,1.3,0.2,Species,setosa
4.6,3.1,1.5,0.2,Species,setosa
5.0,3.6,1.4,0.2,Species,setosa
5.4,3.9,1.7,0.4,Species,setosa
4.6,3.4,1.4,0.3,Species,setosa
5.0,3.4,1.5,0.2,Species,setosa
4.4,2.9,1.4,0.2,Species,setosa
4.9,3.1,1.5,0.1,Species,setosa


# Select variables that match a pattern

These selection helpers match variables according to a given pattern.

* **`starts_with()`**: Starts with a prefix.

* **`ends_with()`**: Ends with a suffix.

* **`contains()`**: Contains a literal string.

* **`matches()`**: Matches a regular expression.

* **`num_range()`**: Matches a numerical range like x01, x02, x03.

```r
starts_with(match, ignore.case = TRUE, vars = NULL)

ends_with(match, ignore.case = TRUE, vars = NULL)

contains(match, ignore.case = TRUE, vars = NULL)

matches(match, ignore.case = TRUE, perl = FALSE, vars = NULL)

num_range(prefix, range, width = NULL, vars = NULL)
```
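
The `ignore.case = TRUE` default in these signatures means matching is case-insensitive unless stated otherwise. A small sketch of the difference, assuming dplyr is loaded (`insensitive` and `sensitive` are illustrative names):

```r
library(dplyr)

# Default: ignore.case = TRUE, so a lowercase prefix still matches "Sepal.*"
insensitive <- iris %>% select(starts_with("sepal")) %>% names()

# Case-sensitive: no iris column starts with lowercase "sepal",
# so the selection is empty rather than an error
sensitive <- iris %>%
  select(starts_with("sepal", ignore.case = FALSE)) %>%
  names()

insensitive
length(sensitive)
```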

In [19]:
# Select columns whose names start with 'Sepal'
iris %>% select(starts_with('Sepal')) %>% head()

Sepal.Length,Sepal.Width
5.1,3.5
4.9,3.0
4.7,3.2
4.6,3.1
5.0,3.6
5.4,3.9


You can supply multiple prefixes or suffixes. Note how the order of variables depends on the order of the suffixes and prefixes:

In [21]:
iris %>% select(starts_with(c('Petal', 'Sepal'))) %>% head()

Petal.Length,Petal.Width,Sepal.Length,Sepal.Width
1.4,0.2,5.1,3.5
1.4,0.2,4.9,3.0
1.3,0.2,4.7,3.2
1.5,0.2,4.6,3.1
1.4,0.2,5.0,3.6
1.7,0.4,5.4,3.9


**`contains()`** selects columns whose names contain a literal string:

In [22]:
iris %>% select(contains("al")) %>% head()

Sepal.Length,Sepal.Width,Petal.Length,Petal.Width
5.1,3.5,1.4,0.2
4.9,3.0,1.4,0.2
4.7,3.2,1.3,0.2
4.6,3.1,1.5,0.2
5.0,3.6,1.4,0.2
5.4,3.9,1.7,0.4


**`matches()`** selects columns whose names match a regular expression:

In [24]:
iris %>% select(matches("[pt]al")) %>% head()

Sepal.Length,Sepal.Width,Petal.Length,Petal.Width
5.1,3.5,1.4,0.2
4.9,3.0,1.4,0.2
4.7,3.2,1.3,0.2
4.6,3.1,1.5,0.2
5.0,3.6,1.4,0.2
5.4,3.9,1.7,0.4
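
Regular expressions also allow anchoring, which `contains()` cannot express. The sketch below, assuming dplyr is loaded, uses `$` to match only at the end of a name and `^` to match only at the start; `width_cols` and `petal_cols` are illustrative names.

```r
library(dplyr)

# "\\.Width$" matches names that end in ".Width"
width_cols <- iris %>% select(matches("\\.Width$")) %>% names()

# "^petal" matches names that start with "petal";
# matching is case-insensitive by default, so "Petal.*" qualifies
petal_cols <- iris %>% select(matches("^petal")) %>% names()

width_cols
petal_cols
```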


In [26]:

# Toy data frame with numbered column names, for num_range()
df <- data.frame(x4 = 0, x5 = 0, x6 = 0, x7 = 0, x8 = 0)
df

x4,x5,x6,x7,x8
0,0,0,0,0


In [28]:
#Matches x5, x6, x7
df %>% select(num_range('x', 5:7))

x5,x6,x7
0,0,0
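
For zero-padded names such as x01, x02, x03, the `width` argument sets how many digits the number is padded to. A minimal sketch, assuming dplyr is loaded; the data frame `padded` and its values are made up for illustration.

```r
library(dplyr)

# Hypothetical data frame with zero-padded column names
padded <- data.frame(x01 = 1, x02 = 2, x03 = 3, x10 = 10)

# width = 2 pads 1:3 to "01", "02", "03" before matching
padded_cols <- padded %>% select(num_range("x", 1:3, width = 2)) %>% names()

# Without width, the helper looks for x1, x2, x3 and finds nothing
unpadded_cols <- padded %>% select(num_range("x", 1:3)) %>% names()

padded_cols
length(unpadded_cols)
```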


# Select variables with a function

This selection helper selects the variables for which a function returns TRUE.

```r
where(fn)
```

# Examples

Select factor columns:

In [4]:
iris %>% select(where(is.factor)) %>% head()

Species
setosa
setosa
setosa
setosa
setosa
setosa


In [5]:
iris %>% str()

'data.frame':	150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
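
The predicate passed to `where()` does not have to be a built-in type check. As a sketch, assuming dplyr is loaded, the custom function below keeps numeric columns whose mean exceeds 3.5; the threshold and the names `numeric_cols` and `large_mean_cols` are chosen only for illustration.

```r
library(dplyr)

# All numeric columns
numeric_cols <- iris %>% select(where(is.numeric)) %>% names()

# Numeric columns with a mean above 3.5; the is.numeric() check comes
# first so that mean() is never called on the factor column
large_mean_cols <- iris %>%
  select(where(function(x) is.numeric(x) && mean(x) > 3.5)) %>%
  names()

numeric_cols
large_mean_cols
```

The function must return a single `TRUE` or `FALSE` for each column, which is why `&&` is used rather than the vectorised `&`.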
