<img src = './subsetting.png'/>

In [21]:
library(tidyverse)

# Simplifying vs. Preserving Subsetting

* **Simplifying**: Returns the simplest data structure that can represent the output
* **Preserving**: The output has the same data structure as the input (subsetting with `drop = FALSE` is preserving)

In [3]:
# atomic vector
# for an unnamed vector, preserving and simplifying return the same value
# (on a named vector, `[[` drops the name while `[` keeps it)
letters[[2]]

letters[2]
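
The difference between the two operators on an atomic vector only shows up when the vector has names. A minimal sketch, assuming a made-up named vector `scores` (not part of the original notebook):

```r
# A named atomic vector (hypothetical example data)
scores <- c(math = 90, art = 75)

scores["math"]    # preserving: keeps the name
scores[["math"]]  # simplifying: returns the bare value, name dropped

names(scores["math"])    # "math"
names(scores[["math"]])  # NULL
```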

In [10]:
# list
# preserving returns a list
# simplifying returns the element itself (here, a character vector)

player <- list(name = 'VN pikachu', level= 31)

print(player[['name']]) # simplifying, returns a vector

[1] "VN pikachu"


In [8]:
print(player['name'])   # preserving, returns a list

$name
[1] "VN pikachu"



In [18]:
mat <- matrix(1:12, nrow = 3)

mat

mat[, 2]                 # simplifying, returns a vector

mat[, 2, drop = FALSE]   # preserving, returns a one-column matrix

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12


     [,1]
[1,]    4
[2,]    5
[3,]    6


In [14]:
iris[, 'Species', drop = FALSE]  # preserving, returns a data.frame
iris[, 'Species', drop = TRUE]   # simplifying, returns a factor

Species
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa
setosa


# Data Frame subsetting

A data frame has characteristics of both:
* Lists
* Matrices

When subset with a single index, a data frame behaves like a list and returns the selected columns:

In [26]:
# a single index vector selects columns, just as it selects elements of a list
iris[c(1, 2)] %>% str()

'data.frame':	150 obs. of  2 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...


When subset with two indices, a data frame behaves like a matrix (rows first, then columns):

In [27]:
# two index vectors select rows and columns, just as with a matrix
iris[, c(1, 2)] %>% str()

'data.frame':	150 obs. of  2 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
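
The two styles give the same result for several columns, but they differ when a single column is selected. A short sketch using the built-in `iris` data (the variable names are only illustrative):

```r
# List-style subsetting: one index, the result is always a data frame
list_style <- iris[1]
class(list_style)    # "data.frame"

# Matrix-style subsetting: a single column is simplified to a vector by default
matrix_style <- iris[, 1]
class(matrix_style)  # "numeric"

# drop = FALSE keeps the data frame structure in matrix-style subsetting
kept <- iris[, 1, drop = FALSE]
class(kept)          # "data.frame"
```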


# Subsetting operators

* `$` does partial matching (`x$y` is equivalent to `x[['y', exact = FALSE]]`)
* `[[` does exact matching  (`x[['y']]` is equivalent to `x[['y', exact = TRUE]]`)

In [29]:
player <- list(name = 'VN Pikachu')

In [31]:
player$na

In [32]:
player[['na']]

NULL
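
The equivalence between `$` and `[[` with `exact = FALSE` can be checked directly. A minimal sketch that reuses the `player` list from the cells above:

```r
player <- list(name = "VN Pikachu")

player$na                      # partial match on "name"
player[["na"]]                 # exact match required, so NULL
player[["na", exact = FALSE]]  # same partial matching as `$`
```

Partial matching is easy to trigger by accident. Setting `options(warnPartialMatchDollar = TRUE)` makes R warn whenever `$` falls back to a partial match.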

# Arrays and Matrices

Because both matrices and arrays are just vectors with special attributes, you can subset them with a single vector, as if they were a 1D vector. Note that arrays in R are stored in column-major order:

In [1]:
values <- matrix(1:12, nrow = 3)

values

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12


In [2]:
values[c(5, 9)]
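
In the matrix above every value equals its own position, which hides the column-major ordering. The sketch below uses a hypothetical matrix `m` whose values differ from their positions, so the mapping from a linear index to a row and column is visible:

```r
# Values chosen so that they differ from their positions (hypothetical data)
m <- matrix(seq(10, 120, by = 10), nrow = 3)

m[5]     # 5th element counting down the columns: 50
m[2, 2]  # the same element addressed by row and column: 50

# arrayInd() converts a linear index into row/column positions
arrayInd(5, dim(m))
```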

# Missing and out-of-bounds indices

`row[[col]]` | Zero-length | OOB (int) | OOB (chr) | Missing
-------------|-------------|-----------|-----------|--------
Atomic       | Error       | Error     | Error     | Error
List         | Error       | Error     | NULL      | NULL
NULL         | NULL        | NULL      | NULL      | NULL

In [1]:
# atomic vector
values <- 1:5

# zero length
values[[integer(0)]]

# out-of-bounds integer index
values[[6]]

# missing
values[[]]


ERROR: Error in values[[integer(0)]]: attempt to select less than one element in get1index
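
Because the cell above stops at its first error, only one case is actually observed. The sketch below checks part of the list row of the table without aborting. The helper `raises_error()` is a hypothetical convenience written for this example, not a base R function:

```r
player <- list(name = "VN Pikachu", level = 31)

# Hypothetical helper: TRUE if evaluating the expression raises an error
raises_error <- function(expr) {
  tryCatch({
    force(expr)  # the argument is evaluated lazily, inside tryCatch
    FALSE
  }, error = function(e) TRUE)
}

raises_error(player[[integer(0)]])  # zero-length index: TRUE
raises_error(player[[3]])           # out-of-bounds integer index: TRUE
player[["rank"]]                    # out-of-bounds name: NULL, no error
```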


# Subsetting with `purrr::pluck` and `purrr::chuck`

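`purrr::pluck()` returns `NULL` (or a supplied `.default`) when an element is missing, while `purrr::chuck()` raises an error instead. See the purrr documentation for both functions.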
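
As a rough illustration of how the two functions differ, here is a small sketch. It assumes the purrr package is installed, and the nested `player` list is made up for this example:

```r
library(purrr)

player <- list(name = "VN Pikachu", stats = list(level = 31))

pluck(player, "stats", "level")             # nested element: 31
pluck(player, "stats", "hp")                # missing element: NULL
pluck(player, "stats", "hp", .default = 0)  # missing element with a fallback: 0

# chuck() raises an error for a missing element instead of returning NULL
result <- tryCatch(
  chuck(player, "stats", "hp"),
  error = function(e) "error"
)
result  # "error"
```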

# Subsetting with nothing

Subsetting with nothing can be useful with assignment because it preserves the structure of the original object. Compare the following two expressions. In the first, `mtcars` remains a data frame because you are only changing the contents of `mtcars`, not `mtcars` itself. In the second, `mtcars` becomes a list because you are changing the object it is bound to.

In [1]:
mtcars[] <- lapply(mtcars, as.integer)
is.data.frame(mtcars)
#> [1] TRUE

mtcars <- lapply(mtcars, as.integer)
is.data.frame(mtcars)
#> [1] FALSE

In [2]:
# change all the values of a matrix to 0
values <- matrix(1:12, nrow = 3)

values[] <- 0   # `values <- 0` would replace the whole matrix with a single number

values

     [,1] [,2] [,3] [,4]
[1,]    0    0    0    0
[2,]    0    0    0    0
[3,]    0    0    0    0
