```{r setup20, message = FALSE, warning = FALSE, include = FALSE}
knitr::opts_chunk$set(echo = TRUE, cache = TRUE)
library(tidyverse)
library(ggplot2)
library(dplyr)
library(tidyr)
library(nycflights13)
library(babynames)
library(nasaweather)
library(lubridate)
library(purrr)
library(readr)
```
# Ch. 20: Vectors
```{block2, type='rmdimportant'}
**Key questions:**
* 20.3.5, #1
* 20.4.6. #1, 4, 5
```
```{block2, type='rmdtip'}
**Functions and notes:**
```
*Types of vectors, not including augmented types:*
```{r, echo=FALSE, fig.align = "default", fig.show='hold', out.width = "50%"}
knitr::include_graphics("data-structures-overview.png")
```
* Check special value types: `is.finite`, `is.infinite`, `is.na`, `is.nan`
![](check-special.png)
* `typeof` returns type of vector
* `length` returns length of vector
* `pryr::object_size` shows the size of a stored object
* specific `NA` values can be defined explicitly with `NA_integer_`, `NA_real_`, `NA_character_` (usually don't need to know)
* explicitly differentiate integers from doubles with `10L` vs. `10`
* explicit coercion functions: `as.logical`, `as.integer`, `as.double`, `as.character`, or use `col_[types]` when reading in so that coercion is done at source
* test functions from `purrr` package that are more consistent than base R's
*Purrr versions for testing types:*
```{r, echo=FALSE, fig.align = "default", fig.show='hold'}
knitr::include_graphics("test-type.jpg")
```
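A minimal sketch of the type, coercion and special-value notes above. The vector `special_example` is a made-up illustration, not something from the book.

```{r}
# Integers need the L suffix; a bare number is a double
typeof(10L)
typeof(10)

# Typed missing values carry an explicit type
typeof(NA_integer_)
typeof(NA_character_)

# Explicit coercion
as.integer(c("1", "2", "3"))
as.logical(c(0, 1, 2))

# Hypothetical example vector: each check answers a different question
special_example <- c(0, Inf, NA, NaN)
is.finite(special_example)
is.infinite(special_example)
is.na(special_example)
is.nan(special_example)
```

Note that `is.na` is `TRUE` for both `NA` and `NaN`, while `is.nan` is `TRUE` only for `NaN`.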
* `set_names` lets you set names after the fact, e.g. `set_names(1:3, c("a", "b", "c"))`
* For more details on subsetting: http://adv-r.had.co.nz/Subsetting.html#applications
* `str` checks structure (excellent for use with lists)
* `attr` and `attributes` get and set attributes
* main types of attributes: **Names**, **Dimensions/dims**, **Class**. Class is important for object oriented programming which is covered in more detail here: http://adv-r.had.co.nz/OO-essentials.html#s3
* The following can be used together to investigate the code behind functions:
* `UseMethod` in the body of a function indicates that it is a generic function
* `methods` lists all methods within a generic
* `getS3method` to show specific implementation of a method
```{r}
as.Date
methods("as.Date")
getS3method("as.Date", "default")
```
* Some tidyverse functions are not always easy to unpack with just the above^[For example, try typing `?select`, `methods("select")`, `getS3method("select", "data.frame")`, or `dplyr:::select_impl` into the console. The output may not immediately illuminate how `select()` works. If interested in more advanced topics, see [Advanced R](https://adv-r.hadley.nz/).]
* **Augmented vectors**: vectors with additional attributes, e.g. factors (levels, class = factor), dates (class = Date) and date-times (tzone, class = (POSIXct, POSIXt)), POSIXlt (names, class = (POSIXlt, POSIXt)), tibbles (names, class = (tbl_df, tbl, data.frame), row.names). Factors are built on integers, dates and date-times on doubles, and POSIXlt and tibbles on lists.
* data frames only have class `data.frame`, whereas tibbles have `tbl_df`, and `tbl` as well
* `class` get or set class attribute
* `unclass` returns copy with 'class' attribute removed
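The relationship between an augmented vector and its underlying type can be inspected directly. This is a small sketch; `fct_example` and `date_example` are hypothetical names used only for illustration.

```{r}
# A factor is an integer vector with `levels` and `class` attributes
fct_example <- factor(c("b", "a", "b"))
typeof(fct_example)
attributes(fct_example)
unclass(fct_example)   # integer codes, with the levels attribute still attached

# A Date is a double counting days since 1970-01-01
date_example <- as.Date("1970-01-02")
typeof(date_example)
unclass(date_example)

# Tibbles add two classes on top of data.frame
class(data.frame(v = 1))
class(tibble::tibble(v = 1))
```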
## 20.3: Important types of atomic vector
### 20.3.5
1. Describe the difference between `is.finite(x)` and `!is.infinite(x)`.
* Both `is.finite` and `is.infinite` return `FALSE` for `NA` and `NaN`. Negating `is.infinite` therefore turns those values into `TRUE`, so `!is.infinite(x)` is `TRUE` for `NA` and `NaN` while `is.finite(x)` is `FALSE`, e.g.:
```{r}
is.finite(c(6,11,-Inf, NA, NaN))
!is.infinite(c(6,11,-Inf, NA, NaN))
```
1. Read the source code for `dplyr::near()` (Hint: to see the source code, drop the `()`). How does it work?
* It is a safer way to test the equality of floating point numbers, because it allows some tolerance for differences caused by rounding
* It checks whether the absolute difference between the two values is less than `tol`, which by default is `.Machine$double.eps^0.5`
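The tolerance check described above can be sketched as follows. `near_sketch` is a hypothetical stand-in written for illustration, not the `dplyr` source itself.

```{r}
# Floating point arithmetic leaves a tiny rounding error
sqrt(2)^2 == 2
dplyr::near(sqrt(2)^2, 2)

# Hypothetical re-implementation of the same idea
near_sketch <- function(x, y, tol = .Machine$double.eps^0.5) {
  abs(x - y) < tol
}
near_sketch(sqrt(2)^2, 2)
near_sketch(1, 1.1)
```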
1. A logical vector can take 3 possible values. How many possible values can an integer vector take? How many possible values can a double take? Use google to do some research.
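* Part of the point here is that the answer is not 'infinite', as someone may be tempted to say. Both types are stored in a fixed number of bits, so the number of representable values is limited.
* Integers are stored in 32 bits, with one bit pattern reserved for `NA`, so the range runs from about `-2 * 10^9` to `2 * 10^9` (roughly `2 * 2 * 10^9` distinct values)
* Doubles are stored in 64 bits, so there are at most `2^64` (about `1.8 * 10^19`) distinct bit patterns, some of which represent `NaN`, `Inf` and `-Inf`. The largest magnitude a double can hold is about `1.8 * 10^308`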
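These limits are exposed through `.Machine`, so they can be checked on the machine at hand. A short sketch:

```{r}
.Machine$integer.max    # 2^31 - 1
.Machine$double.xmax    # largest finite double

# Going past the integer range gives NA (with a warning, suppressed here)
suppressWarnings(.Machine$integer.max + 1L)

# Going past the double range gives Inf
.Machine$double.xmax * 2
```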
1. Brainstorm at least four functions that allow you to convert a double to an integer. How do they differ? Be precise.
* `as.integer`, `as.factor` (technically this returns a factor, but that class is built on top of integers), `round`, `floor`, `ceiling`. The last three do not change the type, so the result is still a double
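To make the differences concrete, here is a small comparison on a made-up vector (`dbl_example` is a hypothetical name):

```{r}
dbl_example <- c(-1.7, 1.2, 2.5)

as.integer(dbl_example)  # truncates toward zero
round(dbl_example)       # nearest whole number, halves go to the even neighbour
floor(dbl_example)       # always rounds down
ceiling(dbl_example)     # always rounds up

# Only as.integer() changes the underlying type
typeof(as.integer(dbl_example))
typeof(round(dbl_example))
```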
1. What functions from the `readr` package allow you to turn a string into logical, integer, and double vector?
* The appropriate `parse_*` or `col_*` functions
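For instance, the `parse_*` family can be applied directly to character vectors. The inputs below are made up for illustration.

```{r}
readr::parse_logical(c("TRUE", "FALSE", "NA"))
readr::parse_integer(c("1", "20", "300"))
readr::parse_double(c("1.5", "2e3"))
```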
## 20.4: Using atomic vectors
### 20.4.6
1. What does `mean(is.na(x))` tell you about a vector `x`? What about `sum(!is.finite(x))`?
* The proportion of values that are `NA` or `NaN`
* number that are either `Inf`, `-Inf`, `NA` or `NaN`
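A quick check on a made-up vector (`missing_example` is a hypothetical name):

```{r}
missing_example <- c(1, NA, NaN, Inf, -Inf, 5)

mean(is.na(missing_example))      # NA and NaN: 2 of 6 values
sum(!is.finite(missing_example))  # NA, NaN, Inf and -Inf: 4 values
```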
1. Carefully read the documentation of `is.vector()`. What does it actually test for? Why does `is.atomic()` not agree with the definition of atomic vectors above?
* `is.vector` tests if it is a specific type of vector with no attributes other than names. This second requirement means that any augmented vectors such as factors, dates, tibbles all would return false.
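* `is.atomic` ignores attributes, so it returns `TRUE` for augmented vectors such as factors and dates. In R versions before 4.4.0, `is.atomic(NULL)` also returned `TRUE`, even though `NULL` is not a vector.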
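The contrast between the two functions shows up as soon as a vector has attributes other than names. A small sketch with hypothetical example objects:

```{r}
named_example <- c(a = 1, b = 2)
fct_check <- factor(c("x", "y"))

is.vector(named_example)  # TRUE: names are the only attribute
is.vector(fct_check)      # FALSE: levels and class attributes are present
is.atomic(fct_check)      # TRUE: attributes are ignored
is.vector(list(1, "a"))   # TRUE: lists count as vectors too
```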
1. Compare and contrast `setNames()` with `purrr::set_names()`.
* Both assign names after the fact
* `purrr::set_names` is stricter and returns an error in situations like the following, whereas `setNames` does not:
```{r, error = TRUE}
setNames(1:4, c("a"))
set_names(1:4, c("a"))
```
1. Create functions that take a vector as input and returns:
```{r}
x <- c(-3:14, NA, Inf, NaN)
```
1. The last value. Should you use `[` or `[[`?
```{r}
return_last <- function(x) x[[length(x)]]
return_last(x)
```
1. The elements at even numbered positions.
```{r}
return_even <- function(x) x[seq_along(x) %% 2 == 0] # seq_along() is also safe for empty vectors
return_even(x)
```
1. Every element except the last value.
```{r}
return_not_last <- function(x) x[-length(x)]
return_not_last(x)
```
1. Only even numbers (and no missing values).
```{r}
# keep values that are even; is.finite() drops NA, NaN, Inf and -Inf
return_even_no_na <- function(x) x[is.finite(x) & x %% 2 == 0]
return_even_no_na(x)
```
1. Why is `x[-which(x > 0)]` not the same as `x[x <= 0]`?
```{r}
x[-which(x > 0)] # which() returns only the positions where the condition is TRUE, so only those are removed; NA and NaN are kept as they are
x[x <= 0] # the logical index is NA for both NA and NaN, so both come back as NA
```
* in the 2nd instance, `NaN`s will get converted to `NA`
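A further difference, not covered in the answer above, appears when nothing matches the condition. The vector `neg_only` below is a made-up example.

```{r}
neg_only <- c(-2, -1, NA)

which(neg_only > 0)             # integer(0): nothing is positive
neg_only[-which(neg_only > 0)]  # numeric(0): negating integer(0) selects nothing
neg_only[!(neg_only > 0)]       # logical subsetting still keeps -2, -1 and NA
```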
1. What happens when you subset with a positive integer that's bigger than the length of the vector? What happens when you subset with a name that doesn't exist?
* In both cases you get back an `NA` (though it seems to take longer in the case when subsetting by a name that doesn't exist).
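This behaviour is specific to `[`. The sketch below uses a hypothetical named vector and also shows that `[[` throws an error in the same situation.

```{r}
named_vec <- c(a = 1, b = 2)

named_vec[5]     # NA
named_vec["z"]   # NA

# `[[` is stricter: capture the error message instead of stopping
tryCatch(named_vec[[5]], error = function(e) conditionMessage(e))
```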
## 20.5: Recursive vectors (lists)
Example of subsetting items from a list:
```{r}
a <- list(a = 1:3, b = "a string", c = pi, d = list(c(-1,-2), -5))
a[[4]][[1]]
# equivalent alternatives:
# a$d[[1]]
# a[4][[1]][[1]]
```
* 3 ways of subsetting `[]`, `[[]]`, `$`
### 20.5.4
```{r}
a <- list(a = 1:3, b = "a string", c = pi, d = list(-1, -5))
```
1. Draw the following lists as nested sets:
1. `list(a, b, list(c, d), list(e, f))`
1. `list(list(list(list(list(list(a))))))`
* I did not conform with Hadley's square v rounded syntax, but hopefully this gives a sense of what the above are:
![drawings 1 and 2 for 20.5.4.](nested-lists.png)
1. What happens if you subset a tibble as if you're subsetting a list? What are the key differences between a list and a tibble?
* Subsetting works the same way as for a list: `[` returns a smaller tibble, while `[[` and `$` extract a single column
* A data frame is just a list of vectors (columns), with the restriction that every column has the same number of elements; lists do not have this requirement
* The data frame structure connects elements by row, which makes subsetting by the values in those rows much easier
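A short sketch of list-style subsetting on a tibble; `tb_example` is a hypothetical tibble made up for illustration.

```{r}
tb_example <- tibble::tibble(num = 1:3, chr = c("a", "b", "c"))

tb_example["num"]    # `[` returns a smaller tibble
tb_example[["num"]]  # `[[` extracts the column as a vector
tb_example$chr       # `$` does the same by name
```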
## 20.7: Augmented vectors
### 20.7.4
1. What does `hms::hms(3600)` return?
```{r}
x <- hms::hms(3600)
```
How does it print?
```{r}
print(x)
```
What primitive type is the augmented vector built on top of?
```{r}
typeof(x)
```
What attributes does it use?
```{r}
attributes(x)
```
1. Try and make a tibble that has columns with different lengths. What happens?
* if the column is length one it will repeat for the length of the other column(s), otherwise if it is not the same length it will return an error
```{r, error = TRUE}
tibble(x = 1:4, y = 5:6) #error
tibble(x = 1:5, y = 6) #can have length 1 that repeats
```
1. Based on the definition above, is it ok to have a list as a column of a tibble?
* Yes, as long as the number of elements matches the length of the other columns. This will come up a lot in the modeling chapters.
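A minimal list-column example; `list_col` and its column names are hypothetical.

```{r}
list_col <- tibble::tibble(
  id = 1:2,
  payload = list(1:3, "a string")  # one list element per row
)

list_col
list_col$payload[[1]]
```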
## Appendix
### Subsetting nested lists
```{r}
x <- list("a", list(list("c", "d"), "e2"), list("e", "f"))
x
```
It can be confusing how to subset items in a nested list. The printed output of a list helps: its index labels show the subsetting needed to extract a particular item. For example, extracting `list("c", "d")` requires `x[[2]][[1]]`, and extracting just `"d"` requires `x[[2]][[1]][[2]]`.
*Subset nested lists:*
```{r, echo=FALSE, fig.align = "default", fig.show='hold'}
knitr::include_graphics("subsetting-lists-note.png")
```
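The subsetting paths described above can be confirmed directly. The sketch rebuilds the same list under the hypothetical name `nested_example` so that it stands on its own.

```{r}
nested_example <- list("a", list(list("c", "d"), "e2"), list("e", "f"))

str(nested_example[[2]][[1]])  # the inner list("c", "d")
nested_example[[2]][[1]][[2]]  # "d"
nested_example[[3]][[2]]       # "f"
```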