In [6]:
library(tidyverse)

Vectors are the most basic R data objects and there are six types of atomic vectors. They are logical, integer, double, complex, character and raw.

# Vectors

`vector` produces a vector of the given length and mode.  
`as.vector`, a generic, attempts to coerce its argument into a vector of mode `mode` (the default is to coerce to whichever vector mode is most convenient); if the result is atomic, all attributes are removed.

`is.vector` returns TRUE if x is a vector of the specified mode having no attributes other than names. It returns FALSE otherwise.

**Usage**

```R
vector(mode = "logical", length = 0)
as.vector(x, mode = "any")
is.vector(x, mode = "any")
```

**Arguments**


`mode`	
character string naming an atomic mode or "list" or "expression" or (except for vector) "any". Currently, is.vector() allows any type (see typeof) for mode, and when mode is not "any", is.vector(x, mode) is almost the same as typeof(x) == mode.

`length`	
a non-negative integer specifying the desired length. For a long vector, i.e., length > .Machine$integer.max, it has to be of type "double". Supplying an argument of length other than one is an error.

`x`	
an R object.

In [2]:
#create an empty list of length 5
vector('list', 5)

In [5]:
#create an empty integer vector of length 3
vector('integer', 3)

In [6]:
vector('logical', 5)

In [7]:
vector('complex', 3)

In [8]:
vector('raw', 8)

[1] 00 00 00 00 00 00 00 00

In [9]:
vector('character', 3)

In [10]:
vector('numeric', 5)

# Vector basics

There are two types of vectors:

1. **Atomic vectors**, of which there are six types: logical, integer, double, character, complex, and raw. Integer and double vectors are collectively known as numeric vectors.

2. **Lists**, which are sometimes called recursive vectors because lists can contain other lists.

The chief difference between atomic vectors and lists is that atomic vectors are **homogeneous**, while lists can be **heterogeneous**. There’s one other related object: NULL. NULL is often used to represent the absence of a vector (as opposed to NA which is used to represent the absence of a value in a vector). NULL typically behaves like a vector of length 0. Figure 20.1 summarises the interrelationships.

![](https://d33wubrfki0l68.cloudfront.net/1d1b4e1cf0dc5f6e80f621b0225354b0addb9578/6ee1c/diagrams/data-structures-overview.png)
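The difference between `NULL` (no vector) and `NA` (a missing value inside a vector) is easiest to see by checking lengths. The following is a minimal base R sketch; the variable names are illustrative only.

```R
# NULL is the absence of a vector; NA is a missing value inside a vector
length(NULL)   # 0
length(NA)     # 1

# NULL disappears when combined, while NA keeps its slot
with_null <- c(1, NULL, 3)
with_na   <- c(1, NA, 3)
length(with_null)   # 2
length(with_na)     # 3

typeof(NULL)   # "NULL"
typeof(NA)     # "logical"
```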

Vectors can also contain arbitrary additional metadata in the form of attributes. These attributes are used to create **augmented vectors** which build on additional behaviour. There are three important types of augmented vector:

- Factors are built on top of integer vectors.
- Dates and date-times are built on top of numeric vectors.
- Data frames and tibbles are built on top of lists.

# Properties

Every vector has 2 key properties:

1. Its **type**, which you can determine with `typeof()`.

2. Its **length**, which you can determine with `length()`.

In [1]:
typeof(letters)

In [2]:
length(letters)
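As a quick check of the six atomic types, the sketch below applies `typeof()` to one value of each. The `examples` list is made up for illustration.

```R
# One length-1 vector of each atomic type, held in a list
examples <- list(TRUE, 1L, 1.5, 2 + 3i, "a", as.raw(1))

# typeof() reports the storage type of each element
types <- vapply(examples, typeof, character(1))
types
# "logical" "integer" "double" "complex" "character" "raw"
```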

# Using atomic vector 

Now that you understand the different types of atomic vector, it’s useful to review some of the important tools for working with them. These include:

1. How to convert from one type to another, and when that happens automatically.

2. How to tell if an object is a specific type of vector.

3. What happens when you work with vectors of different lengths.

4. How to name the elements of a vector.

5. How to pull out elements of interest.

### 1. Coercion

There are 2 ways to convert a vector from 1 type to another:
1. Explicit coercion happens when you call a function like `as.logical()`, `as.integer()`, `as.double()`, or `as.character()`. Whenever you find yourself using explicit coercion, you should always check whether you can make the fix upstream, so that the vector never had the wrong type in the first place. For example, you may need to tweak your readr `col_types` specification.

2. Implicit coercion happens when you use a vector in a specific context that expects a certain type of vector. For example, when you use a logical vector with a numeric summary function, or when you use a double vector where an integer vector is expected.

In [4]:
# proportion of values greater than 2
mean(1:10 > 2)
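A few more cases of both kinds of coercion, as a base R sketch. The `flags` vector is illustrative.

```R
# Explicit coercion
as.integer("12")                       # 12
as.integer(3.9)                        # 3: truncates toward zero, does not round
as.character(c(1.5, 2))                # "1.5" "2"
as.logical(c("TRUE", "false", "yes"))  # TRUE FALSE NA ("yes" is not recognised)

# Implicit coercion: TRUE counts as 1 and FALSE as 0
flags <- c(TRUE, FALSE, TRUE, TRUE)
sum(flags)    # 3
mean(flags)   # 0.75
```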

It’s also important to understand what happens when you try to create a vector containing multiple types with `c()`: the most complex type always wins.

In [10]:
c(TRUE, 1L) %>% typeof()

c(1L, 2.3) %>% typeof()

c(2.5, 'character') %>% typeof()

An atomic vector cannot have a mix of different types because the type is a property of the complete vector, not the individual elements. If you need to mix multiple types in the same vector, you should use a list.
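To make the contrast concrete, the sketch below builds the same four values once with `c()` and once with `list()`. The variable names are illustrative.

```R
# c() coerces everything to the most complex type present (character here)
mixed_atomic <- c(1L, 2.5, "three", TRUE)
typeof(mixed_atomic)   # "character"
mixed_atomic           # "1" "2.5" "three" "TRUE"

# list() keeps each element's own type
mixed_list <- list(1L, 2.5, "three", TRUE)
element_types <- vapply(mixed_list, typeof, character(1))
element_types          # "integer" "double" "character" "logical"
```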

### 2. Test function

Sometimes you want to do different things based on the type of vector. One option is to use `typeof()`. Another is to use a test function which returns a TRUE or FALSE. Base R provides many functions like `is.vector()` and `is.atomic()`, but they often return surprising results. Instead, it’s safer to use the `is_*` functions provided by `purrr`, which are summarised in the table below.

|              |lgl|int|dbl|chr|list|
|--------------|:-:|:-:|:-:|:-:|:--:|
|`is_logical`  | x |   |   |   |    |
|`is_integer`  |   | x |   |   |    |
|`is_double`   |   |   | x |   |    |
|`is_numeric`  |   | x | x |   |    |
|`is_character`|   |   |   | x |    |
|`is_atomic`   | x | x | x | x |    |
|`is_list`     |   |   |   |   | x  |
|`is_vector`   | x | x | x | x | x  |

### 3. Scalars and recycling rules

As well as implicitly coercing the types of vectors to be compatible, R will also implicitly coerce the length of vectors. This is called vector **recycling**, because the shorter vector is repeated, or recycled, to the same length as the longer vector.

This is generally most useful when you are mixing vectors and “scalars”. I put scalars in quotes because R doesn’t actually have scalars: instead, a single number is a vector of length 1. Because there are no scalars, most built-in functions are vectorised, meaning that they will operate on a vector of numbers. That’s why, for example, this code works:

In [11]:
1:10 + 5

It’s intuitive what should happen if you add two vectors of the same length, or a vector and a “scalar”, but what happens if you add two vectors of different lengths?

In [12]:
1:10 * 1:2

Here, R expands the shorter vector to the length of the longer one; this is the recycling described above. It happens silently unless the length of the longer vector is not an integer multiple of the length of the shorter one, in which case R emits a warning:

In [13]:
1:10 * 1:3

"longer object length is not a multiple of shorter object length"

While vector recycling can be used to create very succinct, clever code, it can also silently conceal problems. For this reason, the vectorised functions in tidyverse will throw errors when you recycle anything other than a scalar. If you do want to recycle, you’ll need to do it yourself with **`rep()`**:

In [14]:
1:10 * rep(1:3, length.out = 10)

### 4. Naming vector

All types of vectors can be named. You can name them during creation with `c()`:

In [15]:
c(level = 31, exp = 3156, battles = 9106)

Or after the fact with `purrr::set_names()`:

In [17]:
purrr::set_names(c(31, 3156, 9106), c('level', 'exp', 'battles'))

### 5. Subsetting

`[` is the subsetting function, and is called like `x[a]`. There are four types of things that you can subset a vector with:

1. A numeric vector containing only integers. The integers must either be all positive, all negative, or zero. Subsetting with positive integers keeps the elements at those positions:

In [20]:
letters[c(3, 5, 10)]

By repeating a position, you can actually make a longer output than input:

In [21]:
letters[c(1, 1, 3, 3, 3)]

Negative values drop the elements at the specified positions: 

In [23]:
letters[c(-1, -3, -4)]   # remove a, c, d

It’s an error to mix positive and negative values:

In [24]:
try(letters[c(1, -1)])

Error in letters[c(1, -1)] : 
  only 0's may be mixed with negative subscripts


The error message mentions subsetting with zero, which returns no values:

In [26]:
letters[0]  # character(0)

2. Subsetting with a logical vector keeps all values corresponding to a TRUE value. This is most often useful in conjunction with the comparison functions.

In [27]:
letters[letters < 'd']

3. If you have a named vector, you can subset it with a character vector:

In [28]:
named_vector <- set_names(1:26, letters)
named_vector

In [31]:
named_vector['c']

In [35]:
named_vector[c('a', 'f')]

As with positive integers, you can also use a character vector to duplicate individual entries.

In [36]:
named_vector[c('a', 'b', 'b')]

There is an important variation of `[` called `[[`. `[[` only ever extracts a single element, and always drops names. It’s a good idea to use it whenever you want to make it clear that you’re extracting a single item, as in a for loop. The distinction between `[` and `[[` is most important for lists.

In [40]:
named_vector[['a']]

named_vector[[1]]
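For lists the difference is more than cosmetic: `[` returns a shorter list, while `[[` returns the element itself. A minimal sketch, using a hypothetical `player_info` list:

```R
player_info <- list(name = "VN Pikachu", level = 31)

# `[` keeps the container: the result is still a list of length 1
sub_list <- player_info["level"]
typeof(sub_list)   # "list"

# `[[` extracts the element, so it can be used directly in arithmetic
level <- player_info[["level"]]
typeof(level)      # "double"
level + 1          # 32
```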

Accessing an out-of-range index (or a name that does not exist) with `[` returns `NA`:

In [41]:
letters[100]

set_names(1:26, letters)['?']

Accessing an out-of-range index with `[[` raises an error:

In [42]:
letters[[100]]

ERROR: Error in letters[[100]]: subscript out of bounds


# Augmented Vector

Atomic vectors and lists are the building blocks for other important vector types like factors and dates. I call these **augmented** vectors, because they are vectors with additional **attributes**, including class. Because augmented vectors have a class, they behave differently to the atomic vector on which they are built. In this book, we make use of four important augmented vectors:

- Factors
- Dates
- Date-times
- Tibbles

### 1. Factor

Factors are designed to represent categorical data that can take a fixed set of possible values. Factors are built on top of integers, and have a levels attribute:

In [1]:
x <- factor(c("ab", "cd", "ab"), levels = c("ab", "cd", "ef"))

In [2]:
typeof(x)

In [3]:
attributes(x)

In [4]:
levels(x)

class(x)
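Since a factor is an integer vector plus a `levels` attribute, the integer codes can be inspected directly. The sketch below rebuilds the same factor under the illustrative name `f`.

```R
f <- factor(c("ab", "cd", "ab"), levels = c("ab", "cd", "ef"))

# The underlying integer codes index into the levels
codes <- as.integer(f)
codes              # 1 2 1

# Looking the codes up in the levels recovers the labels
levels(f)[codes]   # "ab" "cd" "ab"

# Unused levels are still part of the factor, so they are counted as 0
counts <- table(f)
counts             # ab: 2, cd: 1, ef: 0
```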

### 2. Dates and Date-time

Dates in R are numeric vectors that represent the number of days since 1 January 1970.

In [5]:
# pass the date as a string: the bare number 011006 would lose its leading zero
birthday <- lubridate::ymd("011006")

birthday

In [7]:
typeof(birthday)

In [6]:
attributes(birthday)

In [10]:
# number of days since 1 January 1970

as.numeric(birthday)
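The day count is easier to verify with a date close to the origin. This sketch uses base R's `as.Date()` instead of lubridate so that it runs without extra packages; `d` is an illustrative name.

```R
d <- as.Date("1970-01-10")

typeof(d)       # "double"
class(d)        # "Date"
as.numeric(d)   # 9: nine days after 1 January 1970

# Going the other way: build a Date from a day count
from_count <- as.Date(9, origin = "1970-01-01")

# Arithmetic works in days
later <- d + 30
format(later)   # "1970-02-09"
```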

Date-times are numeric vectors with class POSIXct that represent the number of seconds since 1 January 1970. (In case you were wondering, “POSIXct” stands for “Portable Operating System Interface”, calendar time.)

In [14]:
x <- lubridate::ymd_hms("19700101T01:00", truncated = 1)
x

[1] "1970-01-01 01:00:00 UTC"

In [15]:
typeof(x)

In [16]:
attributes(x)

In [18]:
# number of seconds since 1/1/1970 00:00
as.numeric(x)
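The same check can be done in base R with `as.POSIXct()`. In this sketch the name `stamp` is illustrative, and the time zone is fixed to UTC so the second count does not depend on the local machine.

```R
stamp <- as.POSIXct("1970-01-01 01:00:00", tz = "UTC")

typeof(stamp)           # "double"
as.numeric(stamp)       # 3600: one hour after the origin, in seconds
attr(stamp, "tzone")    # "UTC"

# Arithmetic works in seconds
as.numeric(stamp + 60)  # 3660
```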

There is another type of date-times called POSIXlt. These are built on top of named lists:

In [19]:
y <- as.POSIXlt(x)
y

[1] "1970-01-01 01:00:00 UTC"

In [20]:
attributes(y)

POSIXlts are rare inside the tidyverse. They do crop up in base R, because they are needed to extract specific components of a date, like the year or month. Since lubridate provides helpers for you to do this instead, you don’t need them. POSIXct’s are always easier to work with, so if you find you have a POSIXlt, you should always convert it to a regular date-time with `lubridate::as_datetime()`.
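A short sketch of what a POSIXlt looks like inside, and of converting it back to POSIXct. It uses base R's `as.POSIXct()` for the conversion so that it runs without lubridate; the names `lt` and `ct` are illustrative.

```R
lt <- as.POSIXlt("1970-01-01 01:00:00", tz = "UTC")

typeof(lt)        # "list"

# Components are stored as list elements
lt$hour           # 1
lt$year + 1900    # 1970 (years are stored as an offset from 1900)
lt$mon + 1        # 1    (months are zero-based)

# Convert to POSIXct to get back a plain count of seconds
ct <- as.POSIXct(lt)
class(ct)         # "POSIXct" "POSIXt"
as.numeric(ct)    # 3600
```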

### 3. Tibbles

Tibbles are augmented lists: they have class “tbl_df” + “tbl” + “data.frame”, and `names` (column) and `row.names` attributes:

In [21]:
tb <- tibble::tibble(x = 1:5, y = 5:1)

In [22]:
typeof(tb)

In [23]:
attributes(tb)

The difference between a tibble and a list is that all the elements of a data frame must be vectors with the same length. All functions that work with tibbles enforce this constraint.
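The equal-length constraint can be seen with a plain base R data frame, which is also built on a list. This sketch uses `data.frame()` only to avoid a package dependency, and wraps the failing call in `try()` so the script keeps running.

```R
df <- data.frame(x = 1:5, y = 5:1)

typeof(df)   # "list"
length(df)   # 2: one list element per column

# Columns of incompatible lengths are rejected
result <- try(data.frame(x = 1:3, y = 1:2), silent = TRUE)
inherits(result, "try-error")   # TRUE
```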

# Vector Creation

### Single element vector

In [1]:
TRUE

In [2]:
34L

In [3]:
3.432

In [4]:
3 + 2i

In [5]:
'VN Pikachu'

In [6]:
charToRaw('abcd')

[1] 61 62 63 64

### Multiple elements vector

#### using **`:`** 

In [7]:
1:10

In [9]:
1.2:4.3
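Two details of `:` are worth knowing: it can count down, and with non-integer endpoints it steps by 1 from the left value and stops before exceeding the right one. A small sketch with illustrative names:

```R
typeof(1:10)        # "integer"

countdown <- 5:1
countdown           # 5 4 3 2 1

fractional <- 1.2:4.3
fractional          # 1.2 2.2 3.2 4.2
typeof(fractional)  # "double"
```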

<hr>

using **`seq()`**

```R
seq(..., from, to, by, length.out, along.with)
```

typical usage:

```R
seq(from, to)
seq(from, to, by= )
seq(from, to, length.out= )
seq(along.with= )
seq(from)
seq(length.out= )
```

In [1]:
#seq(from, to) : create a range from -> to
seq(1, 10)

In [3]:
#seq(from, to, by=)
seq(1, 10, by = 2)

In [5]:
#seq(from, to, length.out =): evenly spaced values, like NumPy's np.linspace(from, to, length.out)

seq(0, 1, length.out = 5)

In [7]:
#seq(from)
seq(10)

In [9]:
names = c('VN Pikachu', 'Tank Cao', 'THE BEST')

seq(along.with = names) #create a sequence with length of a vector
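Base R also offers the shortcuts `seq_along()` and `seq_len()`, which behave sensibly for empty input where `1:length(x)` does not. A sketch, with `players` and `empty` as illustrative names:

```R
players <- c("VN Pikachu", "Tank Cao", "THE BEST")

seq_along(players)   # 1 2 3, same as seq(along.with = players)
seq_len(3)           # 1 2 3, same as seq(length.out = 3)

# With an empty vector, 1:length(x) counts down from 1 to 0
empty <- character(0)
1:length(empty)      # 1 0
seq_along(empty)     # integer(0)
```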

<hr>

using **`c()`**

**Combine Values into a Vector or List**

This is a generic function which combines its arguments.

The default method combines its arguments to form a vector. All arguments are coerced to a common type which is the type of the returned value, and all attributes except names are removed.

**Usage**

```R
c(..., recursive = FALSE, use.names = TRUE)
```

**Arguments**

`...`	
objects to be concatenated.

`recursive`	
logical. If `recursive = TRUE`, the function recursively descends through lists (and pairlists) combining all their elements into a vector.

`use.names`	
logical indicating if names should be preserved.

In [11]:
c(1, 3.2, 34.3)

In [25]:
#create a named vector
c(name = 'VN Pikachu', clan = 'VN Champions')

<hr>

**attribute free**

In [35]:
#attribute-free vector: use.names = FALSE
c(name = 'VN Pikachu', clan = 'VN Champions', use.names = FALSE)

<hr>

**recursive**

In [38]:
#not recursive
c(list(A = c(B = 1)), recursive = FALSE)

In [37]:
#recursive
c(list(A = c(B = 1)), recursive = TRUE)

<hr>

**combine values**

In [49]:
#combine values into a vector
c(1:5, c(3, 2, 1), seq(20, 25))

In [50]:
#combine into a list
c(list('VN Pikachu', 31), list(c(35, 32), 'VN'))

<hr>

using **`rep()`**: Replicate Elements of Vectors and Lists

**Usage**

```R
rep(x, ...)

rep.int(x, times)

rep_len(x, length.out)
```

**Arguments**


`x`	
a vector (of any mode including a list) or a factor or (for rep only) a POSIXct or POSIXlt or Date object; or an S4 object containing such an object.

`...`	
further arguments to be passed to or from other methods. For the internal default method these can include:

`times`
an integer-valued vector giving the (non-negative) number of times to repeat each element if of length length(x), or to repeat the whole vector if of length 1. Negative or NA values are an error. A double vector is accepted, other inputs being coerced to an integer or double vector.

`length.out`
non-negative integer. The desired length of the output vector. Other inputs will be coerced to a double vector and the first element taken. Ignored if NA or invalid.

`each`
non-negative integer. Each element of x is repeated each times. Other inputs will be coerced to an integer or double vector and the first element taken. Treated as 1 if NA or invalid.

In [9]:
names <- c('VN Pikachu', 'Tank Cao', 'Meomeo888')
#repeat vector names 3 times
rep(names, 3)

In [10]:
#repeat each element of vector `names` 2 times
rep(names, each = 2)

In [11]:
#repeat first and second elements 3 times, 3rd element 2 times
rep(names, c(3, 3, 2))

In [12]:
#repeat each element 2 times, take only first 5 values
rep(names, each = 2, length.out = 5)

In [14]:
rep(names, each = 2, length.out = 10) #recycles when length.out exceeds the 6 repeated values

In [16]:
#length.out = 5
rep_len(names, 5)

In [18]:
#repeat vector 2 times
rep.int(names, 2)

# Get the length of a vector

**`length()`**: Get or set the length of vectors (including lists) and factors, and of any other R object for which a method has been defined.

```R
length(x)
length(x) <- value
```

In [1]:
levels <- c(31, 35, 33)
#get the length of vector `levels`
length(levels)

In [5]:
#set the length of vector `levels` to 5
length(levels) <- 5
levels
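Setting the length works in both directions: growing a vector pads it with `NA`, and shrinking it drops elements from the end. A sketch with the illustrative name `scores`:

```R
scores <- c(31, 35, 33)

# Growing pads with NA
length(scores) <- 5
scores   # 31 35 33 NA NA

# Shrinking truncates from the end
length(scores) <- 2
scores   # 31 35
```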

# Checking vector

**`is.vector`**

In [16]:
is.vector(1:10)

In [17]:
is.vector(c(1, 5, 2))
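Because `is.vector()` tolerates names but no other attributes, some results may look surprising at first. A base R sketch; all variable names are illustrative.

```R
named <- c(a = 1, b = 2)
is.vector(named)    # TRUE: names are allowed

m <- matrix(1:4, nrow = 2)
is.vector(m)        # FALSE: a matrix carries a dim attribute

f <- factor(c("a", "b"))
is.vector(f)        # FALSE: a factor carries levels and class attributes

is.vector(list(1, "a"))              # TRUE: lists are vectors too
is.vector(1:3, mode = "character")   # FALSE: the mode does not match
```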

# Converting to vector

**`as.vector`**

In [21]:

stats = c(31, 57611, 2799)
names(stats) <- c('Level', 'Damage', 'Exp')
stats

In [24]:
#produce attribute-free vectors
as.vector(stats)

<hr>

In [29]:
mat <- 1:4
dim(mat) <- c(2,2)
mat

     [,1] [,2]
[1,]    1    3
[2,]    2    4


In [30]:
class(mat)

In [31]:
#convert a matrix to a vector
as.vector(mat)
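Two further points about `as.vector()`: a matrix is flattened column by column, and the `mode` argument can change the type of the result. A sketch with an illustrative 2 x 3 matrix:

```R
m <- matrix(1:6, nrow = 2)

# Flattening follows column-major order
by_column <- as.vector(m)
by_column   # 1 2 3 4 5 6

# Transpose first to read the matrix row by row
by_row <- as.vector(t(m))
by_row      # 1 3 5 2 4 6

# The mode argument coerces while converting
as_text <- as.vector(1:3, mode = "character")
as_text     # "1" "2" "3"
as_list <- as.vector(1:3, mode = "list")
typeof(as_list)   # "list"
```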

# Accessing vector elements

R uses 1-based indexing: the first element is at position 1.

In [11]:
day = c("Sun","Mon","Tue","Wed","Thurs","Fri","Sat")
day

### Integer indexing

In [22]:
day[1]

In [23]:
day[3]

In [24]:
day[c(1,3,4)]

In [12]:
day[c(1,2,1)]

### Logical Indexing

In [26]:
day[c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)]
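In practice the logical vector is usually computed from a condition rather than typed out. A sketch that redefines `day` so it is self-contained; `is_short` is an illustrative name.

```R
day <- c("Sun", "Mon", "Tue", "Wed", "Thurs", "Fri", "Sat")

# Build the logical index from a condition
is_short <- nchar(day) == 3
day[is_short]      # every day except "Thurs"

# which() turns a logical vector into positions
which(!is_short)   # 5

# %in% tests membership element by element
weekend <- day[day %in% c("Sat", "Sun")]
weekend            # "Sun" "Sat"
```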

### Negative indexing

In [36]:
#select every value except the 2nd
day[-2]

In [35]:
#select every value except those at the 1st and 3rd positions (1-based)
day[c(-1, -3)]

### Attribute 

In [1]:
player <- c(name = 'VN Pikachu', clan = 'VN Champions', gender = 'Male', country = 'Viet Nam')
player

Elements of a named vector can be accessed by name, much like keys in a dictionary:

In [2]:
player['name']

In [3]:
#equivalent
player[1]

In [4]:
#access multiple values
player[c('name', 'country')]

# Slicing

In [46]:
day[1:2]

In [48]:
day[1:5]

In [49]:
day[5:1]

# Manipulation

### vector Element Recycling

If we apply arithmetic operations to two vectors of unequal length, then the elements of the shorter vector are recycled to complete the operations.

In [39]:
c(1, 2) + c(1, 1, 1, 1) #equivalent to c(1, 2, 1, 2) + c(1, 1, 1, 1)

### sorting

using **`sort`** function

In [40]:
values = c(1, 5, 2, 8, 6, 2, 3)
sort(values)
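`sort()` also accepts `decreasing = TRUE`, and `order()` returns the positions that would sort the vector, which is handy for reordering a second vector in the same way. A sketch that redefines `values` so it is self-contained:

```R
values <- c(1, 5, 2, 8, 6, 2, 3)

# Largest first
descending <- sort(values, decreasing = TRUE)
descending       # 8 6 5 3 2 2 1

# order() gives the permutation rather than the sorted values
positions <- order(values)
positions        # 1 3 6 7 2 5 4

# Indexing with that permutation reproduces sort(values)
values[positions]
```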

### concatenation

using **`c()`**

In [50]:
vector1 = c(1,2)
vector2 = c(3,4)

c(vector1, vector2)
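`c()` always adds at the end. To insert values in the middle, base R provides `append()` with an `after` argument. A brief sketch with illustrative names:

```R
head_part <- c(1, 2, 5)
missing_part <- c(3, 4)

# Insert missing_part after the 2nd element of head_part
combined <- append(head_part, missing_part, after = 2)
combined   # 1 2 3 4 5

# Without `after`, append() behaves like c()
identical(append(head_part, missing_part), c(head_part, missing_part))   # TRUE
```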