# Preparation

### Quick Note
This is unrelated to the current project (Google Merchandise analysis). It is just my first attempt at learning R, which would be wasted if I didn't upload it.

### Configuring R notebook on VS Code
1. Make sure R, Python, and the Jupyter extension are already installed
2. Create or open an "ipynb" file (you already did it by reading this)
3. Select a Python kernel first (top right side) and VS Code will install the dependencies for you
4. Next, start the `R` console in the VS Code terminal and run these 2 commands:
    ```
    install.packages('IRkernel')
    IRkernel::installspec()
    ```
5. After successful installation, press Ctrl + Shift + P in VS Code and choose/type "Reload Window"
6. Now, change the notebook kernel from Python to R
7. Install any other required dependencies if prompted

## Concept
Main references:
- https://r4ds.hadley.nz/
- http://adv-r.hadley.nz/
- https://cran.r-project.org/doc/manuals/R-intro.html
- https://cran.r-project.org/doc/manuals/R-lang.html

### Basic variable type

- integer = `1L`, `2L`, etc (without L it will be double)
- numeric (double) = `1.5`, `2.0`, `3.4`, etc
- complex (imaginary) = `5i`, `7 + 3i`, etc
- logical = `TRUE`, `FALSE`, `NA`
- null = `NULL` (it's an object of its own, unlike `NA`, which is a value of `logical` type by default)
- character = `'hello'`, `"world"`, etc (can be single or double quotes; empty string is not `NULL` or `NA`)
- c (atomic vector) = `c(1L, 2L, 3L)`, `c(1:10)` (can only contain a single type and cannot be nested)
- list (non-atomic vector) = `list('abc', 123L, 4.56)` (can contain multiple types and be nested)
- matrix/array = explained in a later section
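
As a quick check of the list above, `typeof()` reports the storage type of each literal. This is only a minimal sketch; the variable names `examples` and `types` are made up for illustration.

```r
# One literal per basic type (list() keeps the NULL entry)
examples <- list(
  integer   = 1L,
  double    = 1.5,
  complex   = 7 + 3i,
  logical   = TRUE,
  null      = NULL,
  character = 'hello'
)

# typeof() returns the internal storage type of each value
types <- sapply(examples, typeof)
print(types)

# A number without the L suffix is stored as double
typeof(1)    # "double"
typeof(1L)   # "integer"

# NA is logical by default, while NULL has its own type and length 0
typeof(NA)     # "logical"
length(NA)     # 1
length(NULL)   # 0
```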

General:
- R variable names must start with a letter (or a `.` not followed by a digit), and can contain letters, digits, `.`, and `_`
- Print variable: `print(x)` or just `x` (`print` can't combine strings with `+` or `,`; use `paste` or `cat` for that)
- Check variable type: `class(x)` or `typeof(x)`, `is.integer(x)`, `is.atomic(x)`, etc
- Check if variables are identical: `identical(x,y)` (true if both have the same type and value)
- Convert variable type: `as.integer(x)`, `as.double(x)`, etc
- Check the attributes of an object: `attributes(x)`
- List all variables: `ls()` or `objects()`
- Remove variables: `rm(x, y, z)`
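
A small sketch of the general helpers above, using made-up variables (`x`, `y`, `my.var_1`). It mainly shows that `==` compares values while `identical()` also compares the type.

```r
x <- 5L
y <- 5

# Same value, different type
x == y                        # TRUE
identical(x, y)               # FALSE
identical(x, as.integer(y))   # TRUE

# class() and typeof() use different words for double
class(y)    # "numeric"
typeof(y)   # "double"

# Conversion drops the fraction; text that is not a number becomes NA (with a warning)
as.integer(3.9)                       # 3
suppressWarnings(as.integer('abc'))   # NA

# Names may contain dots and underscores
my.var_1 <- 'ok'
'my.var_1' %in% ls()   # TRUE
rm(my.var_1)
exists('my.var_1')     # FALSE
```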

Character (string):
- Check character length: `nchar(x)`
- Trim leading/trailing whitespace: `trimws(x)`
- Convert character to raw bytes (shown in hex): `charToRaw(x)`
- Combine variables: `paste('Hello', 1234L)` or `paste0('Hello', 1234L)` (the result is always character; the surrounding quotes only appear when it is printed)
- Combine variables (raw): `cat(c('My', 'name', 'is', 'Andi'))` (prints without quotes and returns `NULL`, so the class of the result is `NULL`)
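
The remaining string helpers can be tried like this. It is a rough sketch with a made-up input string; `rawToChar()` and the `which`/`sep` arguments are base R features that the notes above do not mention.

```r
x <- '  Hello world  '

# Whitespace counts toward the length
nchar(x)              # 15
trimmed <- trimws(x)
nchar(trimmed)        # 11

# which = 'left' or 'right' trims one side only
trimws(x, which = 'left')   # "Hello world  "

# charToRaw() returns raw bytes, printed as hex
charToRaw('Hi')               # 48 69
# rawToChar() reverses it
rawToChar(charToRaw('Hi'))    # "Hi"

# paste() separates with a space by default; sep changes that
paste('Hello', 1234L, sep = '-')   # "Hello-1234"

# An empty string is still a character vector of length 1
length('')   # 1
nchar('')    # 0
```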

In [120]:
# ----------
# Combine variable test

# Simple text
test <- 'Hello\nworld'
# Combine variables (space separated)
# The "\n" text will also appear on the output
paste(test, 1234L)
# Same as above but not space separated
paste0(test, 1234L)
# Combine variable (raw)
# The "\n" will be a real newline
cat(test, 1234L)

# Vector of text
test <- c('Hello', 'world')
# When string is part of vector, paste can produce different outputs
# It depends on the notebook output type (plain/markdown/html?)
# For markdown it will be:
#   1. 'Hello'
#   2. 'world'
paste(test)
class(paste(test))
# Will be raw output "Hello world" (without quotes)
# Notice that the class will be NULL
cat(test)
class(cat(test))

Hello
world 1234

Hello worldHello world

### Atomic vector (`c`)
- The **data type will always be the same** for all elements (R coerces everything to the most general type present: logical < integer < double < complex < character)
- The members will be **flattened to 1 dimension**, which fits the name: the R documentation describes `c` as "Combine Values into a Vector or List"
- There is also another type of vector called `list` (will be explained later)
- An empty vector (`c()`, no member) is `NULL`, but a vector holding an empty string is still `character`

Tips:
- R index starts from 1, not 0
- To check the vector length, use `length(x)`
- To create an initial vector with fixed length, use `vector(type, length)` or `rep(value,length)`
- To check if something is an (atomic) vector, use `is.vector(x, mode = type)` (type can be `integer`, etc)
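
The notes and tips above can be checked with a short sketch like the one below. The variable names `empty` and `blank` are made up.

```r
# c() with no members is NULL
empty <- c()
is.null(empty)   # TRUE
length(empty)    # 0

# A vector holding one empty string is still character, with length 1
blank <- c('')
class(blank)     # "character"
length(blank)    # 1

# vector(type, length) fills with the default value of that type
vector('character', 2)   # "" ""
vector('logical', 2)     # FALSE FALSE

# rep() can also repeat a whole pattern
rep(c(1L, 2L), times = 3)   # 1 2 1 2 1 2

# is.vector() with mode checks the type as well
is.vector(1:3, mode = 'integer')     # TRUE
is.vector(1:3, mode = 'character')   # FALSE
```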

In [186]:
# Standard declaration
# Can also be string or other primitive types
test <- c(1L, 2L, 3L, 4L, 5L)
test
class(test)

# Faster declaration for integer range
# No need to specify the L if using the X:Y format
# Will always be integer, unless joined with other value
test <- c(1:5)
test
class(test)

# Same as above but with more flexibility
# "from" 1 "to" 5 "by" 1
# Without L will be numeric (double)
test <- seq(1L, 5L, 1L)
test
class(test)

# Initialization with fixed value and length
# Will fill the vector with 5 zero-integer-values
test <- rep(0L, 5)
test
class(test)

# Same as above but longer syntax
test <- vector('integer', 5)
test
class(test)

In [187]:
# ----------
# Data type consistency

# Will be numeric (double), notice the 6 without L
test <- c(1:5, 6)
test
class(test)

# Will be integer again if using 6L
test <- c(1:5, 6L)
test
class(test)

# Will be character (string)
test <- c(1L, 2L, '3', '4', 5L)
test
class(test)

# Will be numeric (double)
test <- c(1L, 2.0, 3L, 4L, 5L)
test
class(test)

# Will be numeric (TRUE = 1)
test <- c(TRUE, 2.0, 3L)
test
class(test)

# Will be character (TRUE = 'TRUE')
# Since character can contain all other types
test <- c(TRUE, 2.0, 3L, '4')
test
class(test)

In [189]:
# ----------
# Nested element test

# Will be flattened into 1 dimension (1-4)
test <- c(1L, 2L, c(3L, 4L))
test
class(test)

# Will also be flattened into 1 dimension (1-4)
# Even though both element types are the same (vector)
test <- c(c(1L, 2L), c(3L, 4L))
test
class(test)

# Will be 1 dimension "list" of different types (non-atomic vector)
# That is: '1' 2 3 '4'
test <- c(list('1', 2L), list(3L, '4'))
test
class(test)

# Will also be 1 dimension "list", but same element types (integer)
# Even if all types are the same, list is still non-atomic
test <- c(list(1L, 2L), list(3L, 4L))
test
class(test)

# The correct method to convert list to atomic vector
# Since list is also a vector, we need to use the "mode" parameter
test <- c(list(1L, 2L), list(3L, 4L))
as.vector(test, mode = 'integer')
class(as.vector(test, mode = 'integer'))
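
The `as.vector(test, mode = 'integer')` call above works here because every list element has length 1. As a possible alternative (not used in the original cell), base R's `unlist()` flattens a list, including nested ones, and picks the common type itself. The variables `flat` and `nested` are made up for this sketch.

```r
flat <- list(1L, 2L, 3L, 4L)
nested <- list(1L, list(2L, 3L), c(4L, 5L))

# unlist() flattens every level into one atomic vector
unlist(flat)             # 1 2 3 4
unlist(nested)           # 1 2 3 4 5
typeof(unlist(nested))   # "integer"

# Mixed types follow the same coercion rule as c()
unlist(list(1L, 'a', TRUE))   # "1" "a" "TRUE"
```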

In [124]:
# ----------
# Type test

test <- c(1:5)

# True
is.atomic(test)
is.vector(test)
is.integer(test)

# False
is.array(test)
is.list(test)
is.matrix(test)

# Conclusion:
# - Atomic vector is not a list/matrix/array
# - Atomic vector also inherits the original data type

### Non-atomic vector (list)
- List is also a vector with type list (`vector('list', x)`)
- The members can have multiple data types
- Can be nested (accepts both `list` and atomic vector as member elements)

Tips:
- To check the vector length, use `length(x)`
- To check the type of each list element, use `str(x)` (also works for nested `list`)
- To create an empty list with fixed length, use `vector('list', length)`
- To check if something is a list, use `is.vector(x, mode = 'list')`
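
The list tips above can be tried with a small sketch like this. The variable name `slots` and its contents are made up.

```r
# Pre-allocate a list with 3 empty (NULL) slots
slots <- vector('list', 3)
length(slots)         # 3
is.null(slots[[1]])   # TRUE

# Fill the slots with different types, including a nested list
slots[[1]] <- 'abc'
slots[[2]] <- 1:3
slots[[3]] <- list(TRUE, 4.5)

# str() shows the type of every element, nested ones included
str(slots)

# A list is a vector of mode "list", an atomic vector is not
is.vector(slots, mode = 'list')   # TRUE
is.vector(1:3, mode = 'list')     # FALSE

# length() counts top-level elements only
length(slots)   # 3
```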

In [148]:
# Standard declaration
# Can be nested like below
test <- list(c(1L,2L), c(3L,4L), 5L)
class(test)
str(test)

# True
is.vector(test)
is.list(test)

# False
is.atomic(test)
is.integer(test)
is.array(test)
is.matrix(test)

# Conclusion:
# - List is also a vector but not atomic
# - List is not an array/matrix

List of 3
 $ : int [1:2] 1 2
 $ : int [1:2] 3 4
 $ : int 5


### Matrix/array
- Matrix/array is simply an atomic vector with the attribute `dim` and optionally `dimnames` attached to the vector ([reference](https://cran.r-project.org/doc/manuals/R-lang.html#Attributes-1))
- They no longer count as a plain vector though (`is.vector` is `FALSE` because of the `dim` attribute), and `class` on a matrix returns both `matrix` and `array` (2 classes)
- Although a matrix is 2D, it can hold 1-dimensional data by setting the row/col count to 1
- Both 1D and 2D indexing can be used (e.g. to access the last element of a 2x5 matrix you can use either `x[10]` or `x[2,5]`)
- Usually used for integer and numeric (double), but can be used for other data types such as character

Tips:
- To check the matrix size (rows and cols), use `dim(x)`
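
To illustrate the first point (a matrix is an atomic vector plus a `dim` attribute), here is a minimal sketch that builds a matrix by hand. It only uses base R, and the variable name `x` is arbitrary.

```r
# Start from a plain atomic vector
x <- 1:10
is.vector(x)    # TRUE
attributes(x)   # NULL

# Attaching "dim" turns the same data into a 2x5 matrix
dim(x) <- c(2, 5)
is.matrix(x)    # TRUE
is.vector(x)    # FALSE (it now has an attribute other than names)
dim(x)          # 2 5
nrow(x)         # 2
ncol(x)         # 5

# Same object as the one built by matrix()
identical(x, matrix(1:10, nrow = 2, ncol = 5))   # TRUE

# Removing "dim" gives the plain vector back
dim(x) <- NULL
is.vector(x)         # TRUE
identical(x, 1:10)   # TRUE
```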

In [132]:
# Matrix uses "nrow" and "ncol" argument
# The order is column-wise by default (use "byrow = TRUE" to fill row by row)
#   11 13 15 17 19
#   12 14 16 18 20
test <- matrix(11:20, nrow = 2, ncol = 5)
test
class(test)

# Array uses "dim" argument
# It doesn't have "byrow" argument unlike matrix
test <- array(11:20, dim = c(2, 5))
test
class(test)

# True
identical(
  matrix(1:10, 2, 5),
  array(1:10, c(2, 5))
)

# True
is.atomic(test)
is.integer(test)
is.array(test)
is.matrix(test)

# False
is.vector(test)
is.list(test)

0,1,2,3,4
11,13,15,17,19
12,14,16,18,20


0,1,2,3,4
11,13,15,17,19
12,14,16,18,20


### Named vector (dict/associative array)
- There are 3 examples, one each for `c`, matrix/array, and list
- There is also a separate section for slicing and indexing

In [231]:
# ----------
# Atomic vector (c)

# Declare a named vector
# Like a dict in Python or associative array in JS
test <- c(a = 11, b = 12, c = 13, d = 14, e = 15)
test
# Alternative way
test <- setNames(c(11:15), c('a', 'b', 'c', 'd', 'e'))
# Another alternative way
test <- c(11:15)
names(test) <- c('a', 'b', 'c', 'd', 'e')

# Will return a new vector where value is more than 11
# Without names, the remaining elements are simply renumbered from 1 (the original positions are lost)
test[test > 11]
# Will return the index of the vector where value is more than 11
which(test > 11)
# Will return the name of the vector where value is more than 11
names(test[test > 11])
names(which(test > 11))

# To revert the named index and make it a normal vector
names(test) <- NULL
test
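
The cell above filters by value. Looking things up by name, which is the dict-like part, could look like the sketch below. The `prices` vector is made-up example data.

```r
prices <- c(a = 11L, b = 12L, c = 13L)

# Single brackets keep the name
prices['b']
# Double brackets drop the name
prices[['b']]            # 12
# Several names at once
prices[c('a', 'c')]

# Check whether a name exists
'z' %in% names(prices)   # FALSE
# A missing name yields NA rather than an error
prices['z']

# Add or update by name
prices['d'] <- 14L
names(prices)            # "a" "b" "c" "d"

# unname() returns a copy without names
unname(prices)           # 11 12 13 14
```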

In [248]:
# ----------
# Matrix/array

# Create a 2x5 matrix
#   11 13 15 17 19
#   12 14 16 18 20
test <- matrix(11:20, nrow = 2, ncol = 5)
# Set the row and column names
rownames(test) <- c('1st', '2nd')
colnames(test) <- c('a', 'b', 'c', 'd', 'e')
test

# Alternative way using "dimnames" argument
test <- matrix(11:20, nrow = 2, ncol = 5,
  dimnames = list(
    c('1st', '2nd'),
    c('a', 'b', 'c', 'd', 'e')
  )
)

# Another alternative way
test <- matrix(11:20, nrow = 2, ncol = 5)
dimnames(test) <- list(c('1st', '2nd'), c('a', 'b', 'c', 'd', 'e'))

# Get all elements of '1st' row
test['1st',]
# Get all elements of 'a' column
test[,'a']
# Get the value of '2nd' row and 'a' column
test['2nd','a']
# Same as above but using row index instead of name
test[2,'a']

# To revert the named index and make it a normal matrix
# You can also use "rownames" and "colnames"
dimnames(test) <- NULL
test

,a,b,c,d,e
1st,11,13,15,17,19
2nd,12,14,16,18,20


,a,b,c,d,e
1,11,13,15,17,19
2,12,14,16,18,20


In [None]:
# ----------
# List

# Declare a named list
# Like a dict in Python or an object in JS
test <- list(a = 11L, b = 'twelve', c = c(13L, 14L))
# Alternative way (assign the names later)
test <- list(11L, 'twelve', c(13L, 14L))
names(test) <- c('a', 'b', 'c')

# Will return the element itself (by name)
test$a
test[['a']]
# Will return a smaller list that keeps the names
test[c('a', 'c')]
# Will return all names of the list
names(test)

# To revert the named index and make it a normal list
names(test) <- NULL
str(test)

### Vector indexing and slicing
- There are 3 examples, one each for `c`, matrix/array, and list

In [127]:
# ----------
# Atomic vector (c)

test <- c(11:15)

# Will return a new vector where value is more than 11
test[test > 11]
# Will return the index of the elements where value is more than 11
which(test > 11)

# Access the first element (index start from 1)
test[1]
# All except index 1 (RIP Python dev)
test[-1]
# From index 1 to 3
test[1:3]
# From index 3 to 1 (reverse)
test[3:1]
# All except index 1 to 3
test[-(1:3)]
# Index 2 and 4 only
test[c(2,4)]

# R evaluates : before + and - (see ?Syntax for the operator precedence)
# Below means 1 + (2:4), i.e. take 2 to 4 and add 1 to each value
# So it is index 3 to 5
test[1+2:4]
# Below is the real "from index 3 to 4"
test[(1+2):4]

# Same rule here: 1 - (2:4) is c(-1, -2, -3)
# Negative indexes drop elements, so this is all except index 1 to 3
test[1-2:4]
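
The indexing results above come from operator precedence: in R, `:` binds tighter than binary `+` and `-`, while unary minus binds tighter than `:`. A short sketch to verify this, reusing the `test` vector from the cell:

```r
test <- 11:15

# ":" is evaluated before binary "+"
identical(1 + 2:4, c(3, 4, 5))   # TRUE
test[1 + 2:4]      # 13 14 15 (index 3, 4, 5)
test[(1 + 2):4]    # 13 14    (index 3, 4)

# 1 - 2:4 is c(-1, -2, -3), and negative indexes drop elements
1 - 2:4            # -1 -2 -3
test[1 - 2:4]      # 14 15
identical(test[1 - 2:4], test[-(1:3)])   # TRUE

# Unary minus binds tighter than ":", so -1:3 is (-1):3
-1:3               # -1 0 1 2 3
```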

In [141]:
# ----------
# Matrix/array

# Create a 2x5 matrix
#   11 13 15 17 19
#   12 14 16 18 20
test <- matrix(11:20, 2, 5)
test

# Get the last element (20)
test[10]
test[2,5]
# Get the first row (all columns)
# 11 13 15 17 19
test[1,]
# Get the first column (all rows)
# 11 12
test[,1]
# Get the first 4 columns from row 1
# 11 13 15 17
test[1,1:4]

0,1,2,3,4
11,13,15,17,19
12,14,16,18,20
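
The 1D index used in `test[10]` follows the column-wise storage order, so it can be computed from the row and column. The sketch below shows the arithmetic; `i`, `j`, and `linear` are made-up names, and `arrayInd()` and `drop = FALSE` are base R features not covered in the cell above.

```r
test <- matrix(11:20, 2, 5)

# Row i, column j maps to (j - 1) * nrow + i
i <- 2
j <- 5
linear <- (j - 1) * nrow(test) + i
linear                       # 10
test[linear] == test[i, j]   # TRUE

# arrayInd() converts a 1D index back to row and column
arrayInd(10, dim(test))      # row 2, column 5

# Negative indexes also work per dimension (all columns except the first)
test[, -1]

# By default a single row comes back as a plain vector;
# drop = FALSE keeps it as a 1x5 matrix
dim(test[1, ])                 # NULL
dim(test[1, , drop = FALSE])   # 1 5
```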


In [149]:
# ----------
# List

# The index:
#   1 = c(1L,2L)
#   2 = c(3L,4L)
#   3 = 5L
test <- list(c(1L,2L), c(3L,4L), 5L)
str(test)

# Equal to list(c(3,4))
# Aka a list containing just the 2nd vector
test[2]
# Equal to c(3,4)
# Aka the 2nd vector only (without the parent list)
test[[2]]
# Equal to 4
# Aka the 2nd element of the 2nd vector (without the parent list/vector)
test[[2]][2]

# Equal to list(5)
# Aka a list containing the 3rd element
test[3]
# Equal to 5
# Aka the 3rd element itself
test[[3]]

List of 3
 $ : int [1:2] 1 2
 $ : int [1:2] 3 4
 $ : int 5
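
The difference between `[ ]` and `[[ ]]` in the cell above can be confirmed with `identical()`. This sketch also shows replacing and removing elements, which the cell does not cover.

```r
test <- list(c(1L, 2L), c(3L, 4L), 5L)

# [ ] always returns a list, [[ ]] returns the element itself
class(test[2])      # "list"
class(test[[2]])    # "integer"
identical(test[2], list(c(3L, 4L)))   # TRUE
identical(test[[2]], c(3L, 4L))       # TRUE

# [ ] can select several elements, [[ ]] only one
length(test[c(1, 3)])   # 2

# Replace one element (the type may change)
test[[3]] <- 'five'
# Assigning NULL removes the element
test[[1]] <- NULL
length(test)        # 2
str(test)
```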
