<a href="https://colab.research.google.com/github/ZhenYuan2002/R/blob/main/Object_and_Functions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Objects and Functions in R**

- Objects and their names
- Object types
- Data types
- Atomic vectors
- Lists
- Coercion
- S3 atomic vectors
- S3 lists
- Missing values
- Time-series-specific objects
- Functions in R
- Comparisons
- Conditions
- Conditional execution
- Iterations

In R, functions are designed to work with specific object types and may have strict requirements for their inputs and outputs. Functions from other packages are not necessarily tailored to your use case: you may need extra steps to prepare a dataset before passing it to a function, and the result may require post-processing before you can analyze or report it. Hence, basic knowledge of R objects, data types, and how to write functions in R is required.

You can open R's help page for any function with ? or help(). For example, ?typeof or help(typeof) takes you to the documentation of the typeof() function.


# **Objects and their names**



---
**Creating an object**

In R, we first create an object and then assign a name to that object.

A vector of numbers can be created in either of the following ways:
- c(), short for combine, which builds a vector from its arguments
- :, the colon operator, which creates a sequence of consecutive numbers (for example 1:5) that R can store in a compact, memory-efficient form

Assignment operators are used to bind an object to its name:
1) <- (MOST COMMON)
2) <<-
3) =

The arrow operators <- and <<- also have rightward forms, -> and ->>, so the object and its name can swap sides as long as the arrow points towards the name of the object. The = operator always assigns to the name on its left.
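
As a minimal sketch of the assignment forms listed above (the names `counter` and `increment` are made up for illustration), the following shows leftward and rightward assignment, and how `<<-` updates a variable in an enclosing environment instead of creating a local one:

```r
# Leftward and rightward assignment bind the same kind of object
a <- c(1, 2, 3)
c(1, 2, 3) -> b
identical(a, b)

# `=` also assigns when used at the top level
d = c(1, 2, 3)

# `<<-` assigns in an enclosing environment instead of the local one
counter <- 0
increment <- function() {
  counter <<- counter + 1   # updates the outer `counter`
  invisible(counter)
}
increment()
increment()
counter
```

Because `<<-` reaches outside the function, it is best reserved for cases where that side effect is intended.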

---
**Naming convention**

- A name can consist of letters, digits, . or _
- A name cannot start with _ or a digit, nor with . followed by a digit
- A name cannot be a reserved keyword (such as if, TRUE or function)
- A name cannot contain white space
- Avoid using double quotes in names and identifiers
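
One way to check these rules is base R's `make.names()`, which rewrites any string into a syntactically valid name. The sketch below uses a made-up vector of candidate names; strings that come back unchanged were already valid:

```r
# make.names() turns arbitrary strings into syntactically valid names
candidates <- c("x_y", ".x", "1abc", "_x", "a b", "if")
fixed <- make.names(candidates)
fixed

# Names returned unchanged were already valid
valid <- candidates[candidates == fixed]
valid

# A non-syntactic name can still be used when wrapped in backticks,
# although this is best avoided in everyday code
`my var` <- 1:3
`my var`
```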



In [None]:
# Creating an object
x1 <- c(1,2,3,4,5)
c(1,2,3,4,5) -> x2
x3 <- 1:5
x1
x2
x3

In [None]:
# Naming convention
.x <- 1:5
x_y <- 1:5
.x
x_y

# **Object Types**

Objects in R can be loosely divided into two groups: base objects and objects used for Object-Oriented Programming (OOP).

There are various Object-Oriented (OO) systems in R, such as S3, R6 and S4, with the first one being the most common.

The metadata of an object is stored in attributes, which can be considered as name-value pairs of an object.

Names and dimensions are two common attributes. They are usually preserved by operations on an object, whereas most other attributes are lost.

Different types of objects and data structures are created by adding various attributes to a base object.
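
A small sketch of attributes as name-value pairs, assuming a made-up custom attribute called `note`. It also illustrates that subsetting keeps names while dropping the custom attribute:

```r
x <- 1:3
attr(x, "names") <- c("a", "b", "c")
attr(x, "note") <- "a custom attribute"   # hypothetical attribute name
attributes(x)

# Subsetting keeps names but drops the custom attribute
y <- x[1:2]
attributes(y)
```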

# **Data Types**

Vectors are foundational building blocks of nearly all data structures in R. The simplest form of a vector is a scalar or a one-element vector, which represents a single value.

Vector is an umbrella data type that has two different families of base objects underneath it: atomic vectors and lists. These two broad types of base vectors are further subdivided based on data types and their structure.
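
The two families can be told apart with base R predicates. This is only an illustrative sketch; the object names are arbitrary:

```r
scalar <- 42
length(scalar)            # a "scalar" is simply a vector of length one

atomic_example <- c(1.5, 2)
list_example <- list(1, "a")

is.atomic(atomic_example) # TRUE: all elements share one type
is.atomic(list_example)   # FALSE
is.list(list_example)     # TRUE

# Both families count as vectors
is.vector(atomic_example)
is.vector(list_example)
```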

---
**Atomic Vectors**

An atomic vector is a fundamental data structure that contains elements of only one data type. Common data types are as follows:

- Numeric
  - Double
  - Integer
- Logical
- Character

---
**Lists**

A list can contain elements of different types and structures. It serves as the foundation for more complex objects, such as data frames.

By applying specific attributes to a base object, you can construct more complex data structures, such as matrices (two-dimensional rectangular structures) and arrays (multi-dimensional generalizations of matrices).

OO objects behave differently from regular base objects when passed to a generic function. An S3 object is built on top of a base object by assigning a class attribute:

- S3 atomic vectors
  - Factors
  - Dates
  - Date-times (POSIXct/POSIXlt)
- S3 lists
  - Data frames
  - Tibbles
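
To make the relationship between S3 objects and base objects concrete, the sketch below compares `typeof()` (the underlying base type) with `class()` (the S3 class) for a few example objects. The example values are arbitrary:

```r
f  <- factor(c("low", "high", "low"))
d  <- as.Date("1970-01-10")
df <- data.frame(id = 1:2, label = c("a", "b"))

# typeof() shows the underlying base type; class() shows the S3 class
typeof(f);  class(f)
typeof(d);  class(d)
typeof(df); class(df)

# unclass() strips the class attribute and reveals the base object;
# for a Date this is the number of days since 1970-01-01
unclass(d)
```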



# **Atomic Vectors**

In an atomic vector, all elements must be of the same data type. There are four primary types of atomic vector: integer, double, logical and character (R also has the rarer complex and raw types). The first two are collectively known as numeric vectors.

typeof() returns the base type of an object.

The following functions test whether a vector is of a particular type.
- is.double()
- is.integer()
- is.logical()
- is.character()

attr() sets an individual attribute on an object. It can also be used to retrieve an individual attribute of an object.

structure() sets multiple attributes of an object in one function call.

attributes() retrieves all attributes that are currently set on an object.
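
A brief sketch of `structure()` and `attributes()` working together; the `note` attribute is a made-up example:

```r
# Set two attributes in a single call
v <- structure(1:6, dim = c(2, 3), note = "created with structure()")

attributes(v)     # lists both dim and note
attr(v, "note")   # retrieves a single attribute
is.matrix(v)      # the dim attribute makes this a matrix
```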

---
**Double**

Doubles can be written in decimal (0.1234), scientific (1.23e4), or hexadecimal (0xcafe) notation. Double vectors can also contain the special values Inf (infinity), -Inf (negative infinity), and NaN (Not a Number).
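
As a quick illustration of these notations and special values (the variable names are arbitrary):

```r
# Decimal, scientific and hexadecimal literals are all doubles
doubles <- c(0.1234, 1.23e4, 0xcafe)
doubles
typeof(doubles)

# Dividing by zero produces the special values
special <- c(-1, 0, 1) / 0
special
is.nan(special)
is.infinite(special)
```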

**Integer**

Integer vectors represent whole numbers only. Integer literals must carry a trailing L (for example 7L); without it, the number is stored as a double.
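
The effect of the trailing L can be checked directly. In this short sketch, the two values compare as equal but are not identical objects because their types differ:

```r
type_plain  <- typeof(7)     # no suffix: stored as a double
type_suffix <- typeof(7L)    # with L: stored as an integer
type_colon  <- typeof(1:3)   # the colon operator yields integers here

c(type_plain, type_suffix, type_colon)

7 == 7L             # equal in value
identical(7, 7L)    # but not identical, since the types differ
```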

**Logical**

Logical vectors have TRUE or FALSE entries. The abbreviated forms, T and F, are accepted too. Most mathematical functions work on logical vectors, coercing TRUE to 1 and FALSE to 0 before applying that function.

Although the logical constants TRUE and FALSE are reserved keywords, T and F are not reserved, and can be reassigned to other values. When T and F are not explicitly defined in the workspace, R interprets them as logical values equivalent to TRUE and FALSE. However, if you assign a different value to T or F, their behaviour changes accordingly.
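
The following sketch shows why relying on T and F can be risky. It temporarily rebinds T and then removes the binding again so that the built-in value becomes visible once more:

```r
before <- sum(c(T, F, T))    # T and F behave like TRUE and FALSE
before

T <- 0                       # legal, because T is an ordinary variable
after <- sum(c(T, F, T))     # the same expression now gives another result
after

rm(T)                        # remove our binding; the built-in T is visible again
restored <- isTRUE(T)
restored

# By contrast, `TRUE <- 0` is an error because TRUE is a reserved keyword
```

Spelling out TRUE and FALSE avoids this class of bug entirely.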

**Character**

A string is a sequence of characters (letters, digits, symbols, or white space) enclosed in either double quotes or single quotes. A vector of strings is called a character vector.

When a string contains special characters such as \t (tab) and \n (newline), the writeLines() function renders them as actual tabs and line breaks, whereas print() shows the escape sequences. Double quotes inside a double-quoted string must be escaped with a backslash.

The nchar() function counts the characters in each element of a character vector. Note that white spaces are also counted as characters.



In [None]:
# Double Vector
double_vector <- c(7, 10.0, 19.025)
double_vector # Returns the values of the vector
typeof(double_vector) # Returns the type of the vector
is.double(double_vector) # Check whether the vector is of double type
is.integer(double_vector) # Check whether the vector is of integer type

# Set the names attributes to the object double_vector
attr(double_vector, 'names') <- c('a', 'b', 'c')
double_vector # Returns names and values
attr(double_vector, 'names') # Return names attributes from double_vector object


In [None]:
# Integer Vector
integer_vector <- c(0L, 7L, 11L)
integer_vector # Returns the values of the vector
typeof(integer_vector) # Returns the type of the vector
is.integer(integer_vector) # Returns whether the vector is of integer type

In [None]:
# Logical Vector
logical_vector <- c(TRUE, FALSE, TRUE)
typeof(logical_vector) # Returns the type of the vector
is.logical(logical_vector) # Returns whether the vector is of logical type
which(logical_vector) # Returns indices of TRUE
abs(logical_vector) # Returns TRUE as 1 and FALSE as 0
sum(logical_vector) # Adds up all TRUE (1) values
mean(logical_vector) # Returns the average of the vector

In [None]:
# Character Vector
character_vector <- c('this is', 'a', 'section of a book')
typeof(character_vector)
is.character(character_vector)
nchar(character_vector)

character_vector2 <- c("This book\t discussed \n\"Time Series Forecasting\".")
print(character_vector2)
writeLines(character_vector2)
nchar(character_vector2)

[1] "This book\t discussed \n\"Time Series Forecasting\"."
This book	 discussed 
"Time Series Forecasting".


# **Lists**

Lists are generic vectors in R. In lists, each element can be of any data type. This is the main distinction compared to an atomic vector, in which all elements must be of the same type.

A list is created using the list() function.

The is.list() function checks whether an object is a list.

A common practice is to first create an empty list and then add items to it sequentially. This is a useful way to store multiple outputs from a function.
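
A minimal sketch of that pattern, assuming a hypothetical helper `summarise_sample()` that returns several values at once. Each call's output is appended to the list:

```r
# Hypothetical helper that returns several summary values at once
summarise_sample <- function(n) {
  x <- seq_len(n)
  list(n = n, total = sum(x), average = mean(x))
}

# Start with an empty list and append one result per iteration
results <- list()
for (n in c(3, 5, 10)) {
  results[[length(results) + 1]] <- summarise_sample(n)
}

length(results)
results[[2]]$total
```

When the number of results is known in advance, pre-allocating with `vector("list", n)` is a common alternative to growing the list.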

The str() function displays the internal structure of an R object, and is used to inspect the structure of a list, especially if the list is a nested one.

The glimpse() function from the dplyr package offers a similar view and shows as much of the data as possible.

---
**Matrix**

A matrix arranges the elements of a vector in a two-dimensional structure. Setting the dimension (dim) attribute to a vector of length 2, giving the number of rows and columns respectively, transforms a vector into a matrix.

When dim is assigned, the matrix is filled column-wise with the elements of the vector.

The easiest way to create a matrix is to call the matrix() function with the nrow and ncol arguments.
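
The sketch below contrasts the default column-wise fill with row-wise filling via the `byrow` argument of `matrix()`, and checks that assigning `dim` gives the same layout as the default:

```r
by_col <- matrix(1:6, nrow = 2, ncol = 3)                # default: column-wise
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # row-wise

by_col[1, ]   # first row when filled column-wise
by_row[1, ]   # first row when filled row-wise

# Assigning dim to a vector gives the same layout as the default fill
v <- 1:6
dim(v) <- c(2, 3)
all(v == by_col)
```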

The row names and column names can be specified either with rownames() and colnames(), or by assigning a list through dimnames().

To create an n x n identity matrix, you can use the shorthand diag(n).

---
**Array**

An array is a multidimensional data structure and a more generalized version of a matrix. A three-dimensional array can be created by passing a vector specifying dimensions inside the array() function.

If you have an array object and want to know its dimensions, use dim(array).
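
As an illustrative sketch (the array `arr` is arbitrary), elements of a three-dimensional array are addressed by row, column and layer, and a single layer is itself a matrix:

```r
arr <- array(1:24, dim = c(2, 3, 4))   # 2 rows, 3 columns, 4 layers
dim(arr)

first  <- arr[2, 3, 1]   # last element of the first layer
second <- arr[1, 1, 2]   # first element of the second layer
c(first, second)

# Extracting one layer gives a 2 x 3 matrix
layer <- arr[, , 1]
dim(layer)
```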



In [None]:
# Creating a list of different data types
list1 <- list(double_vector, integer_vector, logical_vector, character_vector)
list1

In [None]:
# Create an empty list and assign list items sequentially
empty_list <- list()
empty_list[[1]] <- 1:5
empty_list[[2]] <- c(letters[1:4])
empty_list

In [None]:
# Display internal structure of an R object
str(list1)

List of 4
 $ : Named num [1:3] 7 10 19
  ..- attr(*, "names")= chr [1:3] "a" "b" "c"
 $ : int [1:3] 0 7 11
 $ : logi [1:3] TRUE FALSE TRUE
 $ : chr [1:3] "this is" "a" "section of a book"


In [None]:
install.packages('dplyr')
library(dplyr)

In [None]:
# Display internal structure of an R object
glimpse(list1)

List of 4
 $ : Named num [1:3] 7 10 19
  ..- attr(*, "names")= chr [1:3] "a" "b" "c"
 $ : int [1:3] 0 7 11
 $ : logi [1:3] TRUE FALSE TRUE
 $ : chr [1:3] "this is" "a" "section of a book"


In [None]:
list2 <- list(list1, list(seq(0,20,5), c(TRUE, TRUE, FALSE)))
glimpse(list2)

List of 2
 $ :List of 4
  ..$ : Named num [1:3] 7 10 19
  .. ..- attr(*, "names")= chr [1:3] "a" "b" "c"
  ..$ : int [1:3] 0 7 11
  ..$ : logi [1:3] TRUE FALSE TRUE
  ..$ : chr [1:3] "this is" "a" "section of a book"
 $ :List of 2
  ..$ : num [1:5] 0 5 10 15 20
  ..$ : logi [1:3] TRUE TRUE FALSE


In [None]:
# Transform a vector to a matrix
a <- 1:9
dim(a) <- c(3,3) # Specify a matrix of 3x3
a

attributes(a) # Returns the attribute, i.e. the dimension
is.matrix(a) # Returns whether the object is a matrix

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9


In [None]:
# Using matrix() function along with specifying nrow and ncol options

a_matrix <- matrix(c(rep(TRUE,3), rep(TRUE,3), rep(FALSE,3)), nrow=3, ncol=3)
a_matrix

attributes(a_matrix)

     [,1] [,2]  [,3]
[1,] TRUE TRUE FALSE
[2,] TRUE TRUE FALSE
[3,] TRUE TRUE FALSE


In [None]:
rownames(a) <- c('a', 'b', 'c')
colnames(a) <- paste0('col_', 1:3)
a

dimnames(a_matrix) <- list(paste0('row_', 1:3), paste0('col_', 1:3))
a_matrix

  col_1 col_2 col_3
a     1     4     7
b     2     5     8
c     3     6     9

      col_1 col_2 col_3
row_1  TRUE  TRUE FALSE
row_2  TRUE  TRUE FALSE
row_3  TRUE  TRUE FALSE


In [None]:
# diag() to create an identity matrix
diag(3)

     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1


In [None]:
# Create a 3-dimensional array by passing a vector specifying dimensions

an_array <- array(1:18, c(3,3,2))
an_array
is.array(an_array)
dim(an_array)

# **Coercion**

RECALL: The rule for a vector object in R is that all elements must be of the same type.

Coercion occurs when different data types are combined in one vector. When atomic vectors of different types are combined, R implicitly coerces all elements to the most flexible type present, following a fixed order of precedence (each type wins over those to its right):

character -> double -> integer -> logical

---
**Explicit Coercion**

Explicit coercion involves coercing one type of vector into another type. The as.character(), as.double(), as.integer() and as.logical() functions are used for explicit coercion.

When an element cannot be coerced to the target type, the result is NA, usually accompanied by a warning.
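
A few explicit coercions, sketched with arbitrary example values. `suppressWarnings()` is used only to keep the output quiet where a failed conversion is expected:

```r
ints  <- as.integer(c("12", "7"))      # strings that look like numbers convert
trunc <- as.integer(3.9)               # the fractional part is dropped
lgl   <- as.logical(c("TRUE", "true", "T", "yes"))  # "yes" is not recognised
chr   <- as.character(c(1.5, 2))
dbl   <- as.double(c(TRUE, FALSE))

ints; trunc; lgl; chr; dbl

# A failed conversion yields NA (normally with a warning)
bad <- suppressWarnings(as.integer(c("10", "ten")))
bad
```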


In [None]:
# Integer takes precedence over logical
mixed_vector1 <- c(1L, TRUE, FALSE, 5L)
typeof(mixed_vector1) # Returns "integer"

# Character takes precedence over double and logical
mixed_vector2 <- c("Pen", 5.30, TRUE)
typeof(mixed_vector2) # Returns "character"

In [None]:
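# Explicit Coercion
as.double(c(TRUE, FALSE, FALSE)) # Returns 1 0 0
# c() first coerces every element to character, so only "1.5" converts;
# "TRUE", "FALSE" and "Paper" become NA with a warning
as.double(c(TRUE, FALSE, "Paper", 1.5))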

“NAs introduced by coercion”


# **S3 Atomic Vectors**

An S3 object allows for flexible and extensible OOP in R. S3 is used to create custom data structures that are tailored to specific needs.

In S3, an object must have a class attribute that describes the type of data stored in that object. The class attribute is set and retrieved by the class() function.
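
As a minimal sketch of how the class attribute drives behaviour, the example below assigns a made-up class `celsius` to a numeric vector and defines a `print()` method for it. The class name and the method are illustrative assumptions, not part of base R:

```r
temps <- c(21.5, 23.1)
class(temps)                  # implicit class of a plain double vector

class(temps) <- "celsius"     # hypothetical class name
inherits(temps, "celsius")

# A method for the print() generic, selected via the class attribute
print.celsius <- function(x, ...) {
  cat(paste(format(unclass(x)), "degrees C"), sep = "\n")
  invisible(x)
}

print(temps)
```

Calling `unclass(temps)` returns the underlying double vector and restores the default printing.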

In time series analysis, we need some commonly used S3 objects that are based on base atomic vectors. Factors, dates and date-times are three common S3 atomic vectors.

---
**Factors**

A factor is internally an integer atomic vector with two attributes: class and levels. It stores categorical data and contains predefined values.

A factor is created with the factor() function.

Purposes:
- Evaluates how many categories there are
- Remembers which category a data point belongs to
- Produces a vector of strings associated with the names of each of the categories
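
These points can be seen in a small sketch with made-up categories. The factor stores integer codes, while the levels hold the category names:

```r
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))

typeof(sizes)       # the underlying base type
attributes(sizes)   # the levels and class attributes
nlevels(sizes)      # how many categories there are
as.integer(sizes)   # which category each data point belongs to
levels(sizes)       # the names of the categories
table(sizes)        # counts per category
```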

An ordered factor has an inherent ordering among its levels. It is created with either the ordered() function or the factor() function with ordered = TRUE. More descriptive level names, for example for plotting, can be supplied via the labels argument of either function.

The cut() function can be used to convert a numeric vector to a factor.
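
A short sketch of both ideas, using arbitrary example data: an ordered factor supports comparisons between levels, and `cut()` bins numbers into labelled intervals (the breaks and labels here are illustrative choices):

```r
# Ordered factor: levels have a defined ranking
ratings <- factor(c("low", "high", "medium"),
                  levels = c("low", "medium", "high"),
                  ordered = TRUE)
class(ratings)
ratings[1] < ratings[2]    # "low" ranks below "high"

# cut(): bin a numeric vector into labelled intervals
ages <- c(3, 17, 25, 64, 80)
age_groups <- cut(ages,
                  breaks = c(0, 18, 65, Inf),
                  labels = c("child", "adult", "senior"))
as.character(age_groups)
```

By default the intervals are closed on the right, so a value exactly equal to a break falls into the lower bin.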

To validate that a factor is actually an S3 object, you can use the otype() function from the sloop package.

---
**Dates**


