### R Classes
R has 5 basic or "atomic" classes:
* character
* numeric
* integer
* complex
* logical (TRUE/FALSE)

The most basic object is a vector:
* A vector can only contain objects of the same class
* The one exception is a _list_, which is represented as a vector but can contain objects of different classes

Empty vectors can be created with the vector() function, which takes 2 arguments: the mode (class) and the length.
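A minimal sketch of vector() with a few other modes (the variable names are only illustrative). Each new vector is filled with the default value for its mode:

```r
# vector(mode, length) creates a vector filled with the mode's default value
chr <- vector("character", length = 3)  # empty strings
lgl <- vector("logical", length = 2)    # FALSE values
lst <- vector("list", length = 2)       # a list whose elements are NULL
print(chr)  # [1] "" "" ""
print(lgl)  # [1] FALSE FALSE
```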

__Numbers__ are treated as numeric objects (double-precision real numbers). If you explicitly want an integer, you need to add the L suffix (e.g. 1L). There is also a special number, Inf, which represents infinity:

In [1]:
1/Inf

The value NaN ("not a number") represents an undefined value, such as the result of 0/0.
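A short sketch of the points above (variable names are illustrative): plain numbers are doubles, the L suffix gives an integer, and dividing by zero produces Inf or NaN rather than an error.

```r
x <- 1    # plain numbers are double-precision "numeric"
y <- 1L   # the L suffix requests an integer
class(x)  # "numeric"
class(y)  # "integer"

1 / 0     # Inf
-1 / 0    # -Inf
0 / 0     # NaN: the result is undefined
```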

R objects can have __attributes__:
* names, dimnames (dimension names)
* dimensions (e.g. matrices, arrays)
* class
* length
* other user-defined attributes/metadata

Attributes can be accessed using the attributes() function.
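A small sketch of attributes() on a matrix and on a named vector. The attribute name "source" is a hypothetical, user-defined example, not something R uses itself:

```r
m <- matrix(1:6, nrow = 2)
attributes(m)  # a list with one element, $dim: 2 3

v <- c(a = 1, b = 2)            # names are stored as an attribute
attr(v, "source") <- "example"  # "source" is an arbitrary user-defined attribute
names(attributes(v))            # "names" "source"
```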

## Vectors and Lists
#### Creating Vectors

The c() function can be used to create vectors; it can be thought of as a "concatenate" function.

In [5]:
c(0.5, 0.6)  # numeric

In [6]:
c(TRUE, FALSE)  # logical

In [7]:
c(T, F)  # logical; T and F are shorthand for TRUE and FALSE

In [8]:
c("a", "b", "c")  # character

In [9]:
c(1+0i, 2+4i)  # complex

In [12]:
x <- vector("numeric", length=10)
print(x)

 [1] 0 0 0 0 0 0 0 0 0 0
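Because c() concatenates, it also joins existing vectors end to end. A minimal sketch with illustrative names:

```r
a <- c(1, 2)
b <- c(3, 4, 5)
ab <- c(a, b)  # concatenate two vectors into one
print(ab)      # [1] 1 2 3 4 5
length(ab)     # 5
```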


What if you tried to concatenate two different data classes?

In [13]:
y1 <- c(1.7, 'a')   # coerced to character
print(y1)
y2 <- c(TRUE, 2)    # coerced to numeric (TRUE becomes 1)
print(y2)
y3 <- c('a', TRUE)  # coerced to character
print(y3)

[1] "1.7" "a"  
[1] 1 2
[1] "a"    "TRUE"


You don't get an error because the coercion happens behind the scenes: R converts every element to a common class so that the vector stays homogeneous.
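A sketch of the order R follows for this implicit coercion of atomic vectors: the result takes the most flexible class among the inputs.

```r
# Implicit coercion moves up this order until all elements fit:
# logical < integer < numeric < complex < character
class(c(TRUE, 1L))  # "integer"
class(c(1L, 2.5))   # "numeric"
class(c(2.5, 1i))   # "complex"
class(c(1i, "a"))   # "character"
```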

You can also coerce the class explicitly with the as.* functions:

In [18]:
x <- 0:6
class(x)  # "integer"

In [15]:
as.numeric(x)

In [16]:
as.logical(x)  # 0 becomes FALSE, nonzero values become TRUE

In [17]:
as.character(x)

Nonsensical coercion results in NAs:

In [19]:
x <- c('a', 'b', 'c')

In [20]:
as.numeric(x)

Warning message:
In eval(expr, envir, enclos): NAs introduced by coercion

In [21]:
as.logical(x)

In [22]:
as.complex(x)

Warning message:
In eval(expr, envir, enclos): NAs introduced by coercion
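Coercion only produces NA for the elements that make no sense; valid strings still convert. A minimal sketch (variable names illustrative), using suppressWarnings() to hide the warning shown above:

```r
x <- c("1", "2.5", "abc")
n <- suppressWarnings(as.numeric(x))  # hide the "NAs introduced by coercion" warning
n          # "1" and "2.5" convert cleanly; "abc" becomes NA
is.na(n)   # FALSE FALSE TRUE

as.logical(c("TRUE", "T", "false", "yes"))  # TRUE TRUE FALSE NA
```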

#### Lists

Lists are a type of vector that can contain elements of different classes.

In [23]:
x <- list(1, 'a', TRUE, 1 + 4i)
x

In [24]:
print(x)

[[1]]
[1] 1

[[2]]
[1] "a"

[[3]]
[1] TRUE

[[4]]
[1] 1+4i



In [27]:
print(class(x))
print(class(x[[1]]))
print(class(x[[2]]))
print(class(x[[3]]))
print(class(x[[4]]))

[1] "list"
[1] "numeric"
[1] "character"
[1] "logical"
[1] "complex"
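The example above uses double brackets to pull out each element. A short sketch of how single and double brackets differ on the same list:

```r
x <- list(1, "a", TRUE, 1 + 4i)
class(x[1])    # "list": single brackets return a sub-list
class(x[[1]])  # "numeric": double brackets extract the element itself
length(x)      # 4
```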


## Matrices

Matrices are vectors with a dimension attribute. The dimension attribute is itself an integer vector of length 2: (nrow, ncol).

We can create an empty matrix (filled with NA) as follows:

In [3]:
m <- matrix(nrow = 2, ncol = 3)
print(m)

     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA   NA   NA


In [4]:
dim(m)

In [5]:
attributes(m)

Matrices are constructed column-wise, so entries can be thought of as starting in the upper-left corner and running down the columns:

In [8]:
m <- matrix(1:6, nrow=2, ncol=3)
print(m)

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
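If you want the entries to fill across rows instead, matrix() accepts a byrow argument. A minimal sketch with the same values:

```r
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
print(m)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```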


Matrices can also be created directly from vectors by adding a dimension attribute

In [10]:
m <- 1:10
dim(m) <- c(2, 5)
print(m)

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10


Matrices can also be created by _column-binding_ or _row-binding_ with cbind() and rbind():

In [13]:
x <- 1:3
y <- 10:12
print(cbind(x, y))

     x  y
[1,] 1 10
[2,] 2 11
[3,] 3 12


In [14]:
print(rbind(x, y))

  [,1] [,2] [,3]
x    1    2    3
y   10   11   12
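Because a matrix holds a single class, binding vectors of different classes triggers the same implicit coercion as c(). A sketch with an illustrative character vector z:

```r
x <- 1:3
z <- c("a", "b", "c")
m <- cbind(x, z)
dim(m)     # 3 2
typeof(m)  # "character": the integers in x were coerced
m[, "x"]   # "1" "2" "3"
```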


## Factors

Factors are used to represent categorical data. Factors can be unordered or ordered. You can think of a factor as an integer vector where each integer has a label.
* Factors are treated specially by modelling functions like lm() and glm()
* Using factors with labels is better than using integers because factors are self-describing (e.g. the labels "Male" and "Female" are more informative than 1 and 2)

In [15]:
x <- factor(c("yes", "yes", "no", "yes", "no"))
print(x)

[1] yes yes no  yes no 
Levels: no yes


In [16]:
table(x)

x
 no yes 
  2   3 

In [17]:
unclass(x)

In [21]:
print(attr(x,"levels"))

[1] "no"  "yes"
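To see the "integer vector with labels" view directly, a small sketch: as.integer() exposes the underlying codes, each an index into levels(x).

```r
x <- factor(c("yes", "yes", "no", "yes", "no"))
levels(x)      # "no" "yes"
as.integer(x)  # 2 2 1 2 1: each value is an index into levels(x)
nlevels(x)     # 2
```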


The order of the levels can be set using the levels argument to factor(). This is important in linear modelling because the first level is used as the baseline level.

By default, the levels are sorted alphabetically, so for "yes" and "no" the baseline level is "no". If you want "yes" as the baseline, you have to specify the level order explicitly:

In [22]:
x <- factor(c("yes", "yes", "no", "yes", "no"), levels=c("yes", "no"))
print(x)

[1] yes yes no  yes no 
Levels: yes no
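Two related tools, sketched here as alternatives: relevel() (from the stats package) moves a chosen level to the front of an existing factor, and ordered = TRUE creates an ordered factor whose levels can be compared. The size variable is purely illustrative.

```r
x <- factor(c("yes", "yes", "no", "yes", "no"))
x2 <- relevel(x, ref = "yes")  # make "yes" the first (baseline) level
levels(x2)  # "yes" "no"

size <- factor(c("low", "high", "medium"),
               levels = c("low", "medium", "high"), ordered = TRUE)
size[1] < size[2]  # TRUE: comparisons follow the level order
```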


## Missing Values

Missing values are denoted by NA, or by NaN for undefined mathematical operations:
* is.na() tests whether elements are NA
* is.nan() tests whether elements are NaN
* NA values also have a class, so there are integer NA, character NA, etc.
* A NaN value is also NA, but the converse is not true

In [24]:
x <- c(1, 2, NA, 10, 3)
is.na(x)

In [25]:
is.nan(x)

In [26]:
x <- c(1, 2, NaN, NA, 3)
is.na(x)

In [27]:
is.nan(x)
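A short sketch of the bullet points above: typed NA constants, the one-way relationship between NaN and NA, and how a missing value propagates through a summary such as mean() unless na.rm = TRUE is set.

```r
class(NA)             # "logical"
class(NA_integer_)    # "integer"
class(NA_character_)  # "character"
is.na(NaN)            # TRUE
is.nan(NA)            # FALSE

v <- c(1, 2, NA)
mean(v)                # NA: the missing value propagates
mean(v, na.rm = TRUE)  # 1.5
```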

## Data Frames

Data frames are used to store tabular data:
* They are represented as a special type of list where every element of the list has to have the same length
* Each element of the list can be thought of as a column, and the length of each element of the list is the number of rows
* Unlike matrices, data frames can store a different class of object in each column (just like lists); matrices must have every element of the same class
* Data frames also have a special attribute called row.names
* Data frames are usually created by calling read.table() or read.csv()
* They can be converted to a matrix by calling data.matrix()


In [28]:
x <- data.frame(foo = 1:4, bar=c(T, T, F, F))
x

  foo   bar
1   1  TRUE
2   2  TRUE
3   3 FALSE
4   4 FALSE


In [29]:
nrow(x)

In [30]:
ncol(x)
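A sketch of a few of the properties listed above on the same data frame: each column keeps its own class, row.names() returns the default row names, and data.matrix() turns the logical column into integers so everything fits one matrix class.

```r
x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))
sapply(x, class)  # foo: "integer", bar: "logical"
row.names(x)      # "1" "2" "3" "4"

m <- data.matrix(x)
m[, "bar"]        # 1 1 0 0: logical columns become integers
```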

## Names

R objects can also have names, which is very useful for writing readable code and self-describing objects.

In [31]:
x <- 1:3
names(x)

NULL

In [32]:
names(x) <- c("foo", "bar", "norf")
x

In [33]:
names(x)

Lists can also have names

In [34]:
x <- list(a=1, b=2, c=3)
print(x)

$a
[1] 1

$b
[1] 2

$c
[1] 3
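Names make it possible to extract list elements by name rather than by position. A minimal sketch:

```r
x <- list(a = 1, b = 2, c = 3)
x$a       # 1
x[["b"]]  # 2
names(x)  # "a" "b" "c"
```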



Matrices can have names too, set with dimnames():


In [35]:
m <- matrix(1:4, nrow=2, ncol=2)
# First element of list is row names, second element of list is column names.
dimnames(m) <- list(c("a", "b"), c("c", "d"))
m

  c d
a 1 3
b 2 4
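Once a matrix has dimnames, elements can be indexed by row and column name, and rownames()/colnames() read each half separately. A short sketch on the same matrix:

```r
m <- matrix(1:4, nrow = 2, ncol = 2)
dimnames(m) <- list(c("a", "b"), c("c", "d"))
m["b", "c"]  # 2: index by row name and column name
rownames(m)  # "a" "b"
colnames(m)  # "c" "d"
```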
