# Appendix B Programming Basics

## B.1 Basic objects in `R`

### Vectors

The most basic element of statistics is a vector of numeric data. To create a vector in `R`, we use the following syntax.

In [2]:
v <- c(1, 2, 3, 4, 5)
print(v)

[1] 1 2 3 4 5


This creates a variable named `v` and stores in it the vector of numbers `(1, 2, 3, 4, 5)`. Some things to note:

- The left arrow `<-` is the operator used to assign values to objects. The equal sign `=` also works for assignment in `R`.

- `c()` is a function that "combines values into a vector or list" (see [this help file](https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/c) or type `?c` in the code cell). 

We can create a longer vector by combining two vectors. 

In [3]:
w <- 6:10
x <- c(v, w)
print(x)

# Add more elements...


 [1]  1  2  3  4  5  6  7  8  9 10


Almost all operations in `R` are "vectorized", meaning that they can operate on whole vectors at once. For example, to multiply each element of `x` by the number 2, we simply type the following.

In [4]:
y <- 2*x
print(y)
# To add up all elements in `y`, we write 
 
# To calculate the mean of y
 
# or  


 [1]  2  4  6  8 10 12 14 16 18 20
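
The blank comments in the cell above ask for the sum and the mean of `y`. A minimal sketch using the base functions `sum()`, `mean()`, and `length()` follows; `y` is rebuilt here so the block runs on its own, and the names `total`, `avg`, and `avg_by_hand` are illustrative.

```r
# Rebuild y so this block is self-contained: 2 * (1, 2, ..., 10)
y <- 2 * (1:10)

# Add up all elements of y
total <- sum(y)
print(total)   # [1] 110

# Mean of y, computed with the built-in function ...
avg <- mean(y)
print(avg)     # [1] 11

# ... or by hand, as the sum divided by the number of elements
avg_by_hand <- sum(y) / length(y)
print(avg_by_hand)
```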


Numeric vectors hold integers or doubles. By default, a number entered in `R` is stored as a double. To explicitly store integers, we can use the `as.integer()` function.

In [5]:
typeof(1)              # "double"
typeof(as.integer(1))  # "integer"

It is possible to assign names to each entry of a vector. 

In [6]:
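(v = c(a=1, b=2, c=3))  # outer parentheses print the result of the assignment
# c(a=1, b=2, stats306=3) == c(1, 2, 3)
names(v)  # "a" "b" "c"
names(v) <- c("stats", "ds", "cs")  # replace the names; the values are unchanged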


### Logical values

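In `R`, the *boolean* (true / false) values are represented by the special values `TRUE` and `FALSE`, commonly abbreviated `T` and `F`. The abbreviations are ordinary variables that can be reassigned, whereas `TRUE` and `FALSE` are reserved words, so the full names are safer in code.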

In [7]:
u <- c(T, F, TRUE, TRUE, FALSE)
print(u)

[1]  TRUE FALSE  TRUE  TRUE FALSE


Logical values are the results of logical statements (questions), for instance comparisons of numbers.

In [14]:
v <- c(1, 2, 3)




[1] FALSE  TRUE  TRUE
[1] 2 3
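
The code for the cell above is not shown, but its output is consistent with comparing each entry of `v` to 1 and then using the result to select entries. The sketch below is one such reconstruction, not necessarily the original code; the name `is_big` is hypothetical.

```r
v <- c(1, 2, 3)

# Comparison is vectorized: one logical value per entry of v
is_big <- v > 1
print(is_big)      # [1] FALSE  TRUE  TRUE

# A logical vector used as an index keeps the entries where it is TRUE
print(v[is_big])   # [1] 2 3
```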


Often you will need to combine multiple logical conditions. For this, `R` provides the **logical operators** `&&` and `||`, which compute the logical `and` and `or`, respectively, of logical conditions.

In [None]:
# Weather
rain <-  
temp <-  
can_wear_shorts <-  
print(can_wear_shorts)
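
The cell above leaves the values blank. As a sketch with made-up inputs (the values of `rain` and `temp`, the thresholds, and the name `stay_inside` are illustrative assumptions, not from the text), the double operators combine single conditions as follows:

```r
# Weather (hypothetical values)
rain <- FALSE   # is it raining?
temp <- 25      # temperature; the unit and thresholds are assumptions

# AND: shorts only if it is dry and warm
can_wear_shorts <- !rain && temp > 20
print(can_wear_shorts)   # [1] TRUE

# OR: stay inside if it rains or it is cold
stay_inside <- rain || temp < 5
print(stay_inside)       # [1] FALSE
```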

There is a subtle but important difference between the single and double versions of these operators. The single `&` performs entrywise `AND` over logical vectors, whereas the double `&&` expects single logical values and returns a single `TRUE` or `FALSE`:

In [None]:
# today's weather and yesterday's weather
rain <- 
temp <- 
can_wear_shorts1 <-  # use &
can_wear_shorts2 <-  # use &&
print(can_wear_shorts1)
print(can_wear_shorts2)
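
A sketch of the distinction with two days of made-up weather (all values and the threshold are illustrative assumptions). `R` 4.3.0 and later raise an error when `&&` is given a vector longer than one, while older versions used only the first element of each side. For that reason the `&&` line below is applied to a single day rather than to the whole vector.

```r
# Today's and yesterday's weather (hypothetical values)
rain <- c(FALSE, TRUE)
temp <- c(25, 28)

# Single &: entrywise AND, one answer per day
can_wear_shorts1 <- !rain & temp > 20
print(can_wear_shorts1)   # [1]  TRUE FALSE

# Double &&: a single answer from single values (here, today only)
can_wear_shorts2 <- !rain[1] && temp[1] > 20
print(can_wear_shorts2)   # [1] TRUE
```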

Be careful when testing for equality in conditionals. The `==` operator compares entry by entry and returns a *vector* of logicals. To check whether any or all entries of that vector are `TRUE`, use the `any()` or `all()` functions:

In [61]:
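v1 = c(1, 2, 3)
v2 = c(1, 1, 2)
v1 == v2       # entrywise: TRUE FALSE FALSE
all(v1 == v2)  # FALSE, since not every entry matches
any(v1 == v2)  # TRUE, since the first entries match
#if (v1 == v2) { print("Wrong!") }  # condition has length 3, not 1
#if (all(v1 == v2)) { print("All!") }
#if (any(v1 == v2)) { print("Any!") }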

### Missing values

A distinctly statistical feature of `R` that sets it apart from many other languages is its built-in handling of missing data via the special value `NA` (not available). Think of `NA` as saying that `R` doesn't know the value of something. Is `NA` greater than 5? `R` doesn't know, because it doesn't know what the unobserved value is supposed to be. Even so, an `NA` still counts as one observation: it occupies a position in the vector, but its value is not observed.

In [8]:
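x <- NA
is.na(x)             # TRUE: x is missing
print(x > 5)         # NA: the result of the comparison is unknown
length(c(1, 2, NA))  # 3: the NA still counts as an entry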

### Matrices and arrays

We can also create vectors with more than one dimension: a vector with two dimensions is known as a matrix, and one with more than two dimensions as an array.

In [16]:
A <- matrix(1:9, nrow = 3, ncol = 3)

B <- array(1:20, dim = c(2, 5, 2))
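
To see how the values are laid out, the following sketch inspects the two objects with `dim()` and bracket indexing. By default `R` fills matrices and arrays column by column, so the first index varies fastest; the particular entries looked up here are chosen only for illustration.

```r
A <- matrix(1:9, nrow = 3, ncol = 3)
B <- array(1:20, dim = c(2, 5, 2))

print(dim(A))      # [1] 3 3
print(A[2, 3])     # row 2, column 3 -> [1] 8

print(dim(B))      # [1] 2 5 2
print(B[1, 2, 1])  # [1] 3
print(B[2, 5, 2])  # last entry -> [1] 20
```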