# Getting Started with R

[Original source](https://cengel.github.io/R-intro/)

## Creating objects in R

Assigning a value to an object doesn't print anything:

```r
weight_kg <- 55
```

Putting parentheses around the assignment, however, prints the value of `weight_kg`:

```r
(weight_kg <- 55)
```

Typing the name of the object does the same:

```r
weight_kg
```

For instance, we may want to convert this weight into pounds (weight in pounds is 2.2 times the weight in kg):

```r
2.2 * weight_kg
```

We can also change a variable’s value by assigning it a new one:

```r
weight_kg <- 57.5
2.2 * weight_kg
```

Let’s store the weight in pounds in a new variable:

```r
weight_lb <- 2.2 * weight_kg
```

**Challenge**

Suppose we now change `weight_kg` to 100. What do you think is the current content of the object `weight_lb`: 126.5 or 220?

```r
weight_kg <- 100
weight_lb
```
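The answer depends on one fact: `<-` stores the value computed at the moment of assignment. It does not create a formula linking the two objects. The sketch below uses the lesson's values; `weight_kg <- 100` is the change the challenge asks about.

```r
weight_kg <- 57.5
weight_lb <- 2.2 * weight_kg   # weight_lb now holds 126.5

weight_kg <- 100               # changing weight_kg afterwards...
weight_lb                      # ...leaves weight_lb at 126.5

weight_lb <- 2.2 * weight_kg   # recompute explicitly to get 220
weight_lb
```

So `weight_lb` only changes when you assign to it again.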

### Functions and their arguments

A function usually gets one or more inputs called arguments. Functions often (but not always) return a value.

An example of a function call is:

```r
a <- 3.1416
b <- round(a)
b
```

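To list the arguments a function accepts, use `args()`. To open the function's help page, type `?` followed by its name:

```r
args(round)   # lists the arguments and their default values
?round        # opens the help page for round()
```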

The help page, headed `Round {base}`, describes the arguments as follows:

| Argument | Description |
|---|---|
| `x` | a numeric vector. Or, for `round` and `signif`, a complex vector. |
| `digits` | integer indicating the number of decimal places (`round`) or significant digits (`signif`) to be used. Negative values are allowed (see ‘Details’). |
| `...` | arguments to be passed to methods. |


So to round to a different number of decimal places, we can pass `digits = 2` (or however many we want):

```r
round(3.14159, digits = 2)
```

If you provide the arguments in the exact same order as they are defined, you don’t have to name them:

```r
round(3.14159, 2)
```

And if you do name the arguments, you can switch their order:

```r
round(digits = 2, x = 3.14159)
```
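The help page notes that `digits` may be negative, which rounds to tens, hundreds, and so on. Values that sit exactly halfway are a second point to watch. `?round` says R follows the IEC 60559 rule of going to the even digit, which can differ from the rounding taught in school. The sketch below shows both behaviours:

```r
round(1234.567, digits = -2)  # 1200: rounds to the nearest hundred
round(1234.567, digits = 1)   # 1234.6
round(0.5)                    # 0: a halfway value goes to the even digit
round(1.5)                    # 2
round(2.5)                    # 2, not 3
```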

* **NOTE**: R matches the arguments you supply to a function's parameters in three passes: **first**, _by exact matching_ on argument name, **then** _by partial matching_ on argument name, and **finally** _by position_.
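The sketch below shows the three matching passes at work. It uses a small function defined only for this illustration; `describe_weight()` and its arguments `weight` and `unit` are hypothetical and not part of base R.

```r
# Hypothetical helper: pastes a weight and its unit together
describe_weight <- function(weight, unit = "kg") {
  paste(weight, unit)
}

describe_weight(55)                # by position only: "55 kg"
describe_weight(unit = "lb", 121)  # exact name for unit, then position for weight: "121 lb"
describe_weight(55, un = "lb")     # "un" partially matches "unit": "55 lb"
```

Partial matching is convenient at the console. In scripts, full argument names are safer, because an abbreviation becomes an error if the function later gains another argument starting with the same letters.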

## Vectors and data types

A vector is the most common and basic data type in R, and is pretty much the workhorse of R. A vector is composed of a series of values, such as numbers or characters.
There are six **atomic vector types** (or data types) that R uses:
* `"character"`
* `"numeric"` (stored internally as `"double"`)
* `"logical"`
* `"integer"` for integer numbers (e.g., `2L`; the `L` tells R that it’s an integer)
* `"complex"` to represent complex numbers with real and imaginary parts (e.g., `1 + 4i`)
* `"raw"` for raw bytes
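You can check the types listed above with `typeof()`, which reports R's internal storage type, or with `class()`. The sketch below uses one value per type; `as.raw()` appears only to produce a raw byte.

```r
typeof("a")          # "character"
typeof(1.5)          # "double"
class(1.5)           # "numeric": the class name for doubles
typeof(TRUE)         # "logical"
typeof(2L)           # "integer"
typeof(2)            # "double": without the L suffix, 2 is not an integer
typeof(1 + 4i)       # "complex"
typeof(as.raw(255))  # "raw"
```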

We can assign a series of values to a vector using the `c()` function.

```r
weight_g <- c(21, 34, 39, 54, 55)
weight_g
```

`length()` tells you how many elements are in a particular vector:

```r
length(weight_g)
```

`class()` indicates the class (the type of element) of an object:

```r
class(weight_g)
```

`str()` provides an overview of the structure of an object and its elements:

```r
str(weight_g)
```

```
 num [1:5] 21 34 39 54 55
```


You can use the `c()` function to add other elements to your vector:

```r
weight_g <- c(weight_g, 90) # add to the end of the vector
weight_g
```

```r
weight_g <- c(30, weight_g) # add to the beginning of the vector
weight_g
```
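To insert a value in the middle of the vector, base R's `append()` takes an `after` argument giving the position after which the new values go. The sketch below starts from the vector as it stands after the two `c()` calls and compares `append()` with the equivalent `c()` plus subsetting. The value `45` and the names `with_append` and `with_c` are only for illustration.

```r
weight_g <- c(30, 21, 34, 39, 54, 55, 90)  # state after the two c() calls

with_append <- append(weight_g, 45, after = 3)        # insert 45 after the 3rd element
with_c      <- c(weight_g[1:3], 45, weight_g[4:7])    # same thing with c()

with_append                    # 30 21 34 45 39 54 55 90
identical(with_append, with_c) # TRUE
```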

A vector can also contain characters:

```r
animals <- c("mouse", "rat", "dog", "bear")
class(animals)
```

A vector can also contain logical values (the boolean data type):

```r
has_tail <- c(TRUE, TRUE, TRUE, FALSE)
has_tail
```

We’ve seen that atomic vectors can be of type character, numeric, integer, and logical. But what happens if we try to mix these types in a single vector?

Objects of different types get converted into a single, shared type within a vector. In R, we call converting objects from one class into another class **coercion**. These conversions happen according to a hierarchy, whereby some types get preferentially coerced into other types.

```r
num_char <- c(1, 2, 3, 'a')
class(num_char)       # "character"
```

```r
num_logical <- c(1, 2, 3, TRUE)
class(num_logical)    # "numeric": TRUE becomes 1
```

```r
char_logical <- c('a', 'b', 'c', TRUE)
class(char_logical)   # "character": TRUE becomes "TRUE"
```

```r
tricky <- c(1, 2, 3, '4')
class(tricky)         # "character": '4' stays a string
```
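For the four types used in this lesson, the coercion hierarchy is logical < integer < numeric (double) < character. A vector takes the type of the "highest" element it contains. The sketch below checks each step with `typeof()`, then converts explicitly with the `as.*()` functions.

```r
typeof(c(TRUE, 2L))   # "integer":   TRUE becomes 1L
typeof(c(2L, 1.5))    # "double":    2L becomes 2
typeof(c(1.5, "a"))   # "character": 1.5 becomes "1.5"

# Explicit conversion downwards works when the text looks like a number
as.numeric(c("1", "2", "4"))     # 1 2 4
suppressWarnings(as.numeric("a")) # NA (R would otherwise warn: NAs introduced by coercion)

# Logicals coerce to 0/1, so sum() counts the TRUE values
sum(c(TRUE, FALSE, TRUE))        # 2
```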

## Subsetting vectors

```r
animals <- c("mouse", "rat", "dog", "bear")
class(animals)
```

```r
animals[0]          # character(0): index 0 selects nothing
```

```r
animals[1]          # "mouse"
```

```r
animals[c(3, 2)]    # "dog" "rat"
```

`:` is a special function (an operator) that creates vectors of consecutive integers, in increasing or decreasing order:

```r
animals[2:4]
```

```r
animals[4:2]
```
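`:` always steps by 1. For other step sizes or a fixed number of points, base R's `seq()` is the usual tool. The sketch below compares the two:

```r
2:4                        # 2 3 4
class(2:4)                 # "integer"
seq(1, 10, by = 3)         # 1 4 7 10
seq(0, 1, length.out = 5)  # 0.00 0.25 0.50 0.75 1.00
rev(2:4)                   # 4 3 2, the same as 4:2
```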

You can exclude elements of a vector using the `-` sign:

```r
animals[-2]
```

We can also repeat the indices to create an object with more elements than the original one:

```r
more_animals <- animals[c(1, 2, 3, 2, 1, 1, 1, 1, 1, 4)]
more_animals
```

* **NOTE**: R indices start at 1. Programming languages like Fortran, MATLAB, Julia, and R start counting at 1, because that’s what human beings typically do. Languages in the C family (including C++, Java, Perl, and Python) count from 0 because that’s simpler for computers to do.
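Because indexing starts at 1, a few cases behave differently from what a user of a 0-based language might expect. The sketch below shows them:

```r
animals <- c("mouse", "rat", "dog", "bear")

animals[1]                # "mouse": the first element, not the second
animals[0]                # character(0): there is no element 0
animals[5]                # NA: indexing past the end gives NA, not an error
animals[length(animals)]  # "bear": the last element
```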

### Conditional subsetting

Another common way of subsetting is by using a logical vector. `TRUE` will select the element with the same index, while `FALSE` will not.


```r
has_tail <- c(TRUE, TRUE, TRUE, FALSE)
has_tail
```

```r
animals[has_tail]
```


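`has_tail` (4 elements) is shorter than `more_animals` (10 elements), so R recycles it. The `TRUE, TRUE, TRUE, FALSE` pattern repeats along the longer vector:

```r
more_animals[has_tail]
```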
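A logical index can also be turned into positions with base R's `which()`, and `!` flips `TRUE` and `FALSE`. The sketch below shows both. It also shows one practical difference when the logical vector contains `NA`: a logical index returns `NA` at that spot, while `which()` skips it. The name `flags` is only for illustration.

```r
animals  <- c("mouse", "rat", "dog", "bear")
has_tail <- c(TRUE, TRUE, TRUE, FALSE)

which(has_tail)           # 1 2 3: positions of the TRUE values
animals[which(has_tail)]  # same result as animals[has_tail]
animals[!has_tail]        # "bear"

flags <- c(TRUE, NA, FALSE, TRUE)
animals[flags]            # "mouse" NA "bear"
animals[which(flags)]     # "mouse" "bear"
```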

To search for certain strings in a vector, you could use the "or" operator `|` to test for equality to each value, but this quickly becomes tedious. The `%in%` operator is simpler. For each element on its left, it tests whether that element appears anywhere in the vector on its right:

```r
animals[animals == "bear" | animals == "rat"] # returns "rat" and "bear"
```

```r
animals %in% c("rat", "cat", "dog", "duck", "goat")
```
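`%in%` returns a logical vector, so it plugs straight into subsetting. The sketch below keeps the matches and, with `!`, drops them. The name `wanted` is only for illustration.

```r
animals <- c("mouse", "rat", "dog", "bear")
wanted  <- c("rat", "cat", "dog", "duck", "goat")

animals %in% wanted            # FALSE TRUE TRUE FALSE
animals[animals %in% wanted]   # "rat" "dog"
animals[!animals %in% wanted]  # "mouse" "bear" (%in% binds tighter than !)
```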

```r
weight_g <- c(21, 34, 39, 54, 55)
weight_g
```

```r
weight_g > 50    # returns a logical vector, TRUE where the condition is met
```

You can combine multiple tests using `&` (both conditions are true, `AND`) or `|` (at least one of the conditions is true, `OR`):

```r
weight_g[weight_g < 30 | weight_g > 50]
```

```r
weight_g[weight_g >= 30 & weight_g == 21]   # numeric(0): no element meets both conditions
```
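A combined test can be stored in a variable, negated with `!`, and counted with `sum()`, since `TRUE` counts as 1. The single-character `&` and `|` compare element by element, which is what subsetting needs. The doubled `&&` and `||` are meant for single `TRUE`/`FALSE` values, such as `if` conditions. The name `in_range` below is only for illustration.

```r
weight_g <- c(21, 34, 39, 54, 55)

in_range <- weight_g >= 30 & weight_g <= 50  # TRUE for 34 and 39
weight_g[in_range]                           # 34 39
weight_g[!in_range]                          # 21 54 55
sum(in_range)                                # 2: how many elements matched
```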

## Missing data

As `R` was designed to analyze datasets, it includes the concept of missing data (which is uncommon in other programming languages). Missing data are represented in vectors as `NA`.

When doing operations on numbers, most functions will return `NA` if the data you are working with include missing values. This feature makes it harder to overlook the cases where you are dealing with missing data.
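Missing values propagate: any arithmetic or comparison involving `NA` gives `NA`. This includes comparing with `NA` itself, so `==` cannot detect missing values and `is.na()` is needed instead. The sketch below uses a small vector `x` made up for the illustration:

```r
x <- c(2, NA, 6)

x + 1        # 3 NA 7: NA propagates element by element
mean(x)      # NA
NA == NA     # NA, not TRUE: two unknown values cannot be compared
x == NA      # NA NA NA, so == cannot find missing values
is.na(x)     # FALSE TRUE FALSE: the reliable test
```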

```r
heights <- c(2, 4, 4, NA, 6)
heights
```

```r
max(heights)   # NA
```

```r
sum(heights)   # NA
```

You can add the argument `na.rm = TRUE` to calculate the result while ignoring the missing values.

```r
max(heights, na.rm = TRUE)   # 6
```

```r
sum(heights, na.rm = TRUE)   # 16
```

### Dealing with NA

If your data include missing values, you may want to become familiar with the functions:
* `is.na()`, 
* `na.omit()`, and 
* `complete.cases()`

```r
# Extract elements which are not missing values.
heights[!is.na(heights)]
```

```r
# Returns the object with incomplete cases removed. The result is still an atomic
# vector, with an "na.action" attribute recording the positions that were dropped.
na.omit(heights)
```

```r
# Extract elements which are complete cases.
heights[complete.cases(heights)]
```
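Before removing missing values, it often helps to know how many there are and where they occur. The sketch below uses `anyNA()` and `is.na()` combined with `sum()`, `mean()`, and `which()`, all from base R:

```r
heights <- c(2, 4, 4, NA, 6)

anyNA(heights)         # TRUE: is at least one value missing?
sum(is.na(heights))    # 1: how many values are missing
mean(is.na(heights))   # 0.2: share of values that are missing
which(is.na(heights))  # 4: position of the missing value
```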

**Challenge**

Using this vector of length measurements, create a new vector with the NAs removed. Then calculate the median and the mean of the lengths.

```r
lengths <- c(10, 24, NA, 18, NA, 20)
lengths
```

```r
na.omit(lengths)
```

```r
median(lengths, na.rm = TRUE)
```

```r
mean(lengths, na.rm = TRUE)
```
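The two routes give the same answer. `na.rm = TRUE` drops the missing values inside the function, while `na.omit()` produces a cleaned copy you can reuse. The sketch below follows the second route and spells out the arithmetic. `as.vector()` is used only to strip the `na.action` attribute, and the name `clean` is only for illustration.

```r
lengths <- c(10, 24, NA, 18, NA, 20)

clean <- as.vector(na.omit(lengths))  # 10 24 18 20
mean(clean)    # 18, i.e. (10 + 24 + 18 + 20) / 4
median(clean)  # 19, the average of the two middle values 18 and 20
```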

## Common R Data Structures

Vectors are one of the many data structures that R uses. Other important ones are matrices (`matrix`), tables (`data.frame`), lists (`list`), and factors (`factor`).

### Matrix

To construct a matrix, we use a function conveniently called `matrix()`.

```r
y <- matrix(1:20, nrow = 5, ncol = 4) # generates a 5 x 4 numeric matrix
y
```

```
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
```
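The output of `y` shows that `matrix()` fills its values column by column by default. Passing `byrow = TRUE` fills row by row instead. The sketch below compares both on a small matrix; `m_col` and `m_row` are illustrative names.

```r
m_col <- matrix(1:6, nrow = 2)                # column by column (the default)
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)  # row by row

m_col        # first row: 1 3 5, second row: 2 4 6
m_row        # first row: 1 2 3, second row: 4 5 6
dim(m_col)   # 2 3: number of rows, then columns
```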


Subset a matrix with `[row , column]`:

```r
y[, 4]       # 4th column of matrix
```

```r
y[3, ]       # 3rd row of matrix
```

```r
y[2:4, 1:3]  # rows 2, 3, 4 of columns 1, 2, 3
```

```
     [,1] [,2] [,3]
[1,]    2    7   12
[2,]    3    8   13
[3,]    4    9   14
```
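Selecting a single row or column with `[row, column]` returns a plain vector, not a one-column matrix. Adding `drop = FALSE` keeps the matrix shape. A logical condition also works as a matrix index. The sketch below shows both; `col4` and `col4_m` are illustrative names.

```r
y <- matrix(1:20, nrow = 5, ncol = 4)

col4   <- y[, 4]                # a plain vector: 16 17 18 19 20
col4_m <- y[, 4, drop = FALSE]  # still a matrix, with 5 rows and 1 column

is.matrix(col4)   # FALSE
dim(col4_m)       # 5 1
y[y > 15]         # 16 17 18 19 20: values read in column order
```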


### List

Lists can have elements of any type. You may have guessed that to construct a list, we use the `list()` function:

```r
myl <- list(id = "ID_1", a_vector = animals, a_matrix = y, age = 5.3) # example of a list with 4 components
myl
```

Among the printed components, `a_matrix` shows the matrix `y`:

```
$a_matrix
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
```
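Components of a list can be pulled out by name. `$` and `[[ ]]` return the component itself, while single brackets `[ ]` return a smaller list. The sketch below rebuilds `myl` from the objects defined earlier in the lesson so it runs on its own:

```r
animals <- c("mouse", "rat", "dog", "bear")
y <- matrix(1:20, nrow = 5, ncol = 4)
myl <- list(id = "ID_1", a_vector = animals, a_matrix = y, age = 5.3)

myl$age                  # 5.3: extract a component by name
myl[["a_vector"]]        # the character vector itself
myl[["a_matrix"]][2, 3]  # 12: index into the extracted matrix
class(myl["age"])        # "list": single brackets keep the list wrapper
length(myl)              # 4 components
names(myl)               # "id" "a_vector" "a_matrix" "age"
```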
