# Set-up

Before we do anything, we need to change Google Colab to use R as our programming language, not Python. To accomplish this, click on the "Runtime" dropdown menu, select "Change runtime type," and switch the runtime from Python to R.

In this course, we will learn about statistics and its applications in R. R is a powerful programming language (this entire tutorial was created in R), but don't let the fact that it is a programming language scare you. We will focus mainly on the statistics, with less emphasis on the programming components. Using R to conduct statistics is useful for many reasons, some of which are:

- Using a computer is immensely easier and more powerful than computing analyses by hand. If you ever feel discouraged from being stuck on a problem, just think of the graduate students who took entire summers to hand-calculate a single analysis!

- Would you want to build a house with only a hammer as your tool? It can be done, but why limit yourself when it keeps you from putting a roof over your head? Spreadsheet programs, such as Excel, can do statistics and data visualization, but they are rigid and clunky at best. R is flexible and powerful.

- R is free and open source. Anyone, including your instructor, can create functionalities to improve R or ease the process of conducting data analysis. We will use several 'packages' that were created to improve your experience.

# Learning Objectives

Before we begin to use R itself, it may be helpful to start small, with this interactive interface. In this tutorial, you will:

- Learn the basics of how R can be used.

- Use R code blocks (called 'code chunks') to write some code yourself, including the creation of objects and using functions.

Without further ado, let's begin!

## Using R like a calculator

As we shall see, R is very versatile in its functionality, but much of its usage in statistics is as a glorified calculator. Like most calculators, there is a specified order in which calculations take place. The list below provides the common operations used in R along with an example of their usage.

### Operations in R

NOTE: Anything after the hash symbol (i.e., the # symbol) is NOT considered to be "code"; these pieces of text are known as "comments." I write comments above every line of code to tell myself (and you) what my code is trying to do.

- Subtraction: Uses the minus (-) key (e.g., 3 - 5 will produce -2).

- Addition: Uses the plus (+) key (e.g., 2 + 3 will produce 5).

- Multiplication: Uses the star or asterisk (*) key (e.g., 3 * 4 will produce 12).

- Division: Uses the forward slash (/) key (e.g., 4 / 2 will produce 2).

- Exponent: Uses the caret (^) key (e.g., 2^3 will produce 8).

In simple operations, spacing between numbers and the operator does not matter. For consistency and clarity, I always put 1 space on each side of the operator.


In [1]:
## Addition
2 + 3

5+ 7

## Subtraction
3 -   5

7 -2

## Multiplication
3   * 4

2 *    6

## Division
4 / 2

9 / 3

## Exponent
2^8

3^2




### Try it yourself

Try to accomplish each of the specified calculations below to see how it works:

- 20 multiplied by 4

- 3 to the exponent (power) of 3

- 16 divided by 4

In [None]:
## 20 multiplied by 4


## 3 to the power of 3


## 16 divided by 4



### Orde*R* of operations

R uses a common standard for its order of operations. To help remember it, use the acronym **PEMDAS**. Computations within parentheses have the highest priority, followed by exponents. Multiplication and division come next and are given equal preference, followed by addition and subtraction, which are also given equal preference. Multiplication, division, addition, and subtraction steps that share the same preference are carried out from left to right.

1) **P**arentheses: "()"

2) **E**xponents: "^"

3) **M**ultiplication and **D**ivision: "*" and "/"

4) **A**ddition and **S**ubtraction: "+" and "-"
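
As a quick illustration of the order listed above, here is a small sketch with made-up numbers (they are not part of the exercises). Each comment states the value R returns.

```r
## Multiplication happens before addition, so this returns 14
2 + 3 * 4

## Parentheses are computed first, so this returns 20
(2 + 3) * 4

## Exponents happen before multiplication, so this returns 18
2 * 3^2

## Parentheses change that order too, so this returns 36
(2 * 3)^2

## Division and multiplication share the same priority and run left to right, so this returns 4
8 / 4 * 2
```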

Add parentheses to make this expression equal 32 instead of 23:

In [None]:
## Add parentheses to make the expression equal 32 instead of 23
3 + 5 * 4

## Creating objects

R is an "object-oriented language," meaning that everything in R is an object. Let's start creating some to try it out.

### Assigning data to objects

R is an object-oriented language, meaning that everything is stored as an object; even functions are stored as objects. An object is something (e.g., an image, some output, data) that is stored in the environment (i.e., R's "memory"). Objects that are stored can be used for future operations. This may seem confusing at first, so let's see how it works. In the code chunk below, we create an object named "Test" and we assign that object (using "<-") a value of 5. The next line of code calls the object, which prints its contents to your output.

See how we are assigning the number five into the object named "Test":

In [2]:
## An object named "Test" is assigned a value of 5
Test <- 5

## If we execute code that is just an object name, it will return the contents of that object (we will do this a lot)
Test
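
Because a stored object can be used in future operations, we can treat "Test" like the number it contains. The short sketch below repeats the assignment so it runs on its own, then uses the object in a calculation.

```r
## An object named "Test" is assigned a value of 5
Test <- 5

## Use the stored value in a new calculation (returns 7)
Test + 2

## The object itself is unchanged by the calculation (still returns 5)
Test
```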


### Objects are case sensitive

When using objects, we must be careful about spelling them correctly!

The code below returns an error because we never defined an object named "tEsT"; the object we created, "TEST", is spelled in all upper-case.

In [3]:
## Creating an object with all capital letters
TEST <- 4 + 3

## Calling an object with a different spelling
tEsT

ERROR: Error: object 'tEsT' not found


### Error checking

The above line of code did not work because we tried to call an object that was not created. Google Colab has embedded AI to help us out here. Click the "Explain error" button above for some AI assistance in figuring out why the code failed. As said above, this failed because we called the wrong object name (object names are case sensitive). Let's call the right name.

In [4]:
## Call the correct TEST object
TEST

### Object names must start with letters

Object names in R can be very flexible, with one main rule: object names MUST start with a letter. Numbers and special characters (underscore, period) can be used after that.

See the code chunk below for examples of permissible object names, plus one name that is not allowed, and try the "Explain error" button again:

In [5]:
## New object with the square root of 25 as its contents
Test1 <- sqrt(25)

## Call object
Test1

## Create object with the numbers 1 through 5
Object2 <- 1:5

## Call Object2
Object2

## Cause an error, try object starting with number
1Broken <- 2 + 2



ERROR: Error in parse(text = input): <text>:14:2: unexpected symbol
13: ## Cause an error, try object starting with number
14: 1Broken
     ^
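
Underscores and periods are also allowed once the name has started with a letter. The object names below ("Test_2" and "Test.3") are made up purely for illustration.

```r
## An underscore after the first letter is allowed
Test_2 <- 10

## A period after the first letter is allowed too
Test.3 <- 20

## Call both objects (returns 10, then 20)
Test_2
Test.3
```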


### Objects can be overwritten

An object can be overwritten and there is no 'undo' button to fix it (other than re-running previous code chunks). In the code chunk below, the object "a" is changed to contain a different value because we use the assign (i.e., <-) operator to store new data in the object.

The best practice is to *never* overwrite objects but to always create new objects (e.g., 'Object' then store any changes into 'NewObject').

Run the code chunk to see how the object "a" changes:

In [None]:
## Create object storing the number 3
a <- 3

## Call the object
a

## Overwrite our object by adding 2 to what was stored in object "a"
a <- a + 2

## Call the new object
a
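
For comparison, here is a minimal sketch of the recommended practice: store the change in a new object so the original stays available. The names "Object" and "NewObject" are only placeholders.

```r
## Create object storing the number 3
Object <- 3

## Store the change in a NEW object instead of overwriting the old one
NewObject <- Object + 2

## The original object still holds 3
Object

## The new object holds 5
NewObject
```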

### Objects can be re-used

Why use objects? Because we can store them and use them several times for ease! For example, we can see the total cost by multiplying the number of items by a fixed price.

Here, we assign a number to the object 'Units' to calculate how much it would cost to create 35 units at \$5.99 per unit:


In [6]:
## Create object for the number of units
Units <- 35

## Create an object to show the full cost, multiply price by number of units bought
Cost <- 5.99 * Units

## Object with total costs
Cost

# Functions in R

R and other programs, like Python and Excel, make use of functions. Functions take specific inputs and do something specific to those inputs. We have already used some functions, such as the square root function, 'sqrt()'.

Functions have two elements to consider: arguments and default values. All functions include arguments that help tell the function the inputs and potentially change what a function even does. The 'round()' function will round the numeric value specified by the argument _x_ to the number of digits specified by the argument _digits_.

Things to remember:

- Function and argument names are case sensitive.

- Many arguments have default values. When arguments are not specified, the defaults will be used.

- Arguments are sometimes limited to certain kinds of inputs. For example, you cannot take the square root of a word; you can only take the square root of a number!
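
To make the idea of default values concrete, here is a small sketch using 'round()'. Its _digits_ argument has a default of 0, so leaving it out rounds to a whole number.

```r
## No "digits" argument is given, so the default (digits = 0) is used (returns 3)
round(x = 3.14165)

## Spelling out the default ourselves gives the same result (returns 3)
round(x = 3.14165, digits = 0)
```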

### Arguments of a function

Here, we will practice changing the inputs to two arguments in the round() function. The argument name "x" tells the function the numeric value we wish to round whereas the argument name "digits" tells the function how many decimal places to round the number.

See how the code below will round the value in the *x* argument (i.e., x = 3.14165) to only 2 decimal places.

In [None]:
## Round digits of pi to 2 decimal places
round(x = 3.14165, digits = 2)

Let's break down what happened in that function call.

- The function: The function was "round"; we are telling R to use this function on some numbers. The purpose of this function (not all functions have obvious names) is to round a number to a given number of digits.

- The arguments: Two arguments were used. The round function has arguments "x" and "digits". The "x" argument is the number we want to round in the first place. The "digits" argument is how many decimal places we want to round the number to.

See some examples below where we change the two argument entries:

In [7]:
## Round digits of pi to 4 decimal places
round(x = 3.14165, digits = 4)

## Round the digits below to 4 decimal places
round(x = 5.2068123153, digits = 4)

### Lazy arguments in R

The names of arguments do not always need to be specified, _although it is good practice to always specify argument names_. Here, the order matters. If there are three arguments and you do not specify the argument names, values will be assumed to be in the order of the arguments. Many functions have a ton of arguments so that is why it's best practice to always specify arguments. **Just try to ignore any instances where I fail to listen to my own advice.**

See the code chunk below to show that the round function works even when we fail to specify the "x = " and "digits = " argument names:

In [None]:
## Lazy argument calling in round
round(3.14165, 2)

## "Correct" argument calling in round, both work though
round(x = 3.14165, digits = 2)
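
The sketch below illustrates why order matters when argument names are left out. With names, the arguments can appear in any order. Without names, R matches values by position, so swapping them quietly changes what the function does.

```r
## Named arguments can be given in any order (returns 3.14)
round(digits = 2, x = 3.14165)

## Unnamed arguments are matched by position, so here R rounds the number 2
## and treats 3.14165 as the "digits" argument (returns 2)
round(2, 3.14165)
```

Notice that the second call does not produce an error; it simply returns an answer to a different question, which is one reason naming arguments is the safer habit.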

### Functions within functions

Sometimes it can be useful to put functions inside functions. Nesting functions can be convenient, but the code quickly becomes hard to read.

As an alternative, we can always use multiple lines of code and store intermediate results as objects.

In [8]:
# Function within function

## Round the output of the square root of 29 to 4 digits
round(x = sqrt(29), digits = 4)

# Multiple lines of code

## Create object with sqrt of 29
SqRt29 <- sqrt(29)

## Round the output of the sqrt object
round(x = SqRt29, digits = 4)

## What's your vector, Viktor?

A common term used in R and in statistics is a "vector," which is a set of numbers with a particular order. Imagine we measured the heights of everybody in a class. We can store those values as a vector of heights. It may help to think of a "vector" as a "variable" with multiple values. In R, an entire vector can be assigned to a single object name.

Below, the $\texttt{1:5}$ component of the code creates a *vector* of all integers from 1 to 5 and stores them as a single object named *VectorObject*.

In [None]:
## Object with a "vector" with digits from 1 to 5
VectorObject <- 1:5

## Call the object to see what it includes
VectorObject

## Create a big vector object with digits 1 to 100
BigVector <- 1:100
BigVector


### Con*cat*enate got your tongue?

To stitch together values into a vector, we can use the concatenate function: $\texttt{c()}$.

The code chunk below creates a vector object, named "ConcatenateFunc", that stores the values 37, 4, 91, and 25 in that specific order. Notice how each value is separated by a comma. That is important because it tells R that each value is separate.

In [9]:
## Object that concatenates (combines) multiple values into a vector
ConcatenateFunc <- c(37, 4, 91, 25)
## Call the object to see the values in their stored order
ConcatenateFunc

### Indexing in R

R has some tools that help us find a specific value, known as an element, from a vector. Let's again imagine we collected height data from students. Maybe we know that Sally is the 4th person in our vector but we don't remember her height. We can *index* in R using square brackets after an object name.

Notice that the 4th element in our vector of heights is 65.


In [10]:
## Object of fake student heights
StudentHeight <- c(72, 68, 61, 65, 67, 70)

## Select the 4th entry in our vector (i.e., 65)
StudentHeight[4]

## Select the 2nd entry in our vector (i.e., 68)
StudentHeight[2]

## Alternative data types

R is very flexible and can handle values/elements besides numbers. Different kinds of data have their own types, called "classes" in R. For example, numbers are considered to have the class 'numeric'. R can also work with:

- Character strings (i.e., words or sentences)

- Dates and datetimes which might appear like numbers but are treated differently

- Logical/binary values (i.e., TRUE or FALSE)

- User-created classes of objects to do unique manipulations specific to that object class.

To see the class of an object, merely use the $\texttt{class()}$ function on an object.

In [11]:
## Create a vector object with 3 values
NumberObject <- c(1, 2, 3)

## Call the class function to determine what type of object we have (numeric)
class(NumberObject)
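
The same function works for the other kinds of data listed above. Here is a brief sketch; the object names and the example date are made up for illustration.

```r
## A character string has the class "character"
WordObject <- "Hello"
class(WordObject)

## A TRUE/FALSE value has the class "logical"
LogicalObject <- TRUE
class(LogicalObject)

## A date has the class "Date", even though it may look like text or numbers
DateObject <- as.Date("2024-01-15")
class(DateObject)
```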

### Characters in R

All elements in a vector must be of the same class; we cannot mix and match *in a vector*. If we try, R converts every element to a single class. There are other data structures, such as a data frame, that allow us to combine vectors of different types, but each vector must still have elements of the same type.

Notice how the vector below has the number 3 but the entire vector is still considered a character?

In [None]:
## Create vector object of characters and a numeric value
MixVector <- c("Hello", 3, "World")

## Determine the type of object we have
class(MixVector)
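
To see what happened to the number, and how a data frame keeps different types side by side, consider the sketch below. The data frame "ClassData" and its contents are hypothetical, and the $ symbol (not covered in this tutorial) simply pulls one column out of a data frame.

```r
## Create vector object of characters and a numeric value
MixVector <- c("Hello", 3, "World")

## The number 3 was converted to the character "3" (note the quotation marks)
MixVector[2]

## A data frame can hold a character vector and a numeric vector side by side
ClassData <- data.frame(Name = c("Sally", "Sam"),
                        Height = c(65, 70),
                        stringsAsFactors = FALSE)

## Each column is still a vector with a single class ("character", then "numeric")
class(ClassData$Name)
class(ClassData$Height)
```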

### Certain data types are constrained to certain functions/operations

Like in the real world, we cannot add numeric values and character values (2 + dog does not make sense).

Most functions in R have restrictions to only allow for specific types of classes as inputs. If the wrong object class is provided, R will return an error.

Try to apply the $\texttt{round()}$ function to the object below to (attempt to) round this object to 4 digits:

In [12]:
## Create object with character string
CharObject <- "Button"

## Call the round function
round(x = CharObject, digits = 4)

ERROR: Error in round(x = CharObject, digits = 4): non-numeric argument to mathematical function
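
One way to check an object before handing it to a function like $\texttt{round()}$ is the $\texttt{is.numeric()}$ function, which returns TRUE or FALSE. This is only a small sketch, and "NumObject" is a made-up object for illustration.

```r
## Create object with character string
CharObject <- "Button"

## Is the object numeric? (returns FALSE, so round() would fail)
is.numeric(CharObject)

## Create a numeric object
NumObject <- 5.2068

## Is the object numeric? (returns TRUE, so round() will work)
is.numeric(NumObject)

## Round the numeric object to 2 decimal places (returns 5.21)
round(x = NumObject, digits = 2)
```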


### Logical operations

"Logical" data refers to binary TRUE and FALSE values, which usually result from making comparisons. For instance, is 16 greater than (>) 10? This statement is TRUE, which is our logical value.

Types of logical operators:

- Less than (<)

- Less than or equal to (<=)

- Greater than (>)

- Greater than or equal to (>=)

- Is equal to (==)

- Is NOT equal to (!=)

Let's see these in action.

In [14]:
## Several logical operators in action

## Less than
15 < 10
10 < 15

In [15]:
## Less than or equal to
4 <= 4
4 <= 3


In [16]:
## Greater than
10 > 8
5 > 41

In [17]:
## Greater than or equal to
6 >= 3
9 >= 11

In [18]:
## Is equal to
7 == 9
8 == 8

In [19]:
## Is NOT equal to
1 != 2
8 != 8
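
Like any other value, the result of a comparison can be stored in an object. The sketch below uses the "is 16 greater than 10?" example; the object name "IsGreater" is made up for illustration.

```r
## Store the result of a comparison in an object
IsGreater <- 16 > 10

## Call the object (returns TRUE)
IsGreater

## The class of the object is "logical"
class(IsGreater)
```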

### Logical values as vector indexing

Let's use our class height example one more time to illustrate a point. Pretend we are going on a class trip to an amusement park with rollercoasters. These rollercoasters have a height requirement such that people must be 50 inches or taller to ride.

Notice that we can apply logical operations to an entire vector. All values _greater than or equal to_ 50 are flagged as TRUE, whereas those _less than_ 50 are flagged as FALSE.

In [20]:
## Object with vector of heights
Heights <- c(50, 50, 48, 52, 54, 49, 50, 56, 50, 53)

## Return logical values comparing object entries 'greater than or equal to' 50
Heights >= 50

We can use this information to extract all values associated with TRUE from a vector! In the brackets, we create a logical vector (i.e., all TRUE or FALSE values). Using this vector as an index, R will only return values associated with TRUE and will "drop" values associated with FALSE.

In [None]:
## Object with vector of heights
Heights <- c(50, 50, 48, 52, 54, 49, 50, 56, 50, 53)

## Subset vector 'Heights' to only include values 'greater than or equal to' 50
Heights[Heights >= 50]
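
The same approach works with any of the logical operators. As a sketch, we could instead pull out the heights of students who are too short to ride; the object name "TooShort" is made up for illustration.

```r
## Object with vector of heights
Heights <- c(50, 50, 48, 52, 54, 49, 50, 56, 50, 53)

## Subset vector 'Heights' to only include values 'less than' 50
TooShort <- Heights[Heights < 50]

## Call the new object (returns 48 and 49)
TooShort
```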

## Recap

This interactive tutorial has used R in the backend to execute each code chunk. In other words, you got some hands-on experience with R! These exercises have covered some of the basics of R:

- R can be used as a glorified calculator. At its simplest, we can run simple math computations.

- R is an object-oriented language whereby everything in R is stored as an object in the environment (i.e., R's "memory"). We can store data, output, graphics, etc. as objects in R.

- R uses a combination of functions and objects. Technically, functions are themselves objects. Everything in R is an object on at least some level. We write multiple lines of code that build on each other to produce quite a broad range of functionalities and uses.

- R can store many data values in what are known as vectors. Each vector has specific elements, the first element being the first value in a vector, etc.

- R relies on many different types of data, such as numeric, character, and logical values. Each type has its own use cases, and many functions require inputs to have specific classes (e.g., you cannot round a word/character to 2 digits).