# An Introduction to R

R is a language and environment for statistical computing and graphics. You can read more about it [here](https://www.r-project.org/about.html).

In this Notebook we will point out **some** useful basics of R to help you get started. If you want to learn more, there are a variety of resources listed in the references section.

We will be assuming you are following along using RStudio, but you can also interact with the code by clicking the rocket launch button to open this in Google Colab.

To get started, open RStudio, open a new R Script (e.g. File->New File->R Script) and save it (e.g. as **RBasics.R**).

## Getting to know R/RStudio

When you are working in RStudio you should see that the window is split into four panels. Commonly these will be:
* top left - any current script(s) (e.g. our RBasics.R script)
* bottom left - a console for experimental commands and any R output
* top right - the working environment with any saved variables or data structures
* bottom right - extras such as a files tab and R Help

![panels](./images/panels.png)

### The Console

If you want to test out a line of code (command) you can run it in the console before copying it to your script. Some useful features:
* You can use your keyboard's up/down arrows to scroll back through previous commands.
* You can use `^R` to search your command history.
* You can clear the console using `^L`.

```{tip}
You can also use the console to access R's help guides (which will appear in the bottom right panel).

You can search R's documentation for a keyword by typing `??keyword`. You can get help on a particular function or package by typing `?function`.
```

In [None]:
# asking R to search its documentation for the keyword "mean"
??mean
# asking R to provide help on the function mean
?mean

## Preamble

Our script is where we will write and execute our code. Let's start our script **RBasics.R** by adding some preamble.

In RStudio you can create (collapsible) sections to keep your code organised. To create a section, go to Code->Insert Section and a pop-up box will appear for you to name your section. We will start by creating a section called "Preamble". Once you click OK you should get something that looks like this:

In [None]:
# Preamble ----------------------------------------------------------------

```{tip}
The # symbol is used to indicate a comment in R. This is a line which R will skip over and not try to run as code. The best code is well commented, so feel free to add your own comments to help you understand it.
```

### Documentation

You may wish to start your code with some documentation. This could look something like:

In [None]:
# Author: [Your Name]
# Date: [Date]
# Description: [Brief description of what the script does]

### Install/Load Packages

Next you will need to install/load packages. We **install** a package in R once. We **load** the packages we are going to use every time we open/start R.

In [None]:
# Install your packages (first use only)
# install.packages("tidyverse")

# Load your packages (every time you restart R)
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


```{tip}
There are thousands of R packages, many of which you will never use and a handful of which you will use all the time. We will try to point you to the most relevant ones, but you will also find more information in the references section.
```
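If you are unsure whether a package is already installed, one option is to check before loading it. The sketch below uses base R's `requireNamespace()`, which reports whether a package can be found without attaching it. The object name `tidyverse_available` is only an illustrative choice.

```r
# check whether tidyverse is installed without attaching it
tidyverse_available <- requireNamespace("tidyverse", quietly = TRUE)

if (tidyverse_available) {
  cat("tidyverse is installed and can be loaded with library(tidyverse)\n")
} else {
  cat("tidyverse is not installed yet; run install.packages(\"tidyverse\") once\n")
}
```

This only reports what it finds, so it is safe to leave in a script that is run repeatedly.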

### Clear R Environment

A common line you will often see in the preamble of R scripts is one which clears the current workspace/environment.

In [None]:
# Clear everything from the environment
rm(list = ls())

```{caution}
This will delete/unassign all of the variables and data structures you have created, so only use it if you won't lose something from the environment that you still need.
```

### Set Working Directory

It may keep things streamlined if you set the path to your preferred working directory. Imagine you have a folder called **DataProject** sitting in your **Documents** folder. Setting the working directory to that folder means that inputs will be read from it and outputs will be written to it by default.

In [None]:
# Set working directory using setwd("path/to/your/directory")
# setwd("~/Documents/DataProject")
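Before changing the working directory it can help to see where R is currently looking. The following is a minimal sketch using the base functions `getwd()` and `file.path()`; the file name `data.csv` is hypothetical and is only there to show how a relative file name would be resolved.

```r
# show the folder R is currently working in
current_dir <- getwd()
cat("Current working directory:", current_dir, "\n")

# build the full path that a relative file name would point to
# (data.csv is a hypothetical file name used for illustration)
example_path <- file.path(current_dir, "data.csv")
cat("A file called data.csv would be read from:", example_path, "\n")
```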

## Basic Operations and Print Statements

Let's set up another section within our R script and call it **The Basics**.

In [None]:
# The Basics --------------------------------------------------------------

### Basic Operations

Let's have a play with some very basic functions. Most of these are unremarkable:
* to add numbers use `+`
* to subtract numbers use `-`
* to multiply numbers use `*`
* to divide numbers use `/`
* to find powers use `^`
* to calculate the square root use `sqrt()`
* to calculate the sum use `sum()`
* to calculate the mean use `mean()`
* to round numbers use `round()`

There are many other simple functions such as the exponential function `exp()` and the natural logarithmic function `log()`. If you want to see if R has a function for something you can use the help to search for that keyword (e.g. `??exponential`).

In [16]:
# a simple mathematical calculation using basic operations
1+2*2+3^2-4/2-sqrt(4)
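The named functions in the list above work in the same way as the operators. As a small illustration (the numbers are arbitrary), each line below can be run in the console on its own:

```r
# add up several numbers
sum(2, 4, 9)       # 15
# mean of the whole numbers from 1 to 10
mean(1:10)         # 5.5
# round 10/3 to two decimal places
round(10 / 3, 2)   # 3.33
# the exponential and natural logarithm undo one another
exp(0)             # 1
log(1)             # 0
log(exp(2))        # 2
```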

```{note}
R, like most coding languages, follows the traditional mathematical rules of precedence, i.e. [BIDMAS](https://www.bbc.co.uk/bitesize/articles/znm8cmn#:~:text=BIDMAS%20is%20an%20acronym%20used,%2C%20Multiplication%2C%20Addition%2C%20Subtraction.).
```
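To see the order of operations in action, compare what happens with and without brackets. This is a short sketch with arbitrary numbers; note in particular that a power is applied before a leading minus sign.

```r
2 + 3 * 4      # multiplication happens first: 14
(2 + 3) * 4    # brackets happen first: 20
-2^2           # the power is applied before the minus sign: -4
(-2)^2         # brackets make the base negative: 4
```

When in doubt, adding brackets makes the intended order explicit.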

### Print Statements

Print statements, like comments, are very useful when coding. Although you can make use of the simple print function `print`, we would recommend the concatenate and print function `cat`. Here's how it works:

In [18]:
# an example of a simple print statement in R
cat("The sum of the first ten numbers is:", sum(1:10), "\n")

The sum of the first ten numbers is: 55 


Anything you want R to print verbatim goes inside "", and anything else must be either an object or some function or operation which R can carry out. The final "\n" tells R to start a new line at the end of the statement.
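For comparison, here is a brief sketch of how `print()`, `cat()` and the related base function `paste()` behave. The `sep` argument of `cat()` controls what is placed between the pieces (a space by default).

```r
# print() shows one object at a time, with an index prefix such as [1]
print(sum(1:10))

# cat() joins text and values, separated by a space by default
cat("Rounded mean:", round(mean(1:10), 1), "\n")

# sep changes the separator between the pieces
cat("A", "B", "C\n", sep = "-")

# paste() builds the same kind of string but returns it rather than printing it
paste("The sum of the first ten numbers is:", sum(1:10))
```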

## Objects in R

### Assigning Objects

Objects in R are essentially containers. They may contain single pieces of data (e.g. a variable) or be more complex structures of data (e.g. lists and data frames). Objects are created by **assigning** the data to that object using `<-`.

```{note}
There are actually no fewer than five different assignment operators as per this [documentation](https://stat.ethz.ch/R-manual/R-patched/library/base/html/assignOps.html). However, at a basic level using `<-` will serve you well.
```
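As a quick illustration of three of those operators, the sketch below assigns the same value in three different ways. The object names `x`, `y` and `z` are placeholders chosen for this example only.

```r
# the usual leftward assignment
x <- 5
# the equals sign also assigns at the top level of a script
y = 5
# rightward assignment puts the value on the left and the name on the right
5 -> z

# all three objects hold the same value
cat("x, y and z:", x, y, z, "\n")
```

Sticking to `<-` throughout keeps scripts consistent and easier to read.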

Let's set up a section in our R script called **Objects in R** and look at some examples.

In [1]:
# Objects in R ------------------------------------------------------------

# assign the value 16 to the object my_variable
my_variable<-16
# assign a set of values {1,2,3} to the object my_set
my_set<-c(1,2,3)
# assign a list of values {1,2,3} to the object my_list
my_list<-list(1,2,3)
# assign a list of sets {1,2,3} and {"A", "B", "C", "D"} to the object my_listofsets
my_listofsets<-list(c(1,2,3),c("A","B","C","D"))
# assign a name to each of the sets for future reference
my_listofsets2<-list(my_set1=c(1,2,3),my_set2=c("A","B","C","D"))
# assign a table of data to the object my_dataframe
my_dataframe<-data.frame(my_column1=c(1,2,3), my_column2=c("yes","no","maybe"))

After doing this our environment panel should look like this:

![environment](./images/environment.png)

Alternatively, you can get R to list all the currently assigned objects using `ls()`.

In [None]:
# list all of the currently assigned objects
ls()

```{note}
We saw above that `rm(list = ls())` will remove (and unassign) all of the objects in the current session. You can remove/unassign individual objects using `rm()`, for example, `rm(my_variable)`.
```
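To avoid deleting anything we need later, the sketch below demonstrates `rm()` on a throwaway object. The name `temp_value` is hypothetical, and `exists()` is a base function that reports whether an object with that name can be found.

```r
# create a throwaway object
temp_value <- 42
# check that it exists
exists("temp_value")   # TRUE
# remove just this object
rm(temp_value)
# check again
exists("temp_value")   # FALSE
```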

There are various nuances here in how things have been assigned and stored, but three key things to remember are:
* we store strings/words using quotation marks - single or double depending on preference.
* a list can contain sets of different lengths but every column in a data frame must have the same length.
* objects have classes and it is important that the correct class has been selected for each object.

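The difference between a list and a data frame can be checked directly. In this sketch (object names are illustrative), the list accepts sets of length 3 and 4, while `data.frame()` raises an error for the same input; `tryCatch()` is used so the error is caught and reported rather than stopping the script.

```r
# a list can hold sets of different lengths
my_example_list <- list(numbers = c(1, 2, 3), letters = c("A", "B", "C", "D"))
lengths(my_example_list)   # numbers: 3, letters: 4

# a data frame cannot be built from columns of length 3 and 4
result <- tryCatch(
  data.frame(numbers = c(1, 2, 3), letters = c("A", "B", "C", "D")),
  error = function(e) "data.frame() failed: columns differ in length"
)
cat(result, "\n")
```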

### Classes of Objects

Objects in R are automatically given a "class" (e.g. numeric, character). These indicate how data is stored in R.

In [14]:
# give the class of my_variable
cat("The object my_variable is of class:", class(my_variable), "\n")
# give the class of my_set
cat("The object my_set is of class:", class(my_set), "\n")
# give the class of my_list
cat("The object my_list is of class:", class(my_list), "\n")
# give the class of my_listofsets
cat("The object my_listofsets is of class:", class(my_listofsets), "\n")
# give the class of each set in my_listofsets2
cat("The set my_set1 within object my_listofsets2 is of class:", class(my_listofsets2$my_set1), "\n")
cat("The set my_set2 within object my_listofsets2 is of class:", class(my_listofsets2$my_set2), "\n")
# give the class of my_dataframe
cat("The object my_dataframe is of class:", class(my_dataframe), "\n")
cat("The column my_column1 within object my_dataframe is of class:", class(my_dataframe$my_column1), "\n")
cat("The column my_column2 within object my_dataframe is of class:", class(my_dataframe$my_column2), "\n")

The object my_variable is of class: numeric 
The object my_set is of class: numeric 
The object my_list is of class: list 
The object my_listofsets is of class: list 
The set my_set1 within object my_listofsets2 is of class: numeric 
The set my_set2 within object my_listofsets2 is of class: character 
The object my_dataframe is of class: data.frame 
The column my_column1 within object my_dataframe is of class: numeric 
The column my_column2 within object my_dataframe is of class: character 


```{note}
We can also use the function `typeof()`. If an object contains strings, its type and class will both be character. If an object is numeric, `typeof()` tells us whether it is stored as an integer or a double.
```

In [15]:
# give the type of my_variable
cat("The object my_variable is of type:", typeof(my_variable), "\n")

The object my_variable is of type: double 
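Numbers typed without any suffix are stored as doubles. As a small sketch of the distinction (the object names are illustrative), adding an `L` suffix asks R to store a whole number as an integer instead:

```r
# the L suffix asks for an integer
whole_number <- 16L
# numbers typed without L are stored as doubles
decimal_number <- 16

typeof(whole_number)         # "integer"
typeof(decimal_number)       # "double"
# both still count as numeric
is.numeric(whole_number)     # TRUE
is.numeric(decimal_number)   # TRUE
```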


R is reasonably good at selecting the right type and class for each object, but occasionally the result may not be what you intended.

In [33]:
# assign a set of values {1,"two",3} to the object my_mixedset
my_mixedset<-c(1,"two",3)
# determine the class of my_mixedset
cat("The object my_mixedset is of class:", class(my_mixedset), "\n")

The object my_mixedset is of class: character 


Here, since we included a string, my_mixedset is automatically given the class "character" and all values are stored as strings. We can check the class using the `is` functions, e.g. `is.numeric()`, and change the class using the `as` functions, e.g. `as.numeric()`.

In [34]:
# ask R if my_mixedset is of class "numeric"
is.numeric(my_mixedset)
# ask R to store my_mixedset as class "numeric" (note reassignment)
my_mixedset<-as.numeric(my_mixedset)
# ask R if my_mixedset is of class "numeric"
is.numeric(my_mixedset)

“NAs introduced by coercion”


```{warning}
When we force my_mixedset to be stored as a numeric object any strings are lost (irrevocably) and replaced by the missing data indicator NA. In [Data Workflow](https://cicelykrystyna.github.io/MD4002_RWorkshop/MD4002_RWorkshop.html) there is a more detailed example of how we can convert the "two" to a 2.
```
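Once values have been replaced by NA it is useful to know where they are and how they affect summaries. The sketch below works on a fresh copy of the data (the names `my_mixedcopy` and `my_numericcopy` are hypothetical) and uses `suppressWarnings()` only because the coercion warning is already expected here.

```r
# a fresh copy of the mixed values
my_mixedcopy <- c(1, "two", 3)
# convert to numeric, silencing the coercion warning we already expect
my_numericcopy <- suppressWarnings(as.numeric(my_mixedcopy))

# which positions became NA?
is.na(my_numericcopy)                # FALSE TRUE FALSE
# summaries return NA unless missing values are dropped
mean(my_numericcopy)                 # NA
mean(my_numericcopy, na.rm = TRUE)   # 2
```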

We can also change the more specific type of an object, for example,

In [38]:
# ask R to store my_variable as type "integer" (note reassignment)
my_variable<-as.integer(my_variable)
# give the type of my_variable
cat("The object my_variable is of type:", typeof(my_variable), "\n")

The object my_variable is of type: integer 
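One thing to watch for is that `as.integer()` does not round: it drops the decimal part. A short sketch with arbitrary values:

```r
# as.integer() truncates towards zero
as.integer(3.7)    # 3
as.integer(-3.7)   # -3
# round() goes to the nearest whole number instead
round(3.7)         # 4
```

If rounding is what you want, apply `round()` before converting.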


### Logical Objects

Sometimes our data may be stored as a set of true/false responses. R stores these as "logicals". We tell R this by writing TRUE or FALSE or more simply using T or F, for example,

In [41]:
# assign a set of true/false responses to the object my_truefalseset
my_truefalseset<-c(T,F,T,T,F,T)
# give the class of my_truefalseset
cat("The object my_truefalseset is of class:", class(my_truefalseset), "\n")

The object my_truefalseset is of class: logical 


Logicals (or booleans), where there are two distinct answers (binary), can be converted to numeric by flagging one response as 1 and the other response as 0. If we were to convert my_truefalseset to class "numeric", any Ts become 1s and Fs become 0s.

In [44]:
# ask R to store my_truefalseset as class "numeric" (note reassignment)
my_truefalseset<-as.numeric(my_truefalseset)
# print my_truefalseset
cat("my_truefalseset:", my_truefalseset, "\n")

my_truefalseset: 1 0 1 1 0 1 
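Because TRUE counts as 1 and FALSE as 0, functions such as `sum()` and `mean()` can be applied to logicals directly. The sketch below uses a separate object (the name `my_answers` is illustrative) holding the same six responses:

```r
my_answers <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)
# number of TRUE responses
sum(my_answers)    # 4
# proportion of TRUE responses
mean(my_answers)   # 0.6666667
```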


Suppose we had another set of binary data e.g. a set of happy or sad responses. We can also convert this to either numeric or logical but we would just need to specify which of happy or sad is to be flagged as TRUE/1.

In [51]:
# assign a set of happy/sad responses to the object my_happysadset
my_happysadset<-c("happy", "sad", "sad", "sad", "happy")
# give the class of my_happysadset
cat("The object my_happysadset is of class:", class(my_happysadset), "\n")
# print my_happysadset as a logical or numeric
cat("my_happysadset:", as.logical(my_happysadset=="happy"), "\n")
cat("my_happysadset:", as.numeric(my_happysadset=="happy"), "\n")

The object my_happysadset is of class: character 
my_happysadset: TRUE FALSE FALSE FALSE TRUE 
my_happysadset: 1 0 0 0 1 
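The choice of which response is flagged as TRUE/1 is ours. As a sketch, using a separate copy of the responses (the name `my_responses` is illustrative), we can flag "sad" instead and count how often each response occurs:

```r
my_responses <- c("happy", "sad", "sad", "sad", "happy")
# flag "sad" as 1 instead of "happy"
as.numeric(my_responses == "sad")   # 0 1 1 1 0
# count the number of "happy" responses
sum(my_responses == "happy")        # 2
# tabulate how often each response occurs
table(my_responses)
```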
