# An Introduction to R

R is a language and environment for statistical computing and graphics. You can read more about it [here](https://www.r-project.org/about.html).

In this Notebook we will point out **some** useful basics of R to help you get started. If you want to learn more, there are a variety of resources listed in the references section.

We will assume you are following along using RStudio, but you can also interact with the code by clicking the rocket launch button to open this notebook in Google Colab.

To get started (having downloaded R and RStudio) open up RStudio and open a new R Script (e.g. File->New File->R Script) and save it (e.g. as **RBasics.R**).

## Getting to know R/RStudio

When you are working in RStudio you should see that the window is split into four panels. Commonly these will be:
* top left - any current script(s) (e.g. our RBasics.R script)
* bottom left - a console for experimental commands and any R output
* top right - the working environment with any saved variables or data structures
* bottom right - extras such as a files tab and R Help

![panels](./images/panels.png)

### The Console

If you want to test out a line of code (commands) you can run it in the console before copying it to your script. Some useful features:
* You can use your keyboard's up/down arrows to scroll back through previous commands.
* You can use `^R` (Ctrl+R) to search your command history.
* You can clear the console using `^L` (Ctrl+L).

```{tip}
You can also use the console to access R's help guides (which will appear in the bottom right panel).

You can search R's documentation for a keyword by typing `??keyword`. You can get help on a particular function or package by typing `?function`.
```

In [None]:
# asking R to search its documentation for the keyword "mean"
??mean
# asking R to provide help on the function mean
?mean
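
Two further base R helpers can be handy alongside `?` and `??`. The short sketch below is our own illustration (it is not part of the RBasics.R script): `apropos()` lists objects whose names contain a piece of text, and `args()` shows the arguments a function accepts.

```r
# list the names of all visible objects containing the text "mean"
matches <- apropos("mean")
print(matches)

# show the arguments that the function mean accepts
args(mean)

# help() is the long form of ?, so this is equivalent to typing ?mean
# help("mean")
```

The exact list returned by `apropos()` depends on which packages you have loaded, so your output may differ from someone else's.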

## Preamble

Our script is where we will write and execute our code. Let's start our script **RBasics.R** by adding some preamble.

In RStudio you can create (collapsible) sections to keep your code organised. To create a section, go to Code->Insert Section; a pop-up box will appear for you to name your section. We will start by creating a section called "Preamble". Once you click OK you should get something that looks like this:

In [None]:
# Preamble ----------------------------------------------------------------

```{tip}
The symbol # is used to indicate a comment in R. This is a line which R will skip over and not try to run as code. The best code is well commented, so feel free to add your own comments to your code to help you understand it.
```

### Documentation

You may wish to start your code with some documentation. This could look something like:

In [None]:
# Author: [Your Name]
# Date: [Date]
# Description: [Brief description of what the script does]

### Install/Load Packages

Next you will need to install/load packages. We **install** packages in R once. We **load** the packages we are going to use every time we open/start R.

In [None]:
# Install your packages (first use only)
# install.packages("tidyverse")

# Load your packages (every time you restart R)
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


```{tip}
There are thousands of R packages, many of which you will never use and a handful of which you will use all the time. We will try to point you to the most relevant, but you will also find more information in the references section.
```
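
If you are not sure whether a package has already been installed, you can ask R before installing it. The sketch below is one possible approach rather than a required step; `is_installed` is a helper name we have made up for illustration, built on the base R function `requireNamespace()`.

```r
# a hypothetical helper: TRUE if the package is installed and can be loaded
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# the stats package ships with R, so we expect this to be TRUE
stats_available <- is_installed("stats")
print(stats_available)

# install tidyverse only if it is missing (uncomment to run)
# if (!is_installed("tidyverse")) install.packages("tidyverse")
```

The install line is left commented out, as in the script above, so that nothing is downloaded unless you choose to run it.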

### Clear R Environment

A line you will often see in the preamble of R scripts is one which clears the current workspace/environment.

In [None]:
# Clear everything from the environment
rm(list = ls())

```{caution}
This will delete/unassign all of the variables and data structures you have created, so only use it if you won't lose something from the environment that you still need.
```

### Set Working Directory

It can keep things streamlined to set your preferred working directory. Imagine you have a folder called **DataProject** sitting in your **Documents** folder. Setting the working directory to that folder means that, by default, inputs will be read from it and outputs will be written to it.

In [None]:
# Set working directory using setwd("path/to/your/directory")
# setwd("~/Documents/DataProject")
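
Before changing the working directory it can help to check where R is currently pointing. The following is a minimal sketch using the base R functions `getwd()` and `file.path()`; the file name `my_data.csv` is made up purely for illustration and no file is actually read or written.

```r
# check which folder R is currently using as the working directory
current_dir <- getwd()
cat("Current working directory:", current_dir, "\n")

# build the full path a file in that folder would have
# ("my_data.csv" is a made-up file name used only for illustration)
data_path <- file.path(current_dir, "my_data.csv")
cat("A file called my_data.csv would live at:", data_path, "\n")
```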

## Basic Operations and Print Statements

Let's set up another section within our R script and call it **The Basics**.

In [None]:
# The Basics --------------------------------------------------------------

### Basic Operations

Let's have a play with some very basic functions. Most of these are unremarkable:
* to add numbers use `+`
* to subtract numbers use `-`
* to multiply numbers use `*`
* to divide numbers use `/`
* to find powers use `^`
* to calculate the square root use `sqrt()`
* to calculate the sum use `sum()`
* to calculate the mean use `mean()`
* to round numbers use `round()`

There are many other simple functions such as the exponential function `exp()` and the natural logarithmic function `log()`.

In [16]:
# a simple mathematical calculation using basic operations
1+2*2+3^2-4/2-sqrt(4)

```{note}
R, like most coding languages, follows the traditional mathematical rules of precedence, which are the same as [BIDMAS](https://www.bbc.co.uk/bitesize/articles/znm8cmn#:~:text=BIDMAS%20is%20an%20acronym%20used,%2C%20Multiplication%2C%20Addition%2C%20Subtraction.).
```
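
To see precedence at work, here is a short sketch (the example numbers and object names are our own). It breaks the calculation from above into steps, shows how brackets change an answer, and tries out some of the functions listed earlier.

```r
# the calculation from above, one step at a time:
# 1 + 2*2 + 3^2 - 4/2 - sqrt(4)  =  1 + 4 + 9 - 2 - 2  =  10
result <- 1 + 2 * 2 + 3^2 - 4 / 2 - sqrt(4)

# brackets are evaluated first, so these two lines give different answers
without_brackets <- 1 + 2 * 3    # multiplication first: 1 + 6 = 7
with_brackets <- (1 + 2) * 3     # brackets first: 3 * 3 = 9

# some of the functions listed above applied to example values
values <- c(2, 4, 9)             # example numbers chosen for illustration
total <- sum(values)             # 2 + 4 + 9 = 15
average <- mean(values)          # 15 / 3 = 5
rounded <- round(3.14159, 2)     # rounds to 2 decimal places: 3.14
back <- log(exp(1))              # the natural log undoes exp, giving 1

cat(result, without_brackets, with_brackets, total, average, rounded, back, "\n")
```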

### Print Statements

Print statements, like comments, are very useful when coding. Although you can make use of the simple print function `print`, we would recommend the concatenate and print function `cat`. Here's how it works:

In [18]:
# an example of a simple print statement in R
cat("The sum of the first ten numbers is:", sum(1:10), "\n")

The sum of the first ten numbers is: 55 


Anything you want R to print verbatim goes inside quotation marks (""); anything else must be either an object or some function or operation which R can carry out. The final "\n" tells R to start a new line at the end of the statement.
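
To compare the two approaches side by side, here is a small sketch. The object names `total` and `message_text` are our own, and `paste()` is an extra base R function that is not otherwise covered in this section.

```r
total <- sum(1:10)

# print() shows a single object, with an index such as [1] in front
print(total)

# cat() joins several pieces together, separated by spaces
cat("The sum of the first ten numbers is:", total, "\n")

# sep = "" removes the automatic spaces between the pieces
cat("Total=", total, "\n", sep = "")

# paste() builds the same kind of text but stores it instead of printing it
message_text <- paste("The mean of the first ten numbers is:", mean(1:10))
print(message_text)
```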

## Objects in R

### Assigning Objects

Objects in R are essentially containers. They may contain single pieces of data (e.g. a variable) or be more complex structures of data (e.g. lists and data frames). Objects are created by **assigning** the data to that object using `<-`.

```{note}
There are actually no fewer than five different assignment operators as per this [documentation](https://stat.ethz.ch/R-manual/R-patched/library/base/html/assignOps.html). However, at a basic level using `<-` will serve you well.
```
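
As a brief illustration of the note above, the sketch below shows three of those operators producing the same result at the top level of a script. The object names are made up for this example, and we would still suggest sticking with `<-` in your own code.

```r
# the usual assignment operator
a <- 5
# = also assigns when used at the top level of a script
b = 5
# -> assigns from left to right
5 -> c_value

# check that all three objects hold the same value
all_same <- identical(a, b) && identical(b, c_value)
print(all_same)
```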

Let's set up a section called **Objects in R** and look at some examples.

In [11]:
# Objects in R ------------------------------------------------------------

# assign the value 16 to the object my_variable
my_variable<-16
# assign a set of values {1,2,3} to the object my_set
my_set<-c(1,2,3)
# assign a set of values {1,2,3} to the object my_list
my_list<-list(1,2,3)
# assign two sets, {1,2,3} and {"A", "B", "C", "D"}, to a list: unnamed in my_listofsets, named in my_listofsets2
my_listofsets<-list(c(1,2,3),c("A","B","C","D"))
my_listofsets2<-list(my_set1=c(1,2,3),my_set2=c("A","B","C","D"))
# assign a table of data to the object my_dataframe
my_dataframe<-data.frame(my_column1=c(1,2,3), my_column2=c("yes","no","maybe"))

After doing this our environment panel should look like this:

![environment](./images/environment.png)

Alternatively, you can get R to list all the currently assigned objects using `ls()`. We saw above that `rm(list = ls())` will remove (unassign) all of the objects in the current session. You can remove individual objects using, for example, `rm(my_variable)`.
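
Removing a single object can be sketched as follows. The object names `temp_value` and `temp_set` are made up so that nothing from the script above is deleted, and `exists()` is a base R function that checks whether a name is currently assigned.

```r
# create two temporary objects (names made up for this example)
temp_value <- 16
temp_set <- c(1, 2, 3)

# exists() checks whether a name is currently assigned
before <- exists("temp_value")

# remove just one of the two objects
rm(temp_value)

# temp_value is gone but temp_set is untouched
after <- exists("temp_value")
still_there <- exists("temp_set")
cat(before, after, still_there, "\n")
```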

Looking at the environment panel we first note the difference between `my_set` and `my_list`. We can explore the difference by trying to calculate the mean of each:

In [10]:
# calculating the mean of the values given by my_set
mean(my_set)
# attempting to calculate the mean of the values given by my_list
mean(my_list)

“argument is not numeric or logical: returning NA”
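
The warning arises because `mean()` expects a numeric (or logical) vector and a list is not one, even when every item in it is a number. The sketch below recreates the two objects so that it runs on its own, inspects them with `class()` and `is.numeric()`, and then uses `unlist()` as one possible way to flatten the list before taking the mean.

```r
my_set <- c(1, 2, 3)
my_list <- list(1, 2, 3)

# my_set is a numeric vector, so mean() can work on it directly
set_class <- class(my_set)
set_is_numeric <- is.numeric(my_set)

# my_list is a list, which mean() does not treat as numeric
list_class <- class(my_list)
list_is_numeric <- is.numeric(my_list)
cat(set_class, set_is_numeric, list_class, list_is_numeric, "\n")

# unlist() flattens the list into a vector so the mean can be calculated
list_mean <- mean(unlist(my_list))
print(list_mean)
```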


Second, we note the difference between `my_listofsets` and `my_listofsets2`. The following helps us to explore the differences:


In [13]:
# attempting to calculate the mean of the values given by my_listofsets
mean(my_listofsets)
# attempting to calculate the mean of the values given by my_listofsets2
mean(my_listofsets2)
# calculating the mean of the values given by the first set within my_listofsets2
mean(my_listofsets2$my_set1)
# attempting to calculate the mean of the values given by the second set within my_listofsets2
mean(my_listofsets2$my_set2)

“argument is not numeric or logical: returning NA”


“argument is not numeric or logical: returning NA”


“argument is not numeric or logical: returning NA”
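
Three of the four calls above produce a warning: the two that pass a whole list to `mean()`, and the one that passes the set of letters. The sketch below recreates the two lists so that it runs on its own, and shows how items can be reached by position with `[[ ]]` (which works for both lists) or by name (which only works when names were given). The object names `first_mean` and `same_mean` are our own.

```r
my_listofsets <- list(c(1, 2, 3), c("A", "B", "C", "D"))
my_listofsets2 <- list(my_set1 = c(1, 2, 3), my_set2 = c("A", "B", "C", "D"))

# items without names can only be reached by position using [[ ]]
first_mean <- mean(my_listofsets[[1]])

# named items can be reached by position or by name
same_mean <- mean(my_listofsets2[["my_set1"]])
cat(first_mean, same_mean, "\n")

# names() returns NULL for the list without names
print(names(my_listofsets))
print(names(my_listofsets2))

# lengths() reports how many values each item in the list holds
print(lengths(my_listofsets2))
```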


```{note}
A list can contain sets of different lengths, as we can see above. In a dataframe every column must have the same length.
```

In [None]:
# attempting to create a dataframe with different length columns (this produces an error)
my_dataframe<-data.frame(my_column1=c(1,2,3), my_column2=c("A","B","C","D"))
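
Running the line above stops with an error because the two columns hold three and four values respectively. If you would like to see the error message without halting your script, one option is the base R function `tryCatch()`, sketched below. The names `error_text` and `padded` are our own, and padding the shorter column with `NA` is just one possible workaround, assuming a missing value is acceptable in your data.

```r
# tryCatch() runs the code and, if it fails, passes the error to our
# function instead of stopping the script
error_text <- tryCatch(
  {
    data.frame(my_column1 = c(1, 2, 3), my_column2 = c("A", "B", "C", "D"))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
cat("R reported:", error_text, "\n")

# one possible workaround: pad the shorter column with NA (a missing value)
padded <- data.frame(my_column1 = c(1, 2, 3, NA), my_column2 = c("A", "B", "C", "D"))
print(nrow(padded))
```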

### Types of Objects