# Introduction
## What is R and why use it?

R is a free software environment with its own programming language. 

It is especially suited for statistical analysis and graphical output.

Because R is both open source and popular as a data science tool, it is used for a very wide range of applications.

R can work with a large variety of data formats and is (with a few add-ons) compatible with data from other software solutions (Excel, SPSS, SAS, Stata).

## Why use R for History?

<u>Data versatility:</u> It can work with numerical data, text data, dates etc.

<u>Complete workflow in one environment:</u> From data loading to cleaning to output (and even reporting, if you want)

<u>Lots of help available:</u> A very large community offers help via forums, free educational material, blogs and so on.

## Content of the R introduction

- The R language
- The RStudio environment
- Constructing variables and objects
- Working with strings
- Simple text mining with tables (dataframes)
- Using the help command
- Reading data from Excel
- Working with dates

The introduction combines a presentation of R and R code in Jupyter Notebook with live demonstrations in RStudio. You are encouraged to work and write in RStudio during the workshop.

*Please write along as we go through the different examples.*

*Download the slides via Moodle so you can go back during the workshop, if necessary: https://forskning.moodle.aau.dk/course/view.php?id=4 (select "CAS" user and log in)*

## The RStudio environment

During the workshop (and the other R workshops in CALDISS), we will be working with RStudio.

RStudio is an integrated development environment (IDE) for R. It makes for a nicer workspace.

<https://www.rstudio.com/products/rstudio/download/>

# The R language
R has its own programming language. R works by you writing lines of code in that language (writing commands) and R interpreting that code (running commands).

R (and RStudio) have a limited user interface, meaning almost all functionality (statistics, plots, simulations etc.) must be executed using code in the R language.

A programming language is a lot like any other language (except not being very dialogical): 
- You can only expect to be understood if you speak the same language = R will only execute code that is written correctly
- You are contributing to the language by speaking it = Create your own functions in R that R will understand

## R as a calculator
So what does it mean that R interprets our code?
It means that you tell R to do something by writing a command and R will do that (if R can understand you).

R, for example, understands mathematical expressions:

In [1]:
2 + 5

In [2]:
0.37 * 256

You can "ask" R for different results and outputs using functions:

In [3]:
nchar("when in Rome, do as Romans do")

In [4]:
toupper("when in Rome, do as Romans do")

When you run a command that R doesn't know, R will throw an error:

In [5]:
finish_sentece("when in Rome")

ERROR: Error in finish_sentece("when in Rome"): could not find function "finish_sentece"


The commands in R are virtually endless, as you are able to create your own:

In [6]:
finish_sentence <- function(text) {
    paste0(text, ", do as Romans do")
}

finish_sentence("when in Rome")

# The R Language: Objects and Functions
R works by storing values in "objects". These objects can then be used in various commands like calculating differences, saving a file, creating a graph and so on. To simplify a bit: An object is some kind of stored value and a function is something that can manipulate a stored value (which then creates a new object). 
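As a small sketch of that idea (the object names `city` and `city_upper` are made up for illustration): a function takes an object as input and returns a result, which can be stored as a new object. The original object is left as it was.

```r
# Store a value in an object
city <- "Rome"

# A function takes the object as input and returns a new value,
# which we store in a second object
city_upper <- toupper(city)

city        # the original object is unchanged: "Rome"
city_upper  # the new object holds the result: "ROME"
```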

Most of R can be boiled down to these 3 basic steps:

1. Assign values to an object
2. Make sure R interprets the object correctly (its class)
3. Perform some operation or manipulation on the object using a function

Translated to data analysis, the steps would (in general terms) look as follows:

1. Load our dataset: `data <- read.csv("my_datafile.csv")`
2. Check that the variables have the correct class: `class(data$age)`
3. Perform some kind of analysis: `mean(data$age)`

The amount of work between these steps of course varies greatly.
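The three steps can be tried out without a data file. The sketch below uses a tiny made-up dataset in place of `read.csv("my_datafile.csv")`; the names and ages are invented for illustration, and the ages are deliberately stored as text so that step 2 has something to catch.

```r
# 1. Assign values to an object
#    (a made-up dataset standing in for a file read with read.csv())
data <- data.frame(
  name = c("Anna", "Jens", "Marie"),
  age  = c("34", "51", "29"),      # ages accidentally stored as text
  stringsAsFactors = FALSE
)

# 2. Check the class of the variable
class(data$age)    # "character", so it cannot be used in calculations yet

# 3. Correct the class, then perform the analysis
data$age <- as.numeric(data$age)
mean(data$age)     # 38
```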

## Objects
A lot of writing in R is about defining objects: giving a name to stored data so that you can call it up again.

Objects can be a lot of things: 
- a word
- a text
- a number
- a series of numbers
- a dataset 
- a corpus of texts
- a URL
- a formula
- a result 
- a filepath
- and so on...

When an object is defined, it is available in the current working space (or environment).

This makes it possible to store and work with a variety of information simultaneously.
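A minimal sketch of working with several objects at once. The object names and the file path are hypothetical examples. `exists()` checks whether a name is defined, `rm()` removes an object again, and `ls()` (not run here) lists everything in the environment.

```r
# Define a few objects of different kinds (example values only)
year      <- 1453
city      <- "Constantinople"
file_path <- "data/letters.csv"   # hypothetical path, no file is read

# exists() checks whether a name is defined in the environment
exists("year")          # TRUE

# Objects can be combined once they are defined
paste(city, year)       # "Constantinople 1453"

# rm() removes an object from the environment
rm(file_path)
exists("file_path")     # FALSE
```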

### Defining objects
Objects are defined using the `<-` operator:

In [7]:
a <- 2 + 5
a

In [8]:
b <- 'Rome'
b

Enclosing something in `' '` or `" "` tells R to read it as text rather than as code.
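A short sketch showing that the two kinds of quotes give the same result (the object names are made up for illustration). Having both is handy when the text itself contains quotation marks.

```r
city_single <- 'Rome'
city_double <- "Rome"

# Both objects contain exactly the same text
identical(city_single, city_double)   # TRUE

# One kind of quote can be used inside the other
saying <- "The 'eternal city' is Rome"
saying
```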

Objects with text (known in programming as `strings`) can be as long as you like.

In [9]:
text <- "The ancient Greeks had several different theories with regard to the origin of the world, but the generally accepted notion was that before this world came into existence, there was in its place a confused mass of shapeless elements called Chaos. "

## Functions

Once an object is created, we can use functions on it.

Most functions are written in the syntax of `function(object, option = something)`. A lot of functions only need the object as an argument.

In [10]:
toupper(text) #Convert to uppercase

In [11]:
nchar(text) #Number of characters

Others take several arguments:

In [12]:
gsub("world", "cheese", text) #Pattern replacement
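Arguments can be given by position or by name. The sketch below uses a short example sentence (the object name `sentence` is made up) together with the documented argument names of `gsub()`: `pattern`, `replacement`, `x` and the optional `ignore.case`.

```r
sentence <- "The origin of the world"   # short example text

# Arguments given by position...
by_position <- gsub("world", "cheese", sentence)

# ...or by name, which makes the call easier to read
by_name <- gsub(pattern = "world", replacement = "cheese", x = sentence)

identical(by_position, by_name)   # TRUE

# An optional argument changes the default behaviour:
# with ignore.case = TRUE the pattern "WORLD" also matches "world"
gsub("WORLD", "cheese", sentence, ignore.case = TRUE)
```

This is the `function(object, option = something)` pattern: the object to work on plus one or more options.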

### Naming objects
Objects can be named almost anything but a good rule of thumb is to use names that are indicative of what the object contains.

#### Restrictions for naming objects
- Most special characters are not allowed: `/`, `?`, `*`, `+` and so on (most characters mean something to R and will be read as an expression)
- Avoid names that already exist in R (your object will replace or hide the existing function/object in your environment)
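A small sketch of what happens when an existing name is reused. Here the built-in `toupper()` is deliberately hidden by a function of our own. The built-in version is not lost: removing our own object with `rm()` makes it visible again.

```r
# Reusing the name of an existing function hides the original
toupper <- function(x) "something else entirely"
toupper("rome")   # "something else entirely"

# Removing our own object makes the built-in function visible again
rm(toupper)
toupper("rome")   # "ROME"
```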

#### Good naming conventions 
- Using '`_`': `my_object`, `room_number`

or:

- Capitalize each word except the first: `myObject`, `roomNumber`

# EXERCISE 1: DEFINING OBJECTS

Below are two text snippets from the book "Myths and Legends of Ancient Greece and Rome" by E.M. Berens.

<u>Snippet 1:</u>

"Themis, who has already been alluded to as the wife of Zeus, was the daughter of Cronus and Rhea, and personified those divine laws of justice and order by means of which the well-being and morality of communities are regulated. She presided over the assemblies of the people and the laws of hospitality."

<u>Snippet 2:</u>

"Athene was universally worshipped throughout Greece, but was regarded with special veneration by the Athenians, she being the guardian deity of Athens. Her most celebrated temple was the Parthenon, which stood on the Acropolis at Athens, and contained her world-renowned statue by Phidias, which ranks second only to that of Zeus by the same great artist."

**1.** Assign each snippet to its own object (make up your own object names or use `mytext1` and `mytext2`).

**2.** The function `nchar()` returns the number of characters. Determine which text snippet has the most characters.

In [13]:
mytext1 <- "Themis, who has already been alluded to as the wife of Zeus, was the daughter of Cronus and Rhea, and personified those divine laws of justice and order by means of which the well-being and morality of communities are regulated. She presided over the assemblies of the people and the laws of hospitality."
mytext2 <- "Athene was universally worshipped throughout Greece, but was regarded with special veneration by the Athenians, she being the guardian deity of Athens. Her most celebrated temple was the Parthenon, which stood on the Acropolis at Athens, and contained her world-renowned statue by Phidias, which ranks second only to that of Zeus by the same great artist."

In [14]:
nchar(mytext1)
nchar(mytext2)

nchar(mytext1) - nchar(mytext2)

# Different types of objects (classes)
R distinguishes between different types of objects.

Every object is stored with a *class*. The class denotes what type of object it is and affects what operations are possible.

## Numeric and character classes
As you work with R, you will encounter a lot of different classes. For now we will be focusing on two of the more common ones:
- Numeric classes
- Character classes

Numbers are automatically stored as a numeric class (or one of the variants: double, integer etc.).

When using `''` or `""` around the information to be stored in the object, R will interpret that as text; meaning it will be stored as a character class. 

*Numbers enclosed in `''` or `""` are therefore stored as a character class, as R interprets it as text!*
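A minimal sketch of the difference (the value 1453 is just an example). `try()` is used here only so that the error from adding a number to text does not stop the rest of the code.

```r
class(1453)      # "numeric"
class("1453")    # "character"
class("Rome")    # "character"

# Text cannot be used in arithmetic, even if it looks like a number.
# try() catches the error instead of stopping the script.
result <- try("1453" + 1, silent = TRUE)
class(result)    # "try-error"
```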

R has to be told that something is text, as R would otherwise interpret it as the name of an object.

In [15]:
mytext3 <- rome

ERROR: Error in eval(expr, envir, enclos): object 'rome' not found


## Coercing classes
The class of an object can be examined with `class(object)`.

Objects can be coerced with specific functions:

- Coerce to character class:`as.character(object)`
- Coerce to numeric class: `as.numeric(object)`

R will always try to "guess" the class. If R guesses wrong, you can tell R what class it should be (if possible).
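A short sketch of coercion with made-up example values. `suppressWarnings()` is used only to keep the output tidy. Without it, R prints the warning "NAs introduced by coercion" when text cannot be turned into a number.

```r
year_text <- "1453"
class(year_text)                      # "character"

# Coerce the text to a number so it can be used in calculations
year_number <- as.numeric(year_text)
class(year_number)                    # "numeric"
year_number + 1                       # 1454

# Coercion that is not possible gives NA (a missing value)
not_a_number <- suppressWarnings(as.numeric("Rome"))
is.na(not_a_number)                   # TRUE
```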

# R scripts: For reproducibility! 
Script files are text files containing code that R can interpret.

It is your "analysis recipe" showing what you have done as well as allowing you to re-run commands easily.

Make a habit of writing your commands into a script once you have a command figured out.

- `#` can be used for comments (skipped when run)
- `Ctrl` + `Enter`: Runs the current line or selection
- `Ctrl` + `Alt` + `R`: Runs the whole script
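As an illustration, a script could look something like the sketch below. The contents and object names are made-up examples. Lines starting with `#` are comments that explain each step and are skipped when the script is run.

```r
# Purpose: count the characters in a text snippet

# Step 1: store the text in an object
snippet <- "Athene was universally worshipped throughout Greece"

# Step 2: count the characters and store the result
snippet_nc <- nchar(snippet)

# Step 3: show the result
snippet_nc
```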

# EXERCISE 2: CLASSES

1. Assign the number of characters of text snippet 1 (`nchar(mytext1)`) to the object `text1_nc`.
2. Check the classes of your object `mytext1` and `text1_nc` with `class()`. What are they?
3. Try changing the class of `text1_nc` to a character class with `as.character()`. Is it possible?
4. Try changing the class of `mytext1` to a numeric class with `as.numeric()`. Is it possible?

**Bonus**
1. Assign the number of characters of text snippet 2 to another object
2. Test if the number of characters of the two texts is the same with the operator `==`:
    - `text1_nc == text2_nc`
3. Assign the test above to the object `mytest`
4. Check the class of `mytest`. What class is it?

In [16]:
text1_nc <- nchar(mytext1)
text2_nc <- nchar(mytext2)

mytest <- text1_nc == text2_nc

In [17]:
as.character(text1_nc)

In [18]:
as.numeric(mytext1)

Warning message: "NAs introduced by coercion"

In [19]:
class(mytest)

# The logical class
Logicals are *boolean* objects, meaning they will have either the value `TRUE` or `FALSE`.

Expressions using the following comparison operators (among others) return a logical value:
- `>`
- `>=`
- `<`
- `<=`
- `==`
- `!=`

Logicals can be used in functions, loops and if-statements to ensure that a certain condition is met before something is run.
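A minimal sketch of a logical used as a condition in an if-statement. The object names and texts are made up for illustration.

```r
short_text <- "Rome"
long_text  <- "Constantinople"

# A comparison returns a logical value
is_longer <- nchar(long_text) > nchar(short_text)
is_longer          # TRUE
class(is_longer)   # "logical"

# The code inside { } only runs if the condition is TRUE
if (is_longer) {
  message(long_text, " has more characters than ", short_text)
}
```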

# BREAK

![cat_window](https://2.bp.blogspot.com/-C8QgO2Yd3ew/TbMiWtd-VEI/AAAAAAACFu8/AhHEIYkfvnU/s1600/cats_chillin_02.jpg)