# R for History

## Why use R for History?

<u>Data versatility:</u> R can work with numerical data, text, dates and more.

<u>Complete workflow in one environment:</u> From loading data to cleaning it to producing output (and even reports, if you want).

<u>Lots of help available:</u> A very large community offers help via forums, free educational material, blogs and so on.

## Content of the workshop

- R refresher: objects, classes, functions, logical values
- Creating strings / character objects
- Working with strings with stringr
- Finding patterns with regular expressions
- Texts as vectors
- Simple text mining with tables (dataframes)

The introduction presents R and R code in a Jupyter Notebook, with live demonstrations in RStudio along the way. You are encouraged to work and write in RStudio during the workshop.

*Please write along as we go through the different examples.*

# R Refresher

## Objects
Much of writing R is about defining objects: names used to call up stored data.

Objects can be a lot of things: 
- a word
- a text
- a number
- a series of numbers
- a dataset 
- a corpus of texts
- a URL
- a formula
- a result 
- a filepath
- and so on...

When an object is defined, it is available in the current workspace (or environment).

This makes it possible to store and work with a variety of information simultaneously.
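As a small illustration of several objects living side by side in the environment, the sketch below stores a number, a text, a series of numbers and a filepath, then lists them with base R's `ls()`. The object names, values and the path `data/letters.txt` are made up for the example; the path is only stored as text and no file is read.

```r
# Store different kinds of information as separate objects
year <- 1848                        # a number
city <- "Paris"                     # a text (string)
years <- c(1789, 1830, 1848)        # a series of numbers
source_file <- "data/letters.txt"   # a filepath (hypothetical, only stored as text)

# List the names of all objects currently in the environment
ls()
```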

### Defining objects
Objects are defined using the `<-` operator:

In [14]:
a <- 2 + 5
print(a)

[1] 7


In [15]:
b <- 'Rome'
print(b)

[1] "Rome"


Enclosing something in `' '` or `" "` tells R to read it as text.

Objects with text (known in programming as "strings") can be as long as you like.
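A short sketch of how quotes work in practice, using base R only: single and double quotes produce the same string, double quotes are handy when the text itself contains an apostrophe, and `nchar()` works on strings of any length. The example sentences (including the opening line of Caesar's *De Bello Gallico*) are just sample text.

```r
# Single and double quotes create exactly the same string
identical('Rome', "Rome")   # TRUE

# Use double quotes when the text contains an apostrophe
caesar <- "Caesar's army crossed the Rubicon"

# A string can be as long as you like; nchar() counts its characters
long_text <- "Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae, aliam Aquitani, tertiam qui ipsorum lingua Celtae, nostra Galli appellantur."
nchar(caesar)      # 33
nchar(long_text)
```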

In [16]:
text <- "When in Rome, do as Romans do"

## Functions

Once an object is created, we can use functions on it.

Most functions are written in the syntax `function(object, option = something)`. Many functions only need the object as an argument.
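To see the `function(object, option = something)` pattern in action, here is a sketch with base R's `round()`, which works with the object alone but also accepts a `digits` option.

```r
# round() only needs the object...
round(3.14159)              # 3

# ...but accepts an option that changes how it works
round(3.14159, digits = 2)  # 3.14
```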

In [17]:
toupper(text) #Convert to uppercase

In [18]:
nchar(text) #Number of characters

Others take several arguments:

In [19]:
gsub("Rome", "cheese", text) #Pattern replacement: "Rome" becomes "cheese"
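Arguments can also be given by name. The sketch below uses base R's `gsub()` (whose arguments are called `pattern`, `replacement` and `x`) and `substr()` to show that named arguments may come in any order; the replacement word "Paris" is just an example.

```r
text <- "When in Rome, do as Romans do"

# Named arguments can be given in any order
gsub(pattern = "Rome", replacement = "Paris", x = text)
gsub(x = text, replacement = "Paris", pattern = "Rome")  # same result

# substr() takes the text plus a start and a stop position
substr(text, 1, 4)   # "When"
```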

When you run a command that R doesn't know, R will throw an error:

In [20]:
finish_sentece("when in Rome")

ERROR: Error in finish_sentece("when in Rome"): could not find function "finish_sentece"


The set of commands in R is virtually endless, since you can create your own functions:

In [21]:
finish_sentence <- function(text) {
    output <- paste0(text, ", do as Romans do")  # Append the rest of the proverb
    print(output)
}

finish_sentence("when in Rome")

[1] "when in Rome, do as Romans do"
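Building on `finish_sentence()`, a custom function can also take an option with a default value, mirroring the `option = something` syntax. The name `finish_sentence2` and its `ending` argument are hypothetical, made up for this sketch; unlike the original, it returns the text rather than printing it.

```r
# Hypothetical variant of finish_sentence() with an option that has a default
finish_sentence2 <- function(text, ending = ", do as Romans do") {
  paste0(text, ending)  # return the combined text
}

finish_sentence2("when in Rome")                  # uses the default ending
finish_sentence2("when in Rome", ending = "...")  # overrides the option
```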


### Naming objects
Objects can be named almost anything, but a good rule of thumb is to use names that indicate what the object contains.

#### Restrictions for naming objects
- Most special characters are not allowed: `/`, `?`, `*`, `+` and so on (these characters mean something to R and will be read as part of an expression)
- Names that already exist in R should be avoided (reusing one will overwrite the existing function/object in the environment)

#### Good naming conventions 
- Separate words with `_`: `my_object`, `room_number`

or:

- Capitalize each word except the first: `myObject`, `roomNumber`
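A minimal sketch of the naming advice, using base R only: both conventions give valid names, `exists()` can tell you whether a name is already taken (for example by a built-in function), and `make.names()` shows how R would turn invalid text into a valid name. The object names are illustrative.

```r
# Both conventions produce valid names
room_number <- 101
roomNumber <- 101

# exists() checks whether a name is already in use
exists("mean")       # TRUE: this name belongs to R's mean() function
exists("my_object")  # FALSE, unless you have already created it

# make.names() converts invalid text into a valid name:
# spaces become ".", a leading digit gets an "X", reserved words get a "."
make.names(c("room number", "2nd_floor", "if"))
# "room.number" "X2nd_floor"  "if."
```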

## Different types of objects (classes)
R distinguishes between different types of objects.

Every object has a *class*. The class denotes what type of object it is and determines which operations are possible.
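The class of any object can be checked with base R's `class()`. This sketch recreates the objects `a` and `b` from the examples above so it runs on its own.

```r
a <- 2 + 5
b <- 'Rome'

class(a)        # "numeric"
class(b)        # "character"
class(toupper)  # "function"

# The class limits what you can do: a + 1 works,
# but b + 1 fails with "non-numeric argument to binary operator"
a + 1           # 8
```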

### Numeric and character classes
As you work with R, you will encounter many different classes. For now, we will focus on two of the most common ones:
- Numeric classes
- Character classes

Numbers are automatically stored as a numeric class (or one of its variants, such as double or integer).

When using `''` or `""` around the information to be stored in the object, R will interpret that as text; meaning it will be stored as a character class. 

*Numbers enclosed in `''` or `""` are therefore stored as a character class, as R interprets it as text!*

R has to be told that something is text, as R would otherwise interpret it as the name of an object.
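The difference between a number and a number in quotes can be checked directly. In this sketch (the year 1848 is just an example value), `typeof()` reveals which variant a number is stored as, and `as.numeric()` converts text to a number so it can be used in arithmetic.

```r
year_num <- 1848
year_txt <- "1848"

class(year_num)    # "numeric"
class(year_txt)    # "character"
typeof(year_num)   # "double": the default variant for numbers
class(1848L)       # "integer": the L suffix stores a whole number as an integer

# Convert text to a number before doing arithmetic with it
as.numeric(year_txt) + 1   # 1849

# Without quotes, R looks for an object called Rome:
# Rome   # Error: object 'Rome' not found
```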

### The logical class
Logicals are *boolean* objects, meaning they have either the value `TRUE` or `FALSE`.

The following operators (among others) compare values and return a logical value:
- `>`
- `>=`
- `<`
- `<=`
- `==`
- `!=`
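A few of these operators applied to numbers and to the example text; each comparison returns `TRUE` or `FALSE`, and `class()` confirms the result is logical.

```r
text <- "When in Rome, do as Romans do"

5 > 3              # TRUE
5 <= 3             # FALSE
"Rome" == "Rome"   # TRUE
"Rome" != "rome"   # TRUE: comparisons are case-sensitive
nchar(text) > 20   # TRUE (the text has 29 characters)
class(5 > 3)       # "logical"
```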

Logicals can be used in functions, loops and if-statements to ensure that a certain condition is met before something is run.
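As a minimal sketch of a logical used in an if-statement, the code below runs only one of its two branches, depending on whether the text has more than 20 characters. The threshold of 20 and the labels "long" and "short" are arbitrary choices for illustration.

```r
text <- "When in Rome, do as Romans do"

# The code inside { } only runs if the condition is TRUE
if (nchar(text) > 20) {
  snippet_length <- "long"
} else {
  snippet_length <- "short"
}

print(snippet_length)  # "long"
```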

Certain functions will also return boolean values:

In [13]:
startsWith(text, "When") #Does the text start with "When"? (text first, then the prefix)
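Some other base R functions that return logical values, sketched on the same example text. Note that `startsWith()` and `endsWith()` take the text first and the prefix or suffix second, while `grepl()` takes the pattern first.

```r
text <- "When in Rome, do as Romans do"

startsWith(text, "When")  # TRUE
endsWith(text, "do")      # TRUE
grepl("Rome", text)       # TRUE: the pattern occurs somewhere in the text
is.character(text)        # TRUE
is.numeric(text)          # FALSE
```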

## EXERCISE: REFRESHER

1. Find two text snippets and assign each to its own object (e.g. text snippets from https://gutenberg.org/files/22381/22381.txt)

2. Determine which text snippet is longer using the function `nchar()`

3. Use a logical operator to get R to tell you whether your text snippets have more than 400 characters.