# Week 04: Getting started with R and RStudio


## Introduction

This tutorial shows how to get started with R. It focuses on R for analyzing language data, but it offers valuable information for anyone who wants to get started with R. It shows how to set up a project properly and tidily before you start coding, and it exemplifies common operations such as loading and manipulating tabular data and generating basic visualizations in R.

### Goals of this tutorial

By the end of this tutorial, you should be able to:

- Get started with R
- Orient yourself in R and RStudio
- Create and work in R projects
- Know where to look for help and to learn more about R
- Understand the basics of working with data: loading data, saving data, working with tables, and creating a simple plot
- Apply some best practices for using R scripts, data, and projects
- Understand the basics of objects, functions, and indexing


### Preparation

Before you actually open R or RStudio, there are things to consider that make working in R much easier and give your workflow a better structure.

Imagine it like this: when you want to write a book, you could simply take pen and paper and start writing *or* you could think about what you want to write about, what different chapters your book would consist of, which chapters to write first, what these chapters will deal with, etc. The same is true for R: you could simply open R and start writing code *or* you can prepare your session and structure what you will be doing.

### Folder Structure and R projects

Before actually starting with writing code, you should prepare the session by going through the following steps:

#### Create a folder for your project

In that folder, create the following sub-folders (you can, of course, adapt this folder template to match your needs):

  - data (you do not need to create this folder for the present workshop, as you can simply use the data folder that you downloaded for this workshop instead)
  - images
  - tables
  - docs

The folder for your project could look like the one shown below.

![](https://slcladal.github.io/images/RStudio_newfolder.png)
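
If you prefer, the same folder template can also be created from within R. The following is a minimal sketch: the project name `my_project` and its location in a temporary directory are assumptions made for illustration, so replace `project_dir` with the path of your own project folder.

```r
# Hypothetical project location; replace with the path of your own project
project_dir <- file.path(tempdir(), "my_project")

# Sub-folders from the folder template
subfolders <- c("data", "images", "tables", "docs")

# Create each sub-folder (recursive = TRUE also creates the project folder)
for (folder in subfolders) {
  dir.create(file.path(project_dir, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# List the folders that now exist in the project folder
list.files(project_dir)
```

Setting `showWarnings = FALSE` means the code can be run again without complaints if a folder already exists.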
   

Once you have created your project folder, you can go ahead with RStudio.

#### Open RStudio

This is what RStudio looks like when you first open it: 

![](https://slcladal.github.io/images/RStudio_empty.png)

In RStudio, click on `File` 
  
![](https://slcladal.github.io/images/RStudio_file.png)

You can use the drop-down menu to create an `R project`.

#### R Projects

In RStudio, click on `New Project`
  
![](https://slcladal.github.io/images/RStudio_newfile.png)
  
Next, confirm by clicking `OK` and select `Existing Directory`.

Then, navigate to where you have just created the project folder for this workshop.
  
![](https://slcladal.github.io/images/RStudio_existingdirectory.png)
  
Once you click on `Open`, you have created a new `R project`.
  
#### R Notebooks
  
In this project, click on `File`
  
![](https://slcladal.github.io/images/RStudio_file.png)
  
Click on `New File` and then on `R Notebook` as shown below.

![](https://slcladal.github.io/images/RStudio_newnotebook.png) 

This `R Notebook` will be the file in which you do all your work.



#### Getting started with R Notebooks

You can now start writing in this R Notebook. For instance, you could start by changing the title of the R Notebook and describe what you are doing (what this Notebook contains).

Below is a picture of what this document looked like when I started writing it.

![](https://slcladal.github.io/images/RStudio_editMD.png)

When you write in the R Notebook, you use what is called `R Markdown` which is explained below.


#### R Markdown

The Notebook is an [R Markdown document](http://rmarkdown.rstudio.com/). An Rmd (R Markdown) file is more than a flat text document: it is a program that you can run in R and that allows you to combine prose and code, so readers can see the technical aspects of your work while reading about their interpretive significance.

You can get a nice and short overview of the formatting options in R Markdown (Rmd) files [here](https://rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf).


R Markdown allows you to make your research fully transparent and reproducible! If, a couple of years down the line, another researcher or a journal editor asks you how you did your analysis, you can simply send them the Notebook or even the entire R project folder.

As such, Rmd files are a type of document that allows you to

+ include snippets of code (and any outputs such as tables or graphs) in plain text while 

+ encoding the *structure* of your document by using simple typographical symbols to encode formatting (rather than HTML tags or format types such as *Main header* or *Header level 1* in Word).  

Markdown is really quite simple to learn and these resources may help:

+ The [Markdown Wikipedia page](https://en.wikipedia.org/wiki/Markdown) includes a very handy chart of the syntax.

+ John Gruber developed Markdown and his [introduction to the syntax](https://daringfireball.net/projects/markdown/syntax) is worth browsing.

+ This [interactive Markdown tutorial](http://www.markdowntutorial.com/) will teach you the syntax in a few minutes.

Here is an overview of the basic syntax of R Markdown and what the text looks like once it is rendered into a notebook.

![](https://github.com/MartinSchweinberger/SLAT7855/blob/master/images/rmarkdown.png?raw=true)

### R and RStudio Basics

RStudio is a so-called IDE - Integrated Development Environment. The interface provides easy access to R. The advantage of this application is that R programs and files as well as a project directory can be managed easily. The environment is capable of editing and running program code, viewing outputs and rendering graphics. Furthermore, it is possible to view variables and data objects of an R-script directly in the interface. 

### RStudio: Panes

The GUI - Graphical User Interface - that RStudio provides divides the screen into four areas that are called **panes**:

1. File editor
2. Environment variables
3. R console
4. Management panes (File browser, plots, help display and R packages).

The two most important are the R console (bottom left) and the File editor (or Script in the top left).
The Environment variables and Management panes are on the right of the screen and they contain: 

* **Environment** (top): Lists all currently defined objects and data sets
* **History** (top): Lists all commands recently used or associated with a project
* **Plots** (bottom): Graphical output goes here
* **Help** (bottom): Find help for R packages and functions.  Don't forget you can type `?` before a function name in the console to get info in the Help section. 
* **Files** (bottom): Shows the files available to you in your working directory

These RStudio panes are shown below.

![](https://slcladal.github.io/images/RStudioscreenshot.png)

#### R Console (bottom left pane)

The console pane allows you to quickly and immediately execute R code. You can experiment with functions here, or quickly print data for viewing. 

Type next to the `>` and press `Enter` to execute. 

For example, you can use R like a calculator.  Try typing `2+8` into the **R console** and press `Enter`.

Here, the plus sign is the **operator**.  Operators are symbols that represent some sort of action.  However, R is, of course, much more than a simple calculator.  To use R more fully, we need to understand **objects**, **functions**, and **indexing** - which we will learn about as we go.

For now, think of *objects as nouns* and *functions as verbs*. 
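
To make the idea of nouns and verbs concrete, here is a small sketch that you can type into the console line by line. The object name `scores` and its values are made up for illustration.

```r
# An operator acting on two numbers
2 + 8

# Create an object (a "noun"): a vector holding three made-up numbers
scores <- c(10, 20, 30)

# Apply a function (a "verb") to the object
mean(scores)

# Indexing: extract the second element of the object
scores[2]
```

The arrow `<-` assigns the values to the name `scores`, so the object can be reused in later commands.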

#### Running commands from a script

To run code from a script, insert your cursor on a line with a command, and press `CTRL/CMD+Enter`.

Or highlight some code to only run certain sections of the command, then press `CTRL/CMD+Enter` to run.

Alternatively, use the `Run` button at the top of the pane to execute the current line or selection (see below).

![](https://slcladal.github.io/images/RStudio_run.png)

#### Script Editor (top left pane)

In contrast to the R console, which quickly runs code, the Script Editor (in the top left) does not automatically execute code. The Script Editor allows you to save the code essential to your analysis.  You can re-use that code in the moment, refer back to it later, or publish it for replication.  


Now that we have explored RStudio, we are ready to get started with R!

### Getting started with R

This section introduces some basic concepts and procedures that help optimize your workflow in R. 

#### Packages

When using R, most functions are not loaded or even installed automatically. Instead, most functions are contained in what are called **packages**.

R comes with about 30 packages ("base R").  There are over 10,000 user-contributed packages; you can discover these packages online.  A prevalent collection of packages is the Tidyverse, which includes ggplot2, a package for making graphics. 

Before being able to use a package, we need to install the package (using the `install.packages` function) and load the package (using the `library` function). However, a package only needs to be installed once(!) and can then simply be loaded. When you install a package, this will likely install several other packages it depends on.  You should have already installed tidyverse before the workshop. 

You must load the package in any new R session where you want to use that package. Below I show what you need to type when you want to install the `tidyverse`, the `here`, and the `flextable` packages (which are the packages that we will need in this workshop). The lines are commented out with `#`; remove the `#` to run them.


In [None]:
#install.packages("tidyverse")
#install.packages("here")
#install.packages("flextable")


To load these packages, use the `library` function which takes the package name as its main argument.



In [None]:
library(tidyverse)
library(here)
library(flextable)


The session preparation section of your Rmd file will thus also state which packages a script relies on.

In the script editor pane of RStudio, the code blocks that install and activate packages would look like this:

![](https://slcladal.github.io/images/RStudio_packages.png)
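
Because a package only needs to be installed once, it can be handy to check which packages are still missing before installing anything. The sketch below does this with `requireNamespace()`. The helper name `missing_packages` is hypothetical, and the `install.packages()` call is commented out so that nothing is installed by accident.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages used in this workshop
needed <- c("tidyverse", "here", "flextable")

# Which of them still need to be installed?
to_install <- missing_packages(needed)
to_install

# Uncomment to install only the missing packages
# if (length(to_install) > 0) install.packages(to_install)
```

Note that this check does not replace `library()`: you still need to load the packages in every new session.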

#### Getting help

When working with R, you will encounter issues and face challenges. A very good thing about R is that it provides various ways to get help or find information about the issues you face.

#### Finding help within R

To get help regarding what functions a package contains, which arguments a function takes, or how to use a function, you can use the `help` function or the `apropos` function. You can also simply type a `?` before the name of the function or package, or two `??` if this does not give you any answers.


In [None]:
help(tidyverse) 
apropos("tidyverse")
?require
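
A few more built-in helpers are worth knowing. The sketch below uses the base function `read.delim` purely as an example topic, and the search term `"tab-delimited"` is made up for illustration. The help pages are only opened when R runs interactively, for example in the RStudio console.

```r
# List all objects on the search path whose names start with "read."
apropos("^read\\.")

# Show the arguments of a function without opening its help page
args(read.delim)

if (interactive()) {
  # Open the help page of a function (same as ?read.delim)
  help("read.delim")
  # Search all installed help pages for a keyword (the long form of ??)
  help.search("tab-delimited")
  # Show an overview of everything a package contains
  help(package = "utils")
}
```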


There are also other "official" help resources from R/RStudio. 

* Read official package documentation, see vignettes, e.g., Tidyverse <https://cran.r-project.org/package=tidyverse>

* Use the RStudio Cheat Sheets at <https://www.rstudio.com/resources/cheatsheets/>

* Use the RStudio Help viewer by typing `?` before a function or package

* Check out `Keyboard Shortcuts Help` under `Tools` in RStudio for some good tips

#### Finding help online

One great thing about R is that you can very often find an answer to your question online.

* Google your error! See <http://r4ds.had.co.nz/introduction.html#getting-help-and-learning-more> for excellent suggestions on how to find help for a specific question online.

## Working with Data

We can now start actually working with R in RStudio. As this is an introduction, we will only do something very basic, namely loading a data set, inspecting it, and doing some very basic visualizations.

### Loading data

So let's start by loading a data set. 


In [None]:
dat <- read.delim(here::here("data/week4data1.txt"), sep = "\t")
# inspect data
dat
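
If you do not have the workshop data at hand, you can still try out `read.delim`. The sketch below writes a tiny tab-separated file to a temporary location and reads it back in. The values are made up, the temporary file is only a stand-in for the workshop file, and the column names `Class` and `TestScores` are assumed to mirror those used later in this tutorial.

```r
# Made-up example data as lines of a tab-separated text file
toy_lines <- c("Class\tTestScores",
               "Korean\t20",
               "Chinese\t25",
               "Korean\t40",
               "Chinese\t35")

# Write the lines to a temporary file (hypothetical stand-in for the real data)
toy_path <- tempfile(fileext = ".txt")
writeLines(toy_lines, toy_path)

# Read the file: read.delim treats the first line as column names by default
toy <- read.delim(toy_path, sep = "\t")

# Inspect the result
toy
```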


### Inspecting data



In [None]:
str(dat)



In [None]:
head(dat)



In [None]:
View(dat)



In [None]:
summary(dat)



In [None]:
ftable(dat$Class, dat$TestScores)
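
The inspection functions above, together with indexing, can be tried out on a small made-up table. In the sketch below, the data frame `toy` and its values are hypothetical; only the column names are assumed to match the workshop data.

```r
# Made-up data with the same column names as assumed for the workshop data
toy <- data.frame(Class = c("Korean", "Chinese", "Korean", "Chinese"),
                  TestScores = c(20, 25, 40, 35))

str(toy)          # column names and column types
head(toy, 2)      # the first two rows
summary(toy)      # summary statistics for each column
table(toy$Class)  # how often each class occurs

# Indexing: rows come before the comma, columns after it
toy[1, ]                      # the first row
toy[, "TestScores"]           # the column TestScores as a vector
toy$TestScores[3]             # the third value of TestScores
toy[toy$Class == "Korean", ]  # all rows of the Korean class
```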



### A first plot



In [None]:
plot(dat$TestScores, 
     col = factor(dat$Class))


### Changing data

We will change the test scores of the students in the Korean class: to all scores in the Korean class, we will add 10 points. 


In [None]:
# add 10 points to all test scores in the Korean class and store the result in dat2
dat %>%
  dplyr::mutate(TestScores = ifelse(Class == "Korean", TestScores + 10, TestScores)) -> dat2
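
The key to this step is `ifelse`, which checks a condition for every row and picks one of two values accordingly. The sketch below shows the same logic in base R on made-up data; the data frame `toy` and the new column `Adjusted` are hypothetical and only serve to make the before and after visible side by side.

```r
# Made-up data
toy <- data.frame(Class = c("Korean", "Chinese", "Korean"),
                  TestScores = c(20, 25, 40))

# The condition is evaluated once per row
toy$Class == "Korean"

# Add 10 where the condition is TRUE, keep the original score otherwise
toy$Adjusted <- ifelse(toy$Class == "Korean",
                       toy$TestScores + 10,
                       toy$TestScores)
toy
```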


Visualize the new data set:



In [None]:
plot(dat2$TestScores, 
     col = factor(dat2$Class))


Now, we will add 20 points to the test scores of the students in the Chinese class - but only if they have a test score of 30 or lower!



In [None]:
dat2 %>%
  dplyr::mutate(TestScores = ifelse(Class == "Chinese" & TestScores <= 30, 
                                    TestScores + 20, 
                                    TestScores)) -> dat3
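
Two conditions joined with `&` must both be true for a row to be changed. The sketch below spells this out on made-up data, using the same `<=` comparison as the code above; the object names `is_chinese`, `is_low`, and `change` are hypothetical.

```r
# Made-up data
toy <- data.frame(Class = c("Chinese", "Chinese", "Korean"),
                  TestScores = c(25, 45, 20))

# Each condition on its own gives one TRUE or FALSE per row
is_chinese <- toy$Class == "Chinese"
is_low     <- toy$TestScores <= 30

# & is TRUE only where both conditions hold
change <- is_chinese & is_low
change

# Number of rows that would be changed
sum(change)
```

Checking the combined condition like this before changing the data is a quick way to confirm that only the intended rows are affected.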


Visualize the new data set:



In [None]:
plot(dat3$TestScores, 
     col = factor(dat3$Class))


### Saving data



In [None]:
# save the final data set (dat3) in the data folder
write.table(dat3, here::here("data", "dat3.txt"))
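
By default, `write.table` separates values with spaces, puts character values in quotes, and adds row names. If you want a tab-separated file that `read.delim` can read back in directly, you can set these options explicitly. The following is a sketch with made-up data and a temporary file, not the workshop data.

```r
# Made-up data
toy <- data.frame(Class = c("Korean", "Chinese", "Korean"),
                  TestScores = c(30, 25, 50),
                  stringsAsFactors = FALSE)

# Save as a tab-separated file without row names or quotes
out_path <- tempfile(fileext = ".txt")
write.table(toy, out_path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read the file back in to check that nothing was lost
toy_back <- read.delim(out_path, sep = "\t", stringsAsFactors = FALSE)
toy_back
```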




