In this file, we'll walk through the steps we need to begin programming with R on our own computer. We'll install R, as well as [RStudio](https://rstudio.com/), an extremely popular development environment for R. We'll then tour the main features of RStudio so that we're ready to start experimenting with R programming on our own.

When we work in RStudio, we'll be able to see our code, notes, objects, and figures organized in multiple panels of a single screen:

![image.png](attachment:image.png)

What exactly is RStudio, and why are we installing it in addition to R?

While the R language dates back to 1995, RStudio was introduced in 2011. RStudio is a free, open-source [Integrated Development Environment (IDE)](https://en.wikipedia.org/wiki/Integrated_development_environment), a software suite designed to organize all our programming tasks in one place and improve the efficiency of our workflow. While there are other IDEs available for use with R ([Microsoft Visual Studio for R](https://docs.microsoft.com/en-us/visualstudio/rtvs/?view=vs-2017) and [Eclipse](https://www.eclipse.org/)), RStudio is by far the most popular.

Using RStudio for our data science workflow provides many advantages:

* An intuitive interface that lets us keep track of saved objects, scripts, and figures
* A text editor with features like color-coded syntax that helps us write clean scripts
* An autocomplete feature
* Tools for creating documents containing a project's code, notes, and visuals

The RStudio team is also behind many new packages that expand R's data science capabilities.

We'll be following this guide to install R and RStudio on our own computer. If we run into issues while working through the setup process, we can search Google and Stack Overflow for the error message we received and try debugging the issue ourselves.


Before we can install RStudio, we'll need to have a recent version of R installed on our computer. Navigate to https://cran.r-project.org/.

We should see the following section when arriving on the home page:

![image.png](attachment:image.png)

The version of R that we download will depend on our operating system. Below, we include installation instructions for Mac OS X, Windows, and Linux (Ubuntu).

### OS X

1. Select the `Download R for (Mac) OSX` option.
2. Select the most up-to-date version of R (new versions are released frequently).
3. Follow the standard instructions for installing applications on OS X.
4. Drag and drop the R application into our `Applications` folder.

### Windows

1. Select the `Download R for Windows` option.
2. Select `base`, since this is our first installation of R on our computer.
3. Follow the standard instructions for installing programs on Windows. If we're asked to choose between the `Customize Startup` and `Accept Default Startup` options, accept the defaults.

### Linux/Ubuntu

1. Select the `Download R for Linux` option.
2. Select the `Ubuntu` option.

After we've installed R, open the console and try writing a few expressions.
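As a starting point, here are a few expressions we might type at the R prompt. This is only an illustrative sketch; the specific values and the name `total` are our own choices, not part of the installation steps.

```r
# Basic arithmetic: R prints the result as soon as we press Enter
2 + 3

# Calling a built-in function
sqrt(16)

# Storing a result under a name, then printing it
total <- 10 * 4
total

# Checking which version of R is installed
R.version.string
```

If each line prints a result without an error, R is installed and working.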

Now that we've installed R, we can install RStudio. Navigate to the [RStudio downloads page](https://rstudio.com/products/rstudio/download/).

The screenshots we use to illustrate steps were taken on a machine running Ubuntu and RStudio version 1.1.453. Users of other operating systems will generally follow the same steps, and we'll point out any specific differences.

When we reach the RStudio downloads page, select the `RStudio Desktop Open Source License FREE` option. This should take us to the following page:

![image.png](attachment:image.png)

If we're running OS X or Windows, select the corresponding option. For Ubuntu, we'll see 32 bit or 64 bit options, which refer to a computer's [instruction set](https://en.wikipedia.org/wiki/Instruction_set_architecture).

If we're unsure whether we should install the 32 or 64 bit option for our computer, [here's how to check](https://askubuntu.com/questions/41332/how-do-i-check-if-i-have-a-32-bit-or-a-64-bit-os).

Download RStudio from http://rstudio.org/download/desktop. Choose `RStudio Desktop (FREE)`.

Then, follow the appropriate installation instructions for our computer's operating system:

### OS X

1. Select the Mac OSX version.
2. Install RStudio by dragging it into the `Applications` folder.

### Windows

1. Select the Windows version.
2. Download and run the installer, then open RStudio from our list of programs.

### Ubuntu

1. Select the Ubuntu 32 or 64 bit version.
2. Download and open RStudio.

Once we've got RStudio installed and running, let's take some time to familiarize ourselves with its layout.

For now, experiment with resizing the windows and clicking through the tabs.

Let's start off by introducing some features of the console. In RStudio, the console is located in the bottom left corner:

![image.png](attachment:image.png)

We may notice the highlighted portion of the RStudio interface contains two tabs: `console` and `terminal`. We'll focus on the console for now, which is where we'll be doing most of our programming work.

When we open RStudio, the console contains information about the version of R we're working with. Scroll down, and try typing a few expressions.

We can use the console to test code immediately. When we type an expression like `45 + 5` and press Enter, the output appears on the next line.

If we type code to create data visualization plots into the console, the plots will appear in the plots tab in the window at the bottom right of the interface.
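To see both behaviors, we could type something like the following into the console. The data here (`x` and `y`) is made up purely for illustration.

```r
# An expression is evaluated as soon as we press Enter
45 + 5
# [1] 50

# Illustrative data (not from the original walkthrough)
x <- 1:10
y <- x^2

# In RStudio, this scatter plot should appear in the Plots tab
plot(x, y, main = "y = x squared")
```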

Try typing the expression below. Note what happens if we forget to close a pair of parentheses:

`3 * (5+6+10`

Instead of a result, the console shows a `+` prompt, which means R is waiting for the rest of the expression. We can either type the closing parenthesis after the `+` prompt, or cancel the expression by pressing the `Escape` key.
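For reference, here is a sketch of what the console exchange looks like (shown as comments), followed by the completed expression. The name `result` is our own choice for illustration.

```r
# What the console shows when the closing parenthesis is missing:
#   > 3 * (5+6+10
#   +
# Typing ")" at the + prompt completes the expression.

# The complete expression, with the closing parenthesis in place
result <- 3 * (5 + 6 + 10)
result
# [1] 63
```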

One nice feature of RStudio is a keyboard shortcut for typing the assignment operator `<-`:

* Windows/Linux: `Alt + -`
* Mac: `Option + -`

Try creating a vector:

`vector_1 <- c(1,3,5,7,8,9)`

Type `vector_1` into the console, and we'll see the output.

Calculate the mean of `vector_1`:

`mean(vector_1)`

Notice that as we begin typing `mean()` into the console, a list of functions beginning with "m" pops up. We can either use our touchpad, mouse, or the arrow keys on our keyboard to select a function, and it will appear in the console.

![image.png](attachment:image.png)
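Put together, the console steps for creating the vector and calculating its mean look like this:

```r
# Assign a numeric vector to a name with the assignment operator
vector_1 <- c(1, 3, 5, 7, 8, 9)

# Typing the name prints the stored values
vector_1
# [1] 1 3 5 7 8 9

# Calculate the mean of the values
mean(vector_1)
# [1] 5.5
```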

When we create a vector in RStudio, it is saved as an object in the **R global environment**.

We can think of the global environment as our workspace. During a programming session in R, any variables we define or data we import and save in a data frame are stored in our global environment. In RStudio, we can see the objects in our global environment in the `Environment` tab at the top right of the interface:

![image.png](attachment:image.png)

We'll see any objects we created, like `vector_1`, under `Values` in the `Environment` tab. Notice that the data type (`num`) and the values stored in the vector are also displayed.
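The same information is available from the console. The sketch below uses base R functions to list the objects in the global environment and inspect one of them; it recreates `vector_1` so that it can be run on its own.

```r
vector_1 <- c(1, 3, 5, 7, 8, 9)

# List the names of the objects in the global environment
ls()

# Check the data type of a single object
class(vector_1)
# [1] "numeric"

# Compact summary, similar to what the Environment tab displays
str(vector_1)
#  num [1:6] 1 3 5 7 8 9
```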

Sometimes, having too many named objects in the global environment creates confusion. Maybe we'd like to remove all or some of the objects. To remove all objects, click the broom icon at the top of the window:

![image.png](attachment:image.png)

If we want to remove selected objects from the workspace, select the `grid` view from the dropdown menu:

![image.png](attachment:image.png)

Next, check the boxes of the objects we'd like to remove and use the broom icon to clear them from our global environment.
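If we prefer typing to clicking, base R's `rm()` function does a similar job from the console. This is a sketch of the equivalent commands rather than what RStudio runs internally; the `scratch_a` and `scratch_b` names are invented for illustration.

```r
# Create a few throwaway objects (names are illustrative)
vector_1 <- c(1, 3, 5, 7, 8, 9)
scratch_a <- "temporary"
scratch_b <- 42

# Remove selected objects by name
rm(scratch_a, scratch_b)
exists("scratch_a")
# [1] FALSE

# Remove every object in the global environment (like the broom icon)
rm(list = ls())
```

Note that `rm(list = ls())` clears everything without asking for confirmation, so it's worth saving any work we want to keep first.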

When we write code to import a data set into RStudio, it's best to make sure that we have the [working directory](https://en.wikipedia.org/wiki/Working_directory) set to the location where our data files are stored. Otherwise, we'll have to specify the file location.

There are a few ways to set the working directory in RStudio.

* Use the `setwd()` function. For example, if we wanted to set our working directory to our desktop, we would type: `setwd("~/Desktop")`.
* Navigate to and choose the working directory from the `Session` menu:

![image.png](attachment:image.png)

To print our working directory in the console, we can type `getwd()`.
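Here is a small sketch of `setwd()` and `getwd()` working together. It uses R's temporary folder (`tempdir()`) as a stand-in for a real data folder so that it runs on any machine; the file name `example.csv` is hypothetical.

```r
# Remember the current working directory so we can restore it later
original_dir <- getwd()

# Switch to a directory that exists on every machine: R's temporary folder
# (a stand-in for something like "~/Desktop")
setwd(tempdir())
getwd()

# Files in the working directory can now be referenced by name alone
writeLines(c("a,b", "1,2"), "example.csv")  # hypothetical file
file.exists("example.csv")
# [1] TRUE

# Restore the original working directory
setwd(original_dir)
```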

We'll often want to import data into R. Data sets we import into R as data frames will be displayed in the `Environment` tab as well.

As an alternative to typing code in the console, we can import data sets into RStudio using the `Import Dataset` feature in the `Environment` tab:

![image.png](attachment:image.png)

We recommend using the `readr` option, which uses functions like `read_csv()`. Using the readr package will import our data as a specialized data frame called a `tibble`, which has many advantages for improving our data science workflow.

If we import data into R using the `Import Dataset` feature, the data will be automatically displayed in a tab in the window in the top left of the interface.

If we import our data into R by typing functions like `read_csv()` in the console, R does not automatically open the data file. To display our data, use the function `View()`:

`View(recent_grads)`
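The sketch below shows the import-then-view workflow from the console. Because we don't have the actual data file here, it first writes a tiny made-up CSV to a temporary location; the column names and values are invented. It also falls back to base R's `read.csv()` if the readr package isn't installed.

```r
# Write a tiny illustrative CSV to a temporary file
# (the data and column names are made up for this sketch)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("major,median", "Engineering,60000", "Biology,40000"), csv_path)

# Prefer readr's read_csv(), which returns a tibble; fall back to base R's
# read.csv() if readr is not installed
if (requireNamespace("readr", quietly = TRUE)) {
  recent_grads <- readr::read_csv(csv_path)
} else {
  recent_grads <- read.csv(csv_path)
}

# View() opens the data viewer, so only call it in an interactive session
if (interactive()) {
  View(recent_grads)
}
```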

There are two additional tabs in this window: `History` and `Connections`. We'll hold off on discussing the `Connections` tab, which is used for working with databases in RStudio. The `History` tab contains a saved record of the commands we've typed into the console. This can be useful if we've forgotten something we did a few weeks ago and need to find that code again.

Searching through our code history can be frustrating, though. It's helpful to organize and save our code using **scripts**.

As our projects become more complex and we write longer blocks of code, it will be helpful to organize our code into a **script**. This allows us to keep track of our work on a project, write clean code with plenty of notes, repeatedly run code, and share it with others.

In RStudio, we can write scripts in the text editor window at the top left of the interface:

![image.png](attachment:image.png)

To create a new script, we can use the commands in the `File` menu:

![image.png](attachment:image.png)

We can also use the keyboard shortcut `Ctrl+Shift+N` (`Cmd+Shift+N` on OS X).

When we save a script, it has the file extension `.R`.

Create a new script, and try typing an expression. To run the current line of our script in the console below, we can either click `Run` at the top right of the script, or use the following keyboard shortcuts:

* OS X: `Cmd + Enter`
* Windows and Linux: `Ctrl + Enter`

We can also highlight multi-line chunks of our code that we want to run.

Sometimes, we'll write a script we want to run all at once. To highlight all the code in our script before running it, we can use the following keyboard shortcut:

* OS X: `Cmd + a`
* Windows and Linux: `Ctrl + a`
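From the console, base R's `source()` function gives a comparable result by running every line of a saved script in order. The sketch below writes a small script to a temporary file and then runs it; the script's contents and the names `values` and `values_total` are invented for illustration.

```r
# Write a small script to a temporary .R file (contents are illustrative)
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "values <- c(2, 4, 6)",
  "values_total <- sum(values)"
), script_path)

# source() runs every line of the script, top to bottom
source(script_path)

# Objects created by the script are now available in our session
values_total
# [1] 12
```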

When writing a script, it's good practice to begin at the top by writing code to load the packages we'll need to run the script:

* `library(readr)`
* `library(dplyr)`

If we need to check which packages we have loaded, we can refer to the `Packages` tab in the window at the bottom right of the interface. We can search for packages there, and checking the box next to a package will load it (the corresponding code will appear in the console).

![image.png](attachment:image.png)
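We can also check from the console which packages are loaded. The sketch below uses the `stats` package, which ships with R, as a stand-in for packages like readr or dplyr so that it runs on a fresh installation; the name `loaded` is our own.

```r
# Attach a package that ships with R (standing in for readr or dplyr)
library(stats)

# List the packages currently attached to the session
loaded <- (.packages())
loaded

# Check whether a specific package is loaded
"stats" %in% loaded
# [1] TRUE
```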

As we write scripts, it's good practice to add comments that explain our code (`# like this`). Throughout our careers as data scientists, we'll often share our code with colleagues and collaborators, so making sure they can understand our methods is very important.
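As an illustration, a short commented script might look like the sketch below. The scores and variable names are made up, and `stats` (which ships with R) stands in for whichever packages a real script would load.

```r
# Purpose: summarize a small set of exam scores (hypothetical example)

# Load packages at the top of the script
library(stats)  # ships with R; median() and sd() live here

# Exam scores for five students (made-up values)
scores <- c(72, 85, 90, 66, 78)

# Use the median because it is less sensitive to outliers than the mean
typical_score <- median(scores)

# Spread of the scores, in the same units as the scores themselves
score_sd <- sd(scores)
```

Notice that the comments explain why each step is there, not just what the code does.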

**Accessing Saved Scripts**

We'll know a script has been saved because its name in the tab will be displayed in black:

![image.png](attachment:image.png)


When a script has unsaved changes, its name will be displayed in red and followed by an asterisk:

![image.png](attachment:image.png)

After we've written some practice code in our script, we can save it and then close the script.

To open a saved script, we can use either of the following methods:

* Go to `File > Open File` and navigate to the script.
* Navigate to the script in the `Files` tab in the window at the bottom right of the interface:

![image.png](attachment:image.png)