## Taking a look around in R and RStudio

Notice the various windows. 

1. The Console allows you to enter commands and view results.
2. The Environment window shows the objects R is keeping track of (vectors, datasets, lists, etc.).
3. The bottom-right window shows files, figures/graphs, help files, and more.
4. If you create or open a script, it will appear in the top-left by default. I encourage using an RStudio project and an R script for nearly all work.

## Start an RStudio project

Steps:

- File &rarr; New Project
- Choose a name for the project and a folder location.^[I have a folder called "labs" located in my main PUBG 511 folder. If you are primarily going to be using RStudio through the computer lab or the Virtual App, then place this folder in your H:\ network space.]
- Now start a new script using the sheet-with-a-plus-sign icon on the toolbar or using the File menu. The name of your project should appear in the top-right corner of the RStudio window.
- RStudio projects make it so you don't need to worry about setting a working directory; it is defined by the project. Just make sure all files written to or read by R are in the project folder or in one of its subfolders.
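
If you want to confirm that the project has set the working directory for you, a quick check along these lines can help. This is only a sketch: `getwd()` and `list.files()` are base R functions, the object names `project_dir` and `project_files` are made up for illustration, and what gets printed depends on where you created your project.

```r
# Where is R currently looking for files?
# Inside an RStudio project this should be the project folder.
project_dir <- getwd()
project_dir

# List the files and subfolders R can see in that folder
project_files <- list.files(project_dir)
head(project_files)
```

If the folder printed by `getwd()` is not your project folder, the project is probably not open.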

## Installing and loading packages

R is open-source, and, frankly, kinda stinks on its own.^[This is called base R.] But there are many, many user-generated packages that improve R's functionality. We'll be using these packages all the time, especially a group of packages called [the tidyverse](https://www.tidyverse.org/packages/). 

You only need to install a package once and then you're good to go (until it needs updating). But you also need to load the package in every R session in which you want to use its functions.

```r
# Install the required packages if they are not already installed.
# By the way, the hashtag/pound symbol (#) comments out a line in your script.

# install.packages(c('tidyverse', 'haven'))

# Load the packages in this R session

library(tidyverse)
library(haven)
```
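
Because installing happens once but loading happens every session, it can be handy to check which packages are still missing before calling `library()`. The following is a minimal sketch, not a required step: `requireNamespace()` is a base R function, while the names `needed` and `missing_pkgs` are just illustrative.

```r
# Packages this lab relies on
needed <- c("tidyverse", "haven")

# requireNamespace() returns TRUE if a package is installed and can be loaded
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment the next line to install whatever is missing
  # install.packages(missing_pkgs)
} else {
  message("All packages are installed.")
}
```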

## Some basic built-in functions

R can handle a great diversity of *objects* including lists, variables, names, vectors, data frames, scalars, and plots. Let's create a vector of data using a random draw from a normal distribution and then use two functions to describe the variable.

```r
x <- rnorm(2500, mean = 50, sd = 10)
mean(x)
summary(x)
```
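
Since `rnorm()` draws random numbers, your mean and summary will differ slightly from anyone else's, and from your own the next time you run the code. If you want repeatable results, one option is to set a seed first. In the sketch below the seed value 511 is arbitrary, and `x_demo` and `x_again` are illustrative names.

```r
# Setting the same seed before each draw reproduces the same "random" numbers
set.seed(511)
x_demo <- rnorm(2500, mean = 50, sd = 10)

set.seed(511)
x_again <- rnorm(2500, mean = 50, sd = 10)

identical(x_demo, x_again)  # TRUE: both draws are the same

# The sample statistics should land close to the values we asked for
mean(x_demo)
sd(x_demo)
```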

We can check the distribution of our variable using a histogram.

```r
hist(x)
```

We can change the number of bins used by the histogram and switch the y-axis from frequencies to densities (much better), so that the areas of the bars sum to 1. Function arguments typically take either a value (like the 30 passed to breaks below, which R treats as a suggested number of bins) or a logical TRUE or FALSE that toggles a setting on or off.

```r
hist(x, breaks = 30, probability = TRUE)
```

Let's run hist() once more, this time suppressing the plot and asking R to give us the underlying values in the console.

```r
# probability only affects how the plot is drawn, so it is omitted here
hist(x, breaks = 30, plot = FALSE)
```
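
The console output is easier to work with if you store it in an object. The sketch below is self-contained, so it draws its own data rather than reusing `x`. The names `draws`, `h`, and `area` are illustrative, and the seed is arbitrary. It checks two properties of the result: the counts add up to the number of observations, and the densities multiplied by the bin widths add up to 1.

```r
set.seed(511)  # arbitrary seed so the example is repeatable
draws <- rnorm(2500, mean = 50, sd = 10)

# Store the histogram's components instead of drawing it
h <- hist(draws, breaks = 30, plot = FALSE)
names(h)

# Every observation falls in exactly one bin
sum(h$counts)

# Density times bin width gives each bar's area; the areas sum to 1
area <- sum(h$density * diff(h$breaks))
area
```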

## Opening datasets

R can store a dataset in several ways, most commonly as a data frame. Before you run the following code, download the ABH_full_district file from [The Journal of Politics' Dataverse](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/VR12G4) using your web browser. Download it in two formats: RData and Stata 13 Binary. Be sure to place the files in the folder you set as your project location. That way you will not need to specify working directories or complete file paths; R will already know where to look for the data.

We can load an RData workspace using the *load()* function. To open the Stata .dta file, we'll need to use functionality from the **haven** package. The *ls()* function lists the objects (datasets, functions, and so on) currently defined in your environment.

```r
load("ABH_full_district.Rdata")
ls()
```
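
One thing that can be confusing about *load()* is that it restores objects under whatever names they were saved with; you do not choose the name. The following sketch illustrates this with a small made-up data frame saved to a temporary file. The names `toy`, `tmp_rdata`, and `loaded_names` are hypothetical stand-ins, not part of the ABH data.

```r
# Save a toy data frame to a temporary .RData file
toy <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.8))
tmp_rdata <- tempfile(fileext = ".RData")
save(toy, file = tmp_rdata)

# Remove it from the environment, then bring it back with load()
rm(toy)
loaded_names <- load(tmp_rdata)

# load() returns the names of the objects it restored
loaded_names
exists("toy")
```

Capturing the return value of *load()* this way saves you from hunting through *ls()* to find out what was added.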

So we have a new dataset called "table". Let's remove it from our R environment:

```r
remove(table)
```

Now let's use **haven**'s *read_dta* function to import the dataset from the Stata .dta file. Let's also assign the imported data to the name *ABH.data*.

```r
ABH.data <- read_dta("ABH_full_district.dta")
ls()
```
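
Once a .dta file is imported, a few quick checks tell you what you are working with. The sketch below does not use the ABH file. Instead it writes a tiny made-up dataset to a temporary .dta file with **haven**'s *write_dta* and reads it back, so it runs on its own. The column names `district` and `vote_share` are invented for illustration and are not taken from the ABH data.

```r
library(haven)

# A toy dataset standing in for a real Stata file
toy <- data.frame(district = c(1, 2, 3), vote_share = c(0.52, 0.47, 0.61))
tmp_dta <- tempfile(fileext = ".dta")
write_dta(toy, tmp_dta)

# Import it the same way as the ABH file
toy_back <- read_dta(tmp_dta)

class(toy_back)   # what kind of object did we get?
dim(toy_back)     # number of rows and columns
names(toy_back)   # variable names
head(toy_back)    # first few rows
```

The same four checks can be run on *ABH.data* after importing it.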