# A.1 Working with data in R

### A.1.1 What are the other advantages of using R?
- We can be lazy and use thousands of free libraries (called packages in R) to easily:
    - Manipulate data (today's topic)
    - Download data directly from the internet
    - Visualize our data (graphing, etc.)
    - Build models (regression, machine learning, neural networks)
    
    
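- You have all used libraries before, perhaps without knowing it!
    - In R, using a package takes two steps:
        1. `install.packages("PACKAGE_NAME")` downloads and installs the package (needed only once)
        2. `library(PACKAGE_NAME)` loads the package into the current session (needed every time you start R)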
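A minimal sketch of the two steps. The `install.packages()` call is commented out because it needs an internet connection, and `"dplyr"` is only an example package name. The `library()` call loads `tools`, a package that ships with every R installation, so it runs as-is.

```r
# Step 1 (once per computer): download and install a package from CRAN.
# Commented out because it needs internet access; "dplyr" is just an example.
# install.packages("dplyr")

# Step 2 (every new R session): load an installed package.
# "tools" comes bundled with R, so no installation is needed here.
library(tools)

# search() lists the packages currently loaded; "package:tools" is now among them
print(search())
```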

### A.1.2 What is a Data Frame? 
- Think of it as an Excel spreadsheet with data
- In many cases:
    - Rows are observations (e.g. people, households, countries, time)
    - Columns are variables (e.g. GDP, life expectancy)
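A small sketch of this row/column layout, built by hand with `data.frame()`. The country labels and numbers are made up purely for illustration.

```r
# Each column is a variable; each row is one observation (here, a country)
df_example = data.frame(
  country = c("A", "B", "C"),         # hypothetical country labels
  gdp = c(1.5, 2.0, 0.8),             # made-up values
  life_expectancy = c(72, 80, 65)     # made-up values
)

print(df_example)
nrow(df_example)  # number of rows (observations): 3
ncol(df_example)  # number of columns (variables): 3
```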
        


## A.2 What applications can I use to run R?
R is not a piece of software; it is a programming language! So there are multiple applications you can use to run R.

### A.2.1 R-studio cloud: RECOMMENDED
- Many students get frustrated because bugs sometimes prevent the software from running smoothly on their own computers
- R-studio cloud takes the hassle out of setting up R-studio and lets us focus on learning R!
- Register for R-studio cloud for free [here](https://rstudio.cloud/)

### A.2.2 R-studio local on your computer
- R and R-studio can be downloaded at the link below
    - [link](https://rstudio.com/products/rstudio/download/)

### A.2.3 R in Jupyter notebook (what this tutorial is written in)

- Like R-studio, Jupyter can be run either on your computer or in the cloud:
- Jupyter on your computer: 
    - [Anaconda software](https://www.anaconda.com/)
    
- Jupyter on the cloud 
    - [R-studio cloud](https://rstudio.cloud/)
    - [Azure cloud](https://notebooks.azure.com/)
        

# B. How to use R-studio cloud

## B.1 Create a new project

![title](Images/Cloud_step_1.png)

# R-studio: the very basics
- Note: this tutorial is not written in R-studio
    - The R-studio code can be found in the same folder, in the file Basic.r


#### Starting R: create a new project, and the screen below should appear

## R.1 Launching R-studio
- Always click the button to write your code in an R script file
- This will allow you to save your work
- For those of you familiar with Stata, it is similar to a do-file



![title](Images/R_studio_1.png)

## R.2 Running code
- The code you run from the script in the top pane displays its output in the console below
- Run code by highlighting it and clicking the Run button
    - The shortcut on Mac is <kbd>Cmd</kbd>+<kbd>Return</kbd> (on Windows, <kbd>Ctrl</kbd>+<kbd>Enter</kbd>)


![title](Images/R_studio_2.png)

print('ALWAYS WRITE YOUR CODE HERE')
print('RUN THE CODE BY HIGHLIGHTING THIS LINE OF CODE AND CLICKING THE RUN BUTTON TO THE UPPER RIGHT')
print('THE SHORTCUT ON MAC IS COMMAND+RETURN')

[1] "ALWAYS WRITE YOUR CODE HERE"
[1] "RUN THE CODE BY HIGHLIGHTING THIS LINE OF CODE AND CLICKING THE RUN BUTTON TO THE UPPER RIGHT"
[1] "THE SHORTCUT ON MAC IS COMMAND+RETURN"


## R.3 Writing comments


# You can write comments by using the hash symbol (#)
# R ignores everything after the #, so commented code is never run

print('Commented lines are echoed in the console but are not run as code')

[1] "Commented lines are echoed in the console but are not run as code"
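A short example of the two usual ways to use `#`: after code on the same line, and to "comment out" a line you want R to skip. The variable name `x` is just for illustration.

```r
x = 2 + 3   # a comment can follow code on the same line
# x = 100   # this whole line is commented out, so R never runs it
print(x)    # prints [1] 5, not 100
```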


# 1. How to save data


## 1.1 Saving text (a.k.a. string objects)

- The name of a new object in R cannot contain spaces
- When you save a new object, it will NOT display in the console!
    - Instead, it will appear in the upper-right "Environment" pane

![title](Images/R_studio_3.png)

new_string = "This is a string"
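Besides `=`, R code very often uses `<-` to create objects; for a simple assignment like this the two behave the same. The sketch below (object names are illustrative) also checks an object's type with `class()` and lists the objects you have created with `ls()`, which are the same ones the Environment pane shows.

```r
new_string <- "This is a string"        # same result as new_string = "This is a string"
my_other_string = "No spaces in names"  # use underscores instead of spaces

class(new_string)  # "character" is R's name for text (string) data
ls()               # names of the objects in the current environment
```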

### 1.1.1 Now print our object "new_string"

# Use either print() or simply type the name of the object
print(new_string)

[1] "This is a string"


new_string

[1] "This is a string"

# 2. Import (save) spreadsheet data from a CSV file

- The data should be in the same folder as your R-studio project
    - Check the Files pane in the lower-right corner to confirm the CSV file is in that folder
- You can also import from other folders, which we discuss below

## 2.A.1 Import data into R-studio cloud
![title](Images/R_studio_4_1.png)
![title](Images/R_studio_4_2.png)
![title](Images/R_studio_4_3.png)

## 2.1 Read csv file: On your computer
- We will create a new object, just as we did for our string!
- Remember that the object can have any name you like, as long as it has no spaces!
- The name df is short for data frame and is commonly used in R
- The import worked if you can see a new object in the upper-right Environment pane

Common mistake to avoid:
- The name of the file must be in quotes and must include the .csv extension

`Function:` read.csv("NAMEOFFILE.csv")

df = read.csv('vote.csv')
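If you don't have `vote.csv` at hand, the sketch below writes a tiny made-up CSV to a temporary file with `writeLines()` and reads it back with `read.csv()`, so you can see the round trip without any download. The values and the `df_small` name are hypothetical.

```r
# Create a temporary file path ending in .csv
tmp_file = tempfile(fileext = ".csv")

# Write three lines: a header row plus two made-up observations
writeLines(c("state,vote,age", "AR,1,73", "AR,0,24"), tmp_file)

# Read it back into a data frame
df_small = read.csv(tmp_file)
str(df_small)  # lists each column with its type and first values
```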

### 2.1.1 Common mistake
- If you do not assign the result to an object name, R will read the file and print it, but will not save it!

# Notice there is no = sign, unlike in section 2.1
read.csv('vote.csv')
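A quick way to confirm this mistake is `exists()`, which reports whether an object with a given name has been created. The sketch writes a temporary made-up CSV; `df_not_saved` is a hypothetical name.

```r
tmp_file = tempfile(fileext = ".csv")
writeLines(c("state,vote", "AR,1"), tmp_file)

read.csv(tmp_file)                 # prints the data, but does not store it
exists("df_not_saved")             # FALSE in a fresh session: nothing was saved

df_not_saved = read.csv(tmp_file)  # assigning with = stores the data frame
exists("df_not_saved")             # TRUE: the object now appears in the Environment pane
```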

## 2.2 Read csv file in sub-folder
- read.csv("FOLDER_NAME/FILE_NAME.csv")
    - In the example below we import "arrests.csv" from the "Sub_folder" folder

df_arrests = read.csv("Sub_folder/arrests.csv")
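To try a sub-folder path without the course files, the sketch below creates a temporary project folder containing a `Sub_folder`, writes a made-up file `example.csv` into it, and reads it with a relative path. `file.path()` is a base R helper that joins folder and file names with `/`; the file name and its contents are hypothetical.

```r
# Hypothetical project folder (a temporary directory) with a sub-folder inside
project_dir = tempdir()
sub_dir = file.path(project_dir, "Sub_folder")
dir.create(sub_dir, showWarnings = FALSE)

# Write a small made-up CSV into the sub-folder
writeLines(c("city,arrests", "X,10", "Y,5"), file.path(sub_dir, "example.csv"))

# Work from the project folder, as R-studio does inside a project
old_wd = setwd(project_dir)  # setwd() returns the previous working directory
df_sub = read.csv(file.path("Sub_folder", "example.csv"))  # same as "Sub_folder/example.csv"
setwd(old_wd)                # restore the original working directory

print(df_sub)
```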

## 2.3 Read csv file: From GitHub

url = 'https://raw.githubusercontent.com/corybaird/PLCY_610_public/master/Discussion_sections/Disc1_Intro/vote.csv'
df = read.csv(url)

# 3. Basics of the dataframe

## 3.1 Display: first lines of the dataframe

`Function:` head(DF_NAME, NUMBER_OF_ROWS)

# Shows the first 3 lines
head(df, 3)

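|   | state | vote | income | education | age | sex |
|---|-------|------|--------|-----------|-----|-----|
|   | `<chr>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` |
| 1 | AR | 1 | 9 | 2 | 73 | 0 |
| 2 | AR | 1 | 11 | 2 | 24 | 0 |
| 3 | AR | 0 | 12 | 2 | 24 | 1 |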
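A sketch with R's built-in `mtcars` data set (so no CSV is needed): `head()` shows 6 rows when you omit the number, and its counterpart `tail()` shows the last rows.

```r
head(mtcars)        # first 6 rows (the default)
head(mtcars, 3)     # first 3 rows
tail(mtcars, 2)     # last 2 rows
nrow(head(mtcars))  # 6
```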


## 3.2 Display: column names

`Function:` names(DF_NAME)

names(df)
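A few related base R functions give a quick overview of any data frame. This sketch uses the built-in `mtcars` data set so it runs without `vote.csv`.

```r
names(mtcars)  # column names
dim(mtcars)    # number of rows and columns: 32 11
str(mtcars)    # each column's name, type, and first few values
```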

## 3.3 Display: summary stats
`Function:` summary(DF_NAME)

summary(df)

    state                vote            income        education   
 Length:1502        Min.   :0.0000   Min.   : 4.00   Min.   :1.00  
 Class :character   1st Qu.:1.0000   1st Qu.: 9.00   1st Qu.:2.00  
 Mode  :character   Median :1.0000   Median :13.00   Median :3.00  
                    Mean   :0.8555   Mean   :12.06   Mean   :2.65  
                    3rd Qu.:1.0000   3rd Qu.:16.00   3rd Qu.:4.00  
                    Max.   :1.0000   Max.   :17.00   Max.   :4.00  
      age              sex        
 Min.   :  5.00   Min.   :0.0000  
 1st Qu.: 36.00   1st Qu.:0.0000  
 Median : 49.00   Median :1.0000  
 Mean   : 49.28   Mean   :0.5593  
 3rd Qu.: 62.00   3rd Qu.:1.0000  
 Max.   :120.00   Max.   :1.0000  

## 3.4 Selecting a column
- Select a data frame column with a `$`
    - `DATAFRAME$COLUMNNAME`
    
- Use the mean function to find the average age, and the sd function to find its standard deviation

`Function:` `mean(DF_NAME$COL_NAME)`

`Function:` `sd(DF_NAME$COL_NAME)`

#Shows mean of age column
mean(df$age)

#Shows standard deviation of age column
sd(df$age)
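One detail worth knowing before using `mean()` and `sd()` on real data: if a column contains a missing value (`NA`), both return `NA` unless you pass `na.rm = TRUE`. The sketch uses a hypothetical data frame `df_ages` with made-up ages; it also shows `[["age"]]`, another way to select the same column by name.

```r
# Made-up ages with one missing value (NA)
df_ages = data.frame(age = c(20, 30, NA, 50))

mean(df_ages$age)                # NA: one missing value makes the result NA
mean(df_ages$age, na.rm = TRUE)  # 33.33333: the NA is dropped first
sd(df_ages$age, na.rm = TRUE)    # standard deviation of 20, 30, 50

df_ages[["age"]]                 # same column as df_ages$age
```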