# Session 1: RStudio and downloading (September 10th)


## Session 1 is composed of 3 parts
1. Module 1 covers RStudio basics
    - Current page
2. Module 2 covers downloading
    - [Link](https://github.com/corybaird/SPP_Data_Seminar/blob/main/R/Session_1/Module_2_Download.ipynb)
3. Exercise to practice material covered in Modules 1 & 2
    - [Link](https://github.com/corybaird/SPP_Data_Seminar/blob/main/R/Session_1/Session_1_Excercise.ipynb)



# Module 1: RStudio basics

            




# A.1 Working with data in R

### A.1.1 What are the other advantages of using R?
- We can be lazy and use the thousands of free libraries to:
    - Manipulate data easily (today's topic)
    - Download data directly from the internet
    - Visualize our data (graphing etc.)
    - Build models (regression, machine learning, neural networks)
    
    
- You have all used libraries before, perhaps without knowing it!
    - Using a library in R takes two steps: 
        1. `install.packages("PACKAGE_NAME")` downloads the package (only needed once)
        2. `library(PACKAGE_NAME)` loads the package (needed in every new R session)
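
The two steps above can be sketched as follows. This is a minimal sketch, assuming the `stats` package as a stand-in: it ships with R, so nothing actually needs to be downloaded. The variable name `pkg` is hypothetical; swap in the name of whichever package you need.

```r
# Name of the package to use. "stats" ships with R and is only a stand-in;
# replace it with the package you actually need.
pkg = "stats"

# Step 1: download the package, but only if it is not already installed
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Step 2: load the package for the current session
# character.only = TRUE is needed because pkg is a string variable
library(pkg, character.only = TRUE)
```

Checking with `requireNamespace` first avoids downloading the same package again every time the script runs.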

### A.1.2 What is a Data Frame? 
- Think of it as an Excel sheet with data
- In many cases:
    - Rows are observations (e.g. people, households, countries, time)
    - Columns are variables (e.g. GDP, life expectancy)
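
To make the rows-and-columns idea concrete, here is a small sketch that builds a data frame by hand. The object name `countries` and all values are made up for illustration only.

```r
# Made-up values, for illustration only
countries = data.frame(
  country = c("Country A", "Country B", "Country C"),
  gdp = c(100, 250, 400),
  life_expectancy = c(68, 74, 81)
)

countries        # print the whole table
nrow(countries)  # number of rows (observations): 3
ncol(countries)  # number of columns (variables): 3
```

Each row is one country (an observation) and each column is one variable measured for that country.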
        


### A.1.3 What applications can I use to run R?
R is not an application; it is a programming language! So there are multiple applications that can run R.



#### A.1.3.1 R with RStudio on your computer 
- [RStudio download page](https://rstudio.com/products/rstudio/download/)

#### A.1.3.2 R with Jupyter notebook (what this tutorial is written in)

- Jupyter on your computer
    - [Anaconda software](https://www.anaconda.com/)
- Jupyter on the cloud
    - [Google Colab](https://colab.to/r)
    - [RStudio Cloud](https://rstudio.cloud/)
    - [Azure cloud](https://notebooks.azure.com/)
        

# B. How to use RStudio Cloud

## B.1 After you sign up for an account and are added to the class, this is what your RStudio Cloud should look like

![title](Images/Cloud_step_1.png)

## B.2 Click the start button to launch your own version of the assignment
- Note: You can make as many changes to these files as you want! 
    - Your changes won't affect the master file that other students use

![title](Images/Cloud_step_2.png)

## B.3 You can return to the assignment at a later time with your code saved

![title](Images/Cloud_step_3.png)

## B.4 If you have a problem with your code, I can go into your files directly and check what is wrong
- Note: Only the TAs can see your work. Other students cannot.

![title](Images/Cloud_step_4.png)

# RStudio: The very basics
- Note: this tutorial is not written in RStudio
    - The RStudio code can be found in the same folder, in the file labeled Basic.r


#### Starting R: Create a new project and the screen below should show up

## R.1 Launching RStudio
- Always click the button to write code in an R file.
- This will allow you to save your work
- For those of you familiar with Stata, it is similar to a do-file



![title](Images/R_studio_1.png)

## R.2 Running code
- Code you run from the top panel is displayed in the console below
- You run the code by highlighting it and clicking the run button
    - The shortcut on Mac is <kbd>Cmd</kbd>+<kbd>Return</kbd>


![title](Images/R_studio_2.png)

## R.3 Writing comments


In [1]:
# You can write comments by starting the line with a hashtag (#)
# Commented code will not be run

print('The commented code will display in the console but will not be read as code')

[1] "The commented code will display in the console but will not be read as code"


# 1. How to save data


## 1.1 Saving text (a.k.a. string objects)

- The name of a new object in R cannot contain spaces
- When you save a new object it will NOT display in the console!
    - Instead it will display in the "Environment" section in the upper right

![title](Images/R_studio_3.png)

In [2]:
new_string = "This is a string"

### 1.1.1 Now print our object "new_string"

In [3]:
#Use either print or simply type in the name of the object
print(new_string)

[1] "This is a string"


In [4]:
new_string
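
Besides the Environment panel, the base R function `ls()` lists the names of the objects saved so far. A small sketch; `my_number` and `saved_objects` are hypothetical names added for illustration.

```r
new_string = "This is a string"
my_number = 42

# Assignment prints nothing; ls() returns the names of the saved objects
saved_objects = ls()
print(saved_objects)

# is.character() confirms that new_string holds text
is.character(new_string)
```

If an object you expected is missing from the list, the line that creates it has probably not been run yet.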

# 2. Import (save) Excel data from a csv file

- The data should be in the same folder as your RStudio project
    - Check the lower right corner to see if a csv file is in the same folder
- You can also import from other folders, which we will discuss below

![title](Images/R_studio_4.png)

## 2.1 Read csv file
- We will create a new object like we did for our string!
- Remember that the object can be named anything you like as long as it does not have spaces!
- The name df is short for data frame and is a common choice in R
- The import is successful if you can see a new object in the upper right Environment panel

Common mistake to avoid:
- The name of the file must be in quotes and must include the .csv extension

`Function:` read.csv("NAMEOFFILE.csv")

In [2]:
df = read.csv('vote.csv')
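
If you want to try `read.csv` without the class files, the sketch below writes a tiny csv file to R's temporary folder and reads it back. The file name `example.csv`, the object names, and the three-column layout are assumptions made for illustration only.

```r
# Write a tiny csv file to R's temporary folder (hypothetical file name)
csv_path = file.path(tempdir(), "example.csv")
writeLines(c("state,vote,age", "AR,1,73", "AR,0,24"), csv_path)

# Check the file exists before reading; a misspelled name or a missing
# .csv extension is a common reason why R cannot open a file
file.exists(csv_path)

# Read the file and save it as an object
example_df = read.csv(csv_path)
example_df
```

Running `file.exists()` on the path first is a quick way to tell a typo in the file name apart from a problem with the data itself.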

### 2.1.1 Common mistake
- If you do not assign the result to an object, R will read and display the file but not save it!

In [3]:
#Notice there is no = sign as we have in 2.1
read.csv('vote.csv')

state,vote,income,education,age,sex
AR,1,9,2,73,0
AR,1,11,2,24,0
AR,0,12,2,24,1
AR,1,16,4,40,0
AR,1,10,4,85,1
AR,1,12,3,78,1
AR,0,14,4,31,0
AR,1,10,1,75,0
AR,1,17,2,54,0
AR,1,8,1,78,0


## 2.2 Read csv file in sub-folder
- read.csv("FOLDER_NAME/FILE_NAME.csv")
    - In the example below we import "arrests.csv" from the "Sub_folder" folder

In [4]:
df_arrests = read.csv("Sub_folder/arrests.csv")
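
The sketch below shows the same idea end to end, using base R helpers to build the path and to look inside the folder. The folder is created inside R's temporary folder so the example runs anywhere; `demo.csv` and the object names are hypothetical.

```r
# Create a hypothetical sub-folder inside R's temporary folder
sub_dir = file.path(tempdir(), "Sub_folder")
dir.create(sub_dir, showWarnings = FALSE)

# Save a small made-up data frame there
write.csv(data.frame(x = 1:3), file.path(sub_dir, "demo.csv"), row.names = FALSE)

# list.files() shows what is inside a folder
list.files(sub_dir)

# file.path() joins the folder name and file name with "/"
demo_path = file.path(sub_dir, "demo.csv")
demo = read.csv(demo_path)
demo
```

In your own project, `getwd()` shows the folder that relative paths such as "Sub_folder/arrests.csv" start from.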

# 3. Basics of the dataframe

## 3.1 Display: first lines of the dataframe

`Function:` head(DF_NAME, # of lines you want to display)

In [5]:
# Shows the first 3 lines
head(df, 3)

state,vote,income,education,age,sex
AR,1,9,2,73,0
AR,1,11,2,24,0
AR,0,12,2,24,1
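
A short sketch of how the second argument of `head` behaves, together with two related base R functions, `tail` and `dim`. The object `vote_sample` is a hypothetical stand-in that holds only the first five rows displayed in section 2.1.1.

```r
# Build a small data frame from text so the example runs without vote.csv
vote_sample = read.csv(text = "state,vote,income,education,age,sex
AR,1,9,2,73,0
AR,1,11,2,24,0
AR,0,12,2,24,1
AR,1,16,4,40,0
AR,1,10,4,85,1")

head(vote_sample, 2)  # first 2 rows
head(vote_sample)     # no number given: up to 6 rows (here all 5)
tail(vote_sample, 2)  # last 2 rows
dim(vote_sample)      # number of rows, then number of columns: 5 6
```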


## 3.2 Display: column names

`Function:` names(DF_NAME)

In [6]:
names(df)

## 3.3 Display: summary stats
`Function:` summary(DF_NAME)

In [7]:
summary(df)

 state          vote            income        education         age        
 AR: 501   Min.   :0.0000   Min.   : 4.00   Min.   :1.00   Min.   :  5.00  
 SC:1001   1st Qu.:1.0000   1st Qu.: 9.00   1st Qu.:2.00   1st Qu.: 36.00  
           Median :1.0000   Median :13.00   Median :3.00   Median : 49.00  
           Mean   :0.8555   Mean   :12.06   Mean   :2.65   Mean   : 49.28  
           3rd Qu.:1.0000   3rd Qu.:16.00   3rd Qu.:4.00   3rd Qu.: 62.00  
           Max.   :1.0000   Max.   :17.00   Max.   :4.00   Max.   :120.00  
      sex        
 Min.   :0.0000  
 1st Qu.:0.0000  
 Median :1.0000  
 Mean   :0.5593  
 3rd Qu.:1.0000  
 Max.   :1.0000  
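
The state counts in the output above (AR: 501, SC: 1001) appear when the state column is stored as a factor. Since R 4.0.0, `read.csv` keeps text columns as character by default, so your own output may instead show Length, Class and Mode for that column. A minimal sketch with made-up data (the object names are hypothetical) shows two ways to get the counts:

```r
# Made-up data with a text column
states = data.frame(state = c("AR", "AR", "SC"), stringsAsFactors = FALSE)

summary(states)   # character column: shows Length, Class, Mode

# Option 1: table() counts the values of a text column directly
state_counts = table(states$state)
state_counts

# Option 2: convert the column to a factor, then summary() shows counts
states$state = factor(states$state)
summary(states)
```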

# 4. Individual column manipulation

- Select the dataframe column with a `$`
    - `DATAFRAME$COLUMNNAME`
    
- Use the mean function to find average age

`Function:` `mean(DF_NAME$Col_Name)`

`Function:` `sd(DF_NAME$Col_Name)`

In [8]:
#Shows mean of age column
mean(df$age)

In [9]:
#Shows standard deviation of age column
sd(df$age)
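
To see what `mean` and `sd` compute, the sketch below uses the ten ages displayed in section 2.1.1 (not the full data set) and checks `sd` against the sample standard deviation formula. It also shows the `na.rm` argument, which is needed when a column has missing values. The object names are hypothetical.

```r
# Ages from the ten rows displayed in section 2.1.1
ages = c(73, 24, 24, 40, 85, 78, 31, 75, 54, 78)

mean(ages)  # 56.2

# sd() uses the sample formula: divide by n - 1, then take the square root
manual_sd = sqrt(sum((ages - mean(ages))^2) / (length(ages) - 1))
all.equal(sd(ages), manual_sd)  # TRUE

# A single missing value (NA) makes the result NA unless na.rm = TRUE
ages_with_gap = c(ages, NA)
mean(ages_with_gap)                # NA
mean(ages_with_gap, na.rm = TRUE)  # 56.2
```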

# Review of functions used in Disc1_intro

- `Function:` read.csv("NAMEOFFILE.csv")
    - Reads csv
- `Function:` ls()
    - Shows saved objects
- `Function:` head(DF_NAME, # of lines you want to display)
    - Show first lines of data frame
- `Function:` names(DF_NAME)
    - Shows column names
- `Function:` summary(DF_NAME)
    - Shows summary stats
- `Function:` `mean(DF_NAME$Col_Name)`
    - Shows mean of certain column
- `Function:` `sd(DF_NAME$Col_Name)`
    - Shows standard deviation of certain column
