# R Tutorial with Anaconda-Navigator and Jupyter Notebook

## Step 1: Open Anaconda-Navigator
## Step 2: Click 'Environments' then 'Create'
## Step 3: Enter an environment name, check 'Python' and 'R', then click 'Create'
<img src='instructions.png'/>

## Step 4: Open the new environment (the one with R installed) using the 'Open with Jupyter Notebook' option.

#### Source: https://docs.anaconda.com/anaconda/navigator/tutorials/r-lang/

# Quick Introduction to R:

### Arithmetic operators in R:

* Addition: + 
* Subtraction: - 
* Multiplication: * 
* Division: / 
* Exponentiation: ^ 
* Modulo: %%
* To comment code, use: # (insert your comment here)
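As a quick illustration of the operators listed above, here is a minimal sketch that applies each one to the same pair of numbers. The variable names `a` and `b` are made up for this example.

```r
# Two example values (hypothetical, chosen only for illustration).
a <- 17
b <- 5

a + b    # addition: 22
a - b    # subtraction: 12
a * b    # multiplication: 85
a / b    # division: 3.4
a ^ 2    # exponentiation: 289
a %% b   # modulo, the remainder of 17 divided by 5: 2
```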

### Comparison Operators in R:

* Less than: <
* Greater than: >
* Less than or equal to: <=
* Greater than or equal to: >=
* Is equal to: ==
* Is NOT equal to: !=
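The comparison operators can be tried out as in the sketch below. Each comparison returns TRUE or FALSE, and when one side is a vector the comparison is applied to every element. The names `x`, `y`, and `scores` are assumptions made for the example.

```r
# Example values (hypothetical).
x <- 7
y <- 10

x < y    # TRUE
x > y    # FALSE
x <= 7   # TRUE
x >= y   # FALSE
x == y   # FALSE
x != y   # TRUE

# Comparisons are vectorized: each element is compared in turn.
scores <- c(3, 8, 10)
scores >= 8   # FALSE TRUE TRUE
```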

### Logical Operators in R: 
* the AND operator (&)
* the OR operator (|)
* the NOT operator, otherwise known as the bang operator (!)
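A small sketch of the three logical operators, using made-up variables (`is_gold`, `is_team_event`, `year`). Logical operators are most often used to combine comparisons, as in the last line.

```r
# Example logical values (hypothetical).
is_gold <- TRUE
is_team_event <- FALSE

is_gold & is_team_event   # AND: FALSE, because only one side is TRUE
is_gold | is_team_event   # OR: TRUE, because at least one side is TRUE
!is_gold                  # NOT: FALSE

# Combining two comparisons with AND.
year <- 2012
year >= 2000 & year <= 2012   # TRUE
```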

In [17]:
# Example: 
25 * 4 + 9 / 3

### Data Types in R: 
1. Numeric: Any number with or without a decimal point, such as 23 or 0.03. Missing values are represented by NA.
2. Character: Any grouping of characters on your keyboard (letters, numbers, spaces, symbols, etc.) surrounded by either 'single' or "double" quotes.
3. Logical: This data type has only two possible values: TRUE or FALSE (without quotes).
4. Vectors: An ordered collection of related values that are all the same type. Example: favorite_food <- c("Sushi", "Tacos", "Lasagna", "Tofu Soup")

**Note:** In R, variables are assigned with the arrow operator (<-). The equals sign (=) also works for assignment, and some cells in this tutorial use it.
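One way to check which data type a value has is the `class()` function. The sketch below assigns one value of each type and inspects it. The variable names are assumptions for illustration only.

```r
# One value of each basic type (names are hypothetical).
count <- 23
dish <- "Sushi"
is_tasty <- TRUE
favorite_food <- c("Sushi", "Tacos")

class(count)           # "numeric"
class(dish)            # "character"
class(is_tasty)        # "logical"
class(favorite_food)   # "character": a vector has the type of its elements
length(favorite_food)  # 2

# A vector holds a single type, so mixed values are converted to a common one.
mixed <- c(1, "two", TRUE)
class(mixed)           # "character"
```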

In [8]:
# Example of variable assignment:
favorite_food <- c("Sushi", "Tacos", "Lasagna", "Tofu Soup") # c() combines its comma-separated arguments into a vector.
print(favorite_food)

[1] "Sushi"     "Tacos"     "Lasagna"   "Tofu Soup"


In [18]:
# In R, you start counting elements at position one, not zero.
print(favorite_food[1]) # "Sushi"

[1] "Sushi"
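Beyond picking a single element, square brackets also accept ranges, several positions at once, and negative positions (which drop elements). A minimal sketch, reusing the `favorite_food` vector from above:

```r
favorite_food <- c("Sushi", "Tacos", "Lasagna", "Tofu Soup")

length(favorite_food)                  # number of elements: 4
favorite_food[2:3]                     # positions 2 to 3: "Tacos" "Lasagna"
favorite_food[c(1, 4)]                 # positions 1 and 4: "Sushi" "Tofu Soup"
favorite_food[-1]                      # everything except the first element
favorite_food[length(favorite_food)]   # the last element: "Tofu Soup"
```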


### Important Packages in R:
* dplyr: A package used to clean, process, and organize data
* readr: A package that provides a fast and friendly way to read rectangular data (like 'csv', 'tsv', and 'fwf')
* To use a package, load it with the library() function, for example: library(dplyr)
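`library()` fails with an error if a package is not installed. The sketch below is one possible way to check first. It assumes the two packages named above are the only ones needed, and the variable names `needed`, `installed`, and `missing_pkgs` are made up for the example.

```r
# The packages this tutorial loads.
needed <- c("readr", "dplyr")

# requireNamespace() returns TRUE if a package is installed, without attaching it.
installed <- sapply(needed, requireNamespace, quietly = TRUE)
print(installed)

# Report any packages that still need installing (nothing is printed if none are missing).
missing_pkgs <- needed[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "),
          " (install.packages() can install them)")
}
```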

# For this tutorial we will examine the Summer Olympics data
* Useful R cheatsheets found here: https://rstudio.com/resources/cheatsheets/

In [40]:
library(readr) # readr is a package for reading data files (it provides read_csv())
library(dplyr) # dplyr is a package that provides filter(), which we use to filter data

In [26]:
# Read the data file and save it in a variable (df for data file).
# Note: read.csv() is part of base R; the readr equivalent is read_csv().
df = read.csv('summer.csv')
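To see what `read.csv()` does without needing the data file, the sketch below parses a three-row stand-in written inline. The rows are taken from this tutorial's data, with a few columns left out to keep it short. The names `csv_text` and `sample_df` are made up for the example.

```r
# A small stand-in for summer.csv (a subset of its columns).
csv_text <- 'Year,City,Sport,Athlete,Country,Gender,Medal
1896,Athens,Aquatics,"HAJOS, Alfred",HUN,Men,Gold
1896,Athens,Aquatics,"HERSCHMANN, Otto",AUT,Men,Silver
2012,London,Aquatics,"FRANKLIN, Missy",USA,Women,Gold'

# The text argument lets read.csv() parse a string instead of a file path.
# Quoted fields such as "HAJOS, Alfred" keep their embedded comma.
sample_df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(sample_df)             # one line per column, with its type
class(sample_df$Year)      # "integer"
class(sample_df$Athlete)   # "character"
nrow(sample_df)            # 3
```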

In [27]:
# Use head() to view the first few data entries
head(df)

Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal
1896,Athens,Aquatics,Swimming,"HAJOS, Alfred",HUN,Men,100M Freestyle,Gold
1896,Athens,Aquatics,Swimming,"HERSCHMANN, Otto",AUT,Men,100M Freestyle,Silver
1896,Athens,Aquatics,Swimming,"DRIVAS, Dimitrios",GRE,Men,100M Freestyle For Sailors,Bronze
1896,Athens,Aquatics,Swimming,"MALOKINIS, Ioannis",GRE,Men,100M Freestyle For Sailors,Gold
1896,Athens,Aquatics,Swimming,"CHASAPIS, Spiridon",GRE,Men,100M Freestyle For Sailors,Silver
1896,Athens,Aquatics,Swimming,"CHOROPHAS, Efstathios",GRE,Men,1200M Freestyle,Bronze
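`head()` is one of several base R functions for a first look at a data frame. The sketch below shows a few others on a tiny, hand-made data frame. `sample_df` is a hypothetical stand-in for `df`.

```r
# A tiny stand-in data frame (values taken from the tutorial's data).
sample_df <- data.frame(
  Year = c(1896, 1896, 2012),
  Athlete = c("HAJOS, Alfred", "HERSCHMANN, Otto", "FRANKLIN, Missy"),
  Medal = c("Gold", "Silver", "Gold")
)

head(sample_df, n = 2)   # first 2 rows (the default is 6)
tail(sample_df, n = 1)   # last row
dim(sample_df)           # rows and columns: 3 3
names(sample_df)         # column names
```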


In [30]:
# Use summary() to summarize each column of the data
summary(df)

      Year               City              Sport             Discipline   
 Min.   :1896   London     : 3567   Aquatics  : 4170   Athletics  : 3638  
 1st Qu.:1948   Athens     : 2149   Athletics : 3638   Rowing     : 2667  
 Median :1980   Los Angeles: 2074   Rowing    : 2667   Swimming   : 2628  
 Mean   :1970   Beijing    : 2042   Gymnastics: 2307   Artistic G.: 2103  
 3rd Qu.:2000   Sydney     : 2015   Fencing   : 1613   Fencing    : 1613  
 Max.   :2012   Atlanta    : 1859   Football  : 1497   Football   : 1497  
                (Other)    :17459   (Other)   :15273   (Other)    :17019  
                 Athlete         Country        Gender     
 PHELPS, Michael     :   22   USA    : 4585   Men  :22746  
 LATYNINA, Larisa    :   18   URS    : 2049   Women: 8419  
 ANDRIANOV, Nikolay  :   15   GBR    : 1720                
 MANGIAROTTI, Edoardo:   13   FRA    : 1396                
 ONO, Takashi        :   13   GER    : 1305                
 SHAKHLIN, Boris     :   13   ITA    : 1
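What `summary()` prints depends on the column type. The per-category counts in the output above suggest the text columns were read as factors, which was the `read.csv()` default before R 4.0.0. From R 4.0.0 onward, text columns are read as character unless `stringsAsFactors = TRUE` is passed. A minimal sketch of the difference, on made-up values:

```r
# A short character vector of medals (hypothetical data).
medals <- c("Gold", "Silver", "Gold", "Bronze", "Gold")

summary(medals)            # character: only length, class, and mode
summary(factor(medals))    # factor: a count per category
summary(c(1896, 1948, 2012))   # numeric: min, quartiles, mean, max
```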

# Let's clean up the data so we only have the rows we need, starting with the USA.
## We can do that with the filter() function, saving each result in a new variable:
1. Filter the data by country (USA)
2. Filter the USA data by year (2012)
3. Filter the 2012 USA data by gender (Women)

In [61]:
# filter(data frame, column name == 'value to keep in that column')
usa_data = filter(df, Country == 'USA')
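To make the behavior of `filter()` easier to see, here is a minimal sketch on a three-row data frame. It assumes dplyr is installed. The data frame `toy` and the result names are made up for the example. The last line shows a base R way to get the same rows.

```r
library(dplyr)

# A tiny stand-in for the Olympic data (hypothetical subset).
toy <- data.frame(
  Athlete = c("BURKE, Thomas", "HAJOS, Alfred", "FRANKLIN, Missy"),
  Country = c("USA", "HUN", "USA"),
  Year = c(1896, 1896, 2012)
)

# Keep only the rows where the condition is TRUE.
usa_rows <- filter(toy, Country == "USA")
usa_rows

# Base R equivalent: a logical condition inside square brackets.
base_rows <- toy[toy$Country == "USA", ]
```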

In [52]:
usa_data

Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal
1896,Athens,Athletics,Athletics,"LANE, Francis",USA,Men,100M,Bronze
1896,Athens,Athletics,Athletics,"BURKE, Thomas",USA,Men,100M,Gold
1896,Athens,Athletics,Athletics,"CURTIS, Thomas",USA,Men,110M Hurdles,Gold
1896,Athens,Athletics,Athletics,"BLAKE, Arthur",USA,Men,1500M,Silver
1896,Athens,Athletics,Athletics,"BURKE, Thomas",USA,Men,400M,Gold
1896,Athens,Athletics,Athletics,"JAMISON, Herbert",USA,Men,400M,Silver
1896,Athens,Athletics,Athletics,"GARRETT, Robert",USA,Men,Discus Throw,Gold
1896,Athens,Athletics,Athletics,"CLARK, Ellery",USA,Men,High Jump,Gold
1896,Athens,Athletics,Athletics,"CONNOLLY, James",USA,Men,High Jump,Silver
1896,Athens,Athletics,Athletics,"GARRETT, Robert",USA,Men,High Jump,Silver


In [54]:
usa2012 = filter(usa_data, Year == 2012) # Year is numeric, so compare it with the number 2012 (no quotes)

In [55]:
usa2012

Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal
2012,London,Aquatics,Diving,"BOUDIA, David",USA,Men,10M Platform,Gold
2012,London,Aquatics,Diving,"BOUDIA, David",USA,Men,Synchronized 10M,Bronze
2012,London,Aquatics,Diving,"MCCRORY, Nicholas",USA,Men,Synchronized 10M,Bronze
2012,London,Aquatics,Diving,"DUMAIS, Troy",USA,Men,Synchronized 3M,Bronze
2012,London,Aquatics,Diving,"IPSEN, Kristian",USA,Men,Synchronized 3M,Bronze
2012,London,Aquatics,Diving,"BRYANT, Kelci",USA,Women,Synchronized 3M,Silver
2012,London,Aquatics,Diving,"JOHNSTON, Abigail",USA,Women,Synchronized 3M,Silver
2012,London,Aquatics,Marathon swimming,"ANDERSON, Haley",USA,Women,10KM,Silver
2012,London,Aquatics,Swimming,"GREVERS, Matthew",USA,Men,100M Backstroke,Gold
2012,London,Aquatics,Swimming,"THOMAN, Nick",USA,Men,100M Backstroke,Silver


In [57]:
filter(usa2012, Gender == 'Women')

Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal
2012,London,Aquatics,Diving,"BRYANT, Kelci",USA,Women,Synchronized 3M,Silver
2012,London,Aquatics,Diving,"JOHNSTON, Abigail",USA,Women,Synchronized 3M,Silver
2012,London,Aquatics,Marathon swimming,"ANDERSON, Haley",USA,Women,10KM,Silver
2012,London,Aquatics,Swimming,"FRANKLIN, Missy",USA,Women,100M Backstroke,Gold
2012,London,Aquatics,Swimming,"SONI, Rebecca",USA,Women,100M Breaststroke,Silver
2012,London,Aquatics,Swimming,"VOLLMER, Dana",USA,Women,100M Butterfly,Gold
2012,London,Aquatics,Swimming,"FRANKLIN, Missy",USA,Women,200M Backstroke,Gold
2012,London,Aquatics,Swimming,"BEISEL, Elizabeth",USA,Women,200M Backstroke,Bronze
2012,London,Aquatics,Swimming,"SONI, Rebecca",USA,Women,200M Breaststroke,Gold
2012,London,Aquatics,Swimming,"SCHMITT, Allison",USA,Women,200M Freestyle,Gold
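The three filtering steps can also be written as a single call, because `filter()` combines comma-separated conditions with AND. The sketch below assumes dplyr is installed and uses a small made-up data frame, `toy`. The piped version uses the `%>%` operator that dplyr makes available.

```r
library(dplyr)

# A tiny stand-in for the Olympic data (hypothetical subset).
toy <- data.frame(
  Athlete = c("BURKE, Thomas", "BOUDIA, David", "FRANKLIN, Missy", "HAJOS, Alfred"),
  Country = c("USA", "USA", "USA", "HUN"),
  Year = c(1896, 2012, 2012, 1896),
  Gender = c("Men", "Men", "Women", "Men")
)

# All three conditions in one call: a row is kept only if every condition is TRUE.
usa_women_2012 <- filter(toy, Country == "USA", Year == 2012, Gender == "Women")

# The same steps chained with the pipe, one filter per step.
usa_women_2012_piped <- toy %>%
  filter(Country == "USA") %>%
  filter(Year == 2012) %>%
  filter(Gender == "Women")

usa_women_2012
```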


# Now let's see the total number of medals won by the USA, broken down by gender:
## We can use the table() function
* table(data$column) counts the rows for each value in one column
* table(data$column, data$column) cross-tabulates two columns
* the $ operator selects a column from a data frame by name

Example: table(usa_data$Medal)

In [64]:
table(usa_data$Medal)


Bronze   Gold Silver 
  1098   2235   1252 
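A one-column table can be sorted or turned into proportions. The sketch below shows this on a short made-up vector, `medals`, rather than the full data.

```r
# A short vector of medals (hypothetical data).
medals <- c("Gold", "Silver", "Gold", "Bronze", "Gold", "Silver")

# Count how often each value occurs.
counts <- table(medals)
counts

# Most frequent value first.
sort(counts, decreasing = TRUE)

# Share of each value, rounded to two decimals.
round(prop.table(counts), 2)
```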

In [63]:
table(usa_data$Medal, usa_data$Gender)

        
          Men Women
  Bronze  785   313
  Gold   1562   673
  Silver  861   391
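Row and column totals can be added to a two-way table with `addmargins()`. The sketch below rebuilds the counts printed above by hand, so it runs without the data file. With the real data, `addmargins(table(usa_data$Medal, usa_data$Gender))` would be the equivalent call. The names `medal_counts` and `with_totals` are made up for the example.

```r
# The counts from the output above, entered column by column.
medal_counts <- as.table(matrix(
  c(785, 1562, 861,    # Men:   Bronze, Gold, Silver
    313,  673, 391),   # Women: Bronze, Gold, Silver
  nrow = 3,
  dimnames = list(Medal = c("Bronze", "Gold", "Silver"),
                  Gender = c("Men", "Women"))
))

# Add a "Sum" row and a "Sum" column.
with_totals <- addmargins(medal_counts)
with_totals

with_totals["Gold", "Sum"]   # all USA gold medals: 2235
with_totals["Sum", "Sum"]    # all USA medals: 4585
```

The row totals match the one-column table shown earlier (1098, 2235, 1252), which is a quick way to confirm that the two tables agree.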

# In this tutorial we learned:
* How to use R with Anaconda-Navigator and Jupyter Notebook
* The basics of R (operators, data types, and vectors)
* How to import data
* How to filter the data
* How to create a table of counts