# Reviewing Tables in R
## Political Science 3 Discussion Week 3 - Clara Hu

Today, we will practice working with data in R, which stores datasets as tables (data frames).

### Coding in Style

If you are interested in conventions that are often used in formatting R code (spacing, naming, etc.), and would like to make your code easier to read, check out the [tidyverse style guide](https://style.tidyverse.org/syntax.html). 

## Review of Extracting and Subsetting from Datasets

Remember that we can extract a column from a table using the `$` operator. We can also use the `subset` function to keep only the observations (rows) whose values satisfy a condition.

In [1]:
# We'll be looking at data about Iris flowers 
# Run this cell! Ignore what I'm doing below. 

iris$setosa <- ifelse(iris$Species == "setosa", 1, 0) # Making dummy variables
iris$versicolor <- ifelse(iris$Species == "versicolor", 1, 0)
iris$virginica <- ifelse(iris$Species == "virginica", 1, 0)

head(iris)

| | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | setosa | versicolor | virginica |
| - | - | - | - | - | - | - | - | - |
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<fct>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa | 1 | 0 | 0 |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa | 1 | 0 | 0 |
| 3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa | 1 | 0 | 0 |
| 4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa | 1 | 0 | 0 |
| 5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa | 1 | 0 | 0 |
| 6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa | 1 | 0 | 0 |


In [3]:
table(iris$Petal.Width)


0.1 0.2 0.3 0.4 0.5 0.6   1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9   2 2.1 2.2 2.3 
  5  29   7   7   1   1   7   3   5  13   8  12   4   2  12   5   6   6   3   8 
2.4 2.5 
  3   3 

In [None]:
# Remember we can get values in a column by using $
head(iris$Sepal.Length) 
# I'm using the head function to only show the first few values
# instead of all 150

In [None]:
# Remember, we can also give things names and reference that name
# For example -- what's the median petal length for all flowers in the dataset?
all_petal_lengths <- iris$Petal.Length
median(all_petal_lengths)

## Quick Check

Using the dummy variable columns, find the proportion of flowers in the dataset that are either the species `Iris setosa` or `Iris versicolor`. Save that value as the name `s_v_prop`.

In [None]:
# Use as many lines as you need!

s_v_prop <- ...
s_v_prop # Tell R to print the value

## Subsetting

Let's focus on only the `Iris virginica` flowers in the dataset. We can do this by using the `subset` function, which takes the following arguments:

`subset(table, column_logical)`

A logical in R is the same thing as a Boolean in Python. In other words, it is a value (or set of values) that is either `TRUE` or `FALSE`. You can also think of them as 1 and 0, because R treats `TRUE` as 1 and `FALSE` as 0 when you do arithmetic with them.
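
To make the 1-and-0 idea concrete, here is a minimal sketch using a made-up logical vector (the name `is_tall` and its values are hypothetical, not taken from the iris data). Because R counts `TRUE` as 1 and `FALSE` as 0, `sum` gives the number of `TRUE` values and `mean` gives the proportion of them.

```r
# Hypothetical logical vector, made up for illustration
is_tall <- c(TRUE, FALSE, TRUE, TRUE)

class(is_tall)       # "logical"
as.numeric(is_tall)  # 1 0 1 1

n_true <- sum(is_tall)      # number of TRUE values: 3
prop_true <- mean(is_tall)  # proportion of TRUE values: 0.75
n_true
prop_true
```

The same functions work on columns that already hold 0s and 1s, such as the dummy variables created at the top of this notebook.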

We can get these by doing something called a Boolean comparison, where we compare one value to another. The comparison returns `TRUE` if the condition holds and `FALSE` otherwise. Here are some common comparisons:

| Comparison | R Code |
| - | - |
| does x equal y? | `x == y` |
| does x NOT equal y? | `x != y` |
| is x less than y? | `x < y` |
| is x greater than y? | `x > y` |
| is x less than or equal to y? | `x <= y` |
| is x greater than or equal to y? | `x >= y` |


In [None]:
# Logical example in R
x <- 5
y <- 10

x == y
x < y
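
The comparison above uses single numbers, but the same operators also work on a whole column at once, which is what `subset` relies on. The sketch below assumes the built-in `iris` data and a cutoff of 6 that was picked only for illustration. It compares every petal length to the cutoff, then keeps the rows where the result is `TRUE`. The names `is_long`, `long_petals`, and `same_rows` are hypothetical.

```r
# Compare every value in a column to a cutoff (6 is an arbitrary choice)
is_long <- iris$Petal.Length > 6
head(is_long)   # one TRUE/FALSE per row
sum(is_long)    # how many rows satisfy the condition

# subset() keeps only the rows where the condition is TRUE
long_petals <- subset(iris, Petal.Length > 6)
nrow(long_petals)

# Bracket indexing with the logical vector selects the same rows
same_rows <- iris[is_long, ]
nrow(same_rows)
```

Inside `subset`, column names can be written without the `iris$` prefix because the function looks them up in the table you pass as the first argument.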

In [None]:
# Let's practice using this to subset:
# subset(table, column_name <comparison> <value>)
virginica <- subset(iris, virginica == 1)
head(virginica)

In [None]:
# Not a quick check, but good practice:
# How can we get the virginica flowers using a different column?
subset(..., ...)

## Subset of a subset

Using the `subset` function and some other R code, create a table that only has rows of Iris setosa flowers that have a sepal length smaller than the mean sepal length of ALL virginica flowers. Call this new table `small_setosas`.

*Hint:* You should use the `virginica` table to help find the mean sepal length of Iris virginica flowers.
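
The same pattern (compute a number from one subset, then use it as the cutoff for another) can be sketched on a different dataset, so the exercise itself is left for you. This example assumes R's built-in `mtcars` table, where `am` is 1 for manual and 0 for automatic transmissions. The names `automatic`, `avg_automatic_mpg`, `manual`, and `low_mpg_manual` are made up for illustration.

```r
# Step 1: subset one group and compute its mean
automatic <- subset(mtcars, am == 0)
avg_automatic_mpg <- mean(automatic$mpg)

# Step 2: subset the other group
manual <- subset(mtcars, am == 1)

# Step 3: keep only the rows of the second group that fall below
# the first group's mean
low_mpg_manual <- subset(manual, mpg < avg_automatic_mpg)
low_mpg_manual
```

Saving each intermediate result under its own name makes it easy to print and check every step before moving on.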

In [None]:
avg_virginica_sepal_length <- ...
setosa <- ...
small_setosas <- ...

head(small_setosas)

## One-way and Two-way Tables

We can use the `table` function to create one-way and two-way tables. A one-way table counts the observations in each category of a single variable. A two-way table counts the observations in each combination of categories of two variables. To use the `table` function, pass in the column (or columns) that we want to summarize.

| One-way | Two-way |
| - | - |
| `table(data$var1)` | `table(data$var1, data$var2)` |
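
As a small sketch of both forms, the example below assumes the built-in `iris` data. Since `iris` has only one categorical column, the second variable for the two-way table is a logical made by comparing sepal length to 5.8, a cutoff chosen only for illustration. The names `one_way`, `long_sepal`, and `two_way` are hypothetical.

```r
# One-way table: number of flowers in each species
one_way <- table(iris$Species)
one_way

# Two-way table: species (rows) by whether the sepal is
# longer than 5.8 (columns)
long_sepal <- iris$Sepal.Length > 5.8
two_way <- table(iris$Species, long_sepal)
two_way

# Each row of the two-way table adds up to that species' one-way count
rowSums(two_way)
```

The first argument to `table` becomes the rows and the second becomes the columns, so swapping the two arguments transposes the result.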

In [None]:
# Creating a one-way table
# Let's see how many flowers are in each category!
table(iris$Species)

In [None]:
# Creating a two-way table
#