# Data Frame Selection and Indexing

We've seen how to call built-in data frames and how to create them using the data.frame() function along with vectors. Let's revisit our weather data frame and learn to select elements from within it using bracket notation.

In [1]:
# Let's make a data frame!
days <- c('Mon','Tues','Wed','Thur','Fri','Sat','Sun')
temp <- c(78,79,65,68,65,70,71)
rain <- c(TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE)
weather <- data.frame(days,temp,rain)
weather

days,temp,rain
<fct>,<dbl>,<lgl>
Mon,78,TRUE
Tues,79,FALSE
Wed,65,FALSE
Thur,68,FALSE
Fri,65,TRUE
Sat,70,FALSE
Sun,71,TRUE


We can use the bracket notation we used for matrices:

dataframe[rows,columns]

In [4]:
# We can pull the first row the same as we would pull the row from a matrix
weather[1,] 

,days,temp,rain
,<fct>,<dbl>,<lgl>
1,Mon,78,TRUE


In [11]:
# Let's pull the 2nd column
weather[,2]
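
Selecting a single column with the matrix-style bracket returns a plain vector rather than a one-column data frame. The sketch below rebuilds the same weather data frame and contrasts the two results; drop = FALSE is a standard argument of the bracket operator, while the names temp.vector and temp.frame are made up for illustration.

```r
# Rebuild the weather data frame from above
days <- c('Mon','Tues','Wed','Thur','Fri','Sat','Sun')
temp <- c(78,79,65,68,65,70,71)
rain <- c(TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE)
weather <- data.frame(days,temp,rain)

# Matrix-style selection of one column drops down to a vector
temp.vector <- weather[,2]
is.data.frame(temp.vector)   # FALSE
is.numeric(temp.vector)      # TRUE

# drop = FALSE keeps the result as a one-column data frame
temp.frame <- weather[,2,drop = FALSE]
is.data.frame(temp.frame)    # TRUE
dim(temp.frame)              # 7 rows, 1 column
```

If the next step expects a data frame (for example, to keep the column name attached), the drop = FALSE form is probably the safer choice.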

In [12]:
# Let's grab Friday's data
weather[5,]

,days,temp,rain
,<fct>,<dbl>,<lgl>
5,Fri,65,TRUE


## Selecting Data using Column Names

Here's where data frames become very powerful. We can use column names to select columns instead of having to remember their index numbers. Here's an example:

In [13]:
# Let's pull all the rain values
weather[,'rain']

In [14]:
# We can pass a vector of column names to select several columns. Here we get only days and temp for the first five rows.
weather[1:5,c('days','temp')] 

,days,temp
,<fct>,<dbl>
1,Mon,78
2,Tues,79
3,Wed,65
4,Thur,68
5,Fri,65


If you want all the values of a particular column, you can use the dollar sign $ directly after the data frame name. The general form is:

df.name$column.name

In [15]:
weather$days #we can pull the entire column by using the $ 
weather$temp
weather$rain
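
Because $ hands back the column as an ordinary vector, it can be passed straight into other functions. The following is a small sketch using the same weather data; the names mean.temp, rainy.days, and same.column are illustrative only.

```r
# Rebuild the weather data frame from above
days <- c('Mon','Tues','Wed','Thur','Fri','Sat','Sun')
temp <- c(78,79,65,68,65,70,71)
rain <- c(TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE)
weather <- data.frame(days,temp,rain)

# The column is a plain numeric vector, so mean() works on it directly
mean.temp <- mean(weather$temp)

# TRUE counts as 1 when summed, so this counts the rainy days
rainy.days <- sum(weather$rain)   # 3

# Double brackets with the column name return the same vector as $
same.column <- identical(weather[['temp']], weather$temp)   # TRUE
```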

## Filtering with a Subset Condition

We can use the subset() function to grab a subset of rows from our data frame based on a condition. For example, if we want the days where it rained (rain == TRUE), we can use subset() as follows:

In [16]:
subset(weather,subset = rain == TRUE)

,days,temp,rain
,<fct>,<dbl>,<lgl>
1,Mon,78,TRUE
5,Fri,65,TRUE
7,Sun,71,TRUE


See how the condition uses a comparison operator? In the case above we use ==. Let's grab the days where it was above 70 degrees:

In [18]:
subset(weather,subset = temp > 70)

,days,temp,rain
,<fct>,<dbl>,<lgl>
1,Mon,78,TRUE
2,Tues,79,FALSE
7,Sun,71,TRUE


Something worth noting is that we didn't pass in the column name as a string. subset() evaluates the condition inside the data frame, so it can look up the column by its bare name.
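
To make the bare-name lookup concrete, the sketch below combines two conditions with & and uses the select argument of subset() to keep only some columns, then writes the same filter with bracket notation, where the data frame name has to be spelled out. The names warm.and.dry and warm.bracket are made up for this example.

```r
# Rebuild the weather data frame from above
days <- c('Mon','Tues','Wed','Thur','Fri','Sat','Sun')
temp <- c(78,79,65,68,65,70,71)
rain <- c(TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE)
weather <- data.frame(days,temp,rain)

# Days that were at least 70 degrees with no rain, keeping two columns
warm.and.dry <- subset(weather, subset = temp >= 70 & rain == FALSE,
                       select = c(days,temp))

# The same filter with brackets: columns must be prefixed with weather$
warm.bracket <- weather[weather$temp >= 70 & weather$rain == FALSE,
                        c('days','temp')]

warm.and.dry   # rows 2 (Tues) and 6 (Sat)
```

Both forms should select the same rows; subset() is mostly a convenience for interactive work.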

## Ordering a Data Frame

We can sort our data frame by using the order() function. Pass the column you want to sort by into order(); it returns a vector of row positions, which you can then use to select rows from the data frame. Let's try to sort our data frame by temperature.

In [19]:
sorted.by.temp <- order(weather['temp'])

In [20]:
# Now we can use this vector as the row index to sort our data frame
weather[sorted.by.temp,]

,days,temp,rain
,<fct>,<dbl>,<lgl>
3,Wed,65,FALSE
5,Fri,65,TRUE
4,Thur,68,FALSE
6,Sat,70,FALSE
7,Sun,71,TRUE
1,Mon,78,TRUE
2,Tues,79,FALSE


What does sorted.by.temp look like?

In [21]:
sorted.by.temp

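That makes sense: order() returns the row positions arranged so that the temperatures come out sorted, and indexing with them returns the rows in that order. By default the order is ascending; we can put a negative sign in front of the column to get descending order.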
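
A quick way to see what order() returns is to try it on the temperature vector by itself. This is a minimal sketch; the name positions is chosen here purely for illustration.

```r
temp <- c(78,79,65,68,65,70,71)

# order() gives the positions of the elements from smallest to largest
positions <- order(temp)
positions          # 3 5 4 6 7 1 2

# Indexing with those positions produces the sorted values
temp[positions]    # 65 65 68 70 71 78 79

# This matches what sort() returns directly
identical(temp[positions], sort(temp))   # TRUE
```

Note that the two 65-degree entries (positions 3 and 5) keep their original relative order.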

In [26]:
descending.temp <- order(-weather['temp'])
weather[descending.temp,] # Don't forget we are indexing by row number when doing this!

,days,temp,rain
,<fct>,<dbl>,<lgl>
2,Tues,79,FALSE
1,Mon,78,TRUE
7,Sun,71,TRUE
6,Sat,70,FALSE
4,Thur,68,FALSE
3,Wed,65,FALSE
5,Fri,65,TRUE


We can also use the $ notation, since it is easier to read and requires fewer brackets.

In [30]:
sort.temperature <- order(weather$temp)
weather[sort.temperature,]

,days,temp,rain
,<fct>,<dbl>,<lgl>
3,Wed,65,FALSE
5,Fri,65,TRUE
4,Thur,68,FALSE
6,Sat,70,FALSE
7,Sun,71,TRUE
1,Mon,78,TRUE
2,Tues,79,FALSE


We can again add a negative sign in front of the column name to sort in descending order:

In [31]:
sort.temperature <- order(-weather$temp)
weather[sort.temperature,]

,days,temp,rain
,<fct>,<dbl>,<lgl>
2,Tues,79,FALSE
1,Mon,78,TRUE
7,Sun,71,TRUE
6,Sat,70,FALSE
4,Thur,68,FALSE
3,Wed,65,FALSE
5,Fri,65,TRUE
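
The negative sign only works for numeric columns. As a sketch of two alternatives, order() also accepts a decreasing argument, and it can take more than one column so that later columns break ties in earlier ones. The names by.temp.desc and dry.then.hot are hypothetical.

```r
# Rebuild the weather data frame from above
days <- c('Mon','Tues','Wed','Thur','Fri','Sat','Sun')
temp <- c(78,79,65,68,65,70,71)
rain <- c(TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE)
weather <- data.frame(days,temp,rain)

# decreasing = TRUE sorts from hottest to coldest without a minus sign
by.temp.desc <- weather[order(weather$temp, decreasing = TRUE),]
by.temp.desc$temp    # 79 78 71 70 68 65 65

# Sort by rain first (FALSE before TRUE), then hottest first within each group
dry.then.hot <- order(weather$rain, -weather$temp)
dry.then.hot         # 2 6 4 3 1 7 5
weather[dry.then.hot,]
```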


That's it for data frames! We will definitely revisit this and explore data frames A LOT more, but we should test your understanding first! Up next: an exercise!