# Data Frame Selection and Indexing

In [1]:
# Some made up weather data
days <- c('mon','tue','wed','thu','fri')
temp <- c(22.2,21,23,24.3,25)
rain <- c(TRUE, TRUE, FALSE, FALSE, TRUE)

# Pass in the vectors:
df <- data.frame(days,temp,rain)

In [2]:
df

days,temp,rain
mon,22.2,True
tue,21.0,True
wed,23.0,False
thu,24.3,False
fri,25.0,True


In [3]:
# We can use the same bracket notation we used for matrices:
# df[rows,columns]

# Everything from first row
df[1,]

days,temp,rain
mon,22.2,True


In [4]:
# Grab Friday data
df[5,]


,days,temp,rain
5,fri,25,True
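
The `df[rows,columns]` notation also accepts a single cell, a vector of row numbers, or negative indices. The block below is a minimal sketch of those variations; it rebuilds the same weather data so it runs on its own, and the result names (`tue.temp`, `mon.wed`, `no.mon`) are made up for illustration.

```r
# Rebuild the example data frame so this block is self-contained
days <- c('mon','tue','wed','thu','fri')
temp <- c(22.2,21,23,24.3,25)
rain <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
df <- data.frame(days,temp,rain)

# A single cell: row 2, column 2 (Tuesday's temperature)
tue.temp <- df[2,2]

# Several rows at once: pass a vector of row numbers
mon.wed <- df[c(1,3),]

# A negative index drops that row instead of selecting it
no.mon <- df[-1,]

print(tue.temp)
print(mon.wed)
print(no.mon)
```

Leaving the column position empty, as in `df[c(1,3),]`, keeps every column, just as it does for matrices.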


## Selecting using column names

In [5]:
# This is where data frames become very powerful: we can select columns by name instead of having to remember their positions.
# All rain values
df[,'rain']

In [6]:
# First 5 rows for days and temps
df[1:5,c('days','temp')]

days,temp
mon,22.2
tue,21.0
wed,23.0
thu,24.3
fri,25.0


In [7]:
# To get all the values of a particular column, put a dollar sign directly after the data frame name:
# df.name$column.name
df$rain

In [8]:
df$days

In [9]:
# Single-bracket notation with just a column name returns the same values, but as a one-column data frame:
df['rain']

rain
True
True
False
False
True


In [10]:
df['days']

days
mon
tue
wed
thu
fri
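
The selection methods above differ in what they return: `df$rain` and `df[,'rain']` give back a plain vector, while `df['rain']` gives back a data frame. The sketch below checks this with `class()`. It also shows the `drop = FALSE` argument of the bracket operator, which is not covered above, as one way to keep the data frame form when using the `[rows,columns]` notation. The variable names are illustrative only.

```r
# Rebuild the example data frame so this block is self-contained
days <- c('mon','tue','wed','thu','fri')
temp <- c(22.2,21,23,24.3,25)
rain <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
df <- data.frame(days,temp,rain)

rain.vec  <- df$rain                    # logical vector
rain.vec2 <- df[,'rain']                # also a vector: a single column is simplified
rain.df   <- df['rain']                 # one-column data frame
rain.df2  <- df[,'rain',drop = FALSE]   # drop = FALSE keeps the data frame form

print(class(rain.vec))    # "logical"
print(class(rain.vec2))   # "logical"
print(class(rain.df))     # "data.frame"
print(class(rain.df2))    # "data.frame"
```

Which form is more convenient depends on what comes next: a vector is what most functions such as `order()` or `mean()` expect, while a data frame keeps the column name attached.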


## Filtering with a subset condition


We can use the **subset()** function to grab the rows of our data frame that meet some condition. For example, imagine we wanted to grab the days where it rained (rain == TRUE). We can use the subset() function as follows:

In [11]:
subset(df,subset=rain==TRUE)

,days,temp,rain
1,mon,22.2,True
2,tue,21.0,True
5,fri,25.0,True


In [12]:
# Notice how the condition uses a comparison operator, in the above case ==. Let's grab the days where the temperature was greater than 23:
subset(df,subset= temp>23)

,days,temp,rain
4,thu,24.3,False
5,fri,25.0,True
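
Conditions can be combined with `&` (and) or `|` (or), and `subset()` also takes a `select` argument to keep only some columns. The sketch below picks the warm, dry days and then repeats the same selection with plain bracket indexing for comparison. The threshold of 22 and the result names are arbitrary choices for illustration.

```r
# Rebuild the example data frame so this block is self-contained
days <- c('mon','tue','wed','thu','fri')
temp <- c(22.2,21,23,24.3,25)
rain <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
df <- data.frame(days,temp,rain)

# Days warmer than 22 with no rain, keeping only the days and temp columns
# (!rain means the same as rain == FALSE)
warm.dry <- subset(df, subset = temp > 22 & !rain, select = c(days,temp))

# The same selection with bracket notation: a logical vector picks the rows
warm.dry2 <- df[df$temp > 22 & !df$rain, c('days','temp')]

print(warm.dry)
print(warm.dry2)
```

Inside `subset()` the column names can be written bare, whereas the bracket version has to spell out `df$temp` and `df$rain`.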


## Ordering a Data Frame


We can sort a data frame by using the order() function. You pass the column you want to sort by into order(), which returns a vector of row indices in sorted order; you then use that vector to select rows from the data frame. Let's see an example of sorting by temperature:

In [13]:
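# order() is normally given a vector; the one-column data frame df['temp'] works here, though recent R versions may warn about it
sorted.temp <- order(df['temp'])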
df[sorted.temp,]

,days,temp,rain
2,tue,21.0,True
1,mon,22.2,True
3,wed,23.0,False
4,thu,24.3,False
5,fri,25.0,True


In [14]:
sorted.temp

In [15]:
# So we are just asking for the rows at those indices, in that order. The order is ascending by default; negate the column to sort in descending order:
desc.temp <- order(-df['temp'])
df[desc.temp,]

,days,temp,rain
5,fri,25.0,True
4,thu,24.3,False
3,wed,23.0,False
1,mon,22.2,True
2,tue,21.0,True


In [16]:
# We could have also used the other column selection methods we learned:
sort.temp <- order(df$temp)
df[sort.temp,]

,days,temp,rain
2,tue,21.0,True
1,mon,22.2,True
3,wed,23.0,False
4,thu,24.3,False
5,fri,25.0,True
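
Two further variations on `order()` are sketched below, neither of which appears in the cells above. The `decreasing = TRUE` argument is an alternative to the negative sign, and unlike negation it also works for non-numeric columns. Passing more than one column sorts by the first and breaks ties with the next. The result names are made up for illustration.

```r
# Rebuild the example data frame so this block is self-contained
days <- c('mon','tue','wed','thu','fri')
temp <- c(22.2,21,23,24.3,25)
rain <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
df <- data.frame(days,temp,rain)

# Descending order without negating the column
by.temp.desc <- df[order(df$temp, decreasing = TRUE),]

# Sort by rain first (FALSE sorts before TRUE), then by temp within each group
by.rain.temp <- df[order(df$rain, df$temp),]

print(by.temp.desc)
print(by.rain.temp)
```

The row names printed on the left still show each row's original position, which makes it easy to see how the rows were rearranged.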
