# Dataframes - Row Indexing


## Table of Contents
1. Numeric Row Indexing
2. Logical Row Indexing
3. Use of subset() function for retrieving rows 
4. Use of rbind() for adding rows

There are two basic methods to retrieve a subset of the rows of a dataframe:

1. Numeric indexing: supply the row numbers to be retrieved
2. Logical indexing: supply a `TRUE` or `FALSE` value for __every__ row of the dataframe

There is another way to index rows when the rows are named, but it is less common and so will not be covered here.

We will continue to look at the `iris` dataframe.

### Numeric Indexing

Recall that to retrieve the cell of the `iris` dataframe in the 2nd row and the 3rd column, use square brackets like this:

In [7]:
%r
iris[2, 3]

To retrieve the entire second row, omit the column number after the comma (",").

In [9]:
%r
iris[2,]
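As a small sketch of the difference between the two forms above, the built-in `class()` function reports what each expression returns: a single cell comes back as a plain number, while a whole row is still a dataframe.

```r
# A single cell is returned as a plain numeric value
class(iris[2, 3])   # "numeric"

# A whole row keeps its dataframe structure (one row, five columns)
class(iris[2, ])    # "data.frame"
dim(iris[2, ])
```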

To retrieve several rows, place an integer vector of row numbers before the comma. Recall that `2` is a vector of length `1`.

In [11]:
%r
iris[c(1,2,3),]
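To make the remark about `2` being a vector of length `1` concrete, the following sketch checks that indexing with a bare number and indexing with a one-element vector give the same result.

```r
# A single number is already a vector of length 1
length(2)

# So these two expressions retrieve exactly the same row
identical(iris[2, ], iris[c(2), ])
```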

Notice that the order of the row numbers in the row number vector determines the order in which the rows are displayed.

In [13]:
%r
iris[5:3,]
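One way to check the ordering is to look at the row names of the result. This sketch relies on `iris` having the default row names `1` to `150`; the variable name `rows_desc` is made up for the illustration.

```r
# Hypothetical name: rows_desc holds rows 5, 4 and 3 of iris, in that order
rows_desc <- iris[5:3, ]

# The row names record which original rows were retrieved, in request order
rownames(rows_desc)

# The same rows requested in ascending order come back in ascending order
rownames(iris[3:5, ])
```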

### Logical Indexing

Suppose we are interested in retrieving all rows from the `iris` dataframe where the `Sepal.Length` variable/column is less than `5.0`.

First create a vector called `log_ndx` which is `TRUE` when the corresponding value in `iris$Sepal.Length` is less than `5.0`.

In [16]:
%r
log_ndx <- iris$Sepal.Length < 5.0
log_ndx

Compare the first five elements of `log_ndx` to the first five elements in the `Sepal.Length` column.

In [18]:
%r
log_ndx[1:5]

In [19]:
%r
iris$Sepal.Length[1:5]

Notice that the 2nd, 3rd and 4th rows have a value for `Sepal.Length` which is less than `5.0`.
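The same comparison can be made side by side. The sketch below puts the two vectors into a small dataframe and uses the base function `which()` to list the positions that are `TRUE`; the names `comparison` and `less_than_5` are made up for the illustration.

```r
log_ndx <- iris$Sepal.Length < 5.0

# Show the first five values next to their logical index (hypothetical names)
comparison <- data.frame(
  Sepal.Length = iris$Sepal.Length[1:5],
  less_than_5  = log_ndx[1:5]
)
comparison

# which() converts a logical vector into the positions that are TRUE
which(log_ndx[1:5])   # 2 3 4
```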

The following displays these three rows (and the others where `Sepal.Length` is less than `5.0`).

In [21]:
%r
head(iris, 5)

In [22]:
%r
iris[log_ndx,]
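A few quick sanity checks are useful with logical indexing. This sketch assumes nothing beyond base R; `short_sepals` is a made-up name. It confirms that the index has one value for every row, that the number of `TRUE` values matches the number of rows retrieved, and that every retrieved row satisfies the condition.

```r
log_ndx <- iris$Sepal.Length < 5.0

# A logical index needs one TRUE/FALSE value per row of the dataframe
length(log_ndx) == nrow(iris)

# Hypothetical name: short_sepals holds the rows where the index is TRUE
short_sepals <- iris[log_ndx, ]

# sum() counts the TRUE values, which is the number of rows retrieved
sum(log_ndx) == nrow(short_sepals)

# Every retrieved row satisfies the original condition
all(short_sepals$Sepal.Length < 5.0)
```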

### __subset() function__

The `subset()` function is another way of retrieving rows of a dataframe that satisfy a particular condition. The following command uses `subset()` to retrieve only the observations of the species versicolor.

In [24]:
%r
versi_species <- subset(iris, Species == "versicolor")
versi_species
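In `subset()`, the second argument is the row condition and the optional third argument, `select`, chooses columns. To filter on more than one condition, combine them with `&` inside the second argument. The sketch below, with made-up variable names, filters versicolor rows with `Petal.Length` below `5.0`, keeps two columns, and compares the result with the equivalent square-bracket expression.

```r
# Rows: versicolor AND Petal.Length < 5.0; columns: chosen with select
versi_short <- subset(iris,
                      Species == "versicolor" & Petal.Length < 5.0,
                      select = c(Species, Petal.Length))

# The same request written with logical row indexing and column names
same_rows <- iris[iris$Species == "versicolor" & iris$Petal.Length < 5.0,
                  c("Species", "Petal.Length")]

# Both approaches should retrieve the same rows and columns
all.equal(versi_short, same_rows)
```

Note that inside `subset()` the column names can be used directly, whereas the bracket form needs the `iris$` prefix.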

### __rbind() function__

rbind() is a function that can be used to add rows to the dataframe. Let's test this function on a sample dataframe:

In [26]:
%r
dummydf1=head(iris)
dummydf2=tail(iris)
dummydf3=rbind(dummydf1,dummydf2)
dummydf3
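`rbind()` can also append a single new observation, provided the new row has the same column names as the dataframe it is added to. In this sketch the measurement values and the names `new_row`, `first_rows` and `extended` are invented purely for illustration.

```r
# A one-row dataframe with the same columns as iris (values are made up)
new_row <- data.frame(
  Sepal.Length = 5.0,
  Sepal.Width  = 3.4,
  Petal.Length = 1.5,
  Petal.Width  = 0.2,
  Species      = factor("setosa", levels = levels(iris$Species))
)

first_rows <- head(iris)            # head() returns the first 6 rows by default
extended <- rbind(first_rows, new_row)

nrow(extended)   # 7
tail(extended, 1)
```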

__Exercise__: Find and display the 25 records from the `iris` dataframe where both of the following are true:
- `Sepal.Width` is greater than the average value of all `Sepal.Width` values
- `Sepal.Length` is greater than the average value of all `Sepal.Length` values

Work through this one step at a time:
1. find the average/mean of the `Sepal.Width` column
1. find the average/mean of the `Sepal.Length` column
1. create a logical index vector that is `TRUE` only when the `Sepal.Width` value is greater than the average `Sepal.Width` value
1. create a logical index vector that is `TRUE` only when the `Sepal.Length` value is greater than the average `Sepal.Length` value
1. create a logical index vector that is `TRUE` only when both of these logical index vectors are `TRUE`
1. use this last logical index vector to retrieve these 25 records

Check your work at each step.
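As a hint for combining two logical index vectors, and without giving the answer away, the sketch below shows how the `&` operator works element by element. The vectors `a` and `b` are made up and have nothing to do with `iris`.

```r
# Two made-up logical vectors of the same length
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

# & is TRUE only in positions where both a and b are TRUE
both <- a & b
both        # TRUE FALSE FALSE FALSE

# sum() counts the TRUE values, a handy way to check a result
sum(both)   # 1
```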

The End