

{% include toc title="In This Lesson" icon="file-text" %}



This lesson introduces the `data.frame`. Working with a `data.frame` in `R` is
very similar to working with a spreadsheet.



<div class='notice--success' markdown="1">



## <i class="fa fa-graduation-cap" aria-hidden="true"></i> Learning Objectives

At the end of this activity, you will be able to:



* Open a .csv or text file containing tabular (spreadsheet) formatted data in R.
* Quickly plot the data using the `ggplot2` function `qplot()`.



## <i class="fa fa-check-square-o fa-2" aria-hidden="true"></i> What you need



You need R and RStudio to complete this tutorial. We also recommend that you
have an `earth-analytics` directory set up on your computer with a `/data`
directory within it.



* [How to Setup R / R Studio](/course-materials/earth-analytics/week-1/setup-r-rstudio/)

* [Setup your working directory](/course-materials/earth-analytics/week-1/setup-working-directory/)





</div>



In the homework from week 1, we used the code below to create a report with `knitr`
in `RStudio`.




```{r open-file }
# load the ggplot2 library for plotting
library(ggplot2)

# turn off factors
options(stringsAsFactors = FALSE)

# download data from figshare
# note that we are downloading the data into the data/ directory (see destfile)
download.file(url = "https://ndownloader.figshare.com/files/7010681",
              destfile = "data/boulder-precip.csv")
```




Let's break the code above down. First, we use the `download.file()` function to
download a data file. In this case, the data are housed on
<a href="http://www.figshare.com" target="_blank">Figshare</a>, a
popular data repository that is free to use if your data are cumulatively
smaller than 20 GB.



Notice that we passed two **ARGUMENTS** to the `download.file()` function:



1. **url**: this is the path to the data file that you wish to download
2. **destfile**: this is the location on your computer (in this case: `/data`) and the name of the file when saved (in this case: `boulder-precip.csv`).

So we downloaded a file from a URL on Figshare to our data directory. We named
that file `boulder-precip.csv`.
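
One practical detail about `destfile`: `download.file()` does not create missing directories, so the download fails if the `data` directory does not exist yet. The chunk below is a minimal sketch of how you might check for it first, assuming you run it from your `earth-analytics` working directory. The variable name `dest` is ours, used only for illustration.

```{r check-destfile }
# build the destination path from its two parts: directory and file name
dest <- file.path("data", "boulder-precip.csv")

# the directory part and the file-name part of the path
dirname(dest)
basename(dest)

# create the data directory if it is missing, so the download has somewhere to land
if (!dir.exists(dirname(dest))) {
  dir.create(dirname(dest), recursive = TRUE)
}
```

You could then pass `dest` as the `destfile` argument instead of typing the path a second time.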



Next, we read in the data using the function: `read.csv()`.




```{r import-data }
# import data
boulder_precip <- read.csv(file = "data/boulder-precip.csv")

# view first few rows of the data
head(boulder_precip)

# view the format of the boulder_precip object in R
str(boulder_precip)
```


<div class="notice--warning" markdown="1">



## <i class="fa fa-pencil-square-o" aria-hidden="true"></i> Challenge

What is the format associated with each column for the `boulder_precip`
data.frame? Describe the attributes of each format. Can you perform math
on each column? Why or why not?



<!--

integer: numbers without decimal points
character: text strings
numeric: numeric values (can contain decimal places)

 -->



</div>



## Introduction to the Data.Frame



When we read data into R using `read.csv()`, R imports it as a data frame.
Data frames are the _de facto_ data structure for most tabular data, and what we
use for statistics and plotting.



A data frame is a collection of vectors of identical lengths. Each vector
represents a column, and each vector can be of a different data type (e.g.,
characters, integers, factors). The `str()` function is useful to inspect the
data types of the columns.
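
To make the "collection of vectors" idea concrete, here is a minimal sketch that builds a tiny data frame by hand. The object `example_df`, its column names and all of its values are made up for illustration; they are not part of the Boulder precipitation data.

```{r data-frame-by-hand }
# a small, made-up data frame: three columns, each a vector of length 3
example_df <- data.frame(
  site = c("A", "B", "C"),       # character vector
  rain_in = c(0.1, 0.0, 0.2),    # numeric vector
  visits = c(1L, 1L, 2L),        # integer vector
  stringsAsFactors = FALSE
)

# inspect the data type of each column
str(example_df)

# every column has the same length: one element per row
sapply(example_df, length)

# but each column can have its own data type
sapply(example_df, class)
```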



A data frame can be created by hand, but most commonly it is generated when
you import a text file or spreadsheet into R using the
functions `read.csv()` or `read.table()`.
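
If you want to see what `read.csv()` returns without needing a file on disk, you can hand it csv text through its `text` argument. This is only a sketch: `csv_text`, `from_csv` and the values in the table are made up for illustration.

```{r data-frame-from-text }
# a small, made-up table written as csv text (header row plus three records)
csv_text <- "site,rain_in,visits
A,0.1,1
B,0.0,1
C,0.2,2"

# read.csv() can parse text directly instead of reading a file
from_csv <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# the result is a data.frame, just like when we import a file
class(from_csv)

# read.csv() guesses a data type for each column
str(from_csv)
```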



## Extracting / Specifying "columns" By Name



You can extract a single column from your data.frame using the `$` symbol
followed by the name of the column (or the column header):




```{r view-column }
# when we import the data with read.csv() we create a data frame
# view each column of the data frame using its name (or header)
boulder_precip$DATE

# view the precip column
boulder_precip$PRECIP
```
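
What you get back from `$` is an ordinary vector, not a smaller data frame. The sketch below shows this on a made-up data frame (`example_df` and its values are hypothetical), and also shows the double-bracket form `[[ ]]`, which extracts the same column by name.

```{r column-as-vector }
# a small, made-up data frame for illustration
example_df <- data.frame(
  site = c("A", "B", "C"),
  rain_in = c(0.1, 0.0, 0.2),
  stringsAsFactors = FALSE
)

# $ returns the column as a plain vector
rain <- example_df$rain_in
is.vector(rain)

# the vector has one element per row of the data frame
length(rain)

# [[ ]] with the column name returns the same vector
identical(rain, example_df[["rain_in"]])
```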






## View Structure of a Data Frame



We can explore the format of our data frame in a similar way to how we explored
vectors in the third lesson of this module. Let's take a look.






```{r view-structure }
# how many rows does the data frame have?
nrow(boulder_precip)

# view the precip column
boulder_precip$PRECIP
```
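
A few other base R functions answer similar questions about a data frame's shape and contents. Here is a minimal sketch on a made-up data frame (`example_df` and its values are hypothetical); the same calls should work on `boulder_precip`.

```{r explore-structure }
# a small, made-up data frame for illustration
example_df <- data.frame(
  site = c("A", "B", "C"),
  rain_in = c(0.1, 0.0, 0.2),
  stringsAsFactors = FALSE
)

# number of rows and number of columns
nrow(example_df)
ncol(example_df)

# both at once: rows first, then columns
dim(example_df)

# the column names (headers)
names(example_df)

# a quick statistical summary of each column
summary(example_df)
```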




## Plotting our Data



We can quickly plot our data too. Note that we are using the `ggplot2` function
`qplot()` rather than the R base plot functionality. We are doing this because
`ggplot2` is generally more powerful and efficient to use for plotting.




```{r quick-plot, fig.cap="plot precipitation data" }
# qplot stands for quick plot. Let's use it to plot our data
qplot(x = boulder_precip$DATE,
      y = boulder_precip$PRECIP)
```
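
`qplot()` is a shortcut. To our knowledge, recent `ggplot2` releases mark it as deprecated in favor of the full `ggplot()` syntax, so it is worth seeing what the longer form looks like. The sketch below uses a made-up data frame (`example_df`, `day` and `rain_in` are hypothetical names and values), not the Boulder data.

```{r ggplot-equivalent, fig.cap="made-up precipitation values plotted with ggplot()" }
library(ggplot2)

# made-up values for illustration only
example_df <- data.frame(
  day = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03")),
  rain_in = c(0.1, 0.0, 0.2)
)

# map columns to the x and y axes, then draw one point per row
p <- ggplot(data = example_df, aes(x = day, y = rain_in)) +
  geom_point()

# printing the object draws the plot
p
```

Here the data frame is passed once through `data`, and the columns are referred to by name inside `aes()` rather than with `$`.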
