# 1. Authoring R Markdown Reports

Begin with a real-life case study written in R code and then learn to narrate the code, adding interpretations, explanations, and descriptions with Markdown, the text syntax at the heart of R Markdown.

The R Markdown Exercise interface
100xp

For this course, DataCamp has developed a new kind of interface that looks like the R Markdown pane in RStudio. You have a space (my_document.Rmd) to write R Markdown documents, as well as the buttons to compile the R Markdown document. To keep things simple, we'll stick with making HTML and PDF documents, although it is also possible to create Microsoft Word documents with R Markdown.

When you click "Knit HTML", DataCamp will compile your R Markdown document and display the finished, formatted results in a new pane.

To give you a taste of the things you'll learn in this course, we've prepared two documents in the editor on the right:

    my_document.Rmd containing the actual R Markdown code;
    faded.css, a supplementary file that brands your report.

Instructions

    Change the title of the Markdown Document from "Ozone" to "Hello R Markdown".
    Click the "Knit HTML" button to see the compiled version of your sample code.


In [None]:
---
title: "Ozone"
output:
  html_document:
    css: faded.css
---

## Data

The `atmos` data set resides in the `nasaweather` package of the *R* programming language. It contains a collection of atmospheric variables measured between 1995 and 2000 on a grid of 576 coordinates in the western hemisphere. The data set comes from the [2006 ASA Data Expo](http://stat-computing.org/dataexpo/2006/).

Some of the variables in the `atmos` data set are:

* **temp** - The mean monthly air temperature near the surface of the Earth (measured in degrees kelvin (*K*))

* **pressure** - The mean monthly air pressure at the surface of the Earth (measured in millibars (*mb*))

* **ozone** - The mean monthly abundance of atmospheric ozone (measured in Dobson units (*DU*))

You can convert the temperature unit from Kelvin to Celsius with the formula

$$ celsius = kelvin - 273.15 $$

And you can convert the result to Fahrenheit with the formula

$$ fahrenheit = celsius \times \frac{9}{5} + 32 $$

```{r, echo = FALSE, results = 'hide'}
example_kelvin <- 282.15
```

For example, `r example_kelvin` degrees Kelvin corresponds to `r example_kelvin - 273.15` degrees Celsius.
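
The two conversion formulas in the sample report are easy to check in plain R. The sketch below is only an illustration: the helper names `kelvin_to_celsius` and `celsius_to_fahrenheit` are made up for this example and are not part of the course files.

```r
# Hypothetical helpers that mirror the two formulas in the report
kelvin_to_celsius <- function(kelvin) kelvin - 273.15
celsius_to_fahrenheit <- function(celsius) celsius * 9 / 5 + 32

# Same example value as the hidden chunk in the report
example_kelvin <- 282.15
example_celsius <- kelvin_to_celsius(example_kelvin)
example_fahrenheit <- celsius_to_fahrenheit(example_celsius)

# Compare with all.equal() rather than ==, because the subtraction
# is done in floating point
all.equal(example_celsius, 9)
all.equal(example_fahrenheit, 48.2)
```

Both comparisons should come back `TRUE`, which matches the sentence that the inline code produces in the rendered report.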




Prepare the workspace for preliminary analysis
100xp

During this course, we will examine a data set that comes in the nasaweather package. The data set is called atmos, and it contains meteorological data about the western hemisphere.

We'll also use the dplyr package to manipulate our data and the ggvis package to visualize it.

For the next set of exercises, you will use the traditional DataCamp interface: you have an editor where you can write and submit R code, as well as a console where you can experiment with R code without doing a formal submission.
Instructions

    Load the nasaweather, dplyr, and ggvis packages. These packages have already been installed in the DataCamp R session.
    After submitting the correct code, open the help page for the atmos data set by executing ?atmos in the console. Before proceeding to the next exercise, read the help page to familiarize yourself with the data.


In [6]:
#install.packages('nasaweather', repos='http://cran.us.r-project.org')
#install.packages('ggvis', repos='http://cran.us.r-project.org')


In [7]:
# Load the nasaweather package
library(nasaweather)

# Load the dplyr package
library(dplyr)

# Load the ggvis package
library(ggvis)

"package 'ggvis' was built under R version 3.4.2"

Now that the workspace is ready for some analysis, head over to the next exercise to prepare your data for the report. Don't forget to consult and read the help page on the atmos data set. 

Prepare your data
100xp

We will use some of the data in atmos to explore the relationship between ozone and temperature. But before we do, let's transform the data into a more useful form.

The sample code uses dplyr functions to aggregate the data. It computes the mean value of temp, pressure, ozone, cloudlow, cloudmid, and cloudhigh for each latitude/longitude grid point.

You can learn more about dplyr in DataCamp's dplyr course.

Don't get confused by the pipe operator (%>%) from the magrittr package that is used often in combination with dplyr verbs. It is used to chain your code in case there are several operations you want to do without the need to save intermediate results.
Instructions

    Set the year variable to 1995. This will cause the code to retain just observations from the year 1995.
    At the end of the sample code, add a command to print the resulting means data frame and examine its output.
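
To see what the pipe buys you, it can help to write the same small computation three ways. This is a sketch with a few assumptions: it uses the built-in `mtcars` data instead of `atmos`, base R functions instead of dplyr verbs, and the native pipe `|>` (R 4.1 or later) so that it runs without extra packages. With magrittr or dplyr loaded, `%>%` could be used in the same positions.

```r
# Goal: the mean mpg of the 4-cylinder cars in mtcars

# 1. Save every intermediate result
four_cyl <- subset(mtcars, cyl == 4)
mean_saved <- mean(four_cyl$mpg)

# 2. Nest the calls, which has to be read from the inside out
mean_nested <- mean(subset(mtcars, cyl == 4)$mpg)

# 3. Chain the steps with a pipe, which reads left to right
mean_piped <- mtcars |>
  subset(cyl == 4) |>
  with(mean(mpg))

# All three approaches give the same number
all.equal(mean_saved, mean_nested)
all.equal(mean_saved, mean_piped)
```

The piped version needs no intermediate objects and keeps the steps in the order in which they happen, which is why the sample code chains its dplyr verbs this way.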


In [8]:
# The nasaweather and dplyr packages are available in the workspace

# Set the year variable to 1995
year <- 1995

means <- atmos %>%
  filter(year == !!year) %>%  # !!year is the variable set above, not the column
  group_by(long, lat) %>%
  summarize(temp = mean(temp, na.rm = TRUE),
         pressure = mean(pressure, na.rm = TRUE),
         ozone = mean(ozone, na.rm = TRUE),
         cloudlow = mean(cloudlow, na.rm = TRUE),
         cloudmid = mean(cloudmid, na.rm = TRUE),
         cloudhigh = mean(cloudhigh, na.rm = TRUE)) %>%
  ungroup()

# Inspect the means variable
means

First ten rows as recorded in the original session, which ran the unqualified `filter(year == year)` and therefore averaged over every year in `atmos`:

long,lat,temp,pressure,ozone,cloudlow,cloudmid,cloudhigh
-113.8000,-21.200000,296.1083,1000.0000,268.2500,37.17361,5.777778,1.9930556
-113.8000,-18.704348,296.3069,1000.0000,265.7500,39.36111,4.055556,1.0416667
-113.8000,-16.208696,296.7736,1000.0000,262.7778,40.22222,3.819444,0.6875000
-113.8000,-13.713043,297.2722,1000.0000,260.4722,38.09028,3.472222,0.6597222
-113.8000,-11.217391,297.7347,1000.0000,258.6667,34.59722,3.125000,0.8472222
-113.8000,-8.721739,298.1514,1000.0000,257.7222,31.29861,3.222222,1.5833333
-113.8000,-6.226087,298.5028,1000.0000,256.6667,27.79861,3.986111,2.7708333
-113.8000,-3.730435,298.5028,1000.0000,256.3333,28.05556,5.006944,3.3194444
-113.8000,-1.234783,298.2208,1000.0000,256.8056,26.04167,5.298611,3.0694444
-113.8000,1.260870,298.6375,1000.0000,256.3611,30.94444,7.236111,4.2291667


Can you see that each combination of latitude and longitude only appears once in means? atmos records multiple values for multiple dates at each location. means only records the mean value of all of the dates for each location. Now that we have the data we'll use, let's visualize it! 
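
The filter step compares the `year` column with a value stored in a variable that is also called `year`, and that situation needs some care. Inside functions that look up names in the data first, a bare `year` resolves to the column. The sketch below shows the effect with base R's `subset()`, which follows the same lookup rule. The toy data frame and the name `target_year` are made up for illustration.

```r
# Toy data: two years with two observations each (hypothetical values)
obs <- data.frame(year = c(1995, 1995, 1996, 1996),
                  temp = c(290, 292, 291, 295))
year <- 1995

# Inside subset(), `year` finds the column first, so this compares
# the column with itself and keeps all four rows
kept_all <- subset(obs, year == year)

# A differently named variable makes the comparison unambiguous
target_year <- 1995
kept_1995 <- subset(obs, year == target_year)

nrow(kept_all)   # 4
nrow(kept_1995)  # 2
```

In dplyr, the same ambiguity can be avoided by renaming the variable or by writing `filter(year == !!year)`, which inserts the variable's value into the comparison.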

Experiment with plot generation
100xp

The sample code on the right uses ggvis functions to visualize the data. It displays a plot of pressure vs. ozone.

We'll use ggvis to create several graphs for our R Markdown reports.

You can learn more about ggvis in DataCamp's ggvis course.
Instructions

    Run the code and take a look at the graph that it makes. See how straightforward it is to plot the data from the previous exercise?
    Change the code to plot the temp variable vs the ozone variable, both in the means data set. We will write an R Markdown report that analyzes the relationship between temp and ozone.


In [9]:
# The nasaweather, dplyr and ggvis packages are loaded in the workspace.

# Code for the previous exercise - do not change this
means <- atmos %>%
  filter(year == 1995) %>%
  group_by(long, lat) %>%
  summarize(temp = mean(temp, na.rm = TRUE),
         pressure = mean(pressure, na.rm = TRUE),
         ozone = mean(ozone, na.rm = TRUE),
         cloudlow = mean(cloudlow, na.rm = TRUE),
         cloudmid = mean(cloudmid, na.rm = TRUE),
         cloudhigh = mean(cloudhigh, na.rm = TRUE)) %>%
  ungroup()

# Change the code to plot the temp variable vs the ozone variable
means %>%
  ggvis(x = ~temp, y = ~ozone) %>%
  layer_points()

Don't worry if you don't understand how the code does what it does in this exercise. For this course, we're just going to focus on how to place a graph in a report. All you need to know is that the code will create a nice graph when you run it.

Prepare a model component
100xp

We've now loaded data, cleaned it, and visualized it. Our analysis will have one more component: a model.

The code on the right creates a linear model that predicts ozone based on pressure and cloudlow; all three are variables of the means data frame you created earlier.

You can learn more about building models with R in DataCamp's Introduction to Statistics course.
Instructions

    Change the model so that it predicts ozone based on temp and nothing else.
    Generate a summary of the model using the summary() function. Can you interpret the results? Test yourself by looking for the model's estimates for the intercept and temp coefficients, as well as the p-value associated with each coefficient and the model's overall Adjusted R-squared.


In [10]:
# The nasaweather and dplyr packages are already at your disposal
means <- atmos %>%
  filter(year == 1995) %>%
  group_by(long, lat) %>%
  summarize(temp = mean(temp, na.rm = TRUE),
         pressure = mean(pressure, na.rm = TRUE),
         ozone = mean(ozone, na.rm = TRUE),
         cloudlow = mean(cloudlow, na.rm = TRUE),
         cloudmid = mean(cloudmid, na.rm = TRUE),
         cloudhigh = mean(cloudhigh, na.rm = TRUE)) %>%
  ungroup()

# Change the model: base prediction only on temp
mod <- lm(ozone ~ temp, data = means)

# Generate a model summary and interpret the results
summary(mod)


Call:
lm(formula = ozone ~ temp, data = means)

Residuals:
    Min      1Q  Median      3Q     Max 
-38.886  -7.149  -2.422   4.432  32.880 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 746.1550    37.8721   19.70   <2e-16 ***
temp         -1.6204     0.1274  -12.72   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11.35 on 574 degrees of freedom
Multiple R-squared:  0.2199,	Adjusted R-squared:  0.2186 
F-statistic: 161.8 on 1 and 574 DF,  p-value: < 2.2e-16


You're now in a familiar position: you've done some preliminary analysis, and you're ready to report your findings. Remember what your code does, as you will work with it again soon. In the next video, Garrett will show you how to write the narrative sections of your report in R Markdown. 
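
The numbers you looked for in the printed summary can also be pulled out of the summary object, which is useful once a report has to quote them in its text. The sketch below is an illustration under one assumption: it fits a model on the built-in `mtcars` data (`mpg ~ wt`) so that it runs without the nasaweather package. The same accessors apply to the `ozone ~ temp` model.

```r
# Fit a simple linear model on built-in data
fit <- lm(mpg ~ wt, data = mtcars)
fit_summary <- summary(fit)

# The coefficient table is a matrix with one row per term
coef_table <- coef(fit_summary)
slope_estimate <- coef_table["wt", "Estimate"]
slope_p_value <- coef_table["wt", "Pr(>|t|)"]
intercept_estimate <- coef_table["(Intercept)", "Estimate"]

# Overall fit statistics are stored as list elements
adj_r_squared <- fit_summary$adj.r.squared

round(c(intercept = intercept_estimate,
        slope = slope_estimate,
        adj_r_squared = adj_r_squared), 3)
```

Storing the values in named objects like this makes it possible to reuse them later, for example in a sentence of the report.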

## Section 3 - Markdown - Video

Styling narrative sections
100xp

You can use Markdown to embed formatting instructions into your text. For example, you can make a word italicized by surrounding it in asterisks, bold by surrounding it in two asterisks, and monospaced (like code) by surrounding it in backticks:

*italics*
**bold**
`code`

You can turn a word into a link by surrounding it in hard brackets and then placing the link behind it in parentheses, like this:

[RStudio](http://www.rstudio.com)

To create titles and headers, use leading hashtags. The number of hashtags determines the header's level:

# First level header
## Second level header
### Third level header

Instructions

The paragraph to the right describes the data that we'll use in our report.

    Turn the line that begins with "Data" into a second level header.
    Change the words atmos and nasaweather into a monospaced font suitable for code snippets.
    Make the letter R italicized.
    Change "2006 ASA Data Expo" to a link that points to http://stat-computing.org/dataexpo/2006/

The paragraph to the right describes the data that you'll use in your report. Try rendering it both before and after you make the changes below.

## Data

The `atmos` data set resides in the `nasaweather` package of the _R_ programming language. It contains a collection of atmospheric variables measured between 1995 and 2000 on a grid of 576 coordinates in the western hemisphere. The data set comes from the [2006 ASA Data Expo](http://stat-computing.org/dataexpo/2006/).

You're learning how to write in Markdown, the syntax language that the R Markdown package uses to create text. Head over to the next exercise to find out how to generate lists in Markdown syntax. 

Lists in R Markdown
100xp

To make a bulleted list in Markdown, place each item on a new line after an asterisk and a space, like this:

* item 1
* item 2
* item 3

You can make an ordered list by placing each item on a new line after a number followed by a period and a space, like this:

1. item 1
2. item 2
3. item 3

In each case, you need to place a blank line between the list and any paragraphs that come before it.
Instructions

We've added some text to your description on the right.

    Turn the text into a bulleted list with three bullets. Temp, pressure, and ozone should each get their own entry.
    Make temp, pressure, and ozone bold at the start of each entry.
    Make K, mb, and DU italicized at the end of each entry.

Then render your results to see the final format.

## Data

The `atmos` data set resides in the `nasaweather` package of the *R* programming language. It contains a collection of atmospheric variables measured between 1995 and 2000 on a grid of 576 coordinates in the western hemisphere. The data set comes from the [2006 ASA Data Expo](http://stat-computing.org/dataexpo/2006/).

Some of the variables in the `atmos` data set are:

* **temp** - The mean monthly air temperature near the surface of the Earth (measured in degrees kelvin (_K_))
* **pressure** - The mean monthly air pressure at the surface of the Earth (measured in millibars (_mb_))
* **ozone** - The mean monthly abundance of atmospheric ozone (measured in Dobson units (_DU_))



LaTeX equations
100xp

You can also use the Markdown syntax to embed LaTeX math equations into your reports. To embed an equation in its own centered equation block, surround the equation with two pairs of dollar signs, like this:

$$1 + 1 = 2$$

To embed an equation inline, surround it with a single pair of dollar signs, like this: $1 + 1 = 2$.

You can use all of the standard LaTeX math symbols to create attractive equations.
Instructions

The text on the right contains a formula that converts degrees Celsius to degrees Fahrenheit. Where the comment is, write another formula that converts degrees Kelvin to degrees Celsius. You can convert any temperature in degrees Kelvin to a temperature in degrees Celsius by subtracting 273.15 from it. Do not capitalize Kelvin or Celsius when writing the formula. Then render your results to see the final format.

## Data

The `atmos` data set resides in the `nasaweather` package of the *R* programming language. It contains a collection of atmospheric variables measured between 1995 and 2000 on a grid of 576 coordinates in the western hemisphere. The data set comes from the [2006 ASA Data Expo](http://stat-computing.org/dataexpo/2006/).

Some of the variables in the `atmos` data set are:

* **temp** - The mean monthly air temperature near the surface of the Earth (measured in degrees kelvin (*K*))

* **pressure** - The mean monthly air pressure at the surface of the Earth (measured in millibars (*mb*))

* **ozone** - The mean monthly abundance of atmospheric ozone (measured in Dobson units (*DU*))

You can convert the temperature unit from Kelvin to Celsius with the formula

$$ celsius = kelvin - 273.15 $$

And you can convert the result to Fahrenheit with the formula

$$ fahrenheit = celsius \times \frac{9}{5} + 32 $$
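
Substituting the Kelvin-to-Celsius step (subtract 273.15) into the Fahrenheit formula gives a direct Kelvin-to-Fahrenheit conversion, which can serve as a check on the two-step version. The second line works through one illustrative input, 282.15 degrees Kelvin.

$$ fahrenheit = (kelvin - 273.15) \times \frac{9}{5} + 32 $$

$$ (282.15 - 273.15) \times \frac{9}{5} + 32 = 9 \times \frac{9}{5} + 32 = 48.2 $$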

# 2. Embedding Code

Weave your code and narration into a single document and then render the document to create a finished report that includes both the code and its output. Understand how to customize this process, and open the door for automated, targeted reporting.

## Section 4 - Knitr - Video

R code chunks
100xp

You can embed R code into your R Markdown report with the knitr syntax. To do this, surround your code with two lines: one that contains ```{r} and one that contains ```. The result is a code chunk that looks like this:

```{r}
# some code
```

When you render the report, R will execute the code. If the code returns any results, R will add them to your report.
Instructions

The first file in the editor pane on the right contains the next section of your R Markdown report. This section will explain how you cleaned your data. The second file (my_code.R) on the right is an R Script that contains the actual code that we used to clean the data. Use the knitr syntax to embed this code into the .Rmd file.

Then render the file to see the results.

In [None]:
## Cleaning

For the remainder of the report, we will look only at data from the year 1995. We aggregate our data by location, using the *R* code below.

```{r}
library(nasaweather)
library(dplyr)

year <- 1995

means <- atmos %>%
  filter(year == !!year) %>%  # !!year is the variable set above, not the column
  group_by(long, lat) %>%
  summarize(temp = mean(temp, na.rm = TRUE),
         pressure = mean(pressure, na.rm = TRUE),
         ozone = mean(ozone, na.rm = TRUE),
         cloudlow = mean(cloudlow, na.rm = TRUE),
         cloudmid = mean(cloudmid, na.rm = TRUE),
         cloudhigh = mean(cloudhigh, na.rm = TRUE)) %>%
  ungroup()
```

Customize R code chunks
100xp

You can customize each R code chunk in your report by providing optional arguments after the r in ```{r}, which appears at the start of the code chunk. Let's look at one set of options.

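R functions sometimes return messages, warnings, and even error messages. By default, R Markdown includes messages and warnings in your report and stops rendering when a chunk throws an error. You can use the message, warning and error options to change this. If message or warning is set to FALSE, R Markdown will not include the corresponding type of output in the report. The error option works the other way around: error = FALSE (the R Markdown default) halts rendering at the first error, while error = TRUE prints the error message in the report and carries on.

For example, R Markdown would leave out any warnings generated by the chunk below, and it would print the error that the chunk throws instead of stopping.

```{r warning = FALSE, error = TRUE}
"four" + "five"
```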

Instructions

    Packages often generate messages when you first load them with library(). To make sure that these messages do not appear in your report, separate library(nasaweather), library(dplyr), and library(ggvis) into their own code chunk in the document to the right. Be sure to make this the first code chunk in your document (so other code chunks will have access to the data sets and functions that come in those libraries).
    Arrange for the new code chunk to ignore any messages that are generated when loading the packages.

Then render the file to see the results.
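
Messages, warnings, and errors are three distinct kinds of conditions in R, which is why knitr offers a separate option for each. The base R sketch below shows the difference outside of a report. The function name `noisy` is made up for this example.

```r
# A hypothetical function that signals a message and a warning
# before returning its value
noisy <- function() {
  message("Loading something ...")   # informational, like package start-up text
  warning("Something looks odd")     # a problem that does not stop the code
  "done"
}

# Messages and warnings can be silenced; the value still comes back
result <- suppressWarnings(suppressMessages(noisy()))

# An error stops evaluation unless it is caught
caught <- tryCatch("four" + "five",
                   error = function(e) conditionMessage(e))

result  # "done"
caught  # the text of the error message
```

Because an error interrupts the code that raised it, it is handled differently from the other two, both here (`tryCatch()`) and in a report (the `error` chunk option).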

In [None]:
## Cleaning

For the remainder of the report, we will look only at data from the year 1995. We aggregate our data by location, using the *R* code below.

```{r message=FALSE}
library(nasaweather)
library(dplyr)
library(ggvis)
```

```{r}
year <- 1995

means <- atmos %>%
  filter(year == !!year) %>%  # !!year is the variable set above, not the column
  group_by(long, lat) %>%
  summarize(temp = mean(temp, na.rm = TRUE),
         pressure = mean(pressure, na.rm = TRUE),
         ozone = mean(ozone, na.rm = TRUE),
         cloudlow = mean(cloudlow, na.rm = TRUE),
         cloudmid = mean(cloudmid, na.rm = TRUE),
         cloudhigh = mean(cloudhigh, na.rm = TRUE)) %>%
  ungroup()
```

Popular chunk options
100xp

Three of the most popular chunk options are echo, eval and results.

If echo = FALSE, R Markdown will not display the code in the final document (but it will still run the code and display its results unless told otherwise).

If eval = FALSE, R Markdown will not run the code or include its results, (but it will still display the code unless told otherwise).

If results = 'hide', R Markdown will not display the results of the code (but it will still run the code and display the code itself unless told otherwise).
Instructions

The R Markdown file to the right contains a complete report with two figures. It is common to display figures without the code that generates them (the code is a distraction). Modify each code chunk that generates a graph so that it does not display the code that makes the graph. Notice how the document controls the size of the figures with the fig.height and fig.width arguments.
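
The effect of the three options can be checked without opening an editor by knitting a one-chunk document from a string. The sketch below assumes the knitr package is installed. It relies on `knitr::knit()` accepting the document through its `text` argument and returning the resulting Markdown as a string. The helper name `knit_chunk` is made up for this example.

```r
library(knitr)

# Knit a document that consists of one chunk with the given options
# and return the Markdown that knitr produces
knit_chunk <- function(options) {
  rmd <- c(paste0("```{r ", options, "}"), "6 * 7", "```")
  knit(text = rmd, quiet = TRUE)
}

echo_off    <- knit_chunk("echo = FALSE")      # result without the code
eval_off    <- knit_chunk("eval = FALSE")      # code without the result
results_off <- knit_chunk("results = 'hide'")  # code runs, result is hidden

# Does the output contain the code? Does it contain the result?
has_code   <- function(md) grepl("6 * 7", md, fixed = TRUE)
has_result <- function(md) grepl("[1] 42", md, fixed = TRUE)

c(code = has_code(echo_off),    result = has_result(echo_off))
c(code = has_code(eval_off),    result = has_result(eval_off))
c(code = has_code(results_off), result = has_result(results_off))
```

Note that `eval = FALSE` and `results = 'hide'` look the same in the output for this chunk. The difference is that only the second one actually runs the code, which matters when later chunks depend on objects it creates.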

In [None]:
## Data

The `atmos` data set resides in the `nasaweather` package of the *R* programming language. It contains a collection of atmospheric variables measured between 1995 and 2000 on a grid of 576 coordinates in the western hemisphere. The data set comes from the [2006 ASA Data Expo](http://stat-computing.org/dataexpo/2006/).

Some of the variables in the `atmos` data set are:

* **temp** - The mean monthly air temperature near the surface of the Earth (measured in degrees kelvin (*K*))

* **pressure** - The mean monthly air pressure at the surface of the Earth (measured in millibars (*mb*))

* **ozone** - The mean monthly abundance of atmospheric ozone (measured in Dobson units (*DU*))

You can convert the temperature unit from Kelvin to Celsius with the formula

$$ celsius = kelvin - 273.15 $$

And you can convert the result to Fahrenheit with the formula

$$ fahrenheit = celsius \times \frac{9}{5} + 32 $$

## Cleaning

For the remainder of the report, we will look only at data from the year 1995. We aggregate our data by location, using the *R* code below.

```{r message = FALSE}
load(url("http://assets.datacamp.com/course/rmarkdown/atmos.RData")) # working with a subset
library(dplyr)
library(ggvis)
```

```{r}
year <- 1995
means <- atmos %>%
  filter(year == !!year) %>%  # !!year is the variable set above, not the column
  group_by(long, lat) %>%
  summarize(temp = mean(temp, na.rm = TRUE),
         pressure = mean(pressure, na.rm = TRUE),
         ozone = mean(ozone, na.rm = TRUE),
         cloudlow = mean(cloudlow, na.rm = TRUE),
         cloudmid = mean(cloudmid, na.rm = TRUE),
         cloudhigh = mean(cloudhigh, na.rm = TRUE)) %>%
  ungroup()
```

## Ozone and temperature

Is the relationship between ozone and temperature useful for understanding fluctuations in ozone? A scatterplot of the variables shows a strong, but unusual relationship.

```{r fig.height = 4, fig.width = 5, echo = FALSE}
means %>%
  ggvis(~temp, ~ozone) %>%
  layer_points()
```

We suspect that group level effects are caused by environmental conditions that vary by locale. To test this idea, we sort each data point into one of four geographic regions:

```{r}
means$locale <- "north america"
means$locale[means$lat < 10] <- "south pacific"
means$locale[means$long > -80 & means$lat < 10] <- "south america"
means$locale[means$long > -80 & means$lat > 10] <- "north atlantic"
```

### Model

We suggest that ozone is highly correlated with temperature, but that a different relationship exists for each geographic region. We capture this relationship with a second order linear model of the form

$$ ozone = \alpha + \beta_{1} temperature + \sum_{locales} \beta_{i} locale_{i} + \sum_{locales} \beta_{j} interaction_{j} + \epsilon$$

This yields the following coefficients and model lines.

```{r}
lm(ozone ~ temp + locale + temp:locale, data = means)
```

```{r fig.height = 4, fig.width = 5, echo=FALSE}
means %>%
  group_by(locale) %>%
  ggvis(~temp, ~ozone) %>%
  layer_points(fill = ~locale) %>%
  layer_model_predictions(model = "lm", stroke = ~locale) %>%
  hide_legend("stroke") %>%
  scale_nominal("stroke", range = c("darkorange", "darkred", "darkgreen", "darkblue"))
```

### Diagnostics

An anova test suggests that both locale and the interaction effect of locale and temperature are useful for predicting ozone (i.e., the p-value that compares the full model to the reduced models is statistically significant).

```{r}
mod <- lm(ozone ~ temp, data = means)
mod2 <- lm(ozone ~ temp + locale, data = means)
mod3 <- lm(ozone ~ temp + locale + temp:locale, data = means)

anova(mod, mod2, mod3)
```



In [None]:
Inline R code
100xp

You can embed R code into the text of your document with the `r ` syntax. Be sure to include the lower case r in order for this to work properly. R Markdown will run the code and replace it with its result, which should be a piece of text, such as a character string or a number.

For example, the line below uses embedded R code to create a complete sentence:

The factorial of four is `r factorial(4)`.

When you render the document the result will appear as:

The factorial of four is 24.

Inline code provides a useful way to make your reports completely automatable.
Instructions

The report to the right has been reorganized to make it more automatable.

    Change the value of year to 2000 in line 28.
    Complete lines 31 and 52 so that the blank space, ___, shows the value of the year object when the report is rendered.
    Render the document and notice how everything updates to use the new year's worth of data. Even the sentences in lines 31 and 52 update to reflect the new year.
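
An inline expression is replaced by the text form of its value, so the rendered sentence is roughly what `paste0()` would build from the same pieces. The sketch below imitates that in plain R. It is an approximation for illustration, not knitr's own formatting code, and the object names ending in `_sentence` are made up.

```r
year <- 2000
example_kelvin <- 282.15

# Roughly what a sentence containing `r year` turns into
year_sentence <- paste0("We will look only at data from the year ", year, ".")

# Rounding inside the inline expression keeps the rendered number tidy
celsius_sentence <- paste0(example_kelvin, " degrees Kelvin corresponds to ",
                           round(example_kelvin - 273.15, 2),
                           " degrees Celsius.")

year_sentence
celsius_sentence
```

Changing `year` in one place changes every sentence built from it, which is the property that makes the report automatable.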


In [None]:
---
output: html_document
---

## Data

The `atmos` data set resides in the `nasaweather` package of the *R* programming language. It contains a collection of atmospheric variables measured between 1995 and 2000 on a grid of 576 coordinates in the western hemisphere. The data set comes from the [2006 ASA Data Expo](http://stat-computing.org/dataexpo/2006/).

Some of the variables in the `atmos` data set are:

* **temp** - The mean monthly air temperature near the surface of the Earth (measured in degrees kelvin (*K*))

* **pressure** - The mean monthly air pressure at the surface of the Earth (measured in millibars (*mb*))

* **ozone** - The mean monthly abundance of atmospheric ozone (measured in Dobson units (*DU*))

You can convert the temperature unit from Kelvin to Celsius with the formula

$$ celsius = kelvin - 273.15 $$

And you can convert the result to Fahrenheit with the formula

$$ fahrenheit = celsius \times \frac{9}{5} + 32 $$

## Cleaning

```{r echo = FALSE}
year <- 2000
```

For the remainder of the report, we will look only at data from the year `r year`. We aggregate our data by location, using the *R* code below.

```{r message = FALSE}
library(nasaweather)
library(dplyr)
library(ggvis)
```

```{r}
means <- atmos %>%
  filter(year == !!year) %>%  # !!year is the variable set above, not the column
  group_by(long, lat) %>%
  summarize(temp = mean(temp, na.rm = TRUE),
         pressure = mean(pressure, na.rm = TRUE),
         ozone = mean(ozone, na.rm = TRUE),
         cloudlow = mean(cloudlow, na.rm = TRUE),
         cloudmid = mean(cloudmid, na.rm = TRUE),
         cloudhigh = mean(cloudhigh, na.rm = TRUE)) %>%
  ungroup()
```

where the `year` object equals `r year`.


## Ozone and temperature

Is the relationship between ozone and temperature useful for understanding fluctuations in ozone? A scatterplot of the variables shows a strong, but unusual relationship.

```{r echo = FALSE, fig.height = 4, fig.width = 5}
means %>%
  ggvis(~temp, ~ozone) %>%
  layer_points()
```

We suspect that group level effects are caused by environmental conditions that vary by locale. To test this idea, we sort each data point into one of four geographic regions:

```{r}
means$locale <- "north america"
means$locale[means$lat < 10] <- "south pacific"
means$locale[means$long > -80 & means$lat < 10] <- "south america"
means$locale[means$long > -80 & means$lat > 10] <- "north atlantic"
```

### Model

We suggest that ozone is highly correlated with temperature, but that a different relationship exists for each geographic region. We capture this relationship with a second order linear model of the form

$$ ozone = \alpha + \beta_{1} temperature + \sum_{locales} \beta_{i} locale_{i} + \sum_{locales} \beta_{j} interaction_{j} + \epsilon$$

This yields the following coefficients and model lines.

```{r}
lm(ozone ~ temp + locale + temp:locale, data = means)
```

```{r echo = FALSE, fig.height = 4, fig.width = 5}
means %>%
  group_by(locale) %>%
  ggvis(~temp, ~ozone) %>%
  layer_points(fill = ~locale) %>%
  layer_model_predictions(model = "lm", stroke = ~locale) %>%
  hide_legend("stroke") %>%
  scale_nominal("stroke", range = c("darkorange", "darkred", "darkgreen", "darkblue"))
```

### Diagnostics

An anova test suggests that both locale and the interaction effect of locale and temperature are useful for predicting ozone (i.e., the p-value that compares the full model to the reduced models is statistically significant).

```{r}
mod <- lm(ozone ~ temp, data = means)
mod2 <- lm(ozone ~ temp + locale, data = means)
mod3 <- lm(ozone ~ temp + locale + temp:locale, data = means)

anova(mod, mod2, mod3)
```



In [None]:
Know your options
50xp

Which of the following code chunks would you use to display example code that should not be run?

Option A:

```{r echo = FALSE}
# For example, you could look up today's date:
Sys.Date()
```

Option B:

```{r eval = FALSE}
# For example, you could look up today's date:
Sys.Date()
```

Option C:

```{r results = FALSE}
# For example, you could look up today's date:
Sys.Date()
```

Option D:

```{r results = 'hide'}
# For example, you could look up today's date:
Sys.Date()
```

Possible Answers

    A
    B (Correct)
    C
    D

In [None]:
Labeling and reusing code chunks
100xp

Apart from the popular code chunk options you have learned by now, you can define even more things in the curly braces that follow the triple backticks.

An interesting feature available in knitr is the labeling of code snippets. The code chunk below would be assigned the label simple_sum:

```{r simple_sum, results = 'hide'}
2 + 2
```

However, because the results option is equal to hide, no output is shown. This is what appears in the output document:

2 + 2

What purpose do these labels serve? knitr provides the option ref.label to refer to previously defined and labeled code chunks. If used correctly, knitr will copy the code of the chunk you referred to and repeat it in the current code chunk. This feature enables you to separate R code and R output in the output document, without code duplication.

Let's continue the example; the following code chunk:

```{r ref.label='simple_sum', echo = FALSE}
```

produces the output you would expect:

 ## [1] 4

Notice that the echo option was explicitly set to FALSE, suppressing the R code that generated the output.
Instructions

In the sample code on the right, you see a rather large code chunk that contains R code to load packages dplyr and ggvis and functions to create a ggvis graph. Separate the code chunks in three parts such that:

    The first code chunk contains the library() functions, for which no messages are shown in the output document.
    The second code chunk contains the ggvis and dplyr functions, for which no output is shown; give this code chunk the label chained.
    The third and last code chunk shows the output of the second code chunk, without the code that generated it. Use the ref.label option.

Finally, move the sentence "The ggvis plot gives us a nice visualization of the mtcars data set:" in between the second and third code chunk.

In [None]:
## Exploring the mtcars data set

Have you ever wondered whether there is a clear correlation between the gas consumption of a car and its weight?
To answer this question, we first have to load the `dplyr` and `ggvis` packages.

```{r message=FALSE}
library(dplyr)
library(ggvis)
```

```{r chained, results = 'hide'}
mtcars %>%
  group_by(factor(cyl)) %>%
  ggvis(~mpg, ~wt, fill = ~cyl) %>%
  layer_points()
```

The `ggvis` plot gives us a nice visualization of the `mtcars` data set:

```{r ref.label='chained', echo = FALSE}
```

There is a myriad of options available for code chunks; you can discover them at [Yihui Xie's website](http://yihui.name/knitr/options/). So far, you have been compiling your R Markdown reports into HTML, but you can also compile them into pdf documents, Microsoft Word documents, and slideshows. You can also customize the output of your documents. The next video will show you how.

# 3. Compiling Reports

This chapter will show you how to generate an HTML, pdf, or Microsoft Word version of your report, as well as a pdf or HTML slideshow. Discover how to customize details of the output, and how to combine R Markdown with Shiny and ggvis.

In [None]:
Specify knitr and pandoc options
0xp
Each R Markdown output template is a collection of knitr and pandoc options. You can customize your output by overwriting the default options that come with the template.

For example, the YAML header below overwrites the default code highlight style of the pdf_document template to create a document that uses the zenburn style:

---
title: "Demo"
output:
  pdf_document:
    highlight: zenburn
---
Similarly, the YAML header below overwrites the default Bootstrap CSS theme of the html_document template with the spacelab theme:

---
title: "Demo"
output:
  html_document:
    theme: spacelab
---
Pay close attention to the indentation of the options inside the YAML header; if the indentation is wrong, pandoc will not understand your specifications correctly. As an example, notice the difference between only specifying the output document to be HTML:

---
output: html_document
---
and specifying an HTML output document with a different theme:

---
output:
  html_document:
    theme: spacelab
---
You can learn more about popular options to overwrite in the R Markdown Reference Guide.
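
To make the effect of indentation concrete, here is a small sketch that parses three headers with the yaml package (assumed to be installed) and prints the resulting structure. The `misindented` header is a hypothetical mistake added for illustration; it is not part of the exercise.

```r
library(yaml)

# Format given as a plain string: there is no place to attach options.
flat <- yaml.load("output: html_document")

# Format given as a nested key: its options are indented beneath it.
nested <- yaml.load("
output:
  html_document:
    theme: spacelab
")

# Hypothetical mistake: theme sits at the same level as html_document,
# so it is no longer an option of html_document.
misindented <- yaml.load("
output:
  html_document:
  theme: spacelab
")

str(flat)
str(nested)
str(misindented)
```

In the nested header, `theme` is stored under `html_document`; in the misindented one, it becomes a sibling of `html_document`, which is why the template would not pick it up.
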

Instructions
Add a table of contents to the Ozone report by setting its toc option to true, and number its sections by setting its number_sections option to true. Note that in both cases, true should be lower case.
Re-render the document.

In [None]:
---
title: "Ozone"
author: "Anonymous"
date: "January 1, 2015"
output:
  html_document:
    toc: true
    number_sections: true
---

## Data

The `atmos` data set resides in the `nasaweather` package of the *R* programming language. It contains a collection of atmospheric variables measured between 1995 and 2000 on a grid of 576 coordinates in the western hemisphere. The data set comes from the [2006 ASA Data Expo](http://stat-computing.org/dataexpo/2006/).

Some of the variables in the `atmos` data set are:

* **temp** - The mean monthly air temperature near the surface of the Earth (measured in degrees kelvin (*K*))

* **pressure** - The mean monthly air pressure at the surface of the Earth (measured in millibars (*mb*))

* **ozone** - The mean monthly abundance of atmospheric ozone (measured in Dobson units (*DU*))

You can convert the temperature unit from Kelvin to Celsius with the formula

$$ celsius = kelvin - 273.15 $$

And you can convert the result to Fahrenheit with the formula

$$ fahrenheit = celsius \times \frac{9}{5} + 32 $$

## Cleaning

```{r echo = FALSE}
year <- 2000
```

For the remainder of the report, we will look only at data from the year `r year`. We aggregate our data by location, using the *R* code below.

```{r message = FALSE}
library(nasaweather)
library(dplyr)
library(ggvis)
```

```{r}
means <- atmos %>%
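  # Use .env$year so the filter compares the year column with the `year` object defined above;
  # a bare `year == year` would compare the column with itself and keep every row.
  filter(year == .env$year) %>%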
  group_by(long, lat) %>%
  summarize(temp = mean(temp, na.rm = TRUE),
         pressure = mean(pressure, na.rm = TRUE),
         ozone = mean(ozone, na.rm = TRUE),
         cloudlow = mean(cloudlow, na.rm = TRUE),
         cloudmid = mean(cloudmid, na.rm = TRUE),
         cloudhigh = mean(cloudhigh, na.rm = TRUE)) %>%
  ungroup()
```

where the `year` object equals `r year`.


## Ozone and temperature

Is the relationship between ozone and temperature useful for understanding fluctuations in ozone? A scatterplot of the variables shows a strong, but unusual relationship.

```{r echo = FALSE, fig.height = 4, fig.width = 5}
means %>%
  ggvis(~temp, ~ozone) %>%
  layer_points()
```

We suspect that group-level effects are caused by environmental conditions that vary by locale. To test this idea, we sort each data point into one of four geographic regions:

```{r}
means$locale <- "north america"
means$locale[means$lat < 10] <- "south pacific"
means$locale[means$long > -80 & means$lat < 10] <- "south america"
means$locale[means$long > -80 & means$lat > 10] <- "north atlantic"
```

### Model

We suggest that ozone is highly correlated with temperature, but that a different relationship exists for each geographic region. We capture this relationship with a second order linear model of the form

$$ ozone = \alpha + \beta_{1} temperature + \sum_{locales} \beta_{i} locale_{i} + \sum_{locales} \beta_{j} interaction_{j} + \epsilon$$

This yields the following coefficients and model lines.

```{r}
lm(ozone ~ temp + locale + temp:locale, data = means)
```

```{r echo = FALSE, fig.height = 4, fig.width = 5}
means %>%
  group_by(locale) %>%
  ggvis(~temp, ~ozone) %>%
  layer_points(fill = ~locale) %>%
  layer_model_predictions(model = "lm", stroke = ~locale) %>%
  hide_legend("stroke") %>%
  scale_nominal("stroke", range = c("darkorange", "darkred", "darkgreen", "darkblue"))
```

### Diagnostics

An ANOVA test suggests that both locale and the interaction effect of locale and temperature are useful for predicting ozone (i.e., the p-value that compares the full model to the reduced models is statistically significant).

```{r}
mod <- lm(ozone ~ temp, data = means)
mod2 <- lm(ozone ~ temp + locale, data = means)
mod3 <- lm(ozone ~ temp + locale + temp:locale, data = means)

anova(mod, mod2, mod3)
```



Brand your reports with style sheets
100xp
In the last exercise, we showed a way to change the CSS style of your HTML output: you can set the theme option of html_document to one of default, cerulean, journal, flatly, readable, spacelab, united, or cosmo. (Try it out).

But what if you want to customize your CSS in more specific ways? You can do this by writing a .css file for your report and saving it in the same directory as the .Rmd file. To have your report use the CSS, set the css option of html_document to the file name, like this:

---
title: "Demo"
output:
  html_document:
    css: styles.css
---
Custom CSS is an easy way to add branding to your reports.

Instructions
The faded.css file to the right contains some example CSS that will change the appearance of your report. Modify the header of the .Rmd report to use the CSS, and then render the report.

In [None]:
---
title: "Ozone"
author: "Anonymous"
date: "January 1, 2015"
output: 
  html_document:
    css: faded.css
---

## Data

The `atmos` data set resides in the `nasaweather` package of the *R* programming language. It contains a collection of atmospheric variables measured between 1995 and 2000 on a grid of 576 coordinates in the western hemisphere. The data set comes from the [2006 ASA Data Expo](http://stat-computing.org/dataexpo/2006/).

Some of the variables in the `atmos` data set are:

* **temp** - The mean monthly air temperature near the surface of the Earth (measured in degrees kelvin (*K*))

* **pressure** - The mean monthly air pressure at the surface of the Earth (measured in millibars (*mb*))

* **ozone** - The mean monthly abundance of atmospheric ozone (measured in Dobson units (*DU*))

You can convert the temperature unit from Kelvin to Celsius with the formula

$$ celsius = kelvin - 273.15 $$

And you can convert the result to Fahrenheit with the formula

$$ fahrenheit = celsius \times \frac{9}{5} + 32 $$


## Section 6 - Shiny - Video

Shiny to make your reports interactive
100xp
Shiny is an R package that uses R to build interactive web apps such as data explorers and dashboards. You can add Shiny components to an R Markdown file to make an interactive document.

When you do this, you must ensure that:

 - You add runtime: shiny to the file's YAML header.
 - You use an HTML output format (like html_document, ioslides_presentation, or slidy_presentation).

To learn more about interactivity with Shiny and R, visit shiny.rstudio.com.
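
The two requirements above can be checked programmatically. The sketch below writes a small demo file to a temporary location and reads its header with `rmarkdown::yaml_front_matter()`; it assumes the rmarkdown package is installed, and the names `rmd_path`, `is_shiny`, and `is_html` are only illustrative.

```r
library(rmarkdown)

# Write a small demo document to a temporary file.
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  'title: "Shiny Demo"',
  "output: html_document",
  "runtime: shiny",
  "---",
  "",
  "Body text."
), rmd_path)

# Read the YAML header as an R list.
header <- yaml_front_matter(rmd_path)

# Requirement 1: the header sets runtime: shiny.
is_shiny <- identical(header$runtime, "shiny")

# Requirement 2: the output format is one of the HTML formats named above.
html_formats <- c("html_document", "ioslides_presentation", "slidy_presentation")
format_name <- if (is.list(header$output)) names(header$output)[1] else header$output
is_html <- format_name %in% html_formats

print(c(runtime_shiny = is_shiny, html_output = is_html))
```

The `is.list()` branch covers headers where the output format carries nested options, such as a theme or a CSS file.
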

Instructions
Read the raw R Markdown file to the right.

Fix the YAML header to make this a reactive HTML document.
Knit the result.
Visit the report hosted at DataCamp's Shiny Server to see the interactive report that it would make when you render it.

In [None]:
---
title: "Shiny Demo"
author: "DataCamp"
output: html_document
runtime: shiny
---

This R Markdown document is made interactive using Shiny. Unlike the more traditional workflow of creating static reports, you can now create documents that allow your readers to change the assumptions underlying your analysis and see the results immediately.

To learn more, see [Interactive Documents](http://rmarkdown.rstudio.com/authoring_shiny.html).

## Inputs and Outputs

You can embed Shiny inputs and outputs in your document. Outputs are automatically updated whenever inputs change.  This demonstrates how a standard R plot can be made interactive by wrapping it in the Shiny `renderPlot` function. The `selectInput` and `sliderInput` functions create the input widgets used to drive the plot.

```{r, echo=FALSE}
inputPanel(
  selectInput("n_breaks", label = "Number of bins:",
              choices = c(10, 20, 35, 50), selected = 20),

  sliderInput("bw_adjust", label = "Bandwidth adjustment:",
              min = 0.2, max = 2, value = 1, step = 0.2)
)

renderPlot({
  hist(faithful$eruptions, probability = TRUE, breaks = as.numeric(input$n_breaks),
       xlab = "Duration (minutes)", main = "Geyser eruption duration")

  dens <- density(faithful$eruptions, adjust = input$bw_adjust)
  lines(dens, col = "blue")
})
```

## Embedded Application

It is also possible to embed an entire Shiny application within an R Markdown document using the `shinyAppDir` function. This example embeds a Shiny application located in another directory:

```{r, echo=FALSE}
shinyAppDir(
  system.file("examples/06_tabsets", package = "shiny"),
  options = list(
    width = "100%", height = 550
  )
)
```

Note the use of the `height` parameter to determine how much vertical space the embedded application should occupy.

You can also use the `shinyApp` function to define an application inline rather than in an external directory.

In all of the R code chunks above, the `echo = FALSE` option is used. This prevents the R code within the chunk from rendering in the document alongside the Shiny components.




Interactive ggvis graphics
100xp
You can also use R Markdown to create reports that use interactive ggvis graphics. ggvis relies on the Shiny framework to create interactivity, so you will need to prepare your interactive document in the same way:

 - You need to add runtime: shiny to the YAML header.
 - You need to ensure that your output is an HTML format (like html_document, ioslides_presentation, or slidy_presentation).

You do not need to wrap your interactive ggvis plots in a render function. They are ready to use as is in an R Markdown document.

Instructions
The .Rmd file to the right contains a ggvis plot that updates as a user moves a slider.

Fix the YAML header of the document; create a reactive HTML page.
Render the document; you will see that the rendered document is static. The interactive report is hosted at DataCamp's Shiny server.

In [None]:
---
title: "ggvis"
author: "DataCamp"
output: html_document
runtime: shiny
---

ggvis provides a number of ways to enhance plots with interactivity. For example, the density plot below allows users to set the kernel and bandwidth of the plot.

```{r echo = FALSE, message = FALSE}
library(ggvis)

mtcars %>% ggvis(x = ~wt) %>%
    layer_densities(
      adjust = input_slider(.1, 2, value = 1, step = .1, label = "Bandwidth adjustment"),
      kernel = input_select(
        c("Gaussian" = "gaussian",
          "Epanechnikov" = "epanechnikov",
          "Rectangular" = "rectangular",
          "Triangular" = "triangular",
          "Biweight" = "biweight",
          "Cosine" = "cosine",
          "Optcosine" = "optcosine"),
        label = "Kernel")
    )
```




# 4. Configuring R Markdown (optional)

### Software for R Markdown
50xp
Before you can start off with R Markdown on your local system, there are several programs you need. Which of the following pieces of software do you need to compile R Markdown documents into HTML, pdf, and MS Word documents?

Technically, you only need LaTeX if you want to compile a document into a pdf.

A. RStudio
B. R
C. The rmarkdown R package
D. The knitr R package
E. Pandoc
F. LaTeX
G. Microsoft Word
H. A web browser

Possible Answers
 - A - RStudio suffices.
 - A and B.
 - B, C, D, E, and F. (Correct)
 - All of the above.
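
As a rough way to see which of these pieces are present on a local system, the sketch below checks for the two R packages, for pandoc, and for a LaTeX engine. It assumes that `pdflatex` is the LaTeX executable to look for, which may not hold for every installation; the name `status` is only illustrative.

```r
# R itself is clearly available if this code runs.
has_rmarkdown <- requireNamespace("rmarkdown", quietly = TRUE)

status <- c(
  rmarkdown = has_rmarkdown,
  knitr     = requireNamespace("knitr", quietly = TRUE),
  # pandoc_available() can only be asked if rmarkdown is installed.
  pandoc    = has_rmarkdown && rmarkdown::pandoc_available(),
  # Assumption: a LaTeX installation exposes pdflatex on the search path.
  latex     = unname(nzchar(Sys.which("pdflatex")))
)

print(status)
```

A `FALSE` for `latex` only affects pdf output; HTML and Word documents can still be compiled.
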

### Prepare your system to use R Markdown
50xp

Which of the following pieces of software are installed automatically when you install RStudio?

A. R
B. The rmarkdown R package
C. The knitr R package
D. pandoc
E. LaTeX
F. Microsoft Word

Possible Answers
 - A
 - B
 - All of the above, except for Microsoft Word.
 - B, C, and D

# Thank You