In [1]:
# Lab settings - please ignore
options(repr.plot.width=7, repr.plot.height=4, repr.plot.res=250 ) # Make plots a reasonable size

<div class="big_title">LAB 11: Scientific reports with Rmarkdown</div>

BIO3782: Biologist's Toolkit (Dalhousie University)

----------------------------------------

# Setup of workspace

<span class="important"></span> Make sure all required files are in the working directory:

* Create a **folder** on the "Desktop" and name it <span class='file'>Lab11</span>.
* From Brightspace, download the following files into your <span class='file'> Desktop\Lab11\ </span> folder:
   * <span class="file">References.bib</span> 
   * <span class="file">apa-6th-edition.csl</span>
   * <span class="file">edidiv.csv</span>
   
* In RStudio, change the "working directory" to: <span class='file'>Desktop\Lab11</span>. Click here if you need a [refresher on the **working directory**](https://diego-ibarra.github.io/biol3782/week1/index.html#RStudio's-%22Working-Directory%22)

As in previous labs, we'll try to simulate "real-life" coding by using the tags below to indicate when to use RStudio's <span class="editor"></span> or <span class="console"></span>

<br>
<div class="use_editor"></div>    
<br>
<br>
<div class="use_console"></div>    
<br> 
----------------------------------


# Data analysis reports

A "data analysis report" is a document that includes text, graphs, equations, and code. Data analysts tend to write a lot of reports, describing their analyses and results for their collaborators, or to document their work for future reference.

Many new users begin by first writing a single R script containing all of their work, and then share the analysis by emailing the script and various graphs as attachments. But this can be cumbersome, requiring a lengthy discussion to explain which attachment was which result.

Writing formal reports with MS Word or [LaTeX](https://en.wikipedia.org/wiki/LaTeX) can simplify this process by incorporating both the analysis report and output graphs into a single document. But tweaking formatting to make figures look correct and fixing obnoxious page breaks can be tedious and lead to a lengthy “whack-a-mole” game of fixing new mistakes resulting from a single formatting change.

Creating a web page (as an html file) using [RMarkdown](https://rmarkdown.rstudio.com/) makes things easier. The report can be one long stream, so tall figures that wouldn’t ordinarily fit on one page can be kept at full size and are easier to read, since the reader can simply keep scrolling. Additionally, the formatting of an RMarkdown document is simple and easy to modify, allowing you to spend more time on your analyses instead of writing reports.

## LaTeX

Computer programmer Leslie Lamport created [LaTeX](https://en.wikipedia.org/wiki/LaTeX) in the 1980s as a typesetting language that - in stark contrast to the [WYSIWYG](https://en.wikipedia.org/wiki/WYSIWYG) fussing of MS Word - separates the design and layout of documents from the writing itself. This separation of content and presentation lets you write quickly without worrying about how the document looks until the end. LaTeX is popular among physicists and statisticians, in part because LaTeX documents are plain text and therefore well suited to version control (e.g. via Git), but also because LaTeX does an excellent job of rendering equations. It does this so well, in fact, that Word now accepts LaTeX notation in its equation editor, and the same notation is available in Rmarkdown to typeset equations within the text body. Check out this [LaTeX](https://wch.github.io/latexsheet/) cheat sheet for some useful commands and syntax.

## Rmarkdown and Knitr

Analysis reports made with Rmarkdown/Knitr are reproducible documents: If an error is discovered, or if some additional subjects are added to the data, you can just re-compile the report and get the new or corrected results rather than having to reconstruct figures, paste them into a Word document, and hand-edit various detailed results.

The key R package here is `knitr`. It allows you to create a document that is a mixture of text and chunks of code. When the document is processed by `knitr`, chunks of code will be executed, and graphs or other results will be inserted into the final document.

`knitr` allows you to mix basically any type of text with code from different programming languages, but we recommend that you use [RMarkdown](https://rmarkdown.rstudio.com/), which mixes Markdown with R. [Markdown](https://en.wikipedia.org/wiki/Markdown) is one of many light-weight markup languages for creating web pages. Other light-weight markup languages include [reStructuredText](https://en.wikipedia.org/wiki/ReStructuredText) (used in Python), [MediaWiki](https://www.mediawiki.org/wiki/Help:Formatting) (used in Wikipedia), and many [others](https://en.wikipedia.org/wiki/Lightweight_markup_language).

Here is a useful [Rmarkdown cheat sheet](https://rstudio.com/wp-content/uploads/2016/03/rmarkdown-cheatsheet-2.0.pdf).

To see the power of Rmarkdown, here is an [undergraduate dissertation](https://github.com/ourcodingclub/CC-2-RMarkdown/blob/master/UnderGrad_Dissertation_Rmd.pdf) written using Rmarkdown.


<br><br><br>
<div class="startTASK"></div>

<b>Create an Rmarkdown file</b>

To create a new RMarkdown file (.Rmd), select `File` -> `New File` -> `R Markdown` in RStudio, then choose the file type you want to create. For now we will focus on a `.html` Document, which can be easily converted to other file types later. Enter a Title (<span class="file">Lab11</span>) and Author Name (your name). Then click OK.
Save the file as <span class="file">Lab11.Rmd</span>.

<span class="note"></span>: The document title is not the same as the file name.

<img src="New_R_Markdown.png">

The newly created `.Rmd` file comes with basic instructions, but we want to create our own RMarkdown script, so go ahead and delete everything in the example file.

<div class="endTASK"></div>
<br><br><br>

# Knitting (i.e. Rendering) a .Rmd file

The <span class="file">.Rmd</span> file that you created contains the instructions to make a beautiful website (i.e. an <span class="file">.html</span> file that can be seen using a web browser). The process of rendering an <span class="file">.html</span> from the instructions contained in the <span class="file">.Rmd</span> is called **"knitting"**. To "knit", click on the `Knit` icon and then select `Knit to HTML` to create your html file.
<br>
<img src="knit.png">
<br>
Take a look at the <span class="file">.html</span> file produced by `knitr`. Your output won't look like much at this point. In the following sections we will be adding text, code and graphs to your <span class="file">.Rmd</span> file, which you can then **"knit"** into an incrementally more complex website.

Note that you can also knit to PDF and to a Word file.
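The `Knit` button is a shortcut: the same rendering can be started from the console with `rmarkdown::render()`. The sketch below is illustrative only. It writes a tiny throwaway `.Rmd` to a temporary file (the file contents and the name `rmd_file` are made up for the example) and renders it if pandoc is available. For this lab you would pass `"Lab11.Rmd"` instead.

```r
library(rmarkdown)

# Write a tiny throwaway .Rmd file so the example is self-contained
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Lab11\"",
  "output: html_document",
  "---",
  "",
  "Hello from R Markdown."
), rmd_file)

# render() does the same job as the Knit button; it needs pandoc,
# which is bundled with RStudio
if (pandoc_available()) {
  out_file <- render(rmd_file, output_format = "html_document", quiet = TRUE)
  print(out_file)  # path of the .html file that was produced
}
```

Swapping `"html_document"` for `"word_document"` or `"pdf_document"` targets the other formats (PDF output additionally needs a LaTeX installation).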

# Structure of an Rmarkdown file

There are three parts to an .Rmd file:

1. Header: The text at the top of the document, written in YAML format.
1. Markdown sections: Text that describes your workflow written using markdown syntax.
1. Code chunks: Chunks of R code that can be run and also can be rendered using knitr to an output document.
<br>
<img src="structure.png">
<br>

## YAML Header

An R Markdown file always starts with a header written using [YAML](https://en.wikipedia.org/wiki/YAML) syntax. This header is sometimes referred to as the front matter.

There are four default elements in the RStudio YAML header:

* title: The title of your document. Note, this is not the same as the file name.
* author: Who wrote the document.
* date: By default this is the date that the file is created.
* output: What format will the output be in. You will use html.
<br>
<img src="YAML_default.png">
<br>
<span class="note"></span> A YAML header begins and ends with three dashes `---`. Also notice that the value for each element (title, author, etc.) is in quotes, "value-here", next to the element. A YAML header may be structured differently depending upon how you are using it.

You can also specify more complicated YAML options for citation and document styles, as in the example below.
<br>
<img src="YAML_complicated.png"> <img src="YAML_complicated_Word.png">
<br>
Let's edit our header by specifying the BibTeX reference list, setting the font style and size, and setting Rmarkdown to use the current date and time at which the document is created. You can check out this [guide](https://bookdown.org/yihui/rmarkdown/html-document.html) for more info. 

<br>
<div class="use_editor"></div>
<br>
<img src="YAML_task.png">
<br>
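For reference, a header along these lines would cover the bibliography, citation style, and date. Treat it as a sketch rather than a copy of the screenshot: the author name is a placeholder, the date format string is just one possible choice, and font settings are left out because the options that control them depend on the output format.

```yaml
---
title: "Lab11"
author: "Your Name"
date: "`r format(Sys.time(), '%d %B %Y, %H:%M')`"
output: html_document
bibliography: References.bib
csl: apa-6th-edition.csl
---
```

The `date` line uses inline R code, so the date and time are filled in each time the document is knit. The `bibliography` and `csl` entries assume both files sit in the same folder as the <span class="file">.Rmd</span> file.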

## Markdown syntax

Markdown is a human readable syntax for formatting text documents. Markdown can be used to produce nicely formatted documents including pdfs, web pages and more. When you format text using markdown in a document, it is similar to using the format tools (bold, heading 1, heading 2, etc) in a word processing tool like Microsoft Word or Google Docs.

An R Markdown file can contain text written using the markdown syntax. Markdown text can be whatever you want. It may describe the data that you are using, how it’s being processed and what the outputs are. You may even add some text that interprets or discusses the outputs.

When you render your document to html, this markdown will appear as text on the output html document.

<span class="note"></span> Below we explain the **basic markdown syntax**; however, it is a good idea to take a look at the following website for a very well crafted overview of basic markdown syntax: https://www.markdownguide.org/basic-syntax/ 

Markdown is simple plain text, that is styled using special characters, including:

* `#`: a header element.
* `**`: bold text.
* `*`: italic text.

When you type text in a markdown document with no additional syntax, the text will appear as paragraph text. You can add additional syntax to that text to format it in different ways.

For example, if we want to highlight a function or some code within a plain text paragraph, we can use one backtick on each side of the text (`` ` ``).

To add emphasis to other text you can use **bold** or *italics*.

<br>
<div class="use_editor"></div>
<br>
<img src="text.png">
<br>
Your output would look like this in html
<br>
<img src="text_out.png">
<br>

Click on the `Knit` icon to create your html file.
<br>
<img src="knit.png">
<br>
Take a look at the <span class="file">.html</span> file produced by `knitr`. Your output should look like this after knitting.
<br>
<img src="fontstyle.png">
<br>
You can also add the following:

* Unordered list items 
  * Unordered list item
  
* Ordered list items
  1. Ordered list item
  
* Website links <br>
  [Google](https://www.google.com)
 
* Equations using [LaTeX equations syntax](http://reu.dimacs.rutgers.edu/Symbols.pdf) <br>
  $A = \pi \times r^{2}$
  
<br>
<div class="use_editor"></div>
<br>
<img src="links.png">
<br>

You can also add in-text citations. When you knit your file, Rmarkdown will automatically generate a references section at the end of your document. 

<br>
<div class="use_editor"></div>
<br>
<img src="references.png">
<br>
Again, click on the `Knit` icon and re-create your html file. Your html output should look like this:

<br>
<img src="ref_html.png">
<br>
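The citation syntax comes from pandoc. A sketch of the common forms follows; the keys `smith2020` and `lee2021` are hypothetical, so substitute keys that actually appear in <span class="file">References.bib</span>.

```markdown
A claim that needs a source [@smith2020].

@smith2020 reports a similar result.

Smith [-@smith2020] reports a similar result.

Several sources agree on this [@smith2020; @lee2021].
```

The bracketed form gives a parenthetical citation, a bare `@key` gives the author in the text with the year in parentheses, a leading `-` suppresses the author so only the year appears, and a semicolon separates several keys inside one pair of brackets. The exact appearance depends on the citation style file in use.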

### Section Headings

We create a heading using the pound (#) sign. For the headers to render properly there must be a space between the # and the header text. We can create subheadings by adding more pound signs. For example:

<br>
<div class="use_editor"></div>
<br>
<img src="Headings.png">
<br>

The output (after knitting) should look like this:

<br>
<img src="Headings2.png" width="300px">
<br>


## Code Chunks

Code chunks in an R Markdown document contain your R code. All code chunks start and end with three backticks or graves. A code chunk would look like this:

<br>
<img src="codechunk_basic.png">
<br>
The first line, `{r setup}`, contains the language (`r` in this case) followed by the name of the chunk (`setup`). Specifying the language is mandatory. The chunk name is optional; however, it is good practice to give each chunk a unique name to support more advanced knitting approaches.

You can add new chunks by clicking on the `Insert` icon.

<br>
<img src="insert_chunk.png">
<br>

### Code Chunk options

You can add options to each code chunk. These options allow you to customize how or if you want code to be processed or appear on the rendered output (pdf document, html document, etc). Code chunk options are added on the first line of a code chunk after the name, within the curly brackets.

3 Common Chunk Options:

* `eval = FALSE`: Do not evaluate (or run) this code chunk when knitting the .Rmd document. The code in this chunk will still appear in the knitted html output, but it will not be run by R.
* `echo = FALSE`: Hide the code in the output. The code is evaluated when the .Rmd file is knit, but only its output is rendered in the output document.
* `results = 'hide'`: The code chunk will be evaluated, but its printed results will not be rendered in the output document (the code itself is still shown unless `echo = FALSE` is also set). This is useful if you are viewing the structure of a large object (e.g. a large data.frame, which is the equivalent of a spreadsheet in R).

Multiple code chunk options can be used for the same chunk. Below is a table with more code chunk options.
<br>
<img src="chunk_options.png">
<br>
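Chunk options can also be given document-wide defaults by calling `knitr::opts_chunk$set()` in an early chunk (often one named `setup`). A minimal sketch, with option values chosen only for illustration:

```r
library(knitr)

# Defaults applied to every later chunk unless that chunk overrides them
opts_chunk$set(
  echo = FALSE,      # hide code, show output
  warning = FALSE,   # do not print warnings
  message = FALSE,   # do not print messages (e.g. package start-up notes)
  fig.width = 6,     # figure width in inches
  fig.height = 4     # figure height in inches
)

# Check what the current default is
opts_chunk$get("echo")
```

An individual chunk can still override a default by setting the option in its own header.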

# Inserting figures

By default, RMarkdown will place graphs by maximising their height, while keeping them within the margins of the page and maintaining aspect ratio. If you have a particularly tall figure, this can mean a really huge graph. In the following example we modify the dimensions of the figure we created above. To manually set the figure dimensions, you can insert an instruction into the curly braces:

<br>
<div class="use_editor"></div>
<br>
<img src="fig_in.png">
<br>
Your output should look like this after you knit the document.
<br>
<img src="fig_out.png">
<br>
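As a sketch of the idea (not the code in the screenshot), the chunk below draws a small base-R scatterplot from the built-in `cars` data. In the `.Rmd` file the size is set in the chunk header, shown here as a comment; `fig.width` and `fig.height` are given in inches, and the chunk name `small-plot` is made up.

```r
# In the .Rmd file this code sits in a chunk whose header reads:
#   {r small-plot, fig.width = 4, fig.height = 3}

plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     main = "Built-in cars data")
```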

# Inserting Tables

## Standard Rmarkdown

Rmarkdown can print the contents of a data frame easily by enclosing the name of the data frame in a code chunk.

<br>
<div class="use_editor"></div>
<br>
<img src="tables_basic.png">
<br>
Although complete, this might not be the best way to display data. Including a formal table requires more effort.

## kable() and knitr

The most aesthetically pleasing and simple table formatting function I have found is `kable()` in the [knitr](https://cran.r-project.org/web/packages/knitr/knitr.pdf) package. The first argument tells `kable()` to make a table out of the object `dataframe`; the `digits` argument sets how many decimal places numbers are rounded to (two in this example).

<br>
<div class="use_editor"></div>
<br>
<img src="tables_kable.png">
<br>
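A minimal sketch of a `kable()` call, assuming the table is stored in an object called `dataframe` as in the text. The data here are the first rows of R's built-in `iris` dataset plus a made-up `Ratio` column, used only so that the rounding is visible.

```r
library(knitr)

# Small example table built from the built-in iris data
dataframe <- head(iris, 5)
dataframe$Ratio <- dataframe$Sepal.Length / dataframe$Sepal.Width

# digits = 2 rounds numeric columns to two decimal places
tab <- kable(dataframe, digits = 2)
tab
```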

## pander()

If you want a bit more control over the content of your table you can use `pander()` in the [pander](https://www.r-project.org/nosvn/pandoc/pander.html) package. Imagine we want the 3rd column to appear in italics:

<br>
<div class="use_editor"></div>
<br>
<img src="tables_pander_in.png">
<br>
<span class="note"></span> Your Rmarkdown output will look different from your html. The html output for the code above is:
<br>
<img src="tables_panderOut.png">
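
One way to get an italic column with `pander` is its `emphasize.italics.cols()` helper, which applies to the next table that is printed. The sketch below uses the first rows of the built-in `iris` data as a stand-in for the lab's table, so the column that ends up in italics is simply whichever is third.

```r
library(pander)

dataframe <- head(iris, 5)

# Ask pander to italicise column 3 of the next table it prints
emphasize.italics.cols(3)
pander(dataframe)
```

In the raw output the italic cells are wrapped in `*...*`, which is markdown; they only appear as italics once the document is knit.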

# Data exploration and analysis exercise

Now that you have a basic handle on Rmarkdown, let's take a look at some biological data from the [NBN Gateway](https://data.nbn.org.uk/).

First, let's create a chunk that will read in our data.

<br>
<div class="use_editor"></div>
<br>
<img src="load_data.png">
<br>

Next, let's add some information about the dataset.

<br>
<div class="use_editor"></div>
<br>
<img src="data_desc.png">
<br>

We can also examine species richness across groups.


<br>
<div class="use_editor"></div>
<br>
<img src="richness.png">
<br>

Your output should look like this:
<br>
<img src="richness_out.png">
<br>

Let's analyze the data graphically. 

<br>
<div class="use_editor"></div>
<br>
<img src="richness_plot.png">
<br>

Your output should look like this:
<br>
<img src="richness_plot_out.png">
<br>
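One way to sketch the richness calculation and plot in base R is shown below. It assumes the data frame has one row per record, with columns named `taxonName` and `taxonGroup`; if <span class="file">edidiv.csv</span> uses different names, adjust accordingly. The tiny `records` data frame is invented so that the example runs on its own, and the screenshots may well use a different approach.

```r
# Toy stand-in for the real data (invented records, not from edidiv.csv)
records <- data.frame(
  taxonName  = c("Robin", "Robin", "Wren", "Oak", "Birch", "Oak", "Ladybird"),
  taxonGroup = c("Bird", "Bird", "Bird", "Plant", "Plant", "Plant", "Beetle")
)

# Species richness = number of distinct species within each group
richness <- tapply(records$taxonName, records$taxonGroup,
                   function(x) length(unique(x)))
richness

# Bar plot of richness by taxonomic group
barplot(richness,
        xlab = "Taxonomic group", ylab = "Number of species",
        ylim = c(0, max(richness) + 1))
```

With the real file, `records` would instead come from something like `read.csv("edidiv.csv")`.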

What would the most common species in each taxonomic group be?

<br>
<div class="use_editor"></div>
<br>
<img src="abund.png">
<br>

Your output should look like this:
<br>
<img src="abund_out.png">
<br>
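A base-R sketch of the same question, taking "most common" to mean the species with the most records in each group. As before, the column names `taxonName` and `taxonGroup` are assumptions and the `records` data frame is invented for illustration.

```r
# Toy stand-in for the real data (invented records, not from edidiv.csv)
records <- data.frame(
  taxonName  = c("Robin", "Robin", "Wren", "Oak", "Birch", "Oak", "Ladybird"),
  taxonGroup = c("Bird", "Bird", "Bird", "Plant", "Plant", "Plant", "Beetle")
)

# Count records per species within each group, then keep the top name
most_common <- tapply(records$taxonName, records$taxonGroup,
                      function(x) names(which.max(table(x))))
most_common
```

Note that `which.max()` returns only the first maximum, so ties between species are broken by whichever name comes first in the table.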


<div class="Q"><br><br>
    
What are three parts of a .Rmd file?
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>
    
What are the default components of a YAML header?
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>
    
What format would the <b>output</b> argument in the header be to create a word document?
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>
    
TRUE or FALSE: The title of your markdown document and file name should be the same
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>
    
What is the output of Markdown text? 
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>
    
What symbol would you use to italicize text?
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>
    
What symbol would you use to highlight text?
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>
    
What symbol would you use to change text face to bold?
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>

What language/syntax would you use to add equations?
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>

What symbol would you use to add only the date after an author name in in-text citations?
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>

What symbols would you use to add in-text citations?
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>

What code syntax symbol separates references when creating a multi-reference in-text citation (i.e. (Wickham, 2011, Wickham, 2012))?
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>

What would I use to create a heading?
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>

TRUE or FALSE: An empty space is needed after a pound symbol to specify a new heading level
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>

What does a code chunk contain?
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>

Do I always need a chunk name?
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>

What is the benefit of having a chunk name?
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>

What icon do you use to add a new chunk?
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>

How do we note the beginning and end of a YAML header?
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>

What does <b>eval=FALSE</b> do in a chunk?
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>

What chunk option would I use to stop my code from displaying in the output?
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>

Would <b>results=hide</b> stop my R code from running?
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>

What chunk option specifies figure height?
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>

What chunk option stops error messages from being displayed?
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>

What does the warnings chunk option specify?
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>

Will the Rstudio output and html output look exactly the same?
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>

Can you knit to formats other than html?
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>

What YAML option specifies the bibliography file?
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>

What YAML option specifies the citation style (i.e. MLA, APA)?
    
<br><br><div class="qhelp"></div></div>

<div class="Q"><br><br>

What does <b>kable()</b> do?
    
<br><br><div class="qhelp"></div></div>

<div class="big_title">This is the end of lab</div>

*******************
*******************

Code below is for formatting of this lab. Do not alter!

In [2]:
cssFile <- '../css/custom.css'
IRdisplay::display_html(readChar(cssFile, file.info(cssFile)$size))

IRdisplay::display_html("<style>.Q::before {counter-increment: question_num;
    content: 'QUESTION ' counter(question_num) ': '; white-space: pre; }.T::before {counter-increment: task_num;
    content: 'Task ' counter(task_num) ': '; }</style>")