### <font color="#FF6600">(expand for tip) </font> <font color="#445555">A note on opening notebooks in shared workspaces</font><a class="tocSkip">

<font color="#445555">Master copies of notebooks should not be run or edited unless you intend to improve the code. As a general rule, be cautious when editing a notebook in a shared workspace, so that you don't overwrite the work of your collaborators. Best practice is to test in a cloned workspace with an easily identifiable name.</font>

# Setup notebook overview

In this notebook we install a collection of R packages and Jupyter extensions above and beyond what is installed by default. These are packages used in one or more of the notebooks in this workspace. 

This notebook needs to be rerun when you recreate the "Cloud Environment" (which may be a VM or cluster). Note that this is rare -- when you stop using a notebook, your VM is paused, not deleted. Your runtime is recreated only after you click the "Delete environment options" button.

## Directions <a class="tocSkip">  
Run this notebook in your workspace before running any other R notebooks. For more information about Jupyter notebooks, and practice running a notebook in Terra, see the **0_Intro_to_Jupyter** notebook in this workspace.    
    
## Documentation key </font><a class="tocSkip">     
    
Below is an outline of how to identify the different documentation components throughout the notebook.     
    
|Type of information|Documentation cue | Action |   
|----|----|----|   
| Step by step instructions | **black font** | These are steps you should follow to make sure the notebook runs properly |   
| Additional details | **<font color="#FF6600">"expand for tip"</font>** | <font color="#445555">Expand the section for additional (optional) information and resources by clicking on the downward-facing gray arrow (at top left)  </font> |  
| **<font color="#445555">Optional information**</font> | <font color="#445555">**gray font** | You can read or not read these details, as you wish</font> |     
|Full code blocks | purple **<font color="purple"> <-></font>**<font color="#445555"> (at the right of code cell) | Click on the arrow at top left of cell to expand and see all the code.  <font color="#445555">**Note that you have to run the cell, but you don't have to unfold the code to do so**</font> |    

## <font color="#FF6600">(expand for tip) </font> <font color="#445555">How to customize your Cloud Environment</font><a class="tocSkip">

To expose the customization menu, select the gear icon at the top right of your workspace page.    

**<font color="#FF6600">Try clicking the gear icon at the top right to see!</font>**   

 <font color="#445555">You can customize at any time. **Note that changing these values will recreate your Cloud Environment**. To learn more about customizing your notebook Cloud Environment, see [this article](https://support.terra.bio/hc/en-us/articles/360038125912).

### <font color="#445555">Application configuration</font><a class="tocSkip">    
<font color="#445555">These are the packages and libraries installed on the VM that runs your notebook. There are several pre-configured environments with popular packages installed in Terra by default.      
    
**To use a custom Docker** to customize and standardize your Cloud Environment, choose the "Custom Environment" option in the dropdown menu and insert the URL for your Docker image in the "Container image" field. To learn more about making custom Dockers on Terra, see [this article](https://support.terra.bio/hc/en-us/articles/360037143432).    

**You can also specify a startup script in the Cloud Compute section to customize your Cloud Environment.** To learn more about using a startup script, see [this article](https://support.terra.bio/hc/en-us/articles/360058193872)</font></font>   

### <font color="#445555">Cloud compute</font><a class="tocSkip">    
<font color="#445555">This is where you can choose the VM CPU and memory (RAM). You can choose a standard VM, Spark Master Node, or Spark Cluster in the dropdown.    


### <font color="#445555">Detachable Persistent Disk size</font><a class="tocSkip">    
<font color="#445555">Terra attaches a persistent disk (PD) to your cloud compute in order to provide an option to keep the data on the disk after you delete your compute. PDs also act as a safeguard to protect your data in the case that something goes wrong with the compute. You can choose the size of your Persistent Disk. To learn more about Persistent disks and where your disk is mounted, see [this article](https://support.terra.bio/hc/en-us/articles/360047318551)</font>

## <font color="#FF6600">(expand for tip) </font> <font color="#445555"> Useful notebook extensions<a class="tocSkip">  
This notebook uses two extensions to make it cleaner and easier to follow. Expand these sections if you aren't familiar with the codefolding and collapsible headings extensions. 

### <font color="FF6600">(expand for tip)</font> <font color="#445555">About Jupyter notebook extensions</font><a class="tocSkip">

<font color="#445555">Jupyter notebook extensions are useful add-ons, developed by the community, to extend notebook functionality. You can read more about notebook extensions and how they work [here](http://www.blog.pythonlibrary.org/2018/10/02/jupyter-notebook-extension-basics/). The extensions we will use in this notebook include:   

* **Codefolding** - Makes the notebook cleaner by compressing large blocks of code    
You'll know this extension is in place by the small gray triangle at the top left of a code cell. Folded code cells have a right-facing triangle at the top left of the cell and a purple <font color=#7433FF>**<->**</font> to the right of the first line of code. To see the full code, you can expand the cell by clicking either the triangle or the arrow at the top of the code cell.    

**Note that you still need to run folded code cells but you will not need to unfold them to do so.** 

* **Collapsible headers** - Also makes the notebook tidier, by collapsing the cells under a header.  

### To use the notebook extensions, run the code, then refresh the browser before proceeding <a class="tocSkip"></font>

### <font color="FF6600">(expand for tip)</font> <font color="#445555"> Test collapsible header<a class="tocSkip">
You can uncollapse the header markdown cell in this section by clicking the right-facing arrow beside the header... Try it and see! 

<font color="purple">Congratulations!! You've discovered the hidden markdown beneath the collapsed header.</font>

In [None]:
# Code cells are also collapsed under the header... 
# To run code cells collapsed under a header, you need to first uncollapse the header!! 

**Try it out!** Once you implement the extensions, you can collapse and uncollapse any code cells and headings

### <font color="FF6600">(expand for tip)</font> <font color="#445555"> Test codefolding extension<a class="tocSkip">
Expand this section to practice using the codefolding extension. 

In [None]:
# This code cell is unfolded 
# Notice the small arrow at the top left of the cell is pointing DOWN
# There is no purple two-way arrow to the right of the first line
# You can click on the small arrow to fold the cell

In [None]:
# This code cell is folded. Notice the triangle (left) is pointing RIGHT and the purple arrow (right) is visible
# If you click on the RIGHT-pointing triangle, you will expand the cell and be able to read this text. 

# CONGRATULATIONS!!!

## <font color="#FF6600">(expand for tip) </font> <font color="#445555">How to know your code cell is running/complete</font><a class="tocSkip">

**<font color="#445555">Running code cells</font>**    
<font color="#445555">1. There will be an `*` between the brackets at the top left of the code cell (`In [*]`)</font>   
<font color="#445555">2. The section with the code cell will be highlighted in red in the Table of Contents</font>    

**<font color="#445555">Completed code cells</font>**    
<font color="#445555">1. The `*` will turn into a number (`In [*]` will become `In [3]`, for example)</font>    
<font color="#445555">2. You may get some output information (in black text on either white or pink background). Note that a pink warning message does not necessarily mean your notebook is broken!</font>

# Install R packages



## Generally useful R packages

In [None]:
# This code defines a time-saving function that checks whether the needed packages are already installed and
# installs only the ones that are missing. The `install_if_missing` function is used below.

install_if_missing <- function(packages) {
    # Compare the requested packages against what is already installed
    missing_packages <- setdiff(packages, rownames(installed.packages()))
    if (length(missing_packages) > 0) {
        install.packages(missing_packages)
    }
}
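
As a rough illustration of the check inside `install_if_missing`, the sketch below shows how `setdiff()` separates requested packages from installed ones. The package name `aPackageThatDoesNotExist` is a made-up placeholder, assumed not to be installed anywhere.

```r
# Names of every package currently installed in this R library
installed <- rownames(installed.packages())

# 'stats' and 'utils' ship with R; the third name is a hypothetical placeholder
requested <- c("stats", "utils", "aPackageThatDoesNotExist")

# setdiff() keeps only the requested names that are NOT installed
missing_packages <- setdiff(requested, installed)
print(missing_packages)
```

Only the names returned by `setdiff()` would be passed on to `install.packages()`, which is why rerunning the setup is quick once everything is in place.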

In [None]:
# Use the command defined above to install the nine R packages (in parentheses) below 
install_if_missing(c('tidyverse','viridis', 'ggthemes', 'pryr', 'skimr',
                     'testthat', 'reticulate', 'data.table', 'RCurl'))

# There may be a warning in pink that says 'lib' is unspecified, which you can ignore. 

### <font color="FF6600">(expand for tip)</font> <font color="#445555">What about those pink output warnings?</font><a class="tocSkip">

<font color="#445555">**If you get a warning in pink after running a code cell**   
These warnings are simply standard output from the virtual machine. They give useful information, but usually do not indicate anything that will break the rest of the notebook. Often they refer to library versions or to functions that are masked by another package. Feel free to read through them to get a sense of what is going on behind the scenes.</font>
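
A small sketch of why these messages are usually harmless: in R, `message()` and `warning()` report information without stopping the code, so the result is still computed. The function `noisy_total` is a hypothetical example and not part of this workspace.

```r
# Hypothetical helper that prints a message and raises a warning, then finishes normally
noisy_total <- function(values) {
    message("Summing ", length(values), " values")    # informational message
    warning("This is only a warning, not an error")   # execution continues after this
    sum(values)
}

# The call still returns a value even though a message and a warning were emitted
total <- noisy_total(1:10)
print(total)
```

An error raised with `stop()`, by contrast, would halt the cell, so that is the kind of output worth investigating first.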

## Leonardo R package

Leonardo is a service that provides access to interactive tools like Jupyter, RStudio, and Hail running in the cloud inside the Terra security boundary.

In [None]:
# Install the package of libraries that Leonardo needs, which is hosted on GitHub 
devtools::install_github('DataBiosphere/ronaldo')

**Warning note**    
Note that after running this cell you may get a warning that reads    

`Skipping install of 'Ronaldo' from a github remote, the SHA1 (426459ff) has not changed since last install.     
Use 'force = TRUE' to force installation`   

**Plain English**   
This warning is telling you that this package has already been installed, and to save time, it's skipping this step. Passing `force = TRUE` to `install_github` overrides this and re-installs the package.    
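
If you ever do need to re-install, one possible sketch is shown below. It assumes `devtools` is available (as in the cell above) and uses a hypothetical `force_reinstall` switch so that the forced install only runs when you ask for it.

```r
# Assumption: devtools is installed, as it is used elsewhere in this notebook.
# Leave this FALSE for normal runs; set it to TRUE only to force a re-install.
force_reinstall <- FALSE

if (force_reinstall) {
    # force = TRUE re-installs even when the SHA1 has not changed
    devtools::install_github('DataBiosphere/ronaldo', force = TRUE)
}
```

Keeping the switch set to `FALSE` preserves the time-saving default described above.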


# Confirm that the R packages loaded properly

In [None]:
# Warnings that objects are masked between R packages are to be expected and you can ignore them 
library(viridis)    # A nice color scheme for plots.
library(ggthemes)   # Common themes to change the look and feel of plots.
library(scales)     # Graphical scales map data to aesthetics in plots.
library(testthat)   # Testing functions.
library(assertthat) # Assertion functions.
library(pryr)       # Memory usage functions.
library(skimr)      # Summary statistics for dataframes.
library(bigrquery)  # BigQuery R client.
library(tidyverse)  # Data wrangling packages.
library(reticulate) # R interface to Python.
library(Ronaldo)    # Leonardo R package.
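
If one of the `library()` calls above fails, a quick way to see which packages are missing is sketched below. It uses base R's `requireNamespace()`; the variable names `packages_to_check` and `is_available` are illustrative only.

```r
# The packages loaded in the cell above
packages_to_check <- c("viridis", "ggthemes", "scales", "testthat", "assertthat",
                       "pryr", "skimr", "bigrquery", "tidyverse", "reticulate",
                       "Ronaldo")

# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise
is_available <- vapply(packages_to_check, requireNamespace,
                       logical(1), quietly = TRUE)

# Names of any packages that could not be found (empty if all are available)
print(packages_to_check[!is_available])
```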

# Troubleshooting tips and tricks

This notebook installs the most recent versions of R packages from [CRAN](https://cran.r-project.org/) and Python packages from [pip](https://pypi.org/project/pip/) onto your VM. Additionally, some packages come from [GitHub](https://github.com/) or [Cloud Source Repositories](https://cloud.google.com/source-repositories/).

1. If you encounter any errors, first try restarting the kernel and running all: `Kernel -> Restart & Run All`.
1. If an R package still fails to install:
 - Open a terminal by clicking on the terminal icon next to 'Notebook Runtime' in the top right corner of the window
 - Type `R` to start R in the terminal
 - Type `install.packages("qwraps2")` to get a more detailed error message. Replace `qwraps2` with the name of whichever package is failing to install.
1. If that error message tells you what you need to do to resolve the issue, great! If not, copy and paste the error message into Google Search for more help. 
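
To grab the exact error text for searching, a minimal sketch using base R's `tryCatch()` is shown below. It only tries to load the package (it does not install anything), and `qwraps2` is just the example name from the steps above.

```r
# Example package name from the steps above; replace with the package that is failing
package_name <- "qwraps2"

# Try to load the package and keep the error text instead of stopping the notebook
load_status <- tryCatch(
    {
        library(package_name, character.only = TRUE)
        "loaded successfully"
    },
    error = function(e) conditionMessage(e)
)
print(load_status)
```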

# Provenance

Provenance is a record of the exact environment used to run the notebook. It's useful for collaborating, and also helpful when you return to a notebook months after your initial analysis. Recording it is also a best practice for reproducible research.

In [None]:
# Output all session information
devtools::session_info()
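
To keep a copy of the provenance alongside your results, one option is to write it to a text file. The sketch below is an assumption-laden example: it uses base R's `sessionInfo()` rather than `devtools::session_info()`, and it writes to a temporary file whose name (`session_info.txt`) is chosen only for illustration.

```r
# Assumption: write to a temporary file; choose a permanent path for real use
provenance_file <- file.path(tempdir(), "session_info.txt")

# capture.output() turns the printed session information into lines of text
session_lines <- capture.output(print(sessionInfo()))
writeLines(session_lines, provenance_file)

# Show the first few lines that were saved
print(head(readLines(provenance_file)))
```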

Copyright 2019 The Broad Institute, Inc., Verily Life Sciences, LLC All rights reserved.

This software may be modified and distributed under the terms of the BSD license. See the LICENSE file for details.