
# Using bash and loading R packages in R notebooks
***

This notebook is delivered "As-Is". Notwithstanding anything to the contrary, DNAnexus will have no warranty, support, liability or other obligations with respect to Materials provided hereunder.

[MIT License](https://github.com/dnanexus/OpenBio/blob/master/LICENSE.md) applies to this notebook.


***

## Introduction <a name="Introduction" />
This R notebook highlights tips and tricks for using bash from the R kernel and for loading R packages.

## JupyterLab app details (launch configuration) <a name="spec"/>

### Recommended configuration
- runtime: < 20 min
- cluster configuration: `single node`
- recommended instance: `mem1_ssd1_v2_x4`
- cost: < £0.05


### Performance comparison
- **mem1_ssd1_v2_x4, single node**:    
    - runtime: < 20 min
    - cost: < £0.05
- mem1_ssd1_v2_x16, single node:
    - runtime: < 20 min
    - cost: < £0.15


## Installing `R` packages <a name="install"/>


Examples of R packages used in statistical genetics analysis: 
[devtools](https://cran.r-project.org/web/packages/devtools/index.html), 
[ggplot2](https://cran.r-project.org/web/packages/ggplot2/index.html), 
[tidyverse](https://cran.r-project.org/web/packages/tidyverse/index.html), 
[mclust](https://cran.r-project.org/web/packages/mclust/index.html), 
[RNOmni](https://cran.r-project.org/web/packages/RNOmni/index.html), 
[ISLR](https://cran.r-project.org/web/packages/ISLR/index.html), 
[xgboost](https://cran.r-project.org/web/packages/xgboost/index.html), 
[pacman](https://cran.r-project.org/web/packages/pacman/index.html), 
[ivpack](https://cran.r-project.org/web/packages/ivpack/index.html), 
[meta](https://cran.r-project.org/web/packages/meta/index.html), 
[MendelianRandomization](https://cran.r-project.org/web/packages/MendelianRandomization/index.html), 
[TwoSampleMR](https://github.com/MRCIEU/TwoSampleMR), 
[randomForest](https://cran.r-project.org/web/packages/randomForest/randomForest.pdf), 
[ggrepel](https://cran.r-project.org/web/packages/ggrepel/index.html), 
[reshape2](https://cran.r-project.org/web/packages/reshape2/index.html)

Many packages are already installed in the JupyterLab base image; you can list them with `installed.packages()`.


### List R packages already installed on UKB RAP

In [None]:
installed.packages()

### Check if a package is already installed

In [None]:
pkg <- c(
    "remotes",
    "tidyverse",
    "mclust",
    "RNOmni",
    "ISLR",
    "xgboost",
    "pacman",
    "ivpack",
    "meta",
    "MendelianRandomization",
    "randomForest",
    "ggrepel",
    "reshape2"
)

# List the packages from `pkg` that are not yet installed
pkg[!(pkg %in% installed.packages()[,"Package"])]
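
The check above can be wrapped in a small helper so that only the missing packages are installed. This is a minimal sketch, not part of the original notebook: the function name `missing_packages` is hypothetical, and the install call is left commented out, in keeping with the licensing note in this notebook.

```r
# Hypothetical helper: return the entries of `pkgs` that are not installed.
missing_packages <- function(pkgs) {
    installed <- installed.packages()[, "Package"]
    setdiff(pkgs, installed)
}

to_install <- missing_packages(c("stats", "tidyverse", "mclust"))
to_install

# Uncomment to install only the missing packages
# if (length(to_install) > 0) {
#     install.packages(to_install, repos = "https://cloud.r-project.org")
# }
```

Using `setdiff` keeps the result free of duplicates even if a package is present in more than one library path.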

### Install additional packages

Uncomment the install commands if you are comfortable with the library license and want to install the library and run the parts of the notebook that depend on it.

In [None]:
#install.packages(c("tidyverse"), repos = "https://cloud.r-project.org")

### Load libraries (installed packages)

In [None]:
library(tidyverse, quietly = TRUE)

### Install packages from GitHub repositories

Uncomment the install commands if you are comfortable with the library license and want to install the library and run the parts of the notebook that depend on it.

In [None]:
#remotes::install_github("rstudio/shiny")
library(shiny, quietly = TRUE)

### Make a SNAPSHOT of your installed R packages 

Once you have installed the packages you want to reuse in your next session, you can create a snapshot of the environment. A snapshot can be loaded at JupyterLab startup and carries any additional packages installed on this worker. See the [documentation](https://documentation.dnanexus.com/user/jupyter-notebooks#environment-snapshots) for more details.

## Using `bash` from the R kernel <a name="bash"/>

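`system()` lets you execute bash commands from the R kernel. Setting `intern = TRUE` returns the command's output as an R character vector instead of only printing it.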

In [None]:
# View current directory in the UKB RAP project
system("dx pwd", intern = TRUE)
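
By default, `system(..., intern = TRUE)` only warns when a command exits with a non-zero status and records the status in the `"status"` attribute of the result. The sketch below shows one way to turn that into an R error; the helper name `run_cmd` is hypothetical and not part of the original notebook.

```r
# Hypothetical helper: run a shell command, return its output as a character
# vector, and raise an R error if the command exits with a non-zero status.
run_cmd <- function(cmd) {
    out <- suppressWarnings(system(cmd, intern = TRUE))
    status <- attr(out, "status")
    if (!is.null(status) && status != 0) {
        stop(sprintf("Command failed with exit status %s: %s", status, cmd))
    }
    out
}

run_cmd("echo 'This is a test'")
```

The same wrapper could be used for the `dx` commands in this notebook, for example `run_cmd("dx pwd")`, so that a failed command stops the cell instead of passing silently.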

### Create a test file and read it, as if from a bash terminal

In [None]:
system("echo 'This is a test' > test.txt", intern = TRUE)
system("head test.txt", intern = TRUE)

In [None]:
# Upload file to UKB RAP
system("dx upload test.txt")

In [None]:
# List all files and folders in the current directory in the UKB RAP project
system("dx ls", intern = TRUE)

In [None]:
# Remove test.txt file from UKB RAP
system("dx rm test.txt")

## Uploading data to the UKB RAP <a name="upload"/>

This example uses public data bundled with the `MendelianRandomization` R package.

In [None]:
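library(MendelianRandomization)

# Genetic associations with LDL-c, HDL-c and triglycerides (example data from the package)
betas <- cbind(ldlc, hdlc, trig) %>% as.data.frame()
# Standard errors of those associations
betases <- cbind(ldlcse, hdlcse, trigse) %>% as.data.frame()

# Combine estimates and standard errors, then add an ID column for each SNP
snp_df <- cbind(betas, betases)
snp_df$id <- paste0("snp_", seq_len(nrow(snp_df)))

head(snp_df)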

In [None]:
snp_df %>% write_csv("snp_df.csv")

In [None]:
# Remove any previous version of snp_df from UKB RAP if it exists
system("dx rm snp_df.csv")

In [None]:
system("dx upload snp_df.csv")

## Read data into R <a name="read"/>

### Option 1: Download a file from RAP project storage to the JupyterLab environment and load it into the session

In [None]:
system("dx download '/Showcase metadata/field.tsv'", intern = TRUE)

In [None]:
field_info <- read.table("field.tsv", sep = "\t", header = TRUE, fill = TRUE)
head(field_info)

### Option 2: Stream a file directly from the project into an R data frame with `dxfuse`

[`dxfuse`](https://github.com/dnanexus/dxfuse) is a FUSE filesystem that gives users access to the DNAnexus storage system.

Use this option when there is no need to download files to the local environment of this worker. It is recommended for larger files.

*Notes*:
- The `dxfuse` mount is read-only.
- After mounting, the file system structure remains fixed. Changes made externally in the project (e.g. a new file uploaded to the project) are not reflected on the local worker.

In [None]:
field_info <- read.table("/mnt/project/Showcase metadata/field.tsv", sep = "\t", header = TRUE, fill = TRUE)
head(field_info)
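
Because the mount does not pick up files added after mounting, a read can fail simply because the path is not visible on the worker. The sketch below checks for the file first and fails with a clearer message; the helper name `read_tsv_checked` is hypothetical and the `read.table` arguments are copied from the cell above.

```r
# Hypothetical helper: read a tab-separated file, failing early with a clear
# message if the path is not visible (for example, not present in the mount).
read_tsv_checked <- function(path) {
    if (!file.exists(path)) {
        stop("File not found: ", path,
             " (if it was uploaded after mounting, download it with dx instead)")
    }
    read.table(path, sep = "\t", header = TRUE, fill = TRUE)
}

# Path taken from the cell above; uncomment when running on the worker
# field_info <- read_tsv_checked("/mnt/project/Showcase metadata/field.tsv")
```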