vignettes/rocker_setup.Rmd

---
title: "Connecting to R Studio Server via a docker on a remote server"
author: 
- name: "David Zhang"
  affiliation: UCL
date: "`r format(Sys.time(), '%d %B %Y')`"
output: 
  bookdown::html_document2:
    figure_caption: yes
    code_folding: show
    theme: spacelab
    highlight: kate
    df_print: paged
    toc: true
    toc_float: true
    number_sections: true
---

```{r setup, include = FALSE}

library(knitr)

knitr::opts_chunk$set(echo = TRUE, message = FALSE)
```

> Aim: Connect to R Studio server using docker

<br><br>

# Background

- [R Studio server](https://www.rstudio.com/products/rstudio/download-server/) allows user's to code using a convenient IDE, whilst harnessing the processing power of a computing cluster/server. 
- However, the installation and maintenance of the R Studio server software can be time-consuming. Also, the free version of R Studio server only permits a single version of `R` to be used. Assuming you are not willing to pay, this can limit analyses that depend on packages across multiple versions of R. 
- Using an docker image from the [rocker project](https://hub.docker.com/r/rocker/rstudio) with R/R Studio server pre-installed, you can easily bypass the two limitations above. A user can connect to the `rocker` image via an `ssh` tunnel, accessing R Studio server through their local web browser. Each docker image can have a unique installation of R Studio server, thus any number of `R` versions can be used. 
- The following guide with demonstrate how to install and connect to a R Studio server via a `rocker` image on a remote server. 
- This guide does **not** cover the fundamentals of docker itself and it is recommended that anyone using this guide should already have a basic proficiency with docker.  

<br><br>

# Guide

## Docker installation

You must have `docker` installed on your system. To check you have `docker` installed, you can use: 

```{bash check-docker-version, eval = FALSE}

# based on: https://www.digitalocean.com/community/questions/how-to-check-for-docker-installation
docker -v
echo $?

```

If you don't, install `docker`. A guide to installing `docker` can be found [here](https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-18-04).

<br><br>

## Download a docker image with R Studio server pre-installed

In order to use R Studio server, a `docker` image with R Studio server pre-installed must be downloaded. [Bioconductor](https://www.bioconductor.org/help/docker/) releases it's own image based on the `rocker project`, with other useful resources for analyses of biological data pre-installed, such as core Bioconductor packages. You can download the Bioconductor docker image using:

```{bash download-rocker, eval = FALSE}

# the release 3.13 version of Bioconductor rocker image is used here
# be sure to check for an updated version as and when you use this guide
sudo docker pull bioconductor/bioconductor_docker:RELEASE_3_13

```

<br><br>

## Start a R Studio server process 

Next, we will create a process running R Studio server on the Bioconductor docker image downloaded above. To do this, you need to use various flags together with the `docker` command. For convenience, I have created a `R` wrapper function to run these docker commands within the function [rutils::docker_run_rserver](https://github.com/dzhang32/rutils/blob/master/R/docker_shell.R). This calls the relevant docker commands from within `R` with arguments for the relevant flags, which are explained below. 

First, open an `R` terminal and install the `rutils` package from GitHub via:

```{r install-rutils, eval = FALSE}

# this requires R version >= 4.0
devtools::install_github("dzhang32/rutils")
```

Next, run the `rutils::docker_run_rserver()` function within the `R` terminal. At a minimum, you should set the `image`, `port` and `name` arguments explained below. Setting the `verbose` argument to TRUE will print the flags that were used within the `docker` command and can be useful for debugging or logging your session. 

```{r docker-run-rserver-ex1, eval = FALSE}

rutils::docker_run_rserver(
  image = "bioconductor/bioconductor_docker:RELEASE_3_13", # rocker image
  port = 8787, # port on which the host will have present R Studio server
  name = "example", # name of docker process
  verbose = TRUE # whether to print out the flags passed to the docker command
)
```

<br><br>

## Connecting to a R Studio server process

Now that the R Studio server process is running, you can now map the `localhost` of your local machine to the `port` on the remote server presenting R Studio server (specified above as 8787). An example `ssh` command is shown below and should be run on your **local** terminal: 

```{bash ssh-tunnel, eval = FALSE}

# make sure the port specified here matches the port you have used above in 
# rutils::docker_run_rserver()
ssh -i path_to_pem.pem \
-X -N -f -L localhost:8787:localhost:8787 \
user@ip

```

If the above `ssh` command has run successfully, you will now be able to access R Studio server by going to the address `localhost:8787` on your local browser. The default login details for the Bioconductor docker are:

Username: **rstudio**

Password: **bioc**

More details of the Bioconductor docker can be found [here](https://www.bioconductor.org/help/docker/).

<br><br>

## Mounting volumes 

Most analyses relies on data that is stored on the original host, therefore not (by default) accessible by the docker process. Therefore, it is often useful to mount the required files, allowing them to be accessible by the `docker` process. Mounting can be configured using `rutils::docker_run_rserver()` via the arguments `volumes`, `volumes_ro`. 

The user permissions for accessing the mounted `volumes` are dictated by the `USERID` and `GROUPID` arguments. These should be set matching the user you would like to mirror the permissions of. On linux, the `USERID` and `GROUPID` of the current user can be obtained via the `bash` command `id`. 

Below is an example of running `rutils::docker_run_rserver()` whilst mounting volumes: 

```{r docker-run-rserver-ex2, eval = FALSE}

# volumes - paths will be mounted with user permissions
# matching user specified by the USERID and GROUPID arguments
# volumes_ro - paths will be mounted with read-only access
rutils::docker_run_rserver(
  image = "bioconductor/bioconductor_docker:RELEASE_3_13",
  port = 8787,
  name = "example_2",
  verbose = TRUE,
  volumes = c(
    "/path/to/mounted/dir"
  ),
  volumes_ro = c(
    "/path/to/mounted/dir"
  ),
  permissions = "match",
  USERID = 1000,
  GROUPID = 1000
)
```

<br><br>

# Reproducibility

```{r reproducibility, echo = FALSE}

# Session info
library("sessioninfo")

options(width = 120)

session_info()
```