# nbdocker: quick demo
 
### By: Jiaming Hu, Ling-Hong Hung and Ka Yee Yeung. 

Institute of Technology, University of Washington Tacoma, WA 98402, USA

This is a quick demo of nbdocker, showing the feasibility of embedding a Docker container inside a Jupyter notebook.

## Introduction

In this quick demo, we will download fastq files from the NCBI Sequence Read Archive (SRA) and then perform quality assessment using an R/Bioconductor package called ShortRead.

**This demo shows the following:**
1. Docker containers can be embedded inside Jupyter notebooks using nbdocker. In addition, the history of the embedded containers is included in the notebook, so sharing the notebook (.ipynb) file also shares the history of Docker commands.
2. Users can run and monitor the progress of the embedded Docker containers with a single click.
3. We can mix and match computing environments and programming languages. In particular, the download step involves shell commands and a Python wrapper, while the quality control step uses R.
4. Allowing multiple computing environments facilitates collaboration and reproducibility.

## 1. Download the fastq input files from SRA 
#### Note: running this container takes ~5 minutes.

We use RNA-seq data generated by Trapnell et al. (Nat Biotechnol. 2013 Jan; 31(1); doi: 10.1038/nbt.2450), who studied the differential gene expression of lung fibroblasts in response to loss of the developmental transcription factor HOXA1. In this case study, we examine changes in gene expression in response to HOXA1 knockdown.

The fastq files are publicly available from the Sequence Read Archive (SRA) at the following link:
https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR493366

To download the files, we use the parallel-fastq-dump container. parallel-fastq-dump is a Python wrapper around the fastq-dump utility that downloads separate chunks of the files in parallel.
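
The chunking idea can be sketched in a few lines of Python. This is an illustration only, not the wrapper's actual code: it assumes the spots (reads) of a run are numbered from a first to a last ID and splits that range into contiguous pieces, one per worker. Each piece is the kind of range that fastq-dump accepts through its -N (minimum spot ID) and -X (maximum spot ID) options. The function name split_spot_range and the example numbers are hypothetical.

```python
def split_spot_range(first_spot, last_spot, n_chunks):
    """Split the inclusive range [first_spot, last_spot] into at most
    n_chunks contiguous (start, end) pieces of nearly equal size.

    Hypothetical helper for illustration; the real wrapper may
    partition the range differently.
    """
    total = last_spot - first_spot + 1
    if total <= 0 or n_chunks <= 0:
        raise ValueError("need a non-empty range and at least one chunk")

    # Never create more chunks than there are spots.
    n_chunks = min(n_chunks, total)
    base, extra = divmod(total, n_chunks)

    chunks = []
    start = first_spot
    for i in range(n_chunks):
        # The first `extra` chunks each take one leftover spot.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        chunks.append((start, end))
        start = end + 1
    return chunks


# Ten spots shared among four workers.
print(split_spot_range(1, 10, 4))
# → [(1, 3), (4, 6), (7, 8), (9, 10)]
```

Each (start, end) pair could then be handed to a separate fastq-dump process, and the per-chunk outputs concatenated in order once all workers finish.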

We assume that you have already created the "demo/fastq" directories. The fastq files are downloaded to '/home/jovyan/work/demo/fastq'.
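
If the directories do not exist yet, they can be created before running the container. The following is a minimal Python sketch, assuming it is run by a user with write access to the work directory. The helper name ensure_fastq_dir is hypothetical and is not part of nbdocker.

```python
from pathlib import Path


def ensure_fastq_dir(root):
    """Create <root>/demo/fastq if it is missing and return its path.

    Hypothetical helper for illustration; not part of nbdocker.
    """
    fastq_dir = Path(root) / "demo" / "fastq"
    # parents=True also creates "demo"; exist_ok=True makes reruns harmless.
    fastq_dir.mkdir(parents=True, exist_ok=True)
    return fastq_dir
```

For this demo the call would be ensure_fastq_dir("/home/jovyan/work"), which yields the '/home/jovyan/work/demo/fastq' path used throughout.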

**Click on the whale button** below to run this container.

{nbdocker#0}

## 2. Perform quality assessment using an R/Bioconductor package

In [43]:
# Install the "ShortRead" Bioconductor package
source('http://bioconductor.org/biocLite.R')
biocLite("ShortRead")
library(ShortRead)

Bioconductor version 3.6 (BiocInstaller 1.28.0), ?biocLite for help
A new version of Bioconductor is available after installing the most recent
  version of R; see http://bioconductor.org/install
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.6 (BiocInstaller 1.28.0), R 3.4.3 (2017-11-30).
Installing package(s) ‘ShortRead’
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Old packages: 'cluster', 'curl', 'foreign', 'MASS', 'Matrix', 'nlme', 'pbdZMQ',
  'RcppArmadillo', 'repr', 'robustbase', 'survival'


In [44]:
# list the fastq files already downloaded
list.files('/home/jovyan/work/demo/fastq')

In [46]:
# perform quality assessment on the downloaded fastq files
qaSummary <- qa('/home/jovyan/work/demo/fastq')
qaSummary

class: FastqQA(10)
QA elements (access with qa[["elt"]]):
  readCounts: data.frame(2 3)
  baseCalls: data.frame(2 5)
  readQualityScore: data.frame(1024 4)
  baseQuality: data.frame(190 3)
  alignQuality: data.frame(2 3)
  frequentSequences: data.frame(100 4)
  sequenceDistribution: data.frame(93 4)
  perCycle: list(2)
    baseCall: data.frame(1009 4)
    quality: data.frame(7338 5)
  perTile: list(2)
    readCounts: data.frame(0 4)
    medianReadQualityScore: data.frame(0 4)
  adapterContamination: data.frame(2 1)

In [48]:
# generate the assessment report
report(qaSummary, dest="/home/jovyan/work/demo")
getwd()

“'/home/jovyan/work/demo' already exists”

#### The assessment report is generated as an index.html file in the demo directory. You can click on the following link to view it in the browser.

[View the ShortRead quality assessment report](./demo/index.html)
