# Differential Expression of RNA-seq data in GenePattern Notebook

Compute differentially expressed genes or transcripts and visualize the results.

## Before you begin

* Sign in to GenePattern by entering your username and password into the form below. 
* Gene expression data must be in a [GCT or RES file](https://genepattern.broadinstitute.org/gp/pages/protocols/GctResFiles.html).
    * Example file: [all_aml_test.gct](https://software.broadinstitute.org/cancer/software/genepattern/data/all_aml/all_aml_test.gct).
* The class of each sample must be identified in a [CLS file](https://genepattern.broadinstitute.org/gp/pages/protocols/ClsFiles.html).
    * Example file: [all_aml_test.cls](https://software.broadinstitute.org/cancer/software/genepattern/data/all_aml/all_aml_test.cls).
* Learn more by reading about [file formats](http://www.broadinstitute.org/cancer/software/genepattern/file-formats-guide#GCT).
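
The two input formats above are plain tab- or space-delimited text, so they are easy to inspect before uploading. The following is a minimal sketch, assuming a GCT v1.2 file (a `#1.2` version line, a dimensions line, a header row, then one row per feature) and a categorical CLS file (a counts line, a `#`-prefixed class-name line, then one label per sample). The function names `read_gct` and `read_cls` and the tiny in-memory example data are hypothetical and are not part of GenePattern.

```python
import io


def read_gct(handle):
    """Parse GCT v1.2 text into (row_names, sample_names, matrix)."""
    version = handle.readline().strip()
    if version != "#1.2":
        raise ValueError(f"unsupported GCT version: {version!r}")
    n_rows, n_cols = (int(x) for x in handle.readline().split()[:2])
    header = handle.readline().rstrip("\n").split("\t")
    samples = header[2:]  # the first two columns are Name and Description
    names, matrix = [], []
    for line in handle:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        names.append(fields[0])
        matrix.append([float(v) for v in fields[2:]])
    if len(names) != n_rows or len(samples) != n_cols:
        raise ValueError("dimensions do not match the GCT header")
    return names, samples, matrix


def read_cls(handle):
    """Parse a categorical CLS file into (class_names, per-sample labels)."""
    n_samples, n_classes = (int(x) for x in handle.readline().split()[:2])
    class_names = handle.readline().split()[1:]  # drop the leading '#'
    labels = handle.readline().split()
    if len(labels) != n_samples or len(class_names) != n_classes:
        raise ValueError("counts do not match the CLS header")
    return class_names, labels


# Hypothetical two-gene, three-sample dataset used only for illustration.
gct_text = (
    "#1.2\n2\t3\n"
    "Name\tDescription\tS1\tS2\tS3\n"
    "geneA\tna\t1\t2\t3\n"
    "geneB\tna\t4\t5\t6\n"
)
cls_text = "3 2 1\n# ALL AML\n0 0 1\n"

gct_names, gct_samples, gct_matrix = read_gct(io.StringIO(gct_text))
class_names, labels = read_cls(io.StringIO(cls_text))
print(gct_names, gct_samples, class_names, labels)
```

A quick check like this catches the most common upload problem, a CLS file whose number of labels differs from the number of sample columns in the GCT file.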


In [3]:
# Requires GenePattern Notebook: pip install genepattern-notebook
import gp
import genepattern

# Username and password removed for security reasons.
genepattern.GPAuthWidget(genepattern.register_session("https://genepattern.broadinstitute.org/gp", "", ""))

## BMS Workshop data and differential expression
Because you have already used DESeq2 to compute differentially expressed genes, this notebook only needs to visualize them.

We will therefore skip the PreprocessReadCounts and DESeq2 steps. When you want to run a differential expression analysis on your own RNA-seq data, however, this notebook presents a complete workflow starting with a file of merged read counts.

<div class="alert alert-info">
<b>To continue, go to <a href="#visualizing">Visualizing Differential Expression Results</a>.</b>
</div>

## Merge read counts from RNA-seq quantitation methods
- RNA-seq quantitation methods create tab-delimited files that pair each feature identifier (e.g., the Ensembl gene ID ENSG00000000003) with its read count.
- The MergeHTSeqCounts module combines these files into a single [GCT](http://software.broadinstitute.org/cancer/software/genepattern/file-formats-guide#GCT) file for later analysis in GenePattern.
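
To make the merge step concrete, here is a minimal sketch of combining per-sample count files into GCT text. It is not the MergeHTSeqCounts implementation. It assumes two-column `htseq-count` style input, skips the summary rows that `htseq-count` prefixes with `__` (such as `__no_feature`), fills features missing from a sample with 0, and reuses the feature identifier as the GCT description. The function names, sample names, and counts are hypothetical.

```python
import io


def read_htseq_counts(handle):
    """Read 'feature<TAB>count' lines into a dict, skipping summary rows."""
    counts = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        feature, value = line.split("\t")
        if feature.startswith("__"):  # e.g. __no_feature, __ambiguous
            continue
        counts[feature] = int(value)
    return counts


def merge_to_gct(samples):
    """Merge {sample_name: {feature: count}} into GCT v1.2 text."""
    names = list(samples)
    features = sorted(set().union(*(c.keys() for c in samples.values())))
    lines = [
        "#1.2",
        f"{len(features)}\t{len(names)}",
        "\t".join(["Name", "Description"] + names),
    ]
    for feature in features:
        # Assumption: a feature absent from one sample is reported as 0.
        row = [feature, feature] + [str(samples[s].get(feature, 0)) for s in names]
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"


# Hypothetical per-sample count files, held in memory for illustration.
file_a = "ENSG00000000003\t10\nENSG00000000005\t0\n__no_feature\t7\n"
file_b = "ENSG00000000003\t3\nENSG00000000005\t8\n__no_feature\t2\n"

merged = merge_to_gct({
    "sampleA": read_htseq_counts(io.StringIO(file_a)),
    "sampleB": read_htseq_counts(io.StringIO(file_b)),
})
print(merged)
```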

In [18]:
mergehtseqcounts_task = gp.GPTask(genepattern.get_session(0), 'urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00354')
mergehtseqcounts_job_spec = mergehtseqcounts_task.make_job_spec()
mergehtseqcounts_job_spec.set_parameter("input.files", "")
mergehtseqcounts_job_spec.set_parameter("output.prefix", "<input.files_basename>")
genepattern.GPTaskWidget(mergehtseqcounts_task)

## Compute differentially expressed transcripts
- Run DESeq2 on the gene expression matrix produced in the previous step.

In [16]:
deseq2_task = gp.GPTask(genepattern.get_session(0), 'urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00362')
deseq2_job_spec = deseq2_task.make_job_spec()
deseq2_job_spec.set_parameter("input.file", "")
deseq2_job_spec.set_parameter("cls.file", "")
deseq2_job_spec.set_parameter("confounding.variable.cls.file", "")
deseq2_job_spec.set_parameter("output.file.base", "<input.file_basename>")
deseq2_job_spec.set_parameter("qc.plot.format", "skip")
deseq2_job_spec.set_parameter("fdr.threshold", "0.1")
deseq2_job_spec.set_parameter("top.N.count", "20")
deseq2_job_spec.set_parameter("random.seed", "779948241")
genepattern.GPTaskWidget(deseq2_task)
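
The `fdr.threshold` value of 0.1 in the cell above is easier to interpret with a small worked example. The sketch below assumes the threshold is compared against Benjamini-Hochberg adjusted p-values, which is DESeq2's default adjustment method. It is an independent illustration rather than the module's code, and the gene names and p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = {"geneA": 0.01, "geneB": 0.04, "geneC": 0.03, "geneD": 0.20}
genes = list(raw)
adjusted = benjamini_hochberg([raw[g] for g in genes])

fdr_threshold = 0.1  # same value as fdr.threshold in the cell above
significant = [g for g, q in zip(genes, adjusted) if q < fdr_threshold]
print(dict(zip(genes, adjusted)))
print(significant)
```

With these inputs, geneD is the only gene whose adjusted p-value stays above the threshold, so it is the only one not reported as significant.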

<a id="visualizing"></a>

# Visualizing Differential Expression Results

The ComparativeMarkerSelectionViewer allows you to view the results of a differential expression analysis as a heatmap, profile of differentially expressed genes, histogram, or list. It also includes features that allow you to filter results, zoom in and out of a section of the gene list, and export results in a number of formats.

Run the ComparativeMarkerSelectionViewer module to view the results. The viewer displays the test statistic score, its p-value, and additional statistics as computed by the differential expression method.

* Learn more by reading about the [ComparativeMarkerSelectionViewer](https://genepattern.broadinstitute.org/gp/getTaskDoc.jsp?name=ComparativeMarkerSelectionViewer) module.
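
The kind of filtering the viewer offers can also be approximated outside the viewer. The sketch below assumes an ODF results file laid out as an `ODF 1.0` line, a `HeaderLines=N` line, N header lines that include a tab-separated `COLUMN_NAMES:` entry, and then tab-separated data rows. The `read_odf` helper, the column names `Feature`, `Score`, and `FDR(BH)`, and the three example rows are assumptions made for illustration, so check them against your own file before relying on them.

```python
import io


def read_odf(handle):
    """Parse ODF text into a list of {column_name: value} dicts (all strings)."""
    magic = handle.readline().strip()
    if not magic.startswith("ODF"):
        raise ValueError(f"not an ODF file: {magic!r}")
    n_header = int(handle.readline().strip().split("=", 1)[1])
    columns = []
    for _ in range(n_header):
        line = handle.readline().rstrip("\n")
        if line.startswith("COLUMN_NAMES:"):
            columns = line[len("COLUMN_NAMES:"):].split("\t")
    if not columns:
        raise ValueError("COLUMN_NAMES header line not found")
    return [
        dict(zip(columns, line.rstrip("\n").split("\t")))
        for line in handle
        if line.strip()
    ]


# Hypothetical results file with three features.
odf_text = (
    "ODF 1.0\n"
    "HeaderLines=4\n"
    "COLUMN_NAMES:Rank\tFeature\tScore\tFDR(BH)\n"
    "COLUMN_TYPES:int\tString\tdouble\tdouble\n"
    "Model=Comparative Marker Selection\n"
    "DataLines=3\n"
    "1\tgeneA\t5.2\t0.01\n"
    "2\tgeneB\t-3.1\t0.2\n"
    "3\tgeneC\t-6.0\t0.03\n"
)

rows = read_odf(io.StringIO(odf_text))
# Keep features under the FDR cutoff, strongest absolute score first.
hits = sorted(
    (r for r in rows if float(r["FDR(BH)"]) < 0.1),
    key=lambda r: abs(float(r["Score"])),
    reverse=True,
)
print([r["Feature"] for r in hits])
```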

<div class="alert alert-info">
<h3>Instructions</h3>
<p>The ComparativeMarkerSelectionViewer requires two files:
<ul>
<li><b>comparative marker selection filename</b> - this is the results file from your differential expression analysis. We have provided the DESeq2 output in the required ODF format: <a href="https://www.broadinstitute.org/personal/michaelr/BMS_bioinformatics_bootcamp_2017/differential_expression_genes.odf">differential_expression_genes.odf</a>.</li>
<li><b>dataset filename</b> - this is the original dataset containing the expression values. We have provided the expression data in the required GCT format: <a href="https://www.broadinstitute.org/personal/michaelr/BMS_bioinformatics_bootcamp_2017/counts_for_deseq2_genes.gct">counts_for_deseq2_genes.gct</a>.</li>
</ul>
</p>
<p>To launch the viewer: <br>
(Note: if dragging a link does not work, right-click the link, save the file locally, and then upload it by clicking <b>Upload File...</b> in the corresponding input box.)
<ol>
<li>Click and drag the <a href="https://www.broadinstitute.org/personal/michaelr/BMS_bioinformatics_bootcamp_2017/differential_expression_genes.odf">differential_expression_genes.odf</a> file to the <b>comparative marker selection filename</b> input box below.</li>
<li>Click and drag the <a href="https://www.broadinstitute.org/personal/michaelr/BMS_bioinformatics_bootcamp_2017/counts_for_deseq2_genes.gct">counts_for_deseq2_genes.gct</a> file to the <b>dataset filename</b> input box below.</li>
<li>Click <i>Run</i> for the analysis below. Once the job has downloaded the necessary data, it displays a visualization of the differential expression results.</li>
</ol>
</p>
</div>

In [4]:
comparativemarkerselectionviewer_task = gp.GPTask(genepattern.get_session(0), 'urn:lsid:broad.mit.edu:cancer.software.genepattern.module.visualizer:00045')
comparativemarkerselectionviewer_job_spec = comparativemarkerselectionviewer_task.make_job_spec()
comparativemarkerselectionviewer_job_spec.set_parameter("comparative.marker.selection.filename", "")
comparativemarkerselectionviewer_job_spec.set_parameter("dataset.filename", "")
genepattern.GPTaskWidget(comparativemarkerselectionviewer_task)