<div style="padding-bottom:30px">
<a href="https://github.com/cwbeitel/inquiry"><img src="https://raw.githubusercontent.com/cwbeitel/iqassets/master/logotype_blue_small.png" style="width:100px; margin-left:0px"></img></a>
<p style="color:#9E9E9E">
<a href="https://github.com/cwbeitel/inquiry/tree/master/docs">Getting Started Guide</a> // <a href="https://goo.gl/forms/2cOmuUrQ3n3CKpim1">Documentation Feedback</a></p>
</div>

<h1 style="color:#9E9E9E">Gene expression analysis</h1>

In this notebook we perform differential expression analysis with a Cufflinks-based toolchain consisting of [cufflinks](https://cole-trapnell-lab.github.io/cufflinks/), [tophat](https://ccb.jhu.edu/software/tophat/), and [bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml). For background, see this overview of [gene expression profiling](https://en.wikipedia.org/wiki/Gene_expression_profiling).

As a reminder, we run the analysis with the Inquiry analysis toolkit, which automates and abstracts portions of the process. Read more about the toolkit and the project in general in the [Inquiry Toolkit Overview](https://medium.com/projectinquiry/project-overview-f2b9348aef9d).

<h2 style="color:#9E9E9E">Configuration and Run</h2>

The first step is to parameterize the analysis. See the [Getting Started Guide](https://github.com/cwbeitel/inquiry/tree/master/docs) for a review of the different ways workflows can be parameterized and run. We'll use the following configuration:

```json
{
  "dry_run": true,
  "_meta": {
    "workflow": "core:expression"
  },
  "ref_fasta": "gs://cflow-public/data/genomes/Drosophila_melanogaster/Ensembl/BDGP5.25/Sequence/BowtieIndex/genome.fa",
  "genes_gtf": "gs://cflow-public/data/genomes/Drosophila_melanogaster/Ensembl/BDGP5.25/Annotation/Archives/archive-2015-07-17-14-30-26/Genes/genes.gtf",
  "cond_a_pairs": [
      ["gs://cflow-public/data/rnaseq/downsampled_reads/GSM794483_C1_R1_1_small.fq",
       "gs://cflow-public/data/rnaseq/downsampled_reads/GSM794483_C1_R1_2_small.fq"],
      ["gs://cflow-public/data/rnaseq/downsampled_reads/GSM794484_C1_R2_1_small.fq",
       "gs://cflow-public/data/rnaseq/downsampled_reads/GSM794484_C1_R2_2_small.fq"],
      ["gs://cflow-public/data/rnaseq/downsampled_reads/GSM794485_C1_R3_1_small.fq",
       "gs://cflow-public/data/rnaseq/downsampled_reads/GSM794485_C1_R3_2_small.fq"]
      ],
  "cond_b_pairs": [
       ["gs://cflow-public/data/rnaseq/downsampled_reads/GSM794486_C2_R1_1_small.fq",
        "gs://cflow-public/data/rnaseq/downsampled_reads/GSM794486_C2_R1_2_small.fq"],
       ["gs://cflow-public/data/rnaseq/downsampled_reads/GSM794487_C2_R2_1_small.fq",
        "gs://cflow-public/data/rnaseq/downsampled_reads/GSM794487_C2_R2_2_small.fq"],
       ["gs://cflow-public/data/rnaseq/downsampled_reads/GSM794488_C2_R3_1_small.fq",
        "gs://cflow-public/data/rnaseq/downsampled_reads/GSM794488_C2_R3_2_small.fq"]
       ]
}
```
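
Because the configuration lists many long paths by hand, a quick sanity check before submitting can catch mistakes such as a dropped `/` or an incomplete read pair. The following is a minimal sketch, not part of the toolkit: `validate_config` is a hypothetical helper, the bucket paths in the example are made up, and the rule that both mates of a pair live in the same directory is an assumption based on the layout used above.

```python
def validate_config(config):
    """Return a list of problems found in an expression workflow config."""
    problems = []

    # Reference inputs should each be a single Cloud Storage path.
    for key in ("ref_fasta", "genes_gtf"):
        value = config.get(key)
        if not isinstance(value, str) or not value.startswith("gs://"):
            problems.append(f"{key}: expected a gs:// path, got {value!r}")

    # Each condition should list paired-end read files, two per replicate.
    for key in ("cond_a_pairs", "cond_b_pairs"):
        pairs = config.get(key) or []
        if not pairs:
            problems.append(f"{key}: no read pairs given")
        for i, pair in enumerate(pairs):
            if len(pair) != 2:
                problems.append(f"{key}[{i}]: expected 2 files, got {len(pair)}")
                continue
            for path in pair:
                if not path.startswith("gs://"):
                    problems.append(f"{key}[{i}]: not a gs:// path: {path}")
            # Assumption: mates sit in the same directory, so a mismatch
            # usually means a separator was dropped or a path was mistyped.
            dirs = {path.rpartition("/")[0] for path in pair}
            if len(dirs) > 1:
                problems.append(f"{key}[{i}]: mates are in different directories")
    return problems


# Hypothetical example: the second pair is missing a "/" before the file name.
example = {
    "ref_fasta": "gs://example-bucket/genome.fa",
    "genes_gtf": "gs://example-bucket/genes.gtf",
    "cond_a_pairs": [["gs://example-bucket/reads/a_1.fq",
                      "gs://example-bucket/reads/a_2.fq"]],
    "cond_b_pairs": [["gs://example-bucket/reads/b_1.fq",
                      "gs://example-bucket/readsb_2.fq"]],
}
for problem in validate_config(example):
    print(problem)
```

An empty list means none of these checks failed; it does not confirm that the files actually exist in cloud storage.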

With this configuration saved to a file on our local filesystem (named `config.yaml` here), we can submit a gene expression analysis run with the following command:

In [None]:
%%bash
iqtk run expression config.yaml
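
The configuration shown earlier is JSON, while the command is passed a file named `config.yaml`. JSON of this kind (strings, booleans, lists, and objects) is also valid YAML, so one way to produce the file is to dump the dictionary as JSON text under that name. The sketch below assumes `iqtk` reads the file with an ordinary YAML parser, which this guide does not confirm; `write_config` is a hypothetical helper, not part of the toolkit.

```python
import json
import tempfile
from pathlib import Path


def write_config(config, path="config.yaml"):
    """Write config as JSON text (also valid YAML) and return the path written."""
    path = Path(path)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return path


config = {
    "dry_run": True,  # same flag as in the configuration above
    "_meta": {"workflow": "core:expression"},
    # The remaining keys (ref_fasta, genes_gtf, cond_a_pairs, cond_b_pairs)
    # would be added here exactly as in the configuration above.
}

# Written to a temporary directory here so the example leaves no files behind;
# in practice the file would go in the directory where iqtk is run.
with tempfile.TemporaryDirectory() as tmp:
    written = write_config(config, Path(tmp) / "config.yaml")
    print(written.read_text())
```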

<h2 style="color:#9E9E9E">Exploring the data</h2>

We'll start by copying the table of differential expression test results (`gene_exp.diff`, written by the cuffdiff step) from cloud storage and loading it with pandas.

In [None]:
%%bash
gsutil cp gs://cflow-runs/output/qexpression-20170410074710/cuffdiff-20170410161930/gene_exp.diff ./

In [2]:
import pandas
# gene_exp.diff is tab-separated, which is the default delimiter for read_table.
t = pandas.read_table('gene_exp.diff')

In [3]:
t

| | test_id | gene_id | gene | locus | sample_1 | sample_2 | status | value_1 | value_2 | log2(fold_change) | test_stat | p_value | q_value | significant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | XLOC_000001 | XLOC_000001 | CG11023 | 2L:7528-9484 | C1 | C2 | OK | 0.946358 | 0.946358 | 0.000000e+00 | 0.000000e+00 | 1.00000 | 1 | no |
| 1 | XLOC_000002 | XLOC_000002 | dbr | 2L:67043-71390 | C1 | C2 | OK | 7.169140 | 7.169140 | 0.000000e+00 | 0.000000e+00 | 1.00000 | 1 | no |
| 2 | XLOC_000003 | XLOC_000003 | galectin | 2L:72387-76211 | C1 | C2 | OK | 49.240000 | 49.240000 | 0.000000e+00 | 0.000000e+00 | 1.00000 | 1 | no |
| 3 | XLOC_000004 | XLOC_000004 | CG11374 | 2L:76445-77639 | C1 | C2 | NOTEST | 0.302101 | 0.302101 | 0.000000e+00 | 0.000000e+00 | 1.00000 | 1 | no |
| 4 | XLOC_000005 | XLOC_000005 | CG11376 | 2L:94751-102086 | C1 | C2 | OK | 3.410510 | 3.410510 | 0.000000e+00 | -1.338090e-15 | 1.00000 | 1 | no |
| 5 | XLOC_000006 | XLOC_000006 | CG11377 | 2L:102381-106718 | C1 | C2 | OK | 31.938800 | 31.938800 | 0.000000e+00 | 3.620090e-15 | 1.00000 | 1 | no |
| 6 | XLOC_000007 | XLOC_000007 | M(2)21AB | 2L:106902-114433 | C1 | C2 | OK | 151.149000 | 151.149000 | 0.000000e+00 | 1.027710e-14 | 1.00000 | 1 | no |
| 7 | XLOC_000008 | XLOC_000008 | Gs1 | 2L:132059-134472 | C1 | C2 | OK | 22.949400 | 22.949400 | 0.000000e+00 | 0.000000e+00 | 1.00000 | 1 | no |
| 8 | XLOC_000009 | XLOC_000009 | CG11454 | 2L:143308-144227 | C1 | C2 | OK | 33.619100 | 33.619100 | -8.881780e-16 | 0.000000e+00 | 0.99035 | 1 | no |
| 9 | XLOC_000010 | XLOC_000010 | CG11455,CG3436 | 2L:155332-157666 | C1 | C2 | OK | 224.723000 | 224.723000 | 0.000000e+00 | 0.000000e+00 | 1.00000 | 1 | no |


<h3 style="color:#9E9E9E">References</h3>

1. Trapnell, Cole, et al. "Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks." Nature protocols 7.3 (2012): 562-578.
2. Trapnell, Cole, Lior Pachter, and Steven L. Salzberg. "TopHat: discovering splice junctions with RNA-Seq." Bioinformatics 25.9 (2009): 1105-1111.
3. Langmead, Ben, and Steven L. Salzberg. "Fast gapped-read alignment with Bowtie 2." Nature methods 9.4 (2012): 357-359.

<h3 style="color:#9E9E9E">Contact</h3>

Want to get in touch? You can [provide feedback](https://goo.gl/forms/2cOmuUrQ3n3CKpim1) regarding this or other documentation,
[reach out to us](https://goo.gl/forms/j8FWdNJqABAoJvcW2) regarding collaboration, or [request a new feature or analytical capability](https://goo.gl/forms/dQm3SDcoNZsV7AAd2). We're looking forward to hearing from you!

<div style="padding-top: 30px">
<p style="color:#9E9E9E; text-align:center">This notebook was prepared for <a href="https://github.com/cwbeitel/inquiry">Project Inquiry</a> in support of the research mission of the Joint BioEnergy Institute (JBEI). Learn more at https://www.jbei.org/.</p>
<p style="color:#9E9E9E; text-align:center">The Joint BioEnergy Institute is a program of the U.S. Department of Energy Office of Science.</p>
<p style="color:#9E9E9E; text-align:center">© Regents of the University of California, 2017. Licensed under a BSD-3 <a href="https://github.com/cwbeitel/inquiry/blob/master/LICENSE">license</a>.</p>
<img src="https://raw.githubusercontent.com/cwbeitel/iqassets/master/logotype_blue_small.png" style="width:100px"></img>
</div>