# Plotting challenge assignment
This is a challenge assignment for plotting several different facets of RNA-seq data.
The outline here provides some of the framework for generating your output plots, but it contains several fatal flaws.
Hint: one flaw will be installing the various modules/libraries and using R within Jupyter.
- Plotly in Jupyter, and making it render. The Plotly install guides will help.
- R in Jupyter (rpy2, IRkernel)
- Anaconda will help make this easier as well

You will also need to use your data wrangling skills to combine the appropriate files into input files.
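
As a rough illustration of that wrangling step, here is a minimal sketch that combines per-sample result tables into one lookup keyed by gene. The function name `merge_on_gene`, the sample labels, and the in-memory values are made up for this example; the `GeneID` and `log2FoldChange` column names are the ones used later in this notebook, and your own files may differ.

```python
import csv
import io

def merge_on_gene(tables, key="GeneID", value="log2FoldChange"):
    """Combine several per-sample tables into {gene: {sample: value}}."""
    merged = {}
    for sample, handle in tables.items():
        for row in csv.DictReader(handle):
            merged.setdefault(row[key], {})[sample] = float(row[value])
    return merged

# Tiny in-memory stand-ins for real result files (made-up values).
sample_a = io.StringIO("GeneID,log2FoldChange\nG1,2.0\nG2,-1.5\n")
sample_b = io.StringIO("GeneID,log2FoldChange\nG1,0.5\nG3,3.0\n")

merged = merge_on_gene({"A": sample_a, "B": sample_b})

# Keep only genes that were reported in every sample.
shared = {gene: vals for gene, vals in merged.items() if len(vals) == 2}
print(shared)  # {'G1': {'A': 2.0, 'B': 0.5}}
```

With real files you would pass open file handles instead of the `io.StringIO` objects.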

## Volcano plot
Volcano plots are a staple for RNA-seq workflows. They show the spread of differentially expressed genes in terms of fold change (or log fold change) as well as the p-value. 

Use R within Jupyter to plot out the log2FoldChange vs pvalue for all genes in the comparisons. There should be one volcano plot per sample that is compared to normals. 

In [None]:
import rpy2.robjects as robjects

%load_ext rpy2.ipython

In [None]:
%%R
res <- read.csv("data.csv", header=TRUE, row.names=1) # You'll need to add in some data for this to work!

# Make a basic volcano plot
with(res, plot(log2FoldChange, -log10(pvalue), pch=20, 
               main="Example Volcano", 
               xlim=c(-10,10), 
               ylim=c(0,100),cex=0.5))#change xlim or ylim to frame it out. cex is dot size  

# Add colored points: red if padj<0.05, orange if |log2FC|>1, green if both
with(subset(res, padj<.05 ), points(log2FoldChange, -log10(pvalue), pch=20, col="red",cex=0.5))
with(subset(res, abs(log2FoldChange)>1), points(log2FoldChange, -log10(pvalue), pch=20, col="orange",cex=0.5))
with(subset(res, padj<.05 & abs(log2FoldChange)>1), points(log2FoldChange, -log10(pvalue), pch=20, col="green",cex=0.5))
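
The coloring rules in the R cell above can also be written out in plain Python, which may help when checking which genes should end up in which color. This is only a sketch of the same logic: the function name `volcano_point` and the default cutoffs as keyword arguments are assumptions for illustration, and "black" stands in for the default color of the base plot.

```python
import math

def volcano_point(log2fc, pvalue, padj, alpha=0.05, fc_cutoff=1.0):
    """Return (x, y, color) for one gene, mirroring the R color rules."""
    significant = padj < alpha
    large_change = abs(log2fc) > fc_cutoff
    if significant and large_change:
        color = "green"    # passes both cutoffs
    elif large_change:
        color = "orange"   # large fold change only
    elif significant:
        color = "red"      # significant only
    else:
        color = "black"    # neither, left as the base plot color
    # x is the log2 fold change, y is -log10 of the raw p-value
    return log2fc, -math.log10(pvalue), color

# Made-up example genes
print(volcano_point(2.5, 0.0001, 0.001))
print(volcano_point(0.2, 0.5, 0.8))
```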

## Heatmaps
Heatmaps are another essential facet of RNA-seq analysis. They let you compare samples using either the raw hit count data or the log fold change. They also let you focus in on specific genes for local comparisons.

### Heatmap based on hit-counts
For the first heatmap, use R within Jupyter again. Make a global heatmap showing all of the genes based on hit counts.
The best way would be to use the heatmap.2 function from the gplots package.

In [None]:
%%R
library(pheatmap);library(gplots);library(readr);library(ggplot2)
data <- read.csv('data.csv', header=TRUE, row.names=1) # make sure this data is just Gene in the first column followed by the columns of hit counts
heatmap.2(as.matrix(data), 
          scale=c('row'), 
          trace='none', 
          col='bluered',
          Colv=FALSE,
          Rowv=FALSE,
          cexRow=.75)
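
To see what the scale='row' option is doing to the hit counts, here is a small Python sketch of row scaling: each gene's row is centered on its mean and divided by its sample standard deviation, so genes with very different count levels become comparable. The function name `scale_rows` and the counts are made up, and returning zeros for a constant row is a choice made for this sketch only.

```python
from statistics import mean, stdev

def scale_rows(matrix):
    """Center each row on its mean and divide by its sample standard deviation."""
    scaled = []
    for row in matrix:
        mu, sd = mean(row), stdev(row)
        # A constant row has sd == 0; return zeros rather than divide by zero.
        scaled.append([(v - mu) / sd if sd else 0.0 for v in row])
    return scaled

# Two made-up genes with the same pattern at very different count levels
counts = [[10, 20, 30], [1000, 2000, 3000]]
for row in scale_rows(counts):
    print(row)
```

Both rows come out with the same scaled pattern, which is why a row-scaled heatmap shows relative changes per gene rather than absolute abundance.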

### Heatmap based on log2FoldChange and filtered
Now to mix it up again: let's do this only in Python. But not just any Python: let's make this an interactive plot using the Plotly module. This will take some special steps to install and get working.

For this one, since I am providing a full function for plotting, please annotate what each numbered step does by writing a comment next to the matching number in the markdown space provided afterwards. I've provided number one as an example.

In [None]:
import csv
import plotly.graph_objects as go

In [None]:
def heatmapper(csvlist, genelist):
    names=[]
    for sheet in csvlist:
        name=str(sheet).split('ID')[1].rsplit('.csv', 1)[0] #1
        names.append(name) 
    #List of gene log2fc values in order of x list per gene, in gene list order     
    bigz=[]

    #List of genes
    bigy=[]

    #Get the information from each sheet and parse it up
    for gene in list(genelist): #iterate over a copy, since genes may be removed from genelist below
        lilz=[]
        for sheet in csvlist:
            with open(sheet) as chart:
                reader=csv.DictReader(chart, delimiter=',') #2
                for row in reader:
                    if row['GeneID'].strip("[]'")==gene:
                        lilz.append(row['log2FoldChange'])
            chart.close()
        if len(lilz)>(len(csvlist)-1): #3
            bigz.append(lilz) #4
            bigy.append(str(gene)) #5
        else:
            genelist.remove(gene)
            print(str(gene)+' had an error and was removed')
        
    fig=go.FigureWidget(
        data=[
            dict(
            type='heatmap',
            z=bigz, #6
            x=names, #7
            y=bigy, #8
            colorscale='bluered',
            zmid=0, #9
            colorbar=dict(
                title=dict(text='Log2FC', side='top'),
                ),
            )
        ]
    )
    fig.update_layout(
        title='Log2FC heatmap',
        xaxis_title='Samples',
        yaxis_title='Genes',
        )

    return fig #10

In [None]:
#How about you make a quick function to generate the python list "genelist" as an input for the heatmap.
#Pull the top 5 significant genes per sample. 
#This can be done manually if you get stuck. 
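
If you do get stuck, one possible shape for such a function is sketched below. It is not the only way to do it: the name `top_genes`, ranking by the `padj` column, and skipping rows where padj is blank or "NA" are all assumptions for this example, so adjust them to match your own files.

```python
import csv
import io

def top_genes(handles, n=5, key="GeneID", stat="padj"):
    """Collect the n genes with the smallest adjusted p-value from each table."""
    genelist = []
    for handle in handles:
        # Skip rows with no usable adjusted p-value (assumed to be '' or 'NA').
        rows = [r for r in csv.DictReader(handle) if r[stat] not in ("", "NA")]
        rows.sort(key=lambda r: float(r[stat]))
        for row in rows[:n]:
            if row[key] not in genelist:  # avoid duplicates across samples
                genelist.append(row[key])
    return genelist

# Made-up tables standing in for real per-sample result files
s1 = io.StringIO("GeneID,padj\nG1,0.01\nG2,NA\nG3,0.001\n")
s2 = io.StringIO("GeneID,padj\nG3,0.02\nG4,0.0001\n")
print(top_genes([s1, s2], n=2))  # ['G3', 'G1', 'G4']
```

With real data you would pass open file handles for each results file and leave n at 5.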

In [None]:
heatmap=heatmapper(['data1.csv','data2.csv','data3.csv'],genelist)
#FYI: you then need to call heatmap to make it appear

#### Markdown output for the individual comments
#1 : Since each input csv file has the naming convention "diff-exprID{sample_name}.csv" from my pipeline, I split the string of the input csv file by the ID string, then capture just the part after ID and remove the .csv. This captures just the sample name to parse into a list for later labeling on the heatmap itself. 
#2
#3
#4
#5
#6
#7
#8
#9
#10
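
The filename parsing described in #1 can be sketched on its own as follows. The helper name `sample_name` and the example filenames are made up for illustration. The sketch slices off the literal ".csv" suffix, because `str.rstrip('.csv')` treats its argument as a set of characters to strip rather than as a suffix.

```python
def sample_name(path):
    """Extract the sample name from a 'diff-exprID{sample_name}.csv' path."""
    after_id = str(path).split("ID")[1]
    # Slice off the literal '.csv' suffix; rstrip('.csv') would instead strip
    # any trailing '.', 'c', 's', or 'v' characters, eating into some names.
    return after_id[:-len(".csv")] if after_id.endswith(".csv") else after_id

print(sample_name("diff-exprIDtumor1.csv"))  # tumor1
print(sample_name("diff-exprIDcells.csv"))   # cells
print("cells.csv".rstrip(".csv"))            # cell
```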

## Venn Diagram with matplotlib
Venn diagrams are also a very useful tool for comparing different samples and visualizing what is unique or common between them.
If you have already completed the common/unique notebook, this should be much easier to do, so it doesn't have as much guidance as the others.

And to mix it up, we're using a 4th method: Python using matplotlib.

In [None]:
import matplotlib.pyplot as plt
from matplotlib_venn import venn3
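
Before drawing anything, it can help to work out the numbers that belong in each region of the diagram. Here is a minimal sketch using plain Python sets; the function name `venn3_regions` and the gene sets are made up. As far as I know, the region order used here matches the order venn3 expects when given a tuple of subset sizes, and venn3 can also be given the three sets directly, but check the matplotlib_venn documentation for your version.

```python
def venn3_regions(a, b, c):
    """Return the seven region sizes for three sets.

    Order: only-A, only-B, A&B only, only-C, A&C only, B&C only, A&B&C.
    """
    a, b, c = set(a), set(b), set(c)
    return (
        len(a - b - c),
        len(b - a - c),
        len((a & b) - c),
        len(c - a - b),
        len((a & c) - b),
        len((b & c) - a),
        len(a & b & c),
    )

# Made-up lists of significant genes for three samples
sample1 = {"G1", "G2", "G3", "G4"}
sample2 = {"G3", "G4", "G5"}
sample3 = {"G4", "G6"}
print(venn3_regions(sample1, sample2, sample3))  # (2, 1, 1, 1, 0, 0, 1)
```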

# Assignment
Generate each of these figures inline and produce this document as both HTML and PDF.
Overall this assignment will test several facets we have been reviewing so far. 