# Heatmap of differentially expressed genes in an RNA-seq dataset

This notebook closely follows [this online tutorial](https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/rna-seq-viz-with-heatmap2/tutorial.html#create-heatmap-of-top-de-genes). The datasets are available [here](https://zenodo.org/record/2529926#.YwaRPfHMIas) and should be downloaded into the ```data/heatmap``` directory (which you may have to create).


*Note:* this Jupyter notebook was converted to an R Markdown file using the following command in R:

```nb_rmd = rmarkdown:::convert_ipynb('heatmap_r.ipynb')```

___
*Author : Aaron M Geller, Aug. 2022*

In [1]:
# import the necessary libraries
#library(dplyr)

library(ggplot2)

## 1. Read in the DE results data using ```read.table```.

In [2]:
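# quote='' reads apostrophes in gene names (e.g. 3' or 5') literally; with the default quote characters this call warned
# "EOF within quoted string" and "number of items read is not a multiple of the number of columns", most likely because rows were merged.
DE_results <- read.table('data/heatmap/limma-voom_luminalpregnant-luminallactate.tsv', sep='\t', header=TRUE, quote='')
head(DE_results)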


| | ENTREZID | SYMBOL | GENENAME | logFC | AveExpr | t | P.Value | adj.P.Val |
|---|---|---|---|---|---|---|---|---|
| | `<int>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | 12992 | Csn1s2b | casein alpha s2-like B | -8.603611 | 3.56295 | -43.7965 | 3.83065e-15 | 6.053959e-11 |
| 2 | 13358 | Slc25a1 | solute carrier family 25 (mitochondrial carrier, citrate transporter), member 1 | -4.124175 | 5.7796989 | -29.90785 | 1.758595e-13 | 1.389642e-09 |
| 3 | 11941 | Atp2b2 | ATPase, Ca++ transporting, plasma membrane 2 | -7.386986 | 1.2821431 | -27.8195 | 4.836363e-13 | 2.4328e-09 |
| 4 | 20531 | Slc34a2 | solute carrier family 34 (sodium phosphate), member 2 | -4.177812 | 4.278629 | -27.07272 | 6.157428e-13 | 2.4328e-09 |
| 5 | 100705 | Acacb | acetyl-Coenzyme A carboxylase beta | -4.31432 | 4.4409137 | -25.22357 | 1.499977e-12 | 4.741129e-09 |
| 6 | 13645 | Egf | epidermal growth factor | -5.362664 | 0.7359047 | -24.5993 | 2.116244e-12 | 5.574188e-09 |
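
By default, `read.table` treats both `"` and `'` as quote characters, so a single apostrophe in a free-text column such as GENENAME can open a quoted string that never closes. The sketch below uses a small made-up file to show that `quote = ""` reads such a field literally. The gene names, values, and the `toy_tsv` variable are hypothetical and are not taken from the dataset.

```r
# Write a tiny tab-separated file; all names and values are made up
tmp <- tempfile(fileext = ".tsv")
writeLines(c(
  "SYMBOL\tGENENAME\tlogFC",
  "GeneA\thypothetical 5' nucleotidase\t-1.2",
  "GeneB\thypothetical kinase\t0.8",
  "GeneC\thypothetical transporter\t2.1"
), tmp)

# quote = "" disables quote handling, so the apostrophe is kept as plain text
toy_tsv <- read.table(tmp, sep = "\t", header = TRUE, quote = "",
                      stringsAsFactors = FALSE)

nrow(toy_tsv)        # number of data rows read
toy_tsv$GENENAME[1]  # the field that contains the apostrophe

unlink(tmp)  # remove the temporary file
```

A quick sanity check on the real file is to compare `nrow(DE_results)` with the number of data lines in the TSV (for example, `length(readLines(...)) - 1` on the same path). If the two differ, some rows were probably merged or dropped during parsing.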


## 2. Filter the results to only include the most significant genes.

### 2.1. Set a threshold for ```adj.P.Val < 0.01``` and ```abs(logFC) > 0.58``` (about a 1.5-fold change).

In [3]:
# keep genes that pass both the adjusted p-value and the absolute log fold-change cutoffs
df <- DE_results[which(DE_results$adj.P.Val < 0.01 & abs(DE_results$logFC) > 0.58), ]
head(df)

| | ENTREZID | SYMBOL | GENENAME | logFC | AveExpr | t | P.Value | adj.P.Val |
|---|---|---|---|---|---|---|---|---|
| | `<int>` | `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | 12992 | Csn1s2b | casein alpha s2-like B | -8.603611 | 3.56295 | -43.7965 | 3.83065e-15 | 6.053959e-11 |
| 2 | 13358 | Slc25a1 | solute carrier family 25 (mitochondrial carrier, citrate transporter), member 1 | -4.124175 | 5.7796989 | -29.90785 | 1.758595e-13 | 1.389642e-09 |
| 3 | 11941 | Atp2b2 | ATPase, Ca++ transporting, plasma membrane 2 | -7.386986 | 1.2821431 | -27.8195 | 4.836363e-13 | 2.4328e-09 |
| 4 | 20531 | Slc34a2 | solute carrier family 34 (sodium phosphate), member 2 | -4.177812 | 4.278629 | -27.07272 | 6.157428e-13 | 2.4328e-09 |
| 5 | 100705 | Acacb | acetyl-Coenzyme A carboxylase beta | -4.31432 | 4.4409137 | -25.22357 | 1.499977e-12 | 4.741129e-09 |
| 6 | 13645 | Egf | epidermal growth factor | -5.362664 | 0.7359047 | -24.5993 | 2.116244e-12 | 5.574188e-09 |
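
To make the two cutoffs concrete, here is a minimal sketch of the same filter on a toy data frame. The gene names and values are made up, and `toy`, `toy_sig`, `p_cutoff`, and `fc_cutoff` are illustrative names that do not appear in the notebook. limma reports `logFC` on a log2 scale, so a cutoff of 0.58 corresponds to roughly a 1.5-fold change in either direction.

```r
# Toy stand-in for DE_results; all values are made up for illustration
toy <- data.frame(
  SYMBOL    = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE"),
  logFC     = c(-2.5, 0.30, 1.10, -0.70, 0.90),
  adj.P.Val = c(1e-6, 1e-4, 0.005, 0.200, NA),
  stringsAsFactors = FALSE
)

p_cutoff  <- 0.01
fc_cutoff <- 0.58  # log2 scale

# Convert the log2 cutoff back to a plain fold change
2^fc_cutoff

# A gene is kept only if it passes both tests
keep <- toy$adj.P.Val < p_cutoff & abs(toy$logFC) > fc_cutoff

# which() drops rows where the test evaluates to NA (GeneE here)
toy_sig <- toy[which(keep), ]
toy_sig$SYMBOL

# Count how many retained genes go down versus up
table(direction = ifelse(toy_sig$logFC > 0, "up", "down"))
```

Wrapping the condition in `which()`, as the notebook does, matters when `adj.P.Val` contains missing values: indexing with a logical vector that contains `NA` would otherwise return rows filled with `NA`.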
