<a href="https://colab.research.google.com/github/djgarayb/RNA-Seq_introduction/blob/master/S5_DGE_Edited.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Link your Google Drive

In [0]:
from google.colab import drive
drive.mount('/content/drive')

## Install packages

In [0]:
!Rscript -e "options(Ncpus = 2)" -e "install.packages('devtools')" -e "install.packages('BiocManager')" -e "BiocManager::install(c('edgeR','limma','tximport','biomaRt', 'dplyr', 'tidyverse','ensembldb','EnsDb.Hsapiens.v86','rhdf5','genefilter'))" -e "devtools::install_github('pachterlab/sleuth')"

In [0]:
!Rscript -e "options(Ncpus = 2)" -e "install.packages('RVenn')"

In [0]:
!Rscript -e "install.packages('gplots')"


In [0]:
from rpy2.robjects.packages import importr
utils = importr('utils')
%load_ext rpy2.ipython

# Set the working directory

In [0]:
%%R
setwd("/content/drive/My Drive/kalisto_results")

In [0]:
%ls

# Load packages

In [0]:
%%R
# Load all the R libraries we will be using in the notebook
library(tximport)
library(biomaRt)
library(Biobase)
library(ggplot2)
library(dplyr)
library(tidyverse) 
library(Biostrings)
library(ensembldb)
library(EnsDb.Hsapiens.v86) 
library(rhdf5)
library(genefilter)
library(RColorBrewer) 
library(reshape2)
library(edgeR)
library(matrixStats) 
library(sleuth)
library(RVenn)
library(gplots)

# Sleuth

In [0]:
%ls

## Load or create metadata


### Load your ready metadata 

In [0]:
%%R
metadata <- read.csv ('Metadata_susp.csv', header=TRUE)
#metadata <- as.data.frame(metadata)
metadata

### Create a new one

In [0]:
%%R
# One row per sample: sample name, kallisto output folder, group, donor
metadata <- matrix(c("sample1", "SRR6914400", "group1", "donorA",
                     "sample2", "SRR6914401", "group1", "donorB",
                     "sample3", "SRR6914402", "group2", "donorA",
                     "sample4", "SRR6914403", "group2", "donorB"),
                   ncol = 4, byrow = TRUE)
colnames(metadata) <- c("sample","folder",'group','donor')
metadata <- as.data.frame(metadata)
metadata

# Read your saved Sleuth object

In [0]:
%%R
so_GS=readRDS('SleO_Susp_paper.rds')

# Optional: Create a new Sleuth object

### Get the transcript-to-gene (TTG) mapping

In [0]:
%%R
mart <- biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl")
ttg <- biomaRt::getBM(
  attributes = c("ensembl_transcript_id_version", "transcript_version",
                 "ensembl_gene_id", "external_gene_name", "description",
                 "transcript_biotype"),  mart = mart)
ttg <- dplyr::rename(ttg, target_id = ensembl_transcript_id_version,
                     ens_gene = ensembl_gene_id, ext_gene = external_gene_name)
ttg <- dplyr::select(ttg, c('target_id', 'ens_gene', 'ext_gene'))
head(ttg)

### Generate a Sleuth Object 

In [0]:
%%R
metadata <- dplyr::mutate(metadata,
                          path = file.path(metadata$folder, "abundance.h5"))

In [0]:
%%R
head(metadata)
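
Before calling `sleuth_prep`, it can help to confirm that every path built above points at a real file. The following is a minimal Python sketch, assuming each kallisto output folder sits directly under the working directory and contains `abundance.h5`. The helper name `missing_abundance_files` is hypothetical and not part of sleuth or kallisto.

```python
import os


def missing_abundance_files(folders, base_dir="."):
    """Return the folders that lack an abundance.h5 file under base_dir."""
    missing = []
    for folder in folders:
        path = os.path.join(base_dir, folder, "abundance.h5")
        if not os.path.isfile(path):
            missing.append(folder)
    return missing


# Folder names taken from the example metadata table in this notebook.
folders = ["SRR6914400", "SRR6914401", "SRR6914402", "SRR6914403"]
print(missing_abundance_files(folders))
```

An empty list means every sample has its `abundance.h5`; any folder that is printed should be checked before building the Sleuth object.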

In [0]:
%%R
so_GS <- sleuth_prep(metadata, target_mapping = ttg, gene_mode = TRUE,
                     aggregation_column = 'ext_gene',
                     extra_bootstrap_summary = TRUE, read_bootstrap_tpm = TRUE)

In [0]:
%%R
plot_pca(so_GS, color_by = 'group', units='scaled_reads_per_base')
new_position_theme <- theme(legend.position = c(0.10, 0.90))
plot_pca(so_GS, color_by = 'group', text_labels = TRUE,units="scaled_reads_per_base") +
  new_position_theme

In [0]:
%%R
plot_loadings(so_GS, pc_input = 1, units='scaled_reads_per_base')

In [0]:
%%R
plot_bootstrap(so_GS, "FTH1", units = "scaled_reads_per_base", color_by = "group")

In [0]:
%%R
saveRDS(so_GS,"So_GS.rds")

# **Differential gene expression**

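Create the models for statistical testing. This example uses a paired design, so the reduced model is `~donor` and the full model is `~donor + group`. For an unpaired group comparison, change the reduced model to `~1` and the full model to `~group`.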

In [0]:
%%R
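# Reduced model: donor effect only
so_GS <- sleuth_fit(so_GS, ~donor, 'reduced')
# Full model: donor effect plus the group effect of interest
so_GS <- sleuth_fit(so_GS, ~donor + group, 'full')
# Likelihood ratio test comparing the two fits
so_GS <- sleuth_lrt(so_GS, 'reduced', 'full')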
models(so_GS)

**1.** **Likelihood ratio test**

The likelihood ratio test (LRT) is a statistical test of the goodness-of-fit between two models. A relatively more complex model is compared to a simpler model to see if it fits a particular dataset significantly better. If so, the additional parameters of the more complex model are often used in subsequent analyses.
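
To make the test concrete: the LRT statistic is twice the difference in log-likelihood between the full and the reduced model. Under the null hypothesis it approximately follows a chi-squared distribution whose degrees of freedom equal the number of extra parameters. The sketch below is an illustration in plain Python, not sleuth's implementation. It assumes a single extra parameter (a two-level `group` term added to `~donor`), and the log-likelihood values are made up.

```python
import math


def lrt_pvalue_1df(loglik_reduced, loglik_full):
    """Return (statistic, p-value) for an LRT with one extra parameter."""
    # Statistic: twice the gain in log-likelihood from the extra parameter.
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # Chi-squared survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    pval = math.erfc(math.sqrt(stat / 2.0))
    return stat, pval


# Made-up log-likelihoods for a single gene (assumption, for illustration).
stat, pval = lrt_pvalue_1df(loglik_reduced=-105.2, loglik_full=-102.1)
print(f"LRT statistic = {stat:.2f}, p-value = {pval:.4f}")
```

A small p-value indicates that adding `group` improves the fit more than chance alone would explain.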

In [0]:
%%R
full_results_GS_lrt <- sleuth_results(so_GS, 'reduced:full', "lrt",show_all = FALSE)

In [0]:
%%R
sleuth_significant_GS_lrt <- dplyr::filter(full_results_GS_lrt, pval <= 0.05)

In [0]:
%%R
dim(sleuth_significant_GS_lrt)

In [0]:
%%R
head(sleuth_significant_GS_lrt)

In the results table, `rss` is the residual sum of squares under the "null model".

**2.** **Wald test**

The Wald test (also called the Wald chi-squared test) is a way to find out whether the explanatory variables in a model are significant. “Significant” means that they add something to the model; variables that add nothing can be deleted without affecting the model in any meaningful way.
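
As a generic illustration (not sleuth's internal code), a Wald test for a single coefficient divides the estimate by its standard error and compares the squared ratio against a chi-squared distribution with one degree of freedom. The sketch below uses made-up numbers; the function name `wald_test` is hypothetical.

```python
import math


def wald_test(beta, se):
    """Two-sided Wald test of H0: beta = 0, returning (statistic, p-value)."""
    z = beta / se
    stat = z * z  # compared against a chi-squared distribution with 1 df
    # Equivalent two-sided normal tail probability: P(|Z| > |z|).
    pval = math.erfc(abs(z) / math.sqrt(2.0))
    return stat, pval


# Made-up effect size and standard error for one gene (assumption).
stat, pval = wald_test(beta=0.8, se=0.3)
print(f"Wald statistic = {stat:.2f}, p-value = {pval:.4f}")
```

Unlike the LRT, this test needs only the full model, and it yields a signed effect size for the tested coefficient.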

In [0]:
%%R
# 'groupuninf' must match a coefficient name listed by models(so_GS)
so_GS <- sleuth_wt(so_GS, which_beta = 'groupuninf', which_model = "full")
full_results_GS_wt <- sleuth_results(so_GS, "groupuninf", test_type = "wt", which_model = "full", show_all = FALSE, pval_aggregate = FALSE)

sleuth_significant_GS_wt <- dplyr::filter(full_results_GS_wt, pval <= 0.05, abs(b)>0.5)
head(sleuth_significant_GS_wt, 20)

In [0]:
%%R
dim(sleuth_significant_GS_wt)
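
The `abs(b) > 0.5` cutoff used above is applied to sleuth's effect size `b`. Assuming the default sleuth transformation, which works on the natural-log scale, `b` can be converted to a log2 fold change or a plain fold change as sketched below. The function names are hypothetical, and the conversion does not hold if a custom transformation function was passed to `sleuth_prep`.

```python
import math


def b_to_log2fc(b):
    """Convert a natural-log effect size to a log2 fold change."""
    return b / math.log(2.0)


def b_to_fold_change(b):
    """Convert a natural-log effect size to a fold change."""
    return math.exp(b)


cutoff = 0.5  # the |b| threshold used in the filter above
print(f"|b| > {cutoff} corresponds to |log2FC| > {b_to_log2fc(cutoff):.2f}")
print(f"that is, a fold change above {b_to_fold_change(cutoff):.2f}")
```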

In [0]:
%%R
write.table(sleuth_significant_GS_wt, file = "DEG_Suspension_paper.txt", sep = "\t", quote = FALSE)

In [0]:
%%R
plot_bootstrap(so_GS, "APLNR", units = "scaled_reads_per_base", color_by = "group")

# Save final Sleuth Object with statistics

In [0]:
%%R
saveRDS(so_GS,"so_GS_stat_test.rds")

# **Venn diagrams**

### Genes shared between the Wald test (WT) and the LRT

In [0]:
%%R
set1 <- sleuth_significant_GS_wt$target_id
set2 <- sleuth_significant_GS_lrt$target_id
EN = list(WT=set1, LRT=set2)
EN = Venn(EN)

In [0]:
%%R
ggvenn(EN)
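
The numbers shown in a two-set Venn diagram come from plain set operations. A minimal Python sketch follows, with hypothetical gene lists standing in for the `target_id` columns of the two result tables; `venn_counts` is an illustrative helper, not an RVenn function.

```python
def venn_counts(set_a, set_b):
    """Count elements unique to each set and shared between both."""
    a, b = set(set_a), set(set_b)
    return {
        "only_a": len(a - b),   # significant in the first test only
        "shared": len(a & b),   # significant in both tests
        "only_b": len(b - a),   # significant in the second test only
    }


# Hypothetical gene lists (assumption, for illustration only).
wt_genes = ["APLNR", "FTH1", "GENE_X"]
lrt_genes = ["APLNR", "GENE_Y"]
print(venn_counts(wt_genes, lrt_genes))
```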

## **Heatmaps**

In [0]:
%%R
mypalette <- brewer.pal(11,"RdYlBu")
morecols <- colorRampPalette(mypalette)
Colors<-brewer.pal(3,"Set2")
#colors<-brewer.pal(11,"Paired")

In [0]:
%%R
sleuth_matrix <- sleuth_to_matrix(so_GS, 'obs_norm', 'tpm')
LogData <- log2(sleuth_matrix+1)
var_genes <- apply(LogData, 1, var)
head(var_genes)
select_var <- names(sort(var_genes, decreasing=TRUE))[1:100]
head(select_var)
highly_variable <- LogData[select_var,]


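# Index the palette by group level; indexing with a character column would return NA colors
col.cell <- brewer.pal(9,"Set1")[as.integer(factor(metadata$group))]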
col.cell
dim(highly_variable)
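
The selection step above (log-transform, compute the per-gene variance, keep the top genes) can be sketched in plain Python as follows. The expression values are made up and `top_variable_rows` is a hypothetical helper. It uses the sample variance, the same estimator as R's `var`.

```python
import math
import statistics


def top_variable_rows(matrix, n):
    """Return the names of the n rows with the highest variance of log2(x + 1).

    matrix maps a gene name to its list of expression values across samples.
    """
    variances = {
        gene: statistics.variance([math.log2(v + 1.0) for v in values])
        for gene, values in matrix.items()
    }
    return sorted(variances, key=variances.get, reverse=True)[:n]


# Made-up TPM values for three genes across four samples (assumption).
tpm = {
    "geneA": [10.0, 12.0, 11.0, 9.0],
    "geneB": [1.0, 50.0, 2.0, 80.0],
    "geneC": [5.0, 5.0, 5.0, 5.0],
}
print(top_variable_rows(tpm, 2))
```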

In [0]:
%%R
heatmap.2(highly_variable, col = rev(morecols(50)), trace = "none",
          main = "Top 100 most variable genes across samples",
          ColSideColors = col.cell, scale = "row", hclustfun = hclust)