# Normalise gene expression data and run a one-sample test

Nuha BinTayyash, 2020

This notebook shows how to use the [DESeq2](https://bioconductor.org/packages/release/bioc/html/DESeq2.html) R package to normalize scRNA-seq gene expression data for highly expressed genes in islet $\alpha$ cells from the [GSE87375 single-cell RNA-seq](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE87375) dataset.

#### Load scRNA-seq gene expression data for highly expressed genes in islet $\alpha$ cells from the [GSE87375 single-cell RNA-seq](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE87375) dataset and normalize it using DESeq2

In [1]:
counts <- read.csv(file = 'GSE87375_Single_Cell_RNA-seq_Gene_Read_Count.csv',row.names = 1, header = TRUE)
dim(counts)
head(counts)

Unnamed: 0,Symbol,GeneLength,bE17.5_1_01,bE17.5_1_02,bE17.5_1_03,bE17.5_1_04,bE17.5_1_05,bE17.5_2_01,bE17.5_2_02,bE17.5_2_03,...,aE17.5_2_22,aE17.5_2_23,aE17.5_4_07,aE17.5_4_08,aP0_2_12,aP0_2_13,aP0_2_14,aP0_3_15,aP0_3_16,aP18_2_14
ENSMUSG00000000001,Gnai3,3262,316,410,186,364,439,60,358,285,...,128,297,320,263,91,151,252,138,358,43
ENSMUSG00000000003,Pbsn,902,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
ENSMUSG00000000028,Cdc45,2143,0,0,0,0,0,0,0,0,...,0,0,33,30,0,0,0,0,0,0
ENSMUSG00000000031,H19,2286,0,0,0,0,0,0,1,0,...,95,0,0,0,0,0,0,0,0,0
ENSMUSG00000000037,Scml2,4847,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
ENSMUSG00000000049,Apoh,1190,0,0,0,0,0,0,0,0,...,0,0,0,8,0,0,0,0,0,0


In [2]:
alpha_col_data <- read.csv(file = 'alpha_time_points.csv',row.names = 1, header = TRUE)
head(alpha_col_data)

Unnamed: 0,pseudotime
aE17.5_2_09,0.0
aE17.5_2_16,0.005363444
aE17.5_1_11,0.022454001
aE17.5_3_07,0.022891405
aE17.5_4_06,0.030221853
aE17.5_3_04,0.037523365


Select the $\alpha$ cells and keep only genes with a mean read count above 1

In [3]:
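# Keep columns whose names start with "a" (alpha cells)
alpha_counts <- counts[ , grepl("^a", names(counts))]
# Keep only cells that have a pseudotime value, in the order of alpha_col_data
alpha_counts <- alpha_counts[ , rownames(alpha_col_data)]
# Keep genes whose mean read count across the selected cells is above 1
keep <- rowMeans(alpha_counts) > 1
alpha_counts <- alpha_counts[keep, ]
dim(alpha_counts)
write.csv(alpha_counts, file = "alpha_read_counts.csv")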
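
The selection and filtering above can be tried on a small made-up table first. The sketch below is only an illustration: `toy_counts`, `toy_col_data` and the gene identifiers are hypothetical and do not come from GSE87375. It assumes, as in this dataset, that alpha cell column names start with `a`.

```r
# Toy count table: 3 genes, one annotation column, one beta cell, two alpha cells.
# All values and identifiers are hypothetical.
toy_counts <- data.frame(
  Symbol      = c("GeneA", "GeneB", "GeneC"),
  bE17.5_1_01 = c(10, 0, 3),
  aE17.5_2_09 = c(5, 0, 4),
  aE17.5_2_16 = c(7, 1, 0),
  row.names   = c("ENSMUSG01", "ENSMUSG02", "ENSMUSG03")
)

# Toy cell metadata: one row per alpha cell
toy_col_data <- data.frame(
  pseudotime = c(0.0, 0.005),
  row.names  = c("aE17.5_2_09", "aE17.5_2_16")
)

# Select alpha cell columns, then align them with the metadata rows
toy_alpha <- toy_counts[, grepl("^a", names(toy_counts)), drop = FALSE]
toy_alpha <- toy_alpha[, rownames(toy_col_data), drop = FALSE]

# Drop genes whose mean count is not above 1
keep <- rowMeans(toy_alpha) > 1
toy_alpha <- toy_alpha[keep, , drop = FALSE]

print(dim(toy_alpha))
print(rownames(toy_alpha))
```

Here `GeneB` has a mean count of 0.5 across the two alpha cells, so it is the only gene removed.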

Normalize the data using DESeq2 and run a one-sample test

In [4]:
library("DESeq2")
# Build the DESeq2 dataset with pseudotime as the only covariate
dds <- DESeqDataSetFromMatrix(countData = alpha_counts,
                              colData = alpha_col_data,
                              design = ~pseudotime)
# Estimate one size factor per cell, then divide the counts by it
dds <- estimateSizeFactors(dds)
normalized_alpha_counts <- counts(dds, normalized = TRUE)
write.csv(normalized_alpha_counts, file = "normalized_alpha_counts.csv")
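
To make the normalization step more concrete, the sketch below reproduces the median-of-ratios idea behind `estimateSizeFactors` in base R on a tiny made-up matrix. The matrix `toy` and its gene and cell names are hypothetical, and this is a simplified illustration rather than a replacement for the DESeq2 function.

```r
# Toy counts: 3 genes x 3 cells; cell c2 has twice and c3 four times the depth of c1.
toy <- matrix(c(10, 20, 40,
                 5, 10, 20,
                 8, 16, 32),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("g1", "g2", "g3"), c("c1", "c2", "c3")))

# Log geometric mean of each gene across cells (-Inf if the gene has any zero count)
log_geo_means <- rowMeans(log(toy))

# Size factor of a cell: median ratio of its counts to the gene-wise geometric means,
# using only genes with a finite geometric mean and a positive count in that cell
size_factors <- apply(toy, 2, function(cell) {
  ok <- is.finite(log_geo_means) & cell > 0
  exp(median(log(cell[ok]) - log_geo_means[ok]))
})

# Normalized counts: divide each column by its size factor
normalized <- sweep(toy, 2, size_factors, "/")

print(size_factors)
print(normalized)
```

Because the three toy cells differ only by sequencing depth, the normalized columns come out identical. With sparse single-cell counts, many genes contain at least one zero and are therefore left out of the size factor calculation.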


In [5]:
# Likelihood ratio test: full model ~pseudotime against the intercept-only model ~1
dds <- DESeq(dds, test = "LRT", reduced = ~ 1)
res <- results(dds)
write.csv(as.data.frame(res), file = "alpha_DESeq2.csv")

using pre-existing size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
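
The likelihood ratio test compares a full model that includes pseudotime with a reduced intercept-only model. As a rough illustration of that comparison for a single gene, the sketch below uses a plain Poisson GLM from base R on simulated counts. This is a deliberate simplification: DESeq2 fits a negative binomial model with estimated dispersions and size factors, and the simulated `y` and `pseudotime` values here are hypothetical.

```r
set.seed(1)

# Simulated pseudotime for 30 cells and counts for one gene whose mean rises with it
pseudotime <- seq(0, 1, length.out = 30)
y <- rpois(30, lambda = exp(1 + 2 * pseudotime))

# Full model with pseudotime, reduced model with intercept only
full    <- glm(y ~ pseudotime, family = poisson())
reduced <- glm(y ~ 1, family = poisson())

# Likelihood ratio statistic: twice the gain in log-likelihood of the full model
lr_stat <- as.numeric(2 * (logLik(full) - logLik(reduced)))

# Under the reduced model the statistic is approximately chi-squared with
# 1 degree of freedom (one extra coefficient in the full model)
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

print(lr_stat)
print(p_value)
```

A small p-value indicates that adding pseudotime improves the fit, which is the sense in which a gene is called differentially expressed along pseudotime in this one-sample setting.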
