In [1]:
library(splatter)
library(Seurat)
library(conos)
library(pagoda2)

Loading required package: SingleCellExperiment
Loading required package: SummarizedExperiment
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colMeans,
    colnames, colSums, dirname, do.call, duplicated, eval, evalq,
    Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply,
    lengths, Map, mapply, match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, Position, rank, rbind, Reduce, rowMeans, rownames,
    

Benchmark the run times of Seurat 3 and Conos integration, following the same procedure used for the other R methods in the previous benchmarking notebook.

In [2]:
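# run times in seconds, one entry per dataset size
seurat3_time = c()
conos_time = c()
# raise the size limit for globals exported via the future framework to 4 GiB
options(future.globals.maxSize = 4096 * 1024^2)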

for (i in 10:14)
{
    print(i)
    
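    # simulate two batches of 2^i cells each with splatter:
    # 5000 genes, two equally likely groups, every gene differentially expressed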
    params = newSplatParams()
    params = setParam(params, "nGenes", 5000)
    params = setParam(params, "de.prob", 1)
    params = setParam(params, "batchCells", c(2^i,2^i))
    params = setParam(params, "group.prob", c(0.5,0.5))
    sim = splatSimulate(params, method="groups", verbose=FALSE)
    
    # Seurat and pagoda2 preprocessing (not included in the timings)
    srat = CreateSeuratObject(counts(sim))
    srat@meta.data[,'Batch'] = colData(sim)[,'Batch']
    srat.list = SplitObject(srat, split.by='Batch')
    pagoda.list = list()
    for (j in seq_along(srat.list))
    {
        srat.list[[j]] = NormalizeData(srat.list[[j]], verbose=FALSE)
        # use all genes as variable features
        VariableFeatures(srat.list[[j]]) = rownames(srat@assays$RNA)
        # pagoda2 starts from the raw counts of each batch
        pagoda.list[[j]] = srat.list[[j]]@assays$RNA@counts
    }
    panel.preprocessed <- lapply(pagoda.list, basicP2proc, n.cores=4, min.cells.per.gene=0, n.odgenes=5e3, get.largevis=FALSE, make.geneknn=FALSE)
    reference.list = srat.list[c('Batch1','Batch2')]
    names(panel.preprocessed) = c('Batch1','Batch2')
    con <- Conos$new(panel.preprocessed, n.cores=8)
    
    # time Seurat 3 integration: anchor finding plus data integration
    t1 = Sys.time()
    srat.anchors = FindIntegrationAnchors(object.list = reference.list, dims = 1:20, verbose=FALSE)
    srat.integrated <- IntegrateData(anchorset = srat.anchors, dims = 1:20, verbose=FALSE)
    t2 = Sys.time()
    seurat3_time = c(seurat3_time, as.numeric(difftime(t2,t1,units='secs')))
    print(tail(seurat3_time,n=1))
    
    # time Conos: joint graph construction plus graph embedding
    t1 = Sys.time()
    con$buildGraph()
    con$embedGraph()
    t2 = Sys.time()
    conos_time = c(conos_time, as.numeric(difftime(t2,t1,units='secs')))
    print(tail(conos_time,n=1))
}
# write the run times out, one value per line
# (create the output folder first in case it does not exist yet)
dir.create('benchmark-times', showWarnings=FALSE)
writeLines(as.character(seurat3_time), 'benchmark-times/seurat3.txt')
writeLines(as.character(conos_time), 'benchmark-times/conos.txt')

[1] 10
1024 cells, 5000 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 600 overdispersed genes ... 600 persisting ... done.
running PCA using 5000 OD genes .... done
running tSNE using 4 cores:
1024 cells, 5000 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 614 overdispersed genes ... 614 persisting ... done.
running PCA using 5000 OD genes .... done
running tSNE using 4 cores:
[1] 25.53425
found 0 out of 1 cached CPCA  space pairs ... running 1 additional CPCA  space pairs . done
inter-sample links using  mNN  . done
local pairs local pairs  done
building graph ..done
Estimating embeddings.
[1] 22.90113
[1] 11
2048 cells, 5000 genes; normalizing ... using plain model winsorizing ... log scale ... done.
calculating variance fit ... using gam 829 overdispersed genes ... 829 persisting ... done.
running PCA using 5000 OD genes .... done
running tSNE us