# Using Splatter to simulate scRNA-seq data

**Authorship:**
Adam Klie, *02/25/2022*
***
**Description:**
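Notebook that simulates scRNA-seq count data with Splatter for a chosen set of parameters (number of groups, number of genes, number of cells, and dropout midpoint), then saves the results as TSV files.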


## Set-up

In [2]:
# Load Splatter, suppressing package start-up messages
suppressMessages(library(splatter))

“package ‘Biobase’ was built under R version 4.1.2”


## Simulation framework

In [3]:
simulate <- function(nGroups=2, nGenes=200, batchCells=2000, dropout=3)
{
    if (nGroups > 1) method <- 'groups'
    else             method <- 'single'

    group.prob <- rep(1, nGroups) / nGroups
    
    # new splatter requires dropout.type
    if ('dropout.type' %in% slotNames(newSplatParams())) {
        if (dropout)
            dropout.type <- 'experiment'
        else
            dropout.type <- 'none'
        
        sim <- splatSimulate(group.prob=group.prob, nGenes=nGenes, batchCells=batchCells,
                             dropout.type=dropout.type, method=method,
                             seed=42, dropout.shape=-1, dropout.mid=dropout)

    } else {
        # older splatter: dropout is switched on with a logical flag
        sim <- splatSimulate(group.prob=group.prob, nGenes=nGenes, batchCells=batchCells,
                             dropout.present=as.logical(dropout), method=method,
                             seed=42, dropout.shape=-1, dropout.mid=dropout)
    }

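    # Transpose to cells x genes; as.matrix() guards against assays that
    # splatter has converted to sparse matrices
    counts     <- as.data.frame(as.matrix(t(counts(sim))))
    truecounts <- as.data.frame(as.matrix(t(assays(sim)$TrueCounts)))

    # Dropout indicator matrix (extracted here but not returned below)
    dropout    <- assays(sim)$Dropout
    #mode(dropout) <- 'integer'

    # Per-cell and per-gene metadata
    cellinfo   <- as.data.frame(colData(sim))
    geneinfo   <- as.data.frame(rowData(sim))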

    list(counts=counts,
         cellinfo=cellinfo,
         geneinfo=geneinfo,
         truecounts=truecounts)
}
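
To make the roles of `dropout.mid` and `dropout.shape` more concrete: as far as I understand Splatter's model, the probability that a count is dropped is a logistic function of the log cell mean, with `dropout.mid` as the midpoint and `dropout.shape` as the slope. The sketch below reimplements that curve in base R for illustration only. `dropout_prob` is a hypothetical helper, not a Splatter function, and the exact formula is an assumption about Splatter's internals.

```r
# Hypothetical helper (not part of Splatter): logistic dropout curve.
# Assumes P(dropout) = 1 / (1 + exp(-shape * (log(mean) - mid))).
dropout_prob <- function(cell_mean, mid = 3, shape = -1) {
    1 / (1 + exp(-shape * (log(cell_mean) - mid)))
}

# Made-up cell means spanning low to high expression
means <- c(1, 5, exp(3), 100, 1000)
probs <- dropout_prob(means)
print(round(probs, 3))

# Same mean, two midpoints: a lower midpoint gives less dropout
print(dropout_prob(10, mid = 1) < dropout_prob(10, mid = 3))
```

Under this assumption, a negative shape means lowly expressed genes drop out more often, and a mean whose log equals `dropout.mid` is dropped half of the time.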

## Simulate two groups

In [15]:
sim <- simulate()

counts <- sim$counts
geneinfo <- sim$geneinfo
cellinfo <- sim$cellinfo
truecounts <- sim$truecounts

Getting parameters...

Creating simulation object...

Simulating library sizes...

Simulating gene means...

Simulating group DE...

Simulating cell means...

Simulating BCV...

Simulating counts...

Simulating dropout (if needed)...

Sparsifying assays...

Automatically converting to sparse matrices, threshold = 0.95

Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix

Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix

Skipping 'BCV': estimated sparse size 1.5 * dense matrix

Skipping 'CellMeans': estimated sparse size 1.5 * dense matrix

Skipping 'TrueCounts': estimated sparse size 2.79 * dense matrix

Skipping 'DropProb': estimated sparse size 1.5 * dense matrix

Converting 'Dropout' to sparse matrix: estimated sparse size 0.7 * dense matrix

Skipping 'counts': estimated sparse size 1.95 * dense matrix

Done!
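
Before saving, it can be worth checking that the returned tables have the expected shape. The sketch below is one way to do that. `check_sim` is a hypothetical helper, it assumes the list layout returned by `simulate()` above, and it assumes that `cellinfo` has a `Group` column (an assumption about what Splatter puts in `colData`). It is demonstrated on a made-up toy list so that it runs without Splatter.

```r
# Hypothetical checker for the list returned by simulate().
# Assumes cellinfo has a `Group` column.
check_sim <- function(sim, nGenes, batchCells) {
    stopifnot(
        nrow(sim$counts) == batchCells,   # cells are rows after t()
        ncol(sim$counts) == nGenes,       # genes are columns
        identical(dim(sim$counts), dim(sim$truecounts)),
        nrow(sim$cellinfo) == batchCells,
        nrow(sim$geneinfo) == nGenes
    )
    # Dropout can only zero out counts, never add to them
    stopifnot(all(sim$counts <= sim$truecounts))
    table(sim$cellinfo$Group)
}

# Toy stand-in with the same layout (made-up numbers, not Splatter output)
toy <- list(
    counts     = data.frame(Gene1 = c(0, 2, 0, 1), Gene2 = c(3, 0, 0, 4)),
    truecounts = data.frame(Gene1 = c(1, 2, 0, 1), Gene2 = c(3, 2, 0, 4)),
    cellinfo   = data.frame(Group = c("Group1", "Group1", "Group2", "Group2")),
    geneinfo   = data.frame(Gene = c("Gene1", "Gene2"))
)
group_sizes <- check_sim(toy, nGenes = 2, batchCells = 4)
print(group_sizes)
```

For the run above, the equivalent call would be `check_sim(sim, nGenes = 200, batchCells = 2000)`.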



In [16]:
# Save as {table}.{numGroups}.{numCells}.{numGenes}.{dropout}.tsv
dir.create("simulated", showWarnings = FALSE)  # make sure the output folder exists
write.table(counts, "simulated/counts.2.2000.200.3.tsv", sep = "\t")
write.table(truecounts, "simulated/truecounts.2.2000.200.3.tsv", sep = "\t")
write.table(geneinfo, "simulated/geneinfo.2.2000.200.3.tsv", sep = "\t")
write.table(cellinfo, "simulated/cellinfo.2.2000.200.3.tsv", sep = "\t")
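
The file names encode the simulation parameters by hand, which is easy to get out of sync with the call to `simulate()`. A minimal sketch of building the paths from the parameters instead is shown below. `sim_path` is a hypothetical helper introduced only for illustration, and it assumes the naming pattern used in the cell above.

```r
# Hypothetical helper: build
# "<dir>/<table>.<nGroups>.<batchCells>.<nGenes>.<dropout>.tsv"
sim_path <- function(name, nGroups, batchCells, nGenes, dropout, dir = "simulated") {
    file.path(dir, sprintf("%s.%d.%d.%d.%s.tsv",
                           name, nGroups, batchCells, nGenes, format(dropout)))
}

# One path per table, using the two-group parameters
paths <- vapply(c("counts", "truecounts", "geneinfo", "cellinfo"),
                sim_path, character(1),
                nGroups = 2, batchCells = 2000, nGenes = 200, dropout = 3)
print(unname(paths))
```

Each table could then be written with, for example, `write.table(counts, paths[["counts"]], sep = "\t")`.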

## Simulate six groups

In [17]:
sim <- simulate(nGroups=6, dropout=1)

counts <- sim$counts
geneinfo <- sim$geneinfo
cellinfo <- sim$cellinfo
truecounts <- sim$truecounts

Getting parameters...

Creating simulation object...

Simulating library sizes...

Simulating gene means...

Simulating group DE...

Simulating cell means...

Simulating BCV...

Simulating counts...

Simulating dropout (if needed)...

Sparsifying assays...

Automatically converting to sparse matrices, threshold = 0.95

Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix

Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix

Skipping 'BCV': estimated sparse size 1.5 * dense matrix

Skipping 'CellMeans': estimated sparse size 1.5 * dense matrix

Skipping 'TrueCounts': estimated sparse size 2.79 * dense matrix

Skipping 'DropProb': estimated sparse size 1.5 * dense matrix

Converting 'Dropout' to sparse matrix: estimated sparse size 0.32 * dense matrix

Skipping 'counts': estimated sparse size 2.5 * dense matrix

Done!
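
`simulate()` gives every group the same probability via `rep(1, nGroups) / nGroups`, so with 2000 cells and six groups each group is expected to hold roughly a sixth of the cells. The sketch below shows that calculation, and how unequal group sizes could be expressed as probabilities that sum to 1. `group_probs` is a hypothetical helper and the unequal weights are made up for illustration.

```r
# Hypothetical helper: turn positive weights into probabilities that sum to 1
group_probs <- function(weights) {
    stopifnot(is.numeric(weights), all(weights > 0))
    weights / sum(weights)
}

equal   <- group_probs(rep(1, 6))         # same as rep(1, 6) / 6
unequal <- group_probs(c(4, 2, 2, 1, 1))  # made-up unequal design

# Expected number of cells per group for batchCells = 2000
print(round(equal * 2000))
print(round(unequal * 2000))
```

Using unequal probabilities would require `simulate()` to accept a `group.prob` argument, which it currently does not.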



In [18]:
# Save as {table}.{numGroups}.{numCells}.{numGenes}.{dropout}.tsv
dir.create("simulated", showWarnings = FALSE)  # make sure the output folder exists
write.table(counts, "simulated/counts.6.2000.200.1.tsv", sep = "\t")
write.table(truecounts, "simulated/truecounts.6.2000.200.1.tsv", sep = "\t")
write.table(geneinfo, "simulated/geneinfo.6.2000.200.1.tsv", sep = "\t")
write.table(cellinfo, "simulated/cellinfo.6.2000.200.1.tsv", sep = "\t")
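
Because `write.table()` is called with its default row names, the header line of each file has one field fewer than the data rows. The sketch below shows a round trip on a made-up toy table, written to a temporary file, to illustrate how such a file can be read back in R with the row names restored.

```r
# Toy table standing in for `counts` (made-up values)
toy <- data.frame(Gene1 = c(0L, 3L), Gene2 = c(5L, 0L),
                  row.names = c("Cell1", "Cell2"))

path <- tempfile(fileext = ".tsv")
write.table(toy, path, sep = "\t")

# The header has one field fewer than the data rows, so read.table()
# uses the first column as row names.
back <- read.table(path, header = TRUE, sep = "\t")
print(back)
```

Readers in other tools need to account for the same layout, since the first column holds the cell or gene names and has no header entry.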

## Simulate a GRN

## References

1. [Splatter GitHub repository](https://github.com/Oshlack/splatter)