Sequence Generation for Differential Expression Analysis and Beyond

RNA-Seq Generation/Modification for Simulation

License: GPL v3. Lifecycle: stable.

This package takes real RNA-seq data (either single-cell or bulk) and alters it by adding signal. The signal takes the form of a generalized linear model with a log (base-2) link function under a Poisson / negative binomial / mixture of negative binomials distribution. The advantage of simulating data this way is that you can see how your method behaves when the simulated data exhibit common (and annoying) features of real data, without having to specify these features a priori. We call this way of adding signal “binomial thinning”.
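As a rough illustration of the idea (a base-R sketch, not the package's implementation): if a count is Poisson with mean λ, then keeping each read independently with probability p yields a Poisson count with mean pλ. Thinning the samples in one group by p = 2^b, with b ≤ 0, therefore adds a log2 effect of b for that group. The count matrix, group labels, and effect sizes below are all made up for illustration, and the sketch only handles non-positive effects so that every thinning probability is at most 1.

```r
set.seed(1)
n_gene <- 200
n_samp <- 10

# Stand-in for a real count matrix (genes in rows, samples in columns)
Y <- matrix(rpois(n_gene * n_samp, lambda = 100), nrow = n_gene)

# Hypothetical two-group design: first half is group 0, second half is group 1
group <- rep(c(0, 1), each = n_samp / 2)

# Made-up log2 effects: the first 20 genes have effect -1, the rest are null
beta <- c(rep(-1, 20), rep(0, n_gene - 20))

# Thinning probability for each gene/sample: 2^(beta * group), always <= 1 here
prob <- 2^outer(beta, group)

# Binomial thinning: keep each read independently with the given probability
Y_thin <- matrix(rbinom(length(Y), size = Y, prob = prob), nrow = n_gene)

# Estimated log2 fold change for the non-null genes; should be close to -1
est_lfc <- log2(mean(Y_thin[1:20, group == 1]) / mean(Y_thin[1:20, group == 0]))
```

Because group 0 is thinned with probability 1, its counts are left unchanged, so whatever structure the original data had is carried over into the modified matrix.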

The main functions are:

  • select_counts: Subsample the columns and rows of a real RNA-seq count matrix. You would then feed this sub-matrix into one of the thinning functions below.
  • thin_diff: The function most users should be using for general-purpose binomial thinning. For the special applications of the two-group model or library/gene thinning, see the functions listed below.
  • thin_2group: The specific application of thinning in the two-group model.
  • thin_lib: The specific application of library size thinning.
  • thin_gene: The specific application of total gene expression thinning.
  • thin_all: The specific application of thinning all counts.
  • effective_cor: Returns an estimate of the actual correlation between the surrogate variables and a user-specified design matrix.
  • ThinDataToSummarizedExperiment: Converts a ThinData object to a SummarizedExperiment object.
  • ThinDataToDESeqDataSet: Converts a ThinData object to a DESeqDataSet object.
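A typical workflow with the functions listed above might look like the following sketch. The argument names (`nsamp`, `ngene`, `prop_null`, `signal_fun`, `signal_params`) and the returned elements (`mat`, `designmat`, `coefmat`) are written as I understand them from the package documentation and should be treated as assumptions; check `?select_counts` and `?thin_2group` before relying on them. The Poisson matrix is only a stand-in for real RNA-seq counts, and the code runs the package calls only if seqgendiff is installed.

```r
set.seed(1)

# Stand-in for a real RNA-seq count matrix (genes in rows, samples in columns)
counts <- matrix(rpois(1000 * 12, lambda = 50), nrow = 1000)

if (requireNamespace("seqgendiff", quietly = TRUE)) {
  # Subsample 200 genes and 10 samples from the full matrix
  submat <- seqgendiff::select_counts(mat = counts, nsamp = 10, ngene = 200)

  # Add two-group signal: 90% of genes null, normal log2 effects otherwise
  thout <- seqgendiff::thin_2group(
    mat           = submat,
    prop_null     = 0.9,
    signal_fun    = stats::rnorm,
    signal_params = list(mean = 0, sd = 0.8)
  )

  new_counts <- thout$mat        # modified count matrix
  design     <- thout$designmat  # group indicator that was added
  effects    <- thout$coefmat    # true effects used to add the signal
}
```

The modified counts can then be passed to a differential expression method, and its estimates compared against the known effects.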

If you find a bug or want a new feature, please submit an issue.

Check out NEWS for updates.


To install from CRAN, run the following code in R:

install.packages("seqgendiff")

To install the latest version of seqgendiff, run the following code in R:

install.packages("devtools")
devtools::install_github("dcgerard/seqgendiff")

To get started, check out the vignettes by running the following in R:

browseVignettes(package = "seqgendiff")

Or you can check out the vignettes I post online.


If you use this package, please cite:

Gerard D (2019). “Data-based RNA-seq Simulations by Binomial Thinning.” bioRxiv. doi: 10.1101/758524.

A BibTeX entry for LaTeX users is

  @Article{,
    author = {Gerard, David},
    title = {Data-based {RNA}-seq Simulations by Binomial Thinning},
    elocation-id = {758524},
    year = {2019},
    doi = {10.1101/758524},
    publisher = {Cold Spring Harbor Laboratory},
    journal = {bioRxiv}
  }

Code of Conduct

Please note that the ‘seqgendiff’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
