cellhashR

An R package designed to demultiplex cell hashing data. Please see our documentation for more detail.

Cell hashing is a method that allows sample multiplexing or super-loading within single-cell RNA-seq platforms such as 10x Genomics. It was originally developed at the New York Genome Center in collaboration with the Satija lab. See here for more detail on the technique. The general idea is that cells are labeled with a staining reagent (such as an antibody) tagged with a short nucleotide barcode. Other staining methods have been published, such as the lipid-based MULTI-seq (https://www.ncbi.nlm.nih.gov/pubmed/31209384). In all methods, the hashtag oligo/barcode is sequenced in parallel with cellular mRNA, creating a separate cell hashing library. After sequencing, the cell barcode and hashing index are parsed using tools like CITE-seq-Count (https://github.com/Hoohm/CITE-seq-Count), creating a count matrix with the total hashtag counts per cell.
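To make the shape of such a count matrix concrete, here is a minimal sketch in base R. The barcode names, cell names, and counts are made up for illustration, and the orientation (hashtags as rows, cells as columns) is an assumption rather than a statement about the exact output of CITE-seq-Count.

```r
# Toy hashtag count matrix: rows are hashtag barcodes, columns are cells.
# All names and counts here are hypothetical.
htoCounts <- matrix(
  c(250,   3, 1,
      2, 310, 4,
      5,   1, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c('HTO-1', 'HTO-2', 'HTO-3'),
    c('cellA', 'cellB', 'cellC')
  )
)

# Total hashtag counts per cell
totalPerCell <- colSums(htoCounts)

# Most abundant hashtag in each cell (a naive first look, not a demultiplexing call)
topHashtag <- rownames(htoCounts)[apply(htoCounts, 2, which.max)]
names(topHashtag) <- colnames(htoCounts)

print(totalPerCell)
print(topHashtag)
```

In this toy data, cellA and cellB each have one clearly dominant hashtag, while cellC has very few counts overall. Telling such cases apart reliably is the job of a demultiplexing algorithm.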

Once the count matrix is created, an algorithm must be used to demultiplex cells and assign each one to a hashtag (i.e. a sample). This is where cellhashR comes in. This package provides several functions:

Quality control reports for the cell hashing library, covering read counts and normalization. Think FASTQC, except for cell hashing data.
A single interface to run one or more demultiplexing algorithms, including the novel demultiplexing algorithms BFF_raw and BFF_cluster. Each algorithm has pros and cons, and will perform better or worse under certain conditions (though in our experience, of the algorithms we have tested, the BFF algorithms work most consistently and under the widest variety of conditions). If you select multiple algorithms (our default workflow), cellhashR will score cells using the consensus call from the set. Various QC summaries are also produced during this process, in case debugging is needed. In addition to the BFF algorithms, cellhashR can run other algorithms, including multiseq, htodemux, and demuxEM, which are used in the examples below.
The workflow produces a unified table with the results of each caller and the consensus call. Final QC plots and summaries are created.
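The idea of a consensus call across several callers can be sketched as a simple majority vote. This is only an illustration of the concept: the data frame, its column names, the 'Discordant' label, the tie-handling rule, and the helper 'majorityCall' are all hypothetical, and cellhashR's actual consensus logic may differ.

```r
# Hypothetical per-cell calls from three callers (all names are illustrative only)
callerResults <- data.frame(
  cellbarcode = c('cellA', 'cellB', 'cellC'),
  callerX = c('HTO-1', 'HTO-2', 'Negative'),
  callerY = c('HTO-1', 'HTO-2', 'HTO-3'),
  callerZ = c('HTO-1', 'Doublet', 'HTO-2'),
  stringsAsFactors = FALSE
)

# Return the call made by a strict majority of callers, or 'Discordant' otherwise
majorityCall <- function(calls) {
  tab <- table(calls)
  winner <- names(tab)[which.max(tab)]
  if (max(tab) > length(calls) / 2) winner else 'Discordant'
}

# Apply the vote row by row across the caller columns
callerCols <- c('callerX', 'callerY', 'callerZ')
callerResults$consensus <- unname(apply(callerResults[, callerCols], 1, majorityCall))

print(callerResults)
```

A unified table of this kind, with one column per caller plus a consensus column, makes it easy to spot cells on which the callers disagree.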

Each step of the workflow can either be run interactively in R (through the terminal or RStudio), or it can be executed as a pipeline that runs all commands and creates the call table and an HTML report.

Click here to view an example QC report

Example Usage

Below are the primary functions of cellhashR needed to QC and score hashing data:

# Example 1: parse CITE-seq-Count output, printing QC
barcodeData <- ProcessCountMatrix(rawCountData = 'myCountDir/umi_count', minCountPerCell = 5)

# Example 2: parse CITE-seq-Count output, providing a barcode whitelist. 
barcodeData <- ProcessCountMatrix(rawCountData = 'myCountDir/umi_count', minCountPerCell = 5, barcodeWhitelist = c('HTO-1', 'HTO-2', 'HTO-3', 'HTO-4', 'HTO-6'))

# Create QC plots of barcode normalization
PlotNormalizationQC(barcodeData)

# Generate the final cell hashing calls
calls <- GenerateCellHashingCalls(barcodeMatrix = barcodeData, methods = c('multiseq', 'htodemux'))

# Inspect negative cells:
SummarizeCellsByClassification(calls = calls, barcodeMatrix = barcodeData)

Or export/save a template RMarkdown file outlining the default workflow, which can be run interactively or headlessly as part of a pipeline:

GetExampleMarkdown(dest = 'cellhashR_template.rmd')

Finally, the whole workflow can be executed using this wrapper around the RMarkdown template, producing a TSV of calls and an HTML QC report:

CallAndGenerateReport(rawCountData = 'myCountDir/umi_count', reportFile = 'report.html', callFile = 'calls.txt', barcodeWhitelist = c('HTO-1', 'HTO-2', 'HTO-3'), title = 'Cell Hashing For Experiment 1')

Installation

# Make sure your Rprofile includes the Bioconductor repos, for example by adding this line to ~/.Rprofile:
local({options(repos = BiocManager::repositories())})

# Latest version:
devtools::install_github(repo = 'bimberlab/cellhashR', ref = 'master', dependencies = TRUE, upgrade = 'always')
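Because the commands above rely on both devtools and BiocManager, it can help to confirm they are available before running them. The following is a minimal sketch using base R only; the helper name 'missingPackages' is hypothetical and not part of cellhashR.

```r
# Report which of the given packages are not installed.
# The helper name is hypothetical; requireNamespace() is base R.
missingPackages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

toInstall <- missingPackages(c('devtools', 'BiocManager'))
if (length(toInstall) > 0) {
  message('Install these first: ', paste(toInstall, collapse = ', '))
}
```

Any packages reported here can be installed with install.packages() before retrying the cellhashR installation.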

Pre-packaged Docker images with all needed dependencies installed can be found on our GitHub Packages page. We recommend using a specific release, which you can do by replacing 'latest' in the command below with a release tag:

docker pull ghcr.io/bimberlab/cellhashr:latest

Known Issues

If you receive an error along the lines of:

"ERROR; return code from pthread_create() is 22\n"

Please manually install preprocessCore with threading disabled:

devtools::install_github('bmbolstad/preprocessCore', dependencies = TRUE, upgrade = 'always', configure.args = '--disable-threading')

demuxEM

Unlike the other algorithms, which just require the HTO count matrix, demuxEM also requires the 10x h5 gene expression counts. This can be supplied as follows. This example runs BFF and demuxEM:

  rawData <- '../testdata/438-21-GEX/umi_count'
  h5File <- '../testdata/438-21-GEX/438-21-raw_feature_bc_matrix.h5'
  barcodeMatrix <- ProcessCountMatrix(rawCountData = rawData, barcodeWhitelist = c('MS-11', 'MS-12'))
  df <- GenerateCellHashingCalls(barcodeMatrix = barcodeMatrix, methods = c('bff_cluster', 'demuxem'), demuxem.rawFeatureMatrixH5 = h5File)

Development Guidelines

New development should occur on a branch, and go through a Pull Request before merging into the master branch. See here for information on the pull request workflow. Ideally PRs would be reviewed by another person. For the PR, please review the set of changed files carefully to make sure you are only merging the changes you intend.
New functions should have Roxygen2 documentation.
As part of each PR, you should run 'devtools::document()' to update documentation and include these changes with your commits.
It is a good idea to run 'R CMD check' locally to make sure your changes will pass. See here for more information.
Code should only be merged after the build and tests pass. The master branch should always be stable.
New features should ideally have at least a basic test (see R testthat). There is existing test data in ./tests/testdata. This can be expanded, but please be conscious about file size and try to reuse data across tests if appropriate.
