cellhashR

An R package designed to demultiplex cell hashing data. Please see our documentation for more detail.

Cell hashing is a method that allows sample multiplexing or super-loading within single-cell RNA-seq platforms such as 10x Genomics. It was originally developed at the New York Genome Center in collaboration with the Satija lab. See here for more detail on the technique. The general idea is that cells are labeled with a staining reagent (such as an antibody) tagged with a short nucleotide barcode. Other staining methods have been published, such as the lipid-based MULTI-seq (https://www.ncbi.nlm.nih.gov/pubmed/31209384). In all methods, the hashtag oligo (HTO) barcode is sequenced in parallel with cellular mRNA, creating a separate cell hashing library. After sequencing, the cell barcode and hashing index are parsed using tools like CITE-seq-Count (https://github.com/Hoohm/CITE-seq-Count), creating a count matrix with the total hashtag counts per cell.
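To make the shape of that count matrix concrete, here is a minimal base-R sketch. The matrix values, cell names, and variable names are made up for illustration and are not produced by cellhashR or CITE-seq-Count:

```r
# Toy HTO count matrix (hypothetical values):
# rows are hashtag oligos, columns are cell barcodes.
htoCounts <- matrix(
  c(120,   2, 1,
      3, 310, 0,
      1,   4, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c('HTO-1', 'HTO-2', 'HTO-3'), c('cellA', 'cellB', 'cellC'))
)

# Total hashtag counts per cell
totalPerCell <- colSums(htoCounts)

# The hashtag with the highest count in each cell
topHto <- rownames(htoCounts)[apply(htoCounts, 2, which.max)]

print(totalPerCell)
print(topHto)
```

In this toy data, cellA and cellB each have one clearly dominant hashtag, while cellC has very few counts overall. Separating cells like these into singlets, doublets, and negatives is the job of the demultiplexing algorithms.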

Once the count matrix is created, an algorithm must be used to demultiplex cells and assign them to hashtags (i.e., samples). This is where cellhashR comes in. This package provides several functions:

  • Quality control reports for the cell hashing library, covering read counts and normalization. Think FASTQC, except for cell hashing data.
  • A single interface to run one or more demultiplexing algorithms, including the novel demultiplexing algorithms BFF_raw and BFF_cluster. Each algorithm has pros and cons, and will perform better or worse under certain conditions (though in our experience, of the algorithms we have tested, the BFF algorithms work most consistently and under the widest variety of conditions). If you select multiple algorithms (our default workflow), cellhashR will score cells using the consensus call from the set. Various QC summaries are also produced during this process, in case debugging is needed. In addition to the BFF algorithms, cellhashR can run other algorithms, including Multi-seq ('multiseq'), HTODemux ('htodemux'), and demuxEM ('demuxem').
  • The workflow produces a unified table with the results of each caller and the consensus call. Final QC plots and summaries are created.
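The idea of a consensus call across several callers can be sketched in base R as follows. This is only an illustration under stated assumptions: the caller names, the strict-majority rule, and the 'Discordant' label are hypothetical, and cellhashR's actual consensus logic may differ.

```r
# Hypothetical per-caller results: one row per cell, one column per caller.
callerResults <- data.frame(
  cellbarcode = c('AAAC', 'AAAG', 'AACT', 'AAGT'),
  callerA = c('HTO-1', 'HTO-2', 'Negative', 'HTO-1'),
  callerB = c('HTO-1', 'HTO-2', 'HTO-3', 'HTO-2'),
  callerC = c('HTO-1', 'Doublet', 'HTO-3', 'HTO-3'),
  stringsAsFactors = FALSE
)

# Return the label chosen by a strict majority of callers,
# or 'Discordant' when no label has a majority (assumed rule).
majorityCall <- function(labels) {
  counts <- table(labels)
  top <- names(counts)[counts == max(counts)]
  if (length(top) == 1 && max(counts) > length(labels) / 2) top else 'Discordant'
}

callerCols <- c('callerA', 'callerB', 'callerC')
callerResults$consensus <- apply(callerResults[, callerCols], 1, majorityCall)

print(callerResults)
```

The resulting table has the same general shape as the unified table described above: one column per caller plus a final consensus column.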

Each step of the workflow can either be run interactively in R (through the terminal or RStudio), or it can be executed as a pipeline that runs all commands and creates the call table and an HTML report.

Click here to view an example QC report

Below are the primary functions of cellhashR needed to QC and score hashing data:

# Example 1: parse CITE-seq-Count output, printing QC
barcodeData <- ProcessCountMatrix(rawCountData = 'myCountDir/umi_count', minCountPerCell = 5)

# Example 2: parse CITE-seq-Count output, providing a barcode whitelist. 
barcodeData <- ProcessCountMatrix(rawCountData = 'myCountDir/umi_count', minCountPerCell = 5, barcodeWhitelist = c('HTO-1', 'HTO-2', 'HTO-3', 'HTO-4', 'HTO-6'))

# Create QC plots of barcode normalization
PlotNormalizationQC(barcodeData)

# Generate the final cell hashing calls
calls <- GenerateCellHashingCalls(barcodeMatrix = barcodeData, methods = c('multiseq', 'htodemux'))

# Inspect negative cells:
SummarizeCellsByClassification(calls = calls, barcodeMatrix = barcodeData)

Alternatively, export a template R Markdown file outlining the default workflow, which can be run interactively or headlessly as part of a pipeline:

GetExampleMarkdown(dest = 'cellhashR_template.rmd')

Finally, the whole workflow can be executed using this wrapper around the R Markdown template, producing a TSV of calls and an HTML QC report:

CallAndGenerateReport(rawCountData = 'myCountDir/umi_count', reportFile = 'report.html', callFile = 'calls.txt', barcodeWhitelist = c('HTO-1', 'HTO-2', 'HTO-3'), title = 'Cell Hashing For Experiment 1')

Installation

# Make sure to update your Rprofile to include Bioconductor repos, such as adding this line to ~/.Rprofile:
local({options(repos = BiocManager::repositories())})

# Latest version:
devtools::install_github(repo = 'bimberlab/cellhashR', ref = 'master', dependencies = TRUE, upgrade = 'always')

Pre-packaged Docker images with all needed dependencies installed can be found on our GitHub Packages page. We recommend pinning a specific release, which you can do by replacing 'latest' in the command below with a release tag:

docker pull ghcr.io/bimberlab/cellhashr:latest

If you receive an error along the lines of:

"ERROR; return code from pthread_create() is 22\n"

Please manually install preprocessCore with threading disabled:

devtools::install_github('bmbolstad/preprocessCore', dependencies = TRUE, upgrade = 'always', configure.args = '--disable-threading')

Unlike the other algorithms, which only require the HTO count matrix, demuxEM also requires the 10x h5 gene expression counts. Supply the h5 file through the demuxem.rawFeatureMatrixH5 argument, as in this example, which runs BFF_cluster and demuxEM:

  rawData <- '../testdata/438-21-GEX/umi_count'
  h5File <- '../testdata/438-21-GEX/438-21-raw_feature_bc_matrix.h5'
  barcodeMatrix <- ProcessCountMatrix(rawCountData = rawData, barcodeWhitelist = c('MS-11', 'MS-12'))
  df <- GenerateCellHashingCalls(barcodeMatrix = barcodeMatrix, methods = c('bff_cluster', 'demuxem'), demuxem.rawFeatureMatrixH5 = h5File)

Development guidelines

  • New development should occur on a branch, and go through a Pull Request before merging into the master branch. See here for information on the pull request workflow. Ideally, PRs should be reviewed by another person. For the PR, please review the set of changed files carefully to make sure you are only merging the changes you intend.

  • New functions should have Roxygen2 documentation.

  • As part of each PR, you should run 'devtools::document()' to update documentation and include these changes with your commits.

  • It is a good idea to run 'R CMD check' locally to make sure your changes will pass. See here for more information.

  • Code should only be merged after the build and tests pass. The master branch should always be stable.

  • New features should ideally have at least a basic test (see R testthat). There is existing test data in ./tests/testdata. This can be expanded, but please be conscious about file size and try to reuse data across tests if appropriate.
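
As a rough illustration of the documentation and testing guidelines above, the sketch below pairs a Roxygen2-documented function with a basic testthat test. The function CountCellsPerCall is hypothetical and is not part of cellhashR; it exists only to show the style.

```r
library(testthat)

#' Count cells assigned to each hashtag
#'
#' Hypothetical helper, used only to illustrate documentation and test style.
#'
#' @param calls Character vector with one hashtag call per cell.
#' @return A named integer vector with the number of cells per call.
#' @export
CountCellsPerCall <- function(calls) {
  counts <- table(calls)
  setNames(as.integer(counts), names(counts))
}

# A basic test, in the style that would live under ./tests/testthat
test_that('cells are counted per call', {
  result <- CountCellsPerCall(c('HTO-1', 'HTO-2', 'HTO-1', 'Negative'))
  expect_equal(result[['HTO-1']], 2L)
  expect_equal(result[['Negative']], 1L)
  expect_equal(sum(result), 4L)
})
```

In a real PR, the function would go under R/, the test under tests/testthat, and 'devtools::document()' would regenerate the help files from the Roxygen2 comments.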
