Doublet detection in single-cell RNA-seq data.

DoubletDetection


DoubletDetection is a Python 3 package to detect doublets (technical errors in which two cells are captured together and reported as a single cell) in single-cell RNA-seq count matrices.

Installing DoubletDetection

git clone https://github.com/JonathanShor/DoubletDetection.git
cd DoubletDetection
pip3 install .

If you use pipenv to manage your virtual environment, it may struggle to install from setup.py because of our custom PhenoGraph requirement. If so, try the following in the cloned repo:

pipenv run pip3 install .

Running DoubletDetection

To run basic doublet classification:

import doubletdetection
clf = doubletdetection.BoostClassifier()
# raw_counts is a cells by genes count matrix
labels = clf.fit(raw_counts).predict()
  • raw_counts is an array-like scRNA-seq count matrix (cells by genes)
  • labels is a 1-dimensional numpy ndarray with the value 1 representing a detected doublet, 0 a singlet, and np.nan an ambiguous cell.
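Because ambiguous cells are encoded as np.nan, summarizing or filtering the output needs a little care: NaN never compares equal to 0 or 1. The following is a minimal sketch, assuming `labels` is the array returned by `predict()` and `raw_counts` supports boolean row indexing (a NumPy array or a SciPy CSR matrix). The helper names `summarize_labels` and `keep_singlets` are hypothetical and not part of DoubletDetection.

```python
import numpy as np


def summarize_labels(labels):
    """Count doublets (1), singlets (0) and ambiguous cells (NaN) in a label array."""
    labels = np.asarray(labels, dtype=float)
    return {
        # NaN == 1 and NaN == 0 are both False, so ambiguous cells are not miscounted
        "doublets": int(np.sum(labels == 1)),
        "singlets": int(np.sum(labels == 0)),
        "ambiguous": int(np.sum(np.isnan(labels))),
    }


def keep_singlets(raw_counts, labels):
    """Return only the rows (cells) labelled as singlets; doublets and ambiguous cells are dropped."""
    mask = np.asarray(labels, dtype=float) == 0
    return raw_counts[mask]
```

Dropping ambiguous cells along with doublets is a conservative choice; if you would rather keep them, use `~(labels == 1)` as the mask instead.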

The classifier works best when:

  • There are several cell types present in the data
  • It is applied separately to each run's cells in an aggregated count matrix, rather than to all runs at once
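One way to follow the second recommendation is to split an aggregated matrix by run and fit a fresh classifier on each subset. This is a minimal sketch, assuming you have a per-cell array of run identifiers (`run_ids`, hypothetical) aligned with the rows of `raw_counts`. `classify_per_run` is a hypothetical helper, and `make_classifier` is any zero-argument callable returning an object with the `fit(...).predict()` interface used in the example above, such as `doubletdetection.BoostClassifier` with its default settings.

```python
import numpy as np


def classify_per_run(raw_counts, run_ids, make_classifier):
    """Run doublet classification separately on the cells from each run.

    raw_counts: cells x genes matrix supporting boolean row indexing.
    run_ids: 1-D array with one run identifier per cell (row).
    make_classifier: callable returning a fresh, unfitted classifier.
    """
    run_ids = np.asarray(run_ids)
    if run_ids.shape[0] != raw_counts.shape[0]:
        raise ValueError("run_ids must have one entry per cell (row of raw_counts)")
    # Start with NaN so any cell that is never assigned stays "ambiguous"
    labels = np.full(raw_counts.shape[0], np.nan)
    for run in np.unique(run_ids):
        in_run = run_ids == run
        clf = make_classifier()  # new classifier per run, so no state is shared across runs
        labels[in_run] = clf.fit(raw_counts[in_run]).predict()
    return labels
```

Usage might look like `labels = classify_per_run(raw_counts, run_ids, doubletdetection.BoostClassifier)`. The returned labels stay in the original cell order, so they line up with the rows of the aggregated matrix.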

In v2.5 we added a new, experimental clustering method (Louvain clustering via scanpy) that is much faster than PhenoGraph. We are still validating its results. Please see the notebook below for an example of how to use this feature.

See our Jupyter notebook for an example on 8k PBMCs from 10x Genomics.

Obtaining data

Data can be downloaded from the 10x Genomics website.
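The classifier expects cells as rows, whereas the matrix.mtx file in a 10x Cell Ranger output directory (filtered_feature_bc_matrix in Cell Ranger 3 and later, filtered_gene_bc_matrices/<genome> in older versions) stores genes as rows and cell barcodes as columns. The following is a minimal sketch using SciPy, assuming such a directory has been downloaded and unpacked. The function name `load_10x_counts` is hypothetical. It returns a sparse CSR matrix; whether your installed version of DoubletDetection accepts sparse input is not stated here, so call `.toarray()` on the result if it needs a dense array (this can use a lot of memory).

```python
from pathlib import Path

import scipy.io
import scipy.sparse


def load_10x_counts(matrix_dir):
    """Load a 10x Matrix Market count matrix as a cells x genes CSR matrix."""
    matrix_dir = Path(matrix_dir)
    # Cell Ranger 3+ writes a gzipped file; older versions write a plain matrix.mtx
    for name in ("matrix.mtx.gz", "matrix.mtx"):
        path = matrix_dir / name
        if path.exists():
            genes_by_cells = scipy.io.mmread(str(path))
            # Transpose so rows are cells and columns are genes, as the classifier expects
            return scipy.sparse.csr_matrix(genes_by_cells.T)
    raise FileNotFoundError(f"no matrix.mtx or matrix.mtx.gz found in {matrix_dir}")
```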

Citations

A bioRxiv submission and journal publication are expected in the coming months. Until then, please cite:

Gayoso, Adam, & Shor, Jonathan. (2018, July 17). DoubletDetection (Version v2.4). Zenodo. http://doi.org/10.5281/zenodo.2678042

This project is licensed under the terms of the MIT license.
