Automated scRNA-seq Analysis in the Cloud

What’s the problem?

Analysing scRNA-seq data is currently difficult to manage without a broad set of technical skills. No single workflow guides the user from data inputs to relevant analysis, let alone one that produces user-friendly output.

Using existing tools, we set out to create a linear workflow that performs basic QC, such as filtering and normalization, followed by automated annotation. We used the existing Tabula Muris Senis database as the starting point for the labelling step, but we intend to use other datasets, in particular those that are part of the HCA (Human Cell Atlas), when they become available.

Objective: Build a semi-automated scRNA-seq analysis workflow in the cloud that takes raw, unprocessed data and outputs a processed file annotated with OnClass, using Tabula Muris Senis as the reference database for the annotations.

What is Tabula Muris Senis?

Tabula Muris Senis is a comprehensive resource for the cell biology community that offers a detailed molecular and cell-type-specific portrait of aging.

What is OnClass?

OnClass is a Python package for single-cell cell type annotation. It uses the Cell Ontology to capture the similarity between cell types, so it can label cells in a new dataset whether or not their cell types are present in the training data.

What's in this repo?

There are three related Python projects here: webapp/ contains a simple Flask app, which uses the Docker containers defined in context_processing and context_annotations.

To download sample data:

./download-data.sh

To run the flask app:

cd webapp
pip install -r requirements.txt
./start.sh

The app uses images we have pushed to Docker Hub. To rebuild the image locally and run it with the samples in data/:

./build-and-run-image.sh
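As a rough illustration of how a Python app can hand work to a Docker container, the sketch below assembles a `docker run` command that mounts a local data directory into the container. It is not taken from the webapp code: the image name `example/processing:latest`, the `/data` mount point, and the `--input` argument are all hypothetical placeholders.

```python
import subprocess
from pathlib import Path


def docker_run_command(image, data_dir, script_args=()):
    """Build a `docker run` command that mounts data_dir at /data (assumed mount point)."""
    host_dir = Path(data_dir).resolve()  # Docker bind mounts need an absolute host path
    return [
        "docker", "run", "--rm",            # --rm removes the container when it exits
        "--volume", f"{host_dir}:/data",    # bind-mount the sample data
        image,
        *script_args,
    ]


def run_container(image, data_dir, script_args=()):
    """Run the container; raises CalledProcessError if it exits non-zero."""
    return subprocess.run(docker_run_command(image, data_dir, script_args), check=True)


# Hypothetical image name and arguments, for illustration only.
cmd = docker_run_command(
    "example/processing:latest", "data", ["--input", "/data/sample.h5ad"]
)
```

Building the command as a list, rather than a shell string, avoids quoting problems when file names contain spaces.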

There are some tests:

test-ci.sh is fast and is run by GitHub on every push. Make sure it shows a green checkmark before merging!
test-local.sh exercises all the scripts and may be much slower. It should run successfully on a fresh checkout.

Roadmap:

This project began at the Single Cell Hackathon, NYGC, January 15-17, 2020. It can run in a local development environment, but it is a long way from being something that could be deployed in the cloud. We've created issues for some of the next steps.

Input: gene counts and metadata (.h5ad)
1. Preprocessing: process data using Scanpy
   - Minimum number of reads
   - Minimum number of genes
   - Minimum number of cells
2. Visualization
   - Utilizing the CZ Biohub cellxgene tool
3. Annotations
   - Label propagation
   - scVI & OnClass
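To make the three preprocessing thresholds concrete, here is a minimal plain-Python sketch of what they do to a small cells x genes count matrix, followed by a simple total-count normalization. The workflow itself uses Scanpy for this step; the comments name the corresponding Scanpy calls, while the function names, toy matrix, and threshold values below are illustrative assumptions.

```python
import math


def filter_counts(counts, min_reads=0, min_genes=0, min_cells=0):
    """Filter a cells x genes count matrix given as a list of rows.

    Returns (filtered_matrix, kept_cell_indices, kept_gene_indices).
    """
    # Keep cells with enough total reads and enough detected genes
    # (Scanpy: sc.pp.filter_cells(adata, min_counts=...) and min_genes=...).
    kept_cells = [
        i for i, row in enumerate(counts)
        if sum(row) >= min_reads and sum(1 for c in row if c > 0) >= min_genes
    ]
    n_genes = len(counts[0]) if counts else 0
    # Keep genes detected in enough of the remaining cells
    # (Scanpy: sc.pp.filter_genes(adata, min_cells=...)).
    kept_genes = [
        j for j in range(n_genes)
        if sum(1 for i in kept_cells if counts[i][j] > 0) >= min_cells
    ]
    filtered = [[counts[i][j] for j in kept_genes] for i in kept_cells]
    return filtered, kept_cells, kept_genes


def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell to target_sum total counts, then apply log(1 + x).

    (Scanpy: sc.pp.normalize_total(adata, target_sum=...) then sc.pp.log1p(adata).)
    """
    out = []
    for row in counts:
        total = sum(row)
        scale = target_sum / total if total else 0.0
        out.append([math.log1p(c * scale) for c in row])
    return out


# Toy matrix: 4 cells (rows) x 4 genes (columns).
toy = [
    [5, 0, 3, 0],  # 8 reads, 2 genes detected
    [0, 0, 1, 0],  # 1 read, 1 gene detected: fails both cell thresholds
    [2, 4, 6, 0],  # 12 reads, 3 genes detected
    [7, 1, 0, 0],  # 8 reads, 2 genes detected
]
filtered, cells, genes = filter_counts(toy, min_reads=5, min_genes=2, min_cells=2)
normalized = normalize_log1p(filtered)
```

Cells are filtered before genes here, so the gene threshold is evaluated only on the cells that survive. Reasonable threshold values depend on the dataset and sequencing protocol.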

Dependencies:

Scanpy, Docker, NumPy, IPython, Louvain, Leidenalg, python-igraph, OnClass

Input file format

.h5ad
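An .h5ad file is AnnData's HDF5-based on-disk format, so a cheap sanity check before starting any processing is to confirm the extension and the HDF5 file signature. The helper below is a hypothetical sketch, not part of this repo, and it only checks the signature at byte offset 0, which is where it sits unless the file was written with a user block.

```python
from pathlib import Path

# The 8-byte signature that begins an HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_h5ad(path):
    """Return True if path ends in .h5ad and starts with the HDF5 signature."""
    p = Path(path)
    if p.suffix.lower() != ".h5ad":
        return False
    try:
        with p.open("rb") as fh:
            return fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
    except OSError:
        # Missing or unreadable files are treated as invalid input.
        return False
```

This does not prove the file is a valid AnnData object; it only rejects obviously wrong uploads early, before a container is started.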

Where is the data?

Tabula Muris
Tabula Muris Senis

Codeathon team:

Angela Oliveira Pisco, PhD (Lead) - Chan Zuckerberg Biohub
Chuck McCallum - Harvard Medical School
Kyndal Goss - NIH Vaccine Research Center
Sanjana Shah - NIH Vaccine Research Center
Jaqueline Cattell - NIH Office of Data Science Strategy

NCBI-Codeathons/automated-sc-RNA-seq-analysis-in-the-cloud
