
Nuwa-genomics/Nuwa



A bioinformatics web tool built with streamlit and scanpy for genomics data processing and analysis 🧬🐍.

Deep neural networks have many potential use cases in genomic analysis, including quality control, dimensionality reduction and even spatial transcriptomics. Nuwa aims to integrate several deep learning models into a visual, easy-to-use interface, alongside other filtering and data analysis tools familiar to scanpy users.

Warning

The project is still in its infancy and is not recommended for research or commercial use. Support for exporting some Python and R scripts is still missing.

Getting Started

Make sure Docker and Docker Compose are installed on your host machine.

Next, clone the repo:

git clone https://github.com/nuwa-genomics/Nuwa.git && cd Nuwa

If you have an NVIDIA GPU:

1. Make sure CUDA drivers are installed on the host machine.

2. Install and configure the NVIDIA Container Toolkit for Docker.

3. Bring up the containers using the CUDA Docker Compose file:

docker-compose -f cuda.docker-compose.yml up -d --build

If you are using a CPU:

Bring up the containers using the CPU Docker Compose file:

docker-compose -f cpu.docker-compose.yml up -d --build

Then visit http://localhost in your browser.
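Choosing between the two compose files can also be scripted. The Python sketch below is a hypothetical helper, not part of Nuwa: it guesses whether a GPU is present by checking for `nvidia-smi` on the PATH. That check does not confirm the NVIDIA Container Toolkit is configured, so pass `gpu=True` or `gpu=False` explicitly when in doubt.

```python
import shutil
from typing import List, Optional


def compose_command(gpu: Optional[bool] = None) -> List[str]:
    """Build the docker-compose command for the CUDA or CPU setup.

    If `gpu` is None, guess by looking for `nvidia-smi` on the PATH
    (a heuristic only: it does not check the NVIDIA Container Toolkit).
    """
    if gpu is None:
        gpu = shutil.which("nvidia-smi") is not None
    compose_file = "cuda.docker-compose.yml" if gpu else "cpu.docker-compose.yml"
    return ["docker-compose", "-f", compose_file, "up", "-d", "--build"]
```

Running `subprocess.run(compose_command(), check=True)` from the repository root would then bring up the matching containers.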

Docs

See our Documentation for more information and tutorials.

Common issues

  • If you encounter an invalid release file error when building images, make sure the system clock is correct:
E: Release file for http://archive.ubuntu.com/ubuntu/dists/focal-updates/InRelease is not valid yet (invalid for another 1min 55s). Updates for this repository will not be applied.

Preprocess

Preprocess 10x Genomics data using scanpy's preprocessing module:

  • Filter genes and cell metrics
  • Annotate and filter mitochondrial, ribosomal and haemoglobin genes
  • Show highly variable genes
  • Show most expressed genes
  • Normalize, logarithmize and scale data
  • Doublet detection
  • Batch effect correction
  • Cell cycle scoring
  • Apply recipes to quickly preprocess data
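To show roughly what mitochondrial QC filtering, library-size normalisation and log transformation do to a raw count matrix, here is a small NumPy sketch. It is an illustration under assumptions, not Nuwa's code: the function name `preprocess`, its default thresholds and the human `MT-` gene prefix are hypothetical choices. In scanpy the corresponding steps are `sc.pp.calculate_qc_metrics`, `sc.pp.filter_cells`, `sc.pp.normalize_total` and `sc.pp.log1p`.

```python
import numpy as np


def preprocess(counts, gene_names, min_genes=1, max_pct_mt=20.0, target_sum=1e4):
    """Toy QC + normalisation for a dense cells x genes raw count matrix.

    min_genes should be >= 1 so that no all-zero cell reaches normalisation.
    Returns (log-normalised matrix of kept cells, boolean keep mask, % mito per cell).
    """
    counts = np.asarray(counts, dtype=float)
    # hypothetical convention: human mitochondrial genes start with "MT-"
    is_mt = np.array([g.upper().startswith("MT-") for g in gene_names])
    total = counts.sum(axis=1)
    n_genes = (counts > 0).sum(axis=1)
    pct_mt = np.divide(100.0 * counts[:, is_mt].sum(axis=1), total,
                       out=np.zeros_like(total), where=total > 0)
    # keep cells with enough detected genes and a low mitochondrial fraction
    keep = (n_genes >= min_genes) & (pct_mt <= max_pct_mt)
    kept = counts[keep]
    # scale each cell to the same total count, then apply log(1 + x)
    normalised = kept / kept.sum(axis=1, keepdims=True) * target_sum
    return np.log1p(normalised), keep, pct_mt
```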


Dataset integration

Integrate a variety of dataset types using scanpy's external integration libraries and scvi-tools, along with pandas-based manipulation of dataframes.
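Integrating datasets usually starts by putting them on a shared gene space and recording which batch each cell came from. The sketch below uses a hypothetical helper, `concat_inner`, written in plain NumPy rather than any integration library. It keeps only the genes common to all datasets, which is conceptually similar to an inner join with `anndata.concat(..., join="inner", label="batch")`, but it is not how Nuwa performs integration.

```python
import numpy as np
from typing import Dict, Sequence, Tuple


def concat_inner(datasets: Dict[str, Tuple[np.ndarray, Sequence[str]]]):
    """Stack cells x genes matrices from several batches on their shared genes.

    `datasets` maps a batch name to (counts, gene_names).
    Returns (stacked counts, shared gene names, per-cell batch labels).
    """
    names = list(datasets)
    shared = set(datasets[names[0]][1])
    for name in names[1:]:
        shared &= set(datasets[name][1])
    # keep the gene order of the first dataset so the output is deterministic
    genes = [g for g in datasets[names[0]][1] if g in shared]
    blocks, batches = [], []
    for name in names:
        counts, gene_names = datasets[name]
        index = {g: i for i, g in enumerate(gene_names)}
        # reorder this dataset's columns to match the shared gene order
        block = np.asarray(counts)[:, [index[g] for g in genes]]
        blocks.append(block)
        batches.extend([name] * block.shape[0])
    return np.vstack(blocks), genes, batches
```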


You can also train a deep learning model to integrate datasets using SCVI and scANVI.


Build model

Available models:

  • Cite-seq: dimensionality reduction for cluster analysis.
  • Solo: remove doublets using semi-supervised autoencoders.

Nuwa automatically selects a CUDA-capable GPU for faster training if one is available.
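A common PyTorch pattern for this kind of automatic device selection is sketched below. Nuwa's actual selection logic is not described here, so treat this as an assumption-based illustration: it assumes the models are PyTorch-based and falls back to the CPU if PyTorch is not installed.

```python
def select_device() -> str:
    """Return "cuda" if PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch  # optional dependency; may not be installed
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

A model or tensor would then be moved with `.to(select_device())`.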


Cluster Analysis

Cluster analysis consists of:

  • Autoencoder cluster plot
  • t-SNE cluster plot
  • Principal Component Analysis of selected genes
  • Variance ratio of principal components
  • Neighbourhood graph
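The variance ratio of principal components is what a PCA elbow plot displays: the fraction of total variance captured by each component. A minimal NumPy sketch via singular value decomposition is shown below. The function name is hypothetical, and scanpy users would normally call `sc.tl.pca` followed by `sc.pl.pca_variance_ratio` instead.

```python
import numpy as np


def pca_variance_ratio(X: np.ndarray, n_comps: int) -> np.ndarray:
    """Explained variance ratio of the first n_comps principal components.

    X is a cells x genes matrix (e.g. log-normalised expression).
    """
    Xc = X - X.mean(axis=0)                     # centre each gene
    _, s, _ = np.linalg.svd(Xc, full_matrices=False)
    var = s**2 / (X.shape[0] - 1)               # variance along each component
    return var[:n_comps] / var.sum()            # fraction of total variance
```

The component where the curve flattens (the "elbow") is a common heuristic for how many components to keep for the neighbourhood graph.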

Analysis

Differential gene expression

Differential gene expression compares how genes are expressed in each cluster relative to the rest of the dataset. This includes matrix plots, dot plots and violin plots to visualise variable expression. You can also choose which clusters to compare and which statistical test to use for the DE analysis.
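As an illustration of one such statistical test, the sketch below computes Welch's t-statistic for each gene, comparing one cluster against all remaining cells. It is a simplified, hypothetical helper with no p-values or multiple-testing correction. In scanpy the equivalent analysis is `sc.tl.rank_genes_groups`, whose `method` parameter accepts options such as `'t-test'` and `'wilcoxon'`.

```python
import numpy as np


def welch_t_one_vs_rest(X: np.ndarray, labels, group) -> np.ndarray:
    """Welch's t-statistic per gene for cells in `group` versus all other cells.

    X: cells x genes (log-normalised); labels: cluster label per cell.
    Each side needs at least two cells. Positive values mean higher mean
    expression in `group`.
    """
    labels = np.asarray(labels)
    in_g = labels == group
    a, b = X[in_g], X[~in_g]
    mean_a, mean_b = a.mean(axis=0), b.mean(axis=0)
    var_a, var_b = a.var(axis=0, ddof=1), b.var(axis=0, ddof=1)
    se = np.sqrt(var_a / len(a) + var_b / len(b))
    # genes with zero variance in both groups get a statistic of 0
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(se > 0, (mean_a - mean_b) / se, 0.0)
```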


Elbow plots are available for comparing clusters.

Interactive violin plots are available for individual genes or clusters.

Trajectory Inference

Trajectory inference gives key insights into cell differentiation by ordering the stages of development into a continuous sequence of clusters. Here you can:

  • View PAGA graphs
  • Embed PAGA into louvain graphs
  • View diffusion pseudotime of selected genes
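Diffusion pseudotime itself is computed from a cell-level diffusion map (in scanpy, `sc.tl.dpt` after choosing a root cell via `adata.uns['iroot']`). As a much coarser stand-in that only conveys the idea of ordering clusters from a starting point, the sketch below assigns each cluster its breadth-first hop distance from a root cluster in a cluster connectivity graph, such as one obtained by thresholding PAGA connectivities. The graph format and function name are hypothetical, and this is not DPT.

```python
from collections import deque
from typing import Dict, Hashable, Iterable


def hop_order(adjacency: Dict[Hashable, Iterable[Hashable]], root) -> Dict[Hashable, int]:
    """Breadth-first hop distance of each cluster from a root cluster.

    Clusters unreachable from the root are left out of the result.
    """
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist
```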


Spatial Transcriptomics

View expression profiles while retaining spatial information. Currently includes:

  • Visualise spatial plots overlaid on histology images
  • Neighbourhood enrichment
  • Interaction matrices
  • Centrality score
  • Ripley score
  • Co-occurrence score
  • Ligand-receptor interaction
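Neighbourhood enrichment asks whether cells of two clusters sit next to each other more often than expected by chance. One common way to score this, used for example by squidpy's `sq.gr.nhood_enrichment`, is a permutation test: count cluster pairs among spatial neighbours, shuffle the cluster labels many times, and report a z-score. The NumPy sketch below follows that idea with a simple k-nearest-neighbour graph. The function names, `k` and the number of permutations are illustrative assumptions, it is not Nuwa's implementation, and its dense distance matrix only suits small examples.

```python
import numpy as np


def knn_edges(coords: np.ndarray, k: int) -> np.ndarray:
    """Directed k-nearest-neighbour edges (i, j) from spatial coordinates."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # a cell is not its own neighbour
    nbrs = np.argsort(d, axis=1)[:, :k]
    rows = np.repeat(np.arange(len(coords)), k)
    return np.stack([rows, nbrs.ravel()], axis=1)


def pair_counts(edges: np.ndarray, labels: np.ndarray, n_clusters: int) -> np.ndarray:
    """Count edges between each (source cluster, neighbour cluster) pair."""
    counts = np.zeros((n_clusters, n_clusters))
    np.add.at(counts, (labels[edges[:, 0]], labels[edges[:, 1]]), 1)
    return counts


def nhood_enrichment(coords, labels, k=6, n_perms=200, seed=0) -> np.ndarray:
    """Z-scores of observed cluster-pair neighbour counts versus label permutations.

    `labels` must be integer cluster codes 0..n_clusters-1.
    """
    labels = np.asarray(labels)
    n_clusters = labels.max() + 1
    edges = knn_edges(np.asarray(coords, dtype=float), k)
    observed = pair_counts(edges, labels, n_clusters)
    rng = np.random.default_rng(seed)
    # null distribution: same spatial graph, shuffled cluster labels
    perms = np.stack([pair_counts(edges, rng.permutation(labels), n_clusters)
                      for _ in range(n_perms)])
    mu, sd = perms.mean(axis=0), perms.std(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(sd > 0, (observed - mu) / sd, 0.0)
```

Positive z-scores suggest two clusters are spatial neighbours more often than chance; negative scores suggest they avoid each other.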


Plotly 3D

A 3D interactive chart for visualising cluster embeddings.


Access the bioconda environment

You can access the docker container by running:

# open a shell inside the running streamlit container
docker exec -it streamlit bash
cd ../streamlit-volume/
# activate the bioconda environment
source activate bioconda_env
# interactive Python
python3
# or interactive R
R

From here you can access many of the Python and R packages available within bioconda.
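Once inside the interactive `python3` session, a quick way to see which packages are importable is sketched below. The package names listed are assumptions about what an environment like this might contain, not a documented list.

```python
import importlib.util
from typing import Dict, Iterable


def available_packages(names: Iterable[str]) -> Dict[str, bool]:
    """Report which top-level Python packages are importable in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # hypothetical selection of packages to check
    for pkg, ok in available_packages(["scanpy", "anndata", "scvi", "squidpy"]).items():
        print(f"{pkg}: {'available' if ok else 'missing'}")
```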

File structure

The streamlit volume is mounted within the installation path of the git repository.

├── Nuwa
    ├── streamlit-volume
        ├── exported_workspaces
        |   ├── workspace_6f2b160d4a89bafe
        |       ├── workspace_6f2b160d4a89bafe.h5ad
        |       ├── checksum.sha256
        ├── workspace_6f2b160d4a89bafe
            ├── adata
            |   ├── example.h5ad
            ├── downloads
            |   ├── example
            |      ├── seurat # files downloaded in Seurat format
            |      |   ├── barcodes.tsv
            |      |   ├── features.tsv
            |      |   ├── matrix.mtx
            |      |   ├── metadata.csv
            |      ├── example.h5ad # files downloaded in h5ad format
            ├── uploads
                ├── example.h5ad
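Each exported workspace is stored next to a `checksum.sha256` file. A minimal Python sketch for checking an export's integrity follows. It assumes, without confirmation from the Nuwa docs, that the checksum file starts with the hex SHA-256 digest of the workspace `.h5ad` file (optionally followed by a filename, as `sha256sum` writes it); `verify_workspace` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_workspace(workspace_dir) -> bool:
    """Compare <name>.h5ad's digest with the first token of checksum.sha256.

    Assumed layout: exported_workspaces/<name>/<name>.h5ad and checksum.sha256.
    """
    workspace_dir = Path(workspace_dir)
    h5ad = workspace_dir / f"{workspace_dir.name}.h5ad"
    expected = (workspace_dir / "checksum.sha256").read_text().split()[0].lower()
    return sha256_of(h5ad) == expected
```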

Run on mobile

For security reasons, the web server is only accessible over a local interface by default. To make it reachable from other devices on your network, run:

SERVER_ADDR=0.0.0.0 docker-compose up --build

The app can then be accessed on http://IP_ADDRESS_OF_COMPUTER in your phone's browser.

⚠️ THIS WILL ALLOW ANYONE ON YOUR LOCAL NETWORK TO ACCESS THE WEB SERVER AND HENCE YOUR COMPUTER'S RESOURCES AND DOCKER ENVIRONMENT

The host's filesystem will not be accessible to others.
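To find the value to use for `IP_ADDRESS_OF_COMPUTER`, one option is the Python sketch below. It is a best-effort heuristic, not part of Nuwa: connecting a UDP socket sends no packets but makes the operating system pick the outbound interface, whose address is then read back. On machines with several network interfaces or a VPN, the result may not be the address your phone can reach.

```python
import socket


def lan_ip() -> str:
    """Best-effort guess of this machine's LAN IPv4 address, else loopback."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # 192.0.2.1 is a reserved documentation address; UDP connect sends nothing
        s.connect(("192.0.2.1", 80))
        return s.getsockname()[0]
    except OSError:
        # no route available (e.g. offline): fall back to loopback
        return "127.0.0.1"
    finally:
        s.close()


if __name__ == "__main__":
    print(f"Open http://{lan_ip()} in your phone's browser")
```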

Running Tests

Automated unit tests can be run using the tests docker compose file:

docker-compose -f tests.docker-compose.yml up --build

Future work

  • Add other models
  • Transfer learning, saving and loading models
  • Support other file types
  • Add other analysis scores/graphs

Contributing

If you have a feature request, notice a bug or have issues running the app, please let us know in the Issues or Discussions tabs! Want to make a contribution? Make a pull request!

Citations

[1] Lin, X., Tian, T., Wei, Z. et al. (2022). Clustering of single-cell multi-omics data with a multimodal deep learning method. Nat Commun 13, 7705. https://doi.org/10.1038/s41467-022-35031-9.

[2] Bernstein, N., Fong, N., Lam, I. et al. (2020). Solo: Doublet Identification in Single-Cell RNA-Seq via Semi-Supervised Deep Learning. https://www.cell.com/cell-systems/fulltext/S2405-4712(20)30195-2.

[3] Gayoso, A., Shor, J. (2022). JonathanShor/DoubletDetection: doubletdetection v4.2 (v4.2). Zenodo. https://doi.org/10.5281/zenodo.6349517.

[4] Wolock, S. L., Lopez, R., Klein, A. M. (2019). Scrublet: Computational Identification of Cell Doublets in Single-Cell Transcriptomic Data. https://www.sciencedirect.com/science/article/pii/S2405471218304745.

Donate ₿❤️

All development is non-profit; any contributions are greatly appreciated! Donate in bitcoin to:

Bitcoin address (on-chain)

bc1qs9l5dvkrtgcxfewm5ly6rs47p2qjxv55qkqwwu