Workflow to download, process, and explore microbial RNA-seq data from NCBI SRA

avsastry/modulome-workflow

modulome-workflow

⚠️ This repository is now deprecated. Please see https://github.com/SBRG/modulome-workflow for the actively maintained repository.

This repository presents a computational workflow to compute and characterize all iModulons for a selected organism. The workflow proceeds in five steps:

  1. Gather all publicly available RNA-seq data for the organism (Step 1)
  2. Process the RNA-seq data (Step 2)
  3. Inspect data to identify high-quality datasets (Step 3)
  4. Compute iModulons (Step 4)
  5. Characterize iModulons using PyModulon (Step 5)

Background

iModulons are independently modulated groups of genes that are computed through Independent Component Analysis (ICA) of a gene expression dataset. To learn more about iModulons or explore published iModulons, visit iModulonDB or see our publications for Escherichia coli, Staphylococcus aureus, or Bacillus subtilis.

Here, we introduce the concept of the Modulome for an organism, which is the set of all iModulons that can be computed for the organism based on publicly available RNA-seq data. The computational pipeline provides a step-by-step workflow to compute the Modulome for Bacillus subtilis.
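
To make the ICA idea above concrete, here is a minimal sketch on synthetic data. It is not the pipeline used in this repository. It assumes scikit-learn's FastICA as the ICA implementation, and the names M (gene weights) and A (activities across samples) are notation chosen for this illustration. The matrix sizes, weights, and noise level are made up.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n_genes, n_samples, n_components = 200, 50, 2

# Hypothetical ground truth: each component has a few strongly weighted
# genes and zero weight elsewhere (a sparse, non-Gaussian signal).
true_M = np.zeros((n_genes, n_components))
true_M[:10, 0] = 5.0
true_M[10:25, 1] = -4.0

# Hypothetical activity of each component in each sample.
true_A = rng.uniform(-1.0, 1.0, size=(n_components, n_samples))

# Synthetic expression matrix (genes x samples) with a little noise.
X = true_M @ true_A + 0.1 * rng.normal(size=(n_genes, n_samples))

# Treat genes as observations so the independent components are gene weightings.
ica = FastICA(n_components=n_components, random_state=0, max_iter=1000)
M = ica.fit_transform(X)   # genes x components
A = ica.mixing_.T          # components x samples

# ICA approximates X as M @ A plus the per-sample mean removed during fitting.
X_hat = M @ A + ica.mean_
rel_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_error:.3f}")

# Genes with the largest absolute weight in each recovered component.
for k in range(n_components):
    top = np.sort(np.argsort(-np.abs(M[:, k]))[:5])
    print(f"component {k}: top-weighted gene indices {top.tolist()}")
```

ICA recovers components only up to sign, scale, and order, so the recovered columns of M need not line up one-to-one with the columns of true_M.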

Setup

Docker

We have provided pre-built Docker containers with all necessary software.

To begin, install Docker and Nextflow.
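
After installing, a quick check that both tools are reachable can save debugging later. The sketch below is not part of this repository. It assumes the executables are named `docker` and `nextflow` and are expected on your PATH; the helper `missing_tools` is a hypothetical name used only here.

```python
import shutil


def missing_tools(tools=("docker", "nextflow")):
    """Return the names of required executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Docker and Nextflow are both available.")
```

This only confirms that the executables can be found. It does not check versions or whether the Docker daemon is running.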

Local installation

You can also run each program locally. All requirements are listed in the conda environment.yaml file. For Step 5 (Characterize iModulons), additionally install pymodulon.

Cite

Please cite the following pre-print: Mining all publicly available expression data to compute dynamic microbial transcriptional regulatory networks