This repository presents a workflow to compute and characterize all iModulons for a selected organism. The workflow has five steps:
- Gather all publicly available RNA-seq data for the organism (Step 1)
- Process the RNA-seq data (Step 2)
- Inspect data to identify high-quality datasets (Step 3)
- Compute iModulons (Step 4)
- Characterize iModulons using PyModulon (Step 5)
iModulons are independently modulated groups of genes that are computed through Independent Component Analysis (ICA) of a gene expression dataset. To learn more about iModulons or explore published iModulons, visit iModulonDB or see our publications for Escherichia coli, Staphylococcus aureus, or Bacillus subtilis.
Here, we introduce the concept of the Modulome for an organism, which is the set of all iModulons that can be computed for the organism based on publicly available RNA-seq data. The computational pipeline provides a step-by-step workflow to compute the Modulome for Bacillus subtilis.
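To give a feel for what ICA of an expression dataset means, here is a minimal sketch on synthetic data. It is not the procedure used in Step 4. It assumes numpy and scikit-learn are installed and uses scikit-learn's FastICA as a stand-in. The matrix names X, M, and A, the matrix sizes, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# Hypothetical sizes chosen for illustration only.
n_genes, n_samples, n_components = 200, 40, 3

# Synthetic expression matrix X (genes x samples), built from
# non-Gaussian gene weights and per-sample activities plus small noise.
true_M = rng.laplace(size=(n_genes, n_components))
true_A = rng.normal(size=(n_components, n_samples))
X = true_M @ true_A + 0.01 * rng.normal(size=(n_genes, n_samples))

# Treat each gene as an observation so that the independent
# components are gene-weight vectors.
ica = FastICA(n_components=n_components, random_state=0, max_iter=1000)
M = ica.fit_transform(X)   # gene weights, shape (n_genes, n_components)
A = ica.mixing_.T          # activities, shape (n_components, n_samples)

# FastICA centers the data, so add the mean back when reconstructing.
X_hat = M @ A + ica.mean_
rel_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)

print("M shape:", M.shape)
print("A shape:", A.shape)
print("relative reconstruction error:", rel_error)
```

In this sketch, each column of M plays the role of one independently modulated gene set, and the matching row of A describes how strongly that set is active in each sample. Real datasets need the quality control from Step 3 and a principled choice of the number of components, neither of which is shown here.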
We have provided pre-built Docker containers with all necessary software.
To begin, install Docker and Nextflow.
You can also run each program locally; all requirements are listed in the conda environment.yml file. For Step 5 (Characterize iModulons), additionally install pymodulon.
Please cite the following pre-print: Mining all publicly available expression data to compute dynamic microbial transcriptional regulatory networks