This script compiles and links biological reaction and compound information from multiple public repositories.
It is #2 in a series of pipelines that together make up the Universal Reference for 'omics data analysis:
1. Universal Taxonomy Database: found here
2. Universal Compounds Database: this repository
3. Universal Reactions Database: this repository
4. Universal Protein Alignment Database: found here
5. Universal ncRNA Alignment Database: found here
These databases span all kingdoms of life and allow microbial community phylogeny and functions to be identified simultaneously. All the biological molecules/compounds and their data are linked to their enzymes and transporters, so the flow of metabolites can be mapped in microbe-microbe or microbe-host interactions. The databases are used for (meta)transcriptomics, (meta)proteomics, metabolomics, and metagenomics, as well as for the novel binning and MAG quality control software I've created. Because taxonomy and functions are directly linked, many types of data can be combined in a secondary analysis. The databases also cover both the protein and the non-coding fractions of sequencing.
For the most part, this script is automated, except for the BioCyc collection. You'll need to request a license to download their flat files. They will give you a username, password, and link to the download site; you'll be asked to provide these when the files are downloaded.
This script automatically downloads the data it needs from the various public repositories.
- Upside - you shouldn't have to do anything but run it
- Downside - those repositories may change their links, so you may have to fix the links in the code
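If a download fails because a repository has moved its files, it can help to check which links still respond before editing the code. The following is only a rough sketch in Python, not part of this script: the function name `check_links` and the idea of keeping the links in a name-to-URL dictionary are assumptions made for illustration, and the URL in the usage comment is a placeholder rather than one of the script's real links.

```python
import urllib.error
import urllib.request


def check_links(sources, timeout=10, opener=urllib.request.urlopen):
    """Return {name: status}, where status is an HTTP code or an error string.

    `sources` is a hypothetical dict mapping a label to a download URL.
    `opener` is injectable so the function can be exercised without a network.
    """
    results = {}
    for name, url in sources.items():
        # HEAD asks only for headers, so no large flat file is downloaded.
        request = urllib.request.Request(url, method="HEAD")
        try:
            with opener(request, timeout=timeout) as response:
                results[name] = response.status
        except urllib.error.HTTPError as err:
            # The server answered, but with an error code such as 404.
            results[name] = err.code
        except (urllib.error.URLError, OSError, ValueError) as err:
            # DNS failure, refused connection, timeout, or malformed URL.
            results[name] = f"unreachable: {err}"
    return results


# Example usage (placeholder URL, not one of the script's real links):
# print(check_links({"example_source": "https://example.org/compounds.tsv"}))
```

Any entry that does not come back as 200 is a candidate for a changed link. Some servers reject HEAD requests, so a non-200 result is a hint to look closer rather than proof that the file is gone.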
There are several primary and secondary microbiome analysis pipelines on my main GitHub page that use these databases:
- RNAseq/Metatranscriptome Analysis here
- Metagenome Primary Analysis here
- Metabolomics Analysis - TBD; for now you can directly link the metabolomics output to compounds->reactions->proteins/organisms using the Functional and Protein databases
- Strain-level Metagenome Binning here
- Secondary Functional Analysis and Visualization here
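Until the metabolomics pipeline exists, the compounds->reactions->proteins linking mentioned above can be done by hand with two lookups. The sketch below shows the idea in Python under stated assumptions: the function name, the in-memory dictionaries, and all identifiers are hypothetical and do not reflect the actual column layout or ID formats of the databases.

```python
def link_compounds_to_proteins(compound_ids, compound_to_reactions, reaction_to_proteins):
    """Map each detected compound to the proteins whose reactions involve it.

    All three arguments are hypothetical stand-ins:
      compound_ids          - compound IDs reported by a metabolomics run
      compound_to_reactions - dict: compound ID -> iterable of reaction IDs
      reaction_to_proteins  - dict: reaction ID -> iterable of protein IDs
    """
    linked = {}
    for compound in compound_ids:
        proteins = set()
        # Step 1: compound -> reactions; step 2: each reaction -> proteins.
        for reaction in compound_to_reactions.get(compound, ()):
            proteins.update(reaction_to_proteins.get(reaction, ()))
        # Sorted list gives a stable order; compounds with no match map to [].
        linked[compound] = sorted(proteins)
    return linked


# Toy data with made-up identifiers, for illustration only.
toy_compound_to_reactions = {"CPD_A": ["RXN_1", "RXN_2"], "CPD_B": ["RXN_2"]}
toy_reaction_to_proteins = {"RXN_1": ["PROT_X"], "RXN_2": ["PROT_Y", "PROT_X"]}

toy_result = link_compounds_to_proteins(
    ["CPD_A", "CPD_B", "CPD_C"],
    toy_compound_to_reactions,
    toy_reaction_to_proteins,
)
```

In practice the two dictionaries would be built by reading the relevant columns of the database files. A protein-to-organism lookup could be chained on in the same way to reach the organism level.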