This script compiles and links biological reactions and compound information from across multiple public repositories.

TealFurnholm/Universal_Biological_Compounds_Database

Universal_Biological_Compounds_And_Reactions_Database

This script compiles and links biological reactions and compound information from across multiple public repositories. It covers #2 and #3 of a series of pipelines that together comprise a Universal Reference for various 'omics data analyses.
1. Universal Taxonomy Database: found here
2. Universal Compounds Database: this repository
3. Universal Reactions Database: this repository
4. Universal Protein Alignment Database: found here
5. Universal ncRNA Alignment Database: found here

How Universal?

These databases span all kingdoms of life. The databases allow the simultaneous identification of microbial community phylogeny and functions. All the biological molecules/compounds and their data are linked to their enzymes and transporters to map the flow of metabolites in microbe-microbe or microbe-host interactions. The databases are used for (meta)transcriptomics, (meta)proteomics, metabolomics, metagenomics, and for novel binning and MAG quality control software I've created. This way many types of data can be combined into a secondary analysis, with taxonomy and functions being directly linked. It also covers both the protein and the non-coding fraction of sequencing.

One Manual Input: BioCyc

For the most part, this script is automated, except for the BioCyc collection. You'll need to request a license to download their flat files. They will give you a username, password, and link to the download site; you'll be asked to provide these to download the files.

Automation

This script automatically downloads the data it needs from the various public repositories.

  • Upshot - you shouldn't have to do anything but run it
  • Downside - those repositories may change their links, so you may have to fix the links in the code

I put as many "wget" commands as I could near the top of the script so you could easily mod them if need be.
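Since a changed link is the most likely thing to break, it can help to check the download URLs before a long run. Here is a minimal sketch of that idea in Python; it is not part of the script. The names `SOURCES` and `find_broken_links` are made up for illustration, and the `example.org` URLs are placeholders you would swap for the ones in the "wget" lines.

```python
import urllib.error
import urllib.request

# Hypothetical placeholders: substitute the URLs from the wget lines in the script.
SOURCES = {
    "example_compounds": "https://example.org/compounds.tsv.gz",
    "example_reactions": "https://example.org/reactions.tsv.gz",
}


def find_broken_links(sources, opener=urllib.request.urlopen, timeout=30):
    """Return {name: reason} for every URL that could not be opened."""
    broken = {}
    for name, url in sources.items():
        # HEAD asks for headers only, so nothing large is downloaded.
        request = urllib.request.Request(url, method="HEAD")
        try:
            with opener(request, timeout=timeout):
                pass
        except (urllib.error.URLError, OSError) as err:
            broken[name] = str(err)
    return broken
```

Calling `find_broken_links(SOURCES)` returns an empty dict when every link opens. Some servers reject HEAD requests, so a reported failure is a prompt to look at that link by hand, not proof that the file is gone.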

How to Use

There are several primary and secondary microbiome analysis pipelines on my main GitHub page that use these databases:

  • RNAseq/Metatranscriptome Analysis here
  • Metagenome Primary Analysis here
  • Metabolomics Analysis - TBD; for now you can directly link the metabolomics output to compounds->reactions->proteins/organisms using the Functional and Protein databases
  • Strain-level Metagenome Binning here
  • Secondary Functional Analysis and Visualization here
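
The compounds->reactions->proteins/organisms linking mentioned for metabolomics in the list above can be pictured as a chain of lookups. The sketch below is only an illustration of that chain: the real column layout of the database files isn't described here, so the three dictionaries and every ID in them are hypothetical stand-ins, as is the function name `link_compounds`.

```python
from collections import defaultdict

# Hypothetical stand-ins for lookups you would build from the database files.
COMPOUND_TO_REACTIONS = {"cpd_A": ["rxn_1", "rxn_2"], "cpd_B": ["rxn_2"]}
REACTION_TO_PROTEINS = {"rxn_1": ["prot_X"], "rxn_2": ["prot_X", "prot_Y"]}
PROTEIN_TO_ORGANISM = {"prot_X": "organism_1", "prot_Y": "organism_2"}


def link_compounds(detected_compounds):
    """Map each detected compound to {organism: sorted list of linked proteins}."""
    links = {}
    for compound in detected_compounds:
        by_organism = defaultdict(set)
        # Follow compound -> reactions -> proteins -> organism.
        for reaction in COMPOUND_TO_REACTIONS.get(compound, []):
            for protein in REACTION_TO_PROTEINS.get(reaction, []):
                organism = PROTEIN_TO_ORGANISM.get(protein)
                if organism is not None:
                    by_organism[organism].add(protein)
        links[compound] = {org: sorted(prots) for org, prots in by_organism.items()}
    return links
```

A compound with no match in the first lookup comes back with an empty dict, which makes unmapped metabolites easy to spot.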

Get started: wiki
