DeepGOMeta

This repository contains the scripts and data files used in the DeepGOMeta manuscript.

Dependencies

The code was developed and tested with Python 3.10.
Clone the repository: git clone https://github.com/bio-ontology-research-group/deepgometa.git
Create a virtual environment with Conda or the python3-venv module.
Install PyTorch: pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
Install DGL: pip install dgl==1.1.2+cu117 -f https://data.dgl.ai/wheels/cu117/repo.html
Install other requirements: pip install -r requirements.txt
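
After installation, it can help to confirm that the environment matches the versions pinned in the commands above. The following is a minimal sketch, not part of the repository: the helper names (`base_version`, `find_mismatches`, `installed_versions`) are hypothetical, and it assumes the packages were installed with pip so that `importlib.metadata` can see them.

```python
import sys
from importlib import metadata

# Versions pinned in the install commands above (local build tags such as
# "+cu117" are ignored when comparing).
PINNED = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "torchaudio": "2.0.2",
    "dgl": "1.1.2",
}


def base_version(version):
    """Drop a local build suffix, e.g. '1.1.2+cu117' -> '1.1.2'."""
    return version.split("+", 1)[0]


def find_mismatches(installed, pinned=PINNED):
    """Return {package: (expected, found)} for missing or differing packages."""
    mismatches = {}
    for name, expected in pinned.items():
        found = installed.get(name)
        if found is None or base_version(found) != expected:
            mismatches[name] = (expected, found)
    return mismatches


def installed_versions(names):
    """Look up the installed version of each package, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    problems = find_mismatches(installed_versions(PINNED))
    for name, (expected, found) in problems.items():
        print(f"{name}: expected {expected}, found {found}")
    if not problems:
        print("All pinned packages match.")
```

A mismatch is not necessarily an error; it only indicates that the environment differs from the one the code was tested with.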

Running DeepGOMeta model

Follow these instructions to obtain predictions for your proteins. You'll need around 30 GB of storage and a GPU with more than 16 GB of memory (or you can run on CPU).

Download data.tar.gz
Extract it: tar xvzf data.tar.gz
Run the model: python predict.py -if data/example.fa
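
Before running predictions on your own proteins, a quick sanity check of the input FASTA file can catch empty records or sequence data without a header. The sketch below is an illustration only and is not how predict.py reads its input; the function names `read_fasta` and `summarize` are hypothetical.

```python
def read_fasta(lines):
    """Yield (identifier, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the ID.
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def summarize(lines):
    """Count records and report identifiers that have no sequence."""
    records = list(read_fasta(lines))
    return {
        "count": len(records),
        "empty": [name for name, seq in records if not seq],
        "longest": max((len(seq) for _, seq in records), default=0),
    }


# Tiny in-memory example with two made-up records.
EXAMPLE = [">P1 test protein", "MKT", "AYI", "", ">P2", "GAV"]
print(summarize(EXAMPLE))
```

To check a real file, pass an open file handle instead of the list, for example `summarize(open("data/example.fa"))`.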

Docker container

We also provide a Docker container with all dependencies installed: docker pull coolmaksat/deepgometa
The repository is installed in the /workspace/deepgometa directory of the container (the path used by the mount in the example). To run the scripts you'll need to mount the data directory. Example:
docker run --gpus all -v $(pwd)/data:/workspace/deepgometa/data coolmaksat/deepgometa python predict.py -if data/example.fa

Nextflow

DeepGOMeta can be run as a Nextflow workflow using the docker image for easier execution.

Requirements:

For amplicon data: OTU table of relative abundance, where OTUs are classified using the RDP database
For WGS data: Protein sequences in FASTA format

After cloning the repository, navigate to the Nextflow directory: cd Nextflow
Update the runOptions paths in nextflow.config
Navigate to the data directory (cd data_and_scripts) and download the genome annotations
Run the workflow, replacing docker/singularity with the profile you use (docker or singularity). Example: nextflow run DeepGOMeta.nf -profile docker/singularity --amplicon true --OTU_table otu_relative_abd.tsv --pkl_dir /PATH/TO/PKL/DIR/
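
Because the amplicon workflow expects relative abundances, it may be worth checking that each sample in the OTU table sums to 1 before launching it. The sketch below assumes a layout that this README does not specify: a tab-separated table with a header row, OTU names in the first column, and one column per sample. If your table is oriented differently or uses percentages, adjust accordingly. The function names are hypothetical.

```python
import csv
import io
import math


def sample_totals(tsv_text):
    """Sum the abundance column of every sample in a tab-separated OTU table.

    Assumed layout (not confirmed by the workflow): header row, OTU name in
    the first column, one numeric column per sample.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    samples = header[1:]
    totals = {sample: 0.0 for sample in samples}
    for row in reader:
        if not row:
            continue  # skip blank lines
        for sample, value in zip(samples, row[1:]):
            totals[sample] += float(value)
    return totals


def not_normalized(totals, tolerance=1e-6):
    """Return the samples whose abundances do not sum to 1 within tolerance."""
    return [
        sample
        for sample, total in totals.items()
        if not math.isclose(total, 1.0, abs_tol=tolerance)
    ]


# Made-up two-sample table; S2 is deliberately incomplete.
TABLE = "OTU\tS1\tS2\nBacteroides\t0.25\t0.5\nPrevotella\t0.75\t0.25\n"
totals = sample_totals(TABLE)
print(totals)
print(not_normalized(totals))
```

For a real table, read the file contents first, for example `sample_totals(open("otu_relative_abd.tsv").read())`.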

Paired Datasets

Data and metadata: download from SRA and MG-RAST using sample accessions
Processing reads:
- 16S reads - generate OTU tables using the Nextflow 16SProcessing workflow
- WGS reads - obtain protein sequences using the assembly pipeline
Functional annotation:
- OTU tables - generate a weighted functional profile for each OTU table using DeepGOMeta predictions
- Protein FASTA - run DeepGOMeta on Prodigal output from metagenome assemblies, and generate a binary functional profile for each dataset
Clustering and Purity: use a metadata file and the functional profile to apply PCA and k-means clustering, calculate purity, and generate plots for the 16S and WGS datasets
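
The purity calculation is not spelled out here, so the following sketch uses the standard definition of cluster purity as an assumption: each cluster is assigned its most frequent metadata label, and purity is the fraction of samples that carry the majority label of their cluster. The function name `cluster_purity` and the example labels are hypothetical, and the scripts in PairedDatasets may compute it differently.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, labels):
    """Fraction of samples that carry the majority label of their cluster."""
    if len(cluster_ids) != len(labels):
        raise ValueError("cluster_ids and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    # Count how often each metadata label occurs inside each cluster.
    members = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, labels):
        members[cluster][label] += 1
    majority_total = sum(max(counts.values()) for counts in members.values())
    return majority_total / len(labels)


# Made-up example: k-means assignments for six samples and their metadata.
clusters = [0, 0, 0, 1, 1, 1]
environment = ["gut", "gut", "soil", "soil", "soil", "gut"]
print(cluster_purity(clusters, environment))
```

A purity of 1.0 means that every cluster contains samples from a single metadata class; values closer to 1 divided by the number of classes indicate clusters that mix classes.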
Information Content Calculation: create a .txt file for each sample containing the 16S predicted functions and the WGS predicted functions on separate, tab-delimited lines (e.g. 16Ssample<TAB>GO1<TAB>GO2 on the first line and WGSsample<TAB>GO2<TAB>GO3 on the second), compute the IC of each function, then run a t-test
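
As an illustration of the per-sample file layout described above, the sketch below parses such a file and computes an information content value per function. The IC formula is an assumption: it uses the common annotation-frequency definition, IC(t) = log2(N / n_t), where N is the number of function sets and n_t the number of sets containing t. The repository scripts may use a different definition. The function names are hypothetical and the t-test step is not shown.

```python
import math
from collections import Counter


def parse_sample_file(text):
    """Map each row label (e.g. '16Ssample') to its set of predicted functions."""
    rows = {}
    for line in text.splitlines():
        fields = [field for field in line.strip().split("\t") if field]
        if fields:
            rows[fields[0]] = set(fields[1:])
    return rows


def information_content(term_sets):
    """Assumed definition: IC(t) = log2(N / n_t) over the given function sets."""
    term_sets = list(term_sets)
    total = len(term_sets)
    counts = Counter(term for terms in term_sets for term in terms)
    return {term: math.log2(total / count) for term, count in counts.items()}


# The example content from the text above, with real tab and newline characters.
content = "16Ssample\tGO1\tGO2\nWGSsample\tGO2\tGO3"
rows = parse_sample_file(content)
ic = information_content(rows.values())
print(sorted(ic.items()))
```

In this toy example GO2 appears in both sets and therefore gets an IC of 0, while GO1 and GO3 each appear in one of the two sets and get an IC of 1. In practice the frequencies would be estimated over many samples or a reference annotation corpus rather than over two rows.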

