## cGEM modeling pipeline

Inputs
- MAGs and metadata (taxonomy, relative abundance, etc.)
- Medium composition (tsv file)

Steps
1. Reconstruct individual GEMs with CarveMe -> save SBML xmls
2. Prepare the MICOM medium from the media database -> save tsv
3. Generate MICOM's taxonomy table for the reconstructed models -> save tsv
4. Build the community model with MICOM -> save pickle
5. Run MICOM's grow workflow and get exchanges -> save tsv
6. Run the visualization module to produce three plots (network, heatmap, Sankey) -> save pngs

__NOTES__:

* Currently using the SCIP solver
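Every step below is a `docker run` with bind-mounted host directories and hard-coded absolute paths. The following is a minimal sketch of a hypothetical helper (`docker_run_args` and `run_step` are not part of microcom) that assembles the same kind of command from Python, so paths can be set in one place. The image name and subcommand flags are copied from the medium-preparation cell; the relative host paths are assumptions.

```python
import os
import shlex
import subprocess


def docker_run_args(image, mounts, args, remove=True):
    """Assemble an argv list for `docker run` (hypothetical helper, not microcom API).

    mounts maps host paths to container paths; args go to the image entrypoint.
    """
    argv = ["docker", "run"]
    if remove:
        argv.append("--rm")  # delete the container once it exits
    for host, container in mounts.items():
        # Docker treats a relative -v source as a named volume, so make host paths absolute.
        argv += ["-v", f"{os.path.abspath(host)}:{container}"]
    argv.append(image)
    argv += [str(a) for a in args]
    return argv


def run_step(image, mounts, args, dry_run=True):
    """Print the command (dry run) or execute it, raising if docker exits non-zero."""
    argv = docker_run_args(image, mounts, args)
    if dry_run:
        print(shlex.join(argv))
        return None
    return subprocess.run(argv, check=True)


# Example: the medium-preparation step, with host paths relative to the repository root (assumed).
medium_cmd = docker_run_args(
    "ghcr.io/new-atlantis-labs/micom:latest",
    {"data": "/app/data", "tests/test_docker_micom": "/app/results"},
    ["get_medium_from_media_db",
     "--media-db", "/app/data/media/media_db.tsv",
     "--medium-id", "MARINE",
     "--compartment", "m",
     "--max-uptake", 10.0,
     "--outfile", "/app/results/marine_media.tsv"],
)
```

Keeping `dry_run=True` as the default makes it easy to inspect a command before launching a long-running container.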

## Input files

1. Directory of predicted protein sequences, one faa file per MAG, named by MAG ID
2. Relative abundance table as tsv file, with MAG ID, abundance, and taxonomy
3. Medium composition as tsv file, with exchange reaction ID and maximum uptake rate
4. Universal model file as SBML xml
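A quick consistency check between inputs 1 and 2 catches MAGs that would silently lack a model later. This is a minimal sketch with a hypothetical function name; it assumes the abundance table is indexed by MAG ID with `taxonomy` and `abundance` columns (as in `tests/data/abundances.tsv`) and that protein files are named `<MAG ID>.faa`. The CarveMe log below processes `.fasta` files, so the suffix is a parameter.

```python
from pathlib import Path

import pandas as pd

REQUIRED_COLUMNS = {"taxonomy", "abundance"}


def check_inputs(abundances, protein_files, suffix=".faa"):
    """Return MAG IDs without a protein file; raise if required columns are missing."""
    missing_cols = REQUIRED_COLUMNS - set(abundances.columns)
    if missing_cols:
        raise ValueError(f"abundance table lacks columns: {sorted(missing_cols)}")
    # Strip the suffix from each matching file name to recover the MAG ID.
    available = {Path(f).name[: -len(suffix)] for f in protein_files if str(f).endswith(suffix)}
    return sorted(set(abundances.index) - available)


example = pd.DataFrame(
    {"taxonomy": ["Alteromonas", "Sulfitobacter"], "abundance": [30, 40]},
    index=pd.Index(["TARA_ARC_108_MAG_00080", "TARA_ARC_108_MAG_00083"], name="id"),
)
missing = check_inputs(example, ["proteins/TARA_ARC_108_MAG_00080.faa"])
print(missing)

# With real data (the protein directory name is an assumption):
# abundances = pd.read_csv("../tests/data/abundances.tsv", sep="\t", index_col=0)
# check_inputs(abundances, Path("../tests/data/proteins").glob("*.faa"))
```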

```python
import pandas as pd

abundances = pd.read_csv("../tests/data/abundances.tsv", sep="\t", index_col=0)
abundances.head()
```

| id | taxonomy | abundance |
|---|---|---|
| TARA_ARC_108_MAG_00080 | Alteromonas | 30 |
| TARA_ARC_108_MAG_00083 | Sulfitobacter | 40 |
| TARA_ARC_108_MAG_00201 | Polaribacter | 10 |
| TARA_ARC_108_MAG_00174 | Marinobacter | 20 |


## Reconstruction of individual GEMs

```bash
docker run \
  -v /home/robaina/Documents/NewAtlantis/microcom:/app/microcom \
  -v /home/robaina/Documents/NewAtlantis/microcom/tests/data/genome_table.tsv:/app/config.tsv \
  -v /home/robaina/Documents/NewAtlantis/microcom/tests/test_docker_carveme:/app/results \
  ghcr.io/new-atlantis-labs/carveme:latest \
  --config /app/config.tsv \
  --outdir /app/results \
  --processes 10
```

Output (abridged). Messages from the 10 parallel worker processes are interleaved. On first use, CarveMe builds its internal diamond database from `bigg_proteins.faa` (26,727 sequences), announced with `Running diamond for the first time, please wait while we build the internal database...`. The log also contains `Failed to run diamond.` (twice), `Error: Error detecting input file format. Input file seems to be empty.`, four bare `self.problem = Model()` lines and `Error while terminating subprocess (pid=82055)`. Each MAG is then processed against the curated prokaryote universe and the `MARINE` medium; the notebook output is cut off during the last MAG:

```text
Processing TARA_ARC_108_MAG_00083.fasta
  Universe: /app/microcom/data/universes/universal_prokaryote_curated.xml
  Media: /app/microcom/data/media/media_db.tsv
  Medium ID: MARINE
Completed processing TARA_ARC_108_MAG_00083.fasta

Processing TARA_ARC_108_MAG_00174.fasta
  Universe: /app/microcom/data/universes/universal_prokaryote_curated.xml
  Media: /app/microcom/data/media/media_db.tsv
  Medium ID: MARINE
Completed processing TARA_ARC_108_MAG_00174.fasta

Processing TARA_ARC_108_MAG_00080.fasta
  Universe: /app/microcom/data/universes/universal_prokaryote_curated.xml
  Media: /app/microcom/data/media/media_db.tsv
  Medium ID: MARINE
Completed processing TARA_ARC_108_MAG_00080.fasta

Processing TARA_ARC_108_MAG_00201.fasta
  Universe: /app/microcom/data/universes/universal_prokaryote_curated.xml
  Media: /app
```
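Because the CarveMe log is interleaved and reports diamond errors, it is worth confirming that one non-empty SBML file was written per MAG before building the taxa table. A minimal sketch with a hypothetical function name, assuming models are saved as `<MAG ID>.xml` in `tests/test_docker_carveme/gems` (the directory later mounted as `/app/gems_scip`). For a deeper check, each file could also be loaded with `cobra.io.read_sbml_model`.

```python
from pathlib import Path


def missing_models(mag_ids, gems_dir):
    """Return MAG IDs whose <MAG ID>.xml is absent or empty in gems_dir."""
    gems = Path(gems_dir)
    missing = []
    for mag in mag_ids:
        xml = gems / f"{mag}.xml"
        if not xml.is_file() or xml.stat().st_size == 0:
            missing.append(mag)
    return missing


mags = ["TARA_ARC_108_MAG_00080", "TARA_ARC_108_MAG_00083",
        "TARA_ARC_108_MAG_00174", "TARA_ARC_108_MAG_00201"]
# Relative path as used by the Python cells of this notebook.
print(missing_models(mags, "../tests/test_docker_carveme/gems"))
```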

## Prepare medium file from media database for MICOM

```bash
DATA_DIR="/home/robaina/Documents/NewAtlantis/microcom/data"
RESULTS_DIR="/home/robaina/Documents/NewAtlantis/microcom/tests/test_docker_micom"
mkdir -p "${RESULTS_DIR}"

docker run --rm \
  -v "${DATA_DIR}:/app/data" \
  -v "${RESULTS_DIR}:/app/results" \
  ghcr.io/new-atlantis-labs/micom:latest \
  get_medium_from_media_db \
  --media-db /app/data/media/media_db.tsv \
  --medium-id "MARINE" \
  --compartment "m" \
  --max-uptake 10.0 \
  --outfile /app/results/marine_media.tsv
```
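The `get_medium_from_media_db` subcommand is not documented in this notebook. As an illustration of what such a conversion might involve, the sketch below assumes a CarveMe-style `media_db.tsv` (columns `medium` and `compound`, with BiGG metabolite IDs lacking a compartment suffix) and produces a two-column `reaction`/`flux` table, the layout MICOM's `grow` workflow accepts for a medium. Exchange IDs are formed as `EX_<compound>_<compartment>` using the `--compartment "m"` and `--max-uptake 10.0` values from the cell above. The function name is hypothetical and this is not microcom's implementation.

```python
import pandas as pd


def medium_from_media_db(media_db, medium_id, compartment="m", max_uptake=10.0):
    """Build a MICOM-style medium (columns 'reaction', 'flux') from one media_db entry."""
    rows = media_db[media_db["medium"] == medium_id]
    if rows.empty:
        raise KeyError(f"medium {medium_id!r} not found in media database")
    medium = pd.DataFrame({
        # e.g. glc__D -> EX_glc__D_m in the shared medium compartment
        "reaction": "EX_" + rows["compound"].astype(str) + "_" + compartment,
        "flux": float(max_uptake),  # same upper uptake bound for every compound
    })
    return medium.drop_duplicates("reaction").reset_index(drop=True)


# Usage with real data (tab-separated, layout assumed):
# media_db = pd.read_csv("../data/media/media_db.tsv", sep="\t")
# medium_from_media_db(media_db, "MARINE").to_csv("marine_media.tsv", sep="\t", index=False)
```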

## Making MICOM's taxa table

```bash
DATA_DIR="/home/robaina/Documents/NewAtlantis/microcom/tests/data"
GEMS_DIR="/home/robaina/Documents/NewAtlantis/microcom/tests/test_docker_carveme/gems"
RESULTS_DIR="/home/robaina/Documents/NewAtlantis/microcom/tests/test_docker_micom"
mkdir -p "${RESULTS_DIR}"

docker run --rm \
  -v "${DATA_DIR}:/app/data" \
  -v "${GEMS_DIR}:/app/gems_scip" \
  -v "${RESULTS_DIR}:/app/results" \
  ghcr.io/new-atlantis-labs/micom:latest \
  build_taxa_table \
    --sample_id "TARA_ARC_108" \
    --abundances /app/data/abundances.tsv \
    --gems_dir /app/gems_scip \
    --out_taxatable /app/results/micom_database.tsv
```

```python
import pandas as pd

taxa = pd.read_csv("../tests/test_docker_micom/micom_database.tsv", sep="\t", index_col=0)
taxa.head()
```

| sample_id | id | abundance | taxonomy | file |
|---|---|---|---|---|
| TARA_ARC_108 | TARA_ARC_108_MAG_00174 | 20 | Marinobacter | /app/gems_scip/TARA_ARC_108_MAG_00174.xml |
| TARA_ARC_108 | TARA_ARC_108_MAG_00083 | 40 | Sulfitobacter | /app/gems_scip/TARA_ARC_108_MAG_00083.xml |
| TARA_ARC_108 | TARA_ARC_108_MAG_00080 | 30 | Alteromonas | /app/gems_scip/TARA_ARC_108_MAG_00080.xml |
| TARA_ARC_108 | TARA_ARC_108_MAG_00201 | 10 | Polaribacter | /app/gems_scip/TARA_ARC_108_MAG_00201.xml |
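The taxa table has one row per MAG with the sample ID, abundance, taxonomy and the container path of the MAG's SBML model. A minimal pandas sketch that reproduces this layout from the abundance table, assuming models are named `<MAG ID>.xml`; the helper name `make_taxa_table` is hypothetical and this is not the `build_taxa_table` implementation (row order may differ from the output above).

```python
from pathlib import PurePosixPath

import pandas as pd


def make_taxa_table(abundances, sample_id, gems_dir):
    """Return a MICOM-style taxa table: sample_id, id, abundance, taxonomy, file."""
    taxa = abundances.rename_axis("id").reset_index()
    # Container paths are POSIX paths regardless of the host OS.
    taxa["file"] = [str(PurePosixPath(gems_dir) / f"{mag}.xml") for mag in taxa["id"]]
    taxa.insert(0, "sample_id", sample_id)
    return taxa[["sample_id", "id", "abundance", "taxonomy", "file"]]


# Usage with the notebook's data:
# abundances = pd.read_csv("../tests/data/abundances.tsv", sep="\t", index_col=0)
# make_taxa_table(abundances, "TARA_ARC_108", "/app/gems_scip")
```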


## Build community model

```bash
GEMS_DIR="/home/robaina/Documents/NewAtlantis/microcom/tests/test_docker_carveme/gems"
INPUT_DIR="/home/robaina/Documents/NewAtlantis/microcom/tests/test_docker_micom"
# Write the pickle and manifest under tests/, where the later cells read them.
RESULTS_DIR="/home/robaina/Documents/NewAtlantis/microcom/tests/test_docker_micom"
mkdir -p "${RESULTS_DIR}"

docker run --rm \
  -v "${INPUT_DIR}:/app/data" \
  -v "${RESULTS_DIR}:/app/results" \
  -v "${GEMS_DIR}:/app/gems_scip" \
  ghcr.io/new-atlantis-labs/micom:latest \
  build_cgem \
  --taxa_table /app/data/micom_database.tsv \
  --outdir /app/results \
  --abundance_cutoff 0.01 \
  --gems_dir /app/gems_scip \
  --threads 10 \
  --solver "hybrid"
```

The build first prints a long series of messages of the form `Ignoring reaction 'EX_h2_e' since it already exists.` (also for `EX_h_e`, `EX_photon_e`, `EX_ade_e` and other exchange reactions; the captured output is truncated), then reports progress and the manifest:

```text
Running ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:02:31
Model building completed. Manifest:
      sample_id                 file
0  TARA_ARC_108  TARA_ARC_108.pickle
```
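The build is run with `--abundance_cutoff 0.01`. Assuming it behaves like the `cutoff` of MICOM's build workflow (abundances converted to relative abundances within each sample, taxa below the threshold left out of the model), its effect can be previewed with pandas. The function name is hypothetical, and whether a taxon exactly at the threshold is kept is an assumption. With the abundances used here (10 to 40 out of 100), all four MAGs clear 0.01.

```python
import pandas as pd


def apply_abundance_cutoff(taxa, cutoff=0.01):
    """Add per-sample relative abundances and drop taxa below `cutoff`."""
    totals = taxa.groupby("sample_id")["abundance"].transform("sum")
    out = taxa.assign(relative_abundance=taxa["abundance"] / totals)
    return out[out["relative_abundance"] >= cutoff].reset_index(drop=True)


# Values from the taxa table of this notebook.
taxa = pd.DataFrame({
    "sample_id": ["TARA_ARC_108"] * 4,
    "id": ["TARA_ARC_108_MAG_00174", "TARA_ARC_108_MAG_00083",
           "TARA_ARC_108_MAG_00080", "TARA_ARC_108_MAG_00201"],
    "abundance": [20, 40, 30, 10],
})
kept = apply_abundance_cutoff(taxa, cutoff=0.01)
print(kept[["id", "relative_abundance"]])
```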


```python
from micom import load_pickle

cgem = load_pickle("../tests/test_docker_micom/TARA_ARC_108.pickle")
cgem
```

| Attribute | Value |
|---|---|
| Name | TARA_ARC_108 |
| Memory address | 7fea1dfe3340 |
| Number of metabolites | 15395 |
| Number of reactions | 25648 |
| Number of genes | 3100 |
| Number of groups | 0 |
| Objective expression | 1.0*community_objective |
| Compartments | e__TARA_ARC_108_MAG_00174, p__TARA_ARC_108_MAG_00174, c__TARA_ARC_108_MAG_00174, m, e__TARA_ARC_108_MAG_00083, p__TARA_ARC_108_MAG_00083, c__TARA_ARC_108_MAG_00083, e__TARA_ARC_108_MAG_00080, p__TARA_ARC_108_MAG_00080, c__TARA_ARC_108_MAG_00080, e__TARA_ARC_108_MAG_00201, p__TARA_ARC_108_MAG_00201, c__TARA_ARC_108_MAG_00201 |
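In this model summary, each MAG contributes three compartments named `<compartment>__<MAG ID>` (`e`, `p`, `c`), while `m` is the shared medium compartment. A small sketch (hypothetical function name) that recovers the MAG IDs from those names; it relies only on the naming pattern visible in this output. Since `cgem.compartments` is a cobra-style mapping from compartment ID to name, iterating over it yields the same IDs.

```python
def taxa_from_compartments(compartments):
    """Return MAG IDs, in first-seen order, from '<comp>__<MAG ID>' compartment names."""
    taxa = []
    for comp in compartments:
        _prefix, sep, taxon = comp.partition("__")
        if sep and taxon not in taxa:  # shared compartments such as 'm' have no '__'
            taxa.append(taxon)
    return taxa


compartments = (
    "e__TARA_ARC_108_MAG_00174, p__TARA_ARC_108_MAG_00174, c__TARA_ARC_108_MAG_00174, m, "
    "e__TARA_ARC_108_MAG_00083, p__TARA_ARC_108_MAG_00083, c__TARA_ARC_108_MAG_00083, "
    "e__TARA_ARC_108_MAG_00080, p__TARA_ARC_108_MAG_00080, c__TARA_ARC_108_MAG_00080, "
    "e__TARA_ARC_108_MAG_00201, p__TARA_ARC_108_MAG_00201, c__TARA_ARC_108_MAG_00201"
).split(", ")
print(taxa_from_compartments(compartments))
# With the loaded model: taxa_from_compartments(cgem.compartments)
```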


## Computing trophic exchanges

```bash
RESULTS_DIR="/home/robaina/Documents/NewAtlantis/microcom/tests/test_docker_micom"

docker run --rm \
  -v "${RESULTS_DIR}:/app/results" \
  ghcr.io/new-atlantis-labs/micom:latest \
  get_exchanges \
  --manifest /app/results/manifest.csv \
  --outdir /app/results \
  --media_file /app/results/marine_media.tsv \
  --growth_tradeoff 0.5 \
  --threads 12 \
  --out_exchanges /app/results/exchanges.tsv
```

```text
Running ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:01:03
Growth simulations completed.
```
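Trophic exchanges here are metabolites released by one MAG and taken up by another. Assuming `exchanges.tsv` keeps the long format of MICOM's growth results (columns including `taxon`, `metabolite`, `flux` and `direction` with values `import`/`export`, possibly with rows for the shared `medium`), donor-receiver pairs could be listed with a pandas merge. The function name and the example data are illustrative only.

```python
import pandas as pd


def cross_feeding(exchanges, min_flux=0.0):
    """Pair each taxon exporting a metabolite with every other taxon importing it."""
    ex = exchanges[(exchanges["taxon"] != "medium") & (exchanges["flux"].abs() > min_flux)]
    cols = ["taxon", "metabolite", "flux"]
    exports = ex[ex["direction"] == "export"][cols]
    imports = ex[ex["direction"] == "import"][cols]
    pairs = exports.merge(imports, on="metabolite", suffixes=("_donor", "_receiver"))
    pairs = pairs[pairs["taxon_donor"] != pairs["taxon_receiver"]]  # drop self-pairs
    pairs = pairs.rename(columns={"taxon_donor": "donor", "taxon_receiver": "receiver"})
    return pairs[["donor", "receiver", "metabolite", "flux_donor", "flux_receiver"]].reset_index(drop=True)


# Usage with the notebook's output (format assumed):
# exchanges = pd.read_csv("../tests/test_docker_micom/exchanges.tsv", sep="\t")
# cross_feeding(exchanges, min_flux=1e-6)
```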


## Generate figures

```bash
docker run \
    -v /home/robaina/Documents/NewAtlantis/microcom/tests/test_docker_micom:/data \
    -v /home/robaina/Documents/NewAtlantis/microcom/tests/test_docker_viz:/app/results \
    ghcr.io/new-atlantis-labs/cgem-viz:latest \
    --exchanges-file /data/exchanges.tsv \
    --flux-cutoff "top10" \
    --visualization-type network \
    --output-dir /app/results
```

![Figure 1. Trophic interaction network](/home/robaina/Documents/NewAtlantis/microcom/tests/test_docker_viz/trophic_interactions.png)
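`--flux-cutoff "top10"` suggests that only the ten largest exchanges are drawn in the network. The sketch below is a hypothetical reading of that option: `topN` keeps the N rows with the largest absolute flux, and any other value is treated as a minimum absolute flux. cgem-viz may implement it differently, and the `flux` column name is an assumption.

```python
import re

import pandas as pd


def filter_by_flux_cutoff(exchanges, cutoff):
    """Apply a 'topN' or numeric absolute-flux cutoff (hypothetical semantics)."""
    magnitude = exchanges["flux"].abs()
    match = re.fullmatch(r"top(\d+)", str(cutoff))
    if match:
        keep = magnitude.nlargest(int(match.group(1))).index  # largest |flux| first
        return exchanges.loc[keep].reset_index(drop=True)
    return exchanges[magnitude >= float(cutoff)].reset_index(drop=True)
```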

```bash
docker run \
    -v /home/robaina/Documents/NewAtlantis/microcom/tests/test_docker_micom:/data \
    -v /home/robaina/Documents/NewAtlantis/microcom/tests/test_docker_viz:/app/results \
    ghcr.io/new-atlantis-labs/cgem-viz:latest \
    --exchanges-file /data/exchanges.tsv \
    --visualization-type heatmap \
    --output-dir /app/results \
    --normalize-heatmap \
    --cluster-heatmap
```

![Figure 2. Exchange heatmap](/home/robaina/Documents/NewAtlantis/microcom/tests/test_docker_viz/exchange_heatmap.png)
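A heatmap of this kind is a taxon-by-metabolite matrix of fluxes. The sketch below builds such a matrix with `pivot_table` and, as one possible normalization (an assumption, not necessarily what `--normalize-heatmap` does), scales each metabolite column by its largest absolute flux so values fall in [-1, 1]. Clustering, as requested by `--cluster-heatmap`, would additionally reorder rows and columns (for example with `scipy.cluster.hierarchy.linkage`) and is not shown. The function name and column names are assumptions.

```python
import pandas as pd


def exchange_matrix(exchanges, normalize=True):
    """Taxon x metabolite matrix of summed fluxes, optionally scaled per metabolite."""
    mat = exchanges.pivot_table(index="taxon", columns="metabolite", values="flux",
                                aggfunc="sum", fill_value=0.0)
    if normalize:
        scale = mat.abs().max(axis=0).replace(0, 1.0)  # avoid dividing all-zero columns by 0
        mat = mat / scale  # divides each column by its own scale
    return mat
```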

```bash
docker run \
    -v /home/robaina/Documents/NewAtlantis/microcom/tests/test_docker_micom:/data \
    -v /home/robaina/Documents/NewAtlantis/microcom/tests/test_docker_viz:/app/results \
    ghcr.io/new-atlantis-labs/cgem-viz:latest \
    --exchanges-file /data/exchanges.tsv \
    --visualization-type sankey \
    --output-dir /app/results \
    --sankey-flux-cutoff 0.1
```

![Figure 3. Metabolic Sankey diagram](/home/robaina/Documents/NewAtlantis/microcom/tests/test_docker_viz/metabolic_sankey.png)
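A Sankey diagram of exchanges can be described as weighted links: exporting taxon to metabolite, and metabolite to importing taxon. The sketch below derives such a link table and drops links below an absolute-flux threshold, mirroring `--sankey-flux-cutoff 0.1` under the assumption that the cutoff applies to absolute flux (inclusive). The function name and the `taxon`/`metabolite`/`flux`/`direction` columns are assumptions; the resulting `source`, `target` and `value` columns are the kind of input typical Sankey plotting libraries expect.

```python
import pandas as pd


def sankey_links(exchanges, cutoff=0.1):
    """Return source/target/value links for exchanges with |flux| >= cutoff."""
    ex = exchanges[exchanges["flux"].abs() >= cutoff]
    links = []
    for row in ex.itertuples(index=False):
        if row.direction == "export":
            links.append((row.taxon, row.metabolite, abs(row.flux)))  # taxon releases metabolite
        else:
            links.append((row.metabolite, row.taxon, abs(row.flux)))  # taxon takes up metabolite
    return pd.DataFrame(links, columns=["source", "target", "value"])
```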