## cGEM modeling pipeline with MICOM

Inputs
- MAGs and metadata (taxonomy, relative abundance, etc.)
- Medium composition (tsv file)
- MICOM parameters (yaml file)

Steps
1. Reconstruct individual GEMs with CarveMe -> save xmls
2. Generate MICOM's taxonomy table for the reconstructed models -> save tsv
3. Build the community model with MICOM -> save pickle
4. Run MICOM's grow workflow and get exchanges -> save tsv
5. Run MICOM's elasticities workflow and get elasticities -> save tsv

## Input files

1. Directory with the predicted genes of each MAG translated to proteins, one FASTA file per MAG named after its MAG ID (the test data use the `.fasta` extension)
2. Relative abundances as a tsv file, with MAG ID, taxonomy, and abundance columns
3. Medium composition as a tsv file, with exchange reaction ID and maximum uptake rate
4. Universal model file as xml
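Before running the pipeline it can help to confirm that every MAG listed in the abundance table has a matching genome file. The following is a minimal sketch of such a check, not part of the repository: the function name `missing_genomes` is hypothetical, and it assumes the abundance table has an `id` column, as in the test data shown in this notebook.

```python
import csv
import io
from pathlib import Path


def missing_genomes(abundance_tsv: str, genome_files: list[str]) -> list[str]:
    """Return MAG IDs listed in the abundance table that have no genome file."""
    reader = csv.DictReader(io.StringIO(abundance_tsv), delimiter="\t")
    table_ids = [row["id"] for row in reader]
    # A genome file is matched to a MAG by its file name without the extension.
    file_ids = {Path(name).stem for name in genome_files}
    return [mag_id for mag_id in table_ids if mag_id not in file_ids]


# Illustrative input shaped like the test data (only two MAGs, one file).
abundance_tsv = (
    "id\ttaxonomy\tabundance\n"
    "TARA_ARC_108_MAG_00080\tAlteromonas\t30\n"
    "TARA_ARC_108_MAG_00083\tSulfitobacter\t40\n"
)
genome_files = ["TARA_ARC_108_MAG_00080.fasta"]

print(missing_genomes(abundance_tsv, genome_files))
```

In practice the file list would come from something like `Path("tests/data/genomes").glob("*.fasta")` and the table text from reading the abundance file.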

In [1]:
import pandas as pd

abundances = pd.read_csv("tests/data/abundances.tsv", sep="\t", index_col=0)
abundances.head()

| id | taxonomy | abundance |
|---|---|---|
| TARA_ARC_108_MAG_00080 | Alteromonas | 30 |
| TARA_ARC_108_MAG_00083 | Sulfitobacter | 40 |
| TARA_ARC_108_MAG_00201 | Polaribacter | 10 |
| TARA_ARC_108_MAG_00174 | Marinobacter | 20 |


In [2]:
from src.helper_functions import get_medium_from_media_db

medium = get_medium_from_media_db(
    "data/media/media_db.tsv",
    "MARINE",
    compartment="m",
    max_uptake=10.0,
    outfile="tests/results/marine_media.tsv",
)
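The helper above is specific to this repository and its implementation is not shown here. As a rough illustration of the idea only, the sketch below turns the rows of one medium into a mapping of exchange reaction IDs to a maximum uptake rate. Everything in it is an assumption: the function name `medium_from_media_db`, the `medium` and `compound` column names (modeled on CarveMe-style media databases), the `EX_<compound>_<compartment>` ID pattern, and the example rows, which are made up.

```python
import csv
import io


def medium_from_media_db(media_db_tsv, medium_id, compartment="m", max_uptake=10.0):
    """Map each compound of the chosen medium to an exchange ID and uptake bound."""
    reader = csv.DictReader(io.StringIO(media_db_tsv), delimiter="\t")
    return {
        # Assumed ID pattern: EX_<compound>_<compartment>, e.g. EX_o2_m.
        f"EX_{row['compound']}_{compartment}": max_uptake
        for row in reader
        if row["medium"] == medium_id
    }


# Made-up rows for illustration; not the contents of data/media/media_db.tsv.
media_db_tsv = (
    "medium\tdescription\tcompound\tname\n"
    "MARINE\tMarine medium\tglc__D\tD-Glucose\n"
    "MARINE\tMarine medium\to2\tOxygen\n"
    "M9\tM9 minimal medium\tnh4\tAmmonium\n"
)

print(medium_from_media_db(media_db_tsv, "MARINE"))
```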

## Reconstruction of individual GEMs

In [3]:
%%bash

GENOME_DIR="tests/data/genomes/"

for genome_file in "${GENOME_DIR}"*.fasta; do
    base_name=$(basename "$genome_file" .fasta)
    echo "Running $base_name"
    carve \
        --universe-file "data/universes/prokaryote_carveme_curated.xml" \
        --solver gurobi \
        -o "tests/gems/${base_name}.xml" \
        --init MARINE \
        --gapfill MARINE \
        --mediadb "data/media/media_db.tsv" \
        --fbc2 \
        "$genome_file" >/dev/null 2>&1
done

Running TARA_ARC_108_MAG_00080
Running TARA_ARC_108_MAG_00083
Running TARA_ARC_108_MAG_00174
Running TARA_ARC_108_MAG_00201
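The same loop can be driven from Python when per-genome error handling is wanted. The sketch below only assembles the `carve` command for one genome, using the flags from the bash cell above; the function name `carve_command` is hypothetical, and the command is printed rather than executed.

```python
import shlex
from pathlib import Path


def carve_command(genome_file, out_dir="tests/gems", medium="MARINE"):
    """Build the carve argument list for one genome, mirroring the bash cell."""
    base_name = Path(genome_file).stem
    return [
        "carve",
        "--universe-file", "data/universes/prokaryote_carveme_curated.xml",
        "--solver", "gurobi",
        "-o", f"{out_dir}/{base_name}.xml",
        "--init", medium,
        "--gapfill", medium,
        "--mediadb", "data/media/media_db.tsv",
        "--fbc2",
        str(genome_file),
    ]


cmd = carve_command("tests/data/genomes/TARA_ARC_108_MAG_00080.fasta")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would run the reconstruction and raise on failure, which the bash loop hides by redirecting all output to `/dev/null`.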


## Making MICOM's taxa table

In [9]:
%%bash

python src/build_taxa_table.py \
    "TARA_ARC_108" \
    tests/data/abundances.tsv \
    tests/gems \
    --output tests/results/micom_database.tsv

In [10]:
import pandas as pd

taxa = pd.read_csv("tests/results/micom_database.tsv", sep="\t", index_col=0)
taxa.head()

| sample_id | id | abundance | taxonomy | file |
|---|---|---|---|---|
| TARA_ARC_108 | TARA_ARC_108_MAG_00174 | 20 | Marinobacter | tests/gems/TARA_ARC_108_MAG_00174.xml |
| TARA_ARC_108 | TARA_ARC_108_MAG_00083 | 40 | Sulfitobacter | tests/gems/TARA_ARC_108_MAG_00083.xml |
| TARA_ARC_108 | TARA_ARC_108_MAG_00080 | 30 | Alteromonas | tests/gems/TARA_ARC_108_MAG_00080.xml |
| TARA_ARC_108 | TARA_ARC_108_MAG_00201 | 10 | Polaribacter | tests/gems/TARA_ARC_108_MAG_00201.xml |
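The table above joins the abundance table with the reconstructed model files under a single sample ID. A minimal sketch of that join is shown below; it is not the code of `src/build_taxa_table.py`, the function name `build_taxa_table` is hypothetical, and skipping MAGs without a model file is an assumption about the desired behavior.

```python
import csv
import io
from pathlib import Path

COLUMNS = ["sample_id", "id", "abundance", "taxonomy", "file"]


def build_taxa_table(sample_id, abundance_tsv, gem_files):
    """Return the taxa table as TSV text, one row per MAG that has a GEM file."""
    gems = {Path(path).stem: path for path in gem_files}
    reader = csv.DictReader(io.StringIO(abundance_tsv), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        gem = gems.get(row["id"])
        if gem is None:
            continue  # assumption: MAGs without a reconstructed model are skipped
        writer.writerow({
            "sample_id": sample_id,
            "id": row["id"],
            "abundance": row["abundance"],
            "taxonomy": row["taxonomy"],
            "file": gem,
        })
    return out.getvalue()


abundance_tsv = (
    "id\ttaxonomy\tabundance\n"
    "TARA_ARC_108_MAG_00080\tAlteromonas\t30\n"
    "TARA_ARC_108_MAG_00083\tSulfitobacter\t40\n"
)
gem_files = ["tests/gems/TARA_ARC_108_MAG_00080.xml"]

table = build_taxa_table("TARA_ARC_108", abundance_tsv, gem_files)
print(table)
```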


## Build community model

In [11]:
%%bash

python src/build_cgem.py \
  tests/results/micom_database.tsv \
  --out_folder tests/results \
  --cutoff 0.01 \
  --threads 10 \
  --solver gurobi

Set parameter Username
Academic license - for non-commercial use only - expires 2024-11-05
Running 100% 0:01:10
Model building completed. Manifest:
      sample_id                 file
0  TARA_ARC_108  TARA_ARC_108.pickle
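The `--cutoff 0.01` option controls which taxa enter the community model. The sketch below illustrates the idea under the assumption that the cutoff is compared against abundances normalized within the sample; the helper names are hypothetical, and keeping taxa that sit exactly at the cutoff is an arbitrary choice made here, not confirmed MICOM behavior.

```python
def relative_abundances(abundances):
    """Normalize raw abundances so that they sum to 1 within the sample."""
    total = sum(abundances.values())
    return {taxon: value / total for taxon, value in abundances.items()}


def apply_cutoff(relative, cutoff=0.01):
    """Keep taxa whose relative abundance is at or above the cutoff."""
    return {taxon: value for taxon, value in relative.items() if value >= cutoff}


# Abundances from tests/data/abundances.tsv as shown earlier in this notebook.
abundances = {
    "TARA_ARC_108_MAG_00080": 30,
    "TARA_ARC_108_MAG_00083": 40,
    "TARA_ARC_108_MAG_00201": 10,
    "TARA_ARC_108_MAG_00174": 20,
}

relative = relative_abundances(abundances)
kept_default = apply_cutoff(relative, cutoff=0.01)  # all four MAGs pass
kept_strict = apply_cutoff(relative, cutoff=0.15)   # drops the MAG at 0.10
print(sorted(kept_default))
print(sorted(kept_strict))
```

With the test data all four MAGs are well above 0.01, which is consistent with the four sets of taxon compartments in the loaded community model.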


In [12]:
from micom import load_pickle

cgem = load_pickle("tests/results/TARA_ARC_108.pickle")
cgem

Read LP format model from file /tmp/tmpu94_reb8.lp
Reading time = 0.06 seconds
: 7675 rows, 22657 columns, 97547 nonzeros


| Attribute | Value |
|---|---|
| Name | TARA_ARC_108 |
| Memory address | 7fb731cfc1d0 |
| Number of metabolites | 7670 |
| Number of reactions | 11328 |
| Number of genes | 3097 |
| Number of groups | 0 |
| Objective expression | 1.0*community_objective |
| Compartments | e__TARA_ARC_108_MAG_00174, p__TARA_ARC_108_MAG_00174, c__TARA_ARC_108_MAG_00174, m, e__TARA_ARC_108_MAG_00083, p__TARA_ARC_108_MAG_00083, c__TARA_ARC_108_MAG_00083, p__TARA_ARC_108_MAG_00080, c__TARA_ARC_108_MAG_00080, e__TARA_ARC_108_MAG_00080, p__TARA_ARC_108_MAG_00201, c__TARA_ARC_108_MAG_00201, e__TARA_ARC_108_MAG_00201 |
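The compartment names in the summary follow a `<compartment>__<taxon>` pattern, with `m` standing apart as the medium compartment shared by all taxa. A small sketch for grouping them per taxon is given below; the function name `group_compartments` is hypothetical, and the reading of `e`, `p`, and `c` as extracellular, periplasm, and cytosol follows the usual BiGG convention rather than anything stated in this notebook.

```python
from collections import defaultdict


def group_compartments(compartments):
    """Split names like 'e__TAXON' into per-taxon lists; others count as shared."""
    by_taxon = defaultdict(list)
    shared = []
    for name in compartments:
        prefix, sep, taxon = name.partition("__")
        if sep:
            by_taxon[taxon].append(prefix)
        else:
            shared.append(name)  # e.g. the community medium compartment "m"
    return dict(by_taxon), shared


compartments = [
    "e__TARA_ARC_108_MAG_00174",
    "p__TARA_ARC_108_MAG_00174",
    "c__TARA_ARC_108_MAG_00174",
    "m",
]

by_taxon, shared = group_compartments(compartments)
print(by_taxon)
print(shared)
```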


## Computing trophic exchanges

In [14]:
%%bash

python src/get_exchanges.py \
  tests/results/manifest.csv \
  tests/results \
  tests/results/marine_media.tsv \
  --tradeoff 0.5 \
  --threads 12 \
  --output tests/results/exchanges.tsv

Set parameter Username
Academic license - for non-commercial use only - expires 2024-11-05
Read LP format model from file /tmp/tmpzp2x4rik.lp
Reading time = 0.06 seconds
: 7675 rows, 22657 columns, 97547 nonzeros
Read LP format model from file /tmp/tmp0zm1l9z7.lp
Reading time = 0.06 seconds
: 7675 rows, 22657 columns, 97547 nonzeros
Running 100% 0:00:26
Growth simulations completed.
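Once the exchanges table is written, candidate cross-feeding interactions can be read off it by pairing taxa that export a metabolite with taxa that import it. The sketch below assumes the table has `taxon`, `metabolite`, and `direction` columns with `import`/`export` values and that rows for the medium itself are labeled `medium`; these column names are assumptions about the output of `src/get_exchanges.py`, the function name is hypothetical, and the example rows are made up.

```python
import csv
import io
from collections import defaultdict


def cross_feeding(exchanges_tsv):
    """List (metabolite, donor, receiver) triples found in an exchanges table."""
    exporters = defaultdict(set)
    importers = defaultdict(set)
    for row in csv.DictReader(io.StringIO(exchanges_tsv), delimiter="\t"):
        if row["taxon"] == "medium":
            continue  # skip exchanges between the community and its environment
        if row["direction"] == "export":
            exporters[row["metabolite"]].add(row["taxon"])
        elif row["direction"] == "import":
            importers[row["metabolite"]].add(row["taxon"])
    pairs = []
    for metabolite in sorted(exporters):
        for donor in sorted(exporters[metabolite]):
            for receiver in sorted(importers.get(metabolite, ())):
                if donor != receiver:
                    pairs.append((metabolite, donor, receiver))
    return pairs


# Made-up rows for illustration; not results from this notebook.
exchanges_tsv = (
    "taxon\tmetabolite\tdirection\tflux\n"
    "taxon_A\tac_m\texport\t2.0\n"
    "taxon_B\tac_m\timport\t1.5\n"
    "taxon_A\tglc__D_m\timport\t3.0\n"
    "medium\tglc__D_m\timport\t3.0\n"
)

print(cross_feeding(exchanges_tsv))
```

A shared metabolite only indicates a possible interaction; the flux values would still need to be inspected to judge whether the exchange is quantitatively relevant.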


## Computing elasticities

In [16]:
%%bash

python src/get_elasticities.py \
  tests/results/TARA_ARC_108.pickle \
  --fraction 0.5 \
  --output tests/results/elasticities.tsv

Set parameter Username
Academic license - for non-commercial use only - expires 2024-11-05
Read LP format model from file /tmp/tmp4nkt7sqn.lp
Reading time = 0.06 seconds
: 7675 rows, 22657 columns, 97547 nonzeros
Metabolites 100% 0:07:26
Taxa 100% 0:01:23
Elasticity calculations completed.
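An elasticity coefficient is commonly defined as a log-log sensitivity, that is, the relative change in a response (such as an exchange flux) per relative change in a parameter (such as a medium bound or a taxon abundance). The sketch below approximates that quantity by a finite difference in log space for a toy response function. It illustrates the concept only and is not MICOM's implementation; the function name, step size, and toy response are assumptions.

```python
import math


def log_elasticity(response, parameter, step=0.1):
    """Approximate d ln(response) / d ln(parameter) by a forward difference."""
    before = response(parameter)
    after = response(parameter * math.exp(step))  # multiplicative perturbation
    return (math.log(after) - math.log(before)) / step


def toy_flux(uptake_bound):
    """Toy response: a power law whose exact elasticity is the exponent, 1.5."""
    return 2.0 * uptake_bound ** 1.5


estimate = log_elasticity(toy_flux, parameter=10.0)
print(estimate)
```

An elasticity near 0 means the response is insensitive to the parameter, while a value of 1 means the two change proportionally.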


## Reconstruction workflow

In [5]:
%%bash

HOMEDIR="/home/robaina/Documents/NewAtlantis/microcom"

nextflow run pipelines/gem_reconstruction.nf \
  --genomes_dir ${HOMEDIR}/tests/data/genomes \
  --media_file ${HOMEDIR}/data/media/media_db.tsv \
  --medium_id "MARINE" \
  --universe ${HOMEDIR}/data/universes/prokaryote_carveme_curated.xml \
  --outdir ${HOMEDIR}/test_nf \
  -work-dir ${HOMEDIR}/test_nf/work \
  -log ${HOMEDIR}/test_nf/nextflow.log

N E X T F L O W  ~  version 23.04.1
Launching `pipelines/gem_reconstruction.nf` [astonishing_lamarck] DSL2 - revision: 6d120c797b
[-        ] process > ReconstructGEM -
/home/robaina/Documents/NewAtlantis/microcom/tests/data/genomes/TARA_ARC_108_MAG_00080.fasta
/home/robaina/Documents/NewAtlantis/microcom/tests/data/genomes/TARA_ARC_108_MAG_00083.fasta
/home/robaina/Documents/NewAtlantis/microcom/tests/data/genomes/TARA_ARC_108_MAG_00174.fasta
/home/robaina/Documents/NewAtlantis/microcom/tests/data/genomes/TARA_ARC_108_MAG_00201.fasta

executor >  local (4)
[88/d3260e] process > ReconstructGEM (2) [  0%] 0 of 4

## Community modeling workflow

In [8]:
%%bash

HOMEDIR="/home/robaina/Documents/NewAtlantis/microcom"

nextflow run pipelines/community_modeling.nf \
  --gems_dir ${HOMEDIR}/test_nf/gems \
  --abundances ${HOMEDIR}/tests/data/abundances.tsv \
  --sample_id "test_nf" \
  --media_file ${HOMEDIR}/tests/results/marine_media.tsv \
  --outdir ${HOMEDIR}/test_nf \
  --growth_tradeoff 0.5 \
  --abundance_cutoff 0.01 \
  --threads 12 \
  --solver "gurobi" \
  --exchanges true \
  --elasticities false \
  --scripts_dir ${HOMEDIR}/src \
  -work-dir ${HOMEDIR}/test_nf/work \
  --log ${HOMEDIR}/test_nf/nextflow.log

N E X T F L O W  ~  version 23.04.1
Launching `pipelines/community_modeling.nf` [angry_bernard] DSL2 - revision: 60d3e4f5c2

executor >  local (3)
[31/414805] process > BuildTaxaTable (1)    [100%] 1 of 1 ✔
[24/b9facc] process > BuildCommunityGEM (1) [100%] 1 of 1 ✔
[64/7fe570] process > GetExchanges (1)      [  0%] 0 of 1


### Alternatively, use a config file

```groovy
// nextflow.config
params {
    gems_dir = "/path/to/gems"
    abundances = "/path/to/abundances.tsv"
    sample_id = "YourSampleID"
    media_file = "/path/to/media_file.tsv"
    outdir = "/path/to/output/directory"
    out_taxatable = "taxa_table.tsv"
    out_exchanges = "exchanges.tsv"
    out_elasticities = "elasticities.tsv"
    growth_tradeoff = 0.5
    abundance_cutoff = 0.01
    threads = 10
    solver = "gurobi"
    exchanges = true
    elasticities = false
}
```

```bash
nextflow run your_script.nf -c nextflow.config
```