# Auxotrophies and Interactomes in Mixed Culture of Freshwater Bacteria

## Project Overview

This notebook describes reverse ecological analysis of a mixed culture from Lake Grosse Fuchskuhle. For additional details, please see:

Garcia, S. L., Buck, M., McMahon, K. D., Grossart, H.-P., Eiler, A., & Warnecke, F. (2015). Auxotrophy and intra-population complementary in the “interactome” of a cultivated freshwater model community. Molecular Ecology, 24(17), 4449–4459. http://doi.org/10.1111/mec.13319

## Background and Theory

For an overview of the theory underlying this pipeline, please see the notebook `masterNotebook` in the `master` branch of this repo. In the interest of brevity, this notebook focuses on carrying out the workflow, not the theory.

## Obtaining and Processing Metabolic Network Reconstructions from KBase

Reverse ecology analysis relies on metabolic network reconstructions for an organism of interest. KBase (http://kbase.us) is a powerful resource for obtaining genome-scale network reconstructions for microbial genomes. These reconstructions are distributed as SBML files, which must be processed prior to reverse ecology analysis. This section describes how to obtain reconstructions from KBase, and how to process them.

### Obtaining and Preparing SBML Files

Genomes (as fasta files) for your organisms of interest can be pushed from your computer to KBase. Once there, a KBase Narrative (iPython notebook) can be used to build reconstructions for your genomes. 

The script `loadGenomes` pushes these genomes to the KBase narrative created for this project.
* KBase Narrative: [Model Building for FNE-F8 Mixed Culture MAGs](https://narrative.kbase.us/narrative/ws.12305.obj.1)
* Workspace Browser: [joshamilton:1452727482251](https://narrative.kbase.us/functional-site/#/ws/objects/joshamilton:1452995633835)

I then ran the "Annotate Contigs" and "Build Metabolic Model" KBase apps for each genome. I downloaded the annotated genomes (Genbank format) and models (SBML and tsv formats). Files are stored in:
* `refGenomes` - genomes and annotations in a variety of formats
* `sbml` - metabolic models in SBML and tsv formats, both pre- and post-processed (see below).

Once the genomes were downloaded, I converted the Genbank-formatted genomes to fasta nucleotide (ffn), fasta amino acid (ffa), and gff formats using the following scripts:
* `concatGbk.sh` - concatenate gbk files for each contig into a single file for each genome
* `kBaseGenbankToFasta.py` - convert gbk to ffn and ffa format
* `kBaseGenbankToGff.py` - convert gbk to gff format
* `cleanUpGFF.pl` - remove extra comments from gff files
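The conversion scripts are not reproduced in this notebook. As a rough illustration of what the amino-acid (ffa) conversion involves, the sketch below pulls each CDS's `/locus_tag` and `/translation` qualifiers out of a (possibly concatenated) GenBank file using only the standard library. It is a minimal sketch, not the implementation of `kBaseGenbankToFasta.py`: the function names are hypothetical, and it assumes every CDS carries both qualifiers, which may not hold for every KBase export.

```python
import re

# Feature keys (e.g. "CDS") start after 5 spaces; qualifier lines are indented further.
FEATURE_START = re.compile(r"^ {5}(\S+) +", re.MULTILINE)
FEATURES_HEADER = re.compile(r"^FEATURES", re.MULTILINE)
ORIGIN_HEADER = re.compile(r"^ORIGIN", re.MULTILINE)


def cds_translations(gbk_text):
    """Yield (locus_tag, protein) pairs for every CDS in a GenBank file.

    Hypothetical helper, not the code in kBaseGenbankToFasta.py. Assumes each
    CDS carries /locus_tag and /translation qualifiers.
    """
    # A concatenated file holds one record per contig, each ending with "//".
    for record in gbk_text.split("\n//"):
        header = FEATURES_HEADER.search(record)
        if header is None:
            continue
        origin = ORIGIN_HEADER.search(record, header.end())
        table_end = origin.start() if origin else len(record)
        table = record[header.end():table_end]
        starts = list(FEATURE_START.finditer(table))
        for i, match in enumerate(starts):
            if match.group(1) != "CDS":
                continue
            # The feature block runs until the next feature key (or the table end).
            block_end = starts[i + 1].start() if i + 1 < len(starts) else len(table)
            block = table[match.start():block_end]
            tag = re.search(r'/locus_tag="([^"]+)"', block)
            prot = re.search(r'/translation="([^"]+)"', block)
            if tag and prot:
                # Translations wrap across indented lines; drop all whitespace.
                yield tag.group(1), re.sub(r"\s+", "", prot.group(1))


def to_fasta(records, width=60):
    """Format (identifier, sequence) pairs as FASTA text, wrapping sequences."""
    lines = []
    for ident, seq in records:
        lines.append(">" + ident)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

For a real genome, the input would be the text of the single per-genome file produced by `concatGbk.sh`; splitting on the `//` record terminator lets one call handle all contigs.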

### Processing SBML Files

Reconstructions from KBase require further processing before they are suitable for use in reverse ecology. The function below performs the following steps:

1. Reformat gene locus tags
2. Remove biomass, exchange, spontaneous, and DNA/RNA biosynthesis reactions, along with their corresponding genes
3. Import metabolite formulas
4. Check mass- and charge-balancing of reactions in the reconstruction
5. Remove trailing 0s from reaction and metabolite names

The post-processing has a major shortcoming. When KBase detects that one or more subunits of a complex are present, it creates a "full" GPR by adding 'Unknown' genes for the remaining subunits. CobraPy currently lacks functions to remove these genes, so the processed models should not be used for any simulations that rely on GPRs.
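Since the placeholder genes cannot be removed, one workaround is to identify the affected reactions so they can be set aside in any GPR-dependent analysis. The sketch below assumes GPR rules are available as strings (for example from cobrapy's `Reaction.gene_reaction_rule`) and that placeholder genes have IDs containing the literal text `Unknown`; both the helper names and that naming convention are assumptions, not confirmed KBase behavior.

```python
import re

# Gene IDs are everything in a GPR string that is not whitespace or a parenthesis.
TOKEN = re.compile(r"[^\s()]+")


def gpr_genes(rule):
    """Split a GPR string such as '(g1 and g2) or g3' into its gene IDs."""
    return {t for t in TOKEN.findall(rule) if t.lower() not in ("and", "or")}


def reactions_with_placeholders(gprs, placeholder="Unknown"):
    """Return sorted IDs of reactions whose GPR contains a placeholder gene.

    gprs maps reaction ID -> GPR string. Assumes (hypothetically) that
    placeholder genes have IDs containing the text in `placeholder`.
    """
    return sorted(
        rxn for rxn, rule in gprs.items()
        if any(placeholder in gene for gene in gpr_genes(rule))
    )
```

Matching on gene tokens rather than the raw string keeps the check from tripping on reaction names or boolean operators.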

As output, the code writes processed SBML files to the `sbml/processed` folder and a summary of the model sizes to the `data` folder.
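A summary of model sizes can also be approximated directly from any SBML file. This sketch counts `species`, `reaction` and `geneProduct` elements with the standard-library XML parser, ignoring namespaces. Gene products exist as elements only in SBML Level 3 files that use the FBC package; older COBRA-style files keep genes in reaction notes, so the gene count would be zero for those. The function names and CSV layout are hypothetical and need not match the summary written by `processSBMLforRE`.

```python
import csv
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def model_size(sbml_text):
    """Count metabolites (species), reactions and FBC gene products in SBML text."""
    root = ET.fromstring(sbml_text)
    counts = {"metabolites": 0, "reactions": 0, "genes": 0}
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "species":
            counts["metabolites"] += 1
        elif name == "reaction":
            counts["reactions"] += 1
        elif name == "geneProduct":
            counts["genes"] += 1
    return counts


def size_table(models):
    """Render {model_name: counts} as CSV text, one row per model."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["model", "metabolites", "reactions", "genes"])
    for name, c in sorted(models.items()):
        writer.writerow([name, c["metabolites"], c["reactions"], c["genes"]])
    return out.getvalue()
```

For files on disk, `ET.parse(path).getroot()` could replace `ET.fromstring` to avoid reading the whole file into a string first.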

The first notebook cell imports the Python packages necessary for this analysis and defines the local folder structure.

```python
# Make the custom modules in ../Python importable
import sys
sys.path.append('../Python')
import matplotlib

# Import custom-written Python modules.
# These modules should have been included with the package distribution.
import sbmlFunctions as sf
import metadataFunctions as mf

# Define local folder structure for data input and processing.
rawModelDir = 'sbml/raw'
processedDataDir = 'sbml/processed'
summaryStatsDir = 'data'
```

Next, we call a function that processes each SBML file and prepares it for analysis.

```python
sf.processSBMLforRE(rawModelDir, processedDataDir, summaryStatsDir)
```

```
Processing model acIB2-FNE-F8, 1 of 4
Re-balancing on the basis of protons
Re-balancing on the basis of protons
Re-balancing on the basis of protons
Re-balancing on the basis of protons
All reactions are balanced
Processing model acIII-FNE-F8, 2 of 4
Manually correcting an imbalance
Re-balancing on the basis of protons
Re-balancing on the basis of protons
Re-balancing on the basis of protons
Re-balancing on the basis of protons
All reactions are balanced
Processing model bacIII-FNE-F8, 3 of 4
Manually correcting an imbalance
Re-balancing on the basis of protons
Re-balancing on the basis of protons
Re-balancing on the basis of protons
Re-balancing on the basis of protons
Re-balancing on the basis of protons
All reactions are balanced
Processing model PnecC-FNE-F8, 4 of 4
Manually correcting an imbalance
Manually correcting an imbalance
Re-balancing on the basis of protons
All reactions are balanced
```
