# Obtaining and Processing Network Reconstructions from KBase

KBase (http://kbase.us) is a powerful resource for obtaining genome-scale network reconstructions for microbial genomes. These reconstructions are distributed as SBML files, which must be processed prior to reverse ecology analysis. This notebook describes how to obtain reconstructions from KBase, and how to process them.

## Obtaining and Preparing SBML Files

Briefly, genomes (as FASTA files) for your organisms of interest can be pushed from your computer to KBase. Once there, a KBase Narrative (an IPython notebook) can be used to build reconstructions for your genomes. To do so, follow the instructions in `Perl/README.md`.


## Processing SBML Files

Reconstructions from KBase require further processing before they are suitable for use in reverse ecology. The function below performs the following steps:
1. Reformat gene locus tags
2. Remove biomass, exchange, spontaneous, and DNA/RNA biosynthesis reactions, along with their corresponding genes
3. Import metabolite formulas
4. Check mass- and charge-balancing of reactions in the reconstruction
5. Remove trailing 0s from reaction and metabolite names
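
To make step 4 concrete, the following is a minimal, self-contained sketch of a mass- and charge-balance check. It is not the implementation used by `sbmlFunctions`; the helper names `parse_formula` and `is_balanced`, the metabolite IDs, and the dictionary layout are all assumptions made for illustration. It handles only simple formulas (element symbols followed by optional counts, no parentheses).

```python
import re
from collections import defaultdict


def parse_formula(formula):
    """Parse a simple formula such as 'C6H12O6' into {element: count}."""
    counts = defaultdict(int)
    for element, number in re.findall(r'([A-Z][a-z]?)(\d*)', formula):
        counts[element] += int(number) if number else 1
    return dict(counts)


def is_balanced(stoichiometry, formulas, charges):
    """Return True if every element and the net charge sum to zero.

    stoichiometry: metabolite ID -> coefficient (negative for reactants)
    formulas:      metabolite ID -> formula string
    charges:       metabolite ID -> integer charge
    """
    element_totals = defaultdict(int)
    charge_total = 0
    for met, coeff in stoichiometry.items():
        for element, n in parse_formula(formulas[met]).items():
            element_totals[element] += coeff * n
        charge_total += coeff * charges[met]
    return charge_total == 0 and all(v == 0 for v in element_totals.values())


# Hypothetical example: ATP hydrolysis with illustrative metabolite IDs.
formulas = {'atp': 'C10H12N5O13P3', 'h2o': 'H2O', 'adp': 'C10H12N5O10P2',
            'pi': 'HO4P', 'h': 'H'}
charges = {'atp': -4, 'h2o': 0, 'adp': -3, 'pi': -2, 'h': 1}
atp_hydrolysis = {'atp': -1, 'h2o': -1, 'adp': 1, 'pi': 1, 'h': 1}
missing_proton = {'atp': -1, 'h2o': -1, 'adp': 1, 'pi': 1}

print(is_balanced(atp_hydrolysis, formulas, charges))
print(is_balanced(missing_proton, formulas, charges))
```

The first reaction is balanced; the second omits the proton and therefore fails both the hydrogen and the charge check. This is why metabolite formulas (step 3) must be imported before balancing can be checked.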

The post-processing has a major shortcoming. When KBase detects that one or more subunits of a complex are present, it creates a "full" GPR (gene-protein-reaction rule) by adding 'Unknown' genes for the other subunits. COBRApy currently lacks functions to remove these genes. As such, these models should not be used to perform any simulations that rely on GPRs.
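
Although the placeholder genes are not removed, it can be useful to know which reactions carry them. The sketch below flags GPR strings that mention the 'Unknown' placeholder. The function name `reactions_with_unknown_genes` and the example reaction and gene IDs are hypothetical, and the sketch assumes the placeholder appears literally as the word `Unknown` in the rule string. With a COBRApy model, the rule string for each reaction is available as `reaction.gene_reaction_rule`.

```python
import re


def reactions_with_unknown_genes(gprs, placeholder='Unknown'):
    """Return sorted IDs of reactions whose GPR mentions the placeholder gene.

    gprs: reaction ID -> GPR rule string
    """
    pattern = re.compile(r'\b' + re.escape(placeholder) + r'\b')
    return sorted(rxn for rxn, gpr in gprs.items() if pattern.search(gpr))


# Hypothetical GPR strings for illustration only.
example_gprs = {
    'rxn_a': '(geneA and Unknown)',
    'rxn_b': 'geneB or geneC',
    'rxn_c': '',
}
flagged = reactions_with_unknown_genes(example_gprs)
print(flagged)
```

Only `rxn_a` is flagged here. Reactions identified this way are the ones whose GPRs should not be trusted in gene-level simulations.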

As output, the code writes processed SBML files to the folder named by `processedDataDir`. It also writes a summary of the model sizes to the folder named by `summaryStatsDir`.
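
Whether the processing function creates its output folders itself is not stated here, so it may be worth checking the folder layout beforehand. The following is a hedged Python 3 sketch; the helper `prepare_dirs` is hypothetical, and the default folder names simply mirror the values used in this notebook. The demonstration runs inside a temporary directory so that it does not touch your working folder.

```python
import os
import tempfile


def prepare_dirs(base, raw='RawModelFiles',
                 outputs=('ProcessedModelFiles', 'DataSummaries')):
    """Check that the raw-model folder exists and create the output folders.

    Returns the list of output folder paths.
    """
    raw_path = os.path.join(base, raw)
    if not os.path.isdir(raw_path):
        raise FileNotFoundError('Raw model folder not found: ' + raw_path)
    paths = []
    for name in outputs:
        path = os.path.join(base, name)
        os.makedirs(path, exist_ok=True)  # no error if it already exists
        paths.append(path)
    return paths


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, 'RawModelFiles'))
    created = [os.path.basename(p) for p in prepare_dirs(base)]
    print(created)
```

In the notebook itself, the equivalent call would pass the project folder as `base`.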

The first notebook cell imports the Python packages necessary for this analysis and defines the folder structure.

In [1]:
# Make the custom modules in ../Python importable
import sys
sys.path.append('../Python')

# Render matplotlib figures inline in the notebook
import matplotlib
%matplotlib inline

# Import Python modules 
# These custom-written modules should have been included with the package
# distribution. 
import sbmlFunctions as sf
import metadataFunctions as mf

# Define local folder structure for data input and processing.
processedDataDir = 'ProcessedModelFiles'
rawModelDir = 'RawModelFiles'
summaryStatsDir = 'DataSummaries'

Then we call a function which processes each SBML file and preps it for analysis.

In [2]:
sf.processSBMLforRE(rawModelDir, processedDataDir, summaryStatsDir)

Processing model AAA027E14, 1 of 1
All reactions are balanced
