# *i*ML1515_GP - Updating the GEM-PRO

This notebook provides a quick pipeline for updating a GEM-PRO with the most recent sequence and structure data. If you are starting with only the GEM-PRO ``.json.gz`` file, the same pipeline downloads all sequence and structure data from scratch. Otherwise, make sure that the ``iML1515_GP`` folder is in the same directory as this notebook.

Running this pipeline may take a while; timings are provided in the progress bars for each method below.

For a full tutorial on how a GEM-PRO is created and the details of each method, see [this tutorial notebook](http://ssbio.readthedocs.io/en/latest/notebooks/GEM-PRO%20-%20SBML%20Model%20%28iNJ661%29.html).

Requirements:
- ``ssbio`` - installation instructions [here](http://ssbio.readthedocs.io/en/latest/#installation), documentation [here](http://ssbio.readthedocs.io/en/latest/index.html)

### Imports and loading the model

In [1]:
# Loading the JSON file
# Change the location of the .json file if it is located somewhere else
from ssbio.io import load_json
iML1515_GP = load_json('iML1515_GP.json.gz', decompression=True)

In [2]:
## Alternative - loading the pickle file
## Uncomment and use this loading method if the JSON file fails to load
# from ssbio.io import load_pickle
# iML1515_GP = load_pickle('iML1515_GP/model/iML1515_GP.pckl')

In [3]:
# Displaying the directory to which information will be downloaded
# You can change the root_dir if the iML1515_GP folder is located somewhere else
print('Location of the "iML1515_GP" folder:', iML1515_GP.root_dir)

## If the root directory needs to be changed
# iML1515_GP.root_dir = '/path/to/new/root_dir/with/iML1515_GP/folder/in/it'

Location of the "iML1515_GP" folder: .
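
Before running the update, it can help to confirm that the ``iML1515_GP`` folder really sits under the reported root directory. The following is a minimal sketch using only the standard library; the helper name `find_gempro_folder` is hypothetical and is not part of ``ssbio``.

```python
from pathlib import Path


def find_gempro_folder(root_dir, folder_name="iML1515_GP"):
    """Return the resolved path of the GEM-PRO folder under root_dir, or None if it is missing.

    Hypothetical helper for illustration; it only inspects the file system.
    """
    candidate = Path(root_dir).expanduser() / folder_name
    return candidate.resolve() if candidate.is_dir() else None


# "." mirrors the root_dir printed above; change it if your folder lives elsewhere
folder = find_gempro_folder(".")
if folder is None:
    print("No iML1515_GP folder found here; sequence and structure files will be downloaded again.")
else:
    print("Found GEM-PRO folder at", folder)
```

If the folder is missing, either set `root_dir` to the directory that contains it or let the pipeline below download everything again.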


In [4]:
# Setting logging display settings
import sys
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(sys.stderr)
formatter = logging.Formatter('[%(asctime)s] [%(name)s] %(levelname)s: %(message)s', datefmt="%Y-%m-%d %H:%M")
handler.setFormatter(formatter)
logger.handlers = [handler]
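
To see what the formatter above produces, the sketch below renders a single record into an in-memory buffer. The logger name `demo.preview` and the message are made up for illustration; the format string and date format are the ones used in the cell above.

```python
import io
import logging

# Send the preview into a string buffer instead of stderr
buffer = io.StringIO()
preview_handler = logging.StreamHandler(buffer)
preview_handler.setFormatter(
    logging.Formatter('[%(asctime)s] [%(name)s] %(levelname)s: %(message)s',
                      datefmt="%Y-%m-%d %H:%M")
)

# Hypothetical logger name, kept separate from the root logger configured above
preview_logger = logging.getLogger("demo.preview")
preview_logger.setLevel(logging.INFO)
preview_logger.propagate = False  # do not echo the preview through the root handler
preview_logger.addHandler(preview_handler)

preview_logger.info("hello")
line = buffer.getvalue().strip()
print(line)  # a timestamp in brackets, then "[demo.preview] INFO: hello"
```

Each log line in the outputs below follows this layout: timestamp, logger name, level, then the message.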

In [5]:
# Printing multiple outputs per cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

### Updating sequence information

In [6]:
# Set all methods to force_rerun=True to re-download information from KEGG and UniProt
iML1515_GP.kegg_mapping_and_metadata(kegg_organism_code='eco', force_rerun=True)
iML1515_GP.uniprot_mapping_and_metadata(model_gene_source='ENSEMBLGENOME_ID', force_rerun=True)
iML1515_GP.set_representative_sequence(force_rerun=True)

[2017-03-31 17:42] [ssbio.pipeline.gempro] INFO: 1513/1515: number of genes mapped to KEGG
[2017-03-31 17:42] [ssbio.pipeline.gempro] INFO: Completed ID mapping --> KEGG. See the "df_kegg_metadata" attribute for a summary dataframe.
[2017-03-31 17:42] [root] INFO: getUserAgent: Begin
[2017-03-31 17:42] [root] INFO: getUserAgent: user_agent: EBI-Sample-Client/ (services.py; Python 3.5.2; Linux) Python-requests/2.12.4
[2017-03-31 17:42] [root] INFO: getUserAgent: End





[2017-03-31 17:53] [ssbio.pipeline.gempro] INFO: 1513/1515: number of genes mapped to UniProt
[2017-03-31 17:53] [ssbio.pipeline.gempro] INFO: Completed ID mapping --> UniProt. See the "df_uniprot_metadata" attribute for a summary dataframe.





[2017-03-31 17:53] [ssbio.pipeline.gempro] INFO: 1515/1515: number of genes with a representative sequence
[2017-03-31 17:53] [ssbio.pipeline.gempro] INFO: See the "df_representative_sequences" attribute for a summary dataframe.





### Updating structure information

In [7]:
# Set all methods to force_rerun=True to re-download information from the PDB
iML1515_GP.map_uniprot_to_pdb(force_rerun=True)
iML1515_GP.blast_seqs_to_pdb(seq_ident_cutoff=.95, all_genes=False, force_rerun=True)
iML1515_GP.set_representative_structure(force_rerun=True)

[2017-03-31 17:54] [ssbio.pipeline.gempro] INFO: Mapping UniProt IDs --> PDB IDs...
[2017-03-31 17:54] [root] INFO: getUserAgent: Begin
[2017-03-31 17:54] [root] INFO: getUserAgent: user_agent: EBI-Sample-Client/ (services.py; Python 3.5.2; Linux) Python-requests/2.12.4
[2017-03-31 17:54] [root] INFO: getUserAgent: End
[2017-03-31 18:01] [ssbio.pipeline.gempro] INFO: 684/1515: number of genes with at least one experimental structure
[2017-03-31 18:01] [ssbio.pipeline.gempro] INFO: Completed UniProt --> best PDB mapping. See the "df_pdb_ranking" attribute for a summary dataframe.





[2017-03-31 18:39] [ssbio.pipeline.gempro] INFO: Completed sequence --> PDB BLAST. See the "df_pdb_blast" attribute for a summary dataframe.
[2017-03-31 18:39] [ssbio.pipeline.gempro] INFO: 75: number of genes with additional structures added from BLAST





[2017-03-31 18:45] [ssbio.core.protein] ERROR: 5hbo: unable to parse structure file as mmtf. Falling back to mmCIF format.
[2017-03-31 18:57] [ssbio.pipeline.gempro] INFO: 1513/1515: number of genes with a representative structure
[2017-03-31 18:57] [ssbio.pipeline.gempro] INFO: See the "df_representative_structures" attribute for a summary dataframe.
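
The `N/M` counts in the log lines above can be turned into coverage fractions for a quick comparison between runs. This is a minimal sketch that parses log text with a regular expression; the function name `coverage_from_log` is hypothetical, and the two sample lines are copied from the output above.

```python
import re

# Matches messages such as "INFO: 684/1515: number of genes with ..."
COUNT_PATTERN = re.compile(r"INFO: (\d+)/(\d+): number of genes (.+)$")


def coverage_from_log(lines):
    """Map each 'number of genes ...' label to the fraction mapped/total."""
    results = {}
    for line in lines:
        match = COUNT_PATTERN.search(line)
        if match:
            mapped, total = int(match.group(1)), int(match.group(2))
            results[match.group(3)] = mapped / total
    return results


log_lines = [
    "[2017-03-31 17:42] [ssbio.pipeline.gempro] INFO: 1513/1515: number of genes mapped to KEGG",
    "[2017-03-31 18:01] [ssbio.pipeline.gempro] INFO: 684/1515: number of genes with at least one experimental structure",
]

coverage = coverage_from_log(log_lines)
for label, fraction in coverage.items():
    print(f"{label}: {fraction:.1%}")
```

Lines without an `N/M` count, such as the BLAST summary, are skipped by the pattern.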





### Saving the updated GEM-PRO

In [None]:
# Both JSON and pickle saving methods are provided
# JSON is human-readable and data can be utilized in other languages
# Pickles are Python-specific
iML1515_GP.save_json('iML1515_GP_updated.json', compression=False)
# iML1515_GP.save_pickle(outfile='iML1515_GP/model/iML1515_GP_updated.pckl')
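
The trade-off between the two formats can be illustrated with the standard library alone. The sketch below is not how ``ssbio`` serializes a GEM-PRO internally (that is not shown in this notebook), and the record is a made-up toy dictionary. It only shows that JSON stays readable as text, optionally gzip-compressed like the ``.json.gz`` file loaded earlier, while a pickle is an opaque byte stream that is read back with Python.

```python
import gzip
import json
import pickle

# Hypothetical toy record standing in for model data
record = {"gene": "example_gene", "num_structures": 2}

# JSON: plain text that other languages can parse; gzip is optional
json_bytes = json.dumps(record, indent=2).encode("utf-8")
gz_bytes = gzip.compress(json_bytes)

# Pickle: Python-specific binary format
pickle_bytes = pickle.dumps(record)

# Both round-trip to the same dictionary
from_json = json.loads(gzip.decompress(gz_bytes).decode("utf-8"))
from_pickle = pickle.loads(pickle_bytes)

print(json_bytes.decode("utf-8"))
print("JSON round-trip OK:", from_json == record)
print("Pickle round-trip OK:", from_pickle == record)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.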