# *i*ML1515_GP - Loading and Exploring the GEM-PRO

This notebook guides you through loading the GEM-PRO model for ***i*ML1515** and exploring its contents.

Requirements:
- ``ssbio`` - installation instructions [here](http://ssbio.readthedocs.io/en/latest/#installation), documentation [here](http://ssbio.readthedocs.io/en/latest/index.html)

## Loading the GEM-PRO

In [1]:
# Load the GEM-PRO from the gzipped JSON file
# Adjust the path if the .json.gz file is stored elsewhere
from ssbio.core.io import load_json
iML1515_GP = load_json('iML1515_GP.json.gz', decompression=True)

In [2]:
## Alternative - loading the pickle file
## Uncomment and use this loading method if the JSON file fails to load
# from ssbio.core.io import load_pickle
# iML1515_GP = load_pickle('iML1515_GP/model/iML1515_GP.pckl')
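
The two loading cells above can be combined so that the pickle file is tried only when the JSON file fails to load. The following is a minimal sketch, not part of ssbio: `load_with_fallback` is a hypothetical helper, and the two stand-in loaders exist only so the example runs without the model files. In the notebook you would pass `load_json` and `load_pickle` from `ssbio.core.io` in their place.

```python
def load_with_fallback(attempts):
    """Try each (loader, args, kwargs) triple in order and return the first result.

    Hypothetical helper for illustration; raises RuntimeError if every loader fails.
    """
    errors = []
    for loader, args, kwargs in attempts:
        try:
            return loader(*args, **kwargs)
        except Exception as exc:  # deliberately broad: any failure moves on to the next loader
            errors.append(f"{loader.__name__}: {exc!r}")
    raise RuntimeError("All loaders failed:\n" + "\n".join(errors))


# Stand-in loaders (assumptions) so the sketch runs without ssbio or the model files.
def fake_load_json(path, decompression=False):
    raise OSError(f"cannot read {path}")


def fake_load_pickle(path):
    return {"id": "iML1515_GP", "source": path}


model = load_with_fallback([
    (fake_load_json, ("iML1515_GP.json.gz",), {"decompression": True}),
    (fake_load_pickle, ("iML1515_GP/model/iML1515_GP.pckl",), {}),
])
print(model["source"])
```

Collecting the error messages rather than discarding them makes it easier to see why the JSON file was rejected before the pickle file was used.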

## Basic information & DataFrames

### Genes with and without structures

In [3]:
# Show the first 10 genes that have at least one experimental PDB structure
iML1515_GP.genes_with_experimental_structures[:10]

[<GenePro b0870 at 0x7f0334cc75f8>,
 <GenePro b2551 at 0x7f0358653dd8>,
 <GenePro b3500 at 0x7f03343a12e8>,
 <GenePro b0945 at 0x7f03343a1cf8>,
 <GenePro b3916 at 0x7f03343094e0>,
 <GenePro b1260 at 0x7f0334309eb8>,
 <GenePro b1261 at 0x7f0334330390>,
 <GenePro b3426 at 0x7f03342bea90>,
 <GenePro b3770 at 0x7f03342899b0>,
 <GenePro b4054 at 0x7f03342ba438>]

In [4]:
# Show the first 10 genes that have at least one homology model
iML1515_GP.genes_with_homology_models[:10]

[<GenePro b0870 at 0x7f0334cc75f8>,
 <GenePro b2551 at 0x7f0358653dd8>,
 <GenePro b3368 at 0x7f0334cc7588>,
 <GenePro b2436 at 0x7f0334387828>,
 <GenePro b3500 at 0x7f03343a12e8>,
 <GenePro b0945 at 0x7f03343a1cf8>,
 <GenePro b4467 at 0x7f0334347550>,
 <GenePro b2979 at 0x7f033435e080>,
 <GenePro b4468 at 0x7f033435ea90>,
 <GenePro b3916 at 0x7f03343094e0>]
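
Several genes appear in both lists, so the two categories overlap. The sketch below compares the gene IDs with plain Python sets. The IDs are copied from the two outputs above and cover only the first 10 entries of each list, so the result says nothing about the full model.

```python
# Gene IDs copied from the two outputs above (first 10 entries of each list only).
experimental = ["b0870", "b2551", "b3500", "b0945", "b3916",
                "b1260", "b1261", "b3426", "b3770", "b4054"]
homology = ["b0870", "b2551", "b3368", "b2436", "b3500",
            "b0945", "b4467", "b2979", "b4468", "b3916"]

# In the notebook, the full ID lists could presumably be built with, e.g.,
#   [g.id for g in iML1515_GP.genes_with_experimental_structures]
# (assumption: each GenePro exposes an `id` attribute, as its repr suggests).

both = sorted(set(experimental) & set(homology))
experimental_only = sorted(set(experimental) - set(homology))
homology_only = sorted(set(homology) - set(experimental))

print("both:             ", both)
print("experimental only:", experimental_only)
print("homology only:    ", homology_only)
```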

### Summary DataFrames

In [5]:
# Summarize the sequences mapped to each protein
iML1515_GP.df_representative_sequences.head()

| gene | uniprot | kegg | num_pdbs | pdbs | seq_len | sequence_file |
|---|---|---|---|---|---|---|
| b0870 | P75823 | ecj:JW0854;eco:b0870 | 1 | 3WLX | 333 | P75823.fasta |
| b2551 | P0A825 | ecj:JW2535;eco:b2551 | 3 | 1EQB;1DFO;3G8M | 417 | P0A825.fasta |
| b3368 | P0AEA8 | ecj:JW3331;eco:b3368 | 0 |  | 457 | P0AEA8.fasta |
| b2436 | P36553 | ecj:JW2429;eco:b2436 | 0 |  | 299 | P36553.fasta |
| b3500 | P06715 | ecj:JW3467;eco:b3500 | 4 | 1GEU;1GER;1GET;1GES | 450 | P06715.fasta |
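
The `pdbs` column packs several PDB IDs into one semicolon-separated string, and `num_pdbs` should equal the number of IDs in it. The sketch below checks that on the five rows shown above and lists the genes with no mapped PDB entry. It uses only the standard library on values copied from the output; in the notebook the same logic would be applied to the DataFrame itself.

```python
import csv
import io

# Rows copied from the output above, with a plain single-row header.
table = """gene,uniprot,kegg,num_pdbs,pdbs,seq_len,sequence_file
b0870,P75823,ecj:JW0854;eco:b0870,1,3WLX,333,P75823.fasta
b2551,P0A825,ecj:JW2535;eco:b2551,3,1EQB;1DFO;3G8M,417,P0A825.fasta
b3368,P0AEA8,ecj:JW3331;eco:b3368,0,,457,P0AEA8.fasta
b2436,P36553,ecj:JW2429;eco:b2436,0,,299,P36553.fasta
b3500,P06715,ecj:JW3467;eco:b3500,4,1GEU;1GER;1GET;1GES,450,P06715.fasta
"""

rows = list(csv.DictReader(io.StringIO(table)))

# Split the semicolon-separated PDB IDs; an empty cell becomes an empty list.
pdbs_per_gene = {r["gene"]: r["pdbs"].split(";") if r["pdbs"] else [] for r in rows}

# Genes that have no experimental PDB entry mapped to them.
genes_without_pdb = [r["gene"] for r in rows if int(r["num_pdbs"]) == 0]

# num_pdbs should match the number of IDs listed in the pdbs column.
counts_consistent = all(len(pdbs_per_gene[r["gene"]]) == int(r["num_pdbs"]) for r in rows)

print(genes_without_pdb)
print(counts_consistent)
```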


In [6]:
# Summarize the structures mapped to each protein
iML1515_GP.df_representative_structures.head()

| gene | id | is_experimental | reference_seq_top_coverage | structure_file |
|---|---|---|---|---|
| b0870 | 3wlx-A | True | 99.4 | 3wlx-A_clean.pdb |
| b2551 | 1dfo-A | True | 99.8 | 1dfo-A_clean.pdb |
| b3368 | CYSG_ECOLI-X | False | 100.0 | CYSG_ECOLI_model1_clean-X_clean.pdb |
| b2436 | HEM6_ECOLI-X | False | 100.0 | HEM6_ECOLI_model1_clean-X_clean.pdb |
| b3500 | 1ges-A | True | 98.0 | 1ges-A_clean.pdb |
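
The two summary tables can be cross-checked: for a gene whose representative structure is experimental, the PDB code in its `id` should appear among the PDB IDs mapped to that gene's sequence. The sketch below runs that check on values copied from the two outputs above. It assumes representative structure IDs follow a `<structure>-<chain>` pattern, which is inferred from the rows shown rather than taken from the ssbio documentation.

```python
# Values copied from the two summary tables above.
mapped_pdbs = {
    "b0870": "3WLX",
    "b2551": "1EQB;1DFO;3G8M",
    "b3368": "",
    "b2436": "",
    "b3500": "1GEU;1GER;1GET;1GES",
}
representative = {  # gene -> (structure id, is_experimental)
    "b0870": ("3wlx-A", True),
    "b2551": ("1dfo-A", True),
    "b3368": ("CYSG_ECOLI-X", False),
    "b2436": ("HEM6_ECOLI-X", False),
    "b3500": ("1ges-A", True),
}


def pdb_code(structure_id):
    """Drop the trailing '-<chain>' and upper-case the rest (assumed ID layout)."""
    return structure_id.rsplit("-", 1)[0].upper()


checks = {}
for gene, (structure_id, is_experimental) in representative.items():
    # filter(None, ...) discards the empty string produced by an empty cell.
    mapped = set(filter(None, mapped_pdbs[gene].split(";")))
    if not is_experimental:
        checks[gene] = "homology model"
    elif pdb_code(structure_id) in mapped:
        checks[gene] = "ok"
    else:
        checks[gene] = "mismatch"

for gene, status in checks.items():
    print(gene, status)
```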


### Inspecting the content of one gene and its protein

In [7]:
# Look at the protein object stored for one gene (here b3916)
my_protein = iML1515_GP.genes.b3916.protein
my_protein

<Protein b3916 at 0x7f0334309550>

#### Protein sequences and structures

In [8]:
my_protein.sequences

[<KEGGProp eco:b3916 at 0x7f033431eb38>,
 <UniProtProp P0A796 at 0x7f033431ecc0>]

In [9]:
my_protein.structures

[<PDBProp 1pfk at 0x7f033434cb00>,
 <PDBProp 2pfk at 0x7f033431ecf8>,
 <StructProp K6PF1_ECOLI at 0x7f0334330278>,
 <StructProp E03690 at 0x7f0334330320>]
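
The output mixes two object types, `PDBProp` and `StructProp`. Given the earlier split into experimental structures and homology models, these presumably correspond to PDB entries and modelled structures respectively. The sketch below groups a list of objects by class name. The two classes are stand-ins defined only so the example runs without ssbio, and `group_by_class` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_class(objects):
    """Group objects by the name of their class (hypothetical helper)."""
    groups = defaultdict(list)
    for obj in objects:
        groups[type(obj).__name__].append(obj)
    return dict(groups)


# Stand-in classes (assumptions); the real objects come from my_protein.structures.
class StructProp:
    def __init__(self, ident):
        self.id = ident


class PDBProp:
    def __init__(self, ident):
        self.id = ident


# IDs copied from the output above.
structures = [PDBProp("1pfk"), PDBProp("2pfk"),
              StructProp("K6PF1_ECOLI"), StructProp("E03690")]

ids_by_class = {name: [s.id for s in members]
                for name, members in group_by_class(structures).items()}
print(ids_by_class)
```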

#### Domain information

In [10]:
my_protein.representative_sequence.domains

['PDP:1ZXXAa', 'PDP:1ZXXAb']

In [11]:
my_protein.representative_structure.domains

['PDP:1ZXXAa', 'PDP:1ZXXAb']
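
The domain labels are compact strings. The sketch below splits one into its parts, assuming the layout `<source>:<4-character PDB ID><chain><domain suffix>`. That layout is inferred from the two labels shown above, not taken from the ssbio documentation, and `parse_domain_id` is a hypothetical helper.

```python
from typing import NamedTuple


class DomainID(NamedTuple):
    source: str   # annotation source prefix, e.g. "PDP"
    pdb_id: str   # 4-character PDB ID the domain was defined on
    chain: str    # chain identifier
    domain: str   # domain suffix within the chain


def parse_domain_id(label):
    """Split a label such as 'PDP:1ZXXAa' into its parts (assumed layout)."""
    source, sep, rest = label.partition(":")
    if not sep or not source or len(rest) < 6:
        raise ValueError(f"unexpected domain label: {label!r}")
    return DomainID(source, rest[:4], rest[4], rest[5:])


# Labels copied from the outputs above.
for label in ["PDP:1ZXXAa", "PDP:1ZXXAb"]:
    print(parse_domain_id(label))
```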

## Viewing 3D structures

In [12]:
# Listing all mapped structures for this protein
iML1515_GP.genes.b3916.protein.structures

[<PDBProp 1pfk at 0x7f033434cb00>,
 <PDBProp 2pfk at 0x7f033431ecf8>,
 <StructProp K6PF1_ECOLI at 0x7f0334330278>,
 <StructProp E03690 at 0x7f0334330320>]

In [13]:
# Displaying the original PDB structure
iML1515_GP.genes.b3916.protein.structures.get_by_id('1pfk').view_structure(recolor=False)

In [14]:
# Displaying the single chain set as the protein's representative structure
iML1515_GP.genes.b3916.protein.representative_structure.view_structure(recolor=False)