# *i*ML1515_GP - Loading and Exploring the GEM-PRO

This notebook guides you through loading the GEM-PRO model for ***i*ML1515** and exploring its contents.

Requirements:
- ``ssbio`` - installation instructions [here](http://ssbio.readthedocs.io/en/latest/#installation), documentation [here](http://ssbio.readthedocs.io/en/latest/index.html)

## Loading the GEM-PRO

In [1]:
# Load the GEM-PRO from its JSON file
# Adjust the path if the .json file is stored elsewhere
from ssbio.io import load_json
iML1515_GP = load_json('iML1515_GP/model/iML1515_GP.json', decompression=False)

In [2]:
# Alternative - loading the pickle file
# Uncomment and use this loading method if the JSON file fails to load
# from ssbio.io import load_pickle
# iML1515_GP = load_pickle('iML1515_GP/model/iML1515_GP.pckl')

## Basic information & DataFrames

### Genes with and without structures

In [3]:
# Show the first 10 genes that have at least one experimental PDB structure
iML1515_GP.genes_with_experimental_structures[:10]

[<GenePro b0870 at 0x7fe43eec8910>,
 <GenePro b2551 at 0x7fe44dc42890>,
 <GenePro b3500 at 0x7fe43ee30fd0>,
 <GenePro b0945 at 0x7fe43ee3eb50>,
 <GenePro b3916 at 0x7fe43ee07c10>,
 <GenePro b1260 at 0x7fe43ee51b90>,
 <GenePro b1261 at 0x7fe43eda65d0>,
 <GenePro b3426 at 0x7fe43edb8e50>,
 <GenePro b3770 at 0x7fe43ed81510>,
 <GenePro b4054 at 0x7fe43ed92390>]

In [4]:
# Show the first 10 genes that have at least one homology model
iML1515_GP.genes_with_homology_models[:10]

[<GenePro b0870 at 0x7fe43eec8910>,
 <GenePro b2551 at 0x7fe44dc42890>,
 <GenePro b3368 at 0x7fe43eec8990>,
 <GenePro b2436 at 0x7fe43ef15f90>,
 <GenePro b3500 at 0x7fe43ee30fd0>,
 <GenePro b0945 at 0x7fe43ee3eb50>,
 <GenePro b4467 at 0x7fe43ee58190>,
 <GenePro b2979 at 0x7fe43eded450>,
 <GenePro b4468 at 0x7fe43edf4f90>,
 <GenePro b3916 at 0x7fe43ee07c10>]
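
Several genes (for example `b0870` and `b3916`) appear in both lists, so the two sets overlap. A minimal sketch for quantifying that overlap follows. It assumes each gene object exposes an `id` attribute matching the identifier in its repr; the helper name `structure_coverage` is hypothetical, and the stand-in objects only mimic that one attribute.

```python
from types import SimpleNamespace


def structure_coverage(experimental, homology):
    """Partition gene IDs by which kind of structure they have.

    Both arguments are iterables of objects with an `id` attribute
    (an assumption about the gene objects, not a documented guarantee).
    """
    exp_ids = {gene.id for gene in experimental}
    hom_ids = {gene.id for gene in homology}
    return {
        "both": sorted(exp_ids & hom_ids),
        "experimental_only": sorted(exp_ids - hom_ids),
        "homology_only": sorted(hom_ids - exp_ids),
    }


# Stand-ins using a few gene IDs from the outputs above
demo_exp = [SimpleNamespace(id=i) for i in ("b0870", "b2551", "b3500", "b1260")]
demo_hom = [SimpleNamespace(id=i) for i in ("b0870", "b2551", "b3368", "b3500")]
coverage = structure_coverage(demo_exp, demo_hom)
```

With the loaded model, the same call would take `iML1515_GP.genes_with_experimental_structures` and `iML1515_GP.genes_with_homology_models` as its two arguments.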

### Summary DataFrames

In [5]:
# Summarize the sequences mapped to each protein
iML1515_GP.df_representative_sequences.head()

| gene | uniprot | kegg | pdbs | sequence_file | metadata_file |
|---|---|---|---|---|---|
| b0002 | P00561 | ecj:JW0001;eco:b0002 | 6MX1 | P00561.fasta | P00561.xml |
| b0003 | P00547 | ecj:JW0002;eco:b0003 |  | P00547.fasta | P00547.xml |
| b0004 | P00934 | ecj:JW0003;eco:b0004 | 1VB3 | P00934.fasta | P00934.xml |
| b0007 | P30143 | ecj:JW0006;eco:b0007 |  | P30143.fasta | P30143.xml |
| b0008 | P0A870 | ecj:JW0007;eco:b0008 | 1I2N;1I2O;1I2P;1I2Q;1I2R;1ONR;1UCW;3KOF;4RZ5;4... | P0A870.fasta | P0A870.xml |
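
The `pdbs` column packs all mapped PDB IDs for a gene into one semicolon-separated string, and it is empty for genes without a mapped PDB entry. A minimal sketch for turning one such cell into a list is shown below. The helper `split_pdb_ids` is hypothetical, and treating `None` and `NaN` as "no structures" is an assumption about how empty cells are stored.

```python
import math


def split_pdb_ids(cell):
    """Split a semicolon-separated `pdbs` cell into a list of PDB IDs.

    Empty strings, None, and float NaN are all treated as "no PDB IDs".
    """
    if cell is None:
        return []
    if isinstance(cell, float) and math.isnan(cell):
        return []
    return [part.strip() for part in str(cell).split(";") if part.strip()]


single = split_pdb_ids("1VB3")
several = split_pdb_ids("1I2N;1I2O;1I2P")
missing = split_pdb_ids(float("nan"))
```

Applied to the DataFrame, this could be used as `iML1515_GP.df_representative_sequences['pdbs'].apply(split_pdb_ids)`.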


In [6]:
# Summarize the structures mapped to each protein
iML1515_GP.df_representative_structures.head()

| gene | id | is_experimental | file_type | structure_file |
|---|---|---|---|---|
| b0002 | REP-AK1H_ECOLI | False | pdb | AK1H_ECOLI_model1_clean-X_clean.pdb |
| b0003 | REP-KHSE_ECOLI | False | pdb | KHSE_ECOLI_model1_clean-X_clean.pdb |
| b0004 | REP-1vb3 | True | pdb | 1vb3-A_clean.pdb |
| b0007 | REP-YAAJ_ECOLI | False | pdb | YAAJ_ECOLI_model1_clean-X_clean.pdb |
| b0008 | REP-4s2b | True | pdb | 4s2b-A_clean.pdb |
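
The `is_experimental` column distinguishes representative structures taken from the PDB (`True`) from homology models (`False`). A minimal sketch for tallying the two is shown below. The helper `count_structure_sources` is hypothetical and works on plain `(gene, is_experimental)` pairs so that it runs without the model; the demo rows are the five shown above.

```python
from collections import Counter


def count_structure_sources(rows):
    """Count representative structures by origin.

    `rows` is an iterable of (gene_id, is_experimental) pairs.
    """
    counts = Counter(
        "experimental" if is_experimental else "homology_model"
        for _, is_experimental in rows
    )
    return dict(counts)


# The five rows displayed by .head() above
demo_rows = [
    ("b0002", False),
    ("b0003", False),
    ("b0004", True),
    ("b0007", False),
    ("b0008", True),
]
source_counts = count_structure_sources(demo_rows)
```

With the full DataFrame, the pairs could come from `iML1515_GP.df_representative_structures['is_experimental'].items()`, or the same tally could be obtained directly with `value_counts()` on that column.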


### Inspecting the content of one gene and its protein

In [7]:
# Access the protein object stored for gene b3916
my_protein = iML1515_GP.genes.b3916.protein
my_protein

<Protein b3916 at 0x7fe43ee07c90>

#### Protein sequences and structures

In [8]:
my_protein.sequences

[KEGGProp(seq=Seq('MIKKIGVLTSGGDAPGMNAAIRGVVRSALTEGLEVMGIYDGYLGLYEDRMVQLD...KLY', SingleLetterAlphabet()), id='eco:b3916', name='eco:b3916', description='eco:b3916 K00850 6-phosphofructokinase 1 [EC:2.7.1.11] | (RefSeq) pfkA; 6-phosphofructokinase I (A)', dbxrefs=[]),
 UniProtProp(seq=Seq('MIKKIGVLTSGGDAPGMNAAIRGVVRSALTEGLEVMGIYDGYLGLYEDRMVQLD...KLY', SingleLetterAlphabet()), id='P0A796', name='sp|P0A796|PFKA_ECOLI', description='ATP-dependent 6-phosphofructokinase isozyme 1', dbxrefs=[])]

In [9]:
my_protein.structures

[<PDBProp 1pfk at 0x7fe43ede7990>,
 <PDBProp 2pfk at 0x7fe43eda38d0>,
 <StructProp K6PF1_ECOLI at 0x7fe43ee14fd0>,
 <StructProp E03690 at 0x7fe43eda3f50>,
 <StructProp REP-1pfk at 0x7fe43eda6090>]
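
The list above mixes `PDBProp` and `StructProp` entries. A minimal sketch for grouping structure IDs by class name follows. It assumes each entry has an `id` attribute matching the identifier in its repr; the helper `group_structures_by_type` is hypothetical, and the two stand-in classes only mimic the names seen in the output, not the real ``ssbio`` classes.

```python
from collections import defaultdict


def group_structures_by_type(structures):
    """Map each structure's class name to the list of IDs of that class."""
    groups = defaultdict(list)
    for structure in structures:
        groups[type(structure).__name__].append(structure.id)
    return dict(groups)


# Stand-in classes that only carry an `id`, named after the reprs above
class PDBProp:
    def __init__(self, id):
        self.id = id


class StructProp:
    def __init__(self, id):
        self.id = id


demo_structures = [
    PDBProp("1pfk"),
    PDBProp("2pfk"),
    StructProp("K6PF1_ECOLI"),
    StructProp("E03690"),
    StructProp("REP-1pfk"),
]
grouped = group_structures_by_type(demo_structures)
```

With the loaded model, the call would be `group_structures_by_type(my_protein.structures)`.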

#### Domain information

In [10]:
my_protein.representative_sequence.domains

['PDP:1ZXXAa', 'PDP:1ZXXAb']

In [11]:
my_protein.representative_structure.domains

['PDP:1ZXXAa', 'PDP:1ZXXAb']

## Viewing 3D structures

In [12]:
# Listing all mapped structures for this protein
iML1515_GP.genes.b3916.protein.structures

[<PDBProp 1pfk at 0x7fe43ede7990>,
 <PDBProp 2pfk at 0x7fe43eda38d0>,
 <StructProp K6PF1_ECOLI at 0x7fe43ee14fd0>,
 <StructProp E03690 at 0x7fe43eda3f50>,
 <StructProp REP-1pfk at 0x7fe43eda6090>]

In [13]:
# Displaying the original PDB structure
iML1515_GP.genes.b3916.protein.structures.get_by_id('1pfk').view_structure(recolor=False)

A Jupyter Widget

In [14]:
# Displaying the single chain set as the protein's representative structure
iML1515_GP.genes.b3916.protein.representative_structure.view_structure(recolor=False)

A Jupyter Widget