# Recon3D_GP - Loading and Exploring the GEM-PRO

This notebook guides you through loading the GEM-PRO model for **Recon3D_GP** and exploring its contents.

### Requirements:
- ``ssbio`` - installation instructions [here](http://ssbio.readthedocs.io/en/latest/#installation), documentation [here](http://ssbio.readthedocs.io/en/latest/index.html)
- ``nglview`` - needed for the interactive 3D structure viewers at the end of this notebook

### Quick start:

##### Installation
```bash
pip install nglview
pip install ssbio
```
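
After installing, it can help to confirm that both packages are visible to the Python interpreter that runs the notebook kernel. The following is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper written for this example, not part of ``ssbio``.

```python
from importlib.util import find_spec

def missing_packages(names):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if find_spec(name) is None]

# The two packages installed in the step above
missing = missing_packages(["ssbio", "nglview"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages were found.")
```

If a package is reported as missing even though `pip install` succeeded, the notebook kernel is probably running in a different environment from the one `pip` installed into.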

##### Running the notebook

1. Obtain one of these three items:
    - **A:** GitHub repository clone (`git clone https://github.com/SBRG/Recon3D`)
    - **B:** Lite GEM-PRO archive (`Recon3D_GP.tar.gz`)
    - **C:** GEM-PRO model (``Recon3D_GP.json.gz``)
1. If A: just open this notebook and run it.
1. If B: extract the archive into the directory where this notebook is located.
1. If C: in the directory where this notebook is located, create a folder named ``Recon3D_GP/model/`` and place ``Recon3D_GP.json.gz`` in it.
1. Make sure your files are arranged like so:
```
.
├── Recon3D_GP
│   ├── data
│   ├── genes
│   ├── homology_models_raw
│   └── model
│       └── Recon3D_GP.json.gz
├── Recon3D_GP - Loading and Exploring the GEM-PRO.ipynb
└── Recon3D_GP - Updating the GEM-PRO.ipynb
```
1. Run this notebook!
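
The folder layout above can be checked before running anything else. This is a small sketch that only tests whether the model file sits at the relative path this notebook uses; `find_model_file` is a hypothetical helper name introduced for the example.

```python
from pathlib import Path

def find_model_file(notebook_dir="."):
    """Return the path to Recon3D_GP.json.gz if it is where this notebook expects it, else None."""
    model_path = Path(notebook_dir) / "Recon3D_GP" / "model" / "Recon3D_GP.json.gz"
    return model_path if model_path.is_file() else None

# "." is the directory the notebook is started from
model_path = find_model_file(".")
print(model_path or "Recon3D_GP/model/Recon3D_GP.json.gz not found; check the folder layout.")
```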

## Loading the GEM-PRO

In [1]:
# Loading the JSON file
# Change the location of the .json file if it is located somewhere else
from ssbio.io import load_json
Recon3D_GP = load_json('./Recon3D_GP/model/Recon3D_GP.json.gz', decompression=True)

In [2]:
# # Alternative - loading the pickle file
# # Uncomment and use this loading method if the JSON file fails to load
# from ssbio.io import load_pickle
# Recon3D_GP = load_pickle('./Recon3D_GP/model/Recon3D_GP.pckl')
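
If `load_json` fails, one way to narrow down the cause is to check whether the file itself decompresses and parses. The sketch below assumes the model is ordinary gzip-compressed JSON text, which the file extension and the `decompression=True` argument suggest but the notebook does not state. It does not build a GEM-PRO object; it only reports the top-level JSON type. `peek_gzipped_json` is a hypothetical helper.

```python
import gzip
import json
from pathlib import Path

def peek_gzipped_json(path):
    """Decompress and parse a .json.gz file, returning the top-level Python type name."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        data = json.load(handle)
    return type(data).__name__

model_path = Path("./Recon3D_GP/model/Recon3D_GP.json.gz")
if model_path.is_file():
    # Note: this reads the whole file into memory
    print("Parsed OK; top-level JSON type:", peek_gzipped_json(model_path))
else:
    print("File not found:", model_path)
```

If this check succeeds but `load_json` still fails, the problem is more likely a version mismatch in ``ssbio`` or its dependencies than a corrupted download.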

## Basic information & DataFrames

### Genes with and without structures

In [3]:
# List all genes that have at least one experimental PDB structure
Recon3D_GP.genes_with_experimental_structures[:10]

[<GenePro 8639__46__1 at 0x7fb45b1896d8>,
 <GenePro 26__46__1 at 0x7fb4990cb470>,
 <GenePro 10993__46__1 at 0x7fb45b0ff438>,
 <GenePro 3945__46__1 at 0x7fb45b0b80b8>,
 <GenePro 3939__46__1 at 0x7fb45b0b8198>,
 <GenePro 4967__46__2 at 0x7fb45b011b00>,
 <GenePro 1738__46__1 at 0x7fb45b011be0>,
 <GenePro 8050__46__1 at 0x7fb45b01ba58>,
 <GenePro 4967__46__1 at 0x7fb45af725c0>,
 <GenePro 127__46__1 at 0x7fb45af8f0f0>]
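
The gene IDs in this output contain the token `__46__`. This looks like the escaping scheme in which a character that is not allowed in a model identifier is written as its decimal character code between double underscores. Character code 46 is `.`, so `8639__46__1` would read as `8639.1`. That interpretation is an assumption; the notebook does not state it. Under that assumption, a hypothetical decoder could look like this:

```python
import re

def decode_escaped_id(escaped_id):
    """Replace each __NN__ escape with the character whose decimal code is NN."""
    return re.sub(r"__(\d+)__", lambda match: chr(int(match.group(1))), escaped_id)

# Two IDs copied from the output above
for gene_id in ["8639__46__1", "4967__46__2"]:
    print(gene_id, "->", decode_escaped_id(gene_id))
```

Lookups such as `get_by_id` in this notebook use the escaped form, so the decoded form is for reading only.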

In [4]:
# List all genes that have at least one homology model
Recon3D_GP.genes_with_homology_models[:10]

[<GenePro 1591__46__1 at 0x7fb45b0ee278>,
 <GenePro 1594__46__1 at 0x7fb45b0ee908>,
 <GenePro 10993__46__1 at 0x7fb45b0ff438>,
 <GenePro 89874__46__1 at 0x7fb45b118c18>,
 <GenePro 92483__46__1 at 0x7fb45b118d30>,
 <GenePro 3945__46__1 at 0x7fb45b0b80b8>,
 <GenePro 3939__46__1 at 0x7fb45b0b8198>,
 <GenePro 160287__46__1 at 0x7fb45b0cb908>,
 <GenePro 3948__46__1 at 0x7fb45b044ba8>,
 <GenePro 9123__46__1 at 0x7fb45b053668>]
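
The two lists are not mutually exclusive: a gene can have both an experimental structure and a homology model. As a small illustration, the sketch below intersects the ten IDs printed in each output above. The IDs are copied by hand so that the example runs without the model loaded.

```python
# IDs copied from the two outputs above (first ten entries of each list)
experimental = [
    "8639__46__1", "26__46__1", "10993__46__1", "3945__46__1", "3939__46__1",
    "4967__46__2", "1738__46__1", "8050__46__1", "4967__46__1", "127__46__1",
]
homology = [
    "1591__46__1", "1594__46__1", "10993__46__1", "89874__46__1", "92483__46__1",
    "3945__46__1", "3939__46__1", "160287__46__1", "3948__46__1", "9123__46__1",
]

# Genes that appear in both lists
both = sorted(set(experimental) & set(homology))
print(both)
```

With the model loaded, the same comparison could presumably be made on the full lists by collecting each gene's `id` attribute into a set first.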

### Summary DataFrames

In [5]:
# Summarize each protein
Recon3D_GP.df_proteins.head()

| gene | id | sequences | num_sequences | representative_sequence | repseq_gene_name | repseq_uniprot | repseq_description | num_structures | experimental_structures | num_experimental_structures | homology_models | num_homology_models | representative_structure | representative_chain | representative_chain_seq_coverage | repstruct_resolution | num_sequence_alignments | num_structure_alignments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | [] | 0 |  |  |  |  | 0 | [] | 0 | [] | 0 |  |  |  |  | 0 | 0 |
| 10000__46__1 | 10000__46__1 | [Q9Y243-1] | 1 | Q9Y243-1 | AKT3 | Q9Y243-1 | RAC-gamma serine/threonine-protein kinase | 3 | [2x18] | 1 | [H03080] | 1 | REP-H03080 | X | 1.0 |  | 9 | 0 |
| 10005__46__1 | 10005__46__1 | [O14734-1] | 1 | O14734-1 | ACOT8 | O14734-1 | Acyl-coenzyme A thioesterase 8 | 3 | [] | 0 | [H28002, ENSP00000217455] | 2 | REP-H28002 | X | 1.0 |  | 1 | 0 |
| 10005__46__2 | 10005__46__2 | [] | 0 |  |  |  |  | 0 | [] | 0 | [] | 0 |  |  |  |  | 0 | 0 |
| 10005__46__3 | 10005__46__3 | [] | 0 |  |  |  |  | 0 | [] | 0 | [] | 0 |  |  |  |  | 0 | 0 |
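
Because this summary is a pandas DataFrame, it can be filtered with ordinary boolean indexing. The sketch below rebuilds three of the columns from the five rows shown above, so it runs without the model, and selects the proteins that have no structure mapped. Only the column names and values come from the output; the variable names are illustrative.

```python
import pandas as pd

# Three columns from the five rows shown above
rows = [
    {"gene": "0", "repseq_gene_name": None, "num_structures": 0},
    {"gene": "10000__46__1", "repseq_gene_name": "AKT3", "num_structures": 3},
    {"gene": "10005__46__1", "repseq_gene_name": "ACOT8", "num_structures": 3},
    {"gene": "10005__46__2", "repseq_gene_name": None, "num_structures": 0},
    {"gene": "10005__46__3", "repseq_gene_name": None, "num_structures": 0},
]
df = pd.DataFrame(rows).set_index("gene")

# Keep only the proteins with no mapped structure
without_structures = df[df["num_structures"] == 0]
print(list(without_structures.index))
```

The same filter expression should work on `Recon3D_GP.df_proteins` directly, assuming its `num_structures` column holds integers as it appears to here.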


In [6]:
# Summarize the representative sequence mapped to each protein
Recon3D_GP.df_representative_sequences.head()

| gene | uniprot | kegg | pdbs | sequence_file | metadata_file |
|---|---|---|---|---|---|
| 10000__46__1 | Q9Y243-1 | hsa:10000 | 2X18 | Q9Y243-1.fasta | Q9Y243-1.xml |
| 10005__46__1 | O14734-1 | hsa:10005 |  | O14734-1.fasta | O14734-1.xml |
| 10007__46__1 | P46926-1 | hsa:10007 | 1NE7 | P46926-1.fasta | P46926-1.xml |
| 100137049__46__1 | P0C869-1 | hsa:100137049;hsa:8681 |  | P0C869-1.fasta | P0C869-1.xml |
| 10020__46__1 | Q9Y223-1 | hsa:10020 | 2YHW;2YHY;2YI1;3EO3;4ZHT | Q9Y223-1.fasta | Q9Y223-1.xml |
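
In this output the `pdbs` and `kegg` columns pack several identifiers into one string, separated by semicolons. A minimal sketch for turning such a field into a list follows. The values are copied from the rows above, and `split_ids` is a hypothetical helper that also tolerates empty or missing cells.

```python
def split_ids(field):
    """Split a semicolon-separated ID string into a list; empty or missing cells give []."""
    if not isinstance(field, str):
        return []
    return [item for item in field.split(";") if item]

# `pdbs` values copied from the rows above
pdbs_column = {
    "10000__46__1": "2X18",
    "10005__46__1": "",
    "10007__46__1": "1NE7",
    "10020__46__1": "2YHW;2YHY;2YI1;3EO3;4ZHT",
}

pdbs_per_gene = {gene: split_ids(field) for gene, field in pdbs_column.items()}
for gene, pdbs in pdbs_per_gene.items():
    print(gene, len(pdbs), pdbs)
```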


In [7]:
# Summarize the representative structure mapped to each protein
Recon3D_GP.df_representative_structures.head()

| gene | id | is_experimental | file_type | structure_file |
|---|---|---|---|---|
| 10000__46__1 | REP-H03080 | False | pdb | NP_005456.1_model1_clean-X_clean.pdb |
| 10005__46__1 | REP-H28002 | False | pdb | NP_005460.2_model1_clean-X_clean.pdb |
| 10007__46__1 | REP-1ne7 | True | pdb | 1ne7-A_clean.pdb |
| 10026__46__1 | REP-H01243 | False | pdb | NP_005473.1_model1_clean-X_clean.pdb |
| 10038__46__1 | REP-H19990 | False | pdb | NP_005475.2_model1_clean-X_clean.pdb |
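
The `is_experimental` column distinguishes representative structures taken from experimental PDB entries from those taken from homology models. As a quick illustration, the sketch below tallies the five rows shown above using only the standard library; the values are copied by hand from the output.

```python
from collections import Counter

# `is_experimental` values copied from the five rows above
is_experimental = {
    "10000__46__1": False,
    "10005__46__1": False,
    "10007__46__1": True,
    "10026__46__1": False,
    "10038__46__1": False,
}

# Count how many representative structures fall into each group
counts = Counter(
    "experimental" if flag else "homology model"
    for flag in is_experimental.values()
)
print(dict(counts))
```

On the full DataFrame, the pandas method `value_counts()` applied to the `is_experimental` column should give the equivalent tally.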


### Inspecting the content of one gene and its protein

In [8]:
# Looking at the content stored per gene
my_protein = Recon3D_GP.genes.get_by_id('100__46__1').protein
my_protein

<Protein 100__46__1 at 0x7fb459da0358>

#### Protein sequences and structures

In [9]:
my_protein.sequences

[UniProtProp(seq=Seq('MAQTPAFDKPKVELHVHLDGSIKPETILYYGRRRGIALPANTAEGLLNVIGMDK...QNL', SingleLetterAlphabet()), id='P00813-1', name='sp|P00813|ADA_HUMAN', description='Adenosine deaminase', dbxrefs=[])]

In [10]:
my_protein.structures

[<StructProp H27942 at 0x7fb459dc1748>,
 <StructProp ENSP00000361965 at 0x7fb459dc17b8>,
 <PDBProp 3iar at 0x7fb459dada20>,
 <PDBProp 2bgn at 0x7fb459dc1780>,
 <PDBProp 1w1i at 0x7fb459dc19b0>,
 <PDBProp 1qxl at 0x7fb459dc1c88>,
 <PDBProp 1krm at 0x7fb459dc1fd0>,
 <PDBProp 2z7g at 0x7fb459dca160>,
 <PDBProp 2e1w at 0x7fb459dca278>,
 <PDBProp 1wxz at 0x7fb459dca390>,
 <PDBProp 1wxy at 0x7fb459dca4a8>,
 <PDBProp 1vfl at 0x7fb459dca5c0>,
 <PDBProp 1v7a at 0x7fb459dca6d8>,
 <PDBProp 1v79 at 0x7fb459dca7f0>,
 <PDBProp 1uml at 0x7fb459dca908>,
 <PDBProp 1o5r at 0x7fb459dcaa20>,
 <PDBProp 1ndz at 0x7fb459dcab38>,
 <PDBProp 1ndy at 0x7fb459dcac50>,
 <PDBProp 1ndw at 0x7fb459dcad68>,
 <PDBProp 1ndv at 0x7fb459dcae80>,
 <PDBProp 2ada at 0x7fb459dcaf98>,
 <PDBProp 3km8 at 0x7fb459dd80f0>,
 <StructProp REP-3iar at 0x7fb459dd8588>]
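
The printed list mixes three kinds of entries: `StructProp` objects with homology model IDs, `PDBProp` objects with four-character PDB IDs, and a final `StructProp` whose ID starts with `REP-`, which matches the representative structure naming seen in the summary tables. To show the grouping without loading the model, the sketch below parses a few of the printed lines. With live objects, reading each entry's class name and `id` would be the more direct route; that is an assumption about the object interface beyond what this notebook shows.

```python
import re
from collections import defaultdict

# A few lines copied from the output above
reprs = [
    "<StructProp H27942 at 0x7fb459dc1748>",
    "<StructProp ENSP00000361965 at 0x7fb459dc17b8>",
    "<PDBProp 3iar at 0x7fb459dada20>",
    "<PDBProp 2bgn at 0x7fb459dc1780>",
    "<StructProp REP-3iar at 0x7fb459dd8588>",
]

# Capture the class name and the structure ID from each printed line
pattern = re.compile(r"<(\w+) (\S+) at 0x[0-9a-f]+>")
by_class = defaultdict(list)
for text in reprs:
    match = pattern.fullmatch(text)
    if match:
        by_class[match.group(1)].append(match.group(2))

for class_name, ids in by_class.items():
    print(class_name, ids)
```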

## Viewing 3D structures

In [11]:
# Downloading and displaying a structure
my_protein.structures.get_by_id('2bgn').download_structure_file(outdir=my_protein.structure_dir, file_type='mmtf')
my_protein.structures.get_by_id('2bgn').view_structure(recolor=False)

*(Output: an interactive 3D structure viewer widget, shown only when the notebook is running.)*

In [12]:
# Displaying the protein's representative structure (single chain)
my_protein.representative_structure.view_structure(recolor=False)

*(Output: an interactive 3D structure viewer widget, shown only when the notebook is running.)*