## Create a unigo tree

To create a unigo tree, you need a collection of protein objects that conform to the `ProteinObject` pydantic model defined below.

```python
import sys

# Make the local unigo package (one directory up) importable from this notebook
sys.path.append('../')
```

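```python
from typing import List

from pydantic import BaseModel

class GODatum(BaseModel):
    """A single GO annotation attached to a protein."""
    id: str        # GO term identifier, e.g. 'GO:0005634'
    evidence: str  # evidence supporting the annotation
    term: str      # GO term description

class ProteinObject(BaseModel):
    """A protein and its GO annotations."""
    id: str              # protein accession (UniProt in the example data)
    go: List[GODatum]
```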
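
For illustration, each input record is a dict with an `id` and a list of `go` annotations, each carrying `id`, `evidence` and `term`. The sketch below mirrors the two models with stdlib dataclasses so it runs without pydantic. The record values are hypothetical, and the exact formats expected for `evidence` and `term` should be checked against `data/proteins.json`. The `from_dict` helper is also hypothetical and is not part of unigo or pydantic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GODatum:
    id: str
    evidence: str
    term: str


@dataclass
class ProteinObject:
    id: str
    go: List[GODatum]

    @classmethod
    def from_dict(cls, raw: dict) -> "ProteinObject":
        # Hypothetical helper: build nested GODatum objects from a raw JSON record.
        # Missing keys raise KeyError; unexpected keys in a GO entry raise TypeError.
        return cls(id=raw["id"], go=[GODatum(**g) for g in raw["go"]])


# Hypothetical record; real values come from data/proteins.json
record = {
    "id": "P12345",
    "go": [{"id": "GO:0005634", "evidence": "IDA", "term": "nucleus"}],
}
protein = ProteinObject.from_dict(record)
print(protein.go[0].id)  # GO:0005634
```

Unlike pydantic, dataclasses do not validate field types, so this stand-in only checks the shape of a record.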

### Load the example data

Load the example proteins from `data/proteins.json` and create an iterator over `ProteinObject` instances.

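```python
import json

# Read the raw protein records (a list of dicts); the context manager closes the file
with open('data/proteins.json') as f:
    proteins = json.load(f)
```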

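```python
def get_iterator(json_models):
    """Lazily validate each raw record as a ProteinObject."""
    for p in json_models:
        # parse_obj is the pydantic v1 API; pydantic v2 deprecates it in favor of model_validate
        yield ProteinObject.parse_obj(p)

protein_iterator = get_iterator(proteins)
```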

### Create the tree

You can create one tree per GO namespace: biological process, molecular function, or cellular component. Before building a tree, set the ontology from an OWL ontology file.

```python
import unigo
```



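```python
# Path to the Gene Ontology in OWL format
ontology = "data/go_2207.owl"

# Set the ontology first, then build the cellular component tree
unigo.setOntology(ontology)
tree = unigo.tree.createGoTree('cellular component', protein_iterator)
```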

```
Blueprint xpGoTree cellular component extracted
Applying true path collapsing
178 GO terms, 338 children_links, 316 leaves, 49 proteins (0 discarded)
xpGoTree cellular component filtered for supplied uniprot entries
```
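
`protein_iterator` is a generator, so it can be consumed only once. Assuming `createGoTree` iterates over all of its input (likely, but not confirmed here), a second tree built from the same `protein_iterator` would see no proteins. The stdlib-only sketch below shows the behaviour, with plain dicts standing in for `ProteinObject` instances. The commented-out loop applies the idea to the three namespaces; the namespace strings other than `'cellular component'` are assumed, not confirmed.

```python
def get_iterator(json_models):
    # Same shape as the notebook's generator; plain dicts stand in for ProteinObject
    for p in json_models:
        yield p


proteins = [{"id": "P1", "go": []}, {"id": "P2", "go": []}]

protein_iterator = get_iterator(proteins)
first_pass = [p["id"] for p in protein_iterator]   # ['P1', 'P2']
second_pass = [p["id"] for p in protein_iterator]  # []: the generator is exhausted

# Build a fresh iterator per tree instead (namespace names assumed, not confirmed):
# trees = {
#     ns: unigo.tree.createGoTree(ns, get_iterator(proteins))
#     for ns in ("biological process", "molecular function", "cellular component")
# }
```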


### Exploring the tree

You can retrieve a node by its GO term ID or by its name:

```python
tree.getByID('GO:0005634')
```

```
{'ID': 'GO:0005634', 'name': 'nucleus', 'eTag': ['Q08881', 'Q8VDQ8', 'Q8NF99', 'P55263', 'P63015', 'P49591', 'O15078', 'Q8BGU5', 'Q6XPS3', 'Q12906', 'P63015', 'P22888'], 'leafCount': 0, 'features': {}, 'oNode': obo.GO_0005634, 'isDAGelem': True, 'is_a': [], 'children': ['nuclear lumen']}
```

```python
tree.getByName('nucleus')
```

```
{'ID': 'GO:0005634', 'name': 'nucleus', 'eTag': ['Q08881', 'Q8VDQ8', 'Q8NF99', 'P55263', 'P63015', 'P49591', 'O15078', 'Q8BGU5', 'Q6XPS3', 'Q12906', 'P63015', 'P22888'], 'leafCount': 0, 'features': {}, 'oNode': obo.GO_0005634, 'isDAGelem': True, 'is_a': [], 'children': ['nuclear lumen']}
```
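
The node prints like a dict, and its `eTag` list can contain the same accession more than once (`P63015` appears twice above). Assuming the node supports dict-style access as its repr suggests (not confirmed here), deduplicate before counting proteins. The sketch below copies the `ID`, `name` and `eTag` values from the output above into a plain dict.

```python
# Values copied from the getByID output above; other fields omitted
node = {
    "ID": "GO:0005634",
    "name": "nucleus",
    "eTag": ['Q08881', 'Q8VDQ8', 'Q8NF99', 'P55263', 'P63015', 'P49591',
             'O15078', 'Q8BGU5', 'Q6XPS3', 'Q12906', 'P63015', 'P22888'],
}

# dict.fromkeys keeps first-seen order while dropping repeats
unique_etag = list(dict.fromkeys(node["eTag"]))
print(len(node["eTag"]), len(unique_etag))  # 12 11
```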

You can list all proteins in the tree:

```python
tree.getMembers()
```

```
['P58107',
 'Q15109',
 'O55225',
 'Q8NF99',
 'Q7Z7G2',
 'Q8K426',
 'Q8CBB9',
 'P49591',
 'P29387',
 'P35174',
 'P18089',
 'Q9P2K2',
 'P20273',
 'P63015',
 'Q6XPS3',
 'A6NNS2',
 'Q6UWK7',
 'Q925Q5',
 'Q9Y2A7',
 'P55263',
 'C9JE06',
 'Q9H2M9',
 'P10321',
 'Q9HCS2',
 'O15031',
 'Q64281',
 'Q61609',
 'Q3UH66',
 'A0A3Q4EGR4',
 'O15078',
 'Q8BXA7',
 'Q30KP6',
 'P58659',
 'Q8NBJ4',
 'Q8VDQ8',
 'Q8WZA2',
 'P14142',
 'P41233',
 'Q12906',
 'P04196',
 'Q6UX15',
 'O35874',
 'Q08881',
 'Q8BGU5',
 'P24666',
 'Q6J9G0',
 'Q96LA5',
 'Q9H400',
 'P22888']
```
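
To see which input proteins, if any, did not end up in the tree, compare the input IDs with `tree.getMembers()`. The helper below is a hypothetical sketch, not part of unigo; it assumes `getMembers()` returns a list of accession strings, as the output above suggests. Toy IDs stand in for real data.

```python
from typing import Dict, Iterable, Set


def compare_members(input_ids: Iterable[str], tree_members: Iterable[str]) -> Dict[str, Set[str]]:
    """Hypothetical helper: report IDs missing from, or unexpected in, the tree."""
    inputs, members = set(input_ids), set(tree_members)
    return {
        "missing_from_tree": inputs - members,
        "not_in_input": members - inputs,
    }


# Toy data; in the notebook this would be
# compare_members((p['id'] for p in proteins), tree.getMembers())
report = compare_members(["P1", "P2", "P3"], ["P2", "P1"])
print(report["missing_from_tree"])  # {'P3'}
```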

You can get all proteins under a node, that is, proteins annotated to the term itself or to one of its descendant terms:

```python
nucleus_members = tree.getMembersByID('GO:0005634')
```

To see why each protein was classified under a node, use `getDetailedMembersFromParentID`. It returns a dictionary keyed by protein accession; each value lists the `(GO ID, term name)` pairs, taken from the parent term and its descendants, that were used to classify the protein under the parent node.

```python
tree.getDetailedMembersFromParentID('GO:0005634')
```

```
{'Q08881': [('GO:0005634', 'nucleus')],
 'Q8VDQ8': [('GO:0005634', 'nucleus'), ('GO:0005730', 'nucleolus')],
 'Q8NF99': [('GO:0005634', 'nucleus'), ('GO:0005730', 'nucleolus')],
 'P55263': [('GO:0005634', 'nucleus'), ('GO:0005654', 'nucleoplasm')],
 'P63015': [('GO:0005634', 'nucleus'),
  ('GO:0005634', 'nucleus'),
  ('GO:0005654', 'nucleoplasm'),
  ('GO:0005654', 'nucleoplasm')],
 'P49591': [('GO:0005634', 'nucleus')],
 'O15078': [('GO:0005634', 'nucleus')],
 'Q8BGU5': [('GO:0005634', 'nucleus')],
 'Q6XPS3': [('GO:0005634', 'nucleus')],
 'Q12906': [('GO:0005634', 'nucleus'),
  ('GO:0005730', 'nucleolus'),
  ('GO:0005654', 'nucleoplasm')],
 'P22888': [('GO:0005634', 'nucleus')],
 'Q8CBB9': [('GO:0001650', 'fibrillar center')],
 'Q15109': [('GO:0001650', 'fibrillar center')],
 'P35174': [('GO:0005654', 'nucleoplasm')],
 'Q8BXA7': [('GO:0005654', 'nucleoplasm')],
 'C9JE06': [('GO:0005654', 'nucleoplasm')],
 'P58107': [('GO:0097356', 'perinucleolar compartment')]}
```
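
Two things stand out in this output: a protein can list the same `(GO ID, name)` pair more than once (`P63015`), and some proteins are classified under `nucleus` only through a descendant term such as `nucleoplasm` or `fibrillar center`. The hypothetical helper below, which is not part of unigo, deduplicates the pairs and splits proteins into those annotated to the parent term itself and those reached only through descendants. It runs on an excerpt of the output above.

```python
from typing import Dict, List, Tuple

Term = Tuple[str, str]  # (GO ID, term name)


def summarize_detailed_members(detailed: Dict[str, List[Term]], parent_id: str):
    """Hypothetical helper: deduplicate term pairs and split direct vs. descendant-only proteins."""
    # dict.fromkeys drops repeated pairs while keeping their first-seen order
    dedup = {prot: list(dict.fromkeys(terms)) for prot, terms in detailed.items()}
    direct = sorted(p for p, terms in dedup.items()
                    if any(go_id == parent_id for go_id, _ in terms))
    via_descendants = sorted(p for p in dedup if p not in direct)
    return dedup, direct, via_descendants


# Excerpt of the getDetailedMembersFromParentID('GO:0005634') output above
detailed = {
    'Q08881': [('GO:0005634', 'nucleus')],
    'P63015': [('GO:0005634', 'nucleus'), ('GO:0005634', 'nucleus'),
               ('GO:0005654', 'nucleoplasm'), ('GO:0005654', 'nucleoplasm')],
    'Q12906': [('GO:0005634', 'nucleus'), ('GO:0005730', 'nucleolus'),
               ('GO:0005654', 'nucleoplasm')],
    'Q8CBB9': [('GO:0001650', 'fibrillar center')],
    'P58107': [('GO:0097356', 'perinucleolar compartment')],
}

dedup, direct, via_descendants = summarize_detailed_members(detailed, 'GO:0005634')
print(direct)           # ['P63015', 'Q08881', 'Q12906']
print(via_descendants)  # ['P58107', 'Q8CBB9']
```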