## Create unigo tree

To create a unigo tree, you need a collection of protein objects that conform to the ProteinObject pydantic model below.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from pydantic import BaseModel
from typing import List

class GODatum(BaseModel):
    id: str
    evidence: str
    term: str

class ProteinObject(BaseModel):
    id: str
    go: List[GODatum]
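As a rough illustration of the record shape these models describe, the sketch below checks a plain dictionary with standard-library Python only (it deliberately does not use pydantic). The record values and the helper name `has_protein_shape` are hypothetical and only serve the example.

```python
# Hypothetical record: the identifier, evidence code and term are made up
# for illustration and do not come from data/proteins.json.
record = {
    "id": "P12345",
    "go": [
        {"id": "GO:0005634", "evidence": "IDA", "term": "nucleus"},
    ],
}

def has_protein_shape(obj):
    """Return True if obj carries the fields declared by ProteinObject."""
    if not isinstance(obj, dict) or not isinstance(obj.get("id"), str):
        return False
    go = obj.get("go")
    if not isinstance(go, list):
        return False
    # Every GO datum needs string-valued id, evidence and term fields.
    return all(
        isinstance(datum, dict)
        and all(isinstance(datum.get(key), str) for key in ("id", "evidence", "term"))
        for datum in go
    )

print(has_protein_shape(record))            # True
print(has_protein_shape({"id": "P12345"}))  # False: the "go" list is missing
```

In the notebook itself, pydantic performs this validation when a ProteinObject is built from each record.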

### Load example
Load the example in data/proteins.json and create an iterator over its objects.

In [2]:
import json
# Use a context manager so the file handle is closed after loading
with open('data/proteins.json') as handle:
    proteins = json.load(handle)

In [3]:
def get_iterator(json_models):
    for p in json_models:
        yield ProteinObject.parse_obj(p)

In [4]:
protein_iterator = get_iterator(proteins)
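Because get_iterator is a generator function, the object it returns can be traversed only once. Assuming the tree builder consumes the whole iterator, a fresh one is probably needed for each tree you create. The sketch below shows this behaviour with a stand-in generator; the names `get_numbers` and `numbers` are illustrative only.

```python
def get_numbers(items):
    # Same pattern as get_iterator: yield items one at a time.
    for item in items:
        yield item

numbers = get_numbers([1, 2, 3])

first_pass = list(numbers)   # consumes the generator
second_pass = list(numbers)  # nothing is left to yield

print(first_pass)   # [1, 2, 3]
print(second_pass)  # []
```

Calling get_iterator(proteins) again returns a new, unconsumed iterator.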

### Create the tree

You can create one tree per GO namespace: biological process, molecular function, and cellular component. You first need an OWL ontology file to set the ontology.

In [2]:
import unigo



DVL PYPROTEINS_EXT PKG 2022-09-13 16:43:12.560604


In [3]:
ontology_file = "/data1/cecile/PSF/ontology/go_2207.owl" 

In [4]:
%%time
unigo.tree.setOntologyDict(ontology_file)

{}
CPU times: user 21.6 s, sys: 836 ms, total: 22.4 s
Wall time: 22.4 s


In [8]:
%%time
unigo.setOntology(ontology_file)

CPU times: user 15.8 s, sys: 701 ms, total: 16.5 s
Wall time: 16.5 s


In [9]:
%%time
tree = unigo.tree.createGoTree('cellular component', protein_iterator)

Blueprint xpGoTree cellular component extracted
read dag
OOOOOOOO GO:0005575
Applying true path collapsing
178 GO terms, 338 children_links, 316 leaves, 49 proteins (0 discarded)
xpGoTree cellular component filtered for supplied uniprot entries
CPU times: user 114 ms, sys: 501 µs, total: 114 ms
Wall time: 113 ms


### Create the tree from dict

In [10]:
%%time
unigo.tree.setOntologyDict(ontology_file)

{}
CPU times: user 313 ms, sys: 3.55 ms, total: 317 ms
Wall time: 315 ms


In [11]:
%%time
unigo.tree.setOntologyDict(ontology_file)

{}
CPU times: user 300 ms, sys: 4.07 ms, total: 305 ms
Wall time: 303 ms


In [15]:
%%time
tree = unigo.tree.createGoTree('cellular component', protein_iterator, from_dict = True)

Blueprint xpGoTree cellular component extracted
read dag from dict
Applying true path collapsing
178 GO terms, 338 children_links, 316 leaves, 49 proteins (0 discarded)
xpGoTree cellular component filtered for supplied uniprot entries
CPU times: user 34.9 ms, sys: 0 ns, total: 34.9 ms
Wall time: 34.3 ms


### Exploring the tree

You can get a node by its GO term ID or by its name.

In [None]:
tree.getByID('GO:0005634')

In [None]:
tree.getByName('nucleus')

You can get all tree proteins

In [None]:
tree.getMembers()

You can get all child proteins of a node

In [None]:
a = tree.getMembersByID('GO:0005634')

You can also get all child proteins of a node together with more detailed information about why each one was classified as a child.
The returned value is a dictionary with proteins as keys. Each value is the list of GO terms that are children of the parent term and were therefore used to classify the protein as a child of the parent node.

In [None]:
tree.getDetailedMembersFromParentID('GO:0005634')
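To make the shape of that dictionary concrete, here is a minimal sketch on a toy ontology. It is not unigo's implementation: the term IDs, protein names, and the helpers `descendants` and `detailed_members` are all hypothetical, and the sketch assumes that only terms strictly below the parent count as evidence.

```python
from collections import defaultdict

# Toy parent -> children links (hypothetical term IDs).
children = {
    "GO:A": ["GO:B", "GO:C"],
    "GO:B": ["GO:D"],
    "GO:C": [],
    "GO:D": [],
}

# Toy protein -> annotated terms (hypothetical protein names).
annotations = {
    "prot1": ["GO:D"],
    "prot2": ["GO:C", "GO:B"],
    "prot3": ["GO:A"],
}

def descendants(term):
    """Collect every term reachable below `term`."""
    seen = set()
    stack = list(children.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, []))
    return seen

def detailed_members(parent):
    """Map each protein to the descendant terms that place it under `parent`."""
    below = descendants(parent)
    result = defaultdict(list)
    for protein, terms in annotations.items():
        for term in terms:
            if term in below:
                result[protein].append(term)
    return dict(result)

print(detailed_members("GO:A"))
# {'prot1': ['GO:D'], 'prot2': ['GO:C', 'GO:B']}
```

In this toy version, prot3 is left out because it is annotated to the parent term itself rather than to one of its descendants; whether unigo treats direct annotations the same way is not shown here.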

In [None]:
import unigo

In [None]:
%%time
unigo.tree.setOntology(ontology_file)

In [None]:
%%time
unigo.tree.setOntologyDict(ontology_file)

In [None]:
tree = unigo.tree.createGoTree('cellular component', protein_iterator)