One of the main advantages of UniProt is that its proteins are annotated with terms from the [GeneOntology](http://geneontology.org/) (GO), which describe their molecular function, the biological processes they take part in, and their cellular location. Biopython currently has no direct support for working with GO annotations, so we will have to write our own parser for the GO data. Luckily, this won't be difficult. Download this file: http://purl.obolibrary.org/obo/go.obo

The GO terms are stored in a fairly simple text format (OBO), one record per term:

```
[Term]
id: GO:0000003
name: reproduction
namespace: biological_process
alt_id: GO:0019952
alt_id: GO:0050876
def: "The production of new individuals that contain some portion of genetic material inherited from one or more parent organisms." [GOC:go_curators, GOC:isa_complete, GOC:jl, ISBN:0198506732]
subset: goslim_chembl
subset: goslim_generic
subset: goslim_pir
subset: goslim_plant
subset: gosubset_prok
synonym: "reproductive physiological process" EXACT []
xref: Wikipedia:Reproduction
is_a: GO:0008150 ! biological_process
disjoint_from: GO:0044848 ! biological phase
<emptyline>
```

So we will need to parse this file. Every term seems to start with `[Term]` and end with an empty line, although we should not rely on the last term in the file being followed by one.
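A minimal sketch of that splitting step, assuming the layout described above: each stanza opens with a bracketed header line, and we keep only `[Term]` stanzas (go.obo also starts with a header block and contains `[Typedef]` stanzas, which are skipped here). Because tags such as `alt_id`, `subset` and `is_a` can repeat, each tag maps to a list of values. `iter_term_stanzas` is a hypothetical helper name, not part of Biopython or any other library.

```python
import collections


def iter_term_stanzas(lines):
    """Yield each [Term] stanza as a dict mapping tag -> list of values.

    Hypothetical helper: any stanza that is not a [Term] (the file header,
    [Typedef] stanzas) is skipped.
    """
    current = None  # collects the [Term] being read; None outside a [Term]
    for raw in lines:
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new stanza header closes the previous stanza, if any
            if current is not None:
                yield dict(current)
            current = collections.defaultdict(list) if line == "[Term]" else None
        elif not line:
            # An empty line ends the current stanza
            if current is not None:
                yield dict(current)
            current = None
        elif current is not None and ":" in line:
            # Split on the first colon only: values such as GO ids contain colons
            tag, value = line.split(":", 1)
            current[tag].append(value.strip())
    # The last stanza may not be followed by an empty line
    if current is not None:
        yield dict(current)
```

Since the function only needs an iterable of lines, the same code works on a list of strings for testing and on an open file handle for the real go.obo.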

# Named tuples

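Named tuples are a handy way to store simple record-style data ([Docs](https://docs.python.org/2/library/collections.html#collections.namedtuple)). Let us make a named tuple class to store elementary data about a GO term: its id, name, namespace and definition. The last field is called `definition` rather than `def`, because `def` is a reserved keyword in Python and cannot be used as a field name.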

```python
import collections

GOTerm = collections.namedtuple("GOTerm", "id, name, namespace, definition")
term1 = GOTerm("GO:12345", "dummyterm", "function", "this is just a dummy term")
term2 = GOTerm("GO:12346", "dummyterm2", "function", "this is yet another dummy term")
print(term1)
print(term2)
print(term1.id, term2.id)  # fields can be accessed by name

# _make() creates an instance from any sequence, e.g. the list returned by split()
term3 = GOTerm._make("GO:12347|dummyterm3|function|this is a third dummy term".split("|"))
print("term3=", term3)
```

Output:

```
GOTerm(id='GO:12345', name='dummyterm', namespace='function', definition='this is just a dummy term')
GOTerm(id='GO:12346', name='dummyterm2', namespace='function', definition='this is yet another dummy term')
GO:12345 GO:12346
term3= GOTerm(id='GO:12347', name='dummyterm3', namespace='function', definition='this is a third dummy term')
```
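In the sample record, the `def:` value holds the definition in double quotes, followed by a bracketed list of references. Before storing it in the `definition` field we probably only want the quoted text. A small sketch, assuming the quoted text always comes first as it does in the sample (`DEF_RE` and `definition_text` are hypothetical names):

```python
import collections
import re

GOTerm = collections.namedtuple("GOTerm", "id, name, namespace, definition")

# Quoted text at the start of a def: value; backslash escapes such as \" may occur inside
DEF_RE = re.compile(r'^"((?:[^"\\]|\\.)*)"')


def definition_text(def_value):
    """Return the quoted text of a raw def: value, dropping the [references] part."""
    match = DEF_RE.match(def_value)
    if match is None:
        return def_value  # not in the expected form: keep it unchanged
    return re.sub(r"\\(.)", r"\1", match.group(1))  # undo backslash escapes


raw = ('"The production of new individuals that contain some portion of genetic material '
       'inherited from one or more parent organisms." '
       '[GOC:go_curators, GOC:isa_complete, GOC:jl, ISBN:0198506732]')
term = GOTerm("GO:0000003", "reproduction", "biological_process", definition_text(raw))
print(term.definition)
```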


# GO ontology as a dictionary of named tuples

The easiest way to represent data like this (and not just GO data) is a dictionary that has GO ids as keys and named tuples as values. Try it yourself (Ville exercise).
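One possible sketch for the exercise, assuming we only need the four `GOTerm` fields: it reads go.obo line by line, keeps the first value of each tag inside every `[Term]` stanza, and stores a `GOTerm` under the term's id. `read_go_terms` is a hypothetical name. The `def` value is kept verbatim (quotes and references included), and repeated tags such as `alt_id` are ignored, so alternative ids do not become keys.

```python
import collections

GOTerm = collections.namedtuple("GOTerm", "id, name, namespace, definition")


def read_go_terms(lines):
    """Return a dict mapping GO id -> GOTerm, one entry per [Term] stanza."""
    terms = {}
    fields = None  # tag -> first value for the [Term] being read; None outside a [Term]

    def store(stanza):
        # Turn a finished [Term] stanza into a GOTerm keyed by its id
        if stanza is not None and "id" in stanza:
            terms[stanza["id"]] = GOTerm(stanza["id"], stanza.get("name", ""),
                                         stanza.get("namespace", ""), stanza.get("def", ""))

    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A stanza header: finish the previous stanza, start collecting only for [Term]
            store(fields)
            fields = {} if line == "[Term]" else None
        elif not line:
            # An empty line ends the current stanza
            store(fields)
            fields = None
        elif fields is not None and ":" in line:
            tag, value = line.split(":", 1)
            # Keep only the first value of repeated tags (alt_id, is_a, ...)
            fields.setdefault(tag, value.strip())
    store(fields)  # the last stanza may lack a trailing empty line
    return terms
```

To use it, call `read_go_terms` on an open file handle for go.obo (for example inside a `with open("go.obo") as handle:` block) and look terms up by id, e.g. `go["GO:0008150"]`.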