## Querying GO Terms Programmatically with GOATools
Web-based tools such as GOrilla are convenient for a small number of queries. However, a more complex analysis may involve dozens of clusters rather than the 4 we have here. In that case, it helps to run the GO queries programmatically in Python, which can be done with the goatools Python library.


In [None]:
#Download ontologies 
from goatools.base import download_go_basic_obo
obo_fname=download_go_basic_obo()

In [None]:
#Download associations of genes and GO terms 
from goatools.base import download_ncbi_associations
gene2go = download_ncbi_associations()

In [None]:
# Load ontologies 
from goatools.obo_parser import GODag
obodag = GODag("go-basic.obo")


In [None]:
#Load associations 
from __future__ import print_function
from goatools.associations import read_ncbi_gene2go

geneid2gos_human = read_ncbi_gene2go("gene2go", taxids=[9606])

print("{N:,} annotated human genes".format(N=len(geneid2gos_human)))

In [None]:
from goatools.test_data.genes_NCBI_9606_ProteinCoding import GeneID2nt as GeneID2nt_human

In [None]:
from goatools.go_enrichment import GOEnrichmentStudy

In [None]:
goeaobj = GOEnrichmentStudy(
        GeneID2nt_human.keys(), # List of human protein-coding genes (background population)
        geneid2gos_human, # geneid/GO associations
        obodag, # Ontologies
        propagate_counts = False,
        alpha = 0.05, # default significance cut-off
        methods = ['fdr_bh']) # default multiple-testing correction method
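
To make the `fdr_bh` setting less of a black box, here is a minimal pure-Python sketch of the Benjamini-Hochberg adjustment, assuming `fdr_bh` refers to that procedure (as the name does in statsmodels). The function name `benjamini_hochberg` and the toy p-values are ours, for illustration only; this is not the code goatools runs internally.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy raw p-values (made up for illustration).
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
significant = [p for p in adj if p < 0.05]
print(adj)
print("{} of {} pass alpha = 0.05".format(len(significant), len(raw)))
```

Note that two of the raw p-values sit below 0.05 before correction but not after, which is why we filter on the corrected value rather than the raw one.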

In [None]:
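# Build a gene name -> Entrez ID lookup from the tab-separated mapping file.
# Columns are 0-based: column 1 holds the Entrez ID, column 2 the gene name.
with open("entrez_gene_ids.txt", 'r') as f:
    entrez_gene_ids = f.read().strip().split('\n')
entrez_dict = dict()
for line in entrez_gene_ids:
    tokens = line.split('\t')
    gene_id = tokens[1]
    gene_name = tokens[2]
    entrez_dict[gene_name] = int(gene_id)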

In [None]:
#Let's analyze cluster 0 
cluster_0_names=open('0.txt').read().strip().split('\n')
cluster_0_ids=[entrez_dict[n] for n in cluster_0_names if n in entrez_dict]
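
The list comprehension above silently drops any gene name that is missing from the lookup. As a rough sketch, a small helper can return the unmapped names as well, so we can see how much of a cluster is lost. The helper name `map_names_to_ids` and the two-gene toy dictionary are ours, for illustration only.

```python
def map_names_to_ids(gene_names, name_to_id):
    """Split gene names into mapped Entrez IDs and names with no mapping."""
    mapped, unmapped = [], []
    for name in gene_names:
        if name in name_to_id:
            mapped.append(name_to_id[name])
        else:
            unmapped.append(name)
    return mapped, unmapped

# Toy lookup standing in for entrez_dict; "FAKE1" is a deliberately bogus name.
toy_dict = {"TP53": 7157, "BRCA1": 672}
ids, missing = map_names_to_ids(["TP53", "FAKE1", "BRCA1"], toy_dict)
print("{} mapped, {} unmapped: {}".format(len(ids), len(missing), missing))
```

If a large share of a cluster turns out to be unmapped, the enrichment results for that cluster deserve extra caution.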

In [None]:
# The 'p_' prefix means "p-value"; 'fdr_bh' is the multiple-testing correction method we are using,
# so r.p_fdr_bh is the corrected p-value for one GO term.
geneids_study = cluster_0_ids
goea_results_all = goeaobj.run_study(geneids_study)
goea_results_sig = [r for r in goea_results_all if r.p_fdr_bh < 0.05]
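
For intuition about the uncorrected p-value behind each result, here is a minimal sketch of a one-sided hypergeometric enrichment test for a single GO term. We are assuming the per-term test is of the Fisher's exact / hypergeometric family. The function name and the toy counts are ours, and this is not the goatools implementation, which may also report a different tail. It requires Python 3.8+ for `math.comb`.

```python
from math import comb

def hypergeom_enrichment_pvalue(study_hits, study_size, pop_hits, pop_size):
    """P(X >= study_hits) when drawing study_size genes without replacement
    from a population of pop_size genes, pop_hits of which carry the GO term."""
    total = comb(pop_size, study_size)
    upper = min(study_size, pop_hits)
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, study_size - k)
        for k in range(study_hits, upper + 1)
    )
    return tail / total

# Toy counts: 20 genes in the population, 5 annotated with the term;
# the study set has 4 genes, 3 of them annotated.
p = hypergeom_enrichment_pvalue(study_hits=3, study_size=4, pop_hits=5, pop_size=20)
print("one-sided enrichment p-value: {:.4f}".format(p))
```

The population here corresponds to the background gene list passed to `GOEnrichmentStudy`, and the study set to `geneids_study`.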

In [None]:
#Write the results to a text file 
goeaobj.wr_txt("go_terms.0.txt", goea_results_sig)

In [None]:
!head -n10 go_terms.0.txt
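
Since the motivation for scripting was handling many clusters, the cluster 0 steps can be wrapped in a loop. The following is a sketch only: `run_clusters` is our own helper name, and it assumes an object with a `run_study` method whose results carry a `p_fdr_bh` attribute, as `goeaobj` does in the cells above.

```python
def run_clusters(goeaobj, clusters, name_to_id, alpha=0.05):
    """Run one enrichment study per cluster.

    clusters maps a cluster label to a list of gene names.
    Returns a dict mapping each label to its significant results.
    """
    significant = {}
    for label, gene_names in clusters.items():
        # Same name -> Entrez ID mapping used for cluster 0.
        gene_ids = [name_to_id[n] for n in gene_names if n in name_to_id]
        results = goeaobj.run_study(gene_ids)
        significant[label] = [r for r in results if r.p_fdr_bh < alpha]
    return significant
```

With the objects defined above, `run_clusters(goeaobj, {"0": cluster_0_names}, entrez_dict)` should reproduce the cluster 0 analysis, and each entry of the returned dict can then be passed to `goeaobj.wr_txt` with its own file name.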