**Implementation algorithm**

Prioritization of candidate genes in QTL regions based on associations between traits and biological processes.
Bargsten JW, Nap JP, Sanchez-Perez GF, van Dijk AD.
BMC Plant Biol. 2014 Dec 10;14:330. doi: 10.1186/s12870-014-0330-3.

In [1]:
import qtl2gene
import pandas as pd
search = qtl2gene.SEARCH(
    "http://pbg-ld.candygene-nlesc.surf-hosted.nl:8890/sparql")

## Terpenoids

GO-terms: GO:0003677, GO:0045893

QTL from: Chromosome 1 : 86142248 - 86467672

Candidate: Terpene synthase

Define the QTL interval and retrieve the genes located within it

In [2]:
interval = search.make_interval(
    "http://localhost:8890/genome/Solanum_lycopersicum/chromosome/1",
    86142248,
    86467672)

# genes located within the interval
genes = search.interval_genes(interval)
# GO terms annotated to those genes
goterms = search.genes_goterms(genes["gene_id"].unique())
# per-GO-term statistics, including the p-values used below
gonumbers = search.get_go_numbers(goterms, genes)
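
The calls above delegate the interval lookup to the SPARQL endpoint. As a rough illustration of what selecting genes in a QTL interval amounts to, the sketch below filters a small in-memory gene list by coordinate overlap. The `Gene` record, the `genes_in_interval` helper, the overlap rule, and the gene coordinates are all hypothetical and not part of `qtl2gene`; the actual query may apply different criteria.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record; not a qtl2gene type."""
    gene_id: str
    chromosome: str
    start: int
    end: int


def genes_in_interval(genes: List[Gene], chromosome: str,
                      begin: int, end: int) -> List[Gene]:
    """Return genes on `chromosome` whose span overlaps [begin, end] (inclusive)."""
    return [
        g for g in genes
        if g.chromosome == chromosome and g.start <= end and g.end >= begin
    ]


# Made-up gene coordinates, for illustration only.
example_genes = [
    Gene("geneA", "1", 86_100_000, 86_141_000),  # ends before the interval
    Gene("geneB", "1", 86_140_000, 86_150_000),  # overlaps the left edge
    Gene("geneC", "1", 86_300_000, 86_310_000),  # fully inside
    Gene("geneD", "2", 86_300_000, 86_310_000),  # wrong chromosome
]

# Interval boundaries taken from the QTL defined above.
hits = genes_in_interval(example_genes, "1", 86_142_248, 86_467_672)
print([g.gene_id for g in hits])
```

Treating the boundaries as inclusive and counting partial overlaps as hits is a design choice made for this sketch; a stricter rule would require a gene to lie entirely within the interval.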

Number of genes found in QTL

In [3]:
len(genes)
#pd.DataFrame(goterms.groupby(["go_id","go_cat","go_term"])["gene_id"].count())

38

GO terms with an adjusted p-value below the boundary

In [4]:
p_boundary = 0.1
selection = gonumbers[gonumbers["p_adjusted"]<p_boundary]
pd.DataFrame(selection)[["p_less", "p_greater", "p_adjusted"]]

| go_id | p_less | p_greater | p_adjusted |
|---|---|---|---|
| GO:0000226 | 0.999769 | 0.022263 | 0.03628 |
| GO:0006270 | 0.999792 | 0.021161 | 0.035811 |
| GO:0006535 | 0.999905 | 0.014526 | 0.028664 |
| GO:0008360 | 0.999693 | 0.02556 | 0.040166 |
| GO:0009693 | 0.999956 | 0.010079 | 0.02334 |
| GO:0009744 | 0.999872 | 0.016743 | 0.029467 |
| GO:0009835 | 0.999933 | 0.012305 | 0.027071 |
| GO:0009965 | 0.999966 | 0.008964 | 0.021911 |
| GO:0010091 | 1.0 | 0.001125 | 0.00707 |
| GO:0010143 | 0.999905 | 0.014526 | 0.028664 |
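
The notebook does not show how `get_go_numbers` derives `p_less`, `p_greater`, and `p_adjusted`. One common way to obtain values of this kind is a one-sided hypergeometric test per GO term followed by a Benjamini-Hochberg correction. The sketch below assumes that approach; the function names, the genome size, and the per-term counts are hypothetical, so its numbers will not match the table.

```python
from math import comb
from typing import List, Tuple


def hypergeom_tails(k: int, n: int, K: int, N: int) -> Tuple[float, float]:
    """Return (p_less, p_greater) = (P[X <= k], P[X >= k]).

    X is the number of annotated genes among n genes drawn from a genome
    of N genes, K of which carry the GO term.
    """
    total = comb(N, n)
    lo, hi = max(0, n - (N - K)), min(n, K)
    pmf = {i: comb(K, i) * comb(N - K, n - i) / total for i in range(lo, hi + 1)}
    p_less = sum(p for i, p in pmf.items() if i <= k)
    p_greater = sum(p for i, p in pmf.items() if i >= k)
    return min(p_less, 1.0), min(p_greater, 1.0)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts per GO term: (genes with term in QTL, genes with term in genome).
counts = {"GO:A": (3, 40), "GO:B": (1, 900), "GO:C": (2, 15)}
n_qtl = 38         # genes in the QTL interval (from the notebook)
n_genome = 30_000  # assumed genome size, for illustration only

tails = {go: hypergeom_tails(k, n_qtl, K, n_genome) for go, (k, K) in counts.items()}
p_adjusted = benjamini_hochberg([p_greater for _, p_greater in tails.values()])
for (go, (p_less, p_greater)), p_adj in zip(tails.items(), p_adjusted):
    print(go, round(p_less, 6), round(p_greater, 6), round(p_adj, 6))
```

Because both tails include the observed count, `p_less` and `p_greater` can sum to slightly more than 1 for a given term.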


Genes with at least one of these GO terms, sorted descending by number of GO terms

In [5]:
geneSelection = pd.DataFrame(goterms[goterms["go_id"].isin(list(selection.index))].groupby("gene_id")["gene_id"].count())
geneSelection.columns = ["#go-terms"]
geneSelection = geneSelection.sort_values(by="#go-terms", ascending=False)
display(geneSelection)

| gene_id | #go-terms |
|---|---|
| Solyc01g095040.2 | 12 |
| Solyc01g094800.2 | 5 |
| Solyc01g094700.2 | 2 |
| Solyc01g094760.2 | 2 |
| Solyc01g094790.2 | 2 |
| Solyc01g095080.2 | 2 |
| Solyc01g094720.2 | 1 |
| Solyc01g094830.2 | 1 |
| Solyc01g094880.2 | 1 |
| Solyc01g094930.2 | 1 |


In [6]:
# one row per (gene, GO term) pair; first() keeps the term name and category as-is
goterms[goterms["go_id"].isin(list(selection.index))].groupby(["gene_id","go_id"]).first()[["go_term","go_cat"]]

| gene_id | go_id | go_term | go_cat |
|---|---|---|---|
| Solyc01g094700.2 | GO:0010143 | cutin biosynthetic process | biological_process |
| Solyc01g094700.2 | GO:0016311 | dephosphorylation | biological_process |
| Solyc01g094720.2 | GO:0098656 | anion transmembrane transport | biological_process |
| Solyc01g094760.2 | GO:0006270 | DNA replication initiation | biological_process |
| Solyc01g094760.2 | GO:0009744 | response to sucrose | biological_process |
| Solyc01g094790.2 | GO:0006535 | cysteine biosynthetic process from serine | biological_process |
| Solyc01g094790.2 | GO:0019499 | cyanide metabolic process | biological_process |
| Solyc01g094800.2 | GO:0010199 | organ boundary specification between lateral o... | biological_process |
| Solyc01g094800.2 | GO:0040029 | regulation of gene expression, epigenetic | biological_process |
| Solyc01g094800.2 | GO:0043044 | ATP-dependent chromatin remodeling | biological_process |
