**Implementation algorithm**

Prioritization of candidate genes in QTL regions based on associations between traits and biological processes.
Bargsten JW, Nap JP, Sanchez-Perez GF, van Dijk AD.
BMC Plant Biol. 2014 Dec 10;14:330. doi: 10.1186/s12870-014-0330-3.

In [1]:
import qtl2gene
import pandas as pd
search = qtl2gene.SEARCH(
    "http://pbg-ld.candygene-nlesc.surf-hosted.nl:8890/sparql")

## Volatile compounds, branched-chain amino acid, 3-methylbutanal, 3-methylbutanol

GO-terms: GO:0046568, GO:0018455, GO:0052676

QTL from: Chromosome 3 : 69685329 - 71362039

Candidate: ???

Define the QTL interval and retrieve the genes located within it

In [2]:
interval = search.make_interval(
    "http://localhost:8890/genome/Solanum_lycopersicum/chromosome/3", 
    69685329,71362039)

# genes located within the interval
genes = search.interval_genes(interval)
# GO terms annotated to those genes
goterms = search.genes_goterms(genes["gene_id"].unique())
# per-GO-term counts and p-values (p_less, p_greater, p_adjusted)
gonumbers = search.get_go_numbers(goterms, genes)
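
The internals of `search.get_go_numbers` are not shown in this notebook. The column names `p_less` and `p_greater` suggest one-sided tests for under- and over-representation of a GO term in the interval. One common way to obtain such values is a hypergeometric test; the sketch below illustrates that idea under this assumption and is not taken from `qtl2gene`. The function name and all four parameters are hypothetical.

```python
from math import comb

def hypergeom_tails(k, K, n, N):
    """Return (p_less, p_greater) for k annotated genes in an interval.

    Hypothetical parameters (not part of qtl2gene):
      N: number of genes in the genome
      K: number of genome genes annotated with the GO term
      n: number of genes in the QTL interval
      k: number of interval genes annotated with the GO term
    """
    total = comb(N, n)

    def pmf(i):
        # Probability of drawing exactly i annotated genes among n
        return comb(K, i) * comb(N - K, n - i) / total

    lowest = max(0, n - (N - K))   # smallest possible count
    highest = min(K, n)            # largest possible count
    p_less = sum(pmf(i) for i in range(lowest, k + 1))      # P(X <= k)
    p_greater = sum(pmf(i) for i in range(k, highest + 1))  # P(X >= k)
    return p_less, p_greater

# Toy numbers: 10 genes, 4 annotated, 3 in the interval, 2 of them annotated
p_less, p_greater = hypergeom_tails(k=2, K=4, n=3, N=10)
```

A small `p_greater` would indicate that the term occurs in the interval more often than expected by chance, while a small `p_less` would indicate depletion.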

Number of genes found in the QTL interval

In [3]:
len(genes)
#pd.DataFrame(goterms.groupby(["go_id","go_cat","go_term"])["gene_id"].count())

148

GO terms whose adjusted p-value falls below the chosen boundary

In [4]:
p_boundary = 0.1
selection = gonumbers[gonumbers["p_adjusted"]<p_boundary]
pd.DataFrame(selection)[["p_less", "p_greater", "p_adjusted"]]

go_id,p_less,p_greater,p_adjusted
GO:0000209,0.998923,0.010311,0.098639
GO:0007338,1.0,0.004381,0.088981
GO:0009767,0.999863,0.00454,0.088981
GO:0010014,0.999943,0.013085,0.098639
GO:0010603,0.999943,0.013085,0.098639
GO:0018008,0.999981,0.008742,0.098639
GO:0030433,0.999723,0.007161,0.098639
GO:0031627,0.999911,0.003439,0.088981
GO:0036092,0.999943,0.013085,0.098639
GO:0045041,0.999943,0.013085,0.098639
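
The repeated values in `p_adjusted` (several terms sharing 0.088981 or 0.098639) are what a step-up multiple-testing correction typically produces. Which correction `qtl2gene` applies is not stated here, so the following is only a sketch of one widely used option, the Benjamini-Hochberg procedure, followed by the same kind of threshold filter. The function name and the toy p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy p-values (not taken from the notebook output)
raw = [0.01, 0.04, 0.03, 0.2]
adj = benjamini_hochberg(raw)

# Keep the entries below the boundary, as the notebook cell does
p_boundary = 0.1
selected = [i for i, p in enumerate(adj) if p < p_boundary]
```

In this toy input the second and third entries end up with the same adjusted value, because the monotonicity step caps the smaller raw p-value at the adjusted value of its larger neighbour.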


Genes with at least one of these GO terms, sorted descending by number of GO terms

In [5]:
geneSelection = pd.DataFrame(goterms[goterms["go_id"].isin(list(selection.index))].groupby("gene_id")["gene_id"].count())
geneSelection.columns = ["#go-terms"]
geneSelection = geneSelection.sort_values(by="#go-terms", ascending=False)
display(geneSelection)

gene_id,#go-terms
Solyc03g122370.2,2
Solyc03g123660.2,2
Solyc03g123880.2,2
Solyc03g121640.2,1
Solyc03g121690.1,1
Solyc03g121830.1,1
Solyc03g121840.2,1
Solyc03g121950.2,1
Solyc03g121990.2,1
Solyc03g122000.2,1
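
The ranking step can also be illustrated without pandas. The sketch below counts, for each gene, how many of its GO annotations are in the selected set and sorts the genes by that count. The function name, gene identifiers, and term labels are hypothetical placeholders. Unlike the `groupby` call above, it removes duplicate (gene, term) pairs before counting, which is a design choice of this sketch.

```python
from collections import Counter

def rank_genes(annotations, selected_terms):
    """Rank genes by the number of selected GO terms they carry.

    annotations: iterable of (gene_id, go_id) pairs (hypothetical input shape)
    selected_terms: iterable of go_id values that passed the p-value filter
    """
    selected = set(selected_terms)
    # set() drops duplicate pairs so each term counts once per gene
    counts = Counter(
        gene for gene, term in set(annotations) if term in selected
    )
    # Descending by count, then by gene id for a deterministic order
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Placeholder data, not taken from the notebook output
pairs = [("g1", "T1"), ("g1", "T2"), ("g2", "T1"), ("g3", "T9"), ("g2", "T1")]
ranking = rank_genes(pairs, ["T1", "T2"])
```

Genes without any selected term (here `g3`) do not appear in the ranking at all, matching the behaviour of filtering with `isin` before grouping.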


In [6]:
# one row per (gene, GO term); first() keeps the text columns as-is instead of summing strings
pd.DataFrame(goterms[goterms["go_id"].isin(list(selection.index))]).groupby(["gene_id","go_id"]).first()[["go_term","go_cat"]]

Unnamed: 0_level_0,Unnamed: 1_level_0,go_term,go_cat
gene_id,go_id,Unnamed: 2_level_1,Unnamed: 3_level_1
Solyc03g121640.2,GO:0045041,protein import into mitochondrial intermembran...,biological_process
Solyc03g121690.1,GO:0000209,protein polyubiquitination,biological_process
Solyc03g121830.1,GO:0018008,N-terminal peptidyl-glycine N-myristoylation,biological_process
Solyc03g121840.2,GO:0031627,telomeric loop formation,biological_process
Solyc03g121950.2,GO:0010603,regulation of cytoplasmic mRNA processing body...,biological_process
Solyc03g121990.2,GO:0031627,telomeric loop formation,biological_process
Solyc03g122000.2,GO:0009767,photosynthetic electron transport chain,biological_process
Solyc03g122370.2,GO:0010014,meristem initiation,biological_process
Solyc03g122370.2,GO:0071528,tRNA re-export from nucleus,biological_process
Solyc03g123400.1,GO:2000032,regulation of secondary shoot formation,biological_process
