**Implementation algorithm**

Prioritization of candidate genes in QTL regions based on associations between traits and biological processes.
Bargsten JW, Nap JP, Sanchez-Perez GF, van Dijk AD.
BMC Plant Biol. 2014 Dec 10;14:330. doi: 10.1186/s12870-014-0330-3.

In [1]:
import qtl2gene
import pandas as pd
search = qtl2gene.SEARCH(
    "http://pbg-ld.candygene-nlesc.surf-hosted.nl:8890/sparql")

## Phenylethanol, Phenylacetaldehyde

GO-terms: GO:0016747, GO:0102387, GO:0018449, GO:0004029, GO:0008957, GO:1990055, GO:0050177, GO:0018814


QTL: chromosome 8, positions 55068565 to 63267130

Candidate: CT77, CT148 Aromatic amino acid decarboxylase


Define the QTL and compute genes within this interval

In [2]:
interval = search.make_interval(
    "http://localhost:8890/genome/Solanum_lycopersicum/chromosome/8",
    55068565,
    63267130)

# Genes located in the interval
genes = search.interval_genes(interval)
# GO terms annotated to those genes
goterms = search.genes_goterms(genes["gene_id"].unique())
# Per-GO-term statistics (p_less, p_greater, p_adjusted)
gonumbers = search.get_go_numbers(goterms, genes)
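
The notebook does not show how `get_go_numbers` derives `p_less` and `p_greater`. The column names suggest one-sided tests for under- and over-representation, so the following is a minimal sketch of such a test based on the hypergeometric distribution (equivalent to a one-sided Fisher's exact test). It is not the `qtl2gene` implementation; the function name `hypergeom_tails` and all counts in the example are hypothetical.

```python
from math import comb


def hypergeom_tails(k, n, K, N):
    """Return (p_less, p_greater) for one GO term.

    N: genes in the genome
    K: genes in the genome annotated with the GO term
    n: genes in the QTL interval
    k: genes in the QTL interval annotated with the GO term
    """
    total = comb(N, n)
    lowest = max(0, n - (N - K))   # fewest annotated genes a draw can contain
    highest = min(n, K)            # most annotated genes a draw can contain
    pmf = {
        i: comb(K, i) * comb(N - K, n - i) / total
        for i in range(lowest, highest + 1)
    }
    p_less = sum(p for i, p in pmf.items() if i <= k)      # P(X <= k)
    p_greater = sum(p for i, p in pmf.items() if i >= k)   # P(X >= k)
    return p_less, p_greater


# Made-up counts: 20 of 1000 genes carry the term, 5 of them fall in a 50-gene QTL.
p_less, p_greater = hypergeom_tails(k=5, n=50, K=20, N=1000)
print(p_less, p_greater)
```

A small `p_greater` means the term occurs in the interval more often than expected by chance, which is the pattern of interest when prioritizing candidate genes.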

Number of genes found in QTL

In [3]:
len(genes)
#pd.DataFrame(goterms.groupby(["go_id","go_cat","go_term"])["gene_id"].count())

817

GO terms with an adjusted p-value below the boundary

In [4]:
p_boundary = 0.1
selection = gonumbers[gonumbers["p_adjusted"]<p_boundary]
pd.DataFrame(selection)[["p_less", "p_greater", "p_adjusted"]]

go_id,p_less,p_greater,p_adjusted
GO:0006486,0.99982,0.0009026948,0.03997648
GO:0006591,1.0,5.582303e-13,1.730514e-10
GO:0009116,0.99996,0.0006283871,0.03246667
GO:0019752,1.0,1.20035e-07,1.860542e-05
GO:0030163,0.999925,0.0003447568,0.02137492
GO:0034227,0.999986,0.001724078,0.06680802
GO:0044550,0.999999,3.057624e-06,0.0002369659
GO:0046148,1.0,1.664955e-06,0.0001720453
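
The notebook does not state which multiple-testing correction produces `p_adjusted`. As an illustration, here is a minimal sketch of the Benjamini-Hochberg procedure followed by the same boundary filter. The function name `benjamini_hochberg` and the toy p-values are hypothetical, and the assumption that `qtl2gene` uses this procedure is inferred from the output only.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy raw p-values keyed by placeholder GO identifiers.
raw = {"GO:A": 0.001, "GO:B": 0.02, "GO:C": 0.03, "GO:D": 0.5}
adj = dict(zip(raw, benjamini_hochberg(list(raw.values()))))

p_boundary = 0.1
selected = [go for go, p in adj.items() if p < p_boundary]
print(adj, selected)
```

The values printed in the table are consistent with this procedure applied to `p_greater` over 310 tested GO terms: the smallest `p_greater`, 5.582303e-13, multiplied by 310 gives about 1.7305e-10, and the second smallest, 1.20035e-07, multiplied by 310/2 gives about 1.8605e-05.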


Genes with at least one of these GO terms, sorted descending by number of GO terms

In [5]:
selectedRows = goterms[goterms["go_id"].isin(list(selection.index))]
geneSelection = pd.DataFrame(selectedRows.groupby("gene_id")["gene_id"].count())
geneSelection.columns = ["#go-terms"]
geneSelection = geneSelection.sort_values(by="#go-terms", ascending=False)
display(geneSelection)

gene_id,#go-terms
Solyc08g066670.2,1
Solyc08g066680.2,1
Solyc08g074690.2,1
Solyc08g074920.1,1
Solyc08g074930.1,1
Solyc08g074940.2,1
Solyc08g075340.2,1
Solyc08g076250.1,1
Solyc08g076620.1,1
Solyc08g076630.1,1
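
The ranking step can be tried on a toy table. The sketch below counts distinct selected GO terms per gene with `nunique`, so a repeated (gene, GO term) row is counted only once. Whether the real `goterms` table contains such repeats is not known from the notebook; the gene and GO identifiers here are hypothetical.

```python
import pandas as pd

# Hypothetical annotation table: one row per (gene, GO term) pair,
# with one deliberately repeated row for geneC.
toy_goterms = pd.DataFrame({
    "gene_id": ["geneA", "geneA", "geneB", "geneC", "geneC"],
    "go_id": ["GO:1", "GO:2", "GO:1", "GO:3", "GO:3"],
})
toy_selected = ["GO:1", "GO:2"]  # stand-in for selection.index

# Keep rows whose GO term was selected, then count distinct terms per gene.
hits = toy_goterms[toy_goterms["go_id"].isin(toy_selected)]
ranking = (
    hits.groupby("gene_id")["go_id"]
    .nunique()
    .sort_values(ascending=False)
    .rename("#go-terms")
)
print(ranking)
```

Genes without any selected GO term, such as `geneC` here, drop out of the ranking entirely.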


In [6]:
# List each selected gene with its enriched GO terms; first() avoids concatenating strings on repeated rows
goterms[goterms["go_id"].isin(list(selection.index))].groupby(["gene_id","go_id"])[["go_term","go_cat"]].first()

gene_id,go_id,go_term,go_cat
Solyc08g066670.2,GO:0006486,protein glycosylation,biological_process
Solyc08g066680.2,GO:0006486,protein glycosylation,biological_process
Solyc08g066690.2,GO:0006486,protein glycosylation,biological_process
Solyc08g066880.2,GO:0009116,nucleoside metabolic process,biological_process
Solyc08g066900.1,GO:0009116,nucleoside metabolic process,biological_process
Solyc08g067100.2,GO:0030163,protein catabolic process,biological_process
Solyc08g067470.2,GO:0044550,secondary metabolite biosynthetic process,biological_process
Solyc08g068280.1,GO:0006591,ornithine metabolic process,biological_process
Solyc08g068600.2,GO:0019752,carboxylic acid metabolic process,biological_process
Solyc08g068610.2,GO:0019752,carboxylic acid metabolic process,biological_process
