**Implementation algorithm**

Prioritization of candidate genes in QTL regions based on associations between traits and biological processes.
Bargsten JW, Nap JP, Sanchez-Perez GF, van Dijk AD.
BMC Plant Biol. 2014 Dec 10;14:330. doi: 10.1186/s12870-014-0330-3.

In [1]:
import qtl2gene
import pandas as pd
search = qtl2gene.SEARCH(
    "http://pbg-ld.candygene-nlesc.surf-hosted.nl:8890/sparql")

## Lycopene-Beta-Cyclase Activity

GO terms: GO:0045436, GO:0016117

QTL from: Solyc06g073470.2 ... Solyc06g083850.2

Candidate: Solyc06g074240.1

Define the QTL interval and retrieve the genes located within it

In [2]:
tg1 = "Solyc06g073470.2"
tg2 = "Solyc06g083850.2"

#intervalT = search.compute_interval(tg1, tg2)

interval = search.make_interval(
    "http://localhost:8890/genome/Solanum_lycopersicum/chromosome/6",
    45280179,
    49150528)

# genes located in the interval
genes = search.interval_genes(interval)
# GO terms annotated to those genes
goterms = search.genes_goterms(genes["gene_id"].unique())
# per-GO-term statistics (yields the p_less, p_greater and p_adjusted columns used below)
gonumbers = search.get_go_numbers(goterms, genes)
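
The interval lookup above runs against the SPARQL endpoint, so its exact selection rule is not visible in this notebook. As a rough illustration only, the sketch below filters a toy gene list by full containment in a chromosome interval. The `Gene` record, the `genes_in_interval` helper, and the gene coordinates are hypothetical, and `search.interval_genes` may instead select by overlap rather than containment.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical gene record; not part of qtl2gene."""
    gene_id: str
    chromosome: str
    start: int
    end: int


def genes_in_interval(genes, chromosome, begin, end):
    """Return genes lying entirely inside [begin, end] on the given chromosome."""
    return [
        g for g in genes
        if g.chromosome == chromosome and begin <= g.start and g.end <= end
    ]


# Made-up coordinates for illustration only (not real annotations).
toy_genes = [
    Gene("geneA", "6", 45_000_000, 45_002_000),  # upstream of the interval
    Gene("geneB", "6", 45_300_000, 45_304_500),  # inside
    Gene("geneC", "6", 49_100_000, 49_120_000),  # inside
    Gene("geneD", "7", 46_000_000, 46_001_000),  # different chromosome
]

# Interval bounds taken from the make_interval call above.
inside = genes_in_interval(toy_genes, "6", 45_280_179, 49_150_528)
print([g.gene_id for g in inside])  # ['geneB', 'geneC']
```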

Number of genes found in QTL

In [3]:
len(genes)
#pd.DataFrame(goterms.groupby(["go_id","go_cat","go_term"])["gene_id"].count())

535

GO terms whose adjusted p-value is below the threshold `p_boundary`

In [4]:
p_boundary = 0.1
selection = gonumbers[gonumbers["p_adjusted"]<p_boundary]
pd.DataFrame(selection)[["p_less", "p_greater", "p_adjusted"]]

| go_id | p_less | p_greater | p_adjusted |
|---|---|---|---|
| GO:0006412 | 0.999324 | 0.00185 | 0.057568 |
| GO:0006857 | 0.999839 | 0.001768 | 0.057568 |
| GO:0009251 | 1.0 | 1.2e-05 | 0.003014 |
| GO:0010345 | 0.999981 | 0.000593 | 0.029514 |
| GO:0019748 | 0.999968 | 0.000514 | 0.029514 |
| GO:0030433 | 0.999995 | 8.5e-05 | 0.010552 |
| GO:0031047 | 0.999977 | 0.00028 | 0.023267 |
| GO:0045899 | 0.999945 | 0.001262 | 0.052372 |


Genes annotated with at least one of these GO terms, sorted in descending order by the number of matching GO terms

In [5]:
geneSelection = pd.DataFrame(goterms[goterms["go_id"].isin(list(selection.index))].groupby("gene_id")["gene_id"].count())
geneSelection.columns = ["#go-terms"]
geneSelection = geneSelection.sort_values(by="#go-terms", ascending=False)
display(geneSelection)

| gene_id | #go-terms |
|---|---|
| Solyc06g083620.2 | 2 |
| Solyc06g082630.2 | 2 |
| Solyc06g082660.2 | 2 |
| Solyc06g073530.1 | 1 |
| Solyc06g082750.2 | 1 |
| Solyc06g076780.2 | 1 |
| Solyc06g076800.2 | 1 |
| Solyc06g082140.2 | 1 |
| Solyc06g082650.2 | 1 |
| Solyc06g082670.2 | 1 |
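
The ranking step is just a count of selected GO terms per gene. A minimal sketch of the same idea without pandas, using made-up gene and term labels (all names below are placeholders, not real annotations):

```python
from collections import Counter

# Hypothetical (gene_id, go_id) annotation pairs, for illustration only.
annotations = [
    ("geneA", "T1"),
    ("geneA", "T2"),
    ("geneB", "T1"),
    ("geneC", "T3"),  # T3 is not among the selected terms
    ("geneC", "T2"),
    ("geneD", "T2"),
]
selected_terms = {"T1", "T2"}

# Count, per gene, how many of its annotations are selected terms.
counts = Counter(gene for gene, term in annotations if term in selected_terms)

# Sort by descending count, breaking ties alphabetically by gene id.
ranking = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
print(ranking)  # [('geneA', 2), ('geneB', 1), ('geneC', 1), ('geneD', 1)]
```

Genes carrying several enriched terms rise to the top, which is the ordering the cell above produces with `sort_values`.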


Selected GO terms listed per gene

In [6]:
pd.DataFrame(goterms[goterms["go_id"].isin(list(selection.index))]).groupby(["gene_id","go_id"]).sum()[["go_term","go_cat"]]

Unnamed: 0_level_0,Unnamed: 1_level_0,go_term,go_cat
gene_id,go_id,Unnamed: 2_level_1,Unnamed: 3_level_1
Solyc06g073530.1,GO:0031047,gene silencing by RNA,biological_process
Solyc06g073540.2,GO:0031047,gene silencing by RNA,biological_process
Solyc06g073740.2,GO:0009251,glucan catabolic process,biological_process
Solyc06g073750.2,GO:0009251,glucan catabolic process,biological_process
Solyc06g073760.2,GO:0009251,glucan catabolic process,biological_process
Solyc06g073790.2,GO:0006412,translation,biological_process
Solyc06g073800.2,GO:0006412,translation,biological_process
Solyc06g073880.2,GO:0030433,ubiquitin-dependent ERAD pathway,biological_process
Solyc06g074300.2,GO:0006412,translation,biological_process
Solyc06g074390.2,GO:0010345,suberin biosynthetic process,biological_process
