**Implementation algorithm**

Prioritization of candidate genes in QTL regions based on associations between traits and biological processes.
Bargsten JW, Nap JP, Sanchez-Perez GF, van Dijk AD.
BMC Plant Biol. 2014 Dec 10;14:330. doi: 10.1186/s12870-014-0330-3.

In [1]:
import qtl2gene
import pandas as pd
search = qtl2gene.SEARCH(
    "http://pbg-ld.candygene-nlesc.surf-hosted.nl:8890/sparql")

## Brix, Soluble Solids, Sugars

Trait: 

QTL from: Chromosome `9`, around `3474710`

Candidate: `Lin5` (`Solyc09g010080`)

Define the QTL interval and retrieve the genes located within it

In [2]:
d = 100000   # half-width of the window around the QTL position
p = 3474710  # QTL position on chromosome 9
interval = search.make_interval(
    "http://localhost:8890/genome/Solanum_lycopersicum/chromosome/9",
    p - d,
    p + d)

# genes located in the interval
genes = search.interval_genes(interval)
# GO terms annotated to these genes
goterms = search.genes_goterms(genes["gene_id"].unique())
# per-GO-term numbers, including p_less, p_greater and p_adjusted
gonumbers = search.get_go_numbers(goterms, genes)
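
The window arithmetic in the cell above can be checked without access to the SPARQL endpoint. The following is a minimal sketch, not part of `qtl2gene`: the names `Interval`, `window_around` and `overlaps` are hypothetical, and whether `search.interval_genes` selects genes by overlap or by full containment is not stated in this notebook, so the overlap test is an assumption.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical container for a genomic window (not a qtl2gene type)."""
    chromosome: str
    start: int
    end: int


def window_around(chromosome: str, position: int, flank: int) -> Interval:
    """Return [position - flank, position + flank], with the start clamped at 1."""
    if flank < 0:
        raise ValueError("flank must be non-negative")
    return Interval(chromosome, max(1, position - flank), position + flank)


def overlaps(interval: Interval, gene_start: int, gene_end: int) -> bool:
    """True if a gene spanning [gene_start, gene_end] overlaps the interval.

    Assumption: overlap rather than full containment is the selection rule.
    """
    return gene_start <= interval.end and gene_end >= interval.start


# Same values as in the notebook cell: position 3474710, flank 100000.
qtl = window_around("9", 3474710, 100000)
print(qtl)  # Interval(chromosome='9', start=3374710, end=3574710)
```

With these values the window covers 200 kb centred on the QTL position, which is the region passed to `search.make_interval`.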

Number of genes found in QTL

In [3]:
len(genes)
#pd.DataFrame(goterms.groupby(["go_id","go_cat","go_term"])["gene_id"].count())

29

GO terms with an adjusted p-value below the boundary `p_boundary`

In [4]:
p_boundary = 0.1
selection = gonumbers[gonumbers["p_adjusted"]<p_boundary]
pd.DataFrame(selection)[["p_less", "p_greater", "p_adjusted"]]

go_id,p_less,p_greater,p_adjusted
GO:0000028,0.999371,0.036274,0.066147
GO:0000462,0.999217,0.040409,0.069593
GO:0005975,0.997593,0.029466,0.060896
GO:0006260,0.999509,0.032123,0.062238
GO:0006310,0.998101,0.062441,0.092175
GO:0006486,0.997681,0.068875,0.092832
GO:0006614,0.999961,0.009403,0.026488
GO:0006839,0.998859,0.048627,0.079338
GO:0007018,0.997681,0.068875,0.092832
GO:0009736,0.999953,0.010254,0.026488
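
The column names `p_less`, `p_greater` and `p_adjusted` suggest two one-sided tests followed by a multiple-testing correction. How `get_go_numbers` computes them is not shown in this notebook, so the sketch below is only one plausible reading: it assumes a hypergeometric test and a Benjamini-Hochberg adjustment, and all function names and the counts in the example call are hypothetical.

```python
from math import comb


def one_sided_p_values(k: int, N: int, K: int, n: int) -> tuple[float, float]:
    """Hypergeometric tail probabilities for one GO term.

    N: annotated genes in the genome, K: genes carrying the term,
    n: genes in the QTL interval, k: interval genes carrying the term.
    Returns (P(X <= k), P(X >= k)).
    """
    total = comb(N, n)

    def pmf(i: int) -> float:
        return comb(K, i) * comb(N - K, n - i) / total

    upper = min(K, n)
    p_less = sum(pmf(i) for i in range(0, k + 1))
    p_greater = sum(pmf(i) for i in range(k, upper + 1))
    return min(1.0, p_less), min(1.0, p_greater)


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts, for illustration only.
p_less, p_greater = one_sided_p_values(k=2, N=10, K=3, n=4)
print(p_less, p_greater)
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

Under this reading, a small `p_greater` marks a term that is over-represented in the interval, which matches the table above, where `p_less` is close to 1 for every selected term.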


Genes annotated with at least one of these GO terms, sorted by number of matching GO terms in descending order

In [5]:
# keep only annotations whose GO term passed the p-value boundary
selected = goterms[goterms["go_id"].isin(list(selection.index))]
# count the selected GO terms per gene
geneSelection = pd.DataFrame(selected.groupby("gene_id")["gene_id"].count())
geneSelection.columns = ["#go-terms"]
geneSelection = geneSelection.sort_values(by="#go-terms", ascending=False)
display(geneSelection)

gene_id,#go-terms
Solyc09g010180.2,5
Solyc09g009900.2,2
Solyc09g009940.2,2
Solyc09g010030.1,2
Solyc09g010100.2,2
Solyc09g010110.2,2
Solyc09g010170.2,2
Solyc09g009950.2,1
Solyc09g009960.2,1
Solyc09g009970.2,1
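
The ranking step above amounts to counting, for each gene, how many of its GO annotations fall in the selected set. A minimal sketch of that logic with the standard library only is shown here; the gene and term labels are made-up placeholders, and `rank_genes` is a hypothetical helper, not a `qtl2gene` function.

```python
from collections import Counter

# Made-up (gene_id, go_id) annotation pairs, for illustration only.
annotations = [
    ("geneA", "term_1"),
    ("geneA", "term_2"),
    ("geneB", "term_1"),
    ("geneC", "term_3"),
]
selected_terms = {"term_1", "term_2"}


def rank_genes(annotations, selected_terms):
    """Count selected GO terms per gene and sort by count, highest first."""
    counts = Counter(gene for gene, term in annotations if term in selected_terms)
    # Ties are broken alphabetically by gene id to keep the order stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_genes(annotations, selected_terms)
print(ranking)  # [('geneA', 2), ('geneB', 1)]
```

Genes without any selected term, such as `geneC` here, drop out of the ranking, just as they are absent from `geneSelection`.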


In [6]:
# list the selected GO terms per gene; first() keeps one label per (gene, term) pair
goterms[goterms["go_id"].isin(list(selection.index))].groupby(["gene_id","go_id"])[["go_term","go_cat"]].first()

gene_id,go_id,go_term,go_cat
Solyc09g009900.2,GO:0006260,DNA replication,biological_process
Solyc09g009900.2,GO:0006310,DNA recombination,biological_process
Solyc09g009940.2,GO:0006614,SRP-dependent cotranslational protein targetin...,biological_process
Solyc09g009940.2,GO:0070208,protein heterotrimerization,biological_process
Solyc09g009950.2,GO:0016554,cytidine to uridine editing,biological_process
Solyc09g009960.2,GO:0019878,lysine biosynthetic process via aminoadipic acid,biological_process
Solyc09g009970.2,GO:0034059,response to anoxia,biological_process
Solyc09g010030.1,GO:0006839,mitochondrial transport,biological_process
Solyc09g010030.1,GO:0032543,mitochondrial translation,biological_process
Solyc09g010050.1,GO:0016567,protein ubiquitination,biological_process
