Implementation algorithm

Alex Warwick Vesztrocy, Christophe Dessimoz, Henning Redestig, Prioritising candidate genes causing QTL using hierarchical orthologous groups, *Bioinformatics*, Volume 34, Issue 17, 01 September 2018, Pages i612–i619, https://doi.org/10.1093/bioinformatics/bty615

In [1]:
import qtlsearch
import pandas as pd
from IPython.display import Image, SVG, display

# SPARQL endpoints used by qtlsearch: genome data (pbg-ld),
# the OMA orthology browser, and UniProt
search = qtlsearch.SEARCH(
    "http://pbg-ld.candygene-nlesc.surf-hosted.nl:8890/sparql",
    "http://sparql.omabrowser.org/sparql",
    "https://sparql.uniprot.org/sparql")

## Phenylethanol, Phenylacetaldehyde

GO terms: `GO:0016747`, `GO:0102387`, `GO:0018449`, `GO:0004029`, `GO:0008957`, `GO:1990055`, `GO:0050177`, `GO:0018814`

QTL: chromosome `8`, positions `55068565` to `63267130`

Candidate: `CT77, CT148 Aromatic amino acid decarboxylase`

Define the QTL interval and retrieve the genes located within it.

In [2]:
intervalT = search.make_interval(
    "http://localhost:8890/genome/Solanum_lycopersicum/chromosome/8",
    55068565,
    63267130)

# genes located within the interval
genesT = search.interval_genes(intervalT)
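`interval_genes` is provided by `qtlsearch`, and the query it runs is not shown here. Conceptually, selecting the genes in a QTL is an interval-overlap filter. The sketch below assumes a plain pandas table of gene coordinates; the table layout, the function `genes_in_interval`, and the gene IDs `geneA` to `geneD` are all hypothetical toy values, not part of `qtlsearch`.

```python
import pandas as pd

# Hypothetical gene coordinate table; IDs and positions are toy values.
genes = pd.DataFrame(
    {
        "chromosome": ["8", "8", "8", "3"],
        "start": [54_000_000, 57_303_048, 63_267_000, 57_000_000],
        "end": [54_002_000, 57_306_002, 63_270_000, 57_001_000],
    },
    index=pd.Index(["geneA", "geneB", "geneC", "geneD"], name="gene_id"),
)


def genes_in_interval(genes: pd.DataFrame, chromosome: str, start: int, end: int) -> pd.DataFrame:
    """Return genes on `chromosome` whose coordinates overlap the closed interval [start, end]."""
    on_chrom = genes["chromosome"] == chromosome
    overlaps = (genes["start"] <= end) & (genes["end"] >= start)
    return genes[on_chrom & overlaps]


# The QTL bounds used in this notebook
qtl_genes = genes_in_interval(genes, "8", 55_068_565, 63_267_130)
print(list(qtl_genes.index))  # ['geneB', 'geneC']
```

Here `geneC` only partly overlaps the QTL. Whether partial overlaps count is a design choice; this sketch includes them.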

Collect the GO annotations of interest, each requested term together with its more specific (child) terms.

In [3]:
# one entry per QTL: the gene IDs found in that interval
qtls = [genesT.index]
# GO:0016747 is disabled: it is too general and makes the annotation set too large
go_annotations = pd.concat([
    # search.get_child_annotations("GO:0016747"),
    search.get_child_annotations("GO:0102387"),
    search.get_child_annotations("GO:0018449"),
    search.get_child_annotations("GO:0004029"),
    search.get_child_annotations("GO:0008957"),
    search.get_child_annotations("GO:1990055"),
    search.get_child_annotations("GO:0050177")])
print(go_annotations)

                                                                                       label
go_annotation                                                                               
http://purl.obolibrary.org/obo/GO_0102387         2-phenylethanol acetyltransferase activity
http://purl.obolibrary.org/obo/GO_0018449             1-phenylethanol dehydrogenase activity
http://purl.obolibrary.org/obo/GO_0034520            2-naphthaldehyde dehydrogenase activity
http://purl.obolibrary.org/obo/GO_0050569              glycolaldehyde dehydrogenase activity
http://purl.obolibrary.org/obo/GO_0018484       4-hydroxybenzaldehyde dehydrogenase activity
http://purl.obolibrary.org/obo/GO_0047551         2-oxoaldehyde dehydrogenase (NAD) activity
http://purl.obolibrary.org/obo/GO_0052814       medium-chain-aldehyde dehydrogenase activity
http://purl.obolibrary.org/obo/GO_0033723          fluoroacetaldehyde dehydrogenase activity
http://purl.obolibrary.org/obo/GO_0019145          aminobutyraldehyde 
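As the printed output shows, the result contains the requested terms themselves (for example `GO_0102387`) plus more specific terms. Collecting a term with all of its descendants is a transitive closure over the ontology's parent-to-child relation. A minimal sketch, assuming a hypothetical in-memory `children` map; the IDs `T:1` to `T:5` are placeholders, not real GO terms, and `term_with_descendants` is not a `qtlsearch` function.

```python
from collections import deque

# Hypothetical ontology fragment: parent term -> direct child terms.
children = {
    "T:1": ["T:2", "T:3"],
    "T:2": ["T:4"],
    "T:3": ["T:4", "T:5"],
}


def term_with_descendants(term: str, children: dict[str, list[str]]) -> set[str]:
    """Breadth-first traversal returning `term` and every term below it."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:  # a term can have several parents; visit it once
                seen.add(child)
                queue.append(child)
    return seen


# Union over several query terms, analogous to combining the per-term results.
selected = set().union(*(term_with_descendants(t, children) for t in ["T:2", "T:3"]))
print(sorted(selected))  # ['T:2', 'T:3', 'T:4', 'T:5']
```

Because `T:4` sits under both query terms, the set union keeps it once. `pd.concat` does not deduplicate, so overlapping query terms would yield repeated rows. The closure also explains why a very general term such as `GO:0016747` inflates the annotation set: the higher the term, the more descendants it pulls in.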

Retrieve the data and compute the candidate scores

In [4]:
result = qtlsearch.QTLSEARCH(search, qtls, go_annotations)
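The scoring inside `QTLSEARCH` follows the cited paper and is not reproduced here. To illustrate only the underlying idea, that a QTL gene can inherit evidence from annotated members of its orthologous group, here is a deliberately simplified toy score: the fraction of the other group members that carry at least one target GO term. The groups, gene names, annotations, and the function `toy_transfer_score` are hypothetical. This is not the paper's scoring function.

```python
# Toy illustration only: NOT the scoring used by qtlsearch or the cited paper.
# Hypothetical orthologous groups: group ID -> member genes (across species).
groups = {
    "OG1": ["qtl_gene1", "ath_geneA", "osa_geneB", "vvi_geneC"],
    "OG2": ["qtl_gene2", "ath_geneD"],
}
# Hypothetical GO annotations per gene ("GO:unrelated" is a placeholder).
annotations = {
    "ath_geneA": {"GO:0004029"},
    "osa_geneB": {"GO:0004029", "GO:0008957"},
    "ath_geneD": {"GO:unrelated"},
}
target_terms = {"GO:0004029", "GO:0008957"}


def toy_transfer_score(gene: str) -> float:
    """Fraction of the other members of `gene`'s group annotated with a target term."""
    for members in groups.values():
        if gene in members:
            others = [g for g in members if g != gene]
            hits = sum(1 for g in others if annotations.get(g, set()) & target_terms)
            return hits / len(others) if others else 0.0
    return 0.0  # gene not in any group: no transferred evidence


scores = {g: toy_transfer_score(g) for g in ["qtl_gene1", "qtl_gene2"]}
print(scores)
```

`qtl_gene1` scores 2/3 because two of its three orthologs carry a target term. A method built on hierarchical orthologous groups must also handle groups nested at different taxonomic levels, which this flat version ignores.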

Create the report

In [5]:
report_list = result.report()
for report in report_list:
    display(report)

| gene_id | alias | uniprot_id | description | chromosome | location | score |
|---|---|---|---|---|---|---|
| Solyc08g068190.2 | 101257095 | K4CM43 | Aldehyde dehydrogenase | 8 | 57303048-57306002 | 3.702432 |
| Solyc08g076790.2 | 101246651 | K4CN39 | Cinnamoyl-CoA reductase-like protein | 8 | 60704895-60707948 | 0.009002 |
| Solyc08g068600.2 | 101264847 | K4CM83 | Decarboxylase family protein | 8 | 57730921-57733032 | 0.004473 |
| Solyc08g068610.2 | 778255 | K4CM84 | Decarboxylase family protein | 8 | 57740004-57742160 | 0.004473 |
| Solyc08g068620.1 | | K4CM85 | Decarboxylase family protein | 8 | 57747891-57749811 | 0.004473 |
| Solyc08g068630.2 | 101265155 | K4CM86 | Decarboxylase family protein | 8 | 57763544-57765666 | 0.004473 |
| Solyc08g068640.2 | 101265757 | K4CM87 | Decarboxylase family protein | 8 | 57774707-57776533 | 0.004473 |
| Solyc08g068670.2 | 101265461 | K4CM89 | Decarboxylase family protein | 8 | 57798879-57800980 | 0.004473 |
| Solyc08g068680.2 | AADC1A | Q1KSC6 | Decarboxylase family protein | 8 | 57812621-57814771 | 0.004473 |
| Solyc08g068690.1 | 101266245 | K4CM91 | N-acetyltransferase | 8 | 57820371-57821093 | 0.003834 |
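A small sanity check on the report, assuming `result.report()` returns pandas DataFrames (the rendered table suggests so). Three rows are copied from the table above. The sketch parses the `location` strings (`start-end`), confirms that every gene lies fully inside the QTL bounds, and ranks the genes by score. The names `QTL_START`, `QTL_END`, `bounds`, and `ranked` are illustrative.

```python
import pandas as pd

QTL_START, QTL_END = 55_068_565, 63_267_130

# A few rows copied from the report above.
report = pd.DataFrame(
    {
        "description": [
            "Aldehyde dehydrogenase",
            "Cinnamoyl-CoA reductase-like protein",
            "Decarboxylase family protein",
        ],
        "location": ["57303048-57306002", "60704895-60707948", "57812621-57814771"],
        "score": [3.702432, 0.009002, 0.004473],
    },
    index=pd.Index(
        ["Solyc08g068190.2", "Solyc08g076790.2", "Solyc08g068680.2"], name="gene_id"
    ),
)

# Split "start-end" strings into two integer columns.
bounds = report["location"].str.split("-", expand=True).astype(int)
report["start"] = bounds[0]
report["end"] = bounds[1]

# Every gene in the report should lie fully inside the QTL interval.
inside = (report["start"] >= QTL_START) & (report["end"] <= QTL_END)
assert inside.all()

ranked = report.sort_values("score", ascending=False)
print(ranked.index[0])  # Solyc08g068190.2
```

The check uses full containment, which holds for every row in the table. A report built from partially overlapping genes would need the overlap test instead.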
