# Ontology Discovery Using SPARQL

Below are some SPARQL queries to help you navigate the gene interactions ontology and get the insights you need. They focus on common tasks you might encounter, from retrieving specific gene interactions to cross-referencing information based on your BioPAX framework.

In [1]:
import pandas as pd  # used below to display query results as DataFrames
import rdflib

# Load the ontology file
ontology_file = "ontology/ontology.ttl"

# Initialize the graph
g = rdflib.Graph()

# Parse the ontology
g.parse(ontology_file, format="turtle")

# Print the number of triples in the graph
print(f"Ontology loaded with {len(g)} triples.")


Ontology loaded with 1961 triples.


In [9]:
def run_sparql_query(graph, query_str):
    """Run a SPARQL query and print every bound variable of each result row."""
    results = graph.query(query_str)
    for row in results:
        # asdict() maps variable names to values and skips unbound variables,
        # so this works for any number of selected columns
        print(", ".join(f"{name}: {value}" for name, value in row.asdict().items()))

## 1. Retrieve All Interactions Between SL_GENE and TSG_GENE
This query fetches all interactions where an SL_GENE controls a TSG_GENE, following the BioPAX structure.

In [25]:
query_str = """
  PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

  SELECT ?slGene ?tsGene ?evidence ?slLabel ?tsLabel
  WHERE {
    ?interaction a bp:Control .
    ?interaction bp:controller ?slGene .
    ?interaction bp:controlled ?tsGene .
    OPTIONAL { ?interaction bp:evidence ?evidence . }
    
    # If you want labels, grab them too
    OPTIONAL { ?slGene rdfs:label ?slLabel . }
    OPTIONAL { ?tsGene rdfs:label ?tsLabel . }
  }
  ORDER BY ?slGene
"""

# Execute the query on the local graph
results = g.query(query_str)

# Convert the results directly to a pandas DataFrame
df = pd.DataFrame(results.bindings)

# Display the DataFrame in a grid
df

Unnamed: 0,evidence,slGene,tsGene
0,2 occurrences,http://example.org/ARID2,http://example.org/LKB1
1,2 occurrences,http://example.org/ARID2,http://example.org/NF2
2,3 occurrences,http://example.org/ATG9A,http://example.org/LKB1
3,3 occurrences,http://example.org/ATG9A,http://example.org/NF1
4,3 occurrences,http://example.org/ATG9A,http://example.org/VHL
...,...,...,...
204,2 occurrences,http://example.org/VGLL3,http://example.org/RB1
205,2 occurrences,http://example.org/WRAP53,http://example.org/TP53
206,2 occurrences,http://example.org/WRAP53,http://example.org/NF1
207,2 occurrences,http://example.org/YPEL5,http://example.org/LKB1


## 2. Find All Genes Interacting with a Specific SL_GENE
Say you want to find all tumor suppressor genes that are controlled by a specific SL_GENE like "TYMS". This query binds the controller to that gene's IRI:

In [28]:
query_str = """
PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?tsGene ?tsLabel
WHERE {
  ?interaction a bp:Control .
  ?interaction bp:controller <http://example.org/TYMS> .
  ?interaction bp:controlled ?tsGene .
  
  # Grab labels for the controlled genes
  OPTIONAL { ?tsGene rdfs:label ?tsLabel . }
}
"""

# Execute the query on the local graph
results = g.query(query_str)

# Convert the results directly to a pandas DataFrame
df = pd.DataFrame(results.bindings)

# Display the DataFrame in a grid
df

Unnamed: 0,tsGene
0,http://example.org/TP53
1,http://example.org/LKB1
2,http://example.org/NF1
3,http://example.org/NF2
4,http://example.org/PBRM1
5,http://example.org/PTEN
6,http://example.org/TP53BP1


## 3. Get Evidence for Interactions Between Genes
Want to see the evidence backing up the interactions between SL_GENE and TSG_GENE? This query returns it, although the answer currently relies on hard-coded logic that we need to improve:

In [29]:
query_str = """
PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>

SELECT ?slGene ?tsGene ?evidence
WHERE {
  ?interaction a bp:Control .
  ?interaction bp:controller ?slGene .
  ?interaction bp:controlled ?tsGene .
  ?interaction bp:evidence ?evidence .
}
ORDER BY ?slGene
"""

# Execute the query on the local graph
results = g.query(query_str)

# Convert the results directly to a pandas DataFrame
df = pd.DataFrame(results.bindings)

# Display the DataFrame in a grid
df

Unnamed: 0,evidence,slGene,tsGene
0,2 occurrences,http://example.org/ARID2,http://example.org/LKB1
1,2 occurrences,http://example.org/ARID2,http://example.org/NF2
2,3 occurrences,http://example.org/ATG9A,http://example.org/LKB1
3,3 occurrences,http://example.org/ATG9A,http://example.org/NF1
4,3 occurrences,http://example.org/ATG9A,http://example.org/VHL
...,...,...,...
204,2 occurrences,http://example.org/VGLL3,http://example.org/RB1
205,2 occurrences,http://example.org/WRAP53,http://example.org/TP53
206,2 occurrences,http://example.org/WRAP53,http://example.org/NF1
207,2 occurrences,http://example.org/YPEL5,http://example.org/LKB1


## 4. List All Genes Without Lexicalization (Missing Labels)
Need to track down those naughty nodes without labels? This query returns every typed node that has no `rdfs:label`, so the results include interaction nodes as well as genes:

In [30]:
query_str = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?gene
WHERE {
  ?gene a ?type .
  FILTER NOT EXISTS { ?gene rdfs:label ?label }
}
"""

# Execute the query on the local graph
results = g.query(query_str)

# Convert the results directly to a pandas DataFrame
df = pd.DataFrame(results.bindings)

# Display the DataFrame in a grid
df

Unnamed: 0,gene
0,http://example.org/PMVK
1,http://example.org/MCM9
2,http://example.org/DHX35
3,http://example.org/JTB
4,http://example.org/PSMF1
...,...
465,http://example.org/USP48_to_NF1_interaction
466,http://example.org/USP48_to_NF2_interaction
467,http://example.org/WRAP53_to_NF1_interaction
468,http://example.org/YPEL5_to_LKB1_interaction


## 5. Find Genes with a Specific Database Cross-Reference (e.g., HGNC)
This query finds genes with a cross-reference to an external database like HGNC. Useful for ensuring data is linked properly to external resources.

In [32]:
query_str = """
PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>

SELECT ?gene ?xref ?xrefDB
WHERE {
  ?gene bp:xref ?xref .
  ?xref bp:db ?xrefDB .
  FILTER (?xrefDB = "HGNC")  # Change to the database of interest
}
"""

# Execute the query on the local graph
results = g.query(query_str)

# Convert the results directly to a pandas DataFrame
df = pd.DataFrame(results.bindings)

# Display the DataFrame in a grid
df

Unnamed: 0,gene,xref,xrefDB
0,http://example.org/PMVK,http://example.org/PMVK_HGNC_Xref,HGNC
1,http://example.org/MCM9,http://example.org/MCM9_HGNC_Xref,HGNC
2,http://example.org/DHX35,http://example.org/DHX35_HGNC_Xref,HGNC
3,http://example.org/JTB,http://example.org/JTB_HGNC_Xref,HGNC
4,http://example.org/PSMF1,http://example.org/PSMF1_HGNC_Xref,HGNC
...,...,...,...
82,http://example.org/MAPKAP1,http://example.org/MAPKAP1_HGNC_Xref,HGNC
83,http://example.org/DHODH,http://example.org/DHODH_HGNC_Xref,HGNC
84,http://example.org/RMI2,http://example.org/RMI2_HGNC_Xref,HGNC
85,http://example.org/STUB1,http://example.org/STUB1_HGNC_Xref,HGNC


## 6. Get All Genes Involved in a Specific Biological Process
If you want to focus on genes involved in a certain biological process, such as apoptosis:

In [33]:
query_str = """
PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?gene ?processLabel
WHERE {
  ?interaction a bp:BiologicalProcess .
  ?interaction bp:participant ?gene .
  ?interaction rdfs:label ?processLabel .
  FILTER regex(?processLabel, "apoptosis", "i")  # Replace with any process of interest
}
"""


# Execute the query on the local graph
results = g.query(query_str)

# Convert the results directly to a pandas DataFrame
df = pd.DataFrame(results.bindings)

# Display the DataFrame in a grid
df

## 7. Find All Evidence Supporting a Specific Interaction
If you need to dive into the supporting evidence for a specific interaction, such as one involving "ARID1A" and "TP53":

In [35]:
query_str = """
PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>

SELECT ?evidence
WHERE {
  ?interaction a bp:Control .
  ?interaction bp:controller <http://example.org/ARID1A> .
  ?interaction bp:controlled <http://example.org/TP53> .
  ?interaction bp:evidence ?evidence .
}
"""

# Execute the query on the local graph
results = g.query(query_str)

# Convert the results directly to a pandas DataFrame
df = pd.DataFrame(results.bindings)

# Display the DataFrame in a grid
df

## 8. Find All Tumor Suppressor Genes Controlled by Multiple SL_GENEs
This query identifies tumor suppressor genes that are controlled by more than one SL_GENE:

In [36]:
query_str = """
PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>

SELECT ?tsGene (COUNT(?slGene) as ?countSLGenes)
WHERE {
  ?interaction a bp:Control .
  ?interaction bp:controlled ?tsGene .
  ?interaction bp:controller ?slGene .
}
GROUP BY ?tsGene
HAVING (COUNT(?slGene) > 1)
ORDER BY DESC(?countSLGenes)
"""

# Execute the query on the local graph
results = g.query(query_str)

# Convert the results directly to a pandas DataFrame
df = pd.DataFrame(results.bindings)

# Display the DataFrame in a grid
df

Unnamed: 0,countSLGenes,tsGene
0,33,http://example.org/VHL
1,29,http://example.org/LKB1
2,26,http://example.org/NF1
3,18,http://example.org/PBRM1
4,15,http://example.org/NF2
5,15,http://example.org/RB1
6,15,http://example.org/KEAP1
7,11,http://example.org/CDH1
8,11,http://example.org/BAP1
9,11,http://example.org/TP53BP1


## 9. Fetch Interaction Types Between Genes

Get all the types of interactions (like inhibition, activation) that occur between SL_GENE and TSG_GENE:

In [38]:
query_str = """
PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>

SELECT ?slGene ?tsGene ?interactionType
WHERE {
  ?interaction a bp:Control .
  ?interaction bp:controller ?slGene .
  ?interaction bp:controlled ?tsGene .
  ?interaction a ?interactionType .
  FILTER(?interactionType != bp:Control)  # Exclude general control interactions
}
"""

# Execute the query on the local graph
results = g.query(query_str)

# Convert the results directly to a pandas DataFrame
df = pd.DataFrame(results.bindings)

# Display the DataFrame in a grid
df

Unnamed: 0,interactionType,slGene,tsGene
0,http://www.biopax.org/release/biopax-level3.ow...,http://example.org/RPRD1B,http://example.org/TP53
1,http://www.biopax.org/release/biopax-level3.ow...,http://example.org/BPTF,http://example.org/TP53
2,http://www.biopax.org/release/biopax-level3.ow...,http://example.org/CAB39,http://example.org/TP53
3,http://www.biopax.org/release/biopax-level3.ow...,http://example.org/CNKSR2,http://example.org/TP53
4,http://www.biopax.org/release/biopax-level3.ow...,http://example.org/TADA1,http://example.org/TP53
...,...,...,...
204,http://www.biopax.org/release/biopax-level3.ow...,http://example.org/USP48,http://example.org/NF1
205,http://www.biopax.org/release/biopax-level3.ow...,http://example.org/USP48,http://example.org/NF2
206,http://www.biopax.org/release/biopax-level3.ow...,http://example.org/WRAP53,http://example.org/NF1
207,http://www.biopax.org/release/biopax-level3.ow...,http://example.org/YPEL5,http://example.org/LKB1


## 10. Explore Complexes Involving SL_GENE

If SL_GENEs participate in protein complexes, you might want to explore them:

In [39]:
query_str = """
PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>

SELECT ?slGene ?complex
WHERE {
  ?complex a bp:Complex .
  ?complex bp:component ?slGene .
}
"""

# Execute the query on the local graph
results = g.query(query_str)

# Convert the results directly to a pandas DataFrame
df = pd.DataFrame(results.bindings)

# Display the DataFrame in a grid
df