## BioPAX abstraction: Pathway-centered view

### First exploration on Reactome v91 (Homo sapiens)

Total number of pathways: 2751

- How many pathways have sub-pathways?

```sparql
SELECT (COUNT (DISTINCT ?pathway1) AS ?nbPathwayRoot) 
WHERE {
  ?pathway1 rdf:type bp3:Pathway .
  ?pathway2 rdf:type bp3:Pathway .
  
  ?pathway1 bp3:pathwayComponent ?pathway2 .
}
```

808 pathways (~29% of all pathways) have sub-pathways.
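To make explicit what this query counts, here is a minimal Python sketch over a hand-made list of triples. The identifiers (`P1`, `R1`, ...) and the tuple representation are illustrative assumptions, not Reactome data.

```python
# Toy triples (subject, predicate, object); all identifiers are hypothetical.
triples = [
    ("P1", "rdf:type", "bp3:Pathway"),
    ("P2", "rdf:type", "bp3:Pathway"),
    ("P3", "rdf:type", "bp3:Pathway"),
    ("R1", "rdf:type", "bp3:BiochemicalReaction"),
    ("P1", "bp3:pathwayComponent", "P2"),   # P1 has a sub-pathway
    ("P1", "bp3:pathwayComponent", "R1"),   # a reaction, not a sub-pathway
    ("P3", "bp3:pathwayComponent", "R1"),   # P3 only contains a reaction
]

def pathways_with_subpathways(triples):
    """Return the pathways with at least one component that is itself a pathway."""
    pathways = {s for s, p, o in triples if p == "rdf:type" and o == "bp3:Pathway"}
    return {
        s for s, p, o in triples
        if p == "bp3:pathwayComponent" and s in pathways and o in pathways
    }

parents = pathways_with_subpathways(triples)
print(sorted(parents))

# Share of pathways with sub-pathways, from the counts reported above
print(f"{808 / 2751:.1%}")
```

Only `P1` qualifies in this toy graph: `P3` has a component, but that component is a reaction rather than a pathway.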

- How many pathways can be linked to their sub-pathways via bp3:nextStep?

```sparql
SELECT (COUNT (DISTINCT ?pathwayRoot) AS ?nbPathwayRoot) 
WHERE {
  ?pathwayRoot rdf:type bp3:Pathway .
  ?subPathway rdf:type bp3:Pathway .
  
  ?pathwayRoot bp3:pathwayComponent ?subPathway .
  
  ?pathwayRoot bp3:pathwayOrder ?pathwayRootStep .
  ?subPathway bp3:pathwayOrder ?subPathwayStep .
  
  ?pathwayRootStep (bp3:nextStep)+ ?subPathwayStep .
}
```

148 pathways (~18% of the 808 pathways that have sub-pathways)
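The `(bp3:nextStep)+` property path matches any chain of one or more `bp3:nextStep` links. A minimal sketch of that semantics, written as a breadth-first traversal over a made-up step graph (the step names are assumptions for illustration):

```python
from collections import deque

# Hypothetical step-level graph: step -> steps reachable through one bp3:nextStep
next_step = {
    "rootStep1": ["rootStep2"],
    "rootStep2": ["subStep1"],
    "subStep1": ["subStep2"],
    "subStep2": [],
}

def reachable(start, edges):
    """Steps reachable from `start` through one or more edges (the `+` operator)."""
    seen = set()
    queue = deque(edges.get(start, []))
    while queue:
        step = queue.popleft()
        if step not in seen:
            seen.add(step)
            queue.extend(edges.get(step, []))
    return seen

print(sorted(reachable("rootStep1", next_step)))
```

Unlike `*`, the `+` operator does not match the starting step itself unless a cycle leads back to it.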

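- How many sub-pathways have an entering nextStep from another pathway or a leaving nextStep to another pathway?

```sparql
SELECT (COUNT(DISTINCT ?subPathway) AS ?nbSubPathway)
WHERE {
  ?pathwayRoot rdf:type bp3:Pathway .
  ?subPathway rdf:type bp3:Pathway .

  ?pathwayRoot bp3:pathwayComponent ?subPathway .
  # Bind the steps of the sub-pathway so the filter below refers to them
  ?subPathway bp3:pathwayOrder ?subPathwayStep .

  FILTER EXISTS {
    {
      # A step of another pathway precedes a step of the sub-pathway
      ?enteringNextStep bp3:nextStep ?subPathwayStep .
      ?previousPathway bp3:pathwayOrder ?enteringNextStep .
      FILTER (?previousPathway != ?subPathway)
    }
    UNION
    {
      # A step of the sub-pathway precedes a step of another pathway
      ?subPathwayStep bp3:nextStep ?leavingNextStep .
      ?nextPathway bp3:pathwayOrder ?leavingNextStep .
      FILTER (?nextPathway != ?subPathway)
    }
  }
}
```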



- Are there biochemical reactions that are linked to the root pathway only via a bp3:pathwayComponent relationship and not via a bp3:PathwayStep?

```sparql
SELECT ?pathwayRoot (COUNT(DISTINCT ?pathwayRootDirectProcess) AS ?nbPathwayRootDirectProcess)
WHERE {
  ?pathwayRoot rdf:type bp3:Pathway ;
               bp3:pathwayComponent ?subPathway, ?pathwayRootDirectProcess .
  
  ?subPathway rdf:type bp3:Pathway ;
              bp3:pathwayOrder ?subPathwayStep .
  
  ?subPathwayStep bp3:stepProcess ?subPathwayStepProcess .
  
  ?pathwayRootDirectProcess rdf:type bp3:BiochemicalReaction .
  
  FILTER NOT EXISTS {
    ?subPathwayStepProcess rdf:type bp3:BiochemicalReaction .
    FILTER (?pathwayRootDirectProcess = ?subPathwayStepProcess)
  }
}
GROUP BY ?pathwayRoot 
ORDER BY DESC(?nbPathwayRootDirectProcess)
```

At most 52 biochemical reactions for a single pathway.

This situation occurs for 311 pathways.

~8000 nextStep links between different pathways.
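One way to read the question above is as a set difference: the reactions attached directly to a pathway through bp3:pathwayComponent, minus the reactions reached through one of its bp3:PathwayStep objects. The sketch below illustrates that reading on made-up identifiers. It is not a transcription of the SPARQL query, and both dictionaries are assumptions for illustration only.

```python
# Hypothetical data: reactions listed directly as components of each pathway ...
direct_components = {
    "PathwayA": {"Reaction1", "Reaction2", "Reaction3"},
    "PathwayB": {"Reaction4"},
}
# ... and reactions reached through pathwayOrder -> PathwayStep -> stepProcess.
step_processes = {
    "PathwayA": {"Reaction1"},
    "PathwayB": {"Reaction4"},
}

def component_only_reactions(direct, via_steps):
    """For each pathway, reactions that are components but appear in none of its steps."""
    result = {}
    for pathway, reactions in direct.items():
        only_direct = reactions - via_steps.get(pathway, set())
        if only_direct:
            result[pathway] = only_direct
    return result

orphans = component_only_reactions(direct_components, step_processes)
for pathway, reactions in sorted(orphans.items()):
    print(pathway, len(reactions))
```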


In [1]:
from SPARQLWrapper import SPARQLWrapper, JSON, CSV, N3, XML, TURTLE
import subprocess
import time
import os
from requests.utils import requote_uri
from urllib.parse import quote
import re
import rdflib
import pandas as pd

In [2]:
endpoint_reactome = "http://localhost:3030/reactome"
rdfFormat = "turtle"
current_directory = os.getcwd()
BioPAX_Ontology_file_path = os.path.join(current_directory, '../..', 'ReactomeData', 'biopax-level3.owl')
ReactomeBioPAX_file_path = os.path.join(current_directory, '../..', 'ReactomeData', 'Homo_sapiens.owl')
TestReactomeFile_path = os.path.join(current_directory, '../..', 'ReactomeData/SignalTransduction/SignalingByReceptorTyrosineKinase', 'R-HSA-9006934_level3.owl')

In [3]:
def extract_prefix_mappings(prefixes_string):
    """
    Extract prefix mappings from the SPARQL prefixes string.
    
    Parameters:
    prefixes_string (str): String containing PREFIX declarations
    
    Returns:
    dict: Mapping of full URIs to their prefixes
    """
    # Extract prefix declarations using regex
    prefix_pattern = re.compile(r'PREFIX\s+(\w+):\s*<([^>]+)>', re.IGNORECASE)
    return {uri: prefix for prefix, uri in prefix_pattern.findall(prefixes_string)}

def convert_to_prefixed_uri(uri_string, prefix_mappings):
    """
    Convert a full URI to prefixed format.
    
    Parameters:
    uri_string (str): Full URI string
    prefix_mappings (dict): Mapping of URIs to prefixes
    
    Returns:
    str: URI in prefixed format (e.g., 'reactome:Protein')
    """
    for uri_base, prefix in prefix_mappings.items():
        if uri_string.startswith(uri_base):
            local_part = uri_string[len(uri_base):]
            return f"{prefix}:{local_part}"
    return uri_string  # Return original if no prefix matches

def save_for_cytoscape(sparql, prefixes_string, output_file, format='csv', separator=','):
    """
    Save SPARQL CONSTRUCT results in a format compatible with Cytoscape,
    using prefix notation for URIs.
    
    Parameters:
    sparql (SPARQLWrapper): Configured SPARQLWrapper instance with query
    prefixes_string (str): String containing PREFIX declarations
    output_file (str): Path to save the output file
    format (str): Output format ('csv' or 'tsv')
    separator (str): Currently unused; the column separator is derived from `format` (',' for CSV, tab for TSV)
    """
    # Extract prefix mappings
    prefix_mappings = extract_prefix_mappings(prefixes_string)
    
    # Get the results as an RDF graph
    sparql.setReturnFormat(TURTLE)
    results = sparql.queryAndConvert()
    
    # Create an RDFlib graph
    g = rdflib.Graph()
    if isinstance(results, bytes):
        g.parse(data=results.decode('utf-8'), format='turtle')
    else:
        g.parse(data=results, format='turtle')
    
    # Convert triples to a list of dictionaries with prefixed URIs
    triples_data = []
    for s, p, o in g:
        # Convert each URI to prefixed format
        subject = convert_to_prefixed_uri(str(s), prefix_mappings)
        predicate = convert_to_prefixed_uri(str(p), prefix_mappings)
        object_ = convert_to_prefixed_uri(str(o), prefix_mappings)
        
        triples_data.append({
            'Source': subject,
            'Interaction': predicate,
            'Target': object_
        })
    
    # Convert to DataFrame for easy CSV/TSV export
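    # Explicit column names keep the DataFrame usable even when the query returns no triples
    df = pd.DataFrame(triples_data, columns=['Source', 'Interaction', 'Target'])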
    
    # Save to file
    if format == 'csv':
        df.to_csv(output_file, index=False, sep=',')
    else:  # tsv
        df.to_csv(output_file, index=False, sep='\t')
    
    print(f"Saved {len(triples_data)} interactions to {output_file}")
    return df

def preview_network_data(df, n=5):
    """
    Preview the network data before importing into Cytoscape.
    
    Parameters:
    df (pandas.DataFrame): DataFrame containing the network data
    n (int): Number of rows to preview
    """
    print(f"\nPreview of network data ({len(df)} total interactions):")
    print(f"\nFirst {n} interactions:")
    print(df.head(n))
    
    # Print some basic network statistics
    unique_nodes = set(df['Source'].unique()) | set(df['Target'].unique())
    print(f"\nNetwork statistics:")
    print(f"Number of unique nodes: {len(unique_nodes)}")
    print(f"Number of interactions: {len(df)}")
    print(f"Unique interaction types:")
    for interaction in sorted(df['Interaction'].unique()):
        print(f"  - {interaction}")
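As a quick check of the prefix handling defined above, the following self-contained sketch reuses the same regular expression on a two-line prefix block and shortens one URI. The helper name `shorten` and the `example.org` URI are assumptions introduced for this illustration.

```python
import re

# Same regular expression as in extract_prefix_mappings above
PREFIX_PATTERN = re.compile(r'PREFIX\s+(\w+):\s*<([^>]+)>', re.IGNORECASE)

sample_prefixes = """
PREFIX bp3: <http://www.biopax.org/release/biopax-level3.owl#>
PREFIX reactome: <http://www.reactome.org/biopax/91/48887#>
"""

# Map each base URI to its prefix
mappings = {uri: prefix for prefix, uri in PREFIX_PATTERN.findall(sample_prefixes)}

def shorten(uri, mappings):
    """Replace a known base URI by its prefix; return the URI unchanged otherwise."""
    for base, prefix in mappings.items():
        if uri.startswith(base):
            return f"{prefix}:{uri[len(base):]}"
    return uri

print(shorten("http://www.reactome.org/biopax/91/48887#Pathway170", mappings))
print(shorten("http://example.org/unknown#X", mappings))
```

The first matching base URI wins, so if two declared prefixes ever shared a common beginning, the result would depend on their declaration order.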

In [8]:
reactomeVersion = 91
prefixes = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX chebi: <http://purl.obolibrary.org/obo/chebi/>
PREFIX chebidb: <http://purl.obolibrary.org/obo/CHEBI_>
PREFIX chebirel: <http://purl.obolibrary.org/obo/CHEBI#>
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
PREFIX bp3: <http://www.biopax.org/release/biopax-level3.owl#>
PREFIX reactome: <http://www.reactome.org/biopax/{}/48887#>
PREFIX abstraction:<http://abstraction/#>
""".format(reactomeVersion)

biopaxURI = "http://www.biopax.org/release/biopax-level3.owl#"

In [5]:
command = [
    '/home/cbeust/Softwares/JenaFuseki/apache-jena-fuseki-4.9.0/fuseki-server',
    '--file', ReactomeBioPAX_file_path,
    '--file', BioPAX_Ontology_file_path,
    '/reactome']

process = subprocess.Popen(command)
time.sleep(60)

14:55:41 INFO  Server          :: Dataset: in-memory: load file: /home/cbeust/Projects/2025/BioPAXAbstraction/06_PathwayAbstraction/Scripts/../../ReactomeData/Homo_sapiens.owl
14:55:42 WARN  riot            :: [line: 67054, col: 52] {W137} Input is large. Switching off checking for illegal reuse of rdf:ID's.
14:56:03 INFO  Server          :: Dataset: in-memory: load file: /home/cbeust/Projects/2025/BioPAXAbstraction/06_PathwayAbstraction/Scripts/../../ReactomeData/biopax-level3.owl
14:56:03 INFO  Server          :: Running in read-only mode for /reactome
14:56:03 INFO  Server          :: Apache Jena Fuseki 4.9.0
14:56:03 INFO  Config          :: FUSEKI_HOME=/home/cbeust/Softwares/JenaFuseki/apache-jena-fuseki-4.9.0
14:56:03 INFO  Config          :: FUSEKI_BASE=/home/cbeust/Projects/2025/BioPAXAbstraction/06_PathwayAbstraction/Scripts/run
14:56:03 INFO  Config          :: Shiro file: file:///home/cbeust/Projects/2025/BioPAXAbstraction/06_PathwayAbstraction/Scripts/run/shiro.ini
14:56:03
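The fixed `time.sleep(60)` above assumes the server is ready within a minute. A possible alternative, sketched here under the assumption that an open TCP port is a good enough readiness signal, is to poll the port until it accepts connections. The function name `wait_for_port` is hypothetical, and an open port does not by itself guarantee that the dataset is queryable.

```python
import socket
import time

def wait_for_port(host, port, timeout=60.0, interval=0.5):
    """Return True as soon as a TCP connection to host:port succeeds,
    or False once `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet: wait a little and retry
            time.sleep(interval)
    return False

# Usage sketch: 3030 is the port used by endpoint_reactome above
# if not wait_for_port("localhost", 3030, timeout=120):
#     raise RuntimeError("Fuseki did not start in time")
```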

### SPARQL queries for abstraction

#### 1 - SuperPathwayOf

Defines a relation that describes the direct hierarchy of pathways.

In [9]:
query_super_pathway_of = """
CONSTRUCT {
  ?pathway abstraction:SuperPathwayOf ?subPathway .
}
WHERE {
  ?pathway rdf:type bp3:Pathway .
  
  ?pathway bp3:pathwayComponent ?subPathway .
  ?subPathway rdf:type bp3:Pathway .
}
"""

sparql = SPARQLWrapper(endpoint_reactome)
sparql.setQuery(prefixes+query_super_pathway_of)
df = save_for_cytoscape(sparql, prefixes, "../Results/ReactomeHomoSapiens/SuperPathwayOf.csv", format='csv')
preview_network_data(df)

Saved 2769 interactions to ../Results/ReactomeHomoSapiens/SuperPathwayOf.csv

Preview of network data (2769 total interactions):

First 5 interactions:
                 Source                 Interaction                Target
0   reactome:Pathway170  abstraction:SuperPathwayOf   reactome:Pathway178
1   reactome:Pathway439  abstraction:SuperPathwayOf   reactome:Pathway442
2  reactome:Pathway2530  abstraction:SuperPathwayOf  reactome:Pathway2532
3   reactome:Pathway661  abstraction:SuperPathwayOf   reactome:Pathway653
4   reactome:Pathway505  abstraction:SuperPathwayOf   reactome:Pathway506

Network statistics:
Number of unique nodes: 2751
Number of interactions: 2769
Unique interaction types:
  - abstraction:SuperPathwayOf
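The statistics above report more interactions (2769) than nodes (2751). In a strict tree every node has at most one parent, so the number of edges could not exceed the number of nodes; at least one pathway must therefore have several super-pathways. A minimal sketch for listing such pathways from an edge list, using made-up rows in the same (Source, Interaction, Target) layout:

```python
from collections import defaultdict

# Hypothetical rows in the (Source, Interaction, Target) layout of SuperPathwayOf.csv
edges = [
    ("reactome:PathwayA", "abstraction:SuperPathwayOf", "reactome:PathwayC"),
    ("reactome:PathwayB", "abstraction:SuperPathwayOf", "reactome:PathwayC"),
    ("reactome:PathwayA", "abstraction:SuperPathwayOf", "reactome:PathwayD"),
]

def multi_parent_children(edges):
    """Return {child: set of parents} for children that have several super-pathways."""
    parents = defaultdict(set)
    for source, _interaction, target in edges:
        parents[target].add(source)
    return {child: ps for child, ps in parents.items() if len(ps) > 1}

shared = multi_parent_children(edges)
print(shared)
```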



#### 2 - NextStepPathway

Defines a relation that links two distinct pathways whenever a step of the first is directly followed, via bp3:nextStep, by a step of the second.

In [10]:
query_next_step_pathway = """ 
CONSTRUCT {
  ?pathway abstraction:NextStepPathway ?nextPathway
}
WHERE {
  ?pathway rdf:type bp3:Pathway .
  ?nextPathway rdf:type bp3:Pathway .
  
  ?pathway bp3:pathwayOrder ?pathwayStep .
  ?nextPathway bp3:pathwayOrder ?nextStep .
  
  ?pathwayStep bp3:nextStep ?nextStep .
  
  FILTER (?pathway != ?nextPathway)
}
"""

sparql = SPARQLWrapper(endpoint_reactome)
sparql.setQuery(prefixes+query_next_step_pathway)
df = save_for_cytoscape(sparql, prefixes, "../Results/ReactomeHomoSapiens/NextStepPathway.csv", format='csv')
preview_network_data(df)


Saved 3697 interactions to ../Results/ReactomeHomoSapiens/NextStepPathway.csv

Preview of network data (3697 total interactions):

First 5 interactions:
                 Source                  Interaction                Target
0  reactome:Pathway2717  abstraction:NextStepPathway   reactome:Pathway626
1   reactome:Pathway655  abstraction:NextStepPathway   reactome:Pathway646
2   reactome:Pathway400  abstraction:NextStepPathway   reactome:Pathway315
3  reactome:Pathway2581  abstraction:NextStepPathway   reactome:Pathway633
4  reactome:Pathway1799  abstraction:NextStepPathway  reactome:Pathway1802

Network statistics:
Number of unique nodes: 1481
Number of interactions: 3697
Unique interaction types:
  - abstraction:NextStepPathway


15:00:16 INFO  Fuseki          :: [6] 200 OK (168.773 s)
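The projection performed by the query above can be sketched in plain Python: step-level `bp3:nextStep` edges are lifted to the pathways that own the steps, and edges that stay inside one pathway are dropped. The step and pathway names are made up, and the sketch assumes for simplicity that each step belongs to a single pathway.

```python
# Hypothetical ownership of steps (bp3:pathwayOrder) and step-level bp3:nextStep edges
step_owner = {"step1": "PathwayA", "step2": "PathwayA", "step3": "PathwayB"}
next_step_edges = [("step1", "step2"), ("step2", "step3")]

def lift_to_pathways(step_owner, next_step_edges):
    """Project step-level edges onto pathways, dropping intra-pathway edges."""
    lifted = set()
    for source_step, target_step in next_step_edges:
        source, target = step_owner[source_step], step_owner[target_step]
        if source != target:  # mirrors FILTER (?pathway != ?nextPathway)
            lifted.add((source, target))
    return lifted

pathway_edges = lift_to_pathways(step_owner, next_step_edges)
print(pathway_edges)
```

Because the result is a set, several step-level links between the same two pathways collapse into a single pathway-level edge, as they do in a CONSTRUCT result.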


### Generate node table for weighting

In [12]:
query_node_table = """
SELECT ?pathway (COUNT (DISTINCT ?DirectPathwayStepProcess) AS ?NbDirectPathwayStepProcess) 
WHERE {
  ?pathway rdf:type bp3:Pathway .
  ?pathway bp3:pathwayOrder ?DirectPathwayStep .
  ?DirectPathwayStep bp3:stepProcess ?DirectPathwayStepProcess .
  ?DirectPathwayStepProcess (rdf:type/rdfs:subClassOf*) bp3:Interaction .
}
GROUP BY ?pathway    
"""

sparql = SPARQLWrapper(endpoint_reactome)
sparql.setQuery(prefixes+query_node_table)
sparql.setReturnFormat(CSV)
results = sparql.query().convert()
with open("../Results/ReactomeHomoSapiens/PathwayNodeTable.csv", "wb") as f:
    f.write(results)
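The CSV payload returned by the endpoint can also be turned into a Python dictionary of weights. The sketch below assumes the two column names produced by the SELECT clause above; the sample bytes and the `example.org` URIs are made up for illustration.

```python
import csv
import io

# Hypothetical CSV payload with the two columns selected by the query
sample = (
    b"pathway,NbDirectPathwayStepProcess\r\n"
    b"http://example.org/biopax#Pathway1,12\r\n"
    b"http://example.org/biopax#Pathway2,3\r\n"
)

def parse_node_table(payload):
    """Decode a CSV result (bytes) into {pathway URI: number of direct step processes}."""
    reader = csv.DictReader(io.StringIO(payload.decode("utf-8")))
    return {row["pathway"]: int(row["NbDirectPathwayStepProcess"]) for row in reader}

weights = parse_node_table(sample)
print(weights)
```

Note that the query only returns pathways with at least one `bp3:pathwayOrder` step, so pathways without steps have no row in this table rather than a weight of zero.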




### Concatenation of output files

In [22]:
nodeTable = pd.read_csv("../Results/ReactomeHomoSapiens/PathwayNodeTable.csv", header=None, sep=",")
pathwayURIs = nodeTable[0]  # pandas Series; its first entry is the CSV header because header=None
prefixMapping = extract_prefix_mappings(prefixes)
newPathwayNodes = list()
for node in pathwayURIs:
    newPathwayNodes.append(convert_to_prefixed_uri(node, prefixMapping))
print(newPathwayNodes)
nodeTable[0] = newPathwayNodes
print(nodeTable)
nodeTable.to_csv("../Results/ReactomeHomoSapiens/PathwayNodeTablev.tsv", sep="\t", header=None, index=False)

# Concatenate the two edge tables; the first row of each (the CSV header) is dropped
q1 = pd.read_csv("../Results/ReactomeHomoSapiens/SuperPathwayOf.csv", header=None, sep=",")
q1 = q1.drop(q1.index[0]).reset_index(drop=True)
q2 = pd.read_csv("../Results/ReactomeHomoSapiens/NextStepPathway.csv", header=None, sep=",")
q2 = q2.drop(q2.index[0]).reset_index(drop=True)

concat_df = pd.concat([q1, q2], ignore_index=True)

concat_df.to_csv("../Results/ReactomeHomoSapiens/PathwayAbstraction.tsv", sep="\t", header=None, index=False)
print(concat_df)


['pathway', 'reactome:Pathway1849', 'reactome:Pathway1716', 'reactome:Pathway1801', 'reactome:Pathway2298', 'reactome:Pathway2383', 'reactome:Pathway2165', 'reactome:Pathway1934', 'reactome:Pathway2032', 'reactome:Pathway641', 'reactome:Pathway1053', 'reactome:Pathway1271', 'reactome:Pathway205', 'reactome:Pathway689', 'reactome:Pathway774', 'reactome:Pathway556', 'reactome:Pathway423', 'reactome:Pathway1848', 'reactome:Pathway2033', 'reactome:Pathway1933', 'reactome:Pathway2166', 'reactome:Pathway2384', 'reactome:Pathway2299', 'reactome:Pathway1800', 'reactome:Pathway119', 'reactome:Pathway1185', 'reactome:Pathway991', 'reactome:Pathway640', 'reactome:Pathway78', 'reactome:Pathway1270', 'reactome:Pathway773', 'reactome:Pathway337', 'reactome:Pathway688', 'reactome:Pathway555', 'reactome:Pathway30', 'reactome:Pathway422', 'reactome:Pathway1803', 'reactome:Pathway1936', 'reactome:Pathway2034', 'reactome:Pathway1718', 'reactome:Pathway2167', 'reactome:Pathway2385', 'reactome:Pathway203',
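The same concatenation can be expressed with the standard `csv` module, which makes the header handling explicit. This is only a sketch on in-memory strings standing in for the two edge files; the function name `concat_edge_tables` and the sample rows are assumptions.

```python
import csv
import io

def concat_edge_tables(csv_texts):
    """Concatenate CSV edge tables, skipping each header row,
    into one tab-separated string without header."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for text in csv_texts:
        rows = csv.reader(io.StringIO(text))
        next(rows, None)  # skip the Source,Interaction,Target header
        writer.writerows(rows)
    return out.getvalue()

# Hypothetical contents standing in for SuperPathwayOf.csv and NextStepPathway.csv
q1_text = (
    "Source,Interaction,Target\n"
    "reactome:PathwayA,abstraction:SuperPathwayOf,reactome:PathwayB\n"
)
q2_text = (
    "Source,Interaction,Target\n"
    "reactome:PathwayB,abstraction:NextStepPathway,reactome:PathwayC\n"
)

merged = concat_edge_tables([q1_text, q2_text])
print(merged)
```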

In [14]:
# Stop the Fuseki server and wait until the process has actually exited
process.kill()
process.wait()
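A gentler shutdown first asks the process to terminate and only force-kills it if it does not exit in time. The helper below is a sketch of that pattern; `stop_process` is a hypothetical name, and the demonstration uses a throwaway Python child process rather than the Fuseki server.

```python
import subprocess
import sys

def stop_process(proc, timeout=10.0):
    """Ask the process to terminate; force-kill it if it is still
    running after `timeout` seconds. Returns the exit code."""
    proc.terminate()
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
    return proc.returncode

# Demonstration on a throwaway child process instead of the Fuseki server
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
exit_code = stop_process(child)
print(exit_code)
```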