# Initializing the Guide to Pharmacology Database

---



## Setup and Data Loading

First, let's review the setup and data loading process:

In [1]:
!pip install -q rdflib-neo4j openpyxl pandas


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [22]:
from rdflib_neo4j import Neo4jStoreConfig
from rdflib_neo4j import HANDLE_VOCAB_URI_STRATEGY
# from google.colab import userdata

In [23]:
import os

# NEO_DB_URI = 'bolt://172.18.176.1:7687'
NEO_DB_URI = os.getenv('NEO4J_LCL_URI')
NEO_DB_USERNAME = os.getenv('NEO4J_USERNAME')
NEO_DB_PWD = os.getenv('NEO4J_LCL_PASSWORD')


In [24]:
auth_data = {'uri': NEO_DB_URI,
             'database': "neo4j",
             'user': NEO_DB_USERNAME,
             'pwd': NEO_DB_PWD}
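`os.getenv` returns `None` for an unset variable, so a missing credential only shows up later as a connection or authentication error. A small fail-fast check makes the problem obvious at this point instead. This is a sketch: `require_env` is a hypothetical helper, and only the three variable names come from the cell above.

```python
import os


def require_env(*names: str) -> dict:
    """Return the requested environment variables, raising if any are unset or empty."""
    values = {name: os.getenv(name) for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return values


# Usage with the variables read above:
# env = require_env("NEO4J_LCL_URI", "NEO4J_USERNAME", "NEO4J_LCL_PASSWORD")
```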

## Define namespaces and configuration


In [25]:
from rdflib import Namespace

prefixes = {
    'gtpo': Namespace('https://rdf.guidetopharmacology.org/ns/gtpo#'),
    'grac': Namespace('https://rdf.guidetopharmacology.org/GRAC/'),
    'cito': Namespace('http://purl.org/spar/cito/'),
    'dcat': Namespace('http://www.w3.org/ns/dcat#'),
    'dctypes': Namespace('http://purl.org/dc/dcmitype/'),
    'dct': Namespace('http://purl.org/dc/terms/'),
    'foaf': Namespace('http://xmlns.com/foaf/0.1/'),
    'freq': Namespace('http://purl.org/cld/freq/'),
    'idot': Namespace('http://identifiers.org/idot/'),
    'lexvo': Namespace('http://lexvo.org/ontology#'),
    'pav': Namespace('http://purl.org/pav/'),
    'prov': Namespace('http://www.w3.org/ns/prov#'),
    'rdf': Namespace('http://www.w3.org/1999/02/22-rdf-syntax-ns#'),
    'rdfs': Namespace('http://www.w3.org/2000/01/rdf-schema#'),
    'schemaorg': Namespace('http://schema.org/'),
    'sd': Namespace('http://www.w3.org/ns/sparql-service-description#'),
    'sio': Namespace('http://semanticscience.org/resource/'),
    'void': Namespace('http://rdfs.org/ns/void#'),
    'void-ext': Namespace('http://ldf.fi/void-ext#'),
    'xsd': Namespace('http://www.w3.org/2001/XMLSchema#'),
    # Add other required prefixes based on your data inspection
}
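The comment above suggests adding prefixes after inspecting the data. One way to do that, assuming the file is line-oriented N-Triples (which is how it is parsed below), is to tally the namespaces of predicates and of `rdf:type` objects in a sample of lines. `namespace_of` and `count_namespaces` are hypothetical helpers, and the regular expression only handles the simple `<s> <p> <o> .` shape, not the full N-Triples grammar.

```python
import re
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Subject (IRI or blank node), then the predicate IRI, then the rest of the line.
TRIPLE = re.compile(r'^(?:<[^>]*>|_:\S+)\s+<([^>]*)>\s+(.*)$')


def namespace_of(iri: str) -> str:
    """Keep everything up to and including the last '#' or '/' of an IRI."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1]


def count_namespaces(lines) -> Counter:
    """Count namespaces of predicates and of rdf:type objects in N-Triples lines."""
    counts = Counter()
    for line in lines:
        match = TRIPLE.match(line.strip())
        if not match:
            continue  # comments, blank lines, or shapes this sketch does not handle
        predicate, rest = match.groups()
        counts[namespace_of(predicate)] += 1
        if predicate == RDF_TYPE and rest.startswith("<"):
            counts[namespace_of(rest[1 : rest.index(">")])] += 1
    return counts
```

To sample a downloaded copy of the file, one option is to pass `itertools.islice(open(path, encoding="utf-8"), 100_000)` and compare the most common namespaces with the `prefixes` dictionary.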

In [26]:
config = Neo4jStoreConfig(auth_data=auth_data,
                          custom_prefixes=prefixes,
                          handle_vocab_uri_strategy=HANDLE_VOCAB_URI_STRATEGY.IGNORE,
                          batching=True)
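With `HANDLE_VOCAB_URI_STRATEGY.IGNORE`, rdflib-neo4j drops the namespace of vocabulary URIs and keeps only the local name. That is why the Cypher queries later can use labels such as `:Target`, relationship types such as `hasTarget`, and properties such as `label` (from `rdfs:label`) directly. The function below is a hypothetical approximation of that mapping, not the library's code; it also shows the trade-off that two vocabularies sharing a local name collapse onto the same name.

```python
def ignore_strategy_name(uri: str) -> str:
    """Approximate the IGNORE strategy: keep only the part after the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[cut + 1 :]


examples = [
    "https://rdf.guidetopharmacology.org/ns/gtpo#Target",
    "https://rdf.guidetopharmacology.org/ns/gtpo#hasTarget",
    "http://www.w3.org/2000/01/rdf-schema#label",
    "http://purl.org/dc/terms/title",
]
for uri in examples:
    print(f"{uri} -> {ignore_strategy_name(uri)}")
```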

## Load the ontology data

In [27]:
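from rdflib_neo4j import Neo4jStore
from rdflib import Graph

# GtoP RDF release 2024.2. The file has an .n3 extension but is parsed here as
# N-Triples ("nt"), which worked for this release; try format="n3" if parsing fails.
file_path = 'https://www.guidetopharmacology.org/DATA/rdf/2024.2/gtp-rdf.n3'

graph_store = Graph(store=Neo4jStore(config=config))
graph_store.parse(file_path, format="nt")
# Close the store; True asks it to commit any pending batch of triples first.
graph_store.close(True)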

Uniqueness constraint on :Resource(uri) found.
The store is now: Open
The store is now: Closed
IMPORTED 387357 TRIPLES


This setup installs the required libraries, reads the Neo4j credentials from environment variables, registers the namespace prefixes, and loads the Guide to Pharmacology RDF release (2024.2, 387,357 triples) into the Neo4j database.
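A quick way to sanity-check the import is to count nodes per label. The query below is standard Cypher; `LABEL_COUNT_QUERY` and `summarize_label_counts` are hypothetical names, and the helper expects the list of dictionaries returned by `run_cypher_query`, which is defined in the next section. Because every imported node appears to carry the `Resource` label (the queries below match on `:Resource`), its count should approximate the total number of nodes.

```python
LABEL_COUNT_QUERY = """
MATCH (n)
UNWIND labels(n) AS label
RETURN label, count(*) AS n
ORDER BY n DESC
"""


def summarize_label_counts(records, top=10):
    """Turn [{'label': ..., 'n': ...}, ...] into an ordered {label: count} dict of the top entries."""
    ordered = sorted(records, key=lambda r: r["n"], reverse=True)
    return {r["label"]: r["n"] for r in ordered[:top]}


# Usage once the driver and helper from the next section exist:
# print(summarize_label_counts(run_cypher_query(LABEL_COUNT_QUERY)))
```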

## Querying the Guide to Pharmacology data

Now, let's explore the GtoP data with Cypher queries, run through the Neo4j Python driver:

In [28]:
from neo4j import GraphDatabase

# Initialize Neo4j driver
driver = GraphDatabase.driver(
    NEO_DB_URI,
    auth=(NEO_DB_USERNAME, NEO_DB_PWD)
)


#### Set up reusable functions

In [29]:
def run_cypher_query(query):
    with driver.session(database="neo4j") as session:
        result = session.run(query)
        # Fetch all results and convert them into a list of dictionaries
        return [record.data() for record in result]

In [30]:
cypher_query = """
WITH ["EnzymeTargetFamily",
      "OtherProteinTargetFamily",
      "TransporterTargetFamily",
      "GProteinCoupledReceptorTargetFamily",
      "CatalyticReceptorTargetFamily",
      "LigandGatedIonChannelTargetFamily",
      "VoltageGatedIonChannelTargetFamily",
      "NuclearHormoneReceptorTargetFamily",
      "OtherIonChannelTargetFamily",
      "GroupingTargetFamily",
      "LigandTargetFamily"] AS targetFamilyTypes,

     ["SyntheticOrganicLigand",
      "NaturalProductLigand",
      "PeptideLigand",
      "MetaboliteLigand",
      "AntibodyLigand",
      "InorganicLigand"] AS ligandTypes,

     ["AntagonistInteraction",
      "AgonistInteraction",
      "ActivatorInteraction",
      "Interaction",
      "InhibitorInteraction",
      "AntibodyInteraction",
      "ChannelBlockerInteraction",
      "AllostericModulatorInteraction",
      "GatingInhibitorInteraction",
      "SubunitSpecificInteraction"] AS interactionTypes

UNWIND targetFamilyTypes AS targetFamilyType
UNWIND ligandTypes AS ligandType
UNWIND interactionTypes AS interactionType

CALL apoc.cypher.run("
    MATCH (tf:`" + targetFamilyType + "` {label: $label})<-[r1:hasTargetFamily]-(t:Target)
    MATCH (i:`" + interactionType + "`)-[r2:hasTarget]->(t)
    MATCH (i)-[r3:hasLigand]->(l:`" + ligandType + "`)
    RETURN tf, r1, t, r2, i, r3, l
    LIMIT 30
", {label: "STE7 family"}) YIELD value

RETURN value.tf AS tf, value.r1 AS r1, value.t AS t, value.r2 AS r2, value.i AS i, value.r3 AS r3, value.l AS l
"""

In [31]:
from pprint import pprint

# Execute the query and display the results
results = run_cypher_query(cypher_query)
pprint(results)

[{'i': {'uri': 'https://rdf.guidetopharmacology.org/GRAC/interaction88688'},
  'l': {'approved': 'f',
        'canonicalSMILES': 'COc1ccc(cc1)[C@H]1CC(=O)c2c1c(OC)cc(c2)OC',
        'comment': '(R)-STU104 is a first-in-class molecule that inhibits the '
                   'interaction between the proteins kinases TAK1 '
                   '(mitogen-activated protein kinase kinase kinase 7; MAP3K7) '
                   'and MKK3 (mitogen-activated protein kinase kinase 3; '
                   'MAP2K3) <Reference id=43711/>. These kinases are part of '
                   'the inflammatory signalling cascade that culminates in '
                   'TNF-&alpha; production. (R)-STU104 binds to MKK3 at a '
                   'location that disrupts its interaction with the upstream '
                   'kinase TAK1. This action perturbs MKK3 phosphorylation by '
                   'TAK1 and inhibits downstream signal propagation.  Blocking '
                   'TAK1-mediated phosphorylation 

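This query retrieves the targets in the STE7 family together with their interactions and ligands. Because Cypher cannot use a variable as a label, it calls `apoc.cypher.run` (from the APOC plugin) to run one subquery per combination of target-family, interaction and ligand type, returning at most 30 rows per combination.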
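`pprint` prints every property of every node, which buries the shape of each row. A hypothetical helper that reduces each record to one readable name per node makes the output easier to scan. It assumes, as the output above shows, that `record.data()` turns nodes into property dictionaries; any other value (such as a converted relationship) is left out.

```python
def node_name(props: dict) -> str:
    """Pick a human-readable identifier from a node's property dict."""
    for key in ("ligandName", "label", "uri"):
        if props.get(key):
            return str(props[key])
    return "?"


def compact_rows(results, keys=("tf", "t", "i", "l")):
    """Reduce each result record to {key: readable name} for the chosen node columns."""
    rows = []
    for record in results:
        rows.append({k: node_name(record[k]) for k in keys if isinstance(record.get(k), dict)})
    return rows


# Usage: for row in compact_rows(results): print(row)
```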

### Querying for a list of drug compounds

In [32]:
cypher_query = """
MATCH (l:Resource)
WHERE l.ligandName IN ['5-nitroso-8-quinolinol', '7-hydroxystaurosporine', 'alisertib', 'AMG 900', 'AR-42',
    'AT7519', 'AZD1152-HQPA', 'AZD5438', 'belinostat', 'Bms-265246', 'CCT129202', 'CCT137690',
    'CYC116', 'dinaciclib', 'ENMD-2076', 'entinostat', 'flavopiridol', 'indisulam', 'M 344',
    'nexturastat A', 'panobinostat', 'PF-03814735', 'PHA-793887', 'pyroxamide', 'R-547',
    'R306465', 'resminostat', 'RGFP966', 'ricolinostat', 'riviciclib', 'SB1317/TG02', 'Sns-032',
    'SNS-314 mesylate', 'tozasertib', 'ZM-447439']
MATCH path = (l)-[:hasLigand]-(i)-[:hasTarget]-(t)-[:hasTargetFamily]-(tf)
    OPTIONAL MATCH (i)-[:hasAction]-(a)
    OPTIONAL MATCH (i)-[:hasAffinity]-(af)
    OPTIONAL MATCH (i)-[:hasReference]-(r)
    OPTIONAL MATCH (l)-[:xref]-(xr)
RETURN path, a, af, r
"""

In [33]:
# Execute the query and display the results
results = run_cypher_query(cypher_query)

In [34]:
from pprint import pprint

pprint(results)

[{'a': {'uri': 'https://rdf.guidetopharmacology.org/ns/gtpo#InhibitionAction'},
  'af': {'hasMedianValue': 7.570000171661377,
         'uri': 'https://rdf.guidetopharmacology.org/GRAC/affinity80785'},
  'path': [{'approved': 'f',
            'canonicalSMILES': 'CNC1CC2OC(C1OC)(C)n1c3ccccc3c3c1c1n2c2ccccc2c1c1c3C(NC1=O)O',
            'comment': '7-hydroxystaurosporine is a cell-permeable <Ligand '
                       'id=346/> derivative, with anticancer activity '
                       '<Reference id=26172/>. It reversibly and '
                       'ATP-competitively inhibits multiple protein kinases, '
                       'including PKC&alpha;, &beta;, &gamma;, &delta; and '
                       '&epsilon; <Reference id=26176/>, Chk1 <Reference '
                       'id=26175/><Reference id=25819/><Reference id=26173/>, '
                       'Cdc25C-associated protein kinase 1 (cTAK1) <Reference '
                       'id=26175/>, Cdk1 <Reference id=25819/>, PAK4,

In [35]:
cypher_query = """
MATCH (l:Resource)
WHERE l.ligandName IN ['5-nitroso-8-quinolinol', '7-hydroxystaurosporine', 'alisertib', 'AMG 900', 'AR-42',
    'AT7519', 'AZD1152-HQPA', 'AZD5438', 'belinostat', 'Bms-265246', 'CCT129202', 'CCT137690',
    'CYC116', 'dinaciclib', 'ENMD-2076', 'entinostat', 'flavopiridol', 'indisulam', 'M 344',
    'nexturastat A', 'panobinostat', 'PF-03814735', 'PHA-793887', 'pyroxamide', 'R-547',
    'R306465', 'resminostat', 'RGFP966', 'ricolinostat', 'riviciclib', 'SB1317/TG02', 'Sns-032',
    'SNS-314 mesylate', 'tozasertib', 'ZM-447439']
MATCH path = (l)-[:hasLigand]-(i)-[:hasTarget]-(t)-[:hasTargetFamily]-(tf)
    OPTIONAL MATCH (i)-[:hasAction]-(a)
    OPTIONAL MATCH (i)-[:hasAffinity]-(af)
    OPTIONAL MATCH (i)-[:hasReference]-(r)
    OPTIONAL MATCH (l)-[:xref]-(xr)
RETURN 
    l.label as Compound,
    l.approved as FDA_Approved,
    t.label as Target,
    tf.label as Target_Family,
    a.uri as Action,
    af.hasMedianValue as Affinity,
    af.hasLowValue as Affinity_Low,
    af.hasHighValue as Affinity_High,
    r.uri as Reference,
    l.comment as Description,
    l.inChIKey as InChIKey,
    l.canonicalSMILES as SMILES,
    xr.uri as CHEMBL
ORDER BY l.ligandName, t.nomenclature
"""

In [36]:
import pandas as pd

# Execute the query and display the results
results = run_cypher_query(cypher_query)

df = pd.DataFrame(results)

In [37]:
df

| | Compound | FDA_Approved | Target | Target_Family | Action | Affinity | Affinity_Low | Affinity_High | Reference | Description | InChIKey | SMILES | CHEMBL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7-hydroxystaurosporine | f | 3-phosphoinositide dependent protein kinase 1 | PDK1 family | https://rdf.guidetopharmacology.org/ns/gtpo#In... | 7.48 | | | http://identifiers.org/pubmed/11896604 | 7-hydroxystaurosporine is a cell-permeable <Li... | PBCZSGKMGDDXIJ-HQCWYSJUSA-N | CNC1CC2OC(C1OC)(C)n1c3ccccc3c3c1c1n2c2ccccc2c1... | http://identifiers.org/chembl.compound/CHEMBL1... |
| 1 | 7-hydroxystaurosporine | f | LCK proto-oncogene, Src family tyrosine kinase | Src family | https://rdf.guidetopharmacology.org/ns/gtpo#In... | 7.30 | | | http://identifiers.org/pubmed/15486189 | 7-hydroxystaurosporine is a cell-permeable <Li... | PBCZSGKMGDDXIJ-HQCWYSJUSA-N | CNC1CC2OC(C1OC)(C)n1c3ccccc3c3c1c1n2c2ccccc2c1... | http://identifiers.org/chembl.compound/CHEMBL1... |
| 2 | 7-hydroxystaurosporine | f | checkpoint kinase 1 | CHK1 subfamily | https://rdf.guidetopharmacology.org/ns/gtpo#In... | | 7.46 | 8.15 | http://identifiers.org/pubmed/10786669 | 7-hydroxystaurosporine is a cell-permeable <Li... | PBCZSGKMGDDXIJ-HQCWYSJUSA-N | CNC1CC2OC(C1OC)(C)n1c3ccccc3c3c1c1n2c2ccccc2c1... | http://identifiers.org/chembl.compound/CHEMBL1... |
| 3 | 7-hydroxystaurosporine | f | checkpoint kinase 1 | CHK1 subfamily | https://rdf.guidetopharmacology.org/ns/gtpo#In... | | 7.46 | 8.15 | http://identifiers.org/pubmed/15486189 | 7-hydroxystaurosporine is a cell-permeable <Li... | PBCZSGKMGDDXIJ-HQCWYSJUSA-N | CNC1CC2OC(C1OC)(C)n1c3ccccc3c3c1c1n2c2ccccc2c1... | http://identifiers.org/chembl.compound/CHEMBL1... |
| 4 | 7-hydroxystaurosporine | f | checkpoint kinase 1 | CHK1 subfamily | https://rdf.guidetopharmacology.org/ns/gtpo#In... | | 7.46 | 8.15 | http://identifiers.org/pubmed/17292828 | 7-hydroxystaurosporine is a cell-permeable <Li... | PBCZSGKMGDDXIJ-HQCWYSJUSA-N | CNC1CC2OC(C1OC)(C)n1c3ccccc3c3c1c1n2c2ccccc2c1... | http://identifiers.org/chembl.compound/CHEMBL1... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 88 | riviciclib | f | cyclin dependent kinase 9 | CDK9 subfamily | https://rdf.guidetopharmacology.org/ns/gtpo#In... | 7.70 | | | http://identifiers.org/pubmed/17363486 | Riviciclib is a flavone based cyclin dependent... | QLUYMIVVAYRECT-OCCSQVGLSA-N | OCC1N(C)CCC1c1c(O)cc(c2c1oc(cc2=O)c1ccccc1Cl)O | |
| 89 | tozasertib | f | aurora kinase A | Aurora kinase (Aur) family | https://rdf.guidetopharmacology.org/ns/gtpo#In... | 8.41 | | | http://identifiers.org/pubmed/22037378 | Tozasertib is a potent inhibitor of all three ... | GCIKSSRWRFVXBI-UHFFFAOYSA-N | CN1CCN(CC1)c1nc(Sc2ccc(cc2)NC(=O)C2CC2)nc(c1)N... | http://identifiers.org/chembl.compound/CHEMBL5... |
| 90 | tozasertib | f | aurora kinase A | Aurora kinase (Aur) family | https://rdf.guidetopharmacology.org/ns/gtpo#In... | 9.22 | | | http://identifiers.org/pubmed/14981513 | Tozasertib is a potent inhibitor of all three ... | GCIKSSRWRFVXBI-UHFFFAOYSA-N | CN1CCN(CC1)c1nc(Sc2ccc(cc2)NC(=O)C2CC2)nc(c1)N... | http://identifiers.org/chembl.compound/CHEMBL5... |
| 91 | tozasertib | f | aurora kinase B | Aurora kinase (Aur) family | https://rdf.guidetopharmacology.org/ns/gtpo#In... | 7.74 | | | http://identifiers.org/pubmed/14981513 | Tozasertib is a potent inhibitor of all three ... | GCIKSSRWRFVXBI-UHFFFAOYSA-N | CN1CCN(CC1)c1nc(Sc2ccc(cc2)NC(=O)C2CC2)nc(c1)N... | http://identifiers.org/chembl.compound/CHEMBL5... |
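Rows 2 to 4 differ only in `Reference`: the query returns one row per interaction-reference pair. If one row per distinct set of other values is preferred, the references can be collapsed. The sketch below works on the list of dictionaries from `run_cypher_query`, so it does not depend on pandas; `collapse_references` is a hypothetical name. Rows that differ in any other column (for example the two aurora kinase A rows, whose affinities differ) stay separate.

```python
def collapse_references(rows, ref_key="Reference", sep="; "):
    """Merge rows that are identical except for ref_key, joining references in first-seen order."""
    merged = {}
    for row in rows:
        # Every column except the reference identifies the group; dicts keep insertion order.
        key = tuple((k, v) for k, v in row.items() if k != ref_key)
        if key not in merged:
            merged[key] = {**row, ref_key: []}
        ref = row.get(ref_key)
        if ref is not None and ref not in merged[key][ref_key]:
            merged[key][ref_key].append(ref)
    for row in merged.values():
        row[ref_key] = sep.join(row[ref_key])
    return list(merged.values())


# Usage: df = pd.DataFrame(collapse_references(results))
```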


In [38]:
# Export the DataFrame to an .xlsx file (requires openpyxl, installed above)
df.to_excel('ligand_target_data.xlsx', index=False)

In [39]:
driver.close()

### Notes: additional tools

#### Clearing a Neo4j database

In [40]:
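# Uncomment only when required: this deletes every node and relationship in the database.
# The driver was closed above, so re-create it (see the driver cell) before running this.

# cypher_query = """
# MATCH (n) DETACH DELETE n
# """

# results = run_cypher_query(cypher_query)
# driver.close()

# pprint(results)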
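On a graph of this size (about 387,000 imported triples), a single `MATCH (n) DETACH DELETE n` runs in one transaction and can exhaust memory. Neo4j 4.4 and later support batching with `CALL { ... } IN TRANSACTIONS`, which must run in an auto-commit transaction such as the one `session.run` opens. The builder below is a hypothetical helper, and the batch size of 10,000 is an arbitrary default. It removes data only: constraints such as the one on `:Resource(uri)` remain in place.

```python
def batched_delete_query(batch_size: int = 10_000) -> str:
    """Build a Cypher statement that deletes all nodes and relationships in batches."""
    if not isinstance(batch_size, int) or batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    return (
        "MATCH (n) "
        "CALL { WITH n DETACH DELETE n } "
        f"IN TRANSACTIONS OF {batch_size} ROWS"
    )


# Usage (destructive; the driver must be open):
# with driver.session(database="neo4j") as session:
#     session.run(batched_delete_query()).consume()
```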

## Citation

**BibTeX:**

```
@article{10.1093/nar/gkad944,
    author = {Harding, Simon D and Armstrong, Jane F and Faccenda, Elena and Southan, Christopher and Alexander, Stephen P H and Davenport, Anthony P and Spedding, Michael and Davies, Jamie A},
    title = "{The IUPHAR/BPS Guide to PHARMACOLOGY in 2024}",
    journal = {Nucleic Acids Research},
    volume = {52},
    number = {D1},
    pages = {D1438-D1449},
    year = {2023},
    month = {10},
    abstract = "{The IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb; https://www.guidetopharmacology.org) is an open-access, expert-curated, online database that provides succinct overviews and key references for pharmacological targets and their recommended experimental ligands. It includes over 3039 protein targets and 12 163 ligand molecules, including approved drugs, small molecules, peptides and antibodies. Here, we report recent developments to the resource and describe expansion in content over the six database releases made during the last two years. The database update section of this paper focuses on two areas relating to important global health challenges. The first, SARS-CoV-2 COVID-19, remains a major concern and we describe our efforts to expand the database to include a new family of coronavirus proteins. The second area is antimicrobial resistance, for which we have extended our coverage of antibacterials in partnership with AntibioticDB, a collaboration that has continued through support from GARDP. We discuss other areas of curation and also focus on our external links to resources such as PubChem that bring important synergies to the resources.}",
    issn = {0305-1048},
    doi = {10.1093/nar/gkad944},
    url = {https://doi.org/10.1093/nar/gkad944},
    eprint = {https://academic.oup.com/nar/article-pdf/52/D1/D1438/55039511/gkad944.pdf},
}
```