# Adding a biochemical component to the Knowledge Graph
![kg](imgs/visualisation.png)

## Enables compound-centered queries
![cquery](imgs/compound_query.png)

### Molecular similarity is precomputed
![similarity](imgs/tanimoto_distance.png)
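
The Tanimoto (Jaccard) distance between two binary fingerprints is one minus the ratio of shared on-bits to all on-bits. Below is a minimal sketch that represents each fingerprint as a set of on-bit indices. This representation is an assumption for illustration; `compute_fingerprint_distances` may build fingerprints with a cheminformatics library instead.

```python
def tanimoto_distance(fp_a: set[int], fp_b: set[int]) -> float:
    """Return 1 - |A & B| / |A | B| for fingerprints given as sets of on-bit indices."""
    union = fp_a | fp_b
    if not union:
        # Two empty fingerprints are treated as identical (distance 0)
        return 0.0
    return 1.0 - len(fp_a & fp_b) / len(union)


# Hypothetical fingerprints: 3 shared bits out of 5 distinct bits in total
print(tanimoto_distance({1, 4, 7, 9}, {1, 4, 9, 12}))  # 0.4
```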

## Database: ModelSEED

* 43774 reactions
* 20582 compounds

Reactions and compounds are defined independently of species, so there is a fixed set of reactions that is shared among taxa.

## Load ModelSEED data

```python
from src.utils import extract_data, compute_fingerprint_distances

reactions_path = "/home/robaina/Documents/NewAtlantis/enzyme_activity/notebooks/data/annotations/modelseed/reactions.json"
compounds_path = "/home/robaina/Documents/NewAtlantis/enzyme_activity/notebooks/data/annotations/modelseed/compounds.json"

n = 100  # number of reactions and compounds in KG
distance_threshold = 0.2  # maximum fingerprint distance for a CHEMICALLY_SIMILAR edge

reactions, compounds = extract_data(reactions_path, compounds_path, n)
chemical_distances = compute_fingerprint_distances(compounds)
```
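
The loading step iterates over `chemical_distances` as `(compound1_id, compound2_id, distance)` triples. The following hypothetical sketch shows how such triples could be produced and thresholded. It assumes fingerprints are already available as sets of on-bit indices keyed by compound ID. The real `compute_fingerprint_distances` takes the compound records directly and may work differently.

```python
from itertools import combinations


def pairwise_tanimoto_distances(fingerprints: dict[str, set[int]]) -> list[tuple[str, str, float]]:
    """Return one (id1, id2, distance) triple per unordered pair of compounds."""
    triples = []
    for (id1, fp1), (id2, fp2) in combinations(fingerprints.items(), 2):
        union = fp1 | fp2
        # Tanimoto distance; two empty fingerprints count as identical
        distance = 1.0 - len(fp1 & fp2) / len(union) if union else 0.0
        triples.append((id1, id2, distance))
    return triples


# Hypothetical fingerprints for three compounds
fingerprints = {
    "cpdA": {1, 2, 3, 4},
    "cpdB": {1, 2, 3, 4, 5},
    "cpdC": {7, 8},
}
distances = pairwise_tanimoto_distances(fingerprints)
distance_threshold = 0.2
similar_pairs = [(a, b) for a, b, d in distances if d <= distance_threshold]
print(similar_pairs)  # [('cpdA', 'cpdB')]
```

The number of pairs grows quadratically. If `n = 100` yields 100 compounds, that is already 4,950 pairs, which is why computing the distances once, before loading, is convenient.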

## Load NEO4J credentials

```python
import os

from dotenv import load_dotenv

# Read the AuraDB connection settings from a .env file
load_dotenv()
uri = os.getenv("NEO4J_URI")
username = os.getenv("NEO4J_USERNAME")
password = os.getenv("NEO4J_PASSWORD")
```
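
`os.getenv` returns `None` for unset variables. A missing entry in the `.env` file therefore surfaces only later, as a connection or authentication error. The following small, hypothetical helper fails fast instead.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the requested environment variables, raising if any is unset or empty."""
    values = {name: os.environ.get(name, "") for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return values


# Usage after load_dotenv():
# creds = require_env("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")
```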

## Build the KG and load it into AuraDB

```python
from neo4j import GraphDatabase

from src.neo4j import (
    create_chemically_similar_relationship,
    create_compound,
    create_participates_in_relationship,
    create_product_of_relationship,
    create_reaction,
    create_substrate_of_relationship,
)
from src.utils import parse_reaction_equation

# Create nodes and relationships in Neo4j; the driver is closed when the block exits
with GraphDatabase.driver(uri, auth=(username, password)) as driver:
    with driver.session() as session:
        # Create Compound nodes
        for compound in compounds:
            session.execute_write(create_compound, compound)

        # Create CHEMICALLY_SIMILAR relationships between sufficiently close compounds
        for compound1_id, compound2_id, distance in chemical_distances:
            if distance <= distance_threshold:
                session.execute_write(
                    create_chemically_similar_relationship,
                    compound1_id, compound2_id, distance,
                )

        # Create Reaction nodes and their relationships
        for reaction in reactions:
            session.execute_write(create_reaction, reaction)
            substrates, products = parse_reaction_equation(reaction)

            # Create PARTICIPATES_IN relationships
            for compound_id in reaction["compound_ids"].split(";"):
                session.execute_write(create_participates_in_relationship, compound_id, reaction["id"])

            # Create SUBSTRATE_OF and PRODUCT_OF relationships
            for substrate_id, stoichiometry in substrates:
                session.execute_write(create_substrate_of_relationship, substrate_id, reaction["id"], stoichiometry)
            for product_id, stoichiometry in products:
                session.execute_write(create_product_of_relationship, product_id, reaction["id"], stoichiometry)
```
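
The SUBSTRATE_OF and PRODUCT_OF relationships depend on splitting each reaction into `(compound_id, stoichiometry)` pairs for both sides. The hypothetical `parse_modelseed_equation` below sketches that step. It assumes ModelSEED-style equation strings such as `(1) cpd00001[0] + (1) cpd00012[0] <=> (2) cpd00009[0]` stored under an `equation` key. The notebook's `parse_reaction_equation` may read a different field or format.

```python
import re

# Matches "(coefficient) cpdXXXXX[compartment]", e.g. "(2) cpd00009[0]"
TERM = re.compile(r"\((?P<coef>[\d.]+)\)\s*(?P<cpd>cpd\d+)\[\w+\]")
# Reaction arrows; "<=>" is listed first so it is not split as "<=" or "=>"
ARROW = re.compile(r"\s*(?:<=>|=>|<=)\s*")


def parse_modelseed_equation(reaction: dict) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Split an equation into (compound_id, stoichiometry) lists for the left and right sides.

    Sides are taken as written: left-hand terms are returned as substrates and
    right-hand terms as products, regardless of the arrow's direction.
    """
    left, right = ARROW.split(reaction["equation"], maxsplit=1)
    substrates = [(m["cpd"], float(m["coef"])) for m in TERM.finditer(left)]
    products = [(m["cpd"], float(m["coef"])) for m in TERM.finditer(right)]
    return substrates, products


subs, prods = parse_modelseed_equation(
    {"equation": "(1) cpd00001[0] + (1) cpd00012[0] <=> (2) cpd00009[0] + (1) cpd00067[0]"}
)
print(subs)   # [('cpd00001', 1.0), ('cpd00012', 1.0)]
print(prods)  # [('cpd00009', 2.0), ('cpd00067', 1.0)]
```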

## Find chemically similar compounds

```python
query = (
    "MATCH (c:Compound {id: $compound_id})-[s:CHEMICALLY_SIMILAR]->(similar:Compound) "
    "WHERE s.distance <= $threshold "
    "RETURN similar.id, similar.name, similar.smiles"
)
```
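
The following hedged sketch shows how a similarity query could run as a read transaction function with the Neo4j Python driver, passed to `session.execute_read`. It filters on the relationship's `distance` property with `WHERE`, because a property map such as `{distance: $threshold}` only matches edges whose distance equals the threshold exactly. The function name and the default `threshold` of 0.2 (mirroring `distance_threshold`) are assumptions, not the `src.queries` implementation.

```python
SIMILAR_COMPOUNDS_QUERY = (
    "MATCH (c:Compound {id: $compound_id})-[s:CHEMICALLY_SIMILAR]->(similar:Compound) "
    "WHERE s.distance <= $threshold "
    "RETURN similar.id, similar.name, similar.smiles"
)


def similar_compounds_tx(tx, compound_id: str, threshold: float = 0.2) -> list[tuple]:
    """Read transaction function returning (id, name, smiles) tuples of similar compounds."""
    result = tx.run(SIMILAR_COMPOUNDS_QUERY, compound_id=compound_id, threshold=threshold)
    # Record.values() returns the values in RETURN order
    return [tuple(record.values()) for record in result]


# Usage (requires the neo4j package and a reachable database):
# from neo4j import GraphDatabase
# with GraphDatabase.driver(uri, auth=(username, password)) as driver:
#     with driver.session() as session:
#         rows = session.execute_read(similar_compounds_tx, "cpd00304")
```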

```python
from src.queries import find_chemically_similar_compounds

compound_id = "cpd00304"
compound_name = next(c["name"] for c in compounds if c["id"] == compound_id)
print(f"Target compound: {compound_name}")
similar_compounds = find_chemically_similar_compounds(uri, username, password, compound_id)
for compound in similar_compounds:
    print(compound)
```

```
Target compound: Retinal
('cpd01420', 'beta-Carotene', 'CC1=C(/C=C/C(C)=C/C=C/C(C)=C/C=C/C=C(C)/C=C/C=C(C)/C=C/C2=C(C)CCCC2(C)C)C(C)(C)CCC1')
```

## Find reactions and enzymes producing compounds chemically similar to a target compound

```python
query = (
    "MATCH (c:Compound {id: $target_compound_id})-[:CHEMICALLY_SIMILAR]->(similar:Compound), "
    "(similar)-[:PRODUCT_OF]->(r:Reaction) "
    "RETURN r.id AS reaction_id, r.name AS reaction_name, similar.id AS similar_compound_id, "
    "similar.name AS similar_compound_name, similar.smiles AS similar_compound_smiles"
)
```

```python
from src.queries import find_reactions_with_similar_product_compounds

compound_id = "cpd00069"
compound_name = next(c["name"] for c in compounds if c["id"] == compound_id)
print(f"Target compound: {compound_name}")

# Use a new name so the ModelSEED `reactions` list loaded earlier is not overwritten
similar_product_reactions = find_reactions_with_similar_product_compounds(uri, username, password, compound_id)
similar_product_reactions[0]
```

```
Target compound: L-Tyrosine

('rxn00024',
 '1,2-Benzenediol:oxygen oxidoreductase',
 'cpd00291',
 'L-Dopa',
 '[NH3+][C@@H](Cc1ccc(O)c(O)c1)C(=O)[O-]')
```

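## Find thermodynamically favorable reactions involving a target compound

The query keeps reactions in which the compound participates, as substrate or product, and whose Gibbs free energy change (`deltag`) is negative, i.e., reactions that are favorable in the direction in which they are written.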

```python
query = (
    "MATCH (c:Compound {id: $compound_id})-[:PARTICIPATES_IN]->(r:Reaction) "
    "WHERE r.deltag < 0 "
    "RETURN r.id, r.name, r.deltag"
)
```
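
The same filter can be cross-checked without the database by scanning the reaction records directly. The following minimal sketch assumes each record has `id`, `name`, `compound_ids` (semicolon-separated, as used when building the KG) and a numeric `deltag` field. It also assumes a missing ΔG shows up as `None` and skips such records. Both assumptions concern the data format and are not confirmed by the notebook.

```python
def negative_deltag_reactions(reactions: list[dict], compound_id: str) -> list[tuple]:
    """Return (id, name, deltag) for reactions that involve compound_id and have deltag < 0."""
    hits = []
    for reaction in reactions:
        # compound_ids lists every participant, substrate or product alike
        participants = (reaction.get("compound_ids") or "").split(";")
        deltag = reaction.get("deltag")
        if compound_id in participants and deltag is not None and deltag < 0:
            hits.append((reaction["id"], reaction["name"], deltag))
    return hits


# Hypothetical records, not real ModelSEED entries
sample = [
    {"id": "rxnA", "name": "A", "compound_ids": "cpd1;cpd2", "deltag": -3.0},
    {"id": "rxnB", "name": "B", "compound_ids": "cpd1;cpd3", "deltag": 2.5},
    {"id": "rxnC", "name": "C", "compound_ids": "cpd2;cpd3", "deltag": -1.0},
    {"id": "rxnD", "name": "D", "compound_ids": "cpd1", "deltag": None},
]
print(negative_deltag_reactions(sample, "cpd1"))  # [('rxnA', 'A', -3.0)]
```

If the records returned by `extract_data` use these field names, running this over that list should agree with the graph query for the compounds loaded into the KG.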

```python
from src.queries import find_negative_deltag_reactions_of_compound

compound_id = "cpd00002"
compound_name = next(c["name"] for c in compounds if c["id"] == compound_id)
print(f"Querying for compound {compound_id}, {compound_name}")

# Find reactions of the specified compound with negative deltag
favorable_reactions = find_negative_deltag_reactions_of_compound(uri, username, password, compound_id)
for reaction_id, reaction_name, deltag in favorable_reactions:
    print(f"Reaction ID: {reaction_id}, Name: {reaction_name}, DeltaG: {deltag}")
```

```
Querying for compound cpd00002, ATP
Reaction ID: rxn00061, Name: ATP diphosphohydrolase (phosphate-forming), DeltaG: -11.9
Reaction ID: rxn00062, Name: ATP phosphohydrolase, DeltaG: -6.16
Reaction ID: rxn00063, Name: ATP diphosphohydrolase (diphosphate-forming), DeltaG: -8.43
Reaction ID: rxn00064, Name: ATP aminohydrolase, DeltaG: -13.75
Reaction ID: rxn00077, Name: ATP:NAD+ 2'-phosphotransferase, DeltaG: -3.14
Reaction ID: rxn00078, Name: ATP:NADH 2'-phosphotransferase, DeltaG: -3.14
Reaction ID: rxn00097, Name: ATP:AMP phosphotransferase, DeltaG: -0.42
Reaction ID: rxn00098, Name: Adenosine-tetraphosphate phosphohydrolase, DeltaG: -7.49
Reaction ID: rxn00099, Name: (6S)-6-Hydroxy-1,4,5,6-tetrahydronicotinamide-adenine-dinucleotide hydro-lyase (ATP-hydrolysing), DeltaG: -11.69
```
