In [0]:
import torch
from torch import Tensor
print(torch.__version__)

2.6.0+cu124


In [0]:

# Record the installed torch version in an environment variable (no packages are installed in this cell).
import os
os.environ['TORCH'] = torch.__version__

In [0]:
# Connect to the graph database
import pandas as pd
import numpy as np
from collections import defaultdict
from gqlalchemy import Memgraph

# Make a connection to the database
memgraph = Memgraph(host='alzkb.ai', port=7687)

# Exploratory Data Analysis Summary

**1. Node‐Type Distribution**

> **Remark:**  
> - The graph is heavily “gene‐centric.”  
> - Only 34 disease nodes versus nearly 200 K gene nodes.  
> - Drugs (16 K) and biological processes (12 K) are the next largest categories.  
> - Diseases and drug‐classes appear very sparsely.
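
The shares behind this remark can be checked with a few lines of arithmetic. The sketch below hard-codes the per-label counts printed by the "Count nodes per label" cell; it assumes each node carries a single label, so the sum of per-label counts is used as the node total.

```python
# Node counts per label, copied from the "Count nodes per label" output.
node_counts = {
    "Gene": 193279, "Drug": 16581, "BiologicalProcess": 12322,
    "Pathway": 4516, "MolecularFunction": 3460, "CellularComponent": 1695,
    "BodyPart": 652, "TranscriptionFactor": 519, "Symptom": 505,
    "DrugClass": 474, "Disease": 34,
}

# Assumption: one label per node, so this sum equals the number of nodes.
total_nodes = sum(node_counts.values())

# Fraction of all nodes that carry each label.
node_share = {label: cnt / total_nodes for label, cnt in node_counts.items()}

# Print the labels from most to least frequent.
for label, share in sorted(node_share.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:<20} {share:8.4%}")
```

Under that assumption, genes account for roughly 83% of all nodes, while disease nodes make up well under 0.1%.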

---

**2. Relationship‐Type Breakdown**

- **Total edges (all relation types)**: 1 668 487  
- **Drug→Disease edges** (`DRUGTREATSDISEASE` + `DRUGCAUSESEFFECT`): only 11  

> **Remark:**  
> - Nearly all edges involve Gene→(something) or Gene‐related relations.  
> - Only 9 “DRUGTREATSDISEASE” + 2 “DRUGCAUSESEFFECT” links exist, highlighting an extreme class imbalance for any Drug→Disease link‐prediction task.

In [0]:
labels = memgraph.execute_and_fetch(
    "MATCH (n) UNWIND labels(n) AS lab RETURN DISTINCT lab"
)
nodes = [row["lab"] for row in labels]
print("Node labels:", nodes)


Node labels: ['Drug', 'Gene', 'BiologicalProcess', 'Pathway', 'MolecularFunction', 'CellularComponent', 'Symptom', 'BodyPart', 'DrugClass', 'Disease', 'TranscriptionFactor']


In [0]:
# 1.2 Count nodes per label
counts = memgraph.execute_and_fetch(
    "MATCH (n) UNWIND labels(n) AS lab "
    "RETURN lab, count(*) AS cnt ORDER BY cnt DESC"
)
df_counts = pd.DataFrame(counts)
print(df_counts)


                    lab     cnt
0                  Gene  193279
1                  Drug   16581
2     BiologicalProcess   12322
3               Pathway    4516
4     MolecularFunction    3460
5     CellularComponent    1695
6              BodyPart     652
7   TranscriptionFactor     519
8               Symptom     505
9             DrugClass     474
10              Disease      34


In [0]:
# 1.3 List all relationship types
rels = memgraph.execute_and_fetch(
    "MATCH ()-[r]->() RETURN DISTINCT type(r) AS rel"
)
print("Rel types:", [row["rel"] for row in rels])

Rel types: ['TRANSCRIPTIONFACTORINTERACTSWITHGENE', 'BODYPARTOVEREXPRESSESGENE', 'BODYPARTUNDEREXPRESSESGENE', 'CHEMICALBINDSGENE', 'CHEMICALINCREASESEXPRESSION', 'CHEMICALDECREASESEXPRESSION', 'GENEREGULATESGENE', 'GENEINTERACTSWITHGENE', 'GENECOVARIESWITHGENE', 'GENEPARTICIPATESINBIOLOGICALPROCESS', 'GENEINPATHWAY', 'GENEHASMOLECULARFUNCTION', 'GENEASSOCIATEDWITHCELLULARCOMPONENT', 'DISEASELOCALIZESTOANATOMY', 'DRUGINCLASS', 'GENEASSOCIATESWITHDISEASE', 'DRUGTREATSDISEASE', 'DRUGCAUSESEFFECT', 'SYMPTOMMANIFESTATIONOFDISEASE']


In [0]:
# 1.4 Count relationships per type
rel_counts = memgraph.execute_and_fetch(
    "MATCH ()-[r]->() RETURN type(r) AS rel, count(*) AS cnt "
    "ORDER BY cnt DESC"
)
df_rel_counts = pd.DataFrame(rel_counts)
print(df_rel_counts)

                                     rel     cnt
0    GENEPARTICIPATESINBIOLOGICALPROCESS  548285
1                      GENEREGULATESGENE  263978
2                          GENEINPATHWAY  178991
3                  GENEINTERACTSWITHGENE  147088
4               GENEHASMOLECULARFUNCTION  104752
5             BODYPARTUNDEREXPRESSESGENE  102185
6              BODYPARTOVEREXPRESSESGENE   97772
7    GENEASSOCIATEDWITHCELLULARCOMPONENT   88880
8                   GENECOVARIESWITHGENE   61606
9                      CHEMICALBINDSGENE   25726
10           CHEMICALDECREASESEXPRESSION   21051
11           CHEMICALINCREASESEXPRESSION   18713
12  TRANSCRIPTIONFACTORINTERACTSWITHGENE    6910
13                           DRUGINCLASS    1945
14             GENEASSOCIATESWITHDISEASE     508
15         SYMPTOMMANIFESTATIONOFDISEASE      53
16             DISEASELOCALIZESTOANATOMY      33
17                     DRUGTREATSDISEASE       9
18                      DRUGCAUSESEFFECT       2


In [0]:
# 2) Compute total number of edges by summing the counts
total_edges = df_rel_counts["cnt"].astype(int).sum()
print("total_edges", total_edges)

total_edges 1668487


In [0]:
# 2.1 Node property keys
prop_keys = memgraph.execute_and_fetch(
    "MATCH (n) UNWIND keys(n) AS key RETURN DISTINCT key"
)
print("Node props:", [row["key"] for row in prop_keys])

Node props: ['nodeID', 'xrefCasRN', 'xrefDrugbank', 'commonName', 'sourceDatabase', 'typeOfGene', 'xrefEnsembl', 'xrefNcbiGene', 'geneSymbol', 'chromosome', 'xrefOMIM', 'xrefHGNC', 'xrefGeneOntology', 'pathwayId', 'pathwayName', 'xrefMeSH', 'xrefUberon', 'xrefNciThesaurus', 'xrefDiseaseOntology', 'xrefUmlsCUI', 'TF']


Disease

**Key Points:**  
1. All 11 Drug→Disease edges target the same canonical label “Alzheimer’s Disease” (rather than any of the more specific subtypes).  
2. Out of those 11, 9 edges have type `DRUGTREATSDISEASE` and 2 edges have type `DRUGCAUSESEFFECT`.  
3. No drugs are linked to any of the 33 other Alzheimer’s variants—only the generic “Alzheimer’s Disease” node is used.

---

**Implications for Link-Prediction**

- **Extreme Class Imbalance:**  
  - There are 34 disease nodes but only a single canonical “Alzheimer’s Disease” node is actually connected to drugs.
  - With 1 668 487 total edges in the graph, these 11 Drug→Disease edges are vanishingly rare.

- **Focused Task:**  
  - In practice, any Drug→Disease link-prediction model will primarily attempt to recover “Alzheimer’s Disease” edges, since *no* other disease variants have drug links.
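
To put a number on the imbalance, the sketch below uses only counts reported in this notebook. Treating every Drug×Disease combination as a candidate pair is an assumption made here for illustration; a real training setup may restrict the candidate set.

```python
# Counts taken from the outputs in this notebook.
total_edges = 1_668_487
drug_disease_edges = {"DRUGTREATSDISEASE": 9, "DRUGCAUSESEFFECT": 2}
n_drugs, n_diseases = 16_581, 34

# Known positives: all direct Drug→Disease edges.
n_positive = sum(drug_disease_edges.values())

# Share of Drug→Disease edges among all edges in the graph.
edge_share = n_positive / total_edges

# Assumption: every (drug, disease) combination is a candidate pair.
candidate_pairs = n_drugs * n_diseases
positive_rate = n_positive / candidate_pairs

print(f"Drug→Disease share of all edges: {edge_share:.2e}")
print(f"Candidate pairs: {candidate_pairs}")
print(f"Positive rate among candidate pairs: {positive_rate:.2e}")
```

Both ratios are on the order of one in 100 000 or smaller, which is why plain accuracy would be an uninformative metric for this task.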



In [0]:
import pandas as pd

# 1) How many distinct Disease nodes do we have?
count_res = list(memgraph.execute_and_fetch("""
  MATCH (d:Disease)
  RETURN COUNT(d) AS num_diseases
"""))
num_diseases = count_res[0]["num_diseases"]
print(f"Total distinct diseases in the graph: {num_diseases}")

# 2) List all disease names
rows = list(memgraph.execute_and_fetch("""
  MATCH (d:Disease)
  RETURN DISTINCT d.commonName AS disease
  ORDER BY d.commonName
"""))
diseases = [row['disease'] for row in rows]
print("disease: ", diseases)

Total distinct diseases in the graph: 34
disease:  ['ALZHEIMER DISEASE 10', 'ALZHEIMER DISEASE 18', 'ALZHEIMER DISEASE 19', 'ALZHEIMER DISEASE 2', 'ALZHEIMER DISEASE 4', 'ALZHEIMER DISEASE 5', 'ALZHEIMER DISEASE 6, LATE-ONSET', 'ALZHEIMER DISEASE 9, SUSCEPTIBILITY TO', 'ALZHEIMER DISEASE, FAMILIAL, 1', 'ALZHEIMER DISEASE, FAMILIAL, 3, WITH SPASTIC PARAPARESIS', 'ALZHEIMER DISEASE, FAMILIAL, 3, WITH UNUSUAL PLAQUES', 'ALZHEIMER DISEASE, FAMILIAL, WITH SPASTIC PARAPARESIS AND UNUSUAL PLAQUES', 'ALZHEIMER DISEASE, SUSCEPTIBILITY TO', 'ALZHEIMER DISEASE, SUSCEPTIBILITY TO, MITOCHONDRIAL', 'Alzheimer Disease 12', 'Alzheimer Disease 14', 'Alzheimer Disease 7', 'Alzheimer Disease 9', 'Alzheimer Disease, Early Onset', 'Alzheimer Disease, Familial, 3, with Spastic Paraparesis and Apraxia', 'Alzheimer Disease, Familial, 3, with Spastic Paraparesis and Unusual Plaques', 'Alzheimer Disease, Late Onset', 'Alzheimer disease type 1', 'Alzheimer disease, familial, type 3', "Alzheimer's Disease", "Alzh

In [0]:
import pandas as pd

# 1) Raw Drug→Disease edges
rows = list(memgraph.execute_and_fetch("""
  MATCH (dr:Drug)-[r]->(di:Disease)
  RETURN dr.commonName   AS drug,
         type(r)         AS relation,
         di.commonName   AS disease
  ORDER BY disease, drug
"""))
df = pd.DataFrame(rows)
print(df)

            drug           relation              disease
0   Benzatropine  DRUGTREATSDISEASE  Alzheimer's Disease
1      Donepezil  DRUGTREATSDISEASE  Alzheimer's Disease
2    Galantamine  DRUGTREATSDISEASE  Alzheimer's Disease
3    Haloperidol  DRUGTREATSDISEASE  Alzheimer's Disease
4      Memantine  DRUGTREATSDISEASE  Alzheimer's Disease
5     Naltrexone   DRUGCAUSESEFFECT  Alzheimer's Disease
6     Quetiapine  DRUGTREATSDISEASE  Alzheimer's Disease
7   Rivastigmine  DRUGTREATSDISEASE  Alzheimer's Disease
8     Ropinirole  DRUGTREATSDISEASE  Alzheimer's Disease
9     Selegiline  DRUGTREATSDISEASE  Alzheimer's Disease
10   Ziprasidone   DRUGCAUSESEFFECT  Alzheimer's Disease


Drug & Gene


Among all outgoing edges from Drugs, only 9 edges are of type **DRUGTREATSDISEASE** and 2 edges are of type **DRUGCAUSESEFFECT** (i.e., only 11 total Drug→Disease links). By contrast, there are 508 **GENEASSOCIATESWITHDISEASE** edges, so genes are linked to diseases far more often than drugs are.

In [0]:
import pandas as pd

# Aggregate outgoing edges from every Drug
query = """
MATCH (d:Drug)-[r]->(n)
RETURN type(r) AS rel, labels(n) AS target_labels, COUNT(*) AS cnt
ORDER BY cnt DESC
"""
rows = list(memgraph.execute_and_fetch(query))
df_out = pd.DataFrame(rows)
print(df_out)

                           rel target_labels    cnt
0            CHEMICALBINDSGENE        [Gene]  25726
1  CHEMICALDECREASESEXPRESSION        [Gene]  21051
2  CHEMICALINCREASESEXPRESSION        [Gene]  18713
3                  DRUGINCLASS   [DrugClass]   1945
4            DRUGTREATSDISEASE     [Disease]      9
5             DRUGCAUSESEFFECT     [Disease]      2


In [0]:
import pandas as pd

# Aggregate outgoing edges from every Gene
query = """
MATCH (d:Gene)-[r]->(n)
RETURN type(r) AS rel, labels(n) AS target_labels, COUNT(*) AS cnt
ORDER BY cnt DESC
"""
rows = list(memgraph.execute_and_fetch(query))
df_out = pd.DataFrame(rows)
print(df_out)


                                   rel        target_labels     cnt
0  GENEPARTICIPATESINBIOLOGICALPROCESS  [BiologicalProcess]  548285
1                    GENEREGULATESGENE               [Gene]  263978
2                        GENEINPATHWAY            [Pathway]  178991
3                GENEINTERACTSWITHGENE               [Gene]  147088
4             GENEHASMOLECULARFUNCTION  [MolecularFunction]  104752
5  GENEASSOCIATEDWITHCELLULARCOMPONENT  [CellularComponent]   88880
6                 GENECOVARIESWITHGENE               [Gene]   61606
7            GENEASSOCIATESWITHDISEASE            [Disease]     508


### Multi‐Hop Drug–Disease Relations

**Potential Strategy:**  
The graph contains a very large number of drug–gene and gene–gene edges but only a handful of direct drug–disease links.  We can exploit this by **inserting one or more gene nodes as intermediate hops** to uncover many more candidate Drug→Disease connections.

> _For example_, if we allow exactly one intermediate `Gene` node (i.e., Drug→Gene→Disease), we find **4352** candidate Drug→Disease pairs. The query below does not exclude the 11 directly linked pairs, so a few of these candidates may already be known edges.

**Idea:**  
Use these **higher‐order paths** (e.g., Drug→Gene→Disease or Drug→Gene→Process→Gene→Disease) as features or as positive/negative signals in a supervised link‐prediction model.  In practice:

1. Enumerate all simple Gene‐mediated two‐hop paths (Drug→Gene→Disease).  
2. Count (or score) how many distinct gene intermediates connect each Drug–Disease pair.  
3. Use that count (or a weighted path score) as a predictor:  
   - A high number of gene‐mediated paths suggests a likely Drug–Disease association.  
   - Zero gene-mediated paths can be treated as a negative signal, although the absence of a path may also reflect missing annotations.  

By integrating these multi-hop path features, we can overcome the scarcity of direct Drug→Disease edges and dramatically expand our training set for downstream link‐prediction.  
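
The three steps above can be sketched without a database connection. The edge lists below are hypothetical toy data (the drug, gene, and disease names are placeholders, not rows from the graph), and `count_gene_mediated_paths` is an illustrative helper rather than part of any library. It mirrors the `COUNT(DISTINCT g)` aggregation used in the Cypher query that follows.

```python
from collections import defaultdict

# Hypothetical toy edges; names are placeholders, not graph contents.
drug_gene = [
    ("drugA", "g1"), ("drugA", "g2"), ("drugA", "g2"),  # duplicate on purpose
    ("drugB", "g2"), ("drugC", "g4"),
]
gene_disease = [
    ("g1", "AD"), ("g2", "AD"), ("g2", "AD early onset"), ("g3", "AD"),
]

def count_gene_mediated_paths(drug_gene, gene_disease):
    """Count distinct intermediate genes for each (drug, disease) pair."""
    # Index Gene→Disease edges by gene for constant-time lookup.
    diseases_of_gene = defaultdict(set)
    for gene, disease in gene_disease:
        diseases_of_gene[gene].add(disease)

    # Collect the set of genes linking each pair; sets deduplicate
    # repeated Drug→Gene edges (e.g. bind and increase-expression).
    genes_for_pair = defaultdict(set)
    for drug, gene in drug_gene:
        for disease in diseases_of_gene.get(gene, ()):
            genes_for_pair[(drug, disease)].add(gene)

    return {pair: len(genes) for pair, genes in genes_for_pair.items()}

via_genes = count_gene_mediated_paths(drug_gene, gene_disease)
for pair, n in sorted(via_genes.items(), key=lambda kv: -kv[1]):
    print(pair, n)
```

Pairs that never appear in the result (here, anything involving `drugC`) have zero gene-mediated paths.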

In [0]:
import pandas as pd

# 1) Fetch every Drug–Gene–Disease two-hop
query = """
MATCH (d:Drug)-[:CHEMICALBINDSGENE|
               :CHEMICALINCREASESEXPRESSION|
               :CHEMICALDECREASESEXPRESSION]->(g:Gene)
MATCH (g)-[:GENEASSOCIATESWITHDISEASE]->(x:Disease)
RETURN d.commonName    AS drug,
       x.commonName    AS disease,
       COUNT(DISTINCT g) AS via_genes
ORDER BY via_genes DESC
"""
rows = list(memgraph.execute_and_fetch(query))

# 2) Load into a DataFrame
df = pd.DataFrame(rows)
print(f"Found {len(df)} distinct Drug→Disease candidates via genes\n")

Found 4352 distinct Drug→Disease candidates via genes



Common exploratory meta‐paths between Drug and Disease

In [0]:
import pandas as pd
from functools import reduce

# Define each meta‐path by (query_fragment, column_name)
meta_paths = {
    "via_genes": (
        # 1) Drug→Gene via any of the three chemical relations
        """
        MATCH (dr:Drug)-[r]->(g:Gene)
        WHERE type(r) IN [
          'CHEMICALBINDSGENE',
          'CHEMICALINCREASESEXPRESSION',
          'CHEMICALDECREASESEXPRESSION'
        ]
        MATCH (g)-[:GENEASSOCIATESWITHDISEASE]->(dx:Disease)
        RETURN dr.commonName AS drug,
               dx.commonName AS disease,
               COUNT(DISTINCT g) AS via_genes
        """, "via_genes"
    ),
    "via_processes": (
        """
        MATCH (dr:Drug)-[r1:CHEMICALBINDSGENE]->(g:Gene)
        MATCH (g)-[:GENEPARTICIPATESINBIOLOGICALPROCESS]->(bp:BiologicalProcess)
        MATCH (bp)<-[:GENEPARTICIPATESINBIOLOGICALPROCESS]-(g2:Gene)
        MATCH (g2)-[:GENEASSOCIATESWITHDISEASE]->(dx:Disease)
        RETURN dr.commonName AS drug,
               dx.commonName AS disease,
               COUNT(DISTINCT bp) AS via_processes
        """, "via_processes"
    ),
    "via_pathways": (
        """
        MATCH (dr:Drug)-[r1:CHEMICALBINDSGENE]->(g:Gene)
        MATCH (g)-[:GENEINPATHWAY]->(pw:Pathway)
        MATCH (pw)<-[:GENEINPATHWAY]-(g2:Gene)
        MATCH (g2)-[:GENEASSOCIATESWITHDISEASE]->(dx:Disease)
        RETURN dr.commonName AS drug,
               dx.commonName AS disease,
               COUNT(DISTINCT pw) AS via_pathways
        """, "via_pathways"
    ),
    "via_molecular_func": (
        """
        MATCH (dr:Drug)-[r1:CHEMICALBINDSGENE]->(g:Gene)
        MATCH (g)-[:GENEHASMOLECULARFUNCTION]->(mf:MolecularFunction)
        MATCH (mf)<-[:GENEHASMOLECULARFUNCTION]-(g2:Gene)
        MATCH (g2)-[:GENEASSOCIATESWITHDISEASE]->(dx:Disease)
        RETURN dr.commonName AS drug,
               dx.commonName AS disease,
               COUNT(DISTINCT mf) AS via_molecular_func
        """, "via_molecular_func"
    ),
    "via_cellular_comp": (
        """
        MATCH (dr:Drug)-[r1:CHEMICALBINDSGENE]->(g:Gene)
        MATCH (g)-[:GENEASSOCIATEDWITHCELLULARCOMPONENT]->(cc:CellularComponent)
        MATCH (cc)<-[:GENEASSOCIATEDWITHCELLULARCOMPONENT]-(g2:Gene)
        MATCH (g2)-[:GENEASSOCIATESWITHDISEASE]->(dx:Disease)
        RETURN dr.commonName AS drug,
               dx.commonName AS disease,
               COUNT(DISTINCT cc) AS via_cellular_comp
        """, "via_cellular_comp"
    ),
    "via_TF": (
        """
        MATCH (dr:Drug)-[r1:CHEMICALBINDSGENE]->(tf:Gene)
        WHERE tf.TF = true
        MATCH (tf)-[:GENEREGULATESGENE]->(g:Gene)
        MATCH (g)-[:GENEASSOCIATESWITHDISEASE]->(dx:Disease)
        RETURN dr.commonName AS drug,
               dx.commonName AS disease,
               COUNT(DISTINCT tf) AS via_TF
        """, "via_TF"
    ),
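    "via_drugclass": (
        # NOTE: DRUGINCLASS edges point Drug→DrugClass (see the Drug
        # outgoing-edge table), so the second pattern below,
        # (dc)-[:DRUGINCLASS]->(d2:Drug), cannot match as written.
        # Reaching sibling drugs would need (dc)<-[:DRUGINCLASS]-(d2:Drug).
        # This is why the query returns no rows.
        """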
        MATCH (dr:Drug)-[:DRUGINCLASS]->(dc:DrugClass)
        MATCH (dc)-[:DRUGINCLASS]->(d2:Drug)
        MATCH (d2)-[:CHEMICALBINDSGENE]->(g:Gene)
        MATCH (g)-[:GENEASSOCIATESWITHDISEASE]->(dx:Disease)
        RETURN dr.commonName AS drug,
               dx.commonName AS disease,
               COUNT(DISTINCT dc) AS via_drugclass
        """, "via_drugclass"
    ),
}

# 2) Run each query and collect into DataFrames
dfs = []
for cypher, col in meta_paths.values():
    rows = list(memgraph.execute_and_fetch(cypher))
    df = pd.DataFrame(rows)
    dfs.append(df)

for i, df in enumerate(dfs):
    print(f"DF {i} columns:", df.columns.tolist())

DF 0 columns: ['drug', 'disease', 'via_genes']
DF 1 columns: ['drug', 'disease', 'via_processes']
DF 2 columns: ['drug', 'disease', 'via_pathways']
DF 3 columns: ['drug', 'disease', 'via_molecular_func']
DF 4 columns: ['drug', 'disease', 'via_cellular_comp']
DF 5 columns: []
DF 6 columns: []


In [0]:
import pandas as pd
from functools import reduce

dfs = []
for name, (cypher, col) in meta_paths.items():
    rows = list(memgraph.execute_and_fetch(cypher))
    if rows:
        df = pd.DataFrame(rows)
    else:
        # create an empty DataFrame with the right schema
        df = pd.DataFrame(columns=["drug","disease", name])
    dfs.append(df)


In [0]:
df_all = reduce(
    lambda left, right: pd.merge(left, right, on=["drug","disease"], how="outer"),
    dfs
)

# fill NaNs in each path‐count column
for name in meta_paths.keys():
    df_all[name] = df_all[name].fillna(0).astype(int)

# optional any‐path sum
df_all["via_any"] = df_all[list(meta_paths.keys())].sum(axis=1)



In [0]:
# 3) Fill NaN with 0 in all meta‐path count columns
for _, col_name in meta_paths.values():
    if col_name in df_all.columns:
        df_all[col_name] = df_all[col_name].fillna(0).astype(int)
    else:
        # If a particular meta‐path column never appeared, create it filled with zeros
        df_all[col_name] = 0

# 4) Create a "total_paths" column summing all meta‐path counts for each (drug, disease)
via_columns = [col for _, col in meta_paths.values()]
df_all["total_paths"] = df_all[via_columns].sum(axis=1)

# 5) Compute summary statistics
total_pairs = len(df_all)
pairs_with_any_path = (df_all["total_paths"] > 0).sum()
total_meta_paths = df_all["total_paths"].sum()

print(f"Total distinct Drug→Disease pairs discovered by any meta‐path: {total_pairs}")
print(f"Number of pairs with at least one meta‐path: {pairs_with_any_path}")
print(f"Total meta‐path counts across all pairs: {total_meta_paths}")

# 6) (Optional) Inspect the top 10 Drug→Disease pairs by total_paths
top_10 = df_all.sort_values("total_paths", ascending=False).head(10)
print("\nTop 10 (drug, disease) by total meta‐paths:")
print(top_10[["drug", "disease", "total_paths"]])

Total distinct Drug→Disease pairs discovered by any meta‐path: 27983
Number of pairs with at least one meta‐path: 27983
Total meta‐path counts across all pairs: 7144546

Top 10 (drug, disease) by total meta‐paths:
              drug                           disease  total_paths
3934  Fostamatinib               Alzheimer's Disease         4032
389   Fostamatinib  Familial Alzheimer Disease (FAD)         4026
984   Fostamatinib  Alzheimer's Disease, Focal Onset         4013
1683  Fostamatinib     Alzheimer Disease, Late Onset         4013
2586  Fostamatinib    Alzheimer Disease, Early Onset         4013
3933    Fedratinib               Alzheimer's Disease         3890
391     Fedratinib  Familial Alzheimer Disease (FAD)         3884
1499     Sunitinib               Alzheimer's Disease         3882
358      Sunitinib  Familial Alzheimer Disease (FAD)         3876
2228    Fedratinib  Alzheimer's Disease, Focal Onset         3872



**Observations:**

1. **Drug “Fostamatinib”** appears at the very top for multiple Alzheimer variants.  
   - It has around 4000 distinct gene‐ or pathway‐mediated “connectors” to *each* of several Alzheimer labels.  
   - This suggests that in our heterogeneous graph, the genes Fostamatinib binds share many biological processes, pathways, molecular functions, or cellular components with Alzheimer‐associated genes.

2. **“Fedratinib” and “Sunitinib”** also show up with similarly large meta‐path counts (≈3800–3900).  
   - These drugs bind or regulate a set of genes that overlap heavily with Alzheimer‐related gene sub‐networks or share common pathways.  

3. **High Meta‐Path Counts Imply Strong Indirect Evidence**  
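   - A `total_paths` of ~4000 means that, summed over the meta‐path types, about 4000 distinct intermediate nodes (genes, biological processes, pathways, molecular functions, cellular components) link the drug to the disease.  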
   - In a supervised link‐prediction pipeline, such a drug–disease pair would be considered highly promising (high positive signal) even if no direct `DRUGTREATSDISEASE` edge exists in the original data.

4. **Breadth vs. Specificity**  
   - Notice that **Fostamatinib**, for example, has similarly large `total_paths` to *multiple* Alzheimer variants (generic “Alzheimer’s Disease,” “Familial Alzheimer Disease,” “Early Onset,” “Late Onset,” etc.).  
   - This indicates that the underlying gene signals are not uniquely pointing to just “Alzheimer’s Disease”—instead, they cover a broader spectrum of related disease‐nodes that share many of the same gene‐associations.
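
One way to quantify this breadth is to compare each drug's scores across the disease nodes it reaches. The sketch below uses only the ten rows printed in the top-10 table; the relative spread `(max - min) / max` is an illustrative measure chosen here, not something computed in the original analysis.

```python
from collections import defaultdict

# (drug, disease, total_paths) rows copied from the top-10 output.
top_rows = [
    ("Fostamatinib", "Alzheimer's Disease", 4032),
    ("Fostamatinib", "Familial Alzheimer Disease (FAD)", 4026),
    ("Fostamatinib", "Alzheimer's Disease, Focal Onset", 4013),
    ("Fostamatinib", "Alzheimer Disease, Late Onset", 4013),
    ("Fostamatinib", "Alzheimer Disease, Early Onset", 4013),
    ("Fedratinib", "Alzheimer's Disease", 3890),
    ("Fedratinib", "Familial Alzheimer Disease (FAD)", 3884),
    ("Sunitinib", "Alzheimer's Disease", 3882),
    ("Sunitinib", "Familial Alzheimer Disease (FAD)", 3876),
    ("Fedratinib", "Alzheimer's Disease, Focal Onset", 3872),
]

# Group the scores by drug.
scores_by_drug = defaultdict(list)
for drug, _disease, total_paths in top_rows:
    scores_by_drug[drug].append(total_paths)

# Relative spread of a drug's scores across disease nodes:
# a value near 0 means the score barely depends on the disease variant.
relative_spread = {
    drug: (max(scores) - min(scores)) / max(scores)
    for drug, scores in scores_by_drug.items()
}

for drug, spread in relative_spread.items():
    print(f"{drug:<14} n={len(scores_by_drug[drug])}  spread={spread:.3%}")
```

Within these ten rows, each drug's scores differ by less than 0.5% across disease variants, so `total_paths` alone says little about which specific variant a drug is most relevant to.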

### Future Implications

- **Candidate Ranking:**  
  If we want to predict new Drug→Disease edges, we could rank drug–disease pairs by their `total_paths` score.  For instance, Fostamatinib→Alzheimer’s Disease has the highest score, so it would be among our top candidates for a new “treats” edge.  

- **Filtering Threshold:**  
  Given that 27 983 drug–disease pairs have **at least one** meta‐path, but only 11 true Drug→Disease edges are known, we need to pick a sensible cutoff.  
  - For example, we might only consider pairs with `total_paths ≥ 1000` to limit the candidate set to a few hundred top pairs.

- **Feature Engineering:**  
  In a downstream classifier (e.g. a logistic regression or MLP), use `total_paths`—or individual meta‐path counts (`via_genes`, `via_processes`, etc.)—as numeric features.  Pairs with unusually high path counts should receive higher model scores.

- **Interpretability:**  
  Because each meta‐path type (e.g. via_genes, via_pathways) is a separate column, we can inspect which “channels” (genes vs. pathways vs. molecular functions, etc.) contribute most strongly to any given drug–disease candidate.  That provides a mechanistic justification—e.g., “Fostamatinib targets Gene X, which participates in Pathway Y, which is known to be dysregulated in Alzheimer’s.”
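
The ranking, thresholding, and per-channel inspection described above could look roughly like the following. This is a minimal sketch on hypothetical data: the rows, the `CHANNELS` subset, and the `rank_candidates` helper are all invented for illustration, and dropping already-known edges is an assumption about how "new" candidates would be defined.

```python
# Hypothetical per-pair meta-path counts; values are invented.
CHANNELS = ["via_genes", "via_processes", "via_pathways"]
pairs = [
    {"drug": "drugA", "disease": "AD",
     "via_genes": 3, "via_processes": 900, "via_pathways": 400},
    {"drug": "drugB", "disease": "AD",
     "via_genes": 0, "via_processes": 40, "via_pathways": 5},
    {"drug": "drugC", "disease": "AD",
     "via_genes": 1, "via_processes": 700, "via_pathways": 350},
]

def rank_candidates(pairs, known_edges=frozenset(), min_total=0):
    """Score pairs by summed meta-path counts, filter, and sort descending."""
    scored = []
    for row in pairs:
        total = sum(row[c] for c in CHANNELS)
        # Skip pairs that are already known edges or fall below the cutoff.
        if (row["drug"], row["disease"]) in known_edges or total < min_total:
            continue
        # The channel with the largest count hints at the dominant mechanism.
        top_channel = max(CHANNELS, key=lambda c: row[c])
        scored.append({**row, "total_paths": total, "top_channel": top_channel})
    return sorted(scored, key=lambda r: r["total_paths"], reverse=True)

# Keep only unknown pairs with at least 1000 supporting meta-paths.
ranked = rank_candidates(pairs, known_edges={("drugC", "AD")}, min_total=1000)
for r in ranked:
    print(r["drug"], r["disease"], r["total_paths"], r["top_channel"])
```

The same per-channel columns could be passed unchanged as numeric features to a classifier; the cutoff of 1000 simply mirrors the example threshold mentioned above.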

---

**Conclusion:**  
The meta‐path enumeration reveals nearly **28,000** possible Drug→Disease pairs with nonzero “support” via genes or related biological entities.  By ranking on the total number of meta‐paths, we obtain a prioritized list of candidates—led by drugs like Fostamatinib and Fedratinib—that can serve as a rich training set for supervised link‐prediction or as a set of hypotheses for further biological validation.