# RedrugAI

## Introduction

This notebook demonstrates **RedrugAI**, a drug repurposing recommendation system built on Google BigQuery and BigQuery AI. The system leverages machine learning embeddings and vector similarity search to identify potential therapeutic applications for existing drugs in new disease contexts.

### Dataset
- **Open Targets Platform**: A comprehensive biomedical database containing drug-disease associations, molecular targets, and mechanisms of action accessible through BigQuery's public datasets

### Key BigQuery AI Features
 
- **ML.GENERATE_EMBEDDING**: Leverages Gemini embedding models to create high-quality semantic vectors for diseases and drug mechanisms of action
- **VECTOR_SEARCH**: Uses BigQuery's native vector similarity search to retrieve the diseases whose embeddings are closest to the query disease
- **bigframes.bigquery.create_vector_index()**: Creates optimized vector indices for efficient similarity search across disease embeddings (38,959 diseases)

### Workflow Overview

1. **Data Preparation**: Create embedding tables for diseases and drug mechanisms using BigQuery AI
2. **Similarity Computation**: Build similarity matrices using vector search capabilities  
3. **Recommendation Engine**: Score candidate drugs based on known therapeutic relationships
4. **Evaluation**: Validate recommendations against known drug-disease associations

This approach enables researchers to discover novel therapeutic opportunities by identifying drugs with similar mechanisms or targets that could be effective for related diseases.

```mermaid
graph TD
    subgraph BQ["BigQuery Environment"]
        A["BigQuery Public Dataset<br>(Open Targets Platform)"] --> B{"BigQuery AI<br>(BQML)"}
        B --> C["Disease Embeddings Table"]
        B --> D["Drug MoA Embeddings Table"]
        C -- Vector Index --> E["VECTOR_SEARCH"]
        D -- Self Join --> F["MoA Similarity Matrix"]
    end

    subgraph PY["Python Application (main.ipynb)"]
        G["User Input: Disease Name"] --> H{"Recommendation Engine"}
        E -- Similar Diseases --> H
        F -- MoA Similarities --> H
        A -- "Drug & Disease Data" --> H
        H -- Calls --> I["Scoring Logic<br>(score.py)"]
        I --> J["Ranked Drug Recommendations"]
    end

    subgraph EVAL["Evaluation"]
        K["Known Drug-Disease<br>Associations"] --> L{"Evaluation Logic<br>(evaluation.py)"}
        J -- Recommendations to Evaluate --> L
        L --> M["Evaluation Metrics"]
    end

    style BQ fill:#e6f7ff,stroke:#333,stroke-width:2px
    style PY fill:#e6ffe6,stroke:#333,stroke-width:2px
    style EVAL fill:#fff0e6,stroke:#333,stroke-width:2px
```

## Set up
### BigQuery
- Create BigQuery project 'redrugai' and dataset 'redrugai_data'
- Create a remote Vertex AI model for text embeddings (referenced as 'embedding005' in this notebook)
   https://cloud.google.com/bigquery/docs/generate-text-embedding#console_1
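
As a minimal sketch of the second step, the remote model can also be defined with a `CREATE MODEL` statement rather than through the console. The connection ID `redrugai.us.vertex_connection` is a hypothetical placeholder, and the endpoint `text-embedding-005` is only an assumption suggested by the model name `embedding005`; substitute the values your project actually uses.

```python
def build_remote_embedding_model_ddl(project_id, dataset_id, model_name,
                                     connection_id, endpoint):
    """Build a CREATE MODEL statement for a remote text embedding model."""
    return (
        f"CREATE OR REPLACE MODEL `{project_id}.{dataset_id}.{model_name}`\n"
        f"REMOTE WITH CONNECTION `{connection_id}`\n"
        f"OPTIONS (ENDPOINT = '{endpoint}')"
    )


ddl = build_remote_embedding_model_ddl(
    project_id="redrugai",
    dataset_id="redrugai_data",
    model_name="embedding005",
    connection_id="redrugai.us.vertex_connection",  # hypothetical connection ID
    endpoint="text-embedding-005",                  # assumed endpoint name
)
print(ddl)
```

The printed statement can be pasted into the BigQuery console once the connection exists and its service account has access to Vertex AI.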
### Python
version >= 3.12.7

In [None]:
%pip install --upgrade bigframes
%pip install tqdm

In [1]:
import bigframes.bigquery as bbq
import bigframes.pandas as bpd
region = "US"
project_id = "redrugai"
dataset_id = "redrugai_data"
embedding_model_name = 'embedding005'
source_project_id = "bigquery-public-data"
source_dataset_id = "open_targets_platform"
# Configure BigQuery client  
bpd.options.bigquery.project = project_id
bpd.options.bigquery.location = region


## Prepare Tables
- Only need to run once
- Purpose: Pre-build embedding tables and similarity matrices for efficient vector search on public datasets

### Embedding Table
1. `disease_embedding` table
    - Source table: `disease`
    - Embedded columns (`ML.GENERATE_EMBEDDING`): name, synonyms, description, concatenated into a single text field
    - Vector index to speed up search: `bigframes.bigquery.create_vector_index()`
2. `drug_moa_embedding` table
    - Source table: `drug_mechanism_of_action`
    - Embedded column (`ML.GENERATE_EMBEDDING`): `mechanismOfAction`, deduplicated
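
To make the disease text that gets embedded concrete, here is a small pure-Python sketch of the same concatenation the SQL performs with `CONCAT`, `STRING_AGG(DISTINCT ...)` and `IFNULL`. The function name `build_disease_content` and the sample inputs are illustrative only; sorting the synonyms is a choice made here for reproducibility, since `STRING_AGG` without `ORDER BY` does not guarantee an order.

```python
def build_disease_content(name, synonyms=None, description=None):
    """Mirror the SQL content field: name, then optional synonyms and description."""
    parts = [f"Name: {name}"]
    unique_synonyms = sorted(set(synonyms or []))  # DISTINCT; order chosen here
    if unique_synonyms:
        parts.append("Synonyms: " + ", ".join(unique_synonyms))
    if description is not None:  # a NULL description is skipped
        parts.append(f"Description: {description}")
    return ". ".join(parts)


example = build_disease_content(
    "Chagas disease",
    synonyms=["American trypanosomiasis", "American trypanosomiasis"],
)
print(example)
```

Putting all three fields into one string lets a single embedding capture both the canonical name and its alternative labels.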

In [9]:
# Create disease embeddings table
source_table_name = "disease"
embedding_table_name = "disease_embedding"

# Create table with disease embeddings by concatenating name, synonyms, and description
query_create_embeddings = f"""
CREATE OR REPLACE TABLE `{project_id}.{dataset_id}.{embedding_table_name}` AS
WITH source_table AS (
  SELECT
    d.id,
    d.name,
    d.synonyms,
    CONCAT(
      'Name: ', d.name,
      IFNULL(CONCAT('. Synonyms: ', STRING_AGG(DISTINCT syn.element, ', ')), ''),
      IFNULL(CONCAT('. Description: ', d.description), '')
    ) AS content
  FROM `{source_project_id}.{source_dataset_id}.{source_table_name}` AS d
  LEFT JOIN UNNEST(ARRAY_CONCAT(
    IFNULL(d.synonyms.hasExactSynonym.list, []),
    IFNULL(d.synonyms.hasRelatedSynonym.list, []),
    IFNULL(d.synonyms.hasNarrowSynonym.list, []),
    IFNULL(d.synonyms.hasBroadSynonym.list, [])
  )) AS syn
  GROUP BY d.id, d.name, d.description, d.synonyms
)
SELECT
  s.id,
  s.name,
  s.synonyms,
  e.ml_generate_embedding_result AS embedding
FROM
  source_table s
JOIN
  ML.GENERATE_EMBEDDING(
    MODEL `{project_id}.{dataset_id}.{embedding_model_name}`,
    (SELECT id, content FROM source_table),
    STRUCT(TRUE AS flatten_json_output)
  ) e
ON s.id = e.id
"""

# Execute the query to create the embeddings table
bpd.read_gbq(query_create_embeddings)
print("Disease embeddings table created successfully!")


Disease embeddings table created successfully!


In [10]:
# Create a vector index for efficient searching on the disease embeddings table
# Note: BigQuery requires minimum 5000 rows for IVF index type
# If table has fewer rows, use VECTOR_SEARCH function directly without index
full_table_id = f"{project_id}.{dataset_id}.{embedding_table_name}"
print(full_table_id)

try:
    bbq.create_vector_index(
        table_id=full_table_id,
        column_name='embedding',
    )
    print("Vector index created successfully!")
except Exception as e:
    print(f"Note: Vector index creation failed - {str(e)}")
    print("This is expected for tables with < 5000 rows. VECTOR_SEARCH will work without index.")


redrugai.redrugai_data.disease_embedding
Vector index created successfully!


In [None]:
# Create unique mechanism of action (MoA) embedding table
source_table_name = "drug_mechanism_of_action"
embedding_table_name = "drug_moa_embedding"

# Build embeddings for unique MoA values
query_create_moa_embeddings = f"""
CREATE OR REPLACE TABLE `{project_id}.{dataset_id}.{embedding_table_name}` AS
WITH unique_moa AS (
  SELECT DISTINCT TRIM(mechanismOfAction) AS content
  FROM `{source_project_id}.{source_dataset_id}.{source_table_name}`
  WHERE mechanismOfAction IS NOT NULL 
    AND TRIM(mechanismOfAction) != ''
)
SELECT
  e.content AS mechanismOfAction,
  e.ml_generate_embedding_result AS embedding
FROM ML.GENERATE_EMBEDDING(
  MODEL `{project_id}.{dataset_id}.{embedding_model_name}`,
  TABLE unique_moa
) AS e
ORDER BY mechanismOfAction
"""

# Execute the query to create the MoA embeddings table
bpd.read_gbq(query_create_moa_embeddings)
print("Mechanism of action embeddings table created successfully!")


Mechanism of action embeddings table created successfully!


### MoA Similarity Matrix
Create a comprehensive similarity table for all mechanism of action (MoA) pairs using the drug_moa_embedding table. This matrix will be used for drug recommendation scoring by comparing the similarity between different mechanisms of action.
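
The quantity stored in this table is `1 - cosine distance`, which is the cosine similarity. The sketch below reproduces that computation on toy two-dimensional vectors in plain Python; the MoA labels and vectors are made up for illustration and are not taken from the dataset.

```python
import math
from itertools import product


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings):
    """All ordered pairs, including self-pairs, like the CROSS JOIN in the SQL."""
    return {
        (moa_a, moa_b): cosine_similarity(vec_a, vec_b)
        for (moa_a, vec_a), (moa_b, vec_b) in product(embeddings.items(), repeat=2)
    }


toy_embeddings = {  # illustrative vectors only
    "Enzyme X inhibitor": [1.0, 0.0],
    "Enzyme X antagonist": [1.0, 1.0],
    "Receptor Y agonist": [0.0, 1.0],
}
matrix = similarity_matrix(toy_embeddings)
for pair, sim in sorted(matrix.items(), key=lambda kv: -kv[1])[:4]:
    print(pair, round(sim, 3))
```

Because the cross join keeps both orderings and the self-pairs, n unique MoA strings produce n² rows. The 2,812,329 pairs loaded later in this notebook equal 1,677², which is consistent with that.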

In [11]:
# Build mechanism of action (MoA) pair similarity matrix table
moa_embedding_table_name = "drug_moa_embedding"
similarity_table_name = f"{moa_embedding_table_name}_similarity"

# Create comprehensive MoA pair similarity matrix for drug recommendation scoring
query_create_moa_similarity = f"""
CREATE OR REPLACE TABLE `{project_id}.{dataset_id}.{similarity_table_name}` AS
WITH moa_pairs AS (
  SELECT 
    a.mechanismOfAction AS moa_a,
    b.mechanismOfAction AS moa_b,
    a.embedding AS embedding_a,
    b.embedding AS embedding_b
  FROM `{project_id}.{dataset_id}.{moa_embedding_table_name}` a
  CROSS JOIN `{project_id}.{dataset_id}.{moa_embedding_table_name}` b
)
SELECT
  moa_a,
  moa_b,
  1 - ML.DISTANCE(embedding_a, embedding_b, 'COSINE') AS cosine_similarity
FROM moa_pairs
ORDER BY cosine_similarity DESC
"""

# Execute the query to create the MoA similarity table
bpd.read_gbq(query_create_moa_similarity)
print("Mechanism of action pair similarity matrix table created successfully!")


Mechanism of action pair similarity matrix table created successfully!


### MoA Flat Table
Create a flattened table that maps each drug to its mechanisms of action and target proteins. This table simplifies the complex many-to-many relationships in the original data by creating one row per drug-MoA-target combination, making it easier to analyze drug similarities based on their biological mechanisms.
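
The parent-aware join is the least obvious part of this step, so here is a plain-Python sketch of the same idea on toy records. The IDs, names and the helper `flatten_drug_moa` are hypothetical; the real work is done by the SQL in the next cell.

```python
def flatten_drug_moa(molecules, moa_rows):
    """Return one (molecule_id, drug_name, moa, target_id) row per combination."""
    # Join key: the parent when present, else the molecule itself.
    join_id_of = {m["id"]: m.get("parentId") or m["id"] for m in molecules}

    # Collect distinct (moa, target) pairs under each join key.
    pairs_by_join_id = {}
    for row in moa_rows:
        targets = row.get("targets") or [None]  # keep MoA rows without targets
        for chembl_id in row["chemblIds"]:
            join_id = join_id_of.get(chembl_id, chembl_id)
            for target_id in targets:
                pairs_by_join_id.setdefault(join_id, set()).add((row["moa"], target_id))

    flat = []
    for m in molecules:
        # Drugs with no MoA still get one row, as with the LEFT JOIN.
        pairs = pairs_by_join_id.get(join_id_of[m["id"]], {(None, None)})
        for moa, target_id in sorted(pairs, key=lambda p: (p[0] or "", p[1] or "")):
            flat.append((m["id"], m["name"], moa, target_id))
    return flat


molecules = [  # toy records
    {"id": "CHEMBL1", "name": "PARENT DRUG", "parentId": None},
    {"id": "CHEMBL2", "name": "PARENT DRUG SALT", "parentId": "CHEMBL1"},
    {"id": "CHEMBL3", "name": "NO MOA DRUG", "parentId": None},
]
moa_rows = [
    {"moa": "Enzyme X inhibitor", "chemblIds": ["CHEMBL1"], "targets": ["ENSG_A", "ENSG_B"]},
]
flat = flatten_drug_moa(molecules, moa_rows)
for row in flat:
    print(row)
```

In this toy data the salt form inherits both MoA rows from its parent. That behavior may explain why salt forms of one compound can later receive identical recommendation scores.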


In [19]:
# Create a flattened table that maps drugs to their mechanisms of action and targets
flat_table_name = "drug_moa_flat"

query_create_moa_flat = f"""
CREATE OR REPLACE TABLE `{project_id}.{dataset_id}.{flat_table_name}` AS
WITH
-- A) Drugs with a join key that prefers the parent when present
dm AS (
  SELECT
    id                         AS molecule_id,
    name                       AS drug_name,
    COALESCE(parentId, id)     AS join_id
  FROM bigquery-public-data.open_targets_platform.drug_molecule
),
-- B) Flatten MoA → target pairs from the MoA table
moa_flat AS (
  SELECT
    chembl.element AS moa_molecule_id,
    COALESCE(NULLIF(TRIM(dmoa.mechanismOfAction), ''),
             NULLIF(TRIM(dmoa.actionType), '')) AS moa,
    CAST(tgt.element AS STRING) AS target_id
  FROM bigquery-public-data.open_targets_platform.drug_mechanism_of_action AS dmoa
  CROSS JOIN UNNEST(dmoa.chemblIds.list) AS chembl
  LEFT JOIN UNNEST(dmoa.targets.list) AS tgt ON TRUE
),
-- C) Map MoA rows to the parent when available
moa_parentaware AS (
  SELECT
    COALESCE(dm2.parentId, dm2.id, mf.moa_molecule_id) AS join_id,
    mf.moa_molecule_id,
    mf.moa,
    mf.target_id
  FROM moa_flat AS mf
  LEFT JOIN bigquery-public-data.open_targets_platform.drug_molecule dm2
    ON dm2.id = mf.moa_molecule_id
)
-- D) Final
SELECT
  dm.molecule_id,
  dm.drug_name,
  mpa.moa,
  mpa.target_id
FROM dm
LEFT JOIN moa_parentaware mpa
  USING (join_id)
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY dm.molecule_id, mpa.moa, mpa.target_id
) = 1
"""

# Execute the query to create the flattened MoA table
bpd.read_gbq(query_create_moa_flat)
print("Drug MoA flat table created successfully!")

Drug MoA flat table created successfully!


## Vector Search

### First layer
1. Generate an embedding(`ML.GENERATE_EMBEDDING`) for the query disease name 
2. Search(`VECTOR_SEARCH`) for diseases with similar embeddings in our pre-computed disease embedding table 
3. Filter results by cosine distance threshold to ensure semantic relevance
4. Return a ranked list of similar diseases that could share therapeutic targets
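
The steps above can be sketched as a brute-force search in plain Python: compute the cosine distance from the query vector to every stored vector, keep the `top_k` nearest, then apply the distance threshold. This is an illustration on toy vectors, not what BigQuery runs internally; with a vector index, `VECTOR_SEARCH` may return approximate results.

```python
import math


def cosine_distance(a, b):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_search(query_vec, table, top_k=100, distance_threshold=0.3):
    """Rank rows by distance, keep top_k, then filter by the threshold."""
    scored = sorted(
        ((name, cosine_distance(query_vec, vec)) for name, vec in table.items()),
        key=lambda pair: pair[1],
    )
    return [(name, dist) for name, dist in scored[:top_k] if dist < distance_threshold]


toy_table = {  # illustrative vectors only
    "near disease": [0.9, 0.1],
    "borderline disease": [0.7, 0.7],
    "unrelated disease": [0.0, 1.0],
}
hits = brute_force_search([1.0, 0.0], toy_table)
for name, dist in hits:
    print(f"{name}: {dist:.4f}")
```

The threshold is applied after the `top_k` cut, matching the order of operations in the SQL, so fewer than `top_k` rows can come back.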


In [2]:
# Configuration for vector search
base_table_name = "disease_embedding"
text_embedding_model_name = "embedding005"
query_text = 'trypanosomiasis'
distance_threshold = 0.3

# Construct vector search query to find similar diseases
vector_search_query = f"""
WITH query_table AS (
    SELECT *
    FROM ML.GENERATE_EMBEDDING(
        MODEL `{project_id}.{dataset_id}.{text_embedding_model_name}`,
        (SELECT '{query_text}' AS content)
    )
)
SELECT
    base.id,
    base.name AS disease_name,
    distance
FROM
    VECTOR_SEARCH(
        TABLE `{project_id}.{dataset_id}.{base_table_name}`,
        'embedding',
        (SELECT * FROM query_table),
        'ml_generate_embedding_result',
        top_k => 100,
        distance_type => 'COSINE'
    )
WHERE distance < {distance_threshold}
ORDER BY distance ASC
"""

# Execute the vector search query
similar_disease_df = bpd.read_gbq(vector_search_query)

# Exclude the query disease itself (case-insensitive exact name match)
similar_disease_df = similar_disease_df[
    similar_disease_df['disease_name'].str.lower() != query_text.lower()
].reset_index(drop=True)

# Display the results
print(f"Similar diseases to '{query_text}' (distance < {distance_threshold}):")
print(f"Found {len(similar_disease_df)} similar diseases")
print(similar_disease_df.head(10))

Similar diseases to 'trypanosomiasis' (distance < 0.3):
Found 3 similar diseases


              id                   disease_name  distance
0    EFO_0008559       American trypanosomiasis  0.213539
1    EFO_0005225  human african trypanosomiasis  0.224497
2  MONDO_0001444                 Chagas disease  0.277343

[3 rows x 3 columns]


### Second Layer
1. Find diseases similar to the query disease using vector embeddings (the first layer)
2. Collect the known drugs for these similar diseases
3. Compute drug similarity scores based on mechanism of action (MoA)
4. Rank and filter drug candidates for the target disease
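
The scoring itself lives in `score.py`, which is not shown here. Purely as a hypothetical illustration of step 3, the sketch below scores each candidate drug by its best MoA similarity to any MoA of the known drugs. The function name, inputs and max-based rule are all assumptions; the scores above 1 in the output further down suggest the actual logic aggregates differently.

```python
def score_candidates(known_moas, candidate_moas, moa_similarity, top_n=10):
    """Hypothetical scoring: best MoA similarity between candidate and known drugs."""
    scores = {}
    for drug, moas in candidate_moas.items():
        scores[drug] = max(
            (moa_similarity.get((moa, known), 0.0) for moa in moas for known in known_moas),
            default=0.0,  # drugs without any MoA score zero
        )
    # Highest score first; ties broken by drug name for a stable order.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))[:top_n]


known_moas = {"Enzyme X inhibitor"}  # toy inputs
candidate_moas = {
    "DRUG_A": ["Enzyme X inhibitor"],
    "DRUG_B": ["Receptor Y agonist"],
    "DRUG_C": [],
}
moa_similarity = {
    ("Enzyme X inhibitor", "Enzyme X inhibitor"): 1.0,
    ("Receptor Y agonist", "Enzyme X inhibitor"): 0.4,
}
ranking = score_candidates(known_moas, candidate_moas, moa_similarity)
print(ranking)
```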


In [3]:
# Load required datasets for drug recommendation
print("Loading datasets...")

# Query to get disease-drug relationships with mechanism of action and targets
query_disease_drugs = """
SELECT 
    kd.diseaseId,
    d.name AS disease_name,
    kd.drugId,
    kd.mechanismOfAction AS moa,
    kd.targetId AS target_id
FROM
    `bigquery-public-data.open_targets_platform.known_drug` kd
INNER JOIN
    `bigquery-public-data.open_targets_platform.disease` d
ON
    d.id = kd.diseaseId
"""

# Query to get comprehensive drug information with MOA details
query_all_drugs = """
SELECT *
FROM `redrugai.redrugai_data.drug_moa_flat`
"""

# Query to get pre-computed MOA similarity scores
query_moa_pair_similarity = """
SELECT *
FROM `redrugai.redrugai_data.drug_moa_embedding_similarity`
"""

# Execute queries and load data
print("Executing BigQuery operations...")
all_drugs_df = bpd.read_gbq(query_all_drugs)
disease_drugs_df = bpd.read_gbq(query_disease_drugs)
moa_pair_sim_df = bpd.read_gbq(query_moa_pair_similarity)

print(f"Loaded {len(all_drugs_df)} drugs, {len(disease_drugs_df)} disease-drug relationships, and {len(moa_pair_sim_df)} MOA similarity pairs")

Loading datasets...
Executing BigQuery operations...
Loaded 27499 drugs, 253442 disease-drug relationships, and 2812329 MOA similarity pairs


In [5]:
from score import recommend_for_disease_with_similars

# Execute the recommendation system for the queried disease
result = recommend_for_disease_with_similars(
    disease_name=query_text,
    similar_disease_names=similar_disease_df['disease_name'].to_list(),
    disease_drugs_df=disease_drugs_df,
    all_drugs_df=all_drugs_df,
    moa_pair_sim_df=moa_pair_sim_df,
    top_overall=10,
    top_similar=5,
    similar_weight=0.5,
    evaluation_mode=False
)

In [9]:
# Display recommendation results
print(f"\n🔍 Disease Analysis Results for: {query_text}")
print("=" * 60)

print(f"\n💊 Known Drugs ({len(result['known_drugs'])} found):")
if not result["known_drugs"].empty:
    for idx, row in result["known_drugs"].iterrows():
        print(f"  {idx+1}. {row.get('drug_name', row['drugId'])}")
else:
    print("  No known drugs found for this disease")

print(f"\n🎯 Overall Recommendations ({len(result['overall_recommendations'])} drugs):")
if not result["overall_recommendations"].empty:
    for idx, row in result["overall_recommendations"].iterrows():
        score = row['final_score']
        drug_name = row.get('drug_name', row['drugId'])
        print(f"  {idx+1}. {drug_name} (Score: {score:.3f})")
else:
    print("  No recommendations available")

print(f"\n🔗 Similar Disease Recommendations ({len(result['similar_recommendations'])} drugs):")
if not result["similar_recommendations"].empty:
    for idx, row in result["similar_recommendations"].iterrows():
        score = row['score_against_primary']
        drug_name = row.get('drug_name', row['drugId'])
        print(f"  {idx+1}. {drug_name} (Score: {score:.3f})")
else:
    print("  No similar disease recommendations available")



🔍 Disease Analysis Results for: trypanosomiasis

💊 Known Drugs (5 found):
  1. CHEMBL265502
  2. CHEMBL413376
  3. CHEMBL52440
  4. CHEMBL655
  5. CHEMBL830

🎯 Overall Recommendations (10 drugs):
  1. EFLORNITHINE HYDROCHLORIDE (Score: 6.401)
  2. DEXTROMETHORPHAN POLISTIREX (Score: 5.278)
  3. DEXTROMETHORPHAN HYDROBROMIDE (Score: 5.278)
  4. CARBETAPENTANE CITRATE (Score: 5.264)
  5. CARBETAPENTANE (Score: 5.264)
  6. DIAZEPAM (Score: 3.009)
  7. MIDAZOLAM HYDROCHLORIDE (Score: 3.009)
  8. QUAZEPAM (Score: 3.009)
  9. CHLORDIAZEPOXIDE HYDROCHLORIDE (Score: 3.009)
  10. METHYPRYLON (Score: 3.009)

🔗 Similar Disease Recommendations (1 drugs):
  1. ATORVASTATIN (Score: 0.604)


## Performance Evaluation

**Important Caveat**: This system proposes candidate drugs; whether a candidate is actually useful cannot be established without extensive clinical testing. The metrics below only measure how much the recommendations overlap with drugs already known for each disease. This is not a rigorous academic evaluation, but overlap gives some indication that the approach can surface plausible candidates.

1. Randomly sample 100 diseases that have at least one known drug
2. Generate drug recommendations for each sampled disease
3. Measure the overlap with the disease's known drugs using the following metrics:
    - **Precision**: Fraction of recommended drugs that are already known drugs for the disease
    - **Recall**: Fraction of known drugs that appear among the recommendations
    - **F1 Score**: Harmonic mean of precision and recall
    - **Overlap Count**: Number of drugs that appear in both the recommendations and the known drugs
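
A minimal sketch of these metrics as set operations follows. `overlap_metrics` is an illustrative helper, not the code in `evaluation.py`, and the drug IDs are placeholders.

```python
def overlap_metrics(recommended, known):
    """Precision, recall, F1 and overlap count between two collections of drug IDs."""
    rec, kn = set(recommended), set(known)
    overlap = len(rec & kn)
    precision = overlap / len(rec) if rec else 0.0
    recall = overlap / len(kn) if kn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "overlap": overlap}


# Five recommendations, two known drugs, one drug in common (placeholder IDs).
metrics = overlap_metrics(
    recommended=["D1", "D2", "D3", "D4", "D5"],
    known=["D1", "D9"],
)
print({k: round(v, 3) for k, v in metrics.items()})
```

With one shared drug out of five recommendations and two known drugs, this gives P=0.200, R=0.500 and F1≈0.286, the same values the evaluation log reports for the "Similar" row of Uterine leiomyoma.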

In [11]:
from evaluation import run_disease_evaluation, get_random_diseases_sample
import pandas as pd
import time
from tqdm import tqdm


In [12]:
# Sample 100 random diseases with at least 1 known drug
print("Sampling 100 random diseases...")
random_diseases_df = get_random_diseases_sample(
    n_samples=100,
    project_id=project_id,
    source_project_id=source_project_id,
    source_dataset_id=source_dataset_id,
    min_known_drugs=1
)

print(f"Sampled {len(random_diseases_df)} diseases:")
print(random_diseases_df.head())


Sampling 100 random diseases...
Sampled 100 diseases:


                    disease_name  known_drug_count
0              Uterine leiomyoma                 2
1  idiopathic pulmonary fibrosis                62
2       fetal growth restriction                 6
3                    Cholangitis                 1
4               Immunodeficiency                 3

[5 rows x 2 columns]


In [None]:
# Run evaluation experiments for all sampled diseases
print("Running evaluation experiments...")

# Convert random_diseases_df to pandas for iteration
diseases_list = random_diseases_df.to_pandas()['disease_name'].tolist()

# Store results
evaluation_results = []

# Run experiments with progress bar
for i, disease_name in enumerate(tqdm(diseases_list, desc="Evaluating diseases")):
    print(f"\nEvaluating disease {i+1}/{len(diseases_list)}: {disease_name}")
    
    # Run single disease evaluation
    result = run_disease_evaluation(
        disease_name=disease_name,
        disease_drugs_df=disease_drugs_df,
        all_drugs_df=all_drugs_df,
        moa_pair_sim_df=moa_pair_sim_df,
        disease_embedding_df=None,  # Will be handled inside the function
        project_id=project_id,
        dataset_id=dataset_id,
        embedding_model_name=text_embedding_model_name,
        distance_threshold=distance_threshold,
        top_overall=10,
        top_similar=5,
        similar_weight=0.5
    )
    
    evaluation_results.append(result)
    
    # Print progress summary
    if result['success']:
        metrics = result['metrics']
        print(f"  Success: {result['known_drugs_count']} known drugs, "
              f"{result['overall_recommendations_count']} overall recs, "
              f"{result['similar_recommendations_count']} similar recs")
        if 'overall_precision' in metrics:
            print(f"  Overall: P={metrics['overall_precision']:.3f}, "
                  f"R={metrics['overall_recall']:.3f}, "
                  f"F1={metrics['overall_f1_score']:.3f}")
        if 'similar_precision' in metrics:
            print(f"  Similar: P={metrics['similar_precision']:.3f}, "
                  f"R={metrics['similar_recall']:.3f}, "
                  f"F1={metrics['similar_f1_score']:.3f}")
        if 'combined_precision' in metrics:
            print(f"  Combined: P={metrics['combined_precision']:.3f}, "
                  f"R={metrics['combined_recall']:.3f}, "
                  f"F1={metrics['combined_f1_score']:.3f}")
    else:
        print(f"  Failed: {result['error']}")
    
    # Small delay to avoid overwhelming BigQuery
    time.sleep(0.5)

print(f"\nCompleted evaluation of {len(evaluation_results)} diseases")


Running evaluation experiments...


Evaluating diseases:   0%|          | 0/100 [00:00<?, ?it/s]


Evaluating disease 1/100: Uterine leiomyoma
  Success: 2 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.200, R=0.500, F1=0.286
  Combined: P=0.067, R=0.500, F1=0.118


Evaluating diseases:   1%|          | 1/100 [01:00<1:40:05, 60.66s/it]


Evaluating disease 2/100: idiopathic pulmonary fibrosis
  Success: 62 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=1.000, R=0.081, F1=0.149
  Combined: P=0.333, R=0.081, F1=0.130


Evaluating diseases:   2%|▏         | 2/100 [02:00<1:37:51, 59.91s/it]


Evaluating disease 3/100: fetal growth restriction
  Success: 6 known drugs, 10 overall recs, 0 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.000, R=0.000, F1=0.000


Evaluating diseases:   3%|▎         | 3/100 [02:53<1:31:58, 56.89s/it]


Evaluating disease 4/100: Cholangitis
  Success: 1 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.200, R=1.000, F1=0.333
  Combined: P=0.067, R=1.000, F1=0.125


Evaluating diseases:   4%|▍         | 4/100 [03:47<1:29:22, 55.86s/it]


Evaluating disease 5/100: Immunodeficiency
  Success: 3 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.600, R=1.000, F1=0.750
  Combined: P=0.200, R=1.000, F1=0.333


Evaluating diseases:   5%|▌         | 5/100 [04:42<1:28:00, 55.58s/it]


Evaluating disease 6/100: refractive error
  Success: 1 known drugs, 10 overall recs, 0 similar recs
  Overall: P=0.100, R=1.000, F1=0.182
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.100, R=1.000, F1=0.182


Evaluating diseases:   6%|▌         | 6/100 [05:33<1:24:23, 53.87s/it]


Evaluating disease 7/100: berylliosis
  Success: 3 known drugs, 10 overall recs, 0 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.000, R=0.000, F1=0.000


Evaluating diseases:   7%|▋         | 7/100 [06:23<1:21:26, 52.54s/it]


Evaluating disease 8/100: X-Linked Combined Immunodeficiency Diseases
  Success: 1 known drugs, 10 overall recs, 1 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=1.000, R=1.000, F1=1.000
  Combined: P=0.091, R=1.000, F1=0.167


Evaluating diseases:   8%|▊         | 8/100 [07:17<1:21:28, 53.14s/it]


Evaluating disease 9/100: contracture
  Success: 2 known drugs, 10 overall recs, 1 similar recs
  Overall: P=0.200, R=1.000, F1=0.333
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.182, R=1.000, F1=0.308


Evaluating diseases:   9%|▉         | 9/100 [08:12<1:21:37, 53.82s/it]


Evaluating disease 10/100: chronic prostatitis
  Success: 1 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.100, R=1.000, F1=0.182
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.067, R=1.000, F1=0.125


Evaluating diseases:  10%|█         | 10/100 [09:10<1:22:20, 54.89s/it]


Evaluating disease 11/100: clostridium difficile infection
  Success: 2 known drugs, 10 overall recs, 0 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.000, R=0.000, F1=0.000


Evaluating diseases:  11%|█         | 11/100 [10:06<1:22:15, 55.46s/it]


Evaluating disease 12/100: precursor T-cell lymphoblastic leukemia-lymphoma
  Success: 6 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.200, R=0.167, F1=0.182
  Combined: P=0.067, R=0.167, F1=0.095


Evaluating diseases:  12%|█▏        | 12/100 [11:38<1:37:42, 66.62s/it]


Evaluating disease 13/100: Metrorrhagia
  Success: 5 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.200, R=0.400, F1=0.267
  Similar: P=0.400, R=0.400, F1=0.400
  Combined: P=0.267, R=0.800, F1=0.400


Evaluating diseases:  13%|█▎        | 13/100 [12:31<1:30:31, 62.44s/it]


Evaluating disease 14/100: Epiphora
  Success: 5 known drugs, 10 overall recs, 0 similar recs
  Overall: P=0.100, R=0.200, F1=0.133
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.100, R=0.200, F1=0.133


Evaluating diseases:  14%|█▍        | 14/100 [13:28<1:27:07, 60.78s/it]


Evaluating disease 15/100: advanced heart failure
  Success: 1 known drugs, 10 overall recs, 0 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.000, R=0.000, F1=0.000


Evaluating diseases:  15%|█▌        | 15/100 [14:21<1:22:52, 58.50s/it]


Evaluating disease 16/100: open-angle glaucoma
  Success: 27 known drugs, 10 overall recs, 0 similar recs
  Overall: P=0.100, R=0.037, F1=0.054
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.100, R=0.037, F1=0.054


Evaluating diseases:  16%|█▌        | 16/100 [15:23<1:23:19, 59.52s/it]


Evaluating disease 17/100: myelodysplastic syndrome
  Success: 190 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.200, R=0.011, F1=0.020
  Similar: P=0.600, R=0.016, F1=0.031
  Combined: P=0.333, R=0.026, F1=0.049


Evaluating diseases:  17%|█▋        | 17/100 [16:43<1:30:38, 65.52s/it]


Evaluating disease 18/100: acute stress reaction
  Success: 2 known drugs, 10 overall recs, 0 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.000, R=0.000, F1=0.000


Evaluating diseases:  18%|█▊        | 18/100 [17:37<1:25:05, 62.26s/it]


Evaluating disease 19/100: HIV wasting syndrome
  Success: 5 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.200, R=0.400, F1=0.267
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.133, R=0.400, F1=0.200


Evaluating diseases:  19%|█▉        | 19/100 [18:30<1:20:04, 59.31s/it]


Evaluating disease 20/100: ear infection
  Success: 1 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.000, R=0.000, F1=0.000


Evaluating diseases:  20%|██        | 20/100 [19:22<1:16:22, 57.28s/it]


Evaluating disease 21/100: hemorrhage
  Success: 77 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.600, R=0.039, F1=0.073
  Combined: P=0.200, R=0.039, F1=0.065


Evaluating diseases:  21%|██        | 21/100 [20:28<1:18:30, 59.63s/it]


Evaluating disease 22/100: acute erythroleukemia
  Success: 14 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.400, R=0.143, F1=0.211
  Combined: P=0.133, R=0.143, F1=0.138


Evaluating diseases:  22%|██▏       | 22/100 [22:51<1:50:01, 84.63s/it]


Evaluating disease 23/100: Dysmenorrhea
  Success: 29 known drugs, 10 overall recs, 0 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.000, R=0.000, F1=0.000


Evaluating diseases:  23%|██▎       | 23/100 [23:43<1:36:22, 75.10s/it]


Evaluating disease 24/100: familial lipoprotein lipase deficiency
  Success: 5 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.100, R=0.200, F1=0.133
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.067, R=0.200, F1=0.100


Evaluating diseases:  24%|██▍       | 24/100 [24:38<1:27:27, 69.04s/it]


Evaluating disease 25/100: Niemann-Pick disease
  Success: 1 known drugs, 10 overall recs, 4 similar recs
  Overall: P=0.100, R=1.000, F1=0.182
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.071, R=1.000, F1=0.133


Evaluating diseases:  25%|██▌       | 25/100 [25:35<1:21:46, 65.41s/it]


Evaluating disease 26/100: otitis media with effusion
  Success: 4 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.100, R=0.250, F1=0.143
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.067, R=0.250, F1=0.105


Evaluating diseases:  26%|██▌       | 26/100 [26:31<1:17:16, 62.65s/it]


Evaluating disease 27/100: generalized lipodystrophy
  Success: 1 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.100, R=1.000, F1=0.182
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.067, R=1.000, F1=0.125


Evaluating diseases:  27%|██▋       | 27/100 [27:22<1:11:53, 59.09s/it]


Evaluating disease 28/100: Premature rupture of membranes
  Success: 1 known drugs, 10 overall recs, 3 similar recs
  Overall: P=0.100, R=1.000, F1=0.182
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.077, R=1.000, F1=0.143


Evaluating diseases:  28%|██▊       | 28/100 [28:12<1:07:37, 56.36s/it]


Evaluating disease 29/100: hereditary angioedema
  Success: 13 known drugs, 10 overall recs, 0 similar recs
  Overall: P=0.200, R=0.154, F1=0.174
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.200, R=0.154, F1=0.174


Evaluating diseases:  29%|██▉       | 29/100 [29:08<1:06:25, 56.13s/it]


Evaluating disease 30/100: carcinoid heart disease
  Success: 2 known drugs, 10 overall recs, 0 similar recs
  Overall: P=0.200, R=1.000, F1=0.333
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.200, R=1.000, F1=0.333


Evaluating diseases:  30%|███       | 30/100 [29:59<1:03:53, 54.76s/it]


Evaluating disease 31/100: ovarian leiomyosarcoma
  Success: 1 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.200, R=1.000, F1=0.333
  Combined: P=0.067, R=1.000, F1=0.125


Evaluating diseases:  31%|███       | 31/100 [31:26<1:14:01, 64.37s/it]


Evaluating disease 32/100: skin neoplasm
  Success: 11 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.200, R=0.091, F1=0.125
  Combined: P=0.067, R=0.091, F1=0.077


Evaluating diseases:  32%|███▏      | 32/100 [32:20<1:09:30, 61.34s/it]


Evaluating disease 33/100: reactive arthritis
  Success: 4 known drugs, 10 overall recs, 0 similar recs
  Overall: P=0.200, R=0.500, F1=0.286
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.200, R=0.500, F1=0.286


Evaluating diseases:  33%|███▎      | 33/100 [33:10<1:04:31, 57.79s/it]


Evaluating disease 34/100: calcinosis
  Success: 1 known drugs, 10 overall recs, 2 similar recs
  Overall: P=0.100, R=1.000, F1=0.182
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.083, R=1.000, F1=0.154


Evaluating diseases:  34%|███▍      | 34/100 [34:00<1:01:09, 55.60s/it]


Evaluating disease 35/100: glaucoma
  Success: 63 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.300, R=0.048, F1=0.082
  Similar: P=0.800, R=0.063, F1=0.118
  Combined: P=0.467, R=0.111, F1=0.179


Evaluating diseases:  35%|███▌      | 35/100 [34:54<59:29, 54.92s/it]  


Evaluating disease 36/100: pharyngeal squamous cell carcinoma
  Success: 1 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.200, R=1.000, F1=0.333
  Combined: P=0.067, R=1.000, F1=0.125


Evaluating diseases:  36%|███▌      | 36/100 [36:22<1:09:22, 65.04s/it]


Evaluating disease 37/100: ulcer disease
  Success: 13 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.000, R=0.000, F1=0.000


Evaluating diseases:  37%|███▋      | 37/100 [37:14<1:04:03, 61.01s/it]


Evaluating disease 38/100: acquired hemophilia
  Success: 3 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.400, R=0.667, F1=0.500
  Combined: P=0.133, R=0.667, F1=0.222


Evaluating diseases:  38%|███▊      | 38/100 [38:08<1:00:42, 58.76s/it]


Evaluating disease 39/100: cocaine use disorder
  Success: 23 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.100, R=0.043, F1=0.061
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.067, R=0.043, F1=0.053


Evaluating diseases:  39%|███▉      | 39/100 [39:07<59:55, 58.94s/it]  


Evaluating disease 40/100: carcinoma of liver and intrahepatic biliary tract
  Success: 12 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.400, R=0.167, F1=0.235
  Combined: P=0.133, R=0.167, F1=0.148


Evaluating diseases:  40%|████      | 40/100 [40:49<1:12:01, 72.03s/it]


Evaluating disease 41/100: endocarditis
  Success: 1 known drugs, 10 overall recs, 1 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=1.000, R=1.000, F1=1.000
  Combined: P=0.091, R=1.000, F1=0.167


Evaluating diseases:  41%|████      | 41/100 [41:41<1:04:45, 65.85s/it]


Evaluating disease 42/100: refractory hairy cell leukemia
  Success: 5 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.100, R=0.200, F1=0.133
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.067, R=0.200, F1=0.100


Evaluating diseases:  42%|████▏     | 42/100 [42:35<1:00:12, 62.28s/it]


Evaluating disease 43/100: neoplasm of mature T-cells or NK-cells
  Success: 8 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.600, R=0.375, F1=0.462
  Combined: P=0.200, R=0.375, F1=0.261


Evaluating diseases:  43%|████▎     | 43/100 [43:26<55:53, 58.83s/it]  


Evaluating disease 44/100: chronic fatigue syndrome
  Success: 4 known drugs, 10 overall recs, 0 similar recs
  Overall: P=0.100, R=0.250, F1=0.143
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.100, R=0.250, F1=0.143


Evaluating diseases:  44%|████▍     | 44/100 [44:16<52:30, 56.26s/it]


Evaluating disease 45/100: vitamin D deficiency
  Success: 7 known drugs, 10 overall recs, 1 similar recs
  Overall: P=0.200, R=0.286, F1=0.235
  Similar: P=1.000, R=0.143, F1=0.250
  Combined: P=0.273, R=0.429, F1=0.333


Evaluating diseases:  45%|████▌     | 45/100 [45:07<50:03, 54.61s/it]


Evaluating disease 46/100: Chest pain
  Success: 2 known drugs, 10 overall recs, 0 similar recs
  Overall: P=0.100, R=0.500, F1=0.167
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.100, R=0.500, F1=0.167


Evaluating diseases:  46%|████▌     | 46/100 [46:00<48:53, 54.32s/it]


Evaluating disease 47/100: dedifferentiated chondrosarcoma
  Success: 1 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.200, R=1.000, F1=0.333
  Combined: P=0.067, R=1.000, F1=0.125


Evaluating diseases:  47%|████▋     | 47/100 [46:59<49:12, 55.70s/it]


Evaluating disease 48/100: pseudohypoaldosteronism
  Success: 1 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.000, R=0.000, F1=0.000


Evaluating diseases:  48%|████▊     | 48/100 [47:55<48:21, 55.80s/it]


Evaluating disease 49/100: Histiocytosis
  Success: 7 known drugs, 10 overall recs, 0 similar recs
  Overall: P=0.300, R=0.429, F1=0.353
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.300, R=0.429, F1=0.353


Evaluating diseases:  49%|████▉     | 49/100 [48:51<47:30, 55.90s/it]


Evaluating disease 50/100: limb-girdle muscular dystrophy
  Success: 2 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.200, R=0.500, F1=0.286
  Combined: P=0.067, R=0.500, F1=0.118


Evaluating diseases:  50%|█████     | 50/100 [49:53<48:04, 57.68s/it]


Evaluating disease 51/100: luminal B breast carcinoma
  Success: 3 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.200, R=0.667, F1=0.308
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.133, R=0.667, F1=0.222


Evaluating diseases:  51%|█████     | 51/100 [50:48<46:26, 56.88s/it]


Evaluating disease 52/100: vulvar intraepithelial neoplasia
  Success: 1 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.100, R=1.000, F1=0.182
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.067, R=1.000, F1=0.125


Evaluating diseases:  52%|█████▏    | 52/100 [51:46<45:42, 57.13s/it]


Evaluating disease 53/100: HER2 Positive Breast Carcinoma
  Success: 21 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.400, R=0.095, F1=0.154
  Combined: P=0.133, R=0.095, F1=0.111


Evaluating diseases:  53%|█████▎    | 53/100 [52:53<47:01, 60.02s/it]


Evaluating disease 54/100: adult acute respiratory distress syndrome
  Success: 15 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.600, R=0.200, F1=0.300
  Combined: P=0.200, R=0.200, F1=0.200


Evaluating diseases:  54%|█████▍    | 54/100 [53:51<45:34, 59.44s/it]


Evaluating disease 55/100: severe hemophilia A
  Success: 5 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.400, R=0.400, F1=0.400
  Combined: P=0.133, R=0.400, F1=0.200


Evaluating diseases:  55%|█████▌    | 55/100 [54:50<44:25, 59.24s/it]


Evaluating disease 56/100: Uveal Melanoma
  Success: 43 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.000, R=0.000, F1=0.000


Evaluating diseases:  56%|█████▌    | 56/100 [56:16<49:27, 67.44s/it]


Evaluating disease 57/100: placental site trophoblastic tumor
  Success: 2 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.400, R=1.000, F1=0.571
  Combined: P=0.133, R=1.000, F1=0.235


Evaluating diseases:  57%|█████▋    | 57/100 [57:13<46:07, 64.35s/it]


Evaluating disease 58/100: methylmalonic acidemia
  Success: 1 known drugs, 10 overall recs, 3 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.333, R=1.000, F1=0.500
  Combined: P=0.077, R=1.000, F1=0.143


Evaluating diseases:  58%|█████▊    | 58/100 [58:09<43:14, 61.78s/it]


Evaluating disease 59/100: congenital mitral valve insufficiency
  Success: 4 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.200, R=0.500, F1=0.286
  Similar: P=0.000, R=0.000, F1=0.000
  Combined: P=0.133, R=0.500, F1=0.211


Evaluating diseases:  59%|█████▉    | 59/100 [59:03<40:36, 59.42s/it]


Evaluating disease 60/100: anaplastic meningioma
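
The `P`, `R`, and `F1` values in the log above are consistent with set-based precision, recall, and F1 computed per disease, where the "Combined" row scores the union of the overall and similar-disease recommendations. The evaluation function itself is not shown in this section, so the following is only a minimal sketch under that assumption. The function name `precision_recall_f1` and the drug identifiers are hypothetical; the set sizes are chosen to mirror the glaucoma entry (63 known drugs, 10 overall recommendations, 5 similar-disease recommendations).

```python
def precision_recall_f1(recommended, known):
    """Set-based precision, recall, and F1 for a single disease."""
    recommended, known = set(recommended), set(known)
    hits = len(recommended & known)  # recommended drugs that are already known treatments
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(known) if known else 0.0
    # F1 is the harmonic mean of precision and recall; define it as 0 when there are no hits
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical identifiers (assumption): 3 hits among the overall recommendations
# and 4 different hits among the similar-disease recommendations.
known = {f"known_{i}" for i in range(63)}
overall = [f"known_{i}" for i in range(3)] + [f"other_{i}" for i in range(7)]
similar = [f"known_{i}" for i in range(3, 7)] + ["other_sim"]
combined = set(overall) | set(similar)  # union removes any duplicate recommendations

for label, recs in [("Overall", overall), ("Similar", similar), ("Combined", combined)]:
    p, r, f1 = precision_recall_f1(recs, known)
    print(f"  {label}: P={p:.3f}, R={r:.3f}, F1={f1:.3f}")
```

With these inputs the sketch reproduces the three glaucoma rows of the log. Because the combined set is a union, its precision can fall below the better of the two individual lists even when recall improves, which is the pattern seen for several diseases above.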


In [10]:
# Analyze evaluation results
print("=== EVALUATION ANALYSIS ===")

# Filter successful evaluations only
successful_results = results_df[results_df['success'] == True]

if len(successful_results) > 0:
    print(f"\nAnalyzing {len(successful_results)} successful evaluations:")
    
    # Overall performance metrics
    numeric_cols = successful_results.select_dtypes(include=[float, int]).columns
    metric_cols = [col for col in numeric_cols if any(keyword in col.lower() 
                   for keyword in ['precision', 'recall', 'f1_score'])]
    
    if metric_cols:
        print("\nPerformance Metrics Summary:")
        for col in metric_cols:
            # metric_cols is drawn from successful_results.columns, so no membership check is needed
            mean_val = successful_results[col].mean()
            std_val = successful_results[col].std()  # sample standard deviation (pandas default, ddof=1)
            print(f"  {col}: {mean_val:.4f} ± {std_val:.4f}")
    
    # Count-based statistics
    print(f"\nCount Statistics:")
    print(f"  Average known drugs per disease: {successful_results['known_drugs_count'].mean():.2f}")
    print(f"  Average recommendations per disease: {successful_results['overall_recommendations_count'].mean():.2f}")
    print(f"  Average similar diseases found: {successful_results['similar_diseases_count'].mean():.2f}")
    
    # Overlap analysis
    if 'overall_overlap_count' in successful_results.columns:
        overlap_stats = successful_results['overall_overlap_count']
        print(f"\nOverlap Analysis:")
        print(f"  Diseases with at least 1 overlap: {(overlap_stats > 0).sum()}/{len(successful_results)}")
        print(f"  Average overlap count: {overlap_stats.mean():.2f}")
        print(f"  Max overlap count: {overlap_stats.max()}")

    # Save all results (successful and failed evaluations) to CSV
    results_df.to_csv('evaluation_results.csv', index=False)
    print("\nResults saved to 'evaluation_results.csv'")
    
else:
    print("No successful evaluations to analyze.")

print("\n=== EVALUATION COMPLETE ===")

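# Display final summary
# Derive both counts from the filtered frame so they stay consistent even if
# the 'success' column is not a strict boolean dtype (where `~` would fail)
n_total = len(results_df)
n_success = len(successful_results)
print("\nFinal Summary:")
print(f"Total diseases evaluated: {n_total}")
print(f"Successful evaluations: {n_success}")
print(f"Failed evaluations: {n_total - n_success}")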

if len(successful_results) > 0 and 'overall_precision' in successful_results.columns:
    # Overall metrics
    avg_precision = successful_results['overall_precision'].mean()
    avg_recall = successful_results['overall_recall'].mean()
    avg_f1 = successful_results['overall_f1_score'].mean()
    print(f"Average Overall Precision: {avg_precision:.4f}")
    print(f"Average Overall Recall: {avg_recall:.4f}")
    print(f"Average Overall F1-Score: {avg_f1:.4f}")
    
    # Similar disease metrics
    if 'similar_precision' in successful_results.columns:
        sim_precision = successful_results['similar_precision'].mean()
        sim_recall = successful_results['similar_recall'].mean()
        sim_f1 = successful_results['similar_f1_score'].mean()
        print(f"Average Similar Precision: {sim_precision:.4f}")
        print(f"Average Similar Recall: {sim_recall:.4f}")
        print(f"Average Similar F1-Score: {sim_f1:.4f}")
    
    # Combined metrics  
    if 'combined_precision' in successful_results.columns:
        comb_precision = successful_results['combined_precision'].mean()
        comb_recall = successful_results['combined_recall'].mean()
        comb_f1 = successful_results['combined_f1_score'].mean()
        print(f"Average Combined Precision: {comb_precision:.4f}")
        print(f"Average Combined Recall: {comb_recall:.4f}")
        print(f"Average Combined F1-Score: {comb_f1:.4f}")


=== EVALUATION ANALYSIS ===

Analyzing 5 successful evaluations:

Performance Metrics Summary:
  overall_precision: 0.2200 ± 0.2280
  overall_recall: 0.1955 ± 0.2363
  overall_f1_score: 0.1148 ± 0.1089
  similar_precision: 0.1600 ± 0.3578
  similar_recall: 0.0087 ± 0.0194
  similar_f1_score: 0.0165 ± 0.0369
  combined_precision: 0.2174 ± 0.2385
  combined_recall: 0.1998 ± 0.2335
  combined_f1_score: 0.1134 ± 0.1075

Count Statistics:
  Average known drugs per disease: 55.60
  Average recommendations per disease: 10.00
  Average similar diseases found: 3.20

Overlap Analysis:
  Diseases with at least 1 overlap: 4/5
  Average overlap count: 2.20
  Max overlap count: 6

Results saved to 'evaluation_results.csv'

=== EVALUATION COMPLETE ===

Final Summary:
Total diseases evaluated: 5
Successful evaluations: 5
Failed evaluations: 0
Average Overall Precision: 0.2200
Average Overall Recall: 0.1955
Average Overall F1-Score: 0.1148
Average Similar Precision: 0.1600
Average Similar Recall: 0.008