# RedrugAI

## Introduction

This notebook demonstrates **RedrugAI**, a drug repurposing recommendation system built on Google BigQuery and BigQuery AI. The system leverages machine learning embeddings and vector similarity search to identify potential therapeutic applications for existing drugs in new disease contexts.

### Dataset
- **Open Targets Platform**: A comprehensive biomedical database containing drug-disease associations, molecular targets, and mechanisms of action accessible through BigQuery's public datasets

### Key BigQuery AI Features
 
- **ML.GENERATE_EMBEDDING**: Leverages Gemini embedding models to create high-quality semantic vectors for diseases and drug mechanisms of action
- **VECTOR_SEARCH**: Uses BigQuery's native vector similarity search to find diseases similar to a query disease
- **bigframes.bigquery.create_vector_index()**: Creates optimized vector indices for efficient similarity search across disease embeddings (38,959 diseases)

### Workflow Overview

1. **Data Preparation**: Create embedding tables for diseases and drug mechanisms using BigQuery AI
2. **Similarity Computation**: Build similarity matrices using vector search capabilities  
3. **Recommendation Engine**: Score candidate drugs based on known therapeutic relationships
4. **Evaluation**: Validate recommendations against known drug-disease associations

This approach enables researchers to discover novel therapeutic opportunities by identifying drugs with similar mechanisms or targets that could be effective for related diseases.

## Set up
### BigQuery
- Create a BigQuery project 'redrugai' and a dataset 'redrugai_data'
- Create a remote Vertex AI model for text embeddings
   https://cloud.google.com/bigquery/docs/generate-text-embedding#console_1
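
One way to create the remote model is with a `CREATE MODEL` statement. The sketch below only builds the DDL string. The connection name `vertex_conn` and the endpoint `text-embedding-005` are assumptions (the notebook only fixes the model name `embedding005`), so substitute the values from your own project.

```python
def build_remote_model_ddl(project_id, dataset_id, model_name,
                           connection_id, endpoint, location="us"):
    """Return a CREATE MODEL statement for a BigQuery remote embedding model."""
    return (
        f"CREATE OR REPLACE MODEL `{project_id}.{dataset_id}.{model_name}`\n"
        f"REMOTE WITH CONNECTION `{project_id}.{location}.{connection_id}`\n"
        f"OPTIONS (ENDPOINT = '{endpoint}')"
    )


ddl = build_remote_model_ddl(
    project_id="redrugai",
    dataset_id="redrugai_data",
    model_name="embedding005",
    connection_id="vertex_conn",    # hypothetical connection name
    endpoint="text-embedding-005",  # assumed Vertex AI endpoint
)
print(ddl)
```

The statement can be run from the BigQuery console once a Cloud resource connection exists and its service account has been granted access to Vertex AI.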
### Python
version >= 3.12.7

In [None]:
%pip install --upgrade bigframes
%pip install tqdm

In [1]:
import bigframes.bigquery as bbq
import bigframes.pandas as bpd
region = "US"
project_id = "redrugai"
dataset_id = "redrugai_data"
embedding_model_name = 'embedding005'
source_project_id = "bigquery-public-data"
source_dataset_id = "open_targets_platform"
# Configure BigQuery client  
bpd.options.bigquery.project = project_id
bpd.options.bigquery.location = region


## Prepare Tables
- Only need to run once
- Purpose: Pre-build embedding tables and similarity matrices for efficient vector search on public datasets

### Embedding Table
1. disease_embedding table
    - Source Table: disease
    - Embedded Columns: name, synonyms, description (concatenated into a single text field)
2. drug_moa_embedding table
    - Source Table: drug_mechanism_of_action
    - Embedded Column: mechanismOfAction (unique, non-empty values)
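
As a rough illustration of the text that gets embedded for each disease, the following pure-Python sketch mirrors the concatenation done in SQL in the next cell. The function name and the sample values are hypothetical, and it differs from the SQL in two small ways: it keeps synonyms in first-seen order, and it skips empty descriptions as well as missing ones.

```python
def build_disease_content(name, synonyms=None, description=None):
    """Build 'Name: ... . Synonyms: ... . Description: ...' for one disease."""
    parts = [f"Name: {name}"]
    # Drop duplicate synonyms while keeping first-seen order
    unique_synonyms = list(dict.fromkeys(synonyms or []))
    if unique_synonyms:
        parts.append("Synonyms: " + ", ".join(unique_synonyms))
    if description:
        parts.append(f"Description: {description}")
    return ". ".join(parts)


# Placeholder values, not taken from the dataset
example = build_disease_content(
    "example disease",
    synonyms=["synonym a", "synonym b", "synonym a"],
    description="a short description",
)
print(example)
```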

In [9]:
# Create disease embeddings table
source_table_name = "disease"
embedding_table_name = "disease_embedding"

# Create table with disease embeddings by concatenating name, synonyms, and description
query_create_embeddings = f"""
CREATE OR REPLACE TABLE `{project_id}.{dataset_id}.{embedding_table_name}` AS
WITH source_table AS (
  SELECT
    d.id,
    d.name,
    d.synonyms,
    CONCAT(
      'Name: ', d.name,
      IFNULL(CONCAT('. Synonyms: ', STRING_AGG(DISTINCT syn.element, ', ')), ''),
      IFNULL(CONCAT('. Description: ', d.description), '')
    ) AS content
  FROM `{source_project_id}.{source_dataset_id}.{source_table_name}` AS d
  LEFT JOIN UNNEST(ARRAY_CONCAT(
    IFNULL(d.synonyms.hasExactSynonym.list, []),
    IFNULL(d.synonyms.hasRelatedSynonym.list, []),
    IFNULL(d.synonyms.hasNarrowSynonym.list, []),
    IFNULL(d.synonyms.hasBroadSynonym.list, [])
  )) AS syn
  GROUP BY d.id, d.name, d.description, d.synonyms
)
SELECT
  s.id,
  s.name,
  s.synonyms,
  e.ml_generate_embedding_result AS embedding
FROM
  source_table s
JOIN
  ML.GENERATE_EMBEDDING(
    MODEL `{project_id}.{dataset_id}.{embedding_model_name}`,
    (SELECT id, content FROM source_table),
    STRUCT(TRUE AS flatten_json_output)
  ) e
ON s.id = e.id
"""

# Execute the query to create the embeddings table
bpd.read_gbq(query_create_embeddings)
print("Disease embeddings table created successfully!")


Disease embeddings table created successfully!


In [10]:
# Create a vector index for efficient searching on the disease embeddings table
# Note: BigQuery requires minimum 5000 rows for IVF index type
# If table has fewer rows, use VECTOR_SEARCH function directly without index
full_table_id = f"{project_id}.{dataset_id}.{embedding_table_name}"
print(full_table_id)

try:
    bbq.create_vector_index(
        table_id=full_table_id,
        column_name='embedding',
    )
    print("Vector index created successfully!")
except Exception as e:
    print(f"Note: Vector index creation failed - {str(e)}")
    print("This is expected for tables with < 5000 rows. VECTOR_SEARCH will work without index.")


redrugai.redrugai_data.disease_embedding
Vector index created successfully!


In [None]:
# Create unique mechanism of action (MoA) embedding table
source_table_name = "drug_mechanism_of_action"
embedding_table_name = "drug_moa_embedding"

# Build embeddings for unique MoA values
query_create_moa_embeddings = f"""
CREATE OR REPLACE TABLE `{project_id}.{dataset_id}.{embedding_table_name}` AS
WITH unique_moa AS (
  SELECT DISTINCT TRIM(mechanismOfAction) AS content
  FROM `{source_project_id}.{source_dataset_id}.{source_table_name}`
  WHERE mechanismOfAction IS NOT NULL 
    AND TRIM(mechanismOfAction) != ''
)
SELECT
  e.content AS mechanismOfAction,
  e.ml_generate_embedding_result AS embedding
FROM ML.GENERATE_EMBEDDING(
  MODEL `{project_id}.{dataset_id}.{embedding_model_name}`,
  TABLE unique_moa
) AS e
ORDER BY mechanismOfAction
"""

# Execute the query to create the MoA embeddings table
bpd.read_gbq(query_create_moa_embeddings)
print("Mechanism of action embeddings table created successfully!")


Mechanism of action embeddings table created successfully!


### MoA Similarity Matrix
Create a comprehensive similarity table for all mechanism of action (MoA) pairs using the drug_moa_embedding table. This matrix will be used for drug recommendation scoring by comparing the similarity between different mechanisms of action.
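
To make the computation concrete, here is a small pure-Python sketch of the same idea: every embedding is paired with every other one (including itself), and each pair gets a cosine similarity, which equals one minus the cosine distance. The labels and two-dimensional vectors are toy values chosen for illustration only.

```python
import math
from itertools import product


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_similarity(embeddings):
    """Return (label_a, label_b, similarity) for all ordered pairs, like a cross join."""
    rows = [
        (a, b, cosine_similarity(embeddings[a], embeddings[b]))
        for a, b in product(embeddings, repeat=2)
    ]
    # Most similar pairs first
    return sorted(rows, key=lambda row: row[2], reverse=True)


toy_embeddings = {
    "moa_a": [1.0, 0.0],
    "moa_b": [1.0, 1.0],
    "moa_c": [0.0, 1.0],
}
rows = pairwise_similarity(toy_embeddings)
print(len(rows))  # 3 labels give 3 * 3 ordered pairs
```

A full cross join over n embeddings yields n² rows, which is consistent with the 2,812,329 (= 1,677²) similarity pairs reported when the table is loaded later in the notebook.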

In [11]:
# Build mechanism of action (MoA) pair similarity matrix table
moa_embedding_table_name = "drug_moa_embedding"
similarity_table_name = f"{moa_embedding_table_name}_similarity"

# Create comprehensive MoA pair similarity matrix for drug recommendation scoring
query_create_moa_similarity = f"""
CREATE OR REPLACE TABLE `{project_id}.{dataset_id}.{similarity_table_name}` AS
WITH moa_pairs AS (
  SELECT 
    a.mechanismOfAction AS moa_a,
    b.mechanismOfAction AS moa_b,
    a.embedding AS embedding_a,
    b.embedding AS embedding_b
  FROM `{project_id}.{dataset_id}.{moa_embedding_table_name}` a
  CROSS JOIN `{project_id}.{dataset_id}.{moa_embedding_table_name}` b
)
SELECT
  moa_a,
  moa_b,
  1 - ML.DISTANCE(embedding_a, embedding_b, 'COSINE') AS cosine_similarity
FROM moa_pairs
ORDER BY cosine_similarity DESC
"""

# Execute the query to create the MoA similarity table
bpd.read_gbq(query_create_moa_similarity)
print("Mechanism of action pair similarity matrix table created successfully!")


Mechanism of action pair similarity matrix table created successfully!


### MoA Flat Table
Create a flattened table that maps each drug to its mechanisms of action and target proteins. This table simplifies the complex many-to-many relationships in the original data by creating one row per drug-MoA-target combination, making it easier to analyze drug similarities based on their biological mechanisms.
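
The flattening step can be pictured with plain Python data. This is a simplified sketch under assumed field names (`chembl_ids`, `moa`, `targets` are hypothetical keys), and it leaves out the parent-molecule mapping that the SQL below performs.

```python
def flatten_moa_records(records):
    """Expand nested MoA records into unique (drug_id, moa, target_id) rows."""
    rows = set()
    for record in records:
        # Keep MoA rows that have no target, similar to a LEFT JOIN on targets
        targets = record.get("targets") or [None]
        for drug_id in record.get("chembl_ids", []):
            for target_id in targets:
                rows.add((drug_id, record["moa"], target_id))
    # Sort with None treated as an empty string so the output order is stable
    return sorted(rows, key=lambda row: tuple("" if x is None else x for x in row))


# Placeholder identifiers for illustration only
toy_records = [
    {"chembl_ids": ["D1", "D2"], "moa": "moa_x", "targets": ["T1", "T2"]},
    {"chembl_ids": ["D1"], "moa": "moa_y", "targets": []},
]
flat_rows = flatten_moa_records(toy_records)
for row in flat_rows:
    print(row)
```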


In [19]:
# Create a flattened table that maps drugs to their mechanisms of action and targets
flat_table_name = "drug_moa_flat"

query_create_moa_flat = f"""
CREATE OR REPLACE TABLE `{project_id}.{dataset_id}.{flat_table_name}` AS
WITH
-- A) Drugs with a join key that prefers the parent when present
dm AS (
  SELECT
    id                         AS molecule_id,
    name                       AS drug_name,
    COALESCE(parentId, id)     AS join_id
  FROM `{source_project_id}.{source_dataset_id}.drug_molecule`
),
-- B) Flatten MoA → target pairs from the MoA table
moa_flat AS (
  SELECT
    chembl.element AS moa_molecule_id,
    COALESCE(NULLIF(TRIM(dmoa.mechanismOfAction), ''),
             NULLIF(TRIM(dmoa.actionType), '')) AS moa,
    CAST(tgt.element AS STRING) AS target_id
  FROM `{source_project_id}.{source_dataset_id}.drug_mechanism_of_action` AS dmoa
  CROSS JOIN UNNEST(dmoa.chemblIds.list) AS chembl
  LEFT JOIN UNNEST(dmoa.targets.list) AS tgt ON TRUE
),
-- C) Map MoA rows to the parent when available
moa_parentaware AS (
  SELECT
    COALESCE(dm2.parentId, dm2.id, mf.moa_molecule_id) AS join_id,
    mf.moa_molecule_id,
    mf.moa,
    mf.target_id
  FROM moa_flat AS mf
  LEFT JOIN `{source_project_id}.{source_dataset_id}.drug_molecule` dm2
    ON dm2.id = mf.moa_molecule_id
)
-- D) Final
SELECT
  dm.molecule_id,
  dm.drug_name,
  mpa.moa,
  mpa.target_id
FROM dm
LEFT JOIN moa_parentaware mpa
  USING (join_id)
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY dm.molecule_id, mpa.moa, mpa.target_id
) = 1
"""

# Execute the query to create the flattened MoA table
bpd.read_gbq(query_create_moa_flat)
print("Drug MoA flat table created successfully!")

Drug MoA flat table created successfully!


## Vector Search

### First Layer
1. Generate an embedding for the query disease name
2. Search for diseases with similar embeddings in our pre-computed disease embedding table
3. Filter results by cosine distance threshold to ensure semantic relevance
4. Return a ranked list of similar diseases that could share therapeutic targets
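
The steps above can be sketched without BigQuery as a brute-force nearest-neighbour search. This is only a conceptual stand-in for `VECTOR_SEARCH` (which can use an index rather than scanning every row). The toy vectors and names are invented for illustration. As in the query below, the threshold is applied after the top-k cut.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def vector_search(query_vec, base, top_k=100, distance_threshold=0.3):
    """Return up to top_k (name, distance) pairs under the threshold, nearest first."""
    scored = [(name, cosine_distance(query_vec, vec)) for name, vec in base.items()]
    scored.sort(key=lambda pair: pair[1])
    return [(name, dist) for name, dist in scored[:top_k] if dist < distance_threshold]


# Toy embedding table with two-dimensional vectors
toy_base = {
    "disease_a": [1.0, 0.0],
    "disease_b": [0.9, 0.1],
    "disease_c": [0.0, 1.0],
}
matches = vector_search([1.0, 0.0], toy_base)
print([name for name, _ in matches])
```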


In [2]:
# Configuration for vector search
base_table_name = "disease_embedding"
text_embedding_model_name = "embedding005"
query_text = 'trypanosomiasis'
distance_threshold = 0.3

# Construct vector search query to find similar diseases
vector_search_query = f"""
WITH query_table AS (
    SELECT *
    FROM ML.GENERATE_EMBEDDING(
        MODEL `{project_id}.{dataset_id}.{text_embedding_model_name}`,
        (SELECT '{query_text}' AS content)
    )
)
SELECT
    base.id,
    base.name AS disease_name,
    distance
FROM
    VECTOR_SEARCH(
        TABLE `{project_id}.{dataset_id}.{base_table_name}`,
        'embedding',
        (SELECT * FROM query_table),
        'ml_generate_embedding_result',
        top_k => 100,
        distance_type => 'COSINE'
    )
WHERE distance < {distance_threshold}
ORDER BY distance ASC
"""

# Execute the vector search query
similar_disease_df = bpd.read_gbq(vector_search_query)

# Exclude exact matches and filter results
similar_disease_df = similar_disease_df[
    similar_disease_df['disease_name'].str.lower() != query_text.lower()
].reset_index(drop=True)

# Display the results
print(f"Similar diseases to '{query_text}' (distance < {distance_threshold}):")
print(f"Found {len(similar_disease_df)} similar diseases")
print(similar_disease_df.head(10))

Similar diseases to 'trypanosomiasis' (distance < 0.3):
Found 3 similar diseases


              id                   disease_name  distance
0    EFO_0008559       American trypanosomiasis  0.213539
1    EFO_0005225  human african trypanosomiasis  0.224497
2  MONDO_0001444                 Chagas disease  0.277343

[3 rows x 3 columns]


### Second Layer
1. Find diseases similar to the query disease using vector embeddings
2. Collect known drugs for these similar diseases
3. Compute drug similarity scores based on mechanism of action (MoA)
4. Rank and filter drug candidates for the target disease
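
The scoring logic lives in the project's `score` module, which is not shown in this notebook. Purely to illustrate steps 3 and 4, the sketch below uses one simple, assumed rule: a candidate's score is the highest MoA similarity between any of its mechanisms and any mechanism of the drugs already known for the disease. The real module may weight or combine similarities differently, and all names here are hypothetical.

```python
def rank_candidates(known_moas, drug_moas, moa_similarity, known_drugs=(), top_n=10):
    """Rank drugs by their best MoA similarity to the known drugs' mechanisms."""
    scores = {}
    for drug, moas in drug_moas.items():
        if drug in known_drugs:
            continue  # already known for this disease, so not a new candidate
        scores[drug] = max(
            (moa_similarity.get((m, k), 0.0) for m in moas for k in known_moas),
            default=0.0,
        )
    # Highest score first, keep the top_n candidates
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Toy inputs: similarity is keyed by (candidate_moa, known_moa)
toy_similarity = {("moa_1", "moa_k"): 0.9, ("moa_2", "moa_k"): 0.4}
toy_drug_moas = {"drug_x": ["moa_1"], "drug_y": ["moa_2"], "drug_k": ["moa_k"]}
ranking = rank_candidates(["moa_k"], toy_drug_moas, toy_similarity, known_drugs={"drug_k"})
print(ranking)
```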


In [3]:
# Load required datasets for drug recommendation
print("Loading datasets...")

# Query to get disease-drug relationships with mechanism of action and targets
query_disease_drugs = """
SELECT 
    kd.diseaseId,
    d.name AS disease_name,
    kd.drugId,
    kd.mechanismOfAction AS moa,
    kd.targetId AS target_id
FROM
    `bigquery-public-data.open_targets_platform.known_drug` kd
INNER JOIN
    `bigquery-public-data.open_targets_platform.disease` d
ON
    d.id = kd.diseaseId
"""

# Query to get comprehensive drug information with MOA details
query_all_drugs = """
SELECT *
FROM `redrugai.redrugai_data.drug_moa_flat`
"""

# Query to get pre-computed MOA similarity scores
query_moa_pair_similarity = """
SELECT *
FROM `redrugai.redrugai_data.drug_moa_embedding_similarity`
"""

# Execute queries and load data
print("Executing BigQuery operations...")
all_drugs_df = bpd.read_gbq(query_all_drugs)
disease_drugs_df = bpd.read_gbq(query_disease_drugs)
moa_pair_sim_df = bpd.read_gbq(query_moa_pair_similarity)

print(f"Loaded {len(all_drugs_df)} drugs, {len(disease_drugs_df)} disease-drug relationships, and {len(moa_pair_sim_df)} MOA similarity pairs")

Loading datasets...
Executing BigQuery operations...
Loaded 27499 drugs, 253442 disease-drug relationships, and 2812329 MOA similarity pairs


In [5]:
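# Import only the function used below from the project's local score module
from score import recommend_for_disease_with_similars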

# Execute the recommendation system for the queried disease
result = recommend_for_disease_with_similars(
    disease_name=query_text,
    similar_disease_names=similar_disease_df['disease_name'].to_list(),
    disease_drugs_df=disease_drugs_df,
    all_drugs_df=all_drugs_df,
    moa_pair_sim_df=moa_pair_sim_df,
    top_overall=10,
    top_similar=5,
    similar_weight=0.5,
    evaluation_mode=False
)

In [9]:
# Display recommendation results
print(f"\n🔍 Disease Analysis Results for: {query_text}")
print("=" * 60)

print(f"\n💊 Known Drugs ({len(result['known_drugs'])} found):")
if not result["known_drugs"].empty:
    for idx, row in result["known_drugs"].iterrows():
        print(f"  {idx+1}. {row.get('drug_name', row['drugId'])}")
else:
    print("  No known drugs found for this disease")

print(f"\n🎯 Overall Recommendations ({len(result['overall_recommendations'])} drugs):")
if not result["overall_recommendations"].empty:
    for idx, row in result["overall_recommendations"].iterrows():
        score = row['final_score']
        drug_name = row.get('drug_name', row['drugId'])
        print(f"  {idx+1}. {drug_name} (Score: {score:.3f})")
else:
    print("  No recommendations available")

print(f"\n🔗 Similar Disease Recommendations ({len(result['similar_recommendations'])} drugs):")
if not result["similar_recommendations"].empty:
    for idx, row in result["similar_recommendations"].iterrows():
        score = row['score_against_primary']
        drug_name = row.get('drug_name', row['drugId'])
        print(f"  {idx+1}. {drug_name} (Score: {score:.3f})")
else:
    print("  No similar disease recommendations available")



🔍 Disease Analysis Results for: trypanosomiasis

💊 Known Drugs (5 found):
  1. CHEMBL265502
  2. CHEMBL413376
  3. CHEMBL52440
  4. CHEMBL655
  5. CHEMBL830

🎯 Overall Recommendations (10 drugs):
  1. EFLORNITHINE HYDROCHLORIDE (Score: 6.401)
  2. DEXTROMETHORPHAN POLISTIREX (Score: 5.278)
  3. DEXTROMETHORPHAN HYDROBROMIDE (Score: 5.278)
  4. CARBETAPENTANE CITRATE (Score: 5.264)
  5. CARBETAPENTANE (Score: 5.264)
  6. DIAZEPAM (Score: 3.009)
  7. MIDAZOLAM HYDROCHLORIDE (Score: 3.009)
  8. QUAZEPAM (Score: 3.009)
  9. CHLORDIAZEPOXIDE HYDROCHLORIDE (Score: 3.009)
  10. METHYPRYLON (Score: 3.009)

🔗 Similar Disease Recommendations (1 drugs):
  1. ATORVASTATIN (Score: 0.604)


## Performance Evaluation

**Important Caveat**: This system proposes candidate drugs, and whether a candidate actually works can only be established through extensive clinical testing. The metrics below only check whether our recommendations overlap with drugs already known for a disease. This is not a rigorous academic evaluation, but overlap gives some evidence that the approach has potential for drug discovery.

1. Randomly sample 100 diseases from the disease table
2. Get the recommended drugs for each sampled disease
3. Measure the overlap with the following metrics:
    - **Precision**: Fraction of recommended drugs that are in the answer group (already known drugs)
    - **Recall**: Fraction of answer-group drugs that appear among the recommendations
    - **F1 Score**: Harmonic mean of precision and recall
    - **Overlap Count**: Total number of overlapping drugs between recommendations and known drugs
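
These metrics reduce to simple set arithmetic. The sketch below is a minimal version written for illustration. The `evaluation` module used later is not shown here, so its exact implementation may differ, and the drug identifiers are placeholders.

```python
def overlap_metrics(recommended, known):
    """Precision, recall, F1 and overlap count between two collections of drug IDs."""
    recommended, known = set(recommended), set(known)
    overlap = len(recommended & known)
    precision = overlap / len(recommended) if recommended else 0.0
    recall = overlap / len(known) if known else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1_score": f1, "overlap_count": overlap}


# 5 recommendations, 2 known drugs, 1 drug in common
metrics = overlap_metrics(
    recommended=["drug_1", "drug_2", "drug_3", "drug_4", "drug_5"],
    known=["drug_1", "drug_9"],
)
print({key: round(value, 3) for key, value in metrics.items()})
```

With 5 recommendations, 2 known drugs, and 1 overlap, this gives P=0.200, R=0.500, F1=0.286, the same figures as the "Similar" line logged for the first evaluated disease below.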

In [11]:
from evaluation import run_disease_evaluation, get_random_diseases_sample
import pandas as pd
import time
from tqdm import tqdm


In [12]:
# Sample 100 random diseases with at least 1 known drug
print("Sampling 100 random diseases...")
random_diseases_df = get_random_diseases_sample(
    n_samples=100,
    project_id=project_id,
    source_project_id=source_project_id,
    source_dataset_id=source_dataset_id,
    min_known_drugs=1
)

print(f"Sampled {len(random_diseases_df)} diseases:")
print(random_diseases_df.head())


Sampling 100 random diseases...
Sampled 100 diseases:


                    disease_name  known_drug_count
0              Uterine leiomyoma                 2
1  idiopathic pulmonary fibrosis                62
2       fetal growth restriction                 6
3                    Cholangitis                 1
4               Immunodeficiency                 3

[5 rows x 2 columns]


In [None]:
# Run evaluation experiments for all sampled diseases
print("Running evaluation experiments...")

# Convert random_diseases_df to pandas for iteration
diseases_list = random_diseases_df.to_pandas()['disease_name'].tolist()

# Store results
evaluation_results = []

# Run experiments with progress bar
for i, disease_name in enumerate(tqdm(diseases_list, desc="Evaluating diseases")):
    print(f"\nEvaluating disease {i+1}/{len(diseases_list)}: {disease_name}")
    
    # Run single disease evaluation
    result = run_disease_evaluation(
        disease_name=disease_name,
        disease_drugs_df=disease_drugs_df,
        all_drugs_df=all_drugs_df,
        moa_pair_sim_df=moa_pair_sim_df,
        disease_embedding_df=None,  # Will be handled inside the function
        project_id=project_id,
        dataset_id=dataset_id,
        embedding_model_name=text_embedding_model_name,
        distance_threshold=distance_threshold,
        top_overall=10,
        top_similar=5,
        similar_weight=0.5
    )
    
    evaluation_results.append(result)
    
    # Print progress summary
    if result['success']:
        metrics = result['metrics']
        print(f"  Success: {result['known_drugs_count']} known drugs, "
              f"{result['overall_recommendations_count']} overall recs, "
              f"{result['similar_recommendations_count']} similar recs")
        if 'overall_precision' in metrics:
            print(f"  Overall: P={metrics['overall_precision']:.3f}, "
                  f"R={metrics['overall_recall']:.3f}, "
                  f"F1={metrics['overall_f1_score']:.3f}")
        if 'similar_precision' in metrics:
            print(f"  Similar: P={metrics['similar_precision']:.3f}, "
                  f"R={metrics['similar_recall']:.3f}, "
                  f"F1={metrics['similar_f1_score']:.3f}")
        if 'combined_precision' in metrics:
            print(f"  Combined: P={metrics['combined_precision']:.3f}, "
                  f"R={metrics['combined_recall']:.3f}, "
                  f"F1={metrics['combined_f1_score']:.3f}")
    else:
        print(f"  Failed: {result['error']}")
    
    # Small delay to avoid overwhelming BigQuery
    time.sleep(0.5)

print(f"\nCompleted evaluation of {len(evaluation_results)} diseases")


Running evaluation experiments...


Evaluating diseases:   0%|          | 0/100 [00:00<?, ?it/s]


Evaluating disease 1/100: Uterine leiomyoma
  Success: 2 known drugs, 10 overall recs, 5 similar recs
  Overall: P=0.000, R=0.000, F1=0.000
  Similar: P=0.200, R=0.500, F1=0.286
  Combined: P=0.067, R=0.500, F1=0.118


Evaluating diseases:   1%|          | 1/100 [01:00<1:40:05, 60.66s/it]


Evaluating disease 2/100: idiopathic pulmonary fibrosis


In [9]:
# Aggregate results as DataFrame
print("Aggregating evaluation results...")

# Flatten the results into a DataFrame
results_data = []
for result in evaluation_results:
    # Base information
    row = {
        'disease_name': result['disease_name'],
        'success': result['success'],
        'error': result.get('error', None),
        'similar_diseases_count': result.get('similar_diseases_count', 0),
        'train_drugs_count': result.get('train_drugs_count', 0),
        'test_drugs_count': result.get('test_drugs_count', 0),
        'known_drugs_count': result.get('known_drugs_count', 0),
        'overall_recommendations_count': result.get('overall_recommendations_count', 0),
        'similar_recommendations_count': result.get('similar_recommendations_count', 0)
    }
    
    # Add metrics if available
    if 'metrics' in result and result['metrics']:
        row.update(result['metrics'])
    
    results_data.append(row)

# Create DataFrame
results_df = pd.DataFrame(results_data)

print(f"Created results DataFrame with {len(results_df)} rows and {len(results_df.columns)} columns")
print("\nDataFrame columns:")
print(results_df.columns.tolist())

# Display summary statistics
print(f"\nEvaluation Summary:")
print(f"Successful evaluations: {results_df['success'].sum()}/{len(results_df)}")
print(f"Failed evaluations: {(~results_df['success']).sum()}/{len(results_df)}")

# Display sample results
print(f"\nFirst 5 results:")
display_cols = ['disease_name', 'success', 'train_drugs_count', 'test_drugs_count', 'overall_recommendations_count', 
                'overall_precision', 'overall_recall', 'overall_f1_score']
available_cols = [col for col in display_cols if col in results_df.columns]
print(results_df[available_cols].head())


Aggregating evaluation results...
Created results DataFrame with 5 rows and 27 columns

DataFrame columns:
['disease_name', 'success', 'error', 'similar_diseases_count', 'train_drugs_count', 'test_drugs_count', 'known_drugs_count', 'overall_recommendations_count', 'similar_recommendations_count', 'overall_known_count', 'overall_recs_count', 'overall_overlap_count', 'overall_precision', 'overall_recall', 'overall_f1_score', 'similar_known_count', 'similar_recs_count', 'similar_overlap_count', 'similar_precision', 'similar_recall', 'similar_f1_score', 'combined_known_count', 'combined_recs_count', 'combined_overlap_count', 'combined_precision', 'combined_recall', 'combined_f1_score']

Evaluation Summary:
Successful evaluations: 5/5
Failed evaluations: 0/5

First 5 results:
                            disease_name  success  train_drugs_count  \
0                      esophageal cancer     True                  0   
1     proliferative diabetic retinopathy     True                  0   
2 

In [10]:
# Analyze evaluation results
print("=== EVALUATION ANALYSIS ===")

# Filter successful evaluations only
successful_results = results_df[results_df['success'] == True]

if len(successful_results) > 0:
    print(f"\nAnalyzing {len(successful_results)} successful evaluations:")
    
    # Overall performance metrics
    numeric_cols = successful_results.select_dtypes(include=[float, int]).columns
    metric_cols = [col for col in numeric_cols if any(keyword in col.lower() 
                   for keyword in ['precision', 'recall', 'f1_score'])]
    
    if metric_cols:
        print(f"\nPerformance Metrics Summary:")
        for col in metric_cols:
            if col in successful_results.columns:
                mean_val = successful_results[col].mean()
                std_val = successful_results[col].std()
                print(f"  {col}: {mean_val:.4f} ± {std_val:.4f}")
    
    # Count-based statistics
    print(f"\nCount Statistics:")
    print(f"  Average known drugs per disease: {successful_results['known_drugs_count'].mean():.2f}")
    print(f"  Average recommendations per disease: {successful_results['overall_recommendations_count'].mean():.2f}")
    print(f"  Average similar diseases found: {successful_results['similar_diseases_count'].mean():.2f}")
    
    # Overlap analysis
    if 'overall_overlap_count' in successful_results.columns:
        overlap_stats = successful_results['overall_overlap_count']
        print(f"\nOverlap Analysis:")
        print(f"  Diseases with at least 1 overlap: {(overlap_stats > 0).sum()}/{len(successful_results)}")
        print(f"  Average overlap count: {overlap_stats.mean():.2f}")
        print(f"  Max overlap count: {overlap_stats.max()}")

    # Save results to CSV
    results_df.to_csv('evaluation_results.csv', index=False)
    print(f"\nResults saved to 'evaluation_results.csv'")
    
else:
    print("No successful evaluations to analyze.")

print("\n=== EVALUATION COMPLETE ===")

# Display final summary
print(f"\nFinal Summary:")
print(f"Total diseases evaluated: {len(results_df)}")
print(f"Successful evaluations: {results_df['success'].sum()}")
print(f"Failed evaluations: {(~results_df['success']).sum()}")

if len(successful_results) > 0 and 'overall_precision' in successful_results.columns:
    # Overall metrics
    avg_precision = successful_results['overall_precision'].mean()
    avg_recall = successful_results['overall_recall'].mean()
    avg_f1 = successful_results['overall_f1_score'].mean()
    print(f"Average Overall Precision: {avg_precision:.4f}")
    print(f"Average Overall Recall: {avg_recall:.4f}")
    print(f"Average Overall F1-Score: {avg_f1:.4f}")
    
    # Similar disease metrics
    if 'similar_precision' in successful_results.columns:
        sim_precision = successful_results['similar_precision'].mean()
        sim_recall = successful_results['similar_recall'].mean()
        sim_f1 = successful_results['similar_f1_score'].mean()
        print(f"Average Similar Precision: {sim_precision:.4f}")
        print(f"Average Similar Recall: {sim_recall:.4f}")
        print(f"Average Similar F1-Score: {sim_f1:.4f}")
    
    # Combined metrics  
    if 'combined_precision' in successful_results.columns:
        comb_precision = successful_results['combined_precision'].mean()
        comb_recall = successful_results['combined_recall'].mean()
        comb_f1 = successful_results['combined_f1_score'].mean()
        print(f"Average Combined Precision: {comb_precision:.4f}")
        print(f"Average Combined Recall: {comb_recall:.4f}")
        print(f"Average Combined F1-Score: {comb_f1:.4f}")


=== EVALUATION ANALYSIS ===

Analyzing 5 successful evaluations:

Performance Metrics Summary:
  overall_precision: 0.2200 ± 0.2280
  overall_recall: 0.1955 ± 0.2363
  overall_f1_score: 0.1148 ± 0.1089
  similar_precision: 0.1600 ± 0.3578
  similar_recall: 0.0087 ± 0.0194
  similar_f1_score: 0.0165 ± 0.0369
  combined_precision: 0.2174 ± 0.2385
  combined_recall: 0.1998 ± 0.2335
  combined_f1_score: 0.1134 ± 0.1075

Count Statistics:
  Average known drugs per disease: 55.60
  Average recommendations per disease: 10.00
  Average similar diseases found: 3.20

Overlap Analysis:
  Diseases with at least 1 overlap: 4/5
  Average overlap count: 2.20
  Max overlap count: 6

Results saved to 'evaluation_results.csv'

=== EVALUATION COMPLETE ===

Final Summary:
Total diseases evaluated: 5
Successful evaluations: 5
Failed evaluations: 0
Average Overall Precision: 0.2200
Average Overall Recall: 0.1955
Average Overall F1-Score: 0.1148
Average Similar Precision: 0.1600
Average Similar Recall: 0.008