# 🧬 HAI-DEF Drug Discovery Pipeline

> **End-to-end drug discovery using Google's Health AI Developer Foundations**
>
> Models: **TxGemma** (2B/9B/27B) + **MedGemma** (4B)

This notebook demonstrates a complete drug discovery workflow using HAI-DEF models for:
1. 🎯 **Target Identification**: Finding protein targets for diseases
2. 💊 **Lead Discovery**: Screening and scoring drug candidates
3. 🔬 **Binding Affinity**: Predicting drug-target interactions
4. ⚗️ **ADMET Profiling**: Safety and pharmacokinetic assessment
5. 🧪 **Clinical Reasoning**: Viability prediction for clinical trials

In [None]:
# Install dependencies (uncomment if needed)
# !pip install -q transformers torch accelerate pandas rdkit tabulate matplotlib seaborn

In [None]:
import sys
import os
import logging
import warnings
warnings.filterwarnings('ignore')

# Add parent directory to path so we can import the pipeline package
sys.path.insert(0, os.path.dirname(os.getcwd()) if 'notebooks' in os.getcwd() else os.getcwd())

logging.basicConfig(level=logging.INFO, format='%(asctime)s [%(name)s] %(message)s', datefmt='%H:%M:%S')
print('✅ Setup complete')

---
## 🎯 Stage 1: Target Identification

Use **TxGemma-Chat** to identify and characterize protein targets for a disease.

In [None]:
from pipeline.target_identification import identify_targets, print_targets

DISEASE = "Non-Small Cell Lung Cancer"

targets = identify_targets(DISEASE)
print_targets(targets)

---
## 💊 Stage 2: Lead Discovery (Compound Screening)

Screen compounds from our library using **RDKit** molecular descriptors + **TxGemma-Predict** scoring.
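
As a rough illustration of descriptor-based screening, the sketch below applies Lipinski's rule of five to precomputed descriptor values. Whether `screen_compounds` applies this particular filter is not stated here, so treat it as an assumption: the helper names, the descriptor keys, and the two sample compounds are all hypothetical, and the numbers are made up for illustration rather than taken from pipeline output.

```python
# Hypothetical helpers, not part of the pipeline package.
# Upper limits from Lipinski's rule of five.
RULE_OF_FIVE_LIMITS = {
    "mol_weight": 500.0,      # molecular weight in daltons
    "logp": 5.0,              # octanol-water partition coefficient
    "h_bond_donors": 5,
    "h_bond_acceptors": 10,
}


def rule_of_five_violations(descriptors):
    """Return the names of the descriptors that exceed their limit."""
    return [
        name
        for name, limit in RULE_OF_FIVE_LIMITS.items()
        if descriptors[name] > limit
    ]


def passes_rule_of_five(descriptors, max_violations=1):
    """A compound is commonly kept if it breaks at most one rule."""
    return len(rule_of_five_violations(descriptors)) <= max_violations


# Made-up descriptor values, for illustration only.
library = {
    "compound_a": {"mol_weight": 320.4, "logp": 2.1,
                   "h_bond_donors": 2, "h_bond_acceptors": 5},
    "compound_b": {"mol_weight": 612.7, "logp": 6.3,
                   "h_bond_donors": 3, "h_bond_acceptors": 9},
}

kept = [name for name, desc in library.items() if passes_rule_of_five(desc)]
print(kept)
```

In a real run the descriptor values would come from RDKit rather than a hand-written dictionary; the filter logic stays the same.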

In [None]:
from pipeline.lead_discovery import screen_compounds, print_screening_results

TARGET = "EGFR"

screening_df = screen_compounds(target_filter=TARGET)
print_screening_results(screening_df)

# Show the full dataframe
screening_df.head(10)

---
## 🔬 Stage 3: Binding Affinity Prediction

Predict drug-target binding using **TxGemma-Predict** with SMILES + protein sequence input.
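
To make the "SMILES + protein sequence" input concrete, here is a minimal sketch of how the two strings might be validated and combined into a single text prompt. The `build_binding_prompt` helper and its template are hypothetical placeholders: the prompt format that TxGemma-Predict actually expects is defined by the model's documentation and is not reproduced here, and the inputs are toy values.

```python
# Hypothetical helper, not part of the pipeline package. The template is a
# placeholder; consult the TxGemma documentation for the real prompt format.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def build_binding_prompt(smiles, protein_sequence):
    """Combine a SMILES string and a protein sequence into one text prompt."""
    smiles = smiles.strip()
    sequence = protein_sequence.strip().upper()
    if not smiles:
        raise ValueError("SMILES string is empty")
    # Reject anything outside the 20 standard one-letter amino-acid codes.
    unknown = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if not sequence or unknown:
        raise ValueError(f"Invalid protein sequence; unexpected symbols: {unknown}")
    return (
        f"Drug SMILES: {smiles}\n"
        f"Target amino acid sequence: {sequence}\n"
        "Question: does the drug bind the target?"
    )


# Toy inputs: ethanol and a five-residue fragment, for illustration only.
prompt = build_binding_prompt("CCO", "mktay")
print(prompt)
```

Validating the sequence before building the prompt keeps malformed inputs from reaching the model, where they would otherwise produce a prediction with no obvious sign that the input was wrong.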

In [None]:
from pipeline.binding_affinity import batch_binding_prediction, print_binding_results

# Prepare compounds from screening results
compounds = [
    {'name': row['compound_name'], 'smiles': row['smiles']}
    for _, row in screening_df.iterrows()
]

binding_results = batch_binding_prediction(compounds, TARGET)
print_binding_results(binding_results)

---
## ⚗️ Stage 4: ADMET Profiling

Evaluate Absorption, Distribution, Metabolism, Excretion & Toxicity with **TxGemma-Predict**.

In [None]:
from pipeline.admet_profiling import batch_admet_profiling, print_batch_summary, print_admet_profile

admet_profiles = batch_admet_profiling(compounds)
print_batch_summary(admet_profiles)

# Show detailed profile for the top candidate (skipped if no profiles were returned)
if admet_profiles:
    print_admet_profile(admet_profiles[0])

---
## 🧪 Stage 5: Clinical Viability Assessment

Use **TxGemma-Chat** for interactive clinical reasoning about our top candidates.

In [None]:
from pipeline.clinical_reasoning import assess_clinical_viability, print_clinical_assessment

# Assess the top 3 candidates
clinical_assessments = []
for i, compound in enumerate(compounds[:3]):
    assessment = assess_clinical_viability(
        smiles=compound['smiles'],
        compound_name=compound['name'],
        target_name=TARGET,
        disease=DISEASE,
        admet_profile=admet_profiles[i] if i < len(admet_profiles) else None,
        binding_result=binding_results[i] if i < len(binding_results) else None,
    )
    clinical_assessments.append(assessment)
    print_clinical_assessment(assessment)
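
The loop above pairs each compound with its ADMET profile and binding result by list position, falling back to `None` when a result list is shorter than the compound list. A minimal sketch of the same pairing using `itertools.zip_longest`; the `align_top_candidates` helper is hypothetical and the sample dictionaries are placeholders, not pipeline output.

```python
from itertools import islice, zip_longest


def align_top_candidates(compounds, admet_profiles, binding_results, top_n=3):
    """Pair the first top_n compounds with their results by position.

    Missing ADMET or binding entries become None, matching the
    `x[i] if i < len(x) else None` guards in the notebook loop.
    """
    top = compounds[:top_n]
    rows = zip_longest(top, admet_profiles, binding_results, fillvalue=None)
    # zip_longest runs to the longest input, so stop after len(top) rows.
    return list(islice(rows, len(top)))


# Placeholder data: three compounds, one ADMET profile, four binding results.
sample_compounds = [{"name": "c1"}, {"name": "c2"}, {"name": "c3"}]
sample_admet = [{"flags": []}]
sample_binding = [{"confidence": 0.9}, {"confidence": 0.8},
                  {"confidence": 0.7}, {"confidence": 0.6}]

for compound, admet, binding in align_top_candidates(
    sample_compounds, sample_admet, sample_binding
):
    print(compound["name"], admet, binding)
```

Positional pairing only works if every stage returns results in the same order as its input. If a stage can drop or reorder compounds, keying the results by compound name would be the safer design.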

---
## 📊 Final Report & Visualization

In [None]:
from pipeline.visualization import print_final_report, create_pipeline_summary_chart

print_final_report(
    targets=targets,
    screening_results=screening_df,
    binding_results=binding_results,
    admet_profiles=admet_profiles,
    clinical_assessments=clinical_assessments,
)

# Generate summary chart
chart_data = []
for i, c in enumerate(clinical_assessments):
    chart_data.append({
        'compound_name': c['compound_name'],
        'binding_score': binding_results[i].get('confidence', 0.5) if i < len(binding_results) else 0.5,
        'admet_score': 1.0 - len(admet_profiles[i].get('flags', [])) * 0.2 if i < len(admet_profiles) else 0.5,
        'clinical_score': c.get('avg_probability', 0.5),
    })

chart_path = create_pipeline_summary_chart(chart_data)
if chart_path:
    from IPython.display import Image, display
    display(Image(filename=chart_path))
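
The `admet_score` in the chart data above is a flag penalty: start at 1.0 and subtract 0.2 for each ADMET flag, with 0.5 used as a neutral default when a result is missing. A worked sketch of that arithmetic, assuming the same dictionary keys (`confidence`, `flags`, `avg_probability`); the `admet_score` and `chart_row` helpers are hypothetical, the flag names are placeholders, and the clamp to [0, 1] is an addition so that profiles with more than five flags do not produce a negative score.

```python
def admet_score(flags, penalty_per_flag=0.2):
    """Start at 1.0, subtract a fixed penalty per flag, clamp to [0, 1]."""
    return min(1.0, max(0.0, 1.0 - penalty_per_flag * len(flags)))


def chart_row(clinical, binding=None, admet=None, default=0.5):
    """Build one chart entry; missing results fall back to `default`."""
    return {
        "compound_name": clinical["compound_name"],
        "binding_score": (
            binding.get("confidence", default) if binding is not None else default
        ),
        "admet_score": (
            admet_score(admet.get("flags", [])) if admet is not None else default
        ),
        "clinical_score": clinical.get("avg_probability", default),
    }


# Placeholder values, for illustration only.
row = chart_row(
    {"compound_name": "compound_a", "avg_probability": 0.7},
    binding={"confidence": 0.9},
    admet={"flags": ["flag_1", "flag_2"]},  # two flags: 1.0 - 2 * 0.2
)
print(row)
```

Because every flag carries the same weight, a severe toxicity flag and a mild solubility flag lower the score equally; a weighted penalty would be one way to refine this.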

---
## ⚠️ Disclaimer

This pipeline is for **research and educational purposes only**. It is NOT validated for clinical use.
Drug development requires extensive regulatory testing. Always consult qualified professionals.

**Models**: TxGemma + MedGemma from Google Health AI Developer Foundations (HAI-DEF)

**License**: Apache 2.0