# 🩺 Breast Cancer CORE Memory Demo

Transform breast cancer clinical trial and patient data into mCODE format and store it in CORE Memory.

## 🎯 What you'll accomplish:

- 📥 Download breast cancer clinical trials from ClinicalTrials.gov
- 👥 Fetch synthetic breast cancer patient data
- 🔄 Convert data to standardized mCODE format
- 🧠 Store mCODE summaries in CORE Memory
- 🔍 Search and explore stored medical knowledge

## ⚡ Features:

- Configurable number of patients and trials
- Automatic space creation in CORE Memory
- Interactive progress tracking with emojis
- Full mCODE compliance for oncology data

In [None]:
# 📋 Configuration & Setup

import os
import sys
import json
from pathlib import Path

# Change to project root directory
current_dir = Path.cwd()
if current_dir.name == 'examples':
    project_root = current_dir.parent
    os.chdir(project_root)
    print(f"📁 Changed working directory to: {project_root}")
else:
    project_root = current_dir

# Add project root to Python path
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))
    print("✅ Added project root to Python path")

# Now we can import from src
from src.utils.core_memory_client import CoreMemoryClient

print("🎉 Environment ready!")
print(f"📍 Current working directory: {Path.cwd()}")

In [None]:
# 🎛️ Demo Configuration

# Set the number of patients and trials to process
NUM_TRIALS = 3      # Number of breast cancer clinical trials to download
NUM_PATIENTS = 2    # Number of synthetic breast cancer patients to fetch

print("🎯 Demo Configuration:")
print(f"   📊 Clinical Trials: {NUM_TRIALS}")
print(f"   👥 Synthetic Patients: {NUM_PATIENTS}")
print(f"   🎨 Total Records: {NUM_TRIALS + NUM_PATIENTS}")

# Initialize CORE Memory client
try:
    client = CoreMemoryClient()
    print("🧠 CORE Memory client initialized successfully!")
except Exception as e:
    print(f"❌ Failed to initialize CORE Memory client: {e}")
    print("💡 Make sure COREAI_API_KEY is set in your environment")
    raise

## 🏗️ Space Management

Create dedicated spaces in CORE Memory for clinical trials and patient data.
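
The "get or create" pattern behind a dedicated space can be sketched as follows. This is a minimal illustration, not the real `CoreMemoryClient`: the `InMemorySpaces` class, its `create_space` method, and the `get_or_create_space` helper are hypothetical stand-ins that keep spaces in a local list so the logic can run without an API key.

```python
from typing import Dict, List


class InMemorySpaces:
    """Hypothetical stand-in for a space registry (not the real client)."""

    def __init__(self) -> None:
        self._spaces: List[Dict[str, str]] = []

    def get_spaces(self) -> List[Dict[str, str]]:
        # Return a copy so callers cannot mutate the registry by accident
        return list(self._spaces)

    def create_space(self, name: str) -> str:
        space_id = f"space-{len(self._spaces) + 1}"
        self._spaces.append({"id": space_id, "name": name})
        return space_id


def get_or_create_space(registry: InMemorySpaces, name: str) -> str:
    """Reuse the space with this name if it exists, otherwise create it."""
    for space in registry.get_spaces():
        if space.get("name") == name:
            return space["id"]
    return registry.create_space(name)


registry = InMemorySpaces()
first = get_or_create_space(registry, "Clinical Trials")
second = get_or_create_space(registry, "Clinical Trials")
print(first == second, len(registry.get_spaces()))
```

Calling the helper twice with the same name returns the same ID, so re-running a setup cell does not pile up duplicate spaces.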

In [None]:
# 🏗️ Create CORE Memory Spaces

print("🏗️ Setting up CORE Memory spaces...")

# Create space for clinical trials
trials_space_id = client.get_clinical_trials_space_id()
print(f"📋 Clinical Trials Space: {trials_space_id}")

# Create space for patients
patients_space_id = client.get_patients_space_id()
print(f"👥 Patients Space: {patients_space_id}")

# Show all spaces
all_spaces = client.get_spaces()
print(f"\n📂 Available Spaces ({len(all_spaces)}):")
for space in all_spaces:
    marker = "🎯" if space.get('id') == trials_space_id else "👤" if space.get('id') == patients_space_id else "📁"
    print(f"   {marker} {space.get('name')} (ID: {space.get('id')})")

print("✅ Spaces ready for data storage!")

## 📥 Data Acquisition

Download real clinical trial data and synthetic patient records.
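
The cells below run each fetcher with the `!` shell escape and then check only that the output file exists, so a stale file from an earlier run can hide a failure. One possible alternative, sketched here under the assumption that you prefer plain Python over the shell escape, is to run the command with `subprocess` and inspect its exit code. The `run_step` helper is hypothetical and is demonstrated with harmless Python one-liners rather than the project's CLI.

```python
import subprocess
import sys
from typing import List


def run_step(args: List[str]) -> int:
    """Run one pipeline command and return its exit code."""
    completed = subprocess.run(args, capture_output=True, text=True)
    if completed.returncode != 0:
        # Surface the failure instead of relying on an output file existing
        print(f"Command failed ({completed.returncode}): {' '.join(args)}")
        print(completed.stderr.strip())
    return completed.returncode


# Demonstration with stand-in commands (not the project's fetchers)
ok_code = run_step([sys.executable, "-c", "print('hello')"])
bad_code = run_step([sys.executable, "-c", "import sys; sys.exit(3)"])
print(ok_code, bad_code)
```

A nonzero return value means the step failed, whether or not an output file is present.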

In [None]:
# 📥 Download Breast Cancer Clinical Trials

print(f"📥 Downloading {NUM_TRIALS} breast cancer clinical trials...")

# Use the CLI to fetch trials (this creates raw_trials.ndjson)
trials_command = f"python -m src.cli.trials_fetcher --condition 'breast cancer' --limit {NUM_TRIALS} --out raw_trials.ndjson --workers 4 --verbose"
print(f"🚀 Running: {trials_command}")

# Execute the command
!{trials_command}

# Check if file was created
if Path('raw_trials.ndjson').exists():
    print("✅ Clinical trials downloaded successfully!")
else:
    print("❌ Failed to download clinical trials")
    print("💡 Check your internet connection and whether ClinicalTrials.gov is reachable")

In [None]:
# 👥 Download Synthetic Breast Cancer Patients

print(f"👥 Downloading {NUM_PATIENTS} synthetic breast cancer patients...")

# Use the CLI to fetch patients (this creates raw_patients.ndjson)
patients_command = f"python -m src.cli.patients_fetcher --archive breast_cancer_10_years --limit {NUM_PATIENTS} --out raw_patients.ndjson --verbose"
print(f"🚀 Running: {patients_command}")

# Execute the command
!{patients_command}

# Check if file was created
if Path('raw_patients.ndjson').exists():
    print("✅ Synthetic patients downloaded successfully!")
else:
    print("❌ Failed to download synthetic patients")
    print("💡 Check your internet connection and try downloading archives manually")

## 🔄 mCODE Conversion

Convert raw clinical data to standardized mCODE format.
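
A file that exists is not necessarily a file with usable records. One way to sanity-check an `.ndjson` file before or after conversion is to count how many lines parse as JSON. The `count_ndjson_records` helper below is a hypothetical sketch, shown against a temporary sample file rather than the real pipeline output.

```python
import json
import tempfile
from pathlib import Path
from typing import Tuple


def count_ndjson_records(path: Path) -> Tuple[int, int]:
    """Return (valid, invalid) counts of JSON lines in an NDJSON file."""
    valid = invalid = 0
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # blank lines are not records
            try:
                json.loads(line)
                valid += 1
            except json.JSONDecodeError:
                invalid += 1
    return valid, invalid


# Demonstration on a throwaway file with two good lines and one bad line
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.ndjson"
    sample.write_text('{"id": 1}\n\n{"id": 2}\nnot json\n', encoding="utf-8")
    counts = count_ndjson_records(sample)
print(counts)
```

In the notebook, the same call could be pointed at `Path('mcode_trials.ndjson')` or `Path('mcode_patients.ndjson')` once those files exist.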

In [None]:
# 🔄 Convert Trials to mCODE Format

print("🔄 Converting clinical trials to mCODE format...")

# Use the CLI to process trials to mCODE
mcode_trials_command = "python -m src.cli.trials_processor raw_trials.ndjson --out mcode_trials.ndjson --workers 4 --verbose"
print(f"🚀 Running: {mcode_trials_command}")

# Execute the command
!{mcode_trials_command}

# Check results
if Path('mcode_trials.ndjson').exists():
    print("✅ Clinical trials converted to mCODE!")
else:
    print("❌ Failed to convert trials to mCODE")
    print("💡 Check the raw_trials.ndjson file and processing logs")

In [None]:
# 👨‍⚕️ Convert Patients to mCODE Format

print("👨‍⚕️ Converting patient data to mCODE format...")

# Use the CLI to process patients to mCODE
mcode_patients_command = "python -m src.cli.patients_processor --in raw_patients.ndjson --out mcode_patients.ndjson --workers 4 --verbose"
print(f"🚀 Running: {mcode_patients_command}")

# Execute the command
!{mcode_patients_command}

# Check results
if Path('mcode_patients.ndjson').exists():
    print("✅ Patient data converted to mCODE!")
else:
    print("❌ Failed to convert patients to mCODE")
    print("💡 Check the raw_patients.ndjson file and processing logs")

## 📝 Generate Summaries

Create human-readable summaries of the mCODE data.
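
To make "human-readable summary" concrete, here is a minimal sketch of flattening one structured record into a sentence. The field names (`trial_id`, `title`, `conditions`) and the example record are made up for illustration. They are assumptions, not the schema that the project's summarizer reads or writes.

```python
from typing import Any, Dict


def summarize_trial(record: Dict[str, Any]) -> str:
    """Build a one-line summary from a hypothetical trial record."""
    trial_id = record.get("trial_id", "unknown trial")
    title = record.get("title", "untitled")
    conditions = ", ".join(record.get("conditions", [])) or "no conditions listed"
    return f"{trial_id}: {title} (conditions: {conditions})"


# Made-up record used only to exercise the function
example = {
    "trial_id": "TRIAL-001",
    "title": "Example study",
    "conditions": ["breast cancer"],
}
summary = summarize_trial(example)
print(summary)
```

Missing fields fall back to neutral placeholders, so an incomplete record still yields a readable line instead of raising an error.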

In [None]:
# 📝 Generate Trial Summaries

print("📝 Generating clinical trial summaries...")

# Use the CLI to create summaries and store in CORE Memory
summarize_trials_command = "python -m src.cli.trials_summarizer --in mcode_trials.ndjson --ingest --verbose"
print(f"🚀 Running: {summarize_trials_command}")

# Execute the command
!{summarize_trials_command}

print("🏁 Trial summarization command finished. Check the log above to confirm the summaries were stored in CORE Memory.")

In [None]:
# 📝 Generate Patient Summaries

print("📝 Generating patient summaries...")

# Use the CLI to create summaries and store in CORE Memory
summarize_patients_command = "python -m src.cli.patients_summarizer --in mcode_patients.ndjson --ingest --verbose"
print(f"🚀 Running: {summarize_patients_command}")

# Execute the command
!{summarize_patients_command}

print("🏁 Patient summarization command finished. Check the log above to confirm the summaries were stored in CORE Memory.")

## 🔍 Knowledge Exploration

Search and explore the medical knowledge stored in CORE Memory.
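
Because ingested data is processed asynchronously, a search issued right after ingestion may come back empty. A simple way to handle this is to retry a few times with a pause between attempts. The sketch below assumes only that the search function returns a dict with `episodes` and `facts` lists, as the cells in this section do. The `search_with_retry` helper and the `fake_search` stand-in are hypothetical.

```python
import time
from typing import Any, Callable, Dict


def search_with_retry(
    search: Callable[[str], Dict[str, Any]],
    query: str,
    attempts: int = 3,
    delay_seconds: float = 1.0,
) -> Dict[str, Any]:
    """Call `search` until it returns episodes or facts, or attempts run out."""
    results: Dict[str, Any] = {}
    for attempt in range(1, attempts + 1):
        results = search(query)
        if results.get("episodes") or results.get("facts"):
            return results
        if attempt < attempts:
            time.sleep(delay_seconds)  # give background processing time to finish
    return results


# Stand-in search that returns nothing on the first two calls
calls = {"n": 0}


def fake_search(query: str) -> Dict[str, Any]:
    calls["n"] += 1
    if calls["n"] < 3:
        return {"episodes": [], "facts": []}
    return {"episodes": [{"content": f"match for {query}"}], "facts": []}


found = search_with_retry(fake_search, "breast cancer", attempts=5, delay_seconds=0.0)
print(calls["n"], len(found["episodes"]))
```

With the notebook's client, the callable could be something like `lambda q: client.search(q, limit=10)`, with a delay of a few seconds.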

In [None]:
# 🔍 Search for Breast Cancer Information

print("🔍 Searching CORE Memory for breast cancer information...")

# Search across all spaces
breast_cancer_results = client.search("breast cancer", limit=10)

print("🎯 Search Results for 'breast cancer':")
print(f"   📄 Episodes found: {len(breast_cancer_results.get('episodes', []))}")
print(f"   🧠 Facts found: {len(breast_cancer_results.get('facts', []))}")

# Show sample episodes
episodes = breast_cancer_results.get('episodes', [])[:3]
if episodes:
    print("\n📖 Sample Episodes:")
    for i, episode in enumerate(episodes, 1):
        content = episode.get('content', '')[:100]
        print(f"   {i}. {content}...")
else:
    print("   No episodes found yet - data may still be processing")

print("\n💡 Tip: Search results may take a few moments to appear as data is processed asynchronously")

In [None]:
# 🔍 Search Clinical Trials Space

print("🔍 Searching clinical trials space specifically...")

# Search within the trials space
trials_search = client.search("clinical trial", space_id=trials_space_id, limit=5)

print("📋 Clinical Trials Space Search Results:")
print(f"   📄 Episodes: {len(trials_search.get('episodes', []))}")
print(f"   🧠 Facts: {len(trials_search.get('facts', []))}")

# Show trial-specific content
for episode in trials_search.get('episodes', [])[:2]:
    content = episode.get('content', '')[:150]
    print(f"   📊 {content}...")

In [None]:
# 🔍 Search Patients Space

print("🔍 Searching patients space...")

# Search within the patients space
patients_search = client.search("patient", space_id=patients_space_id, limit=5)

print("👥 Patients Space Search Results:")
print(f"   📄 Episodes: {len(patients_search.get('episodes', []))}")
print(f"   🧠 Facts: {len(patients_search.get('facts', []))}")

# Show patient-specific content
for episode in patients_search.get('episodes', [])[:2]:
    content = episode.get('content', '')[:150]
    print(f"   👤 {content}...")

## 📊 Results Summary

Review what we've accomplished and check our data files.
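
Raw byte counts get hard to read once more trials and patients are processed. A small formatter like the hypothetical `format_bytes` below could be applied to the sizes reported in the summary cell. It uses binary units (1 KiB = 1024 bytes).

```python
def format_bytes(size: int) -> str:
    """Format a byte count using binary units (KiB, MiB, GiB, TiB)."""
    if size < 1024:
        return f"{size} bytes"
    value = float(size)
    for unit in ("KiB", "MiB", "GiB"):
        value /= 1024
        if value < 1024:
            return f"{value:.1f} {unit}"
    # Anything larger is reported in TiB
    return f"{value / 1024:.1f} TiB"


print(format_bytes(512))
print(format_bytes(2048))
print(format_bytes(5 * 1024 * 1024))
```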

In [None]:
# 📊 Demo Results Summary

print("📊 Breast Cancer CORE Memory Demo - Results Summary")
print("=" * 60)

# Check generated files
files_to_check = [
    'raw_trials.ndjson',
    'raw_patients.ndjson',
    'mcode_trials.ndjson',
    'mcode_patients.ndjson'
]

print("📁 Generated Files:")
total_size = 0
for filename in files_to_check:
    file_path = Path(filename)
    if file_path.exists():
        size = file_path.stat().st_size
        total_size += size
        print(f"   ✅ {filename} ({size} bytes)")
    else:
        print(f"   ❌ {filename} (not found)")

print(f"\n💾 Total data processed: {total_size} bytes")

# Show CORE Memory spaces
print("\n🧠 CORE Memory Spaces:")
spaces = client.get_spaces()
for space in spaces:
    space_id = space.get('id')
    marker = "🎯" if space_id == trials_space_id else "👤" if space_id == patients_space_id else "📁"
    print(f"   {marker} {space.get('name')}")

# Show search statistics
total_episodes = len(breast_cancer_results.get('episodes', []))
total_facts = len(breast_cancer_results.get('facts', []))

print("\n🔍 Knowledge Base Status:")
print(f"   📄 Medical Episodes: {total_episodes}")
print(f"   🧠 Medical Facts: {total_facts}")
print(f"   🎯 Total Knowledge Items: {total_episodes + total_facts}")

print("\n🎉 Demo run finished!")
print("🏆 If the files and counts above look right, breast cancer data is now stored and searchable in CORE Memory!")

## 🎯 Next Steps

Your breast cancer knowledge base is now ready! Here are some things you can do:

- 🔍 **Search**: Use different queries to explore the medical knowledge
- 📊 **Analyze**: Look at treatment patterns and patient outcomes
- 🔬 **Research**: Compare clinical trials and eligibility criteria
- 🤖 **AI Integration**: Use the stored knowledge for clinical decision support
- 📈 **Scale Up**: Process more trials and patients for comprehensive analysis

## 🏆 Achievements

✅ Downloaded real clinical trial data  
✅ Fetched realistic synthetic patient data  
✅ Converted to standardized mCODE format  
✅ Generated human-readable summaries  
✅ Stored everything in CORE Memory  
✅ Created searchable medical knowledge base  

**You're now ready to explore oncology data with AI! 🚀**