# 🩺 Breast Cancer CORE Memory Demo

Transform breast cancer clinical trial and patient data into mCODE format and store it in CORE Memory!

## 🎯 What you'll accomplish:

- 📥 Download breast cancer clinical trials from ClinicalTrials.gov
- 👥 Fetch synthetic breast cancer patient data
- 🔄 Convert data to standardized mCODE format
- 🧠 Store mCODE summaries in CORE Memory
- 🔍 Search and explore stored medical knowledge

## ⚡ Features:

- Configurable number of patients and trials
- Automatic space creation in CORE Memory
- Interactive progress tracking with emojis
- Full mCODE compliance for oncology data
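
Cell 2 below points to `COREAI_API_KEY` when the client fails to initialize. A minimal pre-flight check might look like the sketch here, assuming that environment variable is the only credential the client needs. The helper name `missing_env_vars` is hypothetical and not part of the project.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env_vars(
    names: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the names that are unset or blank in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


# COREAI_API_KEY is the variable named in the notebook's error hint.
missing = missing_env_vars(["COREAI_API_KEY"])
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("Required environment variables are set")
```

Running this before the setup cells surfaces a missing key early instead of partway through the demo.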

In [1]:
# 📋 Configuration & Setup

import os
import sys
import json
from pathlib import Path

# Change to project root directory
current_dir = Path.cwd()
if current_dir.name == 'examples':
    project_root = current_dir.parent
    os.chdir(project_root)
    print(f"📁 Changed working directory to: {project_root}")
else:
    project_root = current_dir

# Add project root to Python path
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))
    print("✅ Added project root to Python path")

# Now we can import from src
from src.utils.core_memory_client import CoreMemoryClient

print("🎉 Environment ready!")
print(f"📍 Current working directory: {Path.cwd()}")

📁 Changed working directory to: /Users/idrdex/mcode_translator
✅ Added project root to Python path
🎉 Environment ready!
📍 Current working directory: /Users/idrdex/mcode_translator


In [2]:
# 🎛️ Demo Configuration

# Set the number of patients and trials to process
NUM_TRIALS = 3      # Number of breast cancer clinical trials to download
NUM_PATIENTS = 2    # Number of synthetic breast cancer patients to fetch

print("🎯 Demo Configuration:")
print(f"   📊 Clinical Trials: {NUM_TRIALS}")
print(f"   👥 Synthetic Patients: {NUM_PATIENTS}")
print(f"   🎨 Total Records: {NUM_TRIALS + NUM_PATIENTS}")

# Initialize CORE Memory client
try:
    client = CoreMemoryClient()
    print("🧠 CORE Memory client initialized successfully!")
except Exception as e:
    print(f"❌ Failed to initialize CORE Memory client: {e}")
    print("💡 Make sure COREAI_API_KEY is set in your environment")
    raise

🎯 Demo Configuration:
   📊 Clinical Trials: 3
   👥 Synthetic Patients: 2
   🎨 Total Records: 5
🧠 CORE Memory client initialized successfully!


## 🏗️ Space Management

Create dedicated spaces in CORE Memory for clinical trials and patient data.

In [3]:
# 🏗️ Create CORE Memory Spaces

print("🏗️ Setting up CORE Memory spaces...")

# Get the clinical trials space ID (the client is expected to create the space if it is missing)
trials_space_id = client.get_clinical_trials_space_id()
print(f"📋 Clinical Trials Space: {trials_space_id}")

# Get the patients space ID (the client is expected to create the space if it is missing)
patients_space_id = client.get_patients_space_id()
print(f"👥 Patients Space: {patients_space_id}")

# Show all spaces
all_spaces = client.get_spaces()
print(f"\n📂 Available Spaces ({len(all_spaces)}):")
for space in all_spaces:
    marker = "🎯" if space.get('id') == trials_space_id else "👤" if space.get('id') == patients_space_id else "📁"
    print(f"   {marker} {space.get('name')} (ID: {space.get('id')})")

print("✅ Spaces ready for data storage!")

🏗️ Setting up CORE Memory spaces...
📋 Clinical Trials Space: cmfr7b8qt00fpny1vkqu9gr7o
👥 Patients Space: cmfr7b92x00frny1vjr71po6l

📂 Available Spaces (3):
   📁 Profile (ID: cmflmxmas00rvqf1vz3bslziu)
   👤 Patients (ID: cmfr7b92x00frny1vjr71po6l)
   🎯 Clinical Trials (ID: cmfr7b8qt00fpny1vkqu9gr7o)
✅ Spaces ready for data storage!


## 📥 Data Acquisition

Download real clinical trial data and synthetic patient records.
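
Both fetchers in this section write NDJSON, which holds one JSON object per line. A small reader such as the sketch below can confirm how many records landed in a file. The helper names `read_ndjson` and `count_records` are illustrative and not part of the project.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def read_ndjson(path: Path) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of an NDJSON file."""
    with path.open("r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({exc.msg})"
                ) from exc


def count_records(path: Path) -> int:
    """Count records without loading the whole file into memory."""
    return sum(1 for _ in read_ndjson(path))
```

Once the download cells have run, `count_records(Path("raw_trials.ndjson"))` should match `NUM_TRIALS` if every requested trial was fetched.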

In [4]:
# 📥 Download Breast Cancer Clinical Trials

print(f"📥 Downloading {NUM_TRIALS} breast cancer clinical trials...")

# Use the CLI to fetch trials (this creates raw_trials.ndjson)
trials_command = f"python -m src.cli.trials_fetcher --condition 'breast cancer' --limit {NUM_TRIALS} --out raw_trials.ndjson --workers 4 --verbose"
print(f"🚀 Running: {trials_command}")

# Execute the command
!{trials_command}

# Check if file was created
if Path('raw_trials.ndjson').exists():
    print("✅ Clinical trials downloaded successfully!")
else:
    print("❌ Failed to download clinical trials")
    print("💡 Check your internet connection and that ClinicalTrials.gov is reachable")

📥 Downloading 3 breast cancer clinical trials...
🚀 Running: python -m src.cli.trials_fetcher --condition 'breast cancer' --limit 3 --out raw_trials.ndjson --workers 4 --verbose
INFO     TrialsFetcherWorkflow:105 🔍 Searching for trials: 'breast cancer' (limit: 3)
INFO     src.utils.fetcher:83 API returned 3 studies for search 'breast cancer'
INFO     TrialsFetcherWorkflow:130 📋 Found 3 trials in search
INFO     TrialsFetcherWorkflow:154 📥 Fetching full study data for 3 NCT IDs
INFO     FullDataFetcherQueue:301 🚀 FullDataFetcherQueue: Processing 3 tasks
INFO     FullDataFetcherQueue:220 🚀 FullDataFetcherQueue: Started with 4 workers
INFO     FullDataFetcherQueue:246 ⚡ FullDataFetcherQueue: STARTED full_fetch_0
INFO     FullDataFetcherQueue:246 ⚡ FullDataFetcherQueue: STARTED full_fetch_1
INFO     Full

In [5]:
# 👥 Download Synthetic Breast Cancer Patients

print(f"👥 Downloading {NUM_PATIENTS} synthetic breast cancer patients...")

# Use the CLI to fetch patients (this creates raw_patients.ndjson)
patients_command = f"python -m src.cli.patients_fetcher --archive breast_cancer_10_years --limit {NUM_PATIENTS} --out raw_patients.ndjson --verbose"
print(f"🚀 Running: {patients_command}")

# Execute the command
!{patients_command}

# Check if file was created
if Path('raw_patients.ndjson').exists():
    print("✅ Synthetic patients downloaded successfully!")
else:
    print("❌ Failed to download synthetic patients")
    print("💡 Check your internet connection and try downloading archives manually")

👥 Downloading 2 synthetic breast cancer patients...
🚀 Running: python -m src.cli.patients_fetcher --archive breast_cancer_10_years --limit 2 --out raw_patients.ndjson --verbose
INFO     PatientsFetcherWorkflow:122 📥 Fetching up to 2 patients from breast_cancer_10_years
INFO     src.utils.patient_generator:147 Resolved named archive 'breast_cancer_10_years' to: data/synthetic_patients/breast_cancer/10_years/breast_cancer_10_years.zip
INFO     src.utils.patient_generator:171 Scanning patient data archive: data/synthetic_patients/breast_cancer/10_years/breast_cancer_10_years.zip
INFO     src.utils.patient_generator:184 Found 1115 patient data files
INFO     PatientsFetcherWorkflow:151 ✅ Successfully fetched 2 patients
INFO     PatientsFetcherWorkflow:189 💾 Patient data saved to: raw_patients.ndjson (NDJSON format)
✅ Patients fetch completed successfully!
📊 To

## 🔄 mCODE Conversion

Convert raw clinical data to standardized mCODE format.
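
The patient processor log in this section prints entries that each carry a `resource` with a `ResourceType` (Patient, Encounter, Observation), which suggests FHIR-style bundles. Assuming each raw patient record is a dict with an `entry` list shaped that way, a quick tally of resource types could be sketched as follows. The function name and the sample bundle are illustrative only.

```python
from collections import Counter
from typing import Any, Dict


def count_resource_types(bundle: Dict[str, Any]) -> Counter:
    """Tally resourceType values across a bundle's entries."""
    counts: Counter = Counter()
    for entry in bundle.get("entry", []):
        # Assumption: each entry is a dict holding a "resource" dict.
        resource = entry.get("resource", {}) if isinstance(entry, dict) else {}
        counts[resource.get("resourceType", "Unknown")] += 1
    return counts


# Hand-written sample; real records come from raw_patients.ndjson.
sample_bundle = {
    "entry": [
        {"resource": {"resourceType": "Patient"}},
        {"resource": {"resourceType": "Encounter"}},
        {"resource": {"resourceType": "Observation"}},
        {"resource": {"resourceType": "Observation"}},
    ]
}

for resource_type, count in count_resource_types(sample_bundle).most_common():
    print(f"{resource_type}: {count}")
```

A tally like this helps judge how much clinical detail a patient record contains before it is converted.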

In [6]:
# 🔄 Convert Trials to mCODE Format

print("🔄 Converting clinical trials to mCODE format...")

# Use the CLI to process trials to mCODE
mcode_trials_command = "python -m src.cli.trials_processor raw_trials.ndjson --out mcode_trials.ndjson --workers 4 --verbose"
print(f"🚀 Running: {mcode_trials_command}")

# Execute the command
!{mcode_trials_command}

# Check results
if Path('mcode_trials.ndjson').exists():
    print("✅ Clinical trials converted to mCODE!")
else:
    print("❌ Failed to convert trials to mCODE")
    print("💡 Check the raw_trials.ndjson file and processing logs")

🔄 Converting clinical trials to mCODE format...
🚀 Running: python -m src.cli.trials_processor raw_trials.ndjson --out mcode_trials.ndjson --workers 4 --verbose
INFO     __main__:141 📄 JSON parsing failed, trying NDJSON format...
INFO     __main__:146 📄 Read 3 trials from NDJSON format
INFO     __main__:179 🔬 Processing 3 trials...
INFO     __main__:220 Initializing trials processor workflow...
INFO     src.utils.api_manager:312 APIManager initialized with config TTL: 0 seconds
INFO     src.utils.api_manager:34 Initialized API cache for namespace 'trials_processor' at .api_cache/trials_processor with TTL 0
INFO     src.utils.api_manager:34 Initialized API cache for namespace 'mcode_extraction' at .api_cache/mcode_extraction with TTL 0
INFO     src.utils.api_manager:34 Initialized API cache for namespace 'trial_summari

In [7]:
# 👨‍⚕️ Convert Patients to mCODE Format

print("👨‍⚕️ Converting patient data to mCODE format...")

# Use the CLI to process patients to mCODE
mcode_patients_command = "python -m src.cli.patients_processor --in raw_patients.ndjson --out mcode_patients.ndjson --workers 4 --verbose"
print(f"🚀 Running: {mcode_patients_command}")

# Execute the command
!{mcode_patients_command}

# Check results
if Path('mcode_patients.ndjson').exists():
    print("✅ Patient data converted to mCODE!")
else:
    print("❌ Failed to convert patients to mCODE")
    print("💡 Check the raw_patients.ndjson file and processing logs")

👨‍⚕️ Converting patient data to mCODE format...
🚀 Running: python -m src.cli.patients_processor --in raw_patients.ndjson --out mcode_patients.ndjson --workers 4 --verbose
📄 JSON parsing failed, trying NDJSON format...
📄 Read 2 patients from NDJSON format
🔬 Processing 2 patient records...
Patient 1: 289 entries
  Entry 0: type=<class 'dict'>
    Resource: type=<class 'dict'>
    ResourceType: Patient
    Patient 1 name: [{'use': 'official', 'family': 'Reichert620', 'given': ['Katherine209']}]
  Entry 1: type=<class 'dict'>
    Resource: type=<class 'dict'>
    ResourceType: Encounter
  Entry 2: type=<class 'dict'>
    Resource: type=<class 'dict'>
    ResourceType: Observation
Patient 2: 790 entries
  Entry 0: type=<class 'dict'>
    Resource: type=<class 'dict'>
    ResourceType: Patient
    Patient 2 name: [{'use': 'official', 'family': 'Conn188', 'given': ['Madalene903'], 'prefix': ['Ms.']}]
  Entry 1: type=<class 'dict'>
    Resource: type=<class 'dict'>
    ResourceType: Encounter


## 📝 Generate Summaries

Create human-readable summaries of the mCODE data.
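
The summarizer cells in this section print a success message whether or not the command worked, because the `!` shell escape does not raise on a non-zero exit status. One alternative is to run the command with `subprocess` and check the return code. This is a sketch that assumes the project's CLI modules exit non-zero on failure, which the notebook does not confirm. `run_cli` is a hypothetical helper name.

```python
import shlex
import subprocess
import sys


def run_cli(command: str) -> bool:
    """Run a command line and report whether it exited with status 0."""
    completed = subprocess.run(
        shlex.split(command), capture_output=True, text=True, check=False
    )
    # Echo captured output so it appears in the notebook cell.
    if completed.stdout:
        print(completed.stdout, end="")
    if completed.returncode != 0:
        print(f"Command failed with exit code {completed.returncode}: {command}")
        if completed.stderr:
            print(completed.stderr, end="")
        return False
    return True


# Stand-in command so the sketch runs anywhere; swap in a summarizer command.
demo_command = f"{shlex.quote(sys.executable)} -c \"print('hello from a subprocess')\""
print("Succeeded" if run_cli(demo_command) else "Failed")
```

With this helper, the closing success message can be printed only when `run_cli(summarize_trials_command)` returns `True`.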

In [8]:
# 📝 Generate Trial Summaries

print("📝 Generating clinical trial summaries...")

# Use the CLI to create summaries and store in CORE Memory
summarize_trials_command = "python -m src.cli.trials_summarizer --in mcode_trials.ndjson --ingest --verbose"
print(f"🚀 Running: {summarize_trials_command}")

# Execute the command
!{summarize_trials_command}

print("✅ Clinical trial summaries generated and stored in CORE Memory!")

📝 Generating clinical trial summaries...
🚀 Running: python -m src.cli.trials_summarizer --in mcode_trials.ndjson --ingest --verbose
INFO     __main__:134 Loading mCODE trial data...
INFO     __main__:136 Loaded 3 mCODE trial records
INFO     __main__:150 🧠 Initialized CORE Memory storage (source: mcode_translator)
INFO     __main__:180 Initializing trials summarizer workflow...
INFO     __main__:183 Generating natural language summaries...
INFO     TrialsSummarizerWorkflow:50 Starting trials summarizer workflow execution
INFO     TrialsSummarizerWorkflow:67 📝 Generating summaries for 3 trials
INFO     src.storage.mcode_memory_storage:120 ✅ Stored trial NCT00243165 mCODE summary in CORE Memory
INFO     TrialsSummarizerWorkflow:104 ✅ Stored trial NCT00243165 summary
INFO     src.

In [9]:
# 📝 Generate Patient Summaries

print("📝 Generating patient summaries...")

# Use the CLI to create summaries and store in CORE Memory
summarize_patients_command = "python -m src.cli.patients_summarizer --in mcode_patients.ndjson --ingest --verbose"
print(f"🚀 Running: {summarize_patients_command}")

# Execute the command
!{summarize_patients_command}

print("✅ Patient summaries generated and stored in CORE Memory!")

📝 Generating patient summaries...
🚀 Running: python -m src.cli.patients_summarizer --in mcode_patients.ndjson --ingest --verbose
INFO     __main__:134 Loading mCODE patient data...
INFO     __main__:136 Loaded 2 mCODE patient records
INFO     __main__:150 🧠 Initialized CORE Memory storage (source: mcode_translator)
INFO     __main__:165 Initializing patients summarizer workflow...
INFO     __main__:168 Generating natural language summaries...
INFO     PatientsSummarizerWorkflow:67 Starting patients summarizer workflow execution
INFO     PatientsSummarizerWorkflow:84 📝 Generating summaries for 2 patients
INFO     PatientsSummarizerWorkflow:120 Using original_patient_data, has 289 entries
INFO     src.storage.mcode_memory_storage:155 ✅ Stored patient 8c0d6e0a-2cbe-436d-0cfc-2e645de0c71a mCODE summ

## 🔍 Knowledge Exploration

Search and explore the medical knowledge stored in CORE Memory.
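
Because ingested data is processed asynchronously, a search issued right after ingestion may come back empty. A simple retry loop is one way to handle that. The sketch below assumes the search response is a dict with an `episodes` list, as the cells in this section use it. The helper name and the stand-in responses are illustrative.

```python
import time
from typing import Any, Callable, Dict


def wait_for_episodes(
    search: Callable[[], Dict[str, Any]],
    attempts: int = 5,
    delay_seconds: float = 2.0,
) -> Dict[str, Any]:
    """Call `search` until it returns episodes or the attempts run out."""
    results: Dict[str, Any] = {}
    for attempt in range(1, attempts + 1):
        results = search()
        if results.get("episodes"):
            break
        if attempt < attempts:
            time.sleep(delay_seconds)  # give background processing time
    return results


# Stand-in for the real call, e.g. lambda: client.search("breast cancer", limit=10)
responses = iter([{"episodes": []}, {"episodes": ["first episode"], "facts": []}])
results = wait_for_episodes(lambda: next(responses), attempts=3, delay_seconds=0.1)
print(f"Episodes found: {len(results.get('episodes', []))}")
```

The helper returns the last response even when no episodes appear, so the caller can still report zero results.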

In [10]:
# 🔍 Search for Breast Cancer Information

print("🔍 Searching CORE Memory for breast cancer information...")

# Search across all spaces
breast_cancer_results = client.search("breast cancer", limit=10)

print("🎯 Search Results for 'breast cancer':")
print(f"   📄 Episodes found: {len(breast_cancer_results.get('episodes', []))}")
print(f"   🧠 Facts found: {len(breast_cancer_results.get('facts', []))}")

# Show sample episodes
episodes = breast_cancer_results.get('episodes', [])[:3]
if episodes:
    print("\n📖 Sample Episodes:")
    for i, episode in enumerate(episodes, 1):
        if isinstance(episode, dict):
            content = episode.get('content', '')[:100]
        else:
            content = str(episode)[:100]
        print(f"   {i}. {content}...")
else:
    print("   No episodes found yet - data may still be processing")

print("\n💡 Tip: Search results may take a few moments to appear as data is processed asynchronously")

🔍 Searching CORE Memory for breast cancer information...
🎯 Search Results for 'breast cancer':
   📄 Episodes found: 8
   🧠 Facts found: 26

📖 Sample Episodes:
   1. Katherine209 Reichert620 (ID: 8c0d6e0a-2cbe-436d-0cfc-2e645de0c71a) is a deceased Patient (mCODE: Pa...
   2. <h3>Summary of Patterns in Demo Space Episodes</h3>
<p>The episodes in this space reveal three main ...
   3. NCT01219907 is a clinical trial (mCODE: Trial) entitled 'Ex Vivo-Expanded HER2-Specific T Cells and ...

💡 Tip: Search results may take a few moments to appear as data is processed asynchronously


In [11]:
# 🔍 Search Clinical Trials Space

print("🔍 Searching clinical trials space specifically...")

# Search within the trials space
trials_search = client.search("clinical trial", space_id=trials_space_id, limit=5)

print("📋 Clinical Trials Space Search Results:")
print(f"   📄 Episodes: {len(trials_search.get('episodes', []))}")
print(f"   🧠 Facts: {len(trials_search.get('facts', []))}")

# Show trial-specific content
for episode in trials_search.get('episodes', [])[:2]:
    if isinstance(episode, dict):
        content = episode.get('content', '')[:150]
    else:
        content = str(episode)[:150]
    print(f"   📊 {content}...")

🔍 Searching clinical trials space specifically...
📋 Clinical Trials Space Search Results:
   📄 Episodes: 9
   🧠 Facts: 37
   📊 <h3>Summary of Patterns in Demo Space Episodes</h3>
<p>The episodes in this space reveal three main themes: (1) the use and structure of the mCODE Cor...
   📊 NCT00243165 is a clinical trial (mCODE: Trial) entitled 'Lifemel Honey to Reduce Leucopenia During Chemotherapy'. Trial study type (mCODE: TrialStudyT...


In [12]:
# 🔍 Search Patients Space

print("🔍 Searching patients space...")

# Search within the patients space
patients_search = client.search("patient", space_id=patients_space_id, limit=5)

print("👥 Patients Space Search Results:")
print(f"   📄 Episodes: {len(patients_search.get('episodes', []))}")
print(f"   🧠 Facts: {len(patients_search.get('facts', []))}")

# Show patient-specific content
for episode in patients_search.get('episodes', [])[:2]:
    if isinstance(episode, dict):
        content = episode.get('content', '')[:150]
    else:
        content = str(episode)[:150]
    print(f"   👤 {content}...")

🔍 Searching patients space...
👥 Patients Space Search Results:
   📄 Episodes: 1
   🧠 Facts: 1
   👤 <h3>Summary of Patterns in "Patients" Space Episodes</h3>
<p>The analysis of the available episode reveals three main themes supported by the content:...


## 📊 Results Summary

Review what we've accomplished and check our data files.
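
The summary cell reports file sizes in raw bytes, which gets hard to read for multi-megabyte files. A small formatter, sketched here with binary units (1 KiB = 1024 bytes), can make the listing easier to scan. `format_size` is an illustrative helper, not part of the project.

```python
def format_size(num_bytes: int) -> str:
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    units = ["bytes", "KiB", "MiB", "GiB"]
    size = float(num_bytes)
    index = 0
    # Step up one unit at a time, stopping at the largest unit listed.
    while size >= 1024 and index < len(units) - 1:
        size /= 1024
        index += 1
    if index == 0:
        return f"{num_bytes} bytes"
    return f"{size:.1f} {units[index]}"


for example in (512, 23047, 4297300):
    print(f"{example} -> {format_size(example)}")
```

In the summary loop, `format_size(size)` could replace the raw `{size} bytes` output.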

In [13]:
# 📊 Demo Results Summary

print("📊 Breast Cancer CORE Memory Demo - Results Summary")
print("=" * 60)

# Check generated files
files_to_check = [
    'raw_trials.ndjson',
    'raw_patients.ndjson',
    'mcode_trials.ndjson',
    'mcode_patients.ndjson'
]

print("📁 Generated Files:")
total_size = 0
for filename in files_to_check:
    file_path = Path(filename)
    if file_path.exists():
        size = file_path.stat().st_size
        total_size += size
        print(f"   ✅ {filename} ({size} bytes)")
    else:
        print(f"   ❌ {filename} (not found)")

print(f"\n💾 Total data processed: {total_size} bytes")

# Show CORE Memory spaces
print("\n🧠 CORE Memory Spaces:")
spaces = client.get_spaces()
for space in spaces:
    space_id = space.get('id')
    marker = "🎯" if space_id == trials_space_id else "👤" if space_id == patients_space_id else "📁"
    print(f"   {marker} {space.get('name')}")

# Show search statistics
total_episodes = len(breast_cancer_results.get('episodes', []))
total_facts = len(breast_cancer_results.get('facts', []))

print("\n🔍 Knowledge Base Status:")
print(f"   📄 Medical Episodes: {total_episodes}")
print(f"   🧠 Medical Facts: {total_facts}")
print(f"   🎯 Total Knowledge Items: {total_episodes + total_facts}")

print("\n🎉 Demo completed successfully!")
print("🏆 Breast cancer data is now stored and searchable in CORE Memory!")

📊 Breast Cancer CORE Memory Demo - Results Summary
============================================================
📁 Generated Files:
   ✅ raw_trials.ndjson (23047 bytes)
   ✅ raw_patients.ndjson (1782097 bytes)
   ✅ mcode_trials.ndjson (49827 bytes)
   ✅ mcode_patients.ndjson (2442329 bytes)

💾 Total data processed: 4297300 bytes

🧠 CORE Memory Spaces:
   📁 Profile
   👤 Patients
   🎯 Clinical Trials

🔍 Knowledge Base Status:
   📄 Medical Episodes: 8
   🧠 Medical Facts: 26
   🎯 Total Knowledge Items: 34

🎉 Demo completed successfully!
🏆 Breast cancer data is now stored and searchable in CORE Memory!


## 🎯 Next Steps

Your breast cancer knowledge base is now ready! Here are some things you can do:

- 🔍 **Search**: Use different queries to explore the medical knowledge
- 📊 **Analyze**: Look at treatment patterns and patient outcomes
- 🔬 **Research**: Compare clinical trials and eligibility criteria
- 🤖 **AI Integration**: Use the stored knowledge for clinical decision support
- 📈 **Scale Up**: Process more trials and patients for comprehensive analysis
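
To try several queries side by side, as the Search suggestion above proposes, a small helper can collect episode and fact counts per query. The sketch assumes a search callable that returns a dict with `episodes` and `facts` lists, matching how the notebook reads `client.search` results. `compare_queries` and the stand-in search function are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

SearchFn = Callable[[str], Dict[str, Any]]


def compare_queries(
    search: SearchFn, queries: Iterable[str]
) -> List[Tuple[str, int, int]]:
    """Return (query, episode count, fact count) for each query."""
    rows = []
    for query in queries:
        results = search(query)
        rows.append(
            (query, len(results.get("episodes", [])), len(results.get("facts", [])))
        )
    return rows


def fake_search(query: str) -> Dict[str, Any]:
    """Stand-in with made-up data; replace with a call to the real client."""
    canned = {"HER2": {"episodes": ["e1"], "facts": ["f1", "f2"]}}
    return canned.get(query, {"episodes": [], "facts": []})


for query, episodes, facts in compare_queries(fake_search, ["HER2", "chemotherapy"]):
    print(f"{query}: {episodes} episodes, {facts} facts")
```

With the notebook's client, the first argument could be `lambda q: client.search(q, limit=10)`.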

## 🏆 Achievements

✅ Downloaded real clinical trial data  
✅ Fetched realistic synthetic patient data  
✅ Converted to standardized mCODE format  
✅ Generated human-readable summaries  
✅ Stored everything in CORE Memory  
✅ Created searchable medical knowledge base  

**You're now ready to explore oncology data with AI! 🚀**