# 🚀 **mCODE Translation: Quick Start Tutorial**

Welcome to the mCODE (minimal Common Oncology Data Elements) Translation System! This interactive tutorial will get you up and running with clinical data processing in just 15 minutes.

## 🎯 **What You'll Learn**

By the end of this tutorial, you'll be able to:
- ✅ Fetch clinical trial data from public APIs
- ✅ Process patient data into standardized mCODE format
- ✅ Generate human-readable summaries
- ✅ Understand the complete clinical data workflow

## 📋 **Prerequisites**

- Python 3.11+ installed
- Basic familiarity with command line
- ClinicalTrials.gov API access (no key required)

---

## 🏁 **Step 1: Setup & Verification**

First, let's verify that our mCODE translation system is properly installed and all components are working.

In [None]:
# Check that we're in the right directory and the system is ready
import os
import sys
from pathlib import Path

print(f"Current directory: {os.getcwd()}")
print(f"Python version: {sys.version}")

# Verify the main CLI script exists
cli_path = Path("mcode_translate.py")
print(f"CLI script exists: {cli_path.exists()}")

# Show available commands
print("\n🚀 Available CLI commands:")
print("- fetch-trials: Download clinical trial data")
print("- fetch-patients: Download synthetic patient data")
print("- process-trials: Extract mCODE elements from trials")
print("- process-patients: Extract mCODE elements from patients")
print("- summarize-trials: Generate trial summaries")
print("- summarize-patients: Generate patient summaries")
print("- optimize-trials: Compare AI models and prompts")
print("- run-tests: Execute test suite")

## 📊 **Step 2: Fetch Clinical Trial Data**

Clinical trials contain detailed information about cancer treatments, eligibility criteria, and study designs. We'll start by fetching some breast cancer trials.

### What happens in this step:
- Searches ClinicalTrials.gov for breast cancer trials
- Downloads trial metadata and protocols
- Saves data in NDJSON format for efficient processing
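
If NDJSON is new to you: it stores one JSON object per line, so a file can be read record by record without loading everything at once. The sketch below round-trips two made-up records through an in-memory buffer. The helper names (`write_ndjson`, `read_ndjson`) and the placeholder IDs are illustrative assumptions, not part of `mcode_translate.py`.

```python
import io
import json

def write_ndjson(records, fh):
    """Write one compact JSON object per line."""
    for record in records:
        fh.write(json.dumps(record) + "\n")

def read_ndjson(fh):
    """Yield one parsed object per non-blank line."""
    for line in fh:
        line = line.strip()
        if line:
            yield json.loads(line)

# Made-up records that mimic the nested fields read later in this tutorial
sample = [
    {"protocolSection": {"identificationModule": {
        "nctId": "EXAMPLE-ID-1", "briefTitle": "Example trial A"}}},
    {"protocolSection": {"identificationModule": {
        "nctId": "EXAMPLE-ID-2", "briefTitle": "Example trial B"}}},
]

buffer = io.StringIO()
write_ndjson(sample, buffer)
buffer.seek(0)
loaded = list(read_ndjson(buffer))
print(f"Round-tripped {len(loaded)} records")
```

Because each line is independent, a partially written file can still be read up to its last complete line.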

In [None]:
# Configuration for our tutorial
NUM_TRIALS = 3  # Start small for quick processing
CONDITION = "breast cancer"  # Focus on breast cancer trials

print(f"🎯 Fetching {NUM_TRIALS} {CONDITION} trials...")
print("This may take 10-30 seconds depending on API response time.")

In [None]:
# Fetch clinical trials
# This command searches ClinicalTrials.gov and downloads trial data
!python mcode_translate.py fetch-trials --condition "{CONDITION}" --limit {NUM_TRIALS} --out raw_trials.ndjson

In [None]:
# Verify the data was downloaded
import json

print("📊 Checking downloaded trial data...")
try:
    with open('raw_trials.ndjson', 'r') as f:
        lines = f.readlines()
        print(f"✅ Downloaded {len(lines)} trials")
        
        # Show a sample trial
        if lines:
            sample_trial = json.loads(lines[0])
            trial_id = sample_trial.get('protocolSection', {}).get('identificationModule', {}).get('nctId', 'Unknown')
            title = sample_trial.get('protocolSection', {}).get('identificationModule', {}).get('briefTitle', 'No title')
            print(f"\n📋 Sample Trial:")
            print(f"   NCT ID: {trial_id}")
            print(f"   Title: {title[:80]}{'...' if len(title) > 80 else ''}")
            
except FileNotFoundError:
    print("❌ Trial data file not found. Check the fetch command above.")
except Exception as e:
    print(f"❌ Error reading trial data: {e}")

## 👥 **Step 3: Fetch Patient Data**

Patient data provides clinical context for treatment patterns. We'll use synthetic patient data, which stays clinically realistic without exposing any real patient records.

### What happens in this step:
- Downloads synthetic patient records from MITRE's oncology data archives
- Filters for breast cancer patients with 10-year follow-up
- Provides realistic clinical scenarios for testing
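
Each patient record is a FHIR Bundle: a JSON object whose `entry` list wraps individual resources, each tagged with a `resourceType`. The sketch below uses a made-up, heavily trimmed bundle to show that shape and to count the resources by type. The sample values and the `count_resource_types` helper are assumptions for illustration; real records carry many more fields.

```python
from collections import Counter

# Hypothetical, heavily trimmed FHIR Bundle for illustration only
bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "example-1",
                      "name": [{"given": ["Jane"], "family": "Doe"}]}},
        {"resource": {"resourceType": "Condition",
                      "code": {"text": "Example condition A"}}},
        {"resource": {"resourceType": "Condition",
                      "code": {"text": "Example condition B"}}},
    ],
}

def count_resource_types(bundle):
    """Count the resources in a Bundle by their resourceType."""
    return Counter(
        entry.get("resource", {}).get("resourceType", "Unknown")
        for entry in bundle.get("entry", [])
    )

resource_counts = count_resource_types(bundle)
print(dict(resource_counts))
```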

In [None]:
# Configuration for patient data
NUM_PATIENTS = 3  # Small sample for quick processing
PATIENT_ARCHIVE = "breast_cancer_10_years"  # 10-year follow-up data

print(f"👥 Fetching {NUM_PATIENTS} patients from {PATIENT_ARCHIVE} archive...")
print("This downloads synthetic but clinically realistic patient data.")

In [None]:
# Fetch synthetic patient data
# This downloads patient records from MITRE's oncology archives
!python mcode_translate.py fetch-patients --archive {PATIENT_ARCHIVE} --limit {NUM_PATIENTS} --out raw_patients.ndjson

In [None]:
# Verify patient data was downloaded
print("👤 Checking downloaded patient data...")
try:
    with open('raw_patients.ndjson', 'r') as f:
        lines = f.readlines()
        print(f"✅ Downloaded {len(lines)} patient records")
        
        # Show sample patient info
        if lines:
            sample_patient = json.loads(lines[0])
            # Extract patient info from FHIR Bundle
            patient_resource = None
            conditions = []

            for entry in sample_patient.get('entry', []):
                resource = entry.get('resource', {})
                resource_type = resource.get('resourceType')

                if resource_type == 'Patient':
                    patient_resource = resource
                elif resource_type == 'Condition':
                    condition_text = resource.get('code', {}).get('text', 'Unknown condition')
                    conditions.append(condition_text)

            # Get patient ID and name
            patient_id = patient_resource.get('id', 'Unknown') if patient_resource else 'Unknown'
            patient_name = 'Unknown'
            if patient_resource and 'name' in patient_resource:
                name_data = patient_resource['name'][0] if patient_resource['name'] else {}
                given = name_data.get('given', [])
                family = name_data.get('family', '')
                given_str = ' '.join(given) if given else ''
                patient_name = f"{given_str} {family}".strip() or 'Unknown'

            # Build the preview first: nesting double quotes inside a
            # double-quoted f-string is a syntax error on Python 3.11
            condition_preview = ', '.join(conditions[:3])
            print(f"\n👤 Sample Patient:")
            print(f"   ID: {patient_id}")
            print(f"   Name: {patient_name}")
            print(f"   Conditions: {condition_preview}{'...' if len(conditions) > 3 else ''}")

except FileNotFoundError:
    print("❌ Patient data file not found. Check the fetch command above.")
except Exception as e:
    print(f"❌ Error reading patient data: {e}")

## 🤖 **Step 4: AI Processing - Trials to mCODE**

Now for the exciting part! We'll use an AI model (configured in the next cell) to extract structured mCODE elements from the unstructured clinical trial text.

### What happens in this step:
- AI analyzes trial protocols and eligibility criteria
- Extracts standardized mCODE elements (conditions, treatments, demographics)
- Validates extracted information for clinical accuracy
- Stores results in CORE Memory for future reference
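
To get a feel for the output before opening the file, here is a minimal sketch that tallies element types across processed trial records. It assumes each record nests its mappings under `mcode_elements` → `mcode_mappings` with an `element_type` field, which is the layout the inspection cell further down reads. The sample records and type names are made up.

```python
from collections import Counter

def tally_element_types(records):
    """Count how often each element_type appears across trial records."""
    counts = Counter()
    for record in records:
        mappings = record.get("mcode_elements", {}).get("mcode_mappings", [])
        for element in mappings:
            counts[element.get("element_type", "Unknown")] += 1
    return counts

# Made-up records; the type names and codes are illustrative only
records = [
    {"mcode_elements": {"mcode_mappings": [
        {"element_type": "CancerCondition", "code": "example-code-1"},
        {"element_type": "CancerTreatment", "code": "example-code-2"},
    ]}},
    {"mcode_elements": {"mcode_mappings": [
        {"element_type": "CancerCondition", "code": "example-code-3"},
    ]}},
]

type_counts = tally_element_types(records)
for element_type, count in type_counts.most_common():
    print(f"{element_type}: {count}")
```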

In [None]:
# Configuration for AI processing
MODEL = "deepseek-coder"  # Our recommended model for structured data
PROMPT = "direct_mcode_evidence_based_concise"  # Optimized prompt template

print(f"🤖 Processing trials with {MODEL} model...")
print(f"📝 Using prompt: {PROMPT}")
print("This extracts mCODE elements from clinical trial text.")
print("Processing time: ~30-60 seconds per trial")

In [None]:
# Process trial data into mCODE format
# This is where the AI magic happens!
!python mcode_translate.py process-trials raw_trials.ndjson --out mcode_trials.ndjson --ingest --model {MODEL} --prompt {PROMPT}

In [None]:
# Examine the processed mCODE trial data
print("🔬 Analyzing processed trial data...")
try:
    with open('mcode_trials.ndjson', 'r') as f:
        lines = f.readlines()
        print(f"✅ Processed {len(lines)} trials into mCODE format")
        
        if lines:
            sample_mcode = json.loads(lines[0])
            mcode_elements = sample_mcode.get('mcode_elements', {}).get('mcode_mappings', [])
            print(f"\n📊 Sample Trial mCODE Elements:")
            for i, element in enumerate(mcode_elements[:5]):  # Show first 5 elements
                element_type = element.get('element_type', 'Unknown')
                code = element.get('code', 'No code')
                print(f"   {i+1}. {element_type}: {code}")
            if len(mcode_elements) > 5:
                print(f"   ... and {len(mcode_elements) - 5} more elements")
                
except FileNotFoundError:
    print("❌ mCODE trial data not found. Check the processing command above.")
except Exception as e:
    print(f"❌ Error reading mCODE trial data: {e}")

## 🏥 **Step 5: AI Processing - Patients to mCODE**

Now we'll process the patient data, linking it with the trial information for comprehensive clinical context.

### What happens in this step:
- AI analyzes patient records and treatment histories
- Extracts mCODE elements (diagnoses, treatments, outcomes)
- Links patient data with relevant clinical trials
- Creates comprehensive patient profiles
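
This tutorial does not show how `mcode_translate.py` links patients to trials. As a mental model only, one naive approach is to look for codes that a patient and a trial have in common. The sketch below does that under the assumption that patient records hold a flat `mcode_elements` list while trial records nest theirs under `mcode_mappings`, as the inspection cells in this tutorial read them. The function names and sample data are hypothetical.

```python
def element_codes(elements):
    """Collect the non-empty 'code' values from a list of elements."""
    return {e.get("code") for e in elements if e.get("code")}

def shared_codes(patient_record, trial_record):
    """Return the codes present in both a patient and a trial record."""
    patient_elements = patient_record.get("mcode_elements", [])
    trial_elements = trial_record.get("mcode_elements", {}).get("mcode_mappings", [])
    return element_codes(patient_elements) & element_codes(trial_elements)

# Made-up records for illustration only
patient = {"mcode_elements": [
    {"element_type": "CancerCondition", "code": "example-code-1"},
    {"element_type": "CancerTreatment", "code": "example-code-9"},
]}
trial = {"mcode_elements": {"mcode_mappings": [
    {"element_type": "CancerCondition", "code": "example-code-1"},
    {"element_type": "CancerTreatment", "code": "example-code-2"},
]}}

overlap = shared_codes(patient, trial)
print(f"Shared codes: {sorted(overlap)}")
```

Exact code overlap ignores eligibility logic such as exclusions or staging ranges, so treat it as a starting intuition rather than a matching algorithm.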

In [None]:
print("🏥 Processing patient data into mCODE format...")
print("This links patient records with clinical trial eligibility.")
print("Processing time: ~20-40 seconds per patient")

In [None]:
# Process patient data into mCODE format
# Links patients with relevant clinical trials
!python mcode_translate.py process-patients --in raw_patients.ndjson --out mcode_patients.ndjson --trials mcode_trials.ndjson --ingest --model {MODEL} --prompt {PROMPT}

In [None]:
# Examine the processed mCODE patient data
print("👤 Analyzing processed patient data...")
try:
    with open('mcode_patients.ndjson', 'r') as f:
        lines = f.readlines()
        print(f"✅ Processed {len(lines)} patients into mCODE format")
        
        if lines:
            sample_patient = json.loads(lines[0])
            mcode_elements = sample_patient.get('mcode_elements', [])
            print(f"\n📊 Sample Patient mCODE Elements:")
            for i, element in enumerate(mcode_elements[:5]):  # Show first 5 elements
                element_type = element.get('element_type', 'Unknown')
                code = element.get('code', 'No code')
                print(f"   {i+1}. {element_type}: {code}")
            if len(mcode_elements) > 5:
                print(f"   ... and {len(mcode_elements) - 5} more elements")
                
except FileNotFoundError:
    print("❌ mCODE patient data not found. Check the processing command above.")
except Exception as e:
    print(f"❌ Error reading mCODE patient data: {e}")

## 📝 **Step 6: Generate Summaries**

Finally, let's create human-readable summaries of our processed data. This transforms the structured mCODE elements into natural language descriptions.

### What happens in this step:
- Converts mCODE elements into readable sentences
- Groups related information by clinical priority
- Creates comprehensive summaries for research and clinical use
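
The idea behind these steps can be sketched with a simple template: sort elements by a priority table, then render one sentence per element. The priority order, the sentence wording, and the `render_summary` name below are all assumptions for illustration; the summarizer in `mcode_translate.py` may work quite differently.

```python
# Hypothetical priority order (lower sorts first); not taken from the tool
PRIORITY = {"CancerCondition": 0, "CancerTreatment": 1}

def render_summary(elements):
    """Return one sentence per element, highest-priority types first."""
    ordered = sorted(
        elements,
        key=lambda e: PRIORITY.get(e.get("element_type"), len(PRIORITY)),
    )
    return [
        f"{e.get('element_type', 'Unknown')} recorded with code {e.get('code', 'No code')}."
        for e in ordered
    ]

# Made-up elements for illustration only
elements = [
    {"element_type": "CancerTreatment", "code": "example-code-2"},
    {"element_type": "PatientSex", "code": "example-code-3"},
    {"element_type": "CancerCondition", "code": "example-code-1"},
]

summary_lines = render_summary(elements)
for line in summary_lines:
    print(line)
```

Types missing from the priority table sort last and keep their original relative order, because Python's `sorted` is stable.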

In [None]:
print("📝 Generating human-readable summaries...")
print("This converts mCODE elements into natural language descriptions.")
print("Processing time: ~10-20 seconds per summary")

In [None]:
# Generate trial summaries
!python mcode_translate.py summarize-trials --in mcode_trials.ndjson --out trials_summary.md --ingest --workers 2

In [None]:
# Generate patient summaries
!python mcode_translate.py summarize-patients --in mcode_patients.ndjson --out patients_summary.md --ingest --workers 2

In [None]:
# Display the generated summaries
print("📖 Displaying generated summaries...")

# Show trial summary
print("\n" + "="*60)
print("📋 CLINICAL TRIALS SUMMARY")
print("="*60)
try:
    with open('trials_summary.md', 'r') as f:
        content = f.read()
        # Show first 1000 characters to avoid overwhelming output
        print(content[:1000] + "..." if len(content) > 1000 else content)
except FileNotFoundError:
    print("❌ Trial summary not found")

# Show patient summary
print("\n" + "="*60)
print("👥 PATIENTS SUMMARY")
print("="*60)
try:
    with open('patients_summary.md', 'r') as f:
        content = f.read()
        # Show first 1000 characters
        print(content[:1000] + "..." if len(content) > 1000 else content)
except FileNotFoundError:
    print("❌ Patient summary not found")

## 🎉 **Tutorial Complete!**

Congratulations! You've successfully completed the mCODE translation workflow. Here's what you accomplished:

### ✅ **What You Did:**
1. **Fetched** clinical trial data from public APIs
2. **Downloaded** synthetic patient data with clinical realism
3. **Processed** trial data into standardized mCODE elements using AI
4. **Processed** patient data and linked it with relevant trials
5. **Generated** human-readable summaries of all processed data

### 📊 **Files Created:**
- `raw_trials.ndjson` - Original clinical trial data
- `raw_patients.ndjson` - Original patient data
- `mcode_trials.ndjson` - Trials converted to mCODE format
- `mcode_patients.ndjson` - Patients converted to mCODE format
- `trials_summary.md` - Human-readable trial summaries
- `patients_summary.md` - Human-readable patient summaries
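
To confirm that every step produced its output, a short check like the following can help. It is a minimal sketch that assumes the files were written to the current working directory under the names listed above; `check_outputs` is a hypothetical helper.

```python
from pathlib import Path

# File names taken from the list above
EXPECTED_FILES = [
    "raw_trials.ndjson",
    "raw_patients.ndjson",
    "mcode_trials.ndjson",
    "mcode_patients.ndjson",
    "trials_summary.md",
    "patients_summary.md",
]

def check_outputs(directory="."):
    """Map each expected file name to whether it exists in the directory."""
    base = Path(directory)
    return {name: (base / name).is_file() for name in EXPECTED_FILES}

status = check_outputs()
for name, present in status.items():
    print(f"{'✅' if present else '❌'} {name}")
```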

### 🚀 **Next Steps:**

**For Advanced Features:** Check out `mcode_cli_deep_dive.ipynb` for:
- Model optimization and inter-rater reliability analysis
- Batch processing for large datasets
- Advanced configuration and customization
- Quality assurance and validation techniques

**For Production Use:**
- Scale up the number of trials/patients
- Use different AI models and prompts
- Integrate with your clinical workflows
- Set up automated processing pipelines

### 💡 **Key Takeaways:**
- mCODE provides standardized clinical data elements
- AI models can extract structured mCODE elements from unstructured trial and patient text
- The system validates extracted elements for clinical accuracy while enabling interoperability
- Results are stored in CORE Memory for persistent research context

**Happy mCODE translating! 🚀**