# 🔬 **mCODE CLI Deep Dive: Advanced Features & Optimization**

Welcome to the comprehensive guide to the mCODE Translation System CLI! This notebook explores all available commands, optimization techniques, and advanced features for clinical data processing.

## 🎯 **What You'll Learn**

This deep dive covers:
- ✅ Complete CLI command reference with examples
- ✅ AI model optimization and inter-rater reliability
- ✅ Batch processing and performance tuning
- ✅ Quality assurance and validation techniques
- ✅ Configuration management and customization
- ✅ Production deployment strategies

## 📋 **Prerequisites**

- Completed the [Quick Start Tutorial](mcode_quick_start.ipynb)
- Familiarity with basic CLI usage
- Understanding of clinical data concepts

---

## 🏗️ **System Architecture Overview**

The CLI is the single entry point to the system, so its top-level help is the quickest map of the available components:

In [None]:
# Display the complete CLI help to see all available commands
print("🔧 Complete mCODE CLI Command Reference")
print("=" * 50)
!python mcode_translate.py --help

## 📊 **Section 1: Data Acquisition Commands**

The system supports multiple data sources and formats for comprehensive clinical data collection.

### 1.1 Clinical Trials Data (`fetch-trials`)

Fetch clinical trial data from ClinicalTrials.gov with advanced filtering options.

In [None]:
# Show detailed help for trial fetching
print("📋 Clinical Trials Fetch Command Options")
print("-" * 40)
!python mcode_translate.py fetch-trials --help

In [None]:
# Example: Advanced trial filtering
print("🔍 Advanced Trial Filtering Examples")
print("=" * 40)

# Fetch trials for a specific condition (no date filter is applied here)
print("Example 1: Lung cancer trials")
!python mcode_translate.py fetch-trials --condition "lung cancer" --limit 2 --out lung_cancer_trials.ndjson

# Fetch trials by NCT ID
print("\nExample 2: Specific trial by NCT ID")
!python mcode_translate.py fetch-trials --nct-id NCT12345678 --out specific_trial.ndjson

# Fetch trials by sponsor
print("\nExample 3: Trials sponsored by Pfizer")
!python mcode_translate.py fetch-trials --sponsor "Pfizer" --limit 2 --out pfizer_trials.ndjson
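
The fetch commands above write NDJSON, that is, one JSON object per line. The sketch below shows one way to load such a file in Python and see which top-level fields it contains. The helper names `read_ndjson` and `summarize_keys` are illustrative and not part of the mCODE CLI, and no particular record schema is assumed.

```python
import json
from pathlib import Path


def read_ndjson(path):
    """Return a list with one parsed JSON object per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Report the offending line so a truncated download is easy to spot
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({exc.msg})"
                ) from exc
    return records


def summarize_keys(records):
    """Return the sorted top-level keys seen across all dict records."""
    keys = set()
    for record in records:
        if isinstance(record, dict):
            keys.update(record)
    return sorted(keys)
```

Once a fetch has completed, `summarize_keys(read_ndjson("lung_cancer_trials.ndjson"))` gives a quick view of the fields available before any AI processing is run.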

### 1.2 Patient Data (`fetch-patients`)

Download clinically realistic synthetic patient data from MITRE's oncology archives.

In [None]:
# Show patient data options
print("👥 Patient Data Archives Available")
print("-" * 35)
!python mcode_translate.py download-data --list

In [None]:
# Example: Different patient data archives
print("📊 Patient Data Archive Examples")
print("=" * 35)

# Fetch lung cancer patients with lifetime follow-up
print("Example 1: Lung cancer patients (lifetime follow-up)")
!python mcode_translate.py fetch-patients --archive lung_cancer_lifetime --limit 2 --out lung_patients.ndjson

# Fetch mixed cancer types (10-year horizon)
print("\nExample 2: Mixed cancer types (10-year follow-up)")
!python mcode_translate.py fetch-patients --archive mixed_cancer_10_years --limit 2 --out mixed_patients.ndjson

## 🤖 **Section 2: AI Processing Commands**

Transform unstructured clinical text into standardized mCODE elements using advanced AI models.

### 2.1 Trial Processing (`process-trials`)

Extract mCODE elements from clinical trial protocols using configurable AI models and prompts.

In [None]:
# Show processing options
print("⚙️ Trial Processing Configuration Options")
print("-" * 40)
!python mcode_translate.py process-trials --help

In [None]:
# Example: Different AI models and prompts
print("🔄 AI Model and Prompt Comparison")
print("=" * 35)

# First, fetch some trial data for testing
!python mcode_translate.py fetch-trials --condition "breast cancer" --limit 1 --out test_trial.ndjson

# Process with different configurations
print("\nExample 1: DeepSeek Coder with evidence-based prompt")
!python mcode_translate.py process-trials test_trial.ndjson --out trial_deepseek.ndjson --model deepseek-coder --prompt direct_mcode_evidence_based_concise

print("\nExample 2: DeepSeek Chat with comprehensive prompt")
!python mcode_translate.py process-trials test_trial.ndjson --out trial_chat.ndjson --model deepseek-chat --prompt direct_mcode_evidence_based
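
After running two configurations, a natural next step is to check which element types each one extracted. The sketch below compares two runs by set difference. It assumes, purely for illustration, that each output record carries a list under `mcode_elements` whose items have an `element_type` field; the real output schema may differ, so adjust `list_key` and `type_key` to match the actual files.

```python
def element_types(records, list_key="mcode_elements", type_key="element_type"):
    """Collect the distinct element types found across all records.

    The key names are assumptions about the output schema, not confirmed fields.
    """
    found = set()
    for record in records:
        for element in record.get(list_key, []):
            value = element.get(type_key)
            if value is not None:
                found.add(value)
    return found


def compare_runs(run_a, run_b):
    """Return element types shared by both runs and those unique to each."""
    types_a, types_b = element_types(run_a), element_types(run_b)
    return {
        "shared": sorted(types_a & types_b),
        "only_a": sorted(types_a - types_b),
        "only_b": sorted(types_b - types_a),
    }


# Toy records standing in for the two output files (made-up data)
coder_run = [{"mcode_elements": [{"element_type": "CancerCondition"},
                                 {"element_type": "TNMStage"}]}]
chat_run = [{"mcode_elements": [{"element_type": "CancerCondition"},
                                {"element_type": "TumorMarker"}]}]

comparison = compare_runs(coder_run, chat_run)
for label, names in comparison.items():
    print(f"{label}: {names}")
```

Differences in `only_a` and `only_b` are a prompt to inspect the source text by hand; they do not say which configuration is correct.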

### 2.2 Patient Processing (`process-patients`)

Extract mCODE elements from patient records and link them with relevant clinical trials.

In [None]:
# Show patient processing options
print("🏥 Patient Processing with Trial Matching")
print("-" * 40)
!python mcode_translate.py process-patients --help

In [None]:
# Example: Patient processing with trial eligibility matching
print("🔗 Patient Processing with Trial Matching")
print("=" * 40)

# Fetch patient data
!python mcode_translate.py fetch-patients --archive breast_cancer_10_years --limit 1 --out test_patient.ndjson

# Process patients with trial matching
print("\nProcessing patient with trial eligibility matching...")
!python mcode_translate.py process-patients --in test_patient.ndjson --out patient_matched.ndjson --trials trial_deepseek.ndjson --model deepseek-coder --prompt direct_mcode_evidence_based_concise

## 📝 **Section 3: Summarization Commands**

Generate human-readable summaries from processed mCODE data with clinical context.

In [None]:
# Example: Generate comprehensive summaries
print("📖 Generating Clinical Summaries")
print("=" * 30)

# Summarize the processed trials
print("Trial Summary (Standard Detail):")
!python mcode_translate.py summarize-trials --in trial_deepseek.ndjson --out trial_summary_standard.md

# Summarize patients
print("\nPatient Summary:")
!python mcode_translate.py summarize-patients --in patient_matched.ndjson --out patient_summary.md

## 🎯 **Section 4: AI Model Optimization**

The real power of the system: systematic comparison of AI models and prompts with statistical validation.

### 4.1 Available Models and Prompts

First, let's see what AI models and prompt templates are available for optimization.

In [None]:
# List available AI models
print("🤖 Available AI Models")
print("=" * 25)
!python mcode_translate.py optimize-trials --list-models

In [None]:
# List available prompt templates
print("\n📝 Available Prompt Templates")
print("=" * 30)
!python mcode_translate.py optimize-trials --list-prompts

### 4.2 Running Optimization

Compare multiple AI models and prompts using cross-validation to find the optimal configuration.

In [None]:
# Show optimization command options
print("🔬 Optimization Command Options")
print("=" * 30)
!python mcode_translate.py optimize-trials --help

In [None]:
# Example: Full optimization run
print("🧪 Running AI Model Optimization")
print("=" * 35)
print("This compares multiple models and prompts using cross-validation")
print("Expected runtime: 5-15 minutes depending on trial count")

# Fetch more trials for meaningful optimization
!python mcode_translate.py fetch-trials --condition "breast cancer" --limit 5 --out optimization_trials.ndjson

# Run optimization comparing two models and two prompts
!python mcode_translate.py optimize-trials --trials-file optimization_trials.ndjson --models deepseek-coder,deepseek-chat --prompts direct_mcode_evidence_based_concise,direct_mcode_evidence_based --cv-folds 3
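
To get a feel for how much work the optimization run above implies, the sketch below enumerates the model and prompt grid and builds a plain k-fold split over five trials. This is a generic illustration of grid search with k-fold cross-validation; how `optimize-trials` actually partitions the data is not documented here, and the helper names are hypothetical.

```python
from itertools import product


def configuration_grid(models, prompts):
    """Return every (model, prompt) pair to be evaluated."""
    return list(product(models, prompts))


def k_fold_indices(n_items, n_folds):
    """Split range(n_items) into contiguous folds.

    Returns a list of (train_indices, validation_indices) pairs whose
    validation sizes differ by at most one.
    """
    if not 1 < n_folds <= n_items:
        raise ValueError("n_folds must be between 2 and n_items")
    base, extra = divmod(n_items, n_folds)
    folds, start = [], 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        validation = list(range(start, start + size))
        held_out = set(validation)
        train = [i for i in range(n_items) if i not in held_out]
        folds.append((train, validation))
        start += size
    return folds


grid = configuration_grid(
    ["deepseek-coder", "deepseek-chat"],
    ["direct_mcode_evidence_based_concise", "direct_mcode_evidence_based"],
)
folds = k_fold_indices(n_items=5, n_folds=3)

print(f"{len(grid)} configurations x {len(folds)} folds = {len(grid) * len(folds)} evaluations")
for train, validation in folds:
    print(f"train={train} validation={validation}")
```

With two models, two prompts, and three folds, each trial is evaluated under four configurations, which is why runtime grows quickly as models or prompts are added.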

### 4.3 Understanding Optimization Results

The optimization generates comprehensive reports and analysis.

In [None]:
# Examine optimization results
print("📊 Optimization Results Analysis")
print("=" * 35)

# List generated reports
print("Generated optimization reports:")
!ls -la optimization_runs/ | tail -5

# Show the mega optimization report
print("\n🏆 Mega Optimization Report (Top Recommendations):")
print("-" * 50)
!ls optimization_runs/mega_optimization_report_*.md | tail -1 | xargs head -30

## 🔍 **Section 5: Inter-Rater Reliability Analysis**

A key innovation: measuring how consistently different AI models extract the same mCODE elements.

In [None]:
# Inter-rater reliability is automatically calculated during optimization
print("📈 Inter-Rater Reliability Metrics")
print("=" * 35)
print("\nInter-rater reliability measures agreement between AI models on:")
print("• Presence of mCODE elements")
print("• Values of extracted elements")
print("• Overall consistency of extraction")
print("\nKey metrics:")
print("• Cohen's Kappa: Agreement between two raters, corrected for chance")
print("• Percentage Agreement: Raw agreement rate")
print("• Fleiss' Kappa: Agreement among three or more raters, corrected for chance")

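# Show the most recent inter-rater reliability report, if one exists.
# (The earlier shell pipeline never reached its fallback message, because
# the exit status of a pipeline is that of its last command.)
import glob

print("\n🔍 Inter-Rater Reliability Report:")
reports = sorted(glob.glob("optimization_runs/inter_rater_reliability_report_*.md"))
if reports:
    with open(reports[-1], encoding="utf-8") as report_file:
        print(report_file.read())
else:
    print("Run optimization first to generate inter-rater reliability analysis")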
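
To make the metrics above concrete, here is a minimal sketch of percentage agreement and Cohen's kappa for two raters, written in plain Python. It uses the standard definition, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected by chance. The label lists are made-up presence flags for ten elements, and this is not necessarily how the optimizer computes its own report.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which the two raters gave the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters corrected for chance."""
    observed = percent_agreement(labels_a, labels_b)
    n = len(labels_a)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both raters pick the same label independently
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# 1 = element extracted, 0 = element not extracted (illustrative data only)
model_a = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
model_b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

print(f"Percentage agreement: {percent_agreement(model_a, model_b):.2f}")
print(f"Cohen's kappa:        {cohens_kappa(model_a, model_b):.2f}")
```

In this toy example the models agree on 7 of 10 items (0.70), but chance agreement alone is 0.50, so kappa comes out at 0.40. Raw agreement therefore overstates how consistent the two models are.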

## ⚡ **Section 6: Performance & Scaling**

Techniques for processing large datasets efficiently.

### 6.1 Concurrent Processing

Use multiple workers for faster processing of large datasets.

In [None]:
# Example: Batch processing with concurrency
print("⚡ High-Performance Batch Processing")
print("=" * 35)

# Process large trial dataset with multiple workers
print("Example: Processing breast cancer trial collection")
!python mcode_translate.py process-trials data/select_breast_cancer_trials.ndjson --out large_scale_trials.ndjson --model deepseek-coder --prompt direct_mcode_evidence_based_concise --concurrency 3

# Generate summaries with parallel workers
print("\nGenerating summaries with 4 workers:")
!python mcode_translate.py summarize-trials --in large_scale_trials.ndjson --out large_scale_summary.md --workers 4
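
The idea behind `--concurrency` and `--workers` can be sketched with Python's standard `concurrent.futures` module. The example below is an assumption about the general pattern, not the CLI's actual implementation, and `process_record` is a hypothetical stand-in for a per-trial model call. Threads suit this kind of work because the time is mostly spent waiting on network responses.

```python
from concurrent.futures import ThreadPoolExecutor


def process_record(record):
    """Hypothetical stand-in for one per-trial processing call."""
    return {"id": record["id"], "n_chars": len(record["text"])}


def process_all(records, max_workers=3):
    """Process records concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves the order of its input iterable
        return list(pool.map(process_record, records))


# Made-up records for illustration
trials = [{"id": f"trial-{i}", "text": "x" * i} for i in range(1, 6)]
results = process_all(trials, max_workers=3)

for result in results:
    print(result)
```

Raising the worker count only helps until the model provider's rate limits are reached, so it is worth increasing it gradually.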

### 6.2 Memory Management

CORE Memory integration for persistent context and research continuity.

In [None]:
# Example: Processing with CORE Memory storage
print("🧠 CORE Memory Integration")
print("=" * 25)
print("\nWith the --ingest flag, processing results are stored in CORE Memory")
print("This enables:")
print("• Persistent research context across sessions")
print("• Incremental learning and improvement")
print("• Research continuity and collaboration")

# The --ingest flag enables CORE Memory storage
print("\nExample: Processing with memory storage enabled")
!python mcode_translate.py process-trials optimization_trials.ndjson --out memory_enabled_trials.ndjson --ingest --model deepseek-coder --prompt direct_mcode_evidence_based_concise

## 🧪 **Section 7: Quality Assurance & Testing**

Comprehensive testing and validation to ensure clinical accuracy and system reliability.

In [None]:
# Run the complete test suite
print("🧪 Running Complete Test Suite")
print("=" * 30)
print("This validates all system components and ensures reliability")

!python mcode_translate.py run-tests all

In [None]:
# Show test command options
print("\n📋 Test Command Options")
print("=" * 25)
!python mcode_translate.py run-tests --help

## 🔧 **Section 8: Configuration & Customization**

Advanced configuration options for production deployment.

In [None]:
# Show configuration file structure
print("⚙️ Configuration Files")
print("=" * 25)
print("\nThe system uses JSON configuration files in src/config/:")

!ls -la src/config/

print("\n📁 Key Configuration Areas:")
print("• llms_config.json - AI model API keys and settings")
print("• prompts_config.json - Available prompt templates")
print("• validation_config.json - Data quality rules")
print("• logging_config.json - Logging configuration")
print("• core_memory_config.json - Memory storage settings")

## 📚 **Section 9: Educational Resources**

Understanding mCODE elements and clinical data standards.

In [None]:
# mCODE Element Reference
print("📖 mCODE Element Types Reference")
print("=" * 35)

mcode_elements = {
    "CancerCondition": "Primary and secondary cancers, metastases",
    "CancerTreatment": "Chemotherapy, radiation, immunotherapy, surgery",
    "TNMStage": "Tumor staging (T1N0M0, Stage I, etc.)",
    "PatientDemographics": "Age, sex, ethnicity, vital status",
    "TumorMarker": "Biomarkers (HER2+, ER/PR status, etc.)",
    "CancerRelatedMedication": "Specific drug treatments",
    "Procedure": "Biopsies, surgeries, diagnostic procedures",
    "LaboratoryResult": "Blood tests, pathology results",
    "AdverseEvent": "Treatment side effects and complications",
    "DiseaseStatus": "Progression, remission, stable disease"
}

for element, description in mcode_elements.items():
    # Plain text: print() does not render Markdown bold markers
    print(f"• {element}: {description}")

print("\n💡 Each element includes standardized codes (SNOMED CT, ICD-10, etc.)")
print("   for interoperability across healthcare systems.")
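
To illustrate what "standardized codes" means in practice, the sketch below pairs an element type with a coding made of a code system, a code, and a display string, mirroring the shape of a FHIR `Coding`. The class names are hypothetical and the system's real output structure may differ. The SNOMED CT system URI and the code `254837009` (malignant neoplasm of breast) are real identifiers, used here only as an example.

```python
from dataclasses import asdict, dataclass

SNOMED_CT = "http://snomed.info/sct"  # FHIR system URI for SNOMED CT


@dataclass(frozen=True)
class Coding:
    """A single coded value: which terminology, which code, and a label."""
    system: str
    code: str
    display: str


@dataclass(frozen=True)
class McodeElement:
    """Hypothetical container pairing an element type with its coding."""
    element_type: str
    coding: Coding


example = McodeElement(
    element_type="CancerCondition",
    coding=Coding(SNOMED_CT, "254837009", "Malignant neoplasm of breast"),
)

# asdict() converts nested dataclasses, giving a JSON-serializable dict
print(asdict(example))
```

Because the code system travels with the code, a receiving system can interpret the value without knowing anything about the tool that produced it.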

## 🚀 **Section 10: Production Deployment**

Strategies for deploying the mCODE system in clinical and research environments.

In [None]:
# Production deployment checklist
print("🏭 Production Deployment Checklist")
print("=" * 35)

checklist = [
    "✅ Configure API keys for AI models (src/config/llms_config.json)",
    "✅ Set up CORE Memory storage for persistent context",
    "✅ Test with representative clinical datasets",
    "✅ Validate mCODE output against clinical standards",
    "✅ Configure concurrent processing for large datasets",
    "✅ Set up automated quality assurance checks",
    "✅ Establish monitoring and logging for production use",
    "✅ Document workflows and create training materials",
    "✅ Plan for regular model updates and prompt optimization",
    "✅ Establish data governance and privacy compliance"
]

for item in checklist:
    print(item)

print("\n🎯 Production Success Metrics:")
print("• >95% mCODE extraction accuracy")
print("• <5% error rate in processing")
print("• <10 minute processing time per 100 trials")
print("• >90% inter-rater reliability agreement")
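
The targets above can be turned into an automated check. The sketch below compares measured values against those thresholds. The metric key names and the measured numbers are invented for illustration, so substitute whatever your monitoring actually records.

```python
# Thresholds taken from the success metrics listed above;
# the dictionary keys are hypothetical names.
TARGETS = {
    "extraction_accuracy": (">", 0.95),
    "error_rate": ("<", 0.05),
    "minutes_per_100_trials": ("<", 10.0),
    "inter_rater_agreement": (">", 0.90),
}


def check_targets(measured, targets=TARGETS):
    """Return {metric: passed} for each target; missing metrics fail."""
    results = {}
    for name, (direction, threshold) in targets.items():
        value = measured.get(name)
        if value is None:
            results[name] = False
        elif direction == ">":
            results[name] = value > threshold
        else:
            results[name] = value < threshold
    return results


# Made-up measurements for demonstration only
measured = {
    "extraction_accuracy": 0.97,
    "error_rate": 0.06,
    "minutes_per_100_trials": 8.5,
}

for metric, passed in check_targets(measured).items():
    print(f"{'PASS' if passed else 'FAIL'}  {metric}")
```

Treating a missing metric as a failure keeps an unmeasured target from silently counting as met.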

## 🎉 **CLI Deep Dive Complete!**

Congratulations! You've explored the complete mCODE Translation System CLI. Here's what you now know:

### ✅ **Mastered Commands:**
- **Data Acquisition**: `fetch-trials`, `fetch-patients`, `download-data`
- **AI Processing**: `process-trials`, `process-patients`
- **Analysis**: `summarize-trials`, `summarize-patients`, `optimize-trials`
- **Quality Assurance**: `run-tests`

### ✅ **Advanced Features:**
- **Inter-rater reliability** analysis for AI model validation
- **Cross-validation** optimization for robust evaluation
- **Concurrent processing** for large-scale data handling
- **CORE Memory integration** for persistent research context
- **Comprehensive configuration** management

### ✅ **Production Capabilities:**
- **Scalable processing** with multiple workers
- **Quality assurance** through automated testing
- **Clinical standards** compliance (mCODE, SNOMED CT)
- **Research workflows** with optimization and analysis

### 🚀 **Next Steps:**

**For Clinical Implementation:**
- Deploy in healthcare environments with proper security
- Integrate with EHR systems and clinical workflows
- Establish clinical validation protocols
- Train clinical staff on mCODE concepts

**For Research Applications:**
- Process large clinical trial databases
- Enable cross-institutional research collaboration
- Develop AI models for specific cancer types
- Contribute to the advancement of oncology informatics

### 💡 **Key Insights:**
- **AI Reliability**: LLMs can extract structured clinical data, and the optimization workflow measures how consistently they do so
- **Interoperability**: mCODE enables seamless data exchange across systems
- **Quality Matters**: Inter-rater reliability quantifies agreement between models, which supports clinical validation but does not replace it
- **Scale Enables Impact**: Large-scale processing powers real research insights

**The mCODE Translation System is ready to transform oncology research and clinical care! 🚀**