# 🚀 **mCODE Translation: Quick Start Tutorial**

Welcome to the mCODE (minimal Common Oncology Data Elements) Translation System! This interactive tutorial will get you up and running with clinical data processing in just 15 minutes.

## 🎯 **What You'll Learn**

By the end of this tutorial, you'll be able to:

- ✅ Fetch clinical trial data from public APIs
- ✅ Process trial and patient data into standardized mCODE format
- ✅ Generate human-readable summaries
- ✅ Understand the complete clinical data workflow

## 📋 **Prerequisites**

- Python 3.11+ installed
- Basic familiarity with command line
- Network access to the ClinicalTrials.gov API (no key required)
- `jq` on your PATH (used by the verification cells below)

---

## 🏁 **Step 1: Setup & Verification**

In [None]:
# Check that we're in the right directory and the system is ready
!pwd && python --version

In [None]:
# Verify the main CLI script exists
!ls -la mcode_translate.py

In [None]:
# Show available commands
!python mcode_translate.py --help | head -15
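
The same checks can also be run from Python, which helps when shell tools differ between machines. This is a minimal sketch that assumes the notebook is run from the repository root; `check_environment` is a hypothetical helper written for this tutorial, not part of the CLI.

```python
import shutil
import sys
from pathlib import Path


def check_environment(script="mcode_translate.py", min_version=(3, 11)):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if sys.version_info < min_version:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(
            f"Python {min_version[0]}.{min_version[1]}+ required, found {found}"
        )
    if not Path(script).is_file():
        problems.append(f"{script} not found in {Path.cwd()}")
    # jq is only needed by the verification cells later in this tutorial
    if shutil.which("jq") is None:
        problems.append("jq not found on PATH")
    return problems


for problem in check_environment():
    print("⚠️", problem)
```

If nothing is printed, the three checks passed.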

## 📊 **Step 2: Fetch Clinical Trial Data**

Clinical trials contain detailed information about cancer treatments, eligibility criteria, and study designs. We'll start by fetching some breast cancer trials.

In [None]:
# Configuration for our tutorial
NUM_TRIALS=3  # Start small for quick processing
CONDITION="breast cancer"  # Focus on breast cancer trials

!echo "🎯 Fetching $NUM_TRIALS $CONDITION trials..."

In [None]:
# Fetch clinical trials
!python mcode_translate.py fetch-trials --condition "$CONDITION" --limit $NUM_TRIALS --out raw_trials.ndjson

In [None]:
# Verify the data was downloaded
!echo "📊 Downloaded $(wc -l < raw_trials.ndjson) trials" && echo "Sample trial:" && head -1 raw_trials.ndjson | jq -r '.protocolSection.identificationModule | "NCT ID: \(.nctId)", "Title: \(.briefTitle | .[0:60])..."'
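
If `jq` is not available, the same fields can be read with Python's standard library. The sketch below assumes each line of `raw_trials.ndjson` is one JSON object with the `protocolSection.identificationModule` fields that the jq filter reads. `summarize_trial` and `first_record` are hypothetical helper names, and the sample record is made up for illustration.

```python
import json


def summarize_trial(record, title_width=60):
    """Return the NCT ID and a truncated title from one trial record."""
    ident = record.get("protocolSection", {}).get("identificationModule", {})
    title = ident.get("briefTitle", "")
    return {"nct_id": ident.get("nctId"), "title": title[:title_width]}


def first_record(path):
    """Parse the first non-empty line of an NDJSON file, or return None."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                return json.loads(line)
    return None


# Hypothetical record, shaped like the fields the jq filter reads
sample = {"protocolSection": {"identificationModule": {
    "nctId": "NCT00000000", "briefTitle": "Example breast cancer study"}}}
print(summarize_trial(sample))
```

After Step 2 has run, `summarize_trial(first_record("raw_trials.ndjson"))` would inspect the real download.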

## 👥 **Step 3: Fetch Patient Data**

Patient records show how treatment patterns play out in practice. We'll use synthetic patient data, which is clinically realistic but contains no real patient information.

In [None]:
# Configuration for patient data
NUM_PATIENTS=3  # Small sample for quick processing
PATIENT_ARCHIVE="breast_cancer_10_years"  # 10-year follow-up data

!echo "👥 Fetching $NUM_PATIENTS patients from $PATIENT_ARCHIVE archive..."

In [None]:
# Fetch synthetic patient data
!python mcode_translate.py fetch-patients --archive $PATIENT_ARCHIVE --limit $NUM_PATIENTS --out raw_patients.ndjson

In [None]:
# Verify patient data was downloaded
!echo "👤 Downloaded $(wc -l < raw_patients.ndjson) patient records" && echo "Sample patient:" && head -1 raw_patients.ndjson | jq -r '.entry[0].resource | "ID: \(.id)", "Name: \(.name[0].given[0]) \(.name[0].family)"'
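
The jq filter above reads `.entry[0].resource`, which works only if the patient is the first entry in each record. Assuming each line of `raw_patients.ndjson` is a FHIR Bundle (which the `entry`/`resource`/`name` paths suggest, though this tutorial does not state it), a slightly more defensive sketch looks the patient up by `resourceType`. The helper names and the sample bundle are hypothetical.

```python
def find_patient(bundle):
    """Return the first Patient resource in a FHIR Bundle dict, or None."""
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") == "Patient":
            return resource
    return None


def display_name(patient):
    """Join the first given name and family name of the first name entry."""
    names = patient.get("name") or [{}]
    given = (names[0].get("given") or [""])[0]
    family = names[0].get("family", "")
    return f"{given} {family}".strip()


# Hypothetical bundle in which the Patient is not the first entry
bundle = {"entry": [
    {"resource": {"resourceType": "Encounter", "id": "enc-1"}},
    {"resource": {"resourceType": "Patient", "id": "pat-1",
                  "name": [{"given": ["Jane"], "family": "Doe"}]}},
]}
patient = find_patient(bundle)
print(patient["id"], display_name(patient))
```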

## 🤖 **Step 4: AI Processing - Trials to mCODE**

Now for the exciting part! We'll use an AI model (`deepseek-coder` in this tutorial) with a prompt template to extract structured mCODE elements from the unstructured clinical trial text.

In [None]:
# Configuration for AI processing
MODEL="deepseek-coder"  # Our recommended model for structured data
PROMPT="direct_mcode_evidence_based_concise"  # Optimized prompt template

!echo "🤖 Processing trials with $MODEL model..."
!echo "📝 Using prompt: $PROMPT"

In [None]:
# Process trial data into mCODE format
!python mcode_translate.py process-trials raw_trials.ndjson --out mcode_trials.ndjson --ingest --model $MODEL --prompt $PROMPT

In [None]:
# Examine the processed mCODE trial data
!echo "🔬 Processed $(wc -l < mcode_trials.ndjson) trials into mCODE format" && echo "Sample mCODE elements:" && head -1 mcode_trials.ndjson | jq -r '.mcode_elements.mcode_mappings[0:3][] | "\(.element_type): \(.code)"'

## 🏥 **Step 5: AI Processing - Patients to mCODE**

Now we'll process the patient data with the same model and prompt. The `--trials` option passes in the mCODE trial file from Step 4 so that patient records can be linked with the trial information.

In [None]:
# Process patient data into mCODE format
!python mcode_translate.py process-patients --in raw_patients.ndjson --out mcode_patients.ndjson --trials mcode_trials.ndjson --ingest --model $MODEL --prompt $PROMPT

In [None]:
# Examine the processed mCODE patient data
!echo "👤 Processed $(wc -l < mcode_patients.ndjson) patients into mCODE format" && echo "Sample mCODE elements:" && head -1 mcode_patients.ndjson | jq -r '.mcode_elements[0:3][] | "\(.element_type): \(.code)"'
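
The two jq filters in Steps 4 and 5 imply different layouts: trial records keep their mappings under `mcode_elements.mcode_mappings`, while patient records keep them directly in the `mcode_elements` list. As a rough sketch under that assumption, the helper below accepts either layout and tallies `element_type` values. The function names, element types, and codes in the sample are placeholders, not real output.

```python
import json
from collections import Counter


def iter_mappings(record):
    """Yield mapping dicts from either layout implied by the jq filters."""
    elements = record.get("mcode_elements", [])
    if isinstance(elements, dict):  # trial layout: {"mcode_mappings": [...]}
        elements = elements.get("mcode_mappings", [])
    yield from elements


def count_element_types(lines):
    """Count element_type values across an iterable of NDJSON lines."""
    counts = Counter()
    for line in lines:
        if line.strip():
            for mapping in iter_mappings(json.loads(line)):
                counts[mapping.get("element_type", "unknown")] += 1
    return counts


# Hypothetical records; the element types and codes are placeholders
lines = [
    json.dumps({"mcode_elements": {"mcode_mappings": [
        {"element_type": "CancerCondition", "code": "placeholder-1"}]}}),
    json.dumps({"mcode_elements": [
        {"element_type": "CancerCondition", "code": "placeholder-2"},
        {"element_type": "TumorMarker", "code": "placeholder-3"}]}),
]
print(count_element_types(lines))
```

An open file handle can be passed in place of `lines`, since iterating over a text file yields one line at a time.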

## 📝 **Step 6: Generate Summaries**

Finally, let's create human-readable summaries of our processed data. This transforms the structured mCODE elements into natural language descriptions.

In [None]:
# Generate trial summaries
!python mcode_translate.py summarize-trials --in mcode_trials.ndjson --out trials_summary.md --ingest --workers 2

In [None]:
# Generate patient summaries
!python mcode_translate.py summarize-patients --in mcode_patients.ndjson --out patients_summary.md --ingest --workers 2

In [None]:
# Display the generated summaries
!echo "📖 TRIALS SUMMARY:" && echo "================" && head -20 trials_summary.md && echo "" && echo "👥 PATIENTS SUMMARY:" && echo "==================" && head -20 patients_summary.md

## 🎉 **Tutorial Complete!**

Congratulations! You've successfully completed the mCODE translation workflow.

### ✅ **What You Did:**
1. **Fetched** clinical trial data from public APIs
2. **Downloaded** clinically realistic synthetic patient data
3. **Processed** trial data into standardized mCODE elements using AI
4. **Processed** patient data and linked it with relevant trials
5. **Generated** human-readable summaries of all processed data

### 📊 **Files Created:**
- `raw_trials.ndjson` - Original clinical trial data
- `raw_patients.ndjson` - Original patient data
- `mcode_trials.ndjson` - Trials converted to mCODE format
- `mcode_patients.ndjson` - Patients converted to mCODE format
- `trials_summary.md` - Human-readable trial summaries
- `patients_summary.md` - Human-readable patient summaries
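
To confirm that every step produced output, the file list above can be checked in one pass. This is a small sketch; `missing_or_empty` is a hypothetical helper, and it only tests for presence and non-zero size, not for valid content.

```python
from pathlib import Path

EXPECTED_OUTPUTS = [
    "raw_trials.ndjson", "raw_patients.ndjson",
    "mcode_trials.ndjson", "mcode_patients.ndjson",
    "trials_summary.md", "patients_summary.md",
]


def missing_or_empty(directory=".", names=EXPECTED_OUTPUTS):
    """Return the expected files that are absent or zero bytes long."""
    base = Path(directory)
    return [name for name in names
            if not (base / name).is_file() or (base / name).stat().st_size == 0]


print(missing_or_empty() or "All six output files are present and non-empty.")
```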

### 🚀 **Next Steps:**

**For Advanced Features:** Check out `mcode_cli_deep_dive.ipynb` for:
- Model optimization and inter-rater reliability analysis
- Batch processing for large datasets
- Advanced configuration and customization
- Quality assurance and validation techniques

**For Production Use:**
- Scale up the number of trials/patients
- Use different AI models and prompts
- Integrate with your clinical workflows
- Set up automated processing pipelines
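
As a starting point for an automated pipeline, the six CLI calls from this tutorial can be assembled and run in order from a script. The sketch below reuses only the subcommands and flags shown in the steps above; `build_commands` and `run_pipeline` are hypothetical wrappers, and the execution line is left commented out so nothing is run by accident.

```python
import subprocess
import sys


def build_commands(condition="breast cancer", num_trials=3,
                   archive="breast_cancer_10_years", num_patients=3,
                   model="deepseek-coder",
                   prompt="direct_mcode_evidence_based_concise"):
    """Return the tutorial's CLI steps, in order, as argument lists."""
    cli = [sys.executable, "mcode_translate.py"]
    llm = ["--ingest", "--model", model, "--prompt", prompt]
    return [
        cli + ["fetch-trials", "--condition", condition,
               "--limit", str(num_trials), "--out", "raw_trials.ndjson"],
        cli + ["fetch-patients", "--archive", archive,
               "--limit", str(num_patients), "--out", "raw_patients.ndjson"],
        cli + ["process-trials", "raw_trials.ndjson",
               "--out", "mcode_trials.ndjson"] + llm,
        cli + ["process-patients", "--in", "raw_patients.ndjson",
               "--out", "mcode_patients.ndjson",
               "--trials", "mcode_trials.ndjson"] + llm,
        cli + ["summarize-trials", "--in", "mcode_trials.ndjson",
               "--out", "trials_summary.md", "--ingest", "--workers", "2"],
        cli + ["summarize-patients", "--in", "mcode_patients.ndjson",
               "--out", "patients_summary.md", "--ingest", "--workers", "2"],
    ]


def run_pipeline(commands):
    """Run each step in order; check=True stops at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


commands = build_commands(num_trials=10)
print(len(commands), "steps; first:", " ".join(commands[0][1:]))
# run_pipeline(commands)  # uncomment to execute against the real CLI
```

Passing argument lists rather than a shell string avoids quoting problems with values such as `"breast cancer"`.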

### 💡 **Key Takeaways:**
- mCODE provides standardized clinical data elements
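- AI models can extract structured information from unstructured text; validate the output before relying on it (the deep-dive notebook covers quality assurance techniques)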
- The system maintains clinical accuracy while enabling interoperability
- Results are stored in CORE Memory for persistent research context

**Happy mCODE translating! 🚀**