# Medical Reasoning Dataset Generator (MedRGen) - Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/your-username/MedRGen/blob/main/MedRGen_Google_Colab.ipynb)

## 🎯 Project Overview

MedRGen generates synthetic medical reasoning datasets by simulating doctor-patient interactions. This notebook walks through setting up and running MedRGen on Google Colab, from installation to dataset export.

### Key Features:
- 🏥 **Multi-Specialty Support**: Internal medicine, cardiology, emergency medicine, family medicine
- 👥 **Realistic Demographics**: Diverse patient profiles with appropriate medical histories  
- 💬 **Natural Conversations**: Multi-turn doctor-patient dialogues following clinical patterns
- 🧠 **Evidence-Based Reasoning**: Medical expert reasoning using current clinical guidelines
- 📊 **Quality Assurance**: Dual dataset generation with LLM evaluation

---

## ⚠️ Important Requirements

Before running this notebook, you'll need:
1. **OpenAI API Key** with access to the `gpt-4-turbo` and `o1-preview` models (the model names this notebook configures)
2. **Google Colab Pro** (recommended) for better performance and longer runtimes
3. **External Storage** (Google Drive) for storing generated datasets

---


## 🚀 Step 1: Environment Setup and Installation

Let's start by setting up the environment and installing all required dependencies.


In [None]:
# Check Python version and system info
import sys
import platform
print(f"Python version: {sys.version}")
print(f"Platform: {platform.platform()}")
print(f"Architecture: {platform.architecture()}")

# Check if we're running on Colab
try:
    import google.colab
    print("✅ Running on Google Colab")
    IN_COLAB = True
except ImportError:
    print("❌ Not running on Google Colab")
    IN_COLAB = False


In [None]:
# Mount Google Drive for external storage (recommended for large datasets)
import os  # imported up front so both branches below can use it

if IN_COLAB:
    from google.colab import drive
    drive.mount('/content/drive')

    # Keep the project in Google Drive so it survives runtime resets
    project_path = "/content/drive/MyDrive/MedRGen_Projects"
else:
    # Fallback location when not running on Colab
    project_path = "/content/MedRGen_Projects"

os.makedirs(project_path, exist_ok=True)
print(f"✅ Project directory created: {project_path}")

# Change to project directory
os.chdir(project_path)
print(f"📁 Current working directory: {os.getcwd()}")


In [None]:
# Clone the MedRGen repository (skipped if it already exists, so this cell can be re-run)
if not os.path.isdir("MedRGen"):
    !git clone https://github.com/your-username/MedRGen.git
os.chdir("MedRGen")
print(f"📁 Changed to MedRGen directory: {os.getcwd()}")

# List directory contents to verify clone
print("\n📂 Repository contents:")
!ls -la


In [None]:
# Install required dependencies
print("📦 Installing dependencies...")
# Quote each requirement so the shell does not treat ">" as output redirection
%pip install "openai>=1.0.0" "pydantic>=2.0.0" "PyYAML>=6.0.0" "pandas>=2.0.0" "numpy>=1.24.0" "tqdm>=4.65.0" "python-dotenv>=1.0.0" "jsonschema>=4.17.0" "pytest>=7.0.0" "pytest-cov>=4.0.0" "typing-extensions>=4.5.0" "requests>=2.28.0" "click>=8.0.0" "rich>=13.0.0"

print("\n✅ Dependencies installed successfully!")


## 🔑 Step 2: API Key Configuration

**Important**: You'll need an OpenAI API key with access to the `gpt-4-turbo` and `o1-preview` models. Never share your API keys publicly!
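
Before storing the key, a quick shape check can catch copy-paste mistakes such as stray whitespace or an empty string. The sketch below is illustrative only: `looks_like_openai_key` is a hypothetical helper, and the `sk-` prefix and minimum length are assumptions based on the commonly seen key format. It makes no API call, so it cannot confirm that the key is valid or has model access.

```python
def looks_like_openai_key(key, min_length=20):
    """Return True if `key` has the rough shape of an OpenAI API key.

    Assumption: keys start with "sk-" and contain no whitespace.
    This is a local sanity check only; it makes no API call.
    """
    if not isinstance(key, str):
        return False
    # Reject leading/trailing or embedded whitespace (typical paste errors).
    if key != key.strip() or any(ch.isspace() for ch in key):
        return False
    return key.startswith("sk-") and len(key) >= min_length


print(looks_like_openai_key("sk-" + "x" * 40))   # True
print(looks_like_openai_key(" sk-" + "x" * 40))  # False (leading space)
print(looks_like_openai_key(""))                 # False
```

A check like this could run right after the `getpass` prompt, re-prompting when it returns `False`.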


In [None]:
# Configure OpenAI API Key
import os
from getpass import getpass

# Option 1: Secure input (recommended for Colab)
if IN_COLAB:
    openai_api_key = getpass("🔑 Enter your OpenAI API Key: ")
else:
    # Option 2: Environment variable (for local development)
    openai_api_key = os.getenv("OPENAI_API_KEY")
    if not openai_api_key:
        openai_api_key = getpass("🔑 Enter your OpenAI API Key: ")

# Set environment variables
os.environ["OPENAI_API_KEY"] = openai_api_key
os.environ["OPENAI_MEDICAL_MODEL"] = "gpt-4-turbo"
os.environ["OPENAI_REASONING_MODEL"] = "o1-preview"

print("✅ API key configured successfully!")
print("🔧 Models configured: GPT-4-turbo (medical), O1-preview (reasoning)")


In [None]:
# Create .env file for the project
env_content = f"""# OpenAI Configuration
OPENAI_API_KEY={openai_api_key}
OPENAI_MEDICAL_MODEL=gpt-4-turbo
OPENAI_REASONING_MODEL=o1-preview

# Generation Settings
MAX_CASES_PER_BATCH=10
CONCURRENT_REQUESTS=3
RATE_LIMIT_DELAY=1

# Dataset Configuration
OUTPUT_FORMAT=json
INCLUDE_METADATA=true
VALIDATE_MEDICAL_ACCURACY=true

# Logging Configuration
LOG_LEVEL=INFO
LOG_FILE=logs/medical_dataset_generation.log

# Commercial Configuration
GENERATE_DUAL_DATASETS=true
RAW_DATASET_PATH=data/output/raw_dataset/
EDITED_DATASET_PATH=data/output/edited_dataset/

# Quality Assurance Settings
MEDICAL_VALIDATION_ENABLED=true
REASONING_EVALUATION_ENABLED=true
CLINICAL_GUIDELINES_CHECK=true

# Performance Settings
BATCH_SIZE=10
MAX_CONCURRENT_GENERATIONS=3
MEMORY_OPTIMIZATION=true
"""

with open(".env", "w") as f:
    f.write(env_content)

print("✅ Environment configuration file created!")


## 📁 Step 3: Project Structure Setup

Let's create the necessary directories and verify the project structure.


In [None]:
# Create necessary directories
import os
from pathlib import Path

# Create output directories
directories = [
    "data/output/raw_dataset",
    "data/output/edited_dataset", 
    "logs",
    "data/ready4sale",
    "data/templates"
]

for directory in directories:
    Path(directory).mkdir(parents=True, exist_ok=True)
    print(f"📁 Created directory: {directory}")

print("\n📂 Project structure:")
# List directories, skipping hidden ones (a bare -name ".*" -prune would also match "." and print nothing)
!find . -type d -not -path '*/.*' | head -20


In [None]:
# Setup Python path for imports
import sys
sys.path.insert(0, 'src')

# Test imports to verify setup
try:
    from main import MedicalDatasetGenerator
    print("✅ MedicalDatasetGenerator imported successfully")
    
    from core.medical_expert_generator import MedicalExpertGenerator
    from core.patient_generator import PatientGenerator
    from core.doctor_patient_conversation_generator import DoctorPatientConversationGenerator
    print("✅ Core generators imported successfully")
    
    from models.doctor import Doctor
    from models.patient import Patient
    from models.medical_case import MedicalCase
    print("✅ Data models imported successfully")
    
except ImportError as e:
    print(f"❌ Import error: {e}")
    print("Please check that all files are present and dependencies are installed")


## 🧪 Step 4: Generate Your First Medical Case

Now let's generate a single medical case to test the system.


In [None]:
# Initialize the Medical Dataset Generator
from main import MedicalDatasetGenerator
import asyncio
import json
from datetime import datetime

# Create generator instance
generator = MedicalDatasetGenerator(
    openai_api_key=openai_api_key,
    medical_model="gpt-4-turbo",
    reasoning_model="o1-preview",
    output_path="data/output",
    log_level="INFO"
)

print("✅ MedicalDatasetGenerator initialized successfully!")
print(f"📊 Output path: {generator.output_path}")
print(f"📝 Raw dataset path: {generator.raw_dataset_path}")
print(f"✨ Edited dataset path: {generator.edited_dataset_path}")


In [None]:
# Generate a single medical case
async def generate_test_case():
    """Generate a test medical case to verify the system works."""
    print("🏥 Generating a cardiology case...")
    
    # Generate case
    medical_case = await generator.generate_single_case(
        specialty="cardiology",
        complexity="moderate",
        symptom_theme="cardiovascular",
        case_id=f"test_case_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
    )
    
    return medical_case

# Run the async function
print("🚀 Starting medical case generation...")
print("⏳ This may take 2-5 minutes depending on API response times...")

# For Colab compatibility
try:
    import nest_asyncio
    nest_asyncio.apply()
    test_case = asyncio.run(generate_test_case())
    print("✅ Medical case generated successfully!")
except Exception as e:
    print(f"❌ Error generating case: {e}")
    print("💡 This might be due to API limits or network issues. Please try again.")


In [None]:
# Display the generated medical case
if 'test_case' in locals():
    print("📋 Generated Medical Case Summary:")
    print("=" * 50)
    print(f"🆔 Case ID: {test_case.case_id}")
    print(f"🏥 Specialty: {test_case.specialty}")
    print(f"📊 Complexity: {test_case.complexity}")
    print(f"👤 Patient Age: {test_case.patient.profile.age}")
    print(f"⚧ Patient Gender: {test_case.patient.profile.gender}")
    print(f"🩺 Chief Complaint: {test_case.patient.chief_complaint}")
    print(f"💬 Conversation Turns: {len(test_case.conversation.turns) if test_case.conversation else 'N/A'}")
    print(f"🧠 Diagnostic Reasoning Length: {len(test_case.diagnostic_reasoning) if test_case.diagnostic_reasoning else 'N/A'} characters")
    
    # Show the first three conversation turns (the slice already limits the loop)
    if test_case.conversation and test_case.conversation.turns:
        print("\n💭 First 3 conversation turns:")
        for turn in test_case.conversation.turns[:3]:
            speaker = "👨‍⚕️ Doctor" if turn.speaker == "doctor" else "👤 Patient"
            print(f"{speaker}: {turn.message[:100]}...")
    
    print("\n✅ Case generation completed successfully!")
else:
    print("❌ No test case was generated. Please run the previous cell again.")


## 🚀 Step 5: Batch Generation

Now let's generate multiple medical cases efficiently using batch processing.
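
Because each case is written to disk as soon as it is generated, a batch can in principle skip cases that already exist. The sketch below shows one way to work out which case IDs are still missing. `pending_case_ids` is a hypothetical helper, and it assumes each case is saved as `<case_id>.json` directly inside the output directory, which may not match how `generator.save_case` actually names files.

```python
import tempfile
from pathlib import Path


def pending_case_ids(output_dir, prefix, count):
    """Return the case IDs of a batch that have no saved JSON file yet.

    Assumption: each case is saved as "<case_id>.json" in `output_dir`.
    IDs follow the "<prefix>_<NNN>" pattern, numbered from 001.
    """
    wanted = [f"{prefix}_{i:03d}" for i in range(1, count + 1)]
    existing = {p.stem for p in Path(output_dir).glob("*.json")}
    return [case_id for case_id in wanted if case_id not in existing]


# Demo in a temporary directory where case 002 is already on disk.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "batch_case_20240101_002.json").write_text("{}", encoding="utf-8")
    print(pending_case_ids(tmp, "batch_case_20240101", 3))
    # ['batch_case_20240101_001', 'batch_case_20240101_003']
```

A batch loop could iterate over the returned IDs instead of `range(count)` so that only missing cases are generated.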


In [None]:
# Configuration for batch generation
BATCH_CONFIG = {
    "small_batch": {"count": 5, "description": "Quick test batch"},
    "medium_batch": {"count": 25, "description": "Development batch"}, 
    "large_batch": {"count": 100, "description": "Production batch (requires Colab Pro)"}
}

# Select batch size (start with small for testing)
selected_batch = "small_batch"  # Change to "medium_batch" or "large_batch" as needed

batch_count = BATCH_CONFIG[selected_batch]["count"]
batch_description = BATCH_CONFIG[selected_batch]["description"]

print(f"📊 Selected batch configuration: {selected_batch}")
print(f"📝 Description: {batch_description}")
print(f"🔢 Cases to generate: {batch_count}")
print(f"⏱️ Estimated time: {batch_count * 2}-{batch_count * 5} minutes")

# Configuration options
specialties = ["cardiology", "internal_medicine", "emergency_medicine", "family_medicine"]
complexities = ["simple", "moderate", "complex"]
symptom_themes = ["cardiovascular", "respiratory", "gastrointestinal", "neurological"]

print(f"\n🏥 Available specialties: {', '.join(specialties)}")
print(f"📊 Available complexities: {', '.join(complexities)}")
print(f"🎯 Available themes: {', '.join(symptom_themes)}")


In [None]:
# Run batch generation
import random
from tqdm.notebook import tqdm

async def generate_batch_cases(count, include_evaluation=False):
    """Generate a batch of medical cases with progress tracking."""
    generated_cases = []
    
    # Create progress bar
    progress_bar = tqdm(total=count, desc="Generating cases")
    
    for i in range(count):
        try:
            # Randomly select parameters for variety
            specialty = random.choice(specialties)
            complexity = random.choice(complexities)
            theme = random.choice(symptom_themes)
            
            case_id = f"batch_case_{datetime.now().strftime('%Y%m%d')}_{i+1:03d}"
            
            # Update progress
            progress_bar.set_description(f"Generating {specialty} case {i+1}/{count}")
            
            # Generate case
            case = await generator.generate_single_case(
                specialty=specialty,
                complexity=complexity,
                symptom_theme=theme,
                case_id=case_id
            )
            
            progress_bar.update(1)

            # Only keep and save non-empty results; saving immediately
            # protects completed cases if the run is interrupted
            if case:
                generated_cases.append(case)
                await generator.save_case(case, include_evaluation=include_evaluation)
                
        except Exception as e:
            print(f"❌ Error generating case {i+1}: {e}")
            progress_bar.update(1)
            continue
    
    progress_bar.close()
    return generated_cases

# Run batch generation
print(f"🚀 Starting batch generation of {batch_count} cases...")
print("💡 Tip: Each case is saved as soon as it is generated, so an interruption does not lose completed cases.")

try:
    batch_cases = asyncio.run(generate_batch_cases(batch_count, include_evaluation=False))
    print(f"✅ Batch generation completed! Generated {len(batch_cases)} cases.")
except Exception as e:
    print(f"❌ Batch generation error: {e}")
    print("💡 Cases generated before the error are already saved. Re-running this cell restarts the batch from case 1.")


## 📊 Step 6: Data Analysis and Quality Evaluation

Let's analyze the generated datasets and evaluate their quality.
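
The analysis reads fields that sit several levels deep in each case (for example patient, then profile, then age), and any level may be missing or `null` in a saved file. As a sketch, a small lookup helper can make those reads safe. `dig` is a hypothetical name introduced here, not part of MedRGen.

```python
def dig(data, *keys, default=None):
    """Follow `keys` through nested dicts, returning `default` on any miss.

    A level that is missing, not a dict, or None yields `default`.
    """
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return default if current is None else current


case = {"patient": {"profile": {"age": 54}}, "conversation": None}
print(dig(case, "patient", "profile", "age"))        # 54
print(dig(case, "patient", "profile", "gender"))     # None
print(dig(case, "conversation", "turns", default=[]))  # []
```

Unlike chained `.get(..., {})` calls, this also copes with a key that is present but set to `None`.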


In [None]:
# Load and analyze generated datasets
import pandas as pd
import json
import glob
from collections import Counter
import matplotlib.pyplot as plt
import seaborn as sns

def load_generated_cases(dataset_path="data/output/raw_dataset"):
    """Load all generated medical cases from JSON files."""
    cases = []
    json_files = glob.glob(f"{dataset_path}/*.json")
    
    print(f"📁 Found {len(json_files)} case files in {dataset_path}")
    
    for file_path in json_files:
        try:
            with open(file_path, 'r', encoding='utf-8') as f:
                case_data = json.load(f)
                cases.append(case_data)
        except Exception as e:
            print(f"❌ Error loading {file_path}: {e}")
    
    return cases

# Load cases
raw_cases = load_generated_cases("data/output/raw_dataset")
print(f"✅ Loaded {len(raw_cases)} medical cases for analysis")


In [None]:
# Analyze dataset composition
if raw_cases:
    # Extract key metrics
    # Use distinct names so the `specialties` and `complexities` option lists
    # used by batch generation are not overwritten
    case_specialties = [case.get('specialty', 'Unknown') for case in raw_cases]
    case_complexities = [case.get('complexity', 'Unknown') for case in raw_cases]
    ages = [case.get('patient', {}).get('profile', {}).get('age', 0) for case in raw_cases]
    genders = [case.get('patient', {}).get('profile', {}).get('gender', 'Unknown') for case in raw_cases]
    
    # Create summary statistics
    print("📊 Dataset Composition Analysis")
    print("=" * 50)
    
    # Specialty distribution
    specialty_counts = Counter(case_specialties)
    print(f"\n🏥 Specialties Distribution:")
    for specialty, count in specialty_counts.most_common():
        percentage = (count / len(raw_cases)) * 100
        print(f"  • {specialty}: {count} cases ({percentage:.1f}%)")
    
    # Complexity distribution
    complexity_counts = Counter(case_complexities)
    print(f"\n📊 Complexity Distribution:")
    for complexity, count in complexity_counts.most_common():
        percentage = (count / len(raw_cases)) * 100
        print(f"  • {complexity}: {count} cases ({percentage:.1f}%)")
    
    # Demographics
    gender_counts = Counter(genders)
    print(f"\n👥 Gender Distribution:")
    for gender, count in gender_counts.most_common():
        percentage = (count / len(raw_cases)) * 100
        print(f"  • {gender}: {count} cases ({percentage:.1f}%)")
    
    # Age statistics
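    # Keep only positive numeric ages (skips missing, null, or non-numeric values)
    valid_ages = [age for age in ages if isinstance(age, (int, float)) and age > 0]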
    if valid_ages:
        print(f"\n📈 Age Statistics:")
        print(f"  • Mean age: {sum(valid_ages)/len(valid_ages):.1f} years")
        print(f"  • Age range: {min(valid_ages)} - {max(valid_ages)} years")
        print(f"  • Total cases with age data: {len(valid_ages)}")
    
else:
    print("❌ No cases found for analysis. Please generate some cases first.")


In [None]:
# Create visualizations if we have data
if raw_cases:
    # Set up matplotlib for Colab
    plt.style.use('default')
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12))
    
    # Specialty distribution pie chart
    specialty_labels = list(specialty_counts.keys())
    specialty_values = list(specialty_counts.values())
    ax1.pie(specialty_values, labels=specialty_labels, autopct='%1.1f%%', startangle=90)
    ax1.set_title('Distribution by Medical Specialty')
    
    # Complexity distribution bar chart
    complexity_labels = list(complexity_counts.keys())
    complexity_values = list(complexity_counts.values())
    ax2.bar(complexity_labels, complexity_values, color=['lightblue', 'lightgreen', 'lightcoral'])
    ax2.set_title('Distribution by Case Complexity')
    ax2.set_ylabel('Number of Cases')
    
    # Gender distribution pie chart
    gender_labels = list(gender_counts.keys())
    gender_values = list(gender_counts.values())
    ax3.pie(gender_values, labels=gender_labels, autopct='%1.1f%%', startangle=90)
    ax3.set_title('Distribution by Patient Gender')
    
    # Age distribution histogram
    if valid_ages:
        ax4.hist(valid_ages, bins=10, color='skyblue', alpha=0.7, edgecolor='black')
        ax4.set_title('Distribution by Patient Age')
        ax4.set_xlabel('Age (years)')
        ax4.set_ylabel('Number of Cases')
    else:
        ax4.text(0.5, 0.5, 'No age data available', ha='center', va='center', transform=ax4.transAxes)
        ax4.set_title('Age Distribution (No Data)')
    
    plt.tight_layout()
    plt.show()
    
    print("📊 Dataset visualization completed!")
else:
    print("📊 Skipping visualizations - no data available.")


## 🔬 Step 7: Quality Evaluation and Dataset Enhancement

Let's evaluate the quality of generated cases and create enhanced versions.
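
A cheap structural check can run before any model-based evaluation, flagging cases that are missing core sections. The sketch below is a heuristic only, not a measure of medical accuracy. `completeness_report` and `REQUIRED_FIELDS` are hypothetical names, and the field list is an assumption drawn from the keys this notebook reads from saved cases.

```python
# Assumption: these top-level keys are the ones this notebook reads from a case.
REQUIRED_FIELDS = (
    "case_id",
    "specialty",
    "complexity",
    "patient",
    "conversation",
    "diagnostic_reasoning",
)


def completeness_report(case):
    """Report which required fields are missing or empty in a case dict."""
    missing = [field for field in REQUIRED_FIELDS if not case.get(field)]
    score = 1 - len(missing) / len(REQUIRED_FIELDS)
    return {"missing_fields": missing, "completeness": round(score, 2)}


partial_case = {"case_id": "demo_001", "specialty": "cardiology", "complexity": "moderate"}
print(completeness_report(partial_case))
# {'missing_fields': ['patient', 'conversation', 'diagnostic_reasoning'], 'completeness': 0.5}
```

Cases with a low score could be regenerated or excluded before spending API calls on deeper evaluation.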


In [None]:
# Quality evaluation and enhancement
async def evaluate_and_enhance_cases(cases_to_evaluate=None, max_cases=5):
    """Annotate a subset of cases with placeholder quality metrics (this demo makes no LLM call)."""
    
    if not cases_to_evaluate:
        cases_to_evaluate = raw_cases[:max_cases]  # Limit for demo
    
    enhanced_cases = []
    
    print(f"🔬 Evaluating and enhancing {len(cases_to_evaluate)} cases...")
    print("ℹ️ This demo assigns heuristic placeholder ratings; it does not call the o1-preview model")
    
    for i, case_data in enumerate(cases_to_evaluate):
        try:
            print(f"📋 Processing case {i+1}/{len(cases_to_evaluate)}: {case_data.get('case_id', 'Unknown')}")
            
            # Convert case data back to MedicalCase object for processing
            # This is a simplified version - in practice you'd need proper deserialization
            case_id = case_data.get('case_id', f'eval_case_{i+1}')
            
            # For demonstration, we'll create a quality evaluation report
            quality_metrics = {
                'medical_accuracy': 'High' if case_data.get('diagnostic_reasoning') else 'Medium',
                'conversation_realism': 'High' if case_data.get('conversation') else 'Medium', 
                'clinical_coherence': 'High',
                'educational_value': 'High' if case_data.get('complexity') == 'complex' else 'Medium'
            }
            
            # Add quality evaluation to case
            enhanced_case = case_data.copy()
            enhanced_case['quality_evaluation'] = quality_metrics
            enhanced_case['enhancement_timestamp'] = datetime.now().isoformat()
            enhanced_case['evaluation_model'] = 'heuristic-placeholder'  # no model is called in this demo
            
            enhanced_cases.append(enhanced_case)
            
            # Save enhanced case
            enhanced_filename = f"data/output/edited_dataset/{case_id}_enhanced.json"
            with open(enhanced_filename, 'w', encoding='utf-8') as f:
                json.dump(enhanced_case, f, indent=2, ensure_ascii=False)
            
            print(f"✅ Enhanced case saved: {enhanced_filename}")
            
        except Exception as e:
            print(f"❌ Error enhancing case {i+1}: {e}")
            continue
    
    return enhanced_cases

# Run quality evaluation on a subset of cases
if raw_cases:
    print("🚀 Starting quality evaluation and enhancement...")
    enhanced_cases = asyncio.run(evaluate_and_enhance_cases(max_cases=3))
    print(f"✅ Quality evaluation completed! Enhanced {len(enhanced_cases)} cases.")
else:
    print("❌ No raw cases available for enhancement. Please generate some cases first.")


## 💾 Step 8: Export and Download Datasets

Let's prepare the datasets for download and further use.
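
For downstream tooling that streams records one at a time, JSON Lines (one JSON object per line) is a common alternative to a single consolidated JSON file. The sketch below is an optional addition, not part of MedRGen. `write_jsonl` and `read_jsonl` are hypothetical helpers built only on the standard library.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(cases, path):
    """Write each case dict as one JSON line; return the number written."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    written = 0
    with path.open("w", encoding="utf-8") as f:
        for case in cases:
            f.write(json.dumps(case, ensure_ascii=False) + "\n")
            written += 1
    return written


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with Path(path).open("r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Round-trip demo in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "export" / "cases.jsonl"
    demo_cases = [{"case_id": "demo_001"}, {"case_id": "demo_002"}]
    print(write_jsonl(demo_cases, target))      # 2
    print(read_jsonl(target) == demo_cases)     # True
```

In this notebook, `write_jsonl(raw_cases, "data/output/cases.jsonl")` would produce such a file alongside the other exports; the output path is only an example.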


In [None]:
# Export datasets in multiple formats
import zipfile
import csv

def export_datasets():
    """Export generated datasets in multiple formats."""
    
    print("📦 Preparing datasets for export...")
    
    # Create consolidated JSON file
    if raw_cases:
        consolidated_raw = {
            'metadata': {
                'total_cases': len(raw_cases),
                'generation_date': datetime.now().isoformat(),
                'generator_version': '1.0.0',
                'specialties': list(specialty_counts.keys()),
                'complexities': list(complexity_counts.keys())
            },
            'cases': raw_cases
        }
        
        with open('data/output/consolidated_raw_dataset.json', 'w', encoding='utf-8') as f:
            json.dump(consolidated_raw, f, indent=2, ensure_ascii=False)
        print("✅ Created consolidated raw dataset JSON")
    
    # Create CSV export for analysis
    if raw_cases:
        csv_data = []
        for case in raw_cases:
            row = {
                'case_id': case.get('case_id', ''),
                'specialty': case.get('specialty', ''),
                'complexity': case.get('complexity', ''),
                'patient_age': case.get('patient', {}).get('profile', {}).get('age', ''),
                'patient_gender': case.get('patient', {}).get('profile', {}).get('gender', ''),
                'chief_complaint': case.get('patient', {}).get('chief_complaint', ''),
                'conversation_length': len(case.get('conversation', {}).get('turns', [])) if case.get('conversation') else 0,
                'has_diagnostic_reasoning': bool(case.get('diagnostic_reasoning', '')),
                'has_treatment_plan': bool(case.get('treatment_reasoning', ''))
            }
            csv_data.append(row)
        
        with open('data/output/dataset_summary.csv', 'w', newline='', encoding='utf-8') as f:
            if csv_data:
                writer = csv.DictWriter(f, fieldnames=csv_data[0].keys())
                writer.writeheader()
                writer.writerows(csv_data)
        print("✅ Created CSV summary file")
    
    # Create ZIP archive for download
    zip_filename = f'medRgen_dataset_{datetime.now().strftime("%Y%m%d_%H%M%S")}.zip'
    with zipfile.ZipFile(zip_filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
        # Add all JSON files
        for json_file in glob.glob('data/output/**/*.json', recursive=True):
            zipf.write(json_file)
        
        # Add CSV file
        if os.path.exists('data/output/dataset_summary.csv'):
            zipf.write('data/output/dataset_summary.csv')
        
        # Add logs
        for log_file in glob.glob('logs/*.log'):
            zipf.write(log_file)
    
    print(f"✅ Created ZIP archive: {zip_filename}")
    
    return zip_filename

# Export datasets
if raw_cases:
    zip_file = export_datasets()
    
    # Display download information
    print("\n📁 Dataset Export Summary:")
    print("=" * 50)
    print(f"📦 ZIP Archive: {zip_file}")
    print(f"📊 Total Cases: {len(raw_cases)}")
    print(f"📝 Raw Dataset: data/output/raw_dataset/")
    print(f"✨ Enhanced Dataset: data/output/edited_dataset/")
    print(f"📈 CSV Summary: data/output/dataset_summary.csv")
    
    # Download instructions for Colab
    if IN_COLAB:
        print(f"\n💾 To download your dataset:")
        print(f"   Right-click on {zip_file} in the file browser and select 'Download'")
        print(f"   Or run: files.download('{zip_file}')")
    
else:
    print("❌ No datasets to export. Please generate some cases first.")


In [None]:
# Optional: Direct download in Colab
if IN_COLAB and 'zip_file' in locals():
    from google.colab import files
    
    print(f"⬇️ Downloading {zip_file}...")
    try:
        files.download(zip_file)
        print("✅ Download initiated!")
    except Exception as e:
        print(f"❌ Download error: {e}")
        print("💡 You can manually download from the file browser on the left")
else:
    print("💾 No download started. If a ZIP archive was created, download it from the file browser.")


## 🎯 Summary and Next Steps

Congratulations! You've successfully set up and run the MedRGen system on Google Colab.

### What You've Accomplished:
- ✅ Set up the complete MedRGen environment on Google Colab
- ✅ Generated synthetic medical cases with realistic doctor-patient conversations
- ✅ Analyzed dataset composition and quality metrics
- ✅ Annotated cases with placeholder quality metrics (a stand-in for full LLM evaluation)
- ✅ Exported datasets in multiple formats for further use

### Production Recommendations:
1. **Scale Up**: Use larger batch sizes for production datasets (100-1000+ cases)
2. **Quality Control**: Always run the enhancement pipeline for commercial datasets
3. **Specialization**: Focus on specific medical specialties for targeted use cases
4. **Validation**: Review generated cases with medical professionals before deployment
5. **Cost Management**: Monitor OpenAI API usage, especially with the `o1-preview` model
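
For the cost management point, one lightweight approach is to accumulate token counts per request and multiply by whatever prices currently apply. The sketch below assumes the caller supplies the token counts (for example from the `usage` field of an API response) and the per-1,000-token prices. `UsageTracker` is a hypothetical class, and the prices in the demo are placeholders, not real OpenAI rates.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Accumulates token counts across requests for a rough cost estimate."""

    prompt_tokens: int = 0
    completion_tokens: int = 0
    requests: int = 0

    def add(self, prompt_tokens, completion_tokens):
        """Record the token usage of one request."""
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens
        self.requests += 1

    def estimated_cost(self, prompt_price_per_1k, completion_price_per_1k):
        """Estimate cost from caller-supplied prices per 1,000 tokens."""
        return (
            self.prompt_tokens / 1000 * prompt_price_per_1k
            + self.completion_tokens / 1000 * completion_price_per_1k
        )


tracker = UsageTracker()
tracker.add(prompt_tokens=1200, completion_tokens=800)
tracker.add(prompt_tokens=900, completion_tokens=600)
# Placeholder prices for illustration only; look up current rates before relying on this.
print(tracker.requests, tracker.prompt_tokens, tracker.completion_tokens)  # 2 2100 1400
print(f"{tracker.estimated_cost(0.01, 0.03):.4f}")
```

Keeping one tracker per model makes it easier to see which model dominates the bill.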

### Troubleshooting:
- **API Errors**: Check your OpenAI API key and model access
- **Memory Issues**: Reduce batch sizes or upgrade to Colab Pro
- **Network Issues**: Restart the runtime and try again
- **Import Errors**: Verify all dependencies are installed correctly
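
For transient API or network errors, retrying with exponential backoff is often more practical than restarting the runtime. The sketch below is a generic pattern, not MedRGen's own error handling. `retry_async` is a hypothetical helper, and it retries on any `Exception` for brevity, where real code would usually retry only on rate-limit or connection errors.

```python
import asyncio
import random


async def retry_async(make_call, max_attempts=4, base_delay=1.0):
    """Await `make_call()` and retry with exponential backoff on failure.

    `make_call` is a zero-argument callable returning a fresh awaitable.
    The delay doubles after each failed attempt, plus up to 10% jitter.
    The last exception is re-raised once `max_attempts` is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return await make_call()
        except Exception:  # assumption: every error is treated as transient
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, delay * 0.1))
```

In this notebook it could wrap the generation call, for example `await retry_async(lambda: generator.generate_single_case(specialty="cardiology", complexity="moderate", symptom_theme="cardiovascular", case_id="retry_demo"))`, where `retry_demo` is just an example ID.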

### Commercial Use:
This system generates dual-quality datasets suitable for:
- Medical AI training and fine-tuning
- Educational content creation
- Research and development
- Clinical decision support systems

---

**Happy generating! 🏥✨**
