# OpenMed Getting Started Guide

Welcome to OpenMed! This notebook will walk you through the basics of using OpenMed for healthcare-focused Named Entity Recognition (NER) tasks.

## What is OpenMed?

OpenMed is a Python toolkit that makes it easy to work with healthcare-focused NER models from Hugging Face. It provides:
- **Curated model registry** with healthcare-specific models
- **Easy model loading** and pipeline creation
- **Advanced NER processing** with filtering and grouping
- **Text preprocessing** utilities for medical text
- **Multiple output formats** (JSON, HTML, CSV, dict)
- **Built-in validation** and safety checks
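
To make "grouping" and "filtering" concrete: NER models usually emit one label per token, often in a BIO scheme such as `B-DISEASE`, `I-DISEASE`, `O`. Grouping stitches consecutive tokens into entity spans, and filtering drops spans below a confidence threshold. Hugging Face token-classification pipelines offer similar behavior through their `aggregation_strategy` option. The sketch below is a simplified, library-independent illustration under the BIO assumption, not OpenMed's implementation. `Token`, `Entity`, and `group_bio` are hypothetical names, and averaging token scores is just one reasonable choice for span confidence.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Token:
    text: str
    label: str    # BIO tag, e.g. "B-DISEASE", "I-DISEASE" or "O"
    score: float  # model confidence for this token

@dataclass
class Entity:
    label: str
    text: str
    confidence: float

def group_bio(tokens: List[Token], min_confidence: float = 0.0) -> List[Entity]:
    """Merge consecutive B-/I- tokens into entities, then drop low-confidence ones."""
    entities: List[Entity] = []
    span: List[Token] = []
    span_label = None
    # A trailing "O" sentinel guarantees the last open span gets closed
    for tok in tokens + [Token("", "O", 0.0)]:
        prefix, _, kind = tok.label.partition("-")
        if prefix == "I" and kind == span_label:
            span.append(tok)  # continue the open span
            continue
        if span:  # any other tag closes the open span
            confidence = sum(t.score for t in span) / len(span)
            if confidence >= min_confidence:
                entities.append(Entity(span_label, " ".join(t.text for t in span), confidence))
        if prefix in ("B", "I"):  # start a new span (a stray I- is treated like B-)
            span, span_label = [tok], kind
        else:
            span, span_label = [], None
    return entities

# Made-up token predictions, for illustration only
tokens = [
    Token("acute", "B-DISEASE", 0.98),
    Token("lymphoblastic", "I-DISEASE", 0.95),
    Token("leukemia", "I-DISEASE", 0.97),
    Token("and", "O", 0.99),
    Token("imatinib", "B-DRUG", 0.40),
]
for entity in group_bio(tokens, min_confidence=0.5):
    print(entity)
```

Joining tokens with spaces keeps the sketch short; real implementations work from character offsets so that sub-word pieces and original spacing are reconstructed correctly.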

## Installation

First, let's install OpenMed and its dependencies:


```python
# Install OpenMed
%pip install openmed

# Install required dependencies
%pip install transformers torch
```

After the installs finish, restart the kernel so that the newly installed packages (and their compiled extensions) are the ones Python imports. Then suppress noisy library output and set up OpenMed logging:

```python
import logging
import warnings

# Suppress external library logs BEFORE importing OpenMed
for name in ("transformers", "torch", "datasets", "huggingface_hub"):
    logging.getLogger(name).setLevel(logging.ERROR)

# Suppress all Python warnings (note: this also hides potentially useful ones)
warnings.filterwarnings("ignore")

# Set up OpenMed logging
from openmed.utils import setup_logging
setup_logging(level="WARNING")
```

**Troubleshooting:** If an import fails with `ValueError: numpy.dtype size changed, may indicate binary incompatibility`, a compiled package was built against a different NumPy version than the one loaded in the kernel. Restarting the kernel after installation usually fixes this; if it persists, reinstall so that NumPy and the packages built on it have compatible versions.
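
While troubleshooting, it helps to know exactly which versions are installed. The standard library's `importlib.metadata` can report them. A minimal sketch; `installed_versions` is a hypothetical helper name, not part of OpenMed. It reads installed package metadata, which can differ from the version already imported into a kernel that hasn't been restarted (compare with `numpy.__version__`).

```python
from importlib.metadata import PackageNotFoundError, version

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it isn't installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in installed_versions(["openmed", "numpy", "torch", "transformers"]).items():
    print(f"{name:<14} {ver or 'not installed'}")
```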

You can also set OpenMed's own log level directly, or capture it in a configuration object:

```python
from openmed.core import OpenMedConfig
from openmed import setup_logging

# Set up logging with minimal output (errors only)
setup_logging(level="ERROR")

# Or create a custom config with the same log level
config = OpenMedConfig(log_level="ERROR")
```
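
The logging setup above changes levels globally for the rest of the session. If you'd rather silence third-party loggers only around a particular call, a small standard-library context manager does it. A minimal sketch; `quiet_loggers` is a hypothetical helper, not an OpenMed API.

```python
import logging
from contextlib import contextmanager

@contextmanager
def quiet_loggers(names, level=logging.ERROR):
    """Temporarily set the named loggers to `level`, restoring their previous levels on exit."""
    loggers = [logging.getLogger(name) for name in names]
    previous = [logger.level for logger in loggers]  # NOTSET (0) if never set explicitly
    try:
        for logger in loggers:
            logger.setLevel(level)
        yield
    finally:
        for logger, old_level in zip(loggers, previous):
            logger.setLevel(old_level)

# Example: this warning is dropped because "transformers" is at ERROR inside the block
with quiet_loggers(["transformers", "huggingface_hub"]):
    logging.getLogger("transformers").warning("suppressed inside the block")
```

Because the restore happens in `finally`, the original levels come back even if the code inside the block raises.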

## Quick Start - Basic Text Analysis

Let's start with the simplest way to analyze medical text:


```python
from openmed import analyze_text

# Simple medical text analysis
text = "Patient diagnosed with acute lymphoblastic leukemia and started on imatinib."

result = analyze_text(
    text,
    model_name="disease_detection_superclinical",
    group_entities=True,
    include_confidence=True,
    output_format="dict",
)

print(result)
print("Detected entities:")
for entity in result.entities:
    print(f"{entity.label}: '{entity.text}' (confidence: {entity.confidence:.3f})")
```
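
The `output_format` argument controls how OpenMed returns results. If you only need a few fields for a file or another service, you can also convert entities yourself with the standard library. A minimal sketch; `SimpleEntity`, `to_records`, and `to_csv` are hypothetical names, and the entities and scores below are made up rather than real model output.

```python
import csv
import io
import json
from dataclasses import dataclass

@dataclass
class SimpleEntity:
    """Hypothetical stand-in exposing only the attributes used in the loop above."""
    label: str
    text: str
    confidence: float

def to_records(entities):
    """Convert entity objects to plain dicts (works on anything with label/text/confidence)."""
    return [
        {"label": e.label, "text": e.text, "confidence": round(e.confidence, 3)}
        for e in entities
    ]

def to_csv(records):
    """Serialize a list of record dicts to a CSV string with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["label", "text", "confidence"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

# Made-up entities and scores, for illustration only
entities = [
    SimpleEntity("DISEASE", "acute lymphoblastic leukemia", 0.9712),
    SimpleEntity("DRUG", "imatinib", 0.8831),
]
records = to_records(entities)
print(json.dumps(records, indent=2))
print(to_csv(records))
```

With a real result you could try `to_records(result.entities)`, assuming its entities expose `label`, `text`, and `confidence` as the loop above suggests. For richer exports, OpenMed's own `output_format` options are the more natural choice.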

```python
# Try another example with pharmaceutical entities
text2 = "Patient received 75mg clopidogrel for NSTEMI and was also prescribed metformin 500mg twice daily."

result2 = analyze_text(
    text2,
    model_name="pharma_detection_superclinical"
)

print("Detected pharmaceutical entities:")
for entity in result2.entities:
    print(f"{entity.label}: '{entity.text}' (confidence: {entity.confidence:.3f})")
```

## Model Discovery and Selection

OpenMed includes a curated registry of healthcare models. Let's explore what's available:


```python
from openmed import list_models

# See all available models
models = list_models()
print(f"Total models available: {len(models)}")
print("\nFirst 10 models:")
for model in models[:10]:
    print(f"  - {model}")
```
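
With a large registry, a quick keyword filter helps narrow the list. A minimal sketch; `find_models` is a hypothetical helper, and it matches against `str(model)` so it works whether the registry returns plain model keys or richer objects. The sample list contains only the two model keys used in this notebook.

```python
def find_models(models, *keywords):
    """Return the models whose name contains every keyword (case-insensitive)."""
    wanted = [k.lower() for k in keywords]
    return [m for m in models if all(k in str(m).lower() for k in wanted)]

# Illustrative list built from the two model keys used in this notebook
sample = ["disease_detection_superclinical", "pharma_detection_superclinical"]
print(find_models(sample, "pharma"))
```

In the notebook you could call `find_models(models, "disease")` on the list returned by `list_models()`.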


```python
from openmed.core.model_registry import list_model_categories

# List all model categories
categories = list_model_categories()
print("Available model categories:")
for category in categories:
    print(f"  - {category}")
```


```python
from openmed.core.model_registry import get_models_by_category

# Get models by category
disease_models = get_models_by_category("Disease")
print("Disease detection models:")
for model in disease_models:
    print(f"  - {model.display_name} ({model.size_category})")
```


```python
from openmed import get_model_suggestions

# Get model suggestions based on text content
text = "Metastatic breast cancer treated with paclitaxel and trastuzumab"
suggestions = get_model_suggestions(text)

print("Model suggestions for the text above:")
for key, info, reason in suggestions:
    print(f"  {info.display_name} -> {reason}")
```
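
`get_model_suggestions()` recommends models for a piece of text along with a reason. Its internal logic isn't shown in this guide; as a toy illustration of the general idea, a keyword lookup can produce (model, reason) pairs. The keyword table and `suggest_models` are hypothetical, and OpenMed's real heuristic may work quite differently.

```python
import re

# Hypothetical keyword table: a toy stand-in, NOT OpenMed's suggestion logic
KEYWORD_HINTS = {
    "disease_detection_superclinical": {"cancer", "leukemia", "diabetes", "hypertension"},
    "pharma_detection_superclinical": {"paclitaxel", "trastuzumab", "imatinib", "metformin"},
}

def suggest_models(text):
    """Return (model_key, reason) pairs for every model whose hint words appear in the text."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    suggestions = []
    for key, hints in KEYWORD_HINTS.items():
        matched = sorted(words & hints)
        if matched:
            suggestions.append((key, "mentions " + ", ".join(matched)))
    return suggestions

for key, reason in suggest_models("Metastatic breast cancer treated with paclitaxel and trastuzumab"):
    print(f"  {key} -> {reason}")
```

A fixed word list misses synonyms, abbreviations, and misspellings, which is why a maintained suggestion tool is preferable in practice.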


## Visualization Example

Let's run both models over a longer clinical note and summarize the entity types each one detects, as plain-text counts and a sample table:


```python
from collections import Counter

from openmed import analyze_text

# Analyze a longer clinical text and summarize the results
clinical_text = """
Patient: John Doe, 65-year-old male
Chief Complaint: Chest pain and shortness of breath
History: Patient presents with acute onset chest pain radiating to left arm,
associated with shortness of breath and diaphoresis.
Past Medical History: Type 2 diabetes mellitus diagnosed 2015, Hypertension diagnosed 2010
Current Medications: Metformin 1000mg BID, Lisinopril 10mg daily, Atorvastatin 40mg HS
"""

# Get results from different models with entity grouping enabled
disease_result = analyze_text(
    clinical_text,
    model_name="disease_detection_superclinical",
    group_entities=True
)
pharma_result = analyze_text(
    clinical_text,
    model_name="pharma_detection_superclinical",
    group_entities=True
)

# Count entities by type for each model
disease_counts = Counter(entity.label for entity in disease_result.entities)
pharma_counts = Counter(entity.label for entity in pharma_result.entities)

print("Entity Distribution by Model:")
print("=" * 40)

print("Disease Model:")
for label, count in disease_counts.items():
    print(f"  {label}: {count}")

print("\nPharma Model:")
for label, count in pharma_counts.items():
    print(f"  {label}: {count}")

# Combine entities from both models into one table
all_entities = [("Disease Model", e.label, e.text) for e in disease_result.entities]
all_entities += [("Pharma Model", e.label, e.text) for e in pharma_result.entities]

# Show the first 10 entities in a table (column wide enough for "Disease Model")
print("\nSample Entities (first 10):")
print("=" * 40)
print(f"{'Model':<14} {'Entity Type':<15} {'Text'}")
print("-" * 50)
for model, label, span in all_entities[:10]:
    print(f"{model:<14} {label:<15} {span}")
```
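
The cell above prints counts and a sample table rather than an actual chart. For a quick visual comparison without a plotting library, the counts can be rendered as text bars. A minimal sketch; `text_bar_chart` is a hypothetical helper, and the example counts are made up.

```python
def text_bar_chart(counts, width=30, char="#"):
    """Render a {label: count} mapping as one text bar per label, largest first."""
    if not counts:
        return []
    peak = max(counts.values())
    label_width = max(len(str(label)) for label in counts)
    lines = []
    for label, count in sorted(counts.items(), key=lambda item: (-item[1], str(item[0]))):
        # Scale bars relative to the largest count; any non-zero count gets at least one mark
        bar = char * max(1, round(width * count / peak)) if count else ""
        lines.append(f"{str(label):<{label_width}} {bar} {count}")
    return lines

# Made-up counts, for illustration; in the notebook you could pass disease_counts or pharma_counts
example_counts = {"DISEASE": 3, "DRUG": 6}
for line in text_bar_chart(example_counts, width=20):
    print(line)
```

Since `Counter` is a `dict` subclass, the per-model counters from the cell above can be passed in directly.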


## Summary

You've learned how to:

1. **Install and set up** OpenMed, including quieting noisy library logs
2. **Perform basic text analysis** with `analyze_text()`
3. **Discover and select models** with `list_models()`, model categories, and `get_model_suggestions()`
4. **Configure logging** with `setup_logging()` or an `OpenMedConfig`
5. **Compare models** by summarizing the entity types each one detects

OpenMed also offers features this guide doesn't walk through: advanced NER processing, medical text preprocessing, JSON/HTML/CSV output formatting, batch processing of multiple texts, and input validation. See the links below to explore them.

OpenMed makes healthcare NER accessible and easy to integrate into your applications!

### Next Steps

- Explore the [OpenMed website](https://openmed.life) for more examples
- Check out the [GitHub repository](https://github.com/maziyarpanahi/openmed) for advanced usage
- Join the community discussions for support and feedback

### Useful Resources

- **Documentation**: Available on the website
- **Model Registry**: All available models with descriptions
- **Examples**: More notebooks and code samples
- **API Reference**: Complete function documentation
