# MIHCSME OMERO Package Demo

This notebook demonstrates how to use the `mihcsme-omero` package to:

1. Parse MIHCSME Excel files into validated Pydantic models
2. Inspect and manipulate metadata programmatically
3. Export metadata to JSON
4. Upload metadata to OMERO servers
5. Create metadata programmatically

## Prerequisites

```bash
# Install the package
pip install -e .
```

## 1. Setup and Imports

In [1]:
from pathlib import Path
import json
from pprint import pprint

from mihcsme_omero import parse_excel_to_model, upload_metadata_to_omero
from mihcsme_omero.models import (
    MIHCSMEMetadata,
    AssayCondition,
    InvestigationInformation,
    StudyInformation,
    AssayInformation,
)

# Optional: For nice table display
import pandas as pd

print("✅ Imports successful!")

✅ Imports successful!


## 2. Parse MIHCSME Excel File

Let's parse an Excel file containing MIHCSME metadata.

In [12]:
# Path to your MIHCSME Excel file
excel_path = Path("../MIHCSME Template_MH.xlsx")

# Check if file exists
if not excel_path.exists():
    print(f"⚠️  File not found: {excel_path}")
    print("Please update the path to your MIHCSME Excel file")
else:
    print(f"📄 Found Excel file: {excel_path}")

📄 Found Excel file: ../MIHCSME Template_MH.xlsx


In [13]:
# Parse the Excel file into a Pydantic model
metadata = parse_excel_to_model(excel_path)

print("✅ Successfully parsed metadata!")
print(f"   Number of wells: {len(metadata.assay_conditions)}")
print(f"   Number of reference sheets: {len(metadata.reference_sheets)}")

✅ Successfully parsed metadata!
   Number of wells: 72
   Number of reference sheets: 4


## 3. Inspect Metadata Structure

The parsed metadata is a fully typed Pydantic model with four main sections:
- Investigation Information
- Study Information
- Assay Information
- Assay Conditions (per-well data)

### 3.1 Investigation Information

In [4]:
print("📋 Investigation Information Groups:")
for group_name, fields in metadata.investigation_information.groups.items():
    print(f"\n{group_name}:")
    for key, value in fields.items():
        print(f"  {key}: {value}")

📋 Investigation Information Groups:

DataOwner:
  First Name: Mazene
  Middle Name(s): None
  Last Name: Hochane
  User name: hochanem
  Institute: Universiteit Leiden
  E-Mail Address: test@leidenuniv.nl
  ORCID investigator: https://orcid.org/0000-0002-7990-6010

DataCollaborator:
  ORCID  Data Collaborator: None

InvestigationInformation:
  Project ID: 1337
  Investigation Title: What are we seeing here
  Investigation internal ID: WAWSH
  Investigation description: cells


### 3.2 Study Information

In [5]:
print("🔬 Study Information:")
for key, value in metadata.study_information.groups.items():
    print(f"  {key}: {value}")

🔬 Study Information:
  Study: {'Study Title': 'Microscopy investiation', 'Study internal ID': 1337, 'Study Description': 'Interesting stuff', 'Study Key Words': '[microscopy, high-content screening]'}
  Biosample: {'Biosample Taxon': None, 'Biosample description': 'IPSc', 'Biosample Organism': 'Human', 'Number of cell lines used': 1}
  Library: {'Library File Name': 'whatever.xlsx', 'Library File Format': 'xlsx', 'Library Type': 'List', 'Library Manufacturer': None, 'Library Version': None, 'Library Experimental Conditions': None, 'Quality Control Description': 'Nothing'}
  Protocols: {'HCS library protocol': 'http://eln', 'growth protocol': None, 'treatment protocol': None, 'HCS data analysis protocol': None}
  Plate: {'Plate type': 'uclear', 'Plate type Manufacturer': 'Geiner', 'Plate type Catalog number': 1337}


### 3.3 Assay Information

In [6]:
print("🧪 Assay Information:")
for key, value in metadata.assay_information.groups.items():
    print(f"  {key}: {value}")

🧪 Assay Information:
  Assay: {'Assay Title': 'Look no further', 'Assay internal ID': 1234, 'Assay Description': None, 'Assay number of biological replicates': None, 'Number of plates': None, 'Assay Technology Type': None, 'Assay Type': 'high content analysis of cells', 'Assay External URL': None, 'Assay data URL': None}
  AssayComponent: {'Imaging protocol': None, 'Sample preparation protocol': None}
  Biosample: {'Cell lines storage location': None, 'Cell lines clone number': None, 'Cell lines Passage number': None}
  ImageData: {'Image number of pixelsX': 512, 'Image number of pixelsY': 512, 'Image number of  z-stacks': 7, 'Image number of channels': 3, 'Image number of timepoints': 1, 'Image sites per well': 1}
  ImageAcquisition: {'Microscope id': 3444}
  Specimen: {'Channel Transmission id': None, 'Channel 1 visualization method': 'Hoechst 33258', 'Channel 1 entity': 'DNA', 'Channel 1 label': 'Nuclei', 'Channel 1 id': 0, 'Channel 2 visualization method': 'EGFP', 'Channel 2 ent

### 3.4 Assay Conditions (Well-level Data)

Let's look at the first few wells and their conditions.

In [7]:
print(f"🔬 Total wells with conditions: {len(metadata.assay_conditions)}\n")

# Show first 5 wells
print("First 5 wells:")
for condition in metadata.assay_conditions[:5]:
    print(f"\nPlate: {condition.plate}, Well: {condition.well}")
    print(f"Conditions: {condition.conditions}")

🔬 Total wells with conditions: 72

First 5 wells:

Plate: plate_day_7, Well: B01
Conditions: {'Treatment': 'TNFa', 'Concentration': '5', 'Unit': 'uM', 'CellLine': 'iPS', 'TimeTreatment': '1h', 'RepID': '1'}

Plate: plate_day_7, Well: B02
Conditions: {'Treatment': 'IL6', 'Concentration': '5', 'Unit': 'uM', 'CellLine': 'iPS', 'TimeTreatment': '1h', 'RepID': '1'}

Plate: plate_day_7, Well: B03
Conditions: {'Treatment': 'IGF-1', 'Concentration': '6', 'Unit': 'uM', 'CellLine': 'iPS', 'TimeTreatment': '1h', 'RepID': '1'}

Plate: plate_day_7, Well: B04
Conditions: {'Treatment': 'Insulin', 'Concentration': '6', 'Unit': 'uM', 'CellLine': 'iPS', 'TimeTreatment': '1h', 'RepID': '1'}

Plate: plate_day_7, Well: B05
Conditions: {'Treatment': 'Leptin', 'Concentration': '6.5', 'Unit': 'uM', 'CellLine': 'iPS', 'TimeTreatment': '1h', 'RepID': '1'}


### 3.5 Display as DataFrame

Convert well conditions to a pandas DataFrame for easy viewing.

In [4]:
# Convert assay conditions to a list of dicts
conditions_data = []
for condition in metadata.assay_conditions:
    row = {
        "Plate": condition.plate,
        "Well": condition.well,
        **condition.conditions,  # Unpack all condition fields
    }
    conditions_data.append(row)

# Create DataFrame
df = pd.DataFrame(conditions_data)
print(f"📊 Assay Conditions DataFrame ({len(df)} rows):")
df.head(10)

📊 Assay Conditions DataFrame (72 rows):

|   | Plate | Well | Concentration | Unit | RepID |
|---|-------|------|---------------|------|-------|
| 0 | plate_day_7 | B01 | 5.0 | uM | 1 |
| 1 | plate_day_7 | B02 | 5.0 | uM | 1 |
| 2 | plate_day_7 | B03 | 6.0 | uM | 1 |
| 3 | plate_day_7 | B04 | 6.0 | uM | 1 |
| 4 | plate_day_7 | B05 | 6.5 | uM | 1 |
| 5 | plate_day_7 | B06 | 6.9 | uM | 1 |
| 6 | plate_day_7 | B07 | 7.3 | uM | 1 |
| 7 | plate_day_7 | B08 | 7.7 | uM | 1 |
| 8 | plate_day_7 | B09 | 8.1 | uM | 1 |
| 9 | plate_day_7 | B10 | 8.5 | uM | 1 |


## 4. Filter and Query Metadata

Programmatically access and filter the metadata.

In [5]:
# Get all unique plates
plates = {condition.plate for condition in metadata.assay_conditions}
print(f"📋 Unique plates: {plates}")

# Filter by plate
if plates:
    first_plate = list(plates)[0]
    plate_conditions = [
        c for c in metadata.assay_conditions if c.plate == first_plate
    ]
    print(f"\n🔍 Wells in {first_plate}: {len(plate_conditions)}")

📋 Unique plates: {'plate_day_7'}

🔍 Wells in plate_day_7: 72


In [6]:
# Get all unique condition keys
all_keys = set()
for condition in metadata.assay_conditions:
    all_keys.update(condition.conditions.keys())

print(f"🔑 All condition keys found: {sorted(all_keys)}")

🔑 All condition keys found: ['Concentration', 'RepID', 'Unit']


In [7]:
# Filter wells by a specific condition key
# Example: list the unique values recorded for one key (adjust the key name as needed)
condition_key = "CellLine"  # Change this to match your data

if condition_key in all_keys:
    unique_values = {
        c.conditions.get(condition_key)
        for c in metadata.assay_conditions
        if condition_key in c.conditions
    }
    print(f"\n💊 Unique values for '{condition_key}': {unique_values}")
else:
    print(f"\n⚠️  Condition key '{condition_key}' not found. Available keys: {sorted(all_keys)}")


⚠️  Condition key 'CellLine' not found. Available keys: ['Concentration', 'RepID', 'Unit']
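
The same filtering idea extends to grouping wells by the value of one condition key. The sketch below uses plain dictionaries as stand-ins for the parsed `AssayCondition` objects, so it runs without the package; `group_wells_by` is a hypothetical helper and the sample values are illustrative.

```python
from collections import defaultdict

# Stand-in records shaped like the (plate, well, conditions) data above.
# The values are illustrative, not read from a real file.
wells = [
    {"plate": "plate_day_7", "well": "B01", "conditions": {"Unit": "uM", "Concentration": "5"}},
    {"plate": "plate_day_7", "well": "B02", "conditions": {"Unit": "uM", "Concentration": "5"}},
    {"plate": "plate_day_7", "well": "B03", "conditions": {"Unit": "uM", "Concentration": "6"}},
]


def group_wells_by(records, key):
    """Map each value of `key` to the list of well names that carry it."""
    groups = defaultdict(list)
    for record in records:
        value = record["conditions"].get(key)
        if value is not None:  # skip wells that lack the key
            groups[value].append(record["well"])
    return dict(groups)


by_concentration = group_wells_by(wells, "Concentration")
print(by_concentration)
```

With the real model, the loop body would presumably read `condition.conditions.get(key)` and `condition.well` instead of dictionary lookups.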


## 5. Export to JSON

Export the metadata to JSON format for storage or sharing.

In [13]:
# Export as JSON (Pydantic's native format)
output_json = Path("metadata_export.json")

with open(output_json, "w") as f:
    json.dump(metadata.model_dump(), f, indent=2)

print(f"✅ Exported metadata to: {output_json}")
print(f"   File size: {output_json.stat().st_size / 1024:.1f} KB")

✅ Exported metadata to: metadata_export.json
   File size: 45.6 KB


In [None]:
# Preview the JSON structure (first 30 lines)
if output_json.exists():
    # Read the file once; a second readlines() call on the same handle returns []
    with open(output_json) as f:
        all_lines = f.readlines()
    print("📄 JSON Preview (first 30 lines):")
    print("=" * 60)
    print("".join(all_lines[:30]))
    if len(all_lines) > 30:
        print("...")
else:
    print(f"⚠️  File not found: {output_json.absolute()}")
    print("   Please run the previous cell first to export the JSON.")
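
The export above ran without error on this file, but `json.dump` raises `TypeError` when a value is not JSON-serializable, for example a date read from a spreadsheet cell. The sketch below shows one way to guard against that with the standard library's `default=` hook; the record is an illustrative stand-in, not output from the package. Assuming Pydantic v2, `metadata.model_dump(mode="json")` is another option.

```python
import json
from datetime import date

# Illustrative stand-in for a dumped model that contains a date value.
record = {"Study Title": "Demo Study", "Study date": date(2024, 1, 31)}

try:
    json.dumps(record)
except TypeError as exc:
    print(f"Plain dump fails: {exc}")

# default=str converts any value json cannot handle into its string form.
text = json.dumps(record, default=str, indent=2)
restored = json.loads(text)
print(restored["Study date"])
```

Note that the round trip is lossy: the date comes back as a string, so any consumer has to parse it again.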

## 6. Convert to Legacy OMERO Format

Convert the Pydantic model to the legacy dictionary format used by OMERO.

In [8]:
# Convert to OMERO dict format
omero_dict = metadata.to_omero_dict()

print("📦 OMERO Dictionary Structure:")
print(f"   Keys: {list(omero_dict.keys())}")
print(f"\n   Investigation keys: {list(omero_dict.get('InvestigationInformation', {}).keys())[:5]}")
print(f"   Study keys: {list(omero_dict.get('StudyInformation', {}).keys())[:5]}")
print(f"   Assay keys: {list(omero_dict.get('AssayInformation', {}).keys())[:5]}")
print(f"   Assay conditions count: {len(omero_dict.get('AssayConditions', []))}")

📦 OMERO Dictionary Structure:
   Keys: ['InvestigationInformation', 'StudyInformation', 'AssayInformation', 'AssayConditions', '_fbbiVisualizationMethods', '_fbbiImagingMethods', '_efo_studytypes', '_efo_assaytypes']

   Investigation keys: ['DataOwner', 'InvestigationInformation']
   Study keys: ['Study', 'Biosample', 'Library', 'Protocols', 'Plate']
   Assay keys: ['Assay', 'ImageData', 'ImageAcquisition', 'Specimen']
   Assay conditions count: 72


## 7. Create Metadata Programmatically

You can also create metadata objects from scratch in Python.

In [9]:
# Create a simple metadata object
custom_metadata = MIHCSMEMetadata(
    investigation_information=InvestigationInformation(
        groups={
            "Project": {
                "Investigation Title": "Demo Investigation",
                "Investigation Description": "Created programmatically",
            }
        }
    ),
    study_information=StudyInformation(
        fields={
            "Study Title": "Demo Study",
            "Study Description": "Example study",
        }
    ),
    assay_information=AssayInformation(
        fields={
            "Assay Title": "Demo Assay",
            "Assay Type": "High Content Screening",
        }
    ),
    assay_conditions=[
        AssayCondition(
            plate="DemoPlate",
            well="A1",  # Automatically normalized to "A01"
            conditions={
                "Compound": "DMSO",
                "Concentration": "0.1%",
                "Treatment Time": "24h",
            },
        ),
        AssayCondition(
            plate="DemoPlate",
            well="A2",
            conditions={
                "Compound": "Drug X",
                "Concentration": "10 μM",
                "Treatment Time": "24h",
            },
        ),
        AssayCondition(
            plate="DemoPlate",
            well="B1",
            conditions={
                "Compound": "Drug X",
                "Concentration": "1 μM",
                "Treatment Time": "24h",
            },
        ),
    ],
)

print("✅ Created custom metadata object")
print(f"   Wells: {len(custom_metadata.assay_conditions)}")
print(f"\n   Well names (auto-normalized): {[c.well for c in custom_metadata.assay_conditions]}")

✅ Created custom metadata object
   Wells: 3

   Well names (auto-normalized): ['A01', 'A02', 'B01']
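
For a whole plate, typing each well by hand quickly becomes tedious. The following sketch generates zero-padded well names for a rectangular plate and attaches a condition dictionary to each; `plate_wells` and the control-in-column-1 layout are hypothetical choices for illustration, not part of the package.

```python
from string import ascii_uppercase


def plate_wells(n_rows=8, n_cols=12):
    """Return zero-padded well names row by row (A01..H12 for the defaults)."""
    return [
        f"{row}{col:02d}"
        for row in ascii_uppercase[:n_rows]
        for col in range(1, n_cols + 1)
    ]


wells_96 = plate_wells()

# Hypothetical layout: column 1 holds the vehicle control, all other wells a treatment.
layout = {
    well: {"Compound": "DMSO" if well.endswith("01") else "Drug X"}
    for well in wells_96
}
print(len(wells_96), wells_96[0], wells_96[-1])
```

Each `(well, conditions)` pair in `layout` could then be passed to `AssayCondition(plate=..., well=..., conditions=...)` in the same way as the hand-written entries above.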


## 8. Validate Well Format

The `AssayCondition` model validates well names when it is constructed and normalizes them to a zero-padded form (for example, `A1` becomes `A01`).

In [10]:
# Valid well formats (will be normalized)
valid_wells = ["A1", "A01", "B12", "P48"]

for well in valid_wells:
    condition = AssayCondition(
        plate="Test",
        well=well,
        conditions={},
    )
    print(f"Input: '{well}' → Normalized: '{condition.well}'")

Input: 'A1' → Normalized: 'A01'
Input: 'A01' → Normalized: 'A01'
Input: 'B12' → Normalized: 'B12'
Input: 'P48' → Normalized: 'P48'


In [None]:
# Invalid well formats (will raise ValidationError)
invalid_wells = ["Q1", "A49", "AA1", "1A", "A0"]

print("\n❌ Testing invalid well formats:")
for well in invalid_wells:
    try:
        condition = AssayCondition(
            plate="Test",
            well=well,
            conditions={},
        )
        print(f"  {well}: ✅ Valid (unexpected!)")
    except ValueError as e:
        print(f"  {well}: ❌ Invalid - {str(e)[:80]}...")
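
To make the rule concrete, here is a minimal stand-alone sketch of a normalizer that accepts and rejects the same examples as the two cells above. It assumes rows `A` to `P` and columns 1 to 48, which is inferred from those examples; the package's actual validator may be implemented differently, and `normalize_well` is a hypothetical name.

```python
import re

# Assumption: one row letter A-P followed by a one- or two-digit column number.
_WELL_RE = re.compile(r"([A-P])(\d{1,2})")


def normalize_well(name: str) -> str:
    """Return the zero-padded well name, or raise ValueError if it is invalid."""
    match = _WELL_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"Invalid well name: {name!r}")
    row, col = match.group(1), int(match.group(2))
    if not 1 <= col <= 48:  # assumed column range
        raise ValueError(f"Column out of range in well name: {name!r}")
    return f"{row}{col:02d}"


for candidate in ["A1", "P48", "Q1", "A49", "AA1", "1A", "A0"]:
    try:
        print(candidate, "->", normalize_well(candidate))
    except ValueError as exc:
        print(candidate, "-> rejected:", exc)
```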

## 9. Upload to OMERO (Optional)

⚠️ **This section requires a live OMERO connection.** Skip if you don't have access to an OMERO server.

To upload metadata to OMERO, you need:
- OMERO server URL
- Username and password
- Target Screen ID or Plate ID
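
One way to keep credentials out of the notebook is to read them from environment variables and prompt for the password only when it is missing. This is a sketch under stated assumptions: the variable names `OMERO_HOST`, `OMERO_USER`, and `OMERO_PASSWORD` and the helper `load_omero_credentials` are hypothetical, not something the package defines.

```python
import os
from getpass import getpass


def load_omero_credentials(env=os.environ):
    """Read connection settings from a mapping of environment variables."""
    host = env.get("OMERO_HOST")
    user = env.get("OMERO_USER")
    if not host or not user:
        raise RuntimeError("Set OMERO_HOST and OMERO_USER before connecting")
    # Prompt interactively only when no password is provided in the environment.
    password = env.get("OMERO_PASSWORD") or getpass(f"OMERO password for {user}: ")
    return {"host": host, "user": user, "password": password}


# Demonstration with a stand-in mapping so the example does not prompt.
demo = load_omero_credentials(
    {"OMERO_HOST": "omero.example.org", "OMERO_USER": "alice", "OMERO_PASSWORD": "secret"}
)
print(demo["host"], demo["user"])
```

The returned dictionary uses the same keyword names (`host`, `user`, `password`) that this notebook passes to `ezomero.connect`.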

In [None]:
# OMERO connection parameters (update these!)
OMERO_HOST = "omero.services.universiteitleiden.nl"  # Change this
OMERO_USER = "paulmw"  # Change this
OMERO_PASSWORD = ""  # Change this (or use getpass)

# Target for upload
TARGET_TYPE = "Screen"  # or "Plate"
TARGET_ID = 3015  # Change this to your Screen/Plate ID

print("⚠️  OMERO Upload Configuration:")
print(f"   Host: {OMERO_HOST}")
print(f"   User: {OMERO_USER}")
print(f"   Target: {TARGET_TYPE} ID {TARGET_ID}")
print("\n   ⚠️  Update these values before running!")

⚠️  OMERO Upload Configuration:
   Host: omero.services.universiteitleiden.nl
   User: paulmw
   Target: Screen ID 3015

   ⚠️  Update these values before running!


In [17]:
import ezomero

# Connect to OMERO
print("🔌 Connecting to OMERO...")
conn = ezomero.connect(
    host=OMERO_HOST,
    user=OMERO_USER,
    password=OMERO_PASSWORD,
    secure=True,
)
print("✅ Connected!")

try:
    # Upload metadata
    print(f"\n📤 Uploading metadata to {TARGET_TYPE} {TARGET_ID}...")
    result = upload_metadata_to_omero(
        conn=conn,
        metadata=metadata,
        target_type=TARGET_TYPE,
        target_id=TARGET_ID,
        namespace="MIHCSME",
        replace=False,  # Set to True to replace existing annotations
    )

    # Display results
    print("\n📊 Upload Results:")
    print(f"   Status: {result['status']}")
    print(f"   Wells processed: {result['wells_processed']}")
    print(f"   Wells succeeded: {result['wells_succeeded']}")
    print(f"   Wells failed: {result['wells_failed']}")

    # 'errors' may be absent from the result, so use .get() to avoid a KeyError
    errors = result.get("errors", [])
    if errors:
        print("\n❌ Errors encountered:")
        for error in errors[:5]:  # Show first 5 errors
            print(f"   - {error}")
finally:
    # Close the connection even if the upload raises
    conn.close()

print("\n✅ Upload complete!")

🔌 Connecting to OMERO...
✅ Connected!

📤 Uploading metadata to Screen 3015...

📊 Upload Results:
   Status: success
   Wells processed: 84
   Wells succeeded: 60
   Wells failed: 24
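
The recorded run reports `Status: success` even though 24 of 84 wells failed, and its result dictionary had no `errors` key. A small defensive summary can make such partial failures visible. The sketch below relies only on the key names seen in the output above; `summarize_upload` is a hypothetical helper, and the full shape of the dictionary returned by `upload_metadata_to_omero` is an assumption.

```python
def summarize_upload(result):
    """Summarize an upload result dict without assuming every key is present."""
    processed = result.get("wells_processed", 0)
    succeeded = result.get("wells_succeeded", 0)
    failed = result.get("wells_failed", 0)
    errors = result.get("errors") or []  # tolerate a missing or None entry
    rate = succeeded / processed if processed else 0.0
    return {
        "success_rate": rate,
        "has_failures": failed > 0 or bool(errors),
        "errors": list(errors),
    }


# Values copied from the recorded output above.
recorded = {
    "status": "success",
    "wells_processed": 84,
    "wells_succeeded": 60,
    "wells_failed": 24,
}
summary = summarize_upload(recorded)
print(f"{summary['success_rate']:.0%} of wells succeeded; failures: {summary['has_failures']}")
```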

## 10. Summary and Next Steps

### What we covered:

✅ Parsing MIHCSME Excel files  
✅ Inspecting metadata structure  
✅ Filtering and querying data  
✅ Exporting to JSON  
✅ Creating metadata programmatically  
✅ Validating well formats  
✅ Converting to OMERO format  
✅ Uploading to OMERO (optional)  

### Next Steps:

1. **CLI Usage**: Try the command-line interface
   ```bash
   mihcsme parse LEI-MIHCSME.xlsx --output metadata.json
   mihcsme validate LEI-MIHCSME.xlsx
   mihcsme upload LEI-MIHCSME.xlsx --screen-id 123
   ```

2. **Integration**: Integrate into your analysis pipeline
   - Parse metadata at the start of analysis
   - Use metadata to annotate results
   - Upload results back to OMERO

3. **Customization**: Extend for your use case
   - Add custom validation rules
   - Create custom export formats
   - Build analysis workflows

### Documentation:

- **README.md**: User documentation and quick start
- **CLAUDE.md**: Developer documentation and architecture
- **API docs**: Coming soon with Sphinx

### Questions or Issues?

Please report issues or suggestions on the project repository!