# WS44 - High throughput & automated data analysis and data management workflow with Cellprofiler and OMERO

### Introduction

In this workshop we will use this Jupyter Notebook to load image data from OMERO, feed it into a Cellprofiler pipeline, and automatically upload the resulting images and measurements back to OMERO. The uploaded data will also be annotated using tags and key:value pairs.

### Tasks during the workshop
1. (Data import to OMERO and preparation for analysis.)
2. Automated data download/injection into the analysis pipeline
3. Automated data analysis using image analysis pipelines (e.g., Cellprofiler)
4. Upload of the resulting images (including tags and metadata) and measurement results (omero.tables)
5. Explorative data analysis using omero.parade

### Aims of this workshop:

- analyze the provided example datasets
- execute the full workflow
- make simple adjustments to the pipeline
- generate new projects/datasets
- annotate data with key:value pairs
- tag files
- explore data using omero.parade/omero.parade-crossfilter

### Dataset

The data used in this workshop is derived from Pascual-Vargas et al., Sci Data, 2017
"RNAi screens for Rho GTPase regulators of cell shape and YAP/TAZ localisation in triple negative breast cancer"
DOI: 10.1038/sdata.2017.18

The data is publicly available in the Image Data Resource (idr0028)
https://idr.openmicroscopy.org/webclient/?show=screen-1651

The datasets contain RNAi screens of cancer cells that were stained with Hoechst (Nuclei), Tubulin, Actin and Yap/Taz.


### Licenses & Code

The code presented here is partially based on the following scripts and resources:

- Omero Dataset_To_Plate.py script by Will Moore, OME Team, Copyright © 2006-2014 University of Dundee. All rights reserved. Source: https://github.com/ome/omero-scripts/blob/68c7505e62115e9c086a8e5a1d3edc1d4aff35f3/omero/util_scripts/Dataset_To_Plate.py
<br>

- InjectImage module for Cellprofiler. Copyright © 2020-2021 University of Dundee. All rights reserved. Source: https://omero-guides.readthedocs.io/en/latest/cellprofiler/docs/index.html; https://github.com/ome/omero-guide-cellprofiler
<br>

- General Omero-Python API documentation, Source: https://omero-guides.readthedocs.io/en/latest/python/docs/gettingstarted.html
<br>

- Cellprofiler Python API, Copyright © 2003 - 2021 Broad Institute, Inc. All rights reserved. Source: https://github.com/CellProfiler/CellProfiler/wiki/CellProfiler-as-a-Python-package
<br>

- ezomero (https://github.com/TheJacksonLaboratory/ezomero) 
<br>


### Imports

In [2]:
#Cellprofiler
import cellprofiler_core.pipeline
import cellprofiler_core.preferences
import cellprofiler_core.utilities.java
import cellprofiler.modules
import cellprofiler_core.image
import cellprofiler_core.measurement
import cellprofiler_core.object
import cellprofiler_core.workspace
from cellprofiler_core.modules.injectimage import InjectImage


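#Omero
import omero  # binds the name "omero", used below for omero.ValidationException and omero.gateway.TagAnnotationWrapper
import ezomero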
from myconfig import OMEROUSER, OMEROPASS, OMEROPORT, OMEROHOST
from omero.model import OriginalFileI, PlateI, ScreenPlateLinkI, ScreenI, ImageAnnotationLinkI, ImageI
from omero.rtypes import rint, rlong, rstring, robject, unwrap
from omero.grid import DoubleColumn, ImageColumn, LongColumn, WellColumn, StringColumn, FileColumn
from omero.constants.namespaces import NSBULKANNOTATIONS
from omero.gateway import FileAnnotationWrapper


#Other
import h5py
import pandas as pd
import skimage.io
import os
import pathlib
import pickle
import tempfile
import skimage
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
from datetime import datetime
import warnings
import time
import glob
import PIL
import re
import cv2
import json
import shutil
import getpass

#Own functions
import CP_Omero_helper as cp_omero

### Parameters

In the code blocks below, you will set the analysis parameters, such as the IDs of the screen and plate you would like to analyse, as well as file paths and other settings.

In [None]:
# Login to OMERO
#OMEROUSER = input(f"Enter username: \t")
#OMEROPASS = getpass.getpass(prompt = f"Enter password: \t")


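#OMEROHOST = ''
#OMEROPORT = 
#OMEROWEB = ''  # Base URL of OMERO.web; the upload step uses it to build links to the original images.
# Note: OMEROWEB is not among the names imported from myconfig above, so it has to be defined before the upload step is run.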
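
The cell above leaves open how the connection settings are supplied (interactive prompt, or the `myconfig` module imported earlier). As one possible alternative, here is a minimal sketch that reads them from environment variables. The helper `load_omero_settings` and the variable names `OMERO_HOST`, `OMERO_PORT`, `OMERO_USER` and `OMERO_WEB` are assumptions made for illustration and are not part of the workshop material; 4064 is the port OMERO servers commonly listen on. The password is deliberately left out and would still be requested with `getpass`.

```python
import os


def load_omero_settings(env=None):
    """Collect OMERO connection settings from a mapping (default: os.environ).

    All variable names are hypothetical; adapt them to your own setup.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("OMERO_HOST", "localhost"),
        "port": int(env.get("OMERO_PORT", "4064")),
        "user": env.get("OMERO_USER", ""),
        # Strip a trailing slash so that links can be built consistently
        "web": env.get("OMERO_WEB", "").rstrip("/"),
    }


# Example with an explicit mapping instead of the real environment
settings = load_omero_settings(
    {"OMERO_HOST": "omero.example.org", "OMERO_WEB": "https://omero.example.org/webclient/"}
)
print(settings)
```

The resulting values could then be assigned to `OMEROHOST`, `OMEROPORT`, `OMEROUSER` and `OMEROWEB` before connecting.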

In [3]:
# Connection Check:
conn=ezomero.connect(OMEROUSER, OMEROPASS, "", host=OMEROHOST, port=OMEROPORT, secure=True)
print(conn.isConnected())

True


In [5]:
# OMERO IDs
screen_id = 852 # Insert ID of the screen that you want to analyse
plate_id = 765 # Insert corresponding plate ID
project_id = 7911 # ID of the project that will hold the temporary dataset
selected_well = "E6" # Insert well you want to analyse
tag_owner_id = 2607  # To keep the omero server clean, we will all use tags from 1 tag owner. Otherwise everyone would produce their own tags.

# Pipeline
pipe_dir = r"D:\PROJECTS\MiN_Data\Workgroups\Sarah\Project_OMERO-CP\Data_Dreisewerd_TestSet\Pipeline\General_Pipeline_v2.cppipe" #Insert directory of pipeline including name of pipeline

# Input and saving directories:
output_dir = "temp_dir" 
# if you want to use a temporary directory that is automatically created use: "output_dir = 'temp_dir'"

# Cellprofiler-settings
# (maybe remove)
overwrite_results = 'Yes'  # If yes, data present in the output folder will be overwritten
output_file_format = None  # 'npy' for numpy array, 'tiff' for image (label images: 16-bit floating point), put None if you want to keep the fileformats specified in your pipeline
plugin_directory = ""

# Name of the new dataset to which the label images will be uploaded
new_plate_name = "Results_"
append_original_plate_name = True # False

# Specify the channels that should be used for segmentation and analysis
# Same names as in CP pipeline!
ch1 = "Nuclei" #Nuclei segmentation
ch2 = "Actin" #Actin (cell body) segmentation
ch3 = "Tubulin"
ch4 = "YapTaz" #YapTaz for analysis
# ... expand if you have more channels, e.g. ch5 = xx

channels = [ch1, ch2, ch3, ch4]

In [7]:
# Key:Value Pairs - Add the annotations you would like to pass with your analysis results

# These could also be read from an Excel sheet or a text file, or be written here.
annotation_dict= {
"Goal & Description" : "We segment the cells and calculate the YAP/Taz ratio in the nuclei.", 
"Software" : "Cellprofiler",
"Software Version": "4.2.5",
"Segmentation Algorithm": "Cellpose",
"Segmentation Algorithm Version": "?",
"Models": "nuclei, cyto2",
"eLabTFW": "https://eln.uni-muenster.de/experiments.php?mode=edit&id=71",
}
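
The comment in the cell above mentions that the annotations could also come from a text file. A minimal sketch of that idea follows, assuming a simple `key: value` line format; the format and the helper `parse_key_value_text` are illustrative assumptions, not something the workshop code provides. Each line is split at the first colon only, so values such as URLs stay intact.

```python
def parse_key_value_text(text):
    """Parse 'key: value' lines into a dictionary.

    Blank lines and lines starting with '#' are ignored.
    Only the first ':' separates key and value.
    """
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # no colon found, skip malformed line
        pairs[key.strip()] = value.strip()
    return pairs


# Hypothetical file content, shown inline so the example is self-contained
example = """
# analysis annotations
Software: Cellprofiler
Software Version: 4.2.5
Notebook entry: https://example.org/experiments.php?id=1
"""
annotations = parse_key_value_text(example)
print(annotations)
```

For a real file, the text would be read first (for example with `pathlib.Path(...).read_text()`) and the result used in place of the hand-written `annotation_dict`.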

## 1. Perform Cellprofiler Analysis

In this part we will obtain the image data from omero, inject it into the cellprofiler analysis pipeline and perform the image analysis. Results will be saved on disk in the specified output folder.

In [8]:
### Prepare Cellprofiler

#Set output directory
if output_dir == "temp_dir":
    temp_dir = tempfile.mkdtemp()  # Creates a temporary directory
    temp_path = os.path.normcase(temp_dir)
    saving_path = pathlib.Path(temp_path).absolute()
else:
    saving_path = pathlib.Path(output_dir).absolute()

cellprofiler_core.preferences.set_default_output_directory(f"{saving_path}")
print(f"Data will be saved to: {saving_path}")    


# Set-Up Cellprofiler
cellprofiler_core.preferences.set_headless() # The headless mode runs cellprofiler without use of the GUI. 
cellprofiler_core.preferences.set_plugin_directory(plugin_directory) # Sets the plugin directory that contains the cellpose module plugin
cellprofiler_core.preferences.set_max_workers(1) # You can increase the number of workers depending on your computer/server hardware.


#Start the Java VM
cellprofiler_core.utilities.java.start_java()

Data will be saved to: c:\users\min_acc1\appdata\local\temp\4\tmptdy49_ql


In [9]:
# Here we load the pipeline and adjust it to work with Omero. 

pipeline = cp_omero.load_pipeline(pipe_dir)
pipeline = cp_omero.adjust_pipeline(pipeline, overwrite_results, output_file_format) 

ERROR:root:Failed to load pipeline
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\WS44\lib\site-packages\cellprofiler_core\pipeline\_pipeline.py", line 543, in setup_modules
    module = self.setup_module(
  File "C:\ProgramData\Anaconda3\envs\WS44\lib\site-packages\cellprofiler_core\pipeline\_pipeline.py", line 569, in setup_module
    module = self.instantiate_module(module_name)
  File "C:\ProgramData\Anaconda3\envs\WS44\lib\site-packages\cellprofiler_core\pipeline\_pipeline.py", line 197, in instantiate_module
    return instantiate_module(module_name)
  File "C:\ProgramData\Anaconda3\envs\WS44\lib\site-packages\cellprofiler_core\utilities\core\modules\__init__.py", line 180, in instantiate_module
    module = get_module_class(module_name)()
  File "C:\ProgramData\Anaconda3\envs\WS44\lib\site-packages\cellprofiler_core\utilities\core\modules\__init__.py", line 175, in get_module_class
    raise ValueError("Could not find the %s module" % module_class)
Valu

Remove module:  Images
Remove module:  Metadata
Remove module:  NamesAndTypes
Remove module:  Groups
Pipeline modules:
1 GrayToColor
2 RescaleIntensity
3 RescaleIntensity


In [None]:
# Start Analysis

# We define a timer to track how long the analysis will take.
start_time = datetime.now().strftime("%H:%M:%S")


# We connect to omero and get the plate we want to analyse
conn=ezomero.connect(OMEROUSER, OMEROPASS, "", host=OMEROHOST, port=OMEROPORT, secure=True) # Connection to Omero
plate = conn.getObject("Plate", plate_id) # Gets the plate you want to analyse


# Start of the analysis
print(f"You are analyzing the well {selected_well}.")
measurements = {}
df_features=pd.DataFrame()


# The code will loop through each well and perform the analysis. 
wells = list(plate.listChildren())

for count, well in enumerate(wells):
    if (well.row, well.column) == (cp_omero.well_name_to_position(selected_well)):  # For workshop purposes we only select one well to analyse
        print(well.row, well.column)

        # Number of images (fields) to analyse in this well
        index = well.countWellSample() # Would analyse all images in the well
        index = 1 # Overrides the line above: analyse only 1 image in the well (remove this line to analyse all images)

        for i in range(0, index):
            image = well.getImage(i)
            image_id = image.getId()
            pixels = image.getPrimaryPixels()
            size_c = image.getSizeC()

            # For each image in OMERO, we copy the pipeline and add the image_id to the saving and export modules.
            pipeline_copy = pipeline.copy()

            # Find the SaveImages modules and update their settings (applied to the copy, so the loaded pipeline stays unchanged)
            pipeline_copy = cp_omero.update_save_images_module_setting(pipeline_copy, image_id)

            # Find the ExportToSpreadsheet module and update its settings
            pipeline_copy = cp_omero.update_export_module_setting(pipeline_copy, image_id)

            # Inject image for each channel into the pipeline.
            for c in range(0, size_c):
                plane = pixels.getPlane(0, c, 0)

                # Name of the channel expected in the pipeline;
                # channels beyond the configured list keep the OMERO image name
                image_name = channels[c] if c < len(channels) else image.getName()
                inject_image_module = InjectImage(image_name, plane)
                inject_image_module.set_module_num(1)
                pipeline_copy.add_module(inject_image_module)

            # Here we run the pipeline on our image.
            output_measurements = pipeline_copy.run()

            # Here we process the measurement results
            measurements[image_id] = output_measurements
            feature_meas = output_measurements.compute_aggregate_measurements(1, aggs=None)
            df_feature = pd.DataFrame(feature_meas, index=[image_id])
            df_features = pd.concat([df_features,df_feature])
            print(f"ImageID: {image_id} :  finished")

df_features["Image_ID"] = df_features.index
df_features.to_csv(os.path.join(saving_path,"features_summary.csv")) #Saving the results

# Timer
end_time = datetime.now().strftime("%H:%M:%S")

print(f"Pipeline finished: {len(measurements)} images analysed")

print(f"Analysis started: {start_time}")
print(f"Analysis finished: {end_time}")
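
The loop above selects a well by comparing `(well.row, well.column)`, which are zero-based in OMERO, with the result of `cp_omero.well_name_to_position(selected_well)`. The helper's source is not shown in this notebook, so the following is only a sketch of what such a conversion might look like, assuming well names consist of row letters followed by a one-based column number.

```python
import re


def well_name_to_position(well_name):
    """Convert a well name such as 'E6' into a zero-based (row, column) tuple.

    Sketch only: the real cp_omero helper may behave differently.
    """
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", well_name.strip())
    if match is None:
        raise ValueError(f"Not a valid well name: {well_name!r}")
    letters, digits = match.groups()
    # Treat the letters as a base-26 number (A=1 ... Z=26, AA=27, ...)
    row = 0
    for letter in letters.upper():
        row = row * 26 + (ord(letter) - ord("A") + 1)
    return row - 1, int(digits) - 1


print(well_name_to_position("E6"))
```

Supporting multi-letter rows keeps the sketch usable for 1536-well plates, where rows run past `Z`.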

In [None]:
# Write adjusted pipeline to file. This file will be later uploaded as file attachment. 
with open(os.path.join(str(saving_path), 'Pipeline.json'), 'w') as f:
    pipeline_dict = {
        'modules': [
            {
                'module_num': i,
                'module_name': str(x),
                'settings': [setting.to_dict() for setting in x.settings()]
            }
            for i, x in enumerate(pipeline_copy.modules())
        ]
    }
    json.dump(pipeline_dict, f, indent=4)

In [None]:
#conn.close()

## 2. Upload Results To Omero

We will now upload the results to omero.  

We will first create a new plate inside the existing screen to host the resulting images. <br>
Then we will derive image information (parent ID and appendix) from the file name. <br>
The images will be uploaded to a (temporary) dataset. <br>
Finally, all images will be distributed on the new results plate in the corresponding wells. <br>
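
Deriving the parent ID and appendix from a file name can be sketched as follows. The naming scheme `<parent image ID>_<...>_<appendix>.<extension>` is an assumption based on how the notebook splits names at underscores; the helper `parse_result_name` is hypothetical and is not the function used by `cp_omero`.

```python
from pathlib import PurePosixPath


def parse_result_name(path):
    """Split a result file name into (parent_id, appendix).

    Assumes names like '12345_NucleiSeg.tiff': the first underscore-separated
    part is the ID of the original image, the last part describes the result.
    """
    stem = PurePosixPath(path).stem  # file name without folder and extension
    parent_id, sep, rest = stem.partition("_")
    if not sep or not parent_id.isdigit() or not rest:
        raise ValueError(f"Unexpected result file name: {path!r}")
    appendix = rest.rsplit("_", 1)[-1]
    return int(parent_id), appendix


print(parse_result_name("results/12345_NucleiSeg.tiff"))
```

Working on the bare file name rather than the full path matters here, because temporary folder names often contain underscores themselves.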

In [None]:
############# 1. Creation of plate that hosts the results #############
#conn=ezomero.connect(OMEROUSER, OMEROPASS, "", host=OMEROHOST, port=OMEROPORT, secure=True)
screen = conn.getObject("Screen", screen_id)


# Create new plate
plate = conn.getObject("Plate", plate_id)
if append_original_plate_name:
    plate_name = new_plate_name + plate.name
else:
    plate_name = new_plate_name
    
results_plate = PlateI()
results_plate.name = rstring(plate_name)
results_plate = conn.getUpdateService().saveAndReturnObject(results_plate)
results_plate_id = results_plate.getId()
results_dataset_id = ezomero.post_dataset(conn, "TempData", project_id, description="Temp dataset for image results")
results_dataset = conn.getObject("Dataset", results_dataset_id)

# Link the new plate to the existing screen
link = ScreenPlateLinkI()
link.setParent(ScreenI(screen_id, False))
link.setChild(PlateI(results_plate_id, False))
link_update_service = conn.getUpdateService()
link_update_service.saveObject(link)

In [None]:
############# 2. Prepare image information #############
# Find image results to upload
results = [str(f) for f in pathlib.Path(saving_path).glob("*")]

image_results = [x for x in results if x.endswith((".png", ".npy", ".tiff"))]
# Work on the bare file name: str.strip() removes characters rather than a suffix,
# and the folder path may itself contain underscores.
image_result_tags = sorted({pathlib.Path(x).stem.split("_")[-1] for x in image_results}, key=str.lower)
image_ids = sorted({pathlib.Path(x).name.split("_")[0] for x in image_results}, key=str.lower)
print("Resulting image types:", image_result_tags)
print(f"You analysed {len(image_ids)} images.")

# Result measurements
table_results = [x for x in results if x.endswith(".csv")]


In [None]:
############# 3. Main Upload #############
conn=ezomero.connect(OMEROUSER, OMEROPASS, "", host=OMEROHOST, port=OMEROPORT, secure=True)

# Upload all result images
omero_images = []

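# output_file_format may be None (formats taken from the pipeline); in that case accept all known image suffixes
suffixes = (f".{output_file_format}",) if output_file_format else (".png", ".npy", ".tiff")
for img_path in sorted(pathlib.Path(saving_path).glob("*")):
    if img_path.suffix not in suffixes:
        continue  # skip tables, the pipeline file and other non-image results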
    parent_id, image = cp_omero.load_result_image_from_disk(img_path)
    image_link = "Original Image: " + OMEROWEB + "?show=image-" + parent_id
    omero_image = cp_omero.upload_image_from_npseq(image, img_path, conn, results_dataset, image_link)
    omero_images.append(omero_image)

cp_omero.add_images_to_plate(omero_images, plate_id, results_plate_id, conn, results_dataset)

## 3. Tag Upload

To aid filtering inside omero, we will add tags to the result images based on their appendix. 
First, we query omero for all existing tags. 
Then, we will find the uploaded images and add the corresponding tag to each of them (e.g. "NucleiSeg" for the nuclei-segmentation images).
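
The decision logic described above (reuse a tag if it exists, create it otherwise) can be tried out without a server. The sketch below works on plain file names and a dictionary of existing tags; `plan_tagging` is a hypothetical helper for illustration, and the tag is derived from the name the same way as in the notebook (the last underscore-separated part before the first dot).

```python
def plan_tagging(image_names, existing_tags):
    """Decide which tag each image gets and which tags must be created.

    Returns (links, new_tags): links maps image name -> tag name,
    new_tags lists tag names missing from existing_tags, without duplicates.
    """
    links = {}
    new_tags = []
    for name in image_names:
        tag = name.split(".")[0].split("_")[-1]
        links[name] = tag
        if tag not in existing_tags and tag not in new_tags:
            new_tags.append(tag)
    return links, new_tags


# Example values only; tag names and IDs are made up for illustration
names = ["1_NucleiSeg.tiff", "1_CellSeg.tiff", "2_NucleiSeg.tiff", "2_CellSeg.tiff"]
links, new_tags = plan_tagging(names, {"NucleiSeg": 15286})
print(links)
print(new_tags)
```

Collecting the missing tags first means each new tag is created once, even when many images share it.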

In [None]:
#conn=ezomero.connect(OMEROUSER, OMEROPASS, "", host=OMEROHOST, port=OMEROPORT, secure=True)

# Create dictionary to save your existing tags
existing_tags = {}

# Define the query (OMERO's query service expects HQL): we search for all existing tags of the tag owner to avoid creating duplicate tags
sql = f"SELECT ann.id, ann.description, ann.textValue from TagAnnotation ann WHERE ann.details.owner.id = {tag_owner_id}"

for element in conn.getQueryService().projection(sql, None):    #element: list with 3 elements (ann.id, ann.description, ann.textValue)
                                                                #element[0]: object #0 (::omero::RLong){_val = 15286} type: <class 'omero.rtypes.RLongI'>
    tag_id, description, text = list(map(unwrap, element))
    existing_tags[text] = tag_id

print(f"The following tags exist: {existing_tags}.")

In [None]:
#conn=ezomero.connect(OMEROUSER, OMEROPASS, "", host=OMEROHOST, port=OMEROPORT, secure=True)
plate = conn.getObject("Plate", results_plate_id)

for well in plate.listChildren():
    n_samples = well.countWellSample()
    for index in range(0, n_samples):
        image = conn.getObject("Image", well.getImage(index).getId())
        # The tag name is the appendix of the file name, e.g. "NucleiSeg"
        tag_name = image.getName().split(".")[0].split("_")[-1]
        if tag_name in existing_tags:
            # Reuse the existing tag
            try:
                tag_id = existing_tags[tag_name]
                image.linkAnnotation(conn.getObject("Annotation", tag_id))
                print(f"Image {image.getName()} was tagged with {tag_name}")
            except omero.ValidationException:
                print(f"Image {image.getName()} was already tagged.")
        else:
            # Create the tag once and remember it for the following images
            tag_ann = omero.gateway.TagAnnotationWrapper(conn)
            tag_ann.setValue(tag_name)
            tag_ann.setDescription("No description")
            tag_ann.save()
            image.linkAnnotation(tag_ann)
            existing_tags[tag_name] = tag_ann.getId()
            print(f"New tag created: {tag_name}.")

In [None]:
# Add key:value pairs to your data:
# You can create a simple annotation dictionary to add Key:Value pairs to the plate

annotation_dict = {"TiM23": "WS44", "Software": "Cellprofiler 4.2.5", "Segmentation Algorithm": "Cellpose"} 

# Add KV pairs to the plate:
map_ann_id = ezomero.post_map_annotation(conn, "Plate", results_plate_id.getValue(), annotation_dict, "myns")

# Add KV pairs to every image in the plate:
results_plate = conn.getObject("Plate", results_plate_id)

for well in results_plate.listChildren():
    for image in well.listChildren():
        map_ann_id = ezomero.post_map_annotation(conn, "Image", image.id, annotation_dict, "myns")
        
print(f"You added these annotations {annotation_dict} as key:value pairs to the plate and its images.")

In [None]:
# Upload pipeline to results plate:
filepath_pipeline_txt = os.path.join(str(saving_path), "Pipeline.json")  # os.path.join avoids the invalid "\P" escape and works on every OS
file_ann_id = ezomero.post_file_annotation(conn, "Plate", results_plate_id.getValue(), filepath_pipeline_txt, ns= "myns", description="This pipeline was used for analysis.")

print("You successfully added your pipeline as file annotation to the plate.")

## 4. Upload of ROIs

## 5. Upload result as omero.table

Finally, we will upload the measurement results as an "omero.table" to Omero and link it to the analysed plate. 
These measurement results can be viewed in omero.parade-crossfilter.
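
An omero.table needs one typed column per measurement. The sketch below shows the kind of type dispatch involved, using plain Python lists so that it runs without pandas or an OMERO server. It returns the names of the `omero.grid` column classes as strings; the helper `column_kind` and the example measurement names are assumptions for illustration only.

```python
def column_kind(name, values):
    """Pick the name of a suitable omero.grid column class for a list of values.

    Returns None when no column type fits, so the caller can skip the column.
    """
    if name == "Image_ID":
        return "ImageColumn"  # links each row to an OMERO image
    if any(isinstance(v, bool) for v in values):
        return None  # booleans are not handled in this sketch
    if all(isinstance(v, int) for v in values):
        return "LongColumn"
    if all(isinstance(v, (int, float)) for v in values):
        return "DoubleColumn"
    if all(isinstance(v, str) for v in values):
        return "StringColumn"
    return None


# Made-up measurements for two images
table = {
    "Image_ID": [101, 102],
    "Count_Nuclei": [12, 9],
    "Mean_Intensity": [0.31, 0.27],
    "Mixed": [1, "a"],
}
kinds = {name: column_kind(name, values) for name, values in table.items()}
print(kinds)
```

Columns for which no type fits are reported as `None` and would be left out of the table.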

In [None]:
#filepath = os.path.join(saving_path,"features_summary.csv")
#df_features = pd.read_csv(filepath)
#df_features.rename(columns={'Unnamed: 0': "Image_ID"}, inplace=True)
#conn=ezomero.connect(OMEROUSER, OMEROPASS, "", host=OMEROHOST, port=OMEROPORT, secure=True)
#screen = conn.getObject("Screen", screen_id)

In [None]:
# Here we create the columns with the correct column types for an omero.table
cols = []

for col in df_features.columns:
    if col == "Image_ID":
        cols.append(ImageColumn(col, '', df_features[col]))
    elif df_features[col].dtype == 'int64':
        cols.append(LongColumn(col, '', df_features[col]))
    elif df_features[col].dtype == 'float64':
        cols.append(DoubleColumn(col, '', df_features[col]))

In [None]:
# Initialize a table
resources = conn.c.sf.sharedResources()
repository_id = resources.repositories().descriptions[0].getId().getValue()
table_name = plate_name +"_CellprofilerResults"
table = resources.newTable(repository_id, table_name)
table.initialize(cols)
table.addData(cols)

# Create file annotation
orig_file = table.getOriginalFile()
file_ann = FileAnnotationWrapper(conn)
file_ann.setNs(NSBULKANNOTATIONS)
file_ann._obj.file = OriginalFileI(orig_file.id.val, False)
file_ann.save()

# Link the table to the original screen
screen.linkAnnotation(file_ann)
table.close()
print("You added your analysis results as omero.table to your screen. You can now view them in omero.parade crossfilter.")

## 6. Clean Up

In [None]:
# Delete the temporary results dataset
conn.deleteObjects("Dataset", [results_dataset_id])

In [None]:
# Close the omero connection
conn.close()

In [None]:
# Delete your temporary directory
if output_dir == "temp_dir":
    shutil.rmtree(temp_dir)