# Part A: Preprocessing

### **Authors:** oscardong4@gmail.com, thomas.oneil@sydney.edu.au & heeva.baharlou@sydney.edu.com (Dec 2024) - script adapted from [here](https://github.com/BodenmillerGroup/ImcSegmentationPipeline/blob/main/scripts/imc_preprocessing.ipynb)

To fill in extra info


## Order of the analysis
0. Set up
1. MCD extraction
2. Cellpose prep
3. Cellpose model training
4. Cellpose batch segmentation
5. Feature Extraction

# 0 Set up

Anaconda installs and manages the packages that the pipeline depends on. Follow the steps below to install Anaconda and create a `conda` environment:

**Step 1:** Install [**Anaconda**](https://www.anaconda.com/download) <br>
**Step 2:** Once Anaconda is installed, navigate to the relevant command line interface:

<div align="center">

| Windows                                                                                            | macOS                                                                                                      |
|----------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------|
| 1. Search for **'Anaconda Prompt'** in the taskbar search <br> 2. Select **Anaconda Prompt**  <br> | 1. Use `cmd + space` to open Spotlight Search  <br> 2. Type **'Terminal'** and press `return` to open <br> |

</div>

  
**Step 3:** Change directory to your project folder (create the folder first if it does not exist):


In [None]:
cd Desktop/ImageAnalysis

**Step 4:** Clone the IMComplete repository.

In [None]:
git clone --recursive https://github.com/CVR-MucosalImmunology/IMComplete-Workflow.git
conda env create -f IMComplete-Workflow/environment.yml
conda activate IMComplete
git clone --recursive https://github.com/BodenmillerGroup/ImcSegmentationPipeline.git
python -m pip install -e ./ImcSegmentationPipeline

In [None]:
conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=12.4 -c pytorch -c nvidia

In [None]:
python -m pip install PyQt5 "cellpose[gui]" tensorflow keras

In [1]:
import torch

print(torch.cuda.is_available())


True
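
If the check prints `False`, PyTorch cannot see a CUDA GPU. The segmentation cell later in this notebook passes `gpu=core.use_gpu()`, so it will then run on the CPU, which is slower but still works. A small sketch that reports which device is available; `describe_device` is a hypothetical helper and not part of PyTorch or Cellpose:

```python
def describe_device():
    """Return a short description of the device PyTorch would use."""
    try:
        import torch
    except ImportError:
        # The environment was not set up, or a different environment is active.
        return "torch not installed"
    if torch.cuda.is_available():
        # Name of the first visible CUDA device.
        return f"cuda: {torch.cuda.get_device_name(0)}"
    return "cpu"


device_summary = describe_device()
print(device_summary)
```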


In [None]:
import os

# Raw string (r"...") so the backslashes in the Windows path are kept literally
projdir = r"D:\Dev-IMComplete\Example"

os.makedirs(os.path.join(projdir, "raw"), exist_ok=True)
os.makedirs(os.path.join(projdir, "analysis"), exist_ok=True)
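
Windows paths contain backslashes, which Python treats as escape characters in ordinary string literals. A minimal sketch of why a raw string is the safer way to write the project path; the folder names here are hypothetical and chosen to trigger the problem:

```python
from pathlib import PureWindowsPath

# In an ordinary literal, \n and \t are read as newline and tab.
plain = "D:\new_project\table"
# The r prefix keeps every backslash literal.
raw = r"D:\new_project\table"

print("\n" in plain)  # True: the path has been silently corrupted
print("\n" in raw)    # False

# PureWindowsPath parses Windows-style paths on any operating system.
parts = PureWindowsPath(raw).parts
print(parts)
```

Forward slashes (`"D:/Dev-IMComplete/Example"`) are another option, since Python accepts them in Windows paths.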
 

# MCD extraction

**MCD extraction**  
<span style="color:grey; opacity: 0.5">Cellpose prep</span>  
<span style="color:grey; opacity: 0.5">Cellpose model training</span>  
<span style="color:grey; opacity: 0.5">Cellpose batch segmentation</span>    
<span style="color:grey; opacity: 0.5">Feature Extraction</span>    

In [17]:
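# Set to True to also export one TIFF per channel for denoising
denoise = True

# Raw string (r"...") so the backslashes in the Windows path are kept literally
projdir = r"D:\Dev-IMComplete\Example"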


In [10]:
# Import libraries
from pathlib import Path
from tempfile import TemporaryDirectory
import pandas as pd
import tifffile as tiff
import numpy as np
import imcsegpipe
from imcsegpipe.utils import sort_channels_by_mass
import os

os.makedirs(os.path.join(projdir ,"raw"), exist_ok=True)
os.makedirs(os.path.join(projdir , "analysis"), exist_ok=True)

os.chdir(projdir)

# Set and create output directories
acquisitions_dir = os.path.join(projdir, "analysis/1_mcd_out")
denoise_dir = os.path.join(projdir, "analysis/2_denoise")
segment_fold_dir = os.path.join(projdir, "analysis/3_segmentation")
segment_dir = os.path.join(segment_fold_dir, "3b_forSeg")
output_dir = os.path.join(segment_fold_dir, "3a_fullstack")


os.makedirs(acquisitions_dir,exist_ok=True)
os.makedirs(denoise_dir,exist_ok=True)
os.makedirs(segment_fold_dir,exist_ok=True)
os.makedirs(segment_dir,exist_ok=True)
os.makedirs(output_dir,exist_ok=True)

acquisitions_dir = Path(acquisitions_dir)
denoise_dir = Path(denoise_dir)
segment_fold_dir = Path(segment_fold_dir)
segment_dir = Path(segment_dir)
output_dir = Path(output_dir)

# Raw directory with raw data files
raw = Path(os.path.join(projdir ,"raw"))

# Step 1: Extract .mcd files
temp_dirs = []
try:
    for raw_dir in [raw]:
        zip_files = list(raw_dir.rglob("*.zip"))  # rglob already searches recursively
        if len(zip_files) > 0:
            temp_dir = TemporaryDirectory()
            temp_dirs.append(temp_dir)
            for zip_file in sorted(zip_files):
                imcsegpipe.extract_zip_file(zip_file, temp_dir.name)
    for raw_dir in [raw] + [Path(temp_dir.name) for temp_dir in temp_dirs]:
        mcd_files = list(raw_dir.rglob("*.mcd"))
        mcd_files = [i for i in mcd_files if not i.stem.startswith('.')]
        if len(mcd_files) > 0:
            txt_files = list(raw_dir.rglob("*.txt"))
            txt_files = [i for i in txt_files if not i.stem.startswith('.')]
            matched_txt_files = imcsegpipe.match_txt_files(mcd_files, txt_files)
            for mcd_file in mcd_files:
                imcsegpipe.extract_mcd_file(
                    mcd_file,
                    acquisitions_dir / mcd_file.stem,
                    txt_files=matched_txt_files[mcd_file]
                )
finally:
    for temp_dir in temp_dirs:
        temp_dir.cleanup()
    del temp_dirs

# Read the panel.csv
panel = pd.read_csv("panel.csv")

# Step 2: Generate image stacks (_full and _segment)
for acquisition_dir in acquisitions_dir.glob("[!.]*"):
    if acquisition_dir.is_dir():
        imcsegpipe.create_analysis_stacks(
            acquisition_dir=acquisition_dir,
            analysis_dir=output_dir,
            analysis_channels=sort_channels_by_mass(
                panel.loc[panel["Full"] == 1, "Metal Tag"].tolist()
            ),
            suffix="_full",
            hpf=50.0
        )
        imcsegpipe.create_analysis_stacks(
            acquisition_dir=acquisition_dir,
            analysis_dir=segment_dir,
            analysis_channels=sort_channels_by_mass(
                panel.loc[panel["Segment"] == 1, "Metal Tag"].tolist()
            ),
            suffix="_segment",
            hpf=50.0
        )

# Step 3: Process TIFFs for denoising
if denoise:
    for sample_dir in acquisitions_dir.glob("[!.]*"):
        if sample_dir.is_dir():
            for roi_tiff_path in sample_dir.glob("*.tiff"):
                roi_name = roi_tiff_path.stem
                roi_subdir = denoise_dir / roi_name
                roi_subdir.mkdir(parents=True, exist_ok=True)

                # Load the stack using tifffile
                with tiff.TiffFile(roi_tiff_path) as tif:
                    stack = tif.asarray()  # Load the entire TIFF stack as a NumPy array

                # Filter and unstack based on panel.csv
                for idx, row in panel[panel["Full"] == 1].iterrows():
                    metal_tag = row["Metal Tag"]
                    target = row["Target"]
                    output_name = f"{metal_tag}-{target}_{metal_tag}.tiff"
                    output_path = roi_subdir / output_name

                    # Extract this channel's plane. idx is the row's position in panel.csv,
                    # so this assumes the panel rows follow the channel order of the acquisition TIFF.
                    slice_image = stack[idx, :, :]

                    # Save the slice as a TIFF
                    tiff.imwrite(output_path, slice_image.astype(np.uint16))  # Save as 16-bit TIFF

print("Done!")


Done!
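
The extraction cell reads four columns from `panel.csv`: `Metal Tag`, `Target`, `Full` and `Segment`. A minimal sketch of how those columns select channels, using only the standard library. The rows are hypothetical, and `sort_by_mass` merely stands in for `sort_channels_by_mass` (it orders tags by the number in the tag and is not the `imcsegpipe` implementation):

```python
import csv
import io
import re

# Hypothetical panel content; only the column names come from the notebook.
PANEL_CSV = """Metal Tag,Target,Full,Segment
Ir191,DNA,1,1
Pt195,MarkerB,1,0
Nd143,MarkerA,1,1
Xe131,Background,0,0
"""


def read_panel(text):
    """Parse panel text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def channels_for(rows, column):
    """Return the metal tags whose flag in `column` is 1."""
    return [row["Metal Tag"] for row in rows if row[column] == "1"]


def sort_by_mass(metal_tags):
    """Order tags by the number they contain, e.g. Nd143 before Ir191."""
    return sorted(metal_tags, key=lambda tag: int(re.search(r"\d+", tag).group()))


demo_panel = read_panel(PANEL_CSV)
full_channels = sort_by_mass(channels_for(demo_panel, "Full"))
segment_channels = sort_by_mass(channels_for(demo_panel, "Segment"))

# File names written per channel when denoise is True.
denoise_names = [
    f"{row['Metal Tag']}-{row['Target']}_{row['Metal Tag']}.tiff"
    for row in demo_panel
    if row["Full"] == "1"
]

print(full_channels)     # ['Nd143', 'Ir191', 'Pt195']
print(segment_channels)  # ['Nd143', 'Ir191']
print(denoise_names[0])  # Ir191-DNA_Ir191.tiff
```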


# Cellpose preparation

<strike>MCD extraction</strike>  
**Cellpose prep**  
<span style="color:grey; opacity: 0.5">Cellpose model training</span>  
<span style="color:grey; opacity: 0.5">Cellpose batch segmentation</span>    
<span style="color:grey; opacity: 0.5">Feature Extraction</span>    

Set your variables before running: `dna` is the name of the DNA channel as it appears in the `Target` column of `panel.csv`, and `square_size` is the side length (in pixels) of the crops used for Cellpose training.

In [12]:
dna = "DNA"
square_size = 200
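
Each image contributes one randomly placed square crop of side `square_size`; images smaller than the crop are kept whole. A minimal sketch of how the crop origin is chosen so that the square always fits; `random_crop_origin` is a hypothetical helper and the image sizes are made up:

```python
import random


def random_crop_origin(height, width, square_size, rng=random):
    """Return (y, x) of a square crop that fits in the image, or None if the image is too small."""
    if height < square_size or width < square_size:
        return None
    # randint is inclusive at both ends, so the largest origin still fits.
    y = rng.randint(0, height - square_size)
    x = rng.randint(0, width - square_size)
    return y, x


rng = random.Random(0)  # fixed seed only to make the demonstration repeatable
origins = [random_crop_origin(500, 300, 200, rng) for _ in range(1000)]
fits = all(y + 200 <= 500 and x + 200 <= 300 for y, x in origins)
too_small = random_crop_origin(150, 300, 200, rng)

print(fits)       # True
print(too_small)  # None
```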

In [None]:
import os
import random
import numpy as np
import pandas as pd
from skimage import io, exposure, img_as_uint

os.chdir(projdir)

# Define directories
dir_images =  os.path.join(projdir,"analysis","3_segmentation","3b_forSeg")
im_output =  os.path.join(projdir,"analysis","3_segmentation","3c_cellpose_full")
crop_output =  os.path.join(projdir,"analysis","3_segmentation","3d_cellpose_crop")
panel_file = os.path.join(projdir,"panel.csv")

# Create output directories
os.makedirs(im_output, exist_ok=True)
os.makedirs(crop_output, exist_ok=True)

# load image list
image_list = [f for f in os.listdir(dir_images) if f.endswith(('.tiff', '.tif'))]

# read panel
panel = pd.read_csv(panel_file)
segmentation_targets = panel.loc[panel['Segment'] == 1, 'Target'].tolist()
print("Segmentation Targets:", segmentation_targets)

# get indices of dna channel
dna_index = [i for i, target in enumerate(segmentation_targets) if target == dna]

# crop and compress each image
for image_file in image_list:
    image_path = os.path.join(dir_images, image_file)
    image = io.imread(image_path)
    im_title = os.path.splitext(image_file)[0]
    
    # normalise
    normalized_stack = []
    for i in range(image.shape[0]): 
        channel = image[i, :, :]
        normalized = exposure.rescale_intensity(channel, in_range='image', out_range=(0, 1))
        normalized_stack.append(img_as_uint(normalized))
    normalized_stack = np.stack(normalized_stack)
    
    # get dna channel
    if dna_index:
        # keep only the first instance of dna
        dna_channel = normalized_stack[dna_index[0]]
        
        # remove dna from segmentation stack
        for idx in sorted(dna_index, reverse=True):
            normalized_stack = np.delete(normalized_stack, idx, axis=0)
    else: #error message if dna not found
        raise ValueError("DNA channel not found in segmentation targets.")
    
    # average the remaining (non-DNA) channels into a single surface image
    surface_mask = np.mean(normalized_stack, axis=0).astype(np.uint16)
    
    # add an empty first (red) channel so the composite shows as green (surface) and blue (DNA) in cellpose
    empty_channel = np.zeros_like(dna_channel, dtype=np.uint16)
    # channel order: empty -> surface -> DNA
    composite_stack = np.stack([empty_channel, surface_mask, dna_channel])
    
    # save
    im_output_path = os.path.join(im_output, f"{im_title}_CpSeg.tiff")
    io.imsave(im_output_path, composite_stack)
    
    # get crop dimensions
    height, width = composite_stack.shape[1:3]
    if width < square_size or height < square_size:
        # if image is smaller than crop size, save image itself as the crop
        crop_output_path = os.path.join(crop_output, f"{im_title}_CpCrop.tiff")
        io.imsave(crop_output_path, composite_stack)
        print(f"Image {im_title} is smaller than the cropping size. Saved without cropping.")
        continue

    # create the crop and save
    workable_x = width - square_size
    workable_y = height - square_size
    rand_x = random.randint(0, workable_x)
    rand_y = random.randint(0, workable_y)
    cropped = composite_stack[:, rand_y:rand_y + square_size, rand_x:rand_x + square_size]
    crop_output_path = os.path.join(crop_output, f"{im_title}_CpCrop.tiff")
    io.imsave(crop_output_path, cropped)
print("Done!")
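
The per-channel processing in the cell above comes down to two pieces of logic: a min-max rescale of each channel to the 16-bit range, and separating the DNA channel from the channels that are averaged into the surface image. A pure-Python sketch of both on toy values. `rescale_to_uint16` and `split_dna` are hypothetical helpers, the target names are made up, and mapping a constant channel to zeros is a choice made for this sketch rather than the behaviour of scikit-image:

```python
def rescale_to_uint16(values):
    """Min-max rescale a flat list of intensities to integers in 0..65535."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant channel: no contrast to stretch (assumption of this sketch).
        return [0 for _ in values]
    return [round((v - lo) / (hi - lo) * 65535) for v in values]


def split_dna(targets, dna="DNA"):
    """Return (index of the first DNA channel, indices of all non-DNA channels)."""
    dna_idx = [i for i, target in enumerate(targets) if target == dna]
    if not dna_idx:
        raise ValueError("DNA channel not found in segmentation targets.")
    others = [i for i in range(len(targets)) if i not in dna_idx]
    return dna_idx[0], others


scaled = rescale_to_uint16([0.0, 2.5, 10.0])
first_dna, surface_idx = split_dna(["MarkerA", "DNA", "MarkerB", "DNA"])

print(scaled)                  # [0, 16384, 65535]
print(first_dna, surface_idx)  # 1 [0, 2]
```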

# Cellpose training and batch segmentation

In [None]:
python -m cellpose

In [20]:
# Set your required variables here
model_path = r"D:\Dev-IMComplete\IFMasksOnIMCModel_HumanColon_TN3_CD12_FT1"  # raw string keeps the backslashes literal
channels = [2, 3]  # [cytoplasm, nucleus] in Cellpose numbering (0 = grayscale, 1 = R, 2 = G, 3 = B): surface is green (2), DNA is blue (3)
cell_diameter = 12.4
flow_threshold = 3
cellprob_threshold = -6
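
The `channels` value follows from the plane order written during Cellpose prep (empty, surface, DNA). A minimal sketch of that mapping, assuming the Cellpose 3.x `channels` convention of `[cytoplasm, nucleus]` with 1 = red, 2 = green, 3 = blue; `cellpose_channels` is a hypothetical helper, not a Cellpose function:

```python
# Plane order of the composite images: empty, surface mean, DNA.
COMPOSITE_ORDER = ["empty", "surface", "dna"]


def cellpose_channels(plane_order, cytoplasm, nucleus):
    """Return [cytoplasm, nucleus] in 1-based colour numbering (1 = R, 2 = G, 3 = B)."""
    # The first plane is treated as red, the second as green, the third as blue.
    return [plane_order.index(cytoplasm) + 1, plane_order.index(nucleus) + 1]


demo_channels = cellpose_channels(COMPOSITE_ORDER, "surface", "dna")
print(demo_channels)  # [2, 3]
```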

In [23]:
# Import libraries
import os
import skimage.io
from cellpose import models, core
from cellpose.io import logger_setup
import shutil
from pathlib import Path

# Define Cellpose model
model = models.CellposeModel(gpu=core.use_gpu(), pretrained_model=model_path)

# Set and create directories
analysis = Path(projdir)
image_dir = analysis / "analysis/3_segmentation/3c_cellpose_full"
mask_dir = analysis / "analysis/3_segmentation/3e_cellpose_mask"
os.makedirs(mask_dir, exist_ok=True)

# Call logger_setup to have output of cellpose written
logger_setup()

# Get list of image files
files = [os.path.join(image_dir, f) for f in os.listdir(image_dir) if f.endswith(".tiff")]  # Adjust the file extension if necessary
imgs = [skimage.io.imread(f) for f in files]

# Run segmentation
masks, flows, styles = model.eval(
    imgs,
    diameter=cell_diameter,
    flow_threshold=flow_threshold,
    cellprob_threshold=cellprob_threshold,
    channels=channels,
)

# Save mask images
for idx, mask in enumerate(masks):
    original_path = Path(files[idx])
    new_path = mask_dir / (original_path.stem + "_mask.tif")
    skimage.io.imsave(new_path, mask)


cellp_out = analysis / "analysis/4_cellprofiler_output"
cellp_out.mkdir(exist_ok=True)

print("Done!")


2024-12-18 16:58:42,067 [INFO] ** TORCH CUDA version installed and working. **
2024-12-18 16:58:42,070 [INFO] ** TORCH CUDA version installed and working. **
2024-12-18 16:58:42,071 [INFO] >>>> using GPU (CUDA)
2024-12-18 16:58:42,249 [INFO] >>>> loading model D:\Dev-IMComplete\IFMasksOnIMCModel_HumanColon_TN3_CD12_FT1
2024-12-18 16:58:42,396 [INFO] >>>> model diam_mean =  30.000 (ROIs rescaled to this size during training)
2024-12-18 16:58:42,396 [INFO] >>>> model diam_labels =  9.655 (mean diameter of training ROIs)
creating new log file
2024-12-18 16:58:42,398 [INFO] WRITING LOG OUTPUT TO C:\Users\daniel.buffa\.cellpose\run.log
2024-12-18 16:58:42,398 [INFO] 
cellpose version: 	3.1.0 
platform:       	win32 
python version: 	3.10.16 
torch version:  	2.5.0
2024-12-18 16:58:42,411 [INFO] 0%|          | 0/7 [00:00<?, ?it/s]
2024-12-18 16:58:58,208 [INFO] 100%|##########| 7/7 [00:15<00:00,  2.26s/it]
Done!
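
Each mask is saved under the name of its source image with `_mask.tif` appended to the stem, so images and masks can be paired by file name later. A minimal sketch of that naming rule; `mask_path_for` and the file names are hypothetical:

```python
from pathlib import Path


def mask_path_for(image_path, mask_dir):
    """Return the mask path for an image: <mask_dir>/<image stem>_mask.tif."""
    return Path(mask_dir) / (Path(image_path).stem + "_mask.tif")


# Hypothetical folder listing; sorting keeps the processing order repeatable.
names = ["ROI_002_CpSeg.tiff", "notes.txt", "ROI_001_CpSeg.tiff"]
images = sorted(n for n in names if n.endswith(".tiff"))
mask_names = [mask_path_for(n, "3e_cellpose_mask").name for n in images]

print(mask_names)  # ['ROI_001_CpSeg_mask.tif', 'ROI_002_CpSeg_mask.tif']
```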


