## Setting up IMC-Denoise

1. Follow the instructions under the **'Installation'** header here: https://github.com/PENGLU-WashU/IMC_Denoise. In brief, you need to set up a new conda environment, install some packages with specific version numbers, and then clone and install the IMCDenoise package from GitHub.

2. Run the following command in the Anaconda Prompt, with the new environment activated, to install a couple of extra packages we will need: **conda install tqdm pandas**
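
Before moving on, it can help to confirm that the new environment contains everything the notebook imports. The sketch below is not part of IMC-Denoise: `missing_packages` and `visible_gpus` are hypothetical helpers, and the package list is simply taken from the import cell further down. The GPU check assumes a TensorFlow 2.x install, where `tf.config.list_physical_devices` is available.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level packages in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def visible_gpus():
    """Return the GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Assumption: TensorFlow 2.x. An empty list means training will run on the CPU.
    return tf.config.list_physical_devices("GPU")


# Top-level packages imported by this notebook.
required = ["tifffile", "pandas", "tqdm", "numpy", "matplotlib", "IMC_Denoise"]
print("Missing packages:", missing_packages(required) or "none")
print("GPUs visible to TensorFlow:", visible_gpus())
```

If any package is listed as missing, revisit the installation steps above before running the rest of the notebook.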

## Imports and functions
Run this to import all the relevant packages and functions. If there is an error here, the environment has not been set up properly. It is also possible that your graphics card / GPU is not compatible with the script. It *should* all still run, but without GPU acceleration it could be *incredibly* slow!

In [1]:
import os
from os import listdir
from os.path import isfile, join, abspath, exists
from glob import glob
import tifffile as tp
import pandas as pd
from pathlib import Path
from copy import copy
from tqdm import tqdm

from mpl_toolkits.axes_grid1 import make_axes_locatable
import numpy as np
import matplotlib.pyplot as plt
from IMC_Denoise.IMC_Denoise_main.DIMR import DIMR
from IMC_Denoise.IMC_Denoise_main.DeepSNF import DeepSNF
from IMC_Denoise.DeepSNF_utils.DeepSNF_DataGenerator import DeepSNF_DataGenerator

### These are adapted from functions from IMC_Denoise

def load_single_img(filename):
    
    """
    Load a single image from a file.

    Parameters
    ----------
    filename : str
        Path to the image file; must end with .tiff or .tif.

    Returns
    -------
    Img_in : numpy.ndarray
        The loaded 2D image, converted to float32.
    """
    if filename.endswith('.tiff') or filename.endswith('.tif'):
        Img_in = tp.imread(filename).astype('float32')
    else:
        raise ValueError('Raw file should end with tiff or tif!')
    if Img_in.ndim != 2:
        raise ValueError('Single image should be 2d!')
    return Img_in

def load_imgs_from_directory(load_directory,channel_name):
    Img_collect = []
    img_folders = glob(join(load_directory, "*", ""))
    Img_file_list=[]
    # ROI folders that actually contain the channel, kept aligned with Img_collect
    matched_folders = []

    print('Image data loaded from ...\n')
    for sub_img_folder in img_folders:
        Img_list = [f for f in listdir(sub_img_folder) if isfile(join(sub_img_folder, f)) and f.endswith((".tiff", ".tif"))]
        for Img_file in Img_list:
            # Case-insensitive substring match; only the first matching file in each ROI folder is used
            if channel_name.lower() in Img_file.lower():
                Img_read = load_single_img(join(sub_img_folder, Img_file))
                print(join(sub_img_folder, Img_file))
                Img_file_list.append(Img_file)
                Img_collect.append(Img_read)
                matched_folders.append(sub_img_folder)
                break

    print('\n' + 'Image data loaded completed!')
    if not Img_collect:
        # Raise rather than return None, so callers that unpack three values get a clear error
        raise ValueError('No such channels! Please check the channel name again!')

    return Img_collect, Img_file_list, matched_folders

Using TensorFlow backend.
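
One detail of `load_imgs_from_directory` is worth keeping in mind: a channel is matched by a case-insensitive substring test on the file name, and only the first match in each ROI folder is used. The sketch below isolates that rule. `first_match` is a hypothetical helper written for illustration, and the file names (including their numeric prefixes and metal tags, apart from `01_00_Y89_SMAa.tiff`) are made up.

```python
def first_match(file_names, channel_name):
    """Return the first file whose name contains channel_name (case-insensitive), else None."""
    for name in file_names:
        if channel_name.lower() in name.lower():
            return name
    return None


# Hypothetical contents of one ROI folder, in the order the file system happens to list them.
roi_files = ["01_00_Y89_SMAa.tiff", "54_00_Yb173_CD163.tiff", "41_00_Nd146_CD16.tiff"]

print(first_match(roi_files, "sma"))   # 01_00_Y89_SMAa.tiff
print(first_match(roi_files, "CD16"))  # 54_00_Yb173_CD163.tiff, because 'CD16' is a substring of 'CD163'
print(first_match(roi_files, "CD3"))   # None
```

When one channel label is a prefix of another (for example CD16 and CD163), it is worth checking the printed file paths to confirm that the intended files were loaded.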


## 1. Unpack tiff stacks

<font color='red'>input_folder </font> = The folder containing the stacked tiff files. You should be able to copy and paste the whole .ome.tiff folder that the Bodenmiller pipeline creates after it has extracted the tiff files from the MCD files. This folder also contains the .csv panel files, so copy those too! Any panorama files will also be in the same folder, but they won't be used here.

<font color='red'>unstacked_output_folder </font> = Where the 'unstacked' tiff files will be stored. They will be unpacked into a single folder per ROI.

<font color='red'>use_panel_files </font> = If you are using the Bodenmiller pipeline, leave this as True. It will use the .csv panel file for each ROI to label the unpacked channels with their metal tags and antigen targets, and will create a file called <font color='blue'>ROI_data.csv</font> that stores all of this information.

In [2]:
# The folder containing the stacked tiff files (and their .csv panel files)
input_folder = 'tiff_stacks' 
unstacked_output_folder = 'tiffs'
use_panel_files=True

#################################################
#################################################

# Make output directories if they don't exist
input_folder = Path(input_folder)
output = Path(unstacked_output_folder)
output.mkdir(exist_ok=True)

# Set up a blank dataframe ready to add to
if use_panel_files:
    #roi_data = pd.DataFrame(columns=['panel_filename','channel_name','channel_label','filename','folder','fullstack_path'])
    roi_data = pd.DataFrame(columns=['channel_name','channel_label'])
# Get a list of all the .tiff files in the input directory
tiff_files = list(input_folder.rglob('*.tiff'))

for roi_count,i in enumerate(tiff_files):
    
    image = tp.imread(i)    
    panel_filename = os.path.splitext(os.path.splitext(os.path.basename(i))[0])[0] + '.csv'
    folder_name = os.path.splitext(os.path.basename(i))[0]

    tiff_folder_name = os.path.splitext(os.path.basename(i))[0]    
    output_dir = Path(unstacked_output_folder,tiff_folder_name)
    output_dir.mkdir(exist_ok=True)        
    
    if use_panel_files:

        panel_df = pd.read_csv(join(input_folder, panel_filename))
        panel_df['fullstack_path'] = copy(str(i))       
        panel_df['panel_filename']=panel_filename
        panel_df['folder']=folder_name
        roi_data = pd.concat([roi_data, panel_df], sort=True)
    
    for channel_count in range(image.shape[0]):

        # Zero-padded channel and ROI indices keep the files in channel order when sorted by name
        prefix = str(channel_count).zfill(2)+"_"+str(roi_count).zfill(2)

        if use_panel_files:
            # Label the file with the metal tag and antigen target from the panel file
            file_name = prefix+"_"+str(panel_df.loc[channel_count,'channel_name'])+"_"+str(panel_df.loc[channel_count,'channel_label'])+".tiff"
        else:
            file_name = prefix+".tiff"

        tp.imwrite(join(output_dir, file_name), image[channel_count])
        
if use_panel_files:
    roi_data.to_csv('ROI_data.csv')
    
    all_data_channels = roi_data.dropna().channel_label.unique().tolist()
    n = len(all_data_channels)
    print(f'The following {n} channels were detected, and will be used if process_all_channels=True in the next step... \n')
    print(roi_data.dropna().channel_label.unique().tolist())

The following 56 channels were detected, and will be used if process_all_channels=True in the next step... 

['SMAa', 'GFAP', 'iba1', 'PanCytokeratin', 'S100B', 'EGFR', 'CD68', 'ki67', 'Neurocan', 'Fibronectin', 'SOX10', 'Brevican', 'CD44', 'Versican', 'PanLaminin', 'CD31', 'HIF1a', 'NeuN', 'CD109', 'OLIG2', 'CollagenIV', 'NG2', 'V0V1', 'GLUT1', 'Syndecan1', 'TCIRG1', 'pNFkB', 'TNC', 'CAIX', 'cMyc', 'pERK', 'TMEM119', 'SOX2', 'HepSul', 'CS56', 'MCT4', 'DNA1', 'DNA3', 'MHCI', 'CD14', 'CD16', 'CD11b', 'CDK4', 'YKL40', 'CD11c', 'CD24', 'VISTA', 'CD206', 'PTEN', 'Nestin', 'CD74', 'Met', 'P2YR12', 'CD163', 'HLADR', 'PDGFRa']
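
The naming scheme used by the cell above can be sketched on its own, which may help when checking that panel files and output folders line up. The two functions below are hypothetical helpers that mirror the string handling in the cell. The example stack name is inferred from the ROI folder names printed later in this notebook, and it is an assumption that the original stack was called exactly that.

```python
import os


def panel_and_folder_names(stack_path):
    """Derive the panel .csv name and the ROI folder name from a stack path."""
    base = os.path.basename(stack_path)
    folder_name = os.path.splitext(base)[0]                     # strips '.tiff'
    panel_filename = os.path.splitext(folder_name)[0] + ".csv"  # also strips '.ome'
    return panel_filename, folder_name


def channel_filename(channel_count, roi_count, channel_name, channel_label):
    """Build the unstacked file name: channel index, ROI index, metal tag, target."""
    prefix = str(channel_count).zfill(2) + "_" + str(roi_count).zfill(2)
    return f"{prefix}_{channel_name}_{channel_label}.tiff"


panel, folder = panel_and_folder_names(
    "tiff_stacks/15_07_22_biomax_cell_ID_matrix_GBM_s0_a1_ac.ome.tiff"
)
print(panel)   # 15_07_22_biomax_cell_ID_matrix_GBM_s0_a1_ac.csv
print(folder)  # 15_07_22_biomax_cell_ID_matrix_GBM_s0_a1_ac.ome
print(channel_filename(1, 0, "Y89", "SMAa"))  # 01_00_Y89_SMAa.tiff
```

So each `.ome.tiff` stack needs a panel file with the same base name and a `.csv` extension in the input folder, containing at least the `channel_name` and `channel_label` columns.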


## 2. Run DeepSNF training and image denoising


<font color='red'>raw_directory </font> = This should be the same as **'unstacked_output_folder'** above - where the unstacked images were stored, with each ROI being a folder containing all its images.

<font color='red'>processed_output_dir </font> = The folder where the processed images will be stored. They will be in the same format as above - each ROI its own folder containing all its images.

<font color='red'>process_all_channels </font> = If left as **True**, all the channels identified above will be processed. This only works if you have panel files from the Bodenmiller pipeline; if not, you will need to specify exactly which channels to process.

<font color='red'>specific_channels </font> = If process_all_channels is False, you specify exactly which channels to process here, e.g. if you only want to process a couple.

#### Deep SNF settings

These all have accompanying explanations, and can mostly be left alone. Ones you may want to change include...

<font color='red'>train_batch_size </font> = You may need to reduce this (e.g. to 32) if your GPU runs out of memory, or increase it if you have a very good GPU setup.

In [None]:
raw_directory = "tiffs" # change this directory to your Raw_image_directory.
processed_output_dir = 'processed'
process_all_channels = True
specific_channels = []

#################################################
######  Deep SNF settings
#################################################
train_epoches = 50 # number of training epochs; about 200 is recommended for a good training result. The default is 200.
train_initial_lr = 1e-3 # initial learning rate. The default is 1e-3.
train_batch_size = 64 # training batch size. For a GPU with smaller memory, it can be tuned smaller. The default is 256.
pixel_mask_percent = 0.2 # percentage of the masked pixels in each patch. The default is 0.2.
val_set_percent = 0.15 # percentage of validation set. The default is 0.15.
loss_function = "I_divergence" # loss function used. The default is "I_divergence".
loss_name = None # training and validation losses saved here, either .mat or .npz format. If not defined, the losses will not be saved.
weights_save_directory = None # location where 'weights_name' and 'loss_name' saved.
# If the value is None, the files will be saved in a sub-directory named "trained_weights" of  the current file folder.
is_load_weights = False # If False, the model is trained here and used directly; previously saved weights are not loaded.
lambda_HF = 3e-6 # HF regularization parameter


# Create folders
processed_output_dir = Path(processed_output_dir)
processed_output_dir.mkdir(exist_ok=True)


if process_all_channels:
    channels = all_data_channels
else:
    channels = specific_channels

    
for channel_name in tqdm(channels):

    if 'generated_patches' in globals():
        del generated_patches    
    
    n_neighbours = 4 # Larger n enables removing more consecutive hot pixels. 
    n_iter = 3 # Iteration number for DIMR
    window_size = 3 # Slide window size. For IMC images, window_size = 3 is fine.

    DataGenerator = DeepSNF_DataGenerator(channel_name = channel_name, n_neighbours = n_neighbours, n_iter = n_iter, window_size = window_size)
    generated_patches = DataGenerator.generate_patches_from_directory(load_directory = raw_directory)
    print('The shape of the generated training set is ' + str(generated_patches.shape) + '.')

    deepsnf = DeepSNF(train_epoches = train_epoches, 
                      train_learning_rate = train_initial_lr,
                      train_batch_size = train_batch_size,
                      mask_perc_pix = pixel_mask_percent,
                      val_perc = val_set_percent,
                      loss_func = loss_function,
                      weights_name = "weights_"+str(channel_name)+".hdf5",
                      loss_name = loss_name,
                      weights_dir = weights_save_directory, 
                      is_load_weights = is_load_weights,
                      lambda_HF = lambda_HF)

    # Train the DeepSNF classifier 
    train_loss, val_loss = deepsnf.train(generated_patches)

    # Load all images
    Img_collect, Img_file_list, img_folders = load_imgs_from_directory(raw_directory, channel_name)

    # Save resulting images
    for i, img_file_name, folder in zip(Img_collect, Img_file_list, img_folders):
        
        #Perform both the hot pixel and shot noise removal
        Img_DIMR_DeepSNF = deepsnf.perform_IMC_Denoise(i, n_neighbours = n_neighbours, n_iter = n_iter, window_size = window_size)

        #Gets the ROI folder name (the last component of the folder path)
        roi_folder_name = Path(folder).name

        #Makes sure the output folder name exists for this ROI
        Path(join(processed_output_dir, roi_folder_name)).mkdir(exist_ok=True) 
        
        #The output file is named the same as the input file
        save_path = join(processed_output_dir, roi_folder_name, img_file_name)      

        #Save the denoised file
        tp.imwrite(save_path,Img_DIMR_DeepSNF.astype('float32'))

  0%|                                                                                           | 0/56 [00:00<?, ?it/s]

Image data loaded from ...

tiffs\15_07_22_biomax_cell_ID_matrix_GBM_s0_a1_ac.ome\01_00_Y89_SMAa.tiff
tiffs\15_07_22_biomax_cell_ID_matrix_GBM_s0_a2_ac.ome\01_01_Y89_SMAa.tiff
tiffs\15_07_22_biomax_cell_ID_matrix_GBM_s0_a3_ac.ome\01_02_Y89_SMAa.tiff
tiffs\15_07_22_biomax_cell_ID_matrix_GBM_s0_a4_ac.ome\01_03_Y89_SMAa.tiff
tiffs\15_07_22_biomax_cell_ID_matrix_GBM_s0_a5_ac.ome\01_04_Y89_SMAa.tiff
tiffs\15_07_22_biomax_cell_ID_matrix_GBM_s0_a6_ac.ome\01_05_Y89_SMAa.tiff

Image data loaded completed!
The generated patches augmented.
The generated patches shuffled.
The shape of the generated training set is (13520, 64, 64).
The range value to the corresponding model is 354.9147918701172.
Input Channel Shape => (13520, 64, 64, 1)
Number of Training Examples: 11492
Number of Validation Examples: 2028
Each training patch with shape of (64, 64) will mask 8 pixels.
Training model...
Epoch 1/50
  4/180 [..............................] - ETA: 39:00 - loss: 0.6846
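
The numbers in the training log above can be cross-checked with a little arithmetic, which also shows how the settings are interpreted. This is a sketch based only on the printed values: the rounding choices below are assumptions picked to reproduce the log, not taken from the IMC-Denoise source.

```python
import math

# Values from the settings cell and the log above.
n_patches = 13520          # "The shape of the generated training set is (13520, 64, 64)"
patch_size = 64
val_set_percent = 0.15     # behaves as a fraction of the patches
pixel_mask_percent = 0.2   # behaves as a percentage, i.e. 0.2 % of the pixels
train_batch_size = 64

n_val = round(n_patches * val_set_percent)
n_train = n_patches - n_val
steps_per_epoch = math.ceil(n_train / train_batch_size)
masked_pixels = int(patch_size * patch_size * pixel_mask_percent / 100)

print(n_val, n_train)      # 2028 11492
print(steps_per_epoch)     # 180
print(masked_pixels)       # 8

# Halving the batch size roughly doubles the number of steps per epoch.
print(math.ceil(n_train / 32))  # 360
```

Note the difference in units: `val_set_percent = 0.15` holds back 15 % of the patches for validation, whereas `pixel_mask_percent = 0.2` masks 0.2 % of the pixels in each 64 x 64 patch.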

## *Optional* - Assessing performance of denoise
At this stage, you can compare each channel before and after denoising.

<font color='red'>channels </font> List of channels you want to compare before and after denoising

<font color='red'>colourmap </font> Colourmap for images

<font color='red'>dpi </font> Resolution of generated images

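<font color='red'>save </font> Whether to save the images. Each figure is saved as *channel_name*.png

<font color='red'>do_all_channels </font> If True, all channels with data (as identified above) are compared, instead of only those listed in channels

<font color='red'>hide_images </font> If True, the figures are not displayed in the notebook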

In [None]:
channels=['Sma']
colourmap ='jet'
dpi=300
save=True
do_all_channels=False
hide_images=False
#################################################
#################################################

if do_all_channels:
    channel_list=all_data_channels
else:
    channel_list=channels


for channel_name in channel_list:

    raw_Img_collect, raw_Img_file_list, raw_img_folders = load_imgs_from_directory(raw_directory, channel_name)
    pro_Img_collect, pro_Img_file_list, pro_img_folders = load_imgs_from_directory(processed_output_dir, channel_name)

    fig, axs = plt.subplots(len(raw_Img_collect), 2, figsize=(10, 5*len(raw_Img_collect)), dpi=dpi)

    count = 0
    for r_img,p_img in zip(raw_Img_collect,pro_Img_collect):
        im1= axs.flat[count].imshow(r_img, vmin = 0, vmax = 0.5*np.max(r_img), cmap = colourmap)
        divider = make_axes_locatable(axs.flat[count])
        cax = divider.append_axes('right', size='5%', pad=0.05)
        fig.colorbar(im1, cax=cax, orientation='vertical')    
        count=count+1

        im2 = axs.flat[count].imshow(p_img, vmin = 0, vmax = 0.5*np.max(p_img), cmap = colourmap)
        divider = make_axes_locatable(axs.flat[count])
        cax = divider.append_axes('right', size='5%', pad=0.05)
        fig.colorbar(im2, cax=cax, orientation='vertical')    
        count=count+1 

    # Only write the figure to disk if requested
    if save:
        fig.savefig(channel_name+'.png')
    
    if hide_images:
        fig.set_visible(not fig.get_visible())
        plt.draw()
    

## 3. Reassemble TIFF stacks

At this point, we want to reassemble the individual images back into stacks so we can put them back into the Bodenmiller pipeline, replacing the ones originally generated. You may want to keep backups of the unprocessed tiffs!

<font color='red'> **By default, this pipeline will use all the processed images! If you only want to use some of the images, then manually assemble the individual TIFFs in the folders ready to be restacked**</font>


<font color='red'>restack_input_folder </font> = This should be the same as **'processed_output_dir'** above - where the processed images were stored, with each ROI being a folder containing all its images.

<font color='red'>restacked_output_folder </font> = Where the processed and now restacked images should be placed.

In [None]:
#Specify input and output folder
restack_input_folder = 'processed'
restacked_output_folder = 'processed_stacks'

#################################################
#################################################


# Make output directories if they don't exist
restack_input_folder = Path(restack_input_folder)
output = Path(restacked_output_folder)
output.mkdir(exist_ok=True)

# Get a list of paths of ROI folder
Img_folders = glob(join(restack_input_folder, "*", ""))

print('Saving stacks...')
for i in tqdm(Img_folders):
    
    # Sort by name so the channels are stacked in their original (zero-padded) order
    tiff_files = sorted(Path(i).rglob('*.tiff'))
    
    image_stack=[]
    
    for file in tiff_files:
        im = tp.imread(file).astype('float32')
        image_stack.append(im)

    image_stack = np.asarray(image_stack)
        
    save_path=join(restacked_output_folder, Path(i).name+".tiff")

    tp.imwrite(save_path,image_stack.astype('float32'))
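
The channel order inside each restacked file follows the order in which the individual TIFFs are read, and `Path.rglob` makes no promise about that order. The sketch below shows why the zero-padded prefixes written during unstacking make sorting by name safe, and offers a numeric sort key as a fallback. `channel_index` is a hypothetical helper, and the file names (apart from `01_00_Y89_SMAa.tiff`) are made up for illustration.

```python
def channel_index(file_name):
    """Return the leading channel index of an unstacked file name as an integer."""
    return int(file_name.split("_")[0])


# Zero-padded prefixes: plain alphabetical sorting already gives channel order.
padded = ["10_00_Nd142_GFAP.tiff", "02_00_In113_iba1.tiff", "01_00_Y89_SMAa.tiff"]
print(sorted(padded))
# ['01_00_Y89_SMAa.tiff', '02_00_In113_iba1.tiff', '10_00_Nd142_GFAP.tiff']

# Without padding, alphabetical sorting puts channel 10 before channels 1 and 2.
unpadded = ["10_00.tiff", "2_00.tiff", "1_00.tiff"]
print([channel_index(n) for n in sorted(unpadded)])                     # [10, 1, 2]

# Sorting on the numeric index works whether or not the names are padded.
print([channel_index(n) for n in sorted(unpadded, key=channel_index)])  # [1, 2, 10]
```

If files are added or removed by hand before restacking, keeping the numeric prefixes intact preserves the channel order the Bodenmiller pipeline expects.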
