# <center> Introduction to Clinical Bioinformatics </center> 

# <center> Exercise week 7: Solution </center> 

# Image analysis - Segmentation of cancer images - Part 2

Last week we worked on image preprocessing. Today we will segment the images, refine the segmentation obtained from thresholding with post-processing, and analyze the results to extract useful statistics.

# A - Import packages

Before starting, we need to import the packages required to run the code.

In [None]:
#IMPORTS

#data analysis 
import numpy as np
import pandas as pd



#data plotting packages
import matplotlib.pyplot as plt
import seaborn as sns


from random import sample

import warnings
warnings.simplefilter(action='ignore', category=Warning)



# B - Load and visualize preprocessed grayscale images 

## B1 - Load images

In [None]:
from skimage import io

path_grayscale_images = './grayscale_images_preprocessed'

number_of_grayscale_images = 2

grayscale_images = [] 

for i in range(number_of_grayscale_images):
    
    im_path = f'{path_grayscale_images}/grayscale_image_preprocessed_{i}.png'
    
    grayscale_image = io.imread(im_path)
    grayscale_images.append(grayscale_image)

In [None]:
grayscale_images[0]

**NOTE:** In this case the grayscale images were loaded with normalized pixel intensities, meaning that the pixel values are real numbers in the interval \[0,1\]. You can recover the original (8-bit) intensities by multiplying the images by 255. 
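Whether an image holds normalized values or 8-bit integers can be checked from its `dtype` and value range before choosing a threshold. The sketch below uses a small made-up array (`normalized` is a hypothetical stand-in for a loaded image) to show the conversion in both directions.

```python
import numpy as np

# Hypothetical normalized image with values in [0, 1]
normalized = np.array([[0.0, 0.5],
                       [0.25, 1.0]])

# Inspect the data type and value range before doing anything else
print(normalized.dtype, normalized.min(), normalized.max())

# Recover 8-bit intensities: scale to [0, 255] and round to integers
as_8bit = np.round(normalized * 255).astype(np.uint8)

# Go back to normalized intensities
back_to_normalized = as_8bit / 255.0
```

The round trip is only exact up to the 8-bit quantization step of 1/255.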

## B2 - Histogram of pixel intensity values

In [None]:
def plot_histogram(image, title):
    """Plot histogram of an image."""
    fig, ax = plt.subplots(1, 2, figsize=(12, 4))
    
    ax[0].imshow(image, cmap='gray')
    ax[0].set_title(f"Preprocessed Image {title}")
    ax[0].axis('off')


    ax[1].hist(image.ravel(), bins=50, range=(0, 255),edgecolor = 'gray',color='white')
    ax[1].set_xlabel('Pixel Intensity')
    ax[1].set_ylabel('Number of Pixels')
    ax[1].set_title(f'Histogram for Image {title}')

    plt.tight_layout()
    plt.show()




In [None]:
# Plot each image
idx = 0
for image in grayscale_images:

    # Plot histogram
    plot_histogram(image, f"{idx}")
    idx+=1
    




# C - Image Segmentation

In this part, we will apply a basic segmentation method called [Thresholding](https://en.wikipedia.org/wiki/Thresholding_%28image_processing%29) (see last week's exercise) and a more advanced method called edge detection. 

## C1 - Thresholding
Thresholding converts a grayscale image, whose pixels take a continuous range of intensities, into a binary (two-color) image. Pixel value ranges are delimited by *thresholds*, whence the name of the method. Since the cells are darker than the background, in our case we want to distinguish between darker and lighter areas of the image. This means that a single threshold is enough.

In our case, setting a threshold X means that every pixel with an intensity less than or equal to X becomes black (0) and every pixel with a larger intensity becomes white (1).
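As a minimal illustration of this rule, the sketch below thresholds a tiny made-up array. The array values and the threshold of 128 are assumptions chosen for the example, not values taken from the exercise images.

```python
import numpy as np

# Hypothetical 8-bit intensities: low values are dark "cells", high values are background
toy_image = np.array([[10, 200, 30],
                      [220, 40, 250]], dtype=np.uint8)

toy_threshold = 128  # assumed value, for illustration only

# Pixels above the threshold become 1 (white), all others 0 (black)
toy_binary = np.where(toy_image > toy_threshold, 1, 0).astype(np.uint8)
print(toy_binary)
```

The dark pixels end up as 0, which is why the cells are expected to appear in black.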

Below we plot the histogram of each grayscale image with its chosen threshold, together with the thresholded version of the image. Since the cells are darker than the background, we expect to see them in black.



In [None]:
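def threshold_image(image, thr, title):
    """Threshold an image, plot its histogram with the threshold marked, and return the binary image."""
    threshold = thr

    # Apply threshold: pixels above the threshold become 1 (white), the rest 0 (black)
    binary_image = np.where(image > threshold, 1, 0).astype(np.uint8)
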
    fig, ax = plt.subplots(1, 2, figsize=(12, 4))
    
    ax[0].hist(image.ravel(), bins=50, range=(0, 255),edgecolor = 'gray',color='white')
    ax[0].set_title(f'Histogram of Image {title}')
    
    # Marking the threshold on the histogram
    ax[0].axvline(threshold, color='red', linestyle='--')     

    ax[1].imshow(binary_image, cmap='gray')
    ax[1].set_title(f"Thresholded Image {title} with thr = {threshold}")
    ax[1].axis('off')

    plt.tight_layout()
    plt.show()    
    return binary_image

In [None]:

# From the previous histograms, try to find the most satisfactory threshold for each image. 

# Fill in the list `Thresholds` below (don't spend more than 2 minutes).

Thresholds = [XX,XX] ####


idx = 0
binary_images = []
for image in grayscale_images:
    # Check min and max values of the image before thresholding
    print(f"Image {idx}: min value = {image.min()}, max value = {image.max()}")

    binary_image = threshold_image(image,Thresholds[idx],f'{idx}')
    binary_images.append(binary_image)
    idx+=1

**(1) Is there an optimal threshold? Discuss your results.**
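One way to approach this question is to compare a hand-picked threshold with an automatically computed one. The sketch below implements Otsu's method, which is not part of the original exercise, in plain NumPy: it picks the threshold that maximizes the variance between the dark and bright classes. The function name `otsu_threshold` and the synthetic bimodal data are assumptions made for illustration; scikit-image offers a ready-made `skimage.filters.threshold_otsu` that could be used instead.

```python
import numpy as np

def otsu_threshold(image, nbins=256):
    """Return the threshold that maximizes the between-class variance (Otsu's method)."""
    hist, bin_edges = np.histogram(image.ravel(), bins=nbins)
    hist = hist.astype(float)
    bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2

    # Pixel counts of the dark class (bins 0..i) and the bright class (bins i..end)
    weight_dark = np.cumsum(hist)
    weight_bright = np.cumsum(hist[::-1])[::-1]

    # Mean intensity of each class for every candidate split
    weighted = hist * bin_centers
    mean_dark = np.cumsum(weighted) / np.maximum(weight_dark, 1)
    mean_bright = (np.cumsum(weighted[::-1]) / np.maximum(weight_bright[::-1], 1))[::-1]

    # Between-class variance when the split is placed after bin i
    variance_between = weight_dark[:-1] * weight_bright[1:] * (mean_dark[:-1] - mean_bright[1:]) ** 2
    return bin_centers[np.argmax(variance_between)]

# Synthetic bimodal "image": dark cells around 60, bright background around 190
rng = np.random.default_rng(0)
dark_cells = rng.normal(60, 5, size=500)
bright_background = rng.normal(190, 5, size=1500)
toy_image = np.concatenate([dark_cells, bright_background]).reshape(40, 50)

toy_threshold = otsu_threshold(toy_image)
print(f"Otsu threshold: {toy_threshold:.1f}")
```

An automatic threshold like this works best when the histogram is clearly bimodal. On real slides the two modes often overlap, so any single threshold remains a compromise.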


## C2 - Edge detection

Edge detection is a fundamental technique in image processing used to identify points in an image where the brightness changes sharply. These points typically correspond to the boundaries of objects within the image. Unlike thresholding, which segments an image based solely on pixel intensity values, edge detection focuses on the transitions between different regions, allowing us to pinpoint the exact outlines of structures.

In our context, detecting the edges of a cell can provide a more precise segmentation than thresholding alone. This is because edge detection algorithms are designed to highlight areas with significant intensity changes, which often correspond to the borders of cells.

Edge detection works by analyzing the image for abrupt changes in intensity:

- **Gradient Calculation**: The algorithm computes the gradient of the image intensity at each pixel, which measures how quickly the intensity changes.
- **Edge Localization**: Pixels where the gradient magnitude exceeds a certain threshold are considered part of an edge.
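The two steps above can be sketched with plain NumPy on a toy image. This is a simplified gradient threshold, not the Canny algorithm itself, and the image and the `edge_threshold` value are assumptions made for illustration.

```python
import numpy as np

# Toy image: a dark 4x4 square (0.2) on a bright background (0.8)
toy_image = np.full((8, 8), 0.8)
toy_image[2:6, 2:6] = 0.2

# Gradient calculation: finite differences along rows and columns
grad_rows, grad_cols = np.gradient(toy_image)
gradient_magnitude = np.hypot(grad_rows, grad_cols)

# Edge localization: keep pixels whose gradient magnitude exceeds a threshold
edge_threshold = 0.1  # assumed value, for illustration only
toy_edges = gradient_magnitude > edge_threshold

print(toy_edges.astype(int))
```

The result is a ring of edge pixels around the border of the square, while the flat interior and the flat background have zero gradient and are not marked.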

A common edge detection method is the **Canny edge detector**: a multi-stage algorithm that provides good detection and localization of edges while minimizing the response to noise.

Below, we apply the Canny edge detection algorithm to our grayscale images. This method helps us identify the boundaries of the cells, which can give a more precise segmentation than thresholding alone.


In [None]:
#image processing
from skimage.measure import regionprops, label
from skimage import feature, morphology
from skimage.segmentation import watershed
from scipy import ndimage as ndi
import matplotlib.pyplot as plt

from skimage import measure


# Function to keep only the inner circles before filling
def keep_inner_circle(edges,min_size,max_size):
    # Step 1: Label the connected components in the edge-detected image
    labeled_edges = label(edges)

    # Step 2: Measure properties of labeled regions
    props = measure.regionprops(labeled_edges)
    
    # Step 3: Create an empty image to store the inner circles
    inner_circle_mask = np.zeros_like(edges)
    
    # Step 4: Iterate over each detected region and filter based on size
    for prop in props:
        # Keep only regions whose area lies strictly between min_size and max_size
        # (the original `|` test was true for every region whenever min_size < max_size)
        if min_size < prop.area < max_size:
            # The region is in the expected size range of the inner circles
            inner_circle_mask[labeled_edges == prop.label] = 1

    return inner_circle_mask



# Perform edge detection and process inner circles before filling
def segment_by_edges(image,min_size=100,max_size=200):
    # Step 1: Apply Canny edge detection
    edges = feature.canny(image, sigma=0.2)  # Adjust sigma to tune the sensitivity

    # Step 2: Keep only the inner circles before filling
    inner_circles = keep_inner_circle(edges,min_size,max_size)
    # Step 3: Fill in the inside of the inner circles (cells)
    filled_inner_circles = ndi.binary_fill_holes(inner_circles)



    return  edges, filled_inner_circles


edges_images = []
segmented_edges_images = []
for image in grayscale_images:
    edges_img, segmented_edges_img = segment_by_edges(image,min_size=XX,max_size=XX)
    edges_images.append(edges_img)
    segmented_edges_images.append(segmented_edges_img)




In [None]:
# Plot the results for comparison
fig, ax = plt.subplots(2, 3, figsize=(12, 8))

for idx in range(len(grayscale_images)):
    ax[idx, 0].imshow(grayscale_images[idx], cmap='gray')
    ax[idx, 0].set_title(f'Image {idx}')

    ax[idx, 1].imshow(edges_images[idx], cmap='gray')
    ax[idx, 1].set_title(f'Edges Detected for Image {idx}')
    
    ax[idx, 2].imshow(segmented_edges_images[idx], cmap='gray')
    ax[idx, 2].set_title(f'Segmented Image {idx}')
    

plt.tight_layout()
plt.show()


## C3 - Clean up the segmentation

- In this part we remove small undesired elements while preserving larger shapes. The function `remove_contours` accepts a size parameter `th`: only objects larger than `th` pixels (the cells in our case) are kept.
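The idea of size-based filtering can be illustrated on a toy mask. This sketch uses `scipy.ndimage.label` and `np.bincount` rather than the loop-based function defined in the next cell. The mask and the `min_size` value are assumptions made for the example.

```python
import numpy as np
from scipy import ndimage as ndi

# Toy binary mask: one 4x4 "cell" (16 pixels) and two single-pixel specks
toy_mask = np.zeros((6, 8), dtype=np.uint8)
toy_mask[1:5, 1:5] = 1
toy_mask[0, 7] = 1
toy_mask[5, 6] = 1

# Label connected components (default structure: 4-connectivity)
toy_labeled, n_objects = ndi.label(toy_mask)

# Size of every label; index 0 is the background
sizes = np.bincount(toy_labeled.ravel())

min_size = 4  # assumed value, for illustration only
keep = sizes > min_size
keep[0] = False  # never keep the background

# Look up, for each pixel, whether its label is kept
toy_cleaned = keep[toy_labeled].astype(np.uint8)
print(n_objects, toy_cleaned.sum())
```

Only the 16-pixel object survives, so the cleaned mask contains exactly 16 foreground pixels.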


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from skimage.measure import label



def remove_contours(image,th=30):
    # Convert the image to binary (0 and 1)
    binary_image = image.astype(np.uint8)

    # Label connected components using skimage
    labeled_image = label(binary_image, connectivity=1)  
    num_features = labeled_image.max()  # Get the number of features

    # Create a mask to store the filtered regions
    filtered_image = np.zeros_like(binary_image)

    # Loop through all connected components and filter them by size
    for region in range(1, num_features + 1):
        # Extract the component
        component = (labeled_image == region)

        # Remove small/thin components that are likely to be contours
        if np.sum(component) > th:  # Adjust this threshold as needed
            filtered_image[component] = 1

    return filtered_image


# Try different thresholds for each image 
segmented_images = []
size_threshold = [XX,XX]
for i in range(len(segmented_edges_images)):
    segmented_images.append(remove_contours(segmented_edges_images[i],th=size_threshold[i]))




In [None]:
# Plot the results for comparison
fig, ax = plt.subplots(2, 3, figsize=(12, 8))

for idx in range(len(grayscale_images)):
    ax[idx, 0].imshow(grayscale_images[idx], cmap='gray')
    ax[idx, 0].set_title(f'Image {idx}')

    ax[idx, 1].imshow(segmented_edges_images[idx], cmap='gray')
    ax[idx, 1].set_title(f'Segmentation before clean-up for Image {idx}')
    
    ax[idx, 2].imshow(segmented_images[idx], cmap='gray')
    ax[idx, 2].set_title(f'Clean Segmented Image {idx}')
    

plt.tight_layout()
plt.show()

# D - Feature Extraction 

We might want to extract some properties of the nuclei, such as their size, perimeter, circularity or density in the image. We can do this using `skimage.measure.regionprops`. This function makes it easy to extract measurements for each nucleus, including area, perimeter, bounding box, and intensity statistics.
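Circularity is computed later in this section as $4\pi A / P^2$, where $A$ is the area and $P$ the perimeter. As a quick sanity check, the sketch below evaluates this formula for an ideal circle and an ideal square. The helper name `circularity` and the shapes are assumptions made for illustration.

```python
import math

def circularity(area, perimeter):
    """Circularity 4*pi*A / P**2: equal to 1 for a perfect circle, smaller otherwise."""
    return 4 * math.pi * area / perimeter ** 2

# Ideal circle of radius 10: A = pi * r^2, P = 2 * pi * r
radius = 10.0
circle_value = circularity(math.pi * radius ** 2, 2 * math.pi * radius)

# Ideal square of side 10: A = s^2, P = 4 * s
side = 10.0
square_value = circularity(side ** 2, 4 * side)

print(circle_value, square_value)
```

The circle gives 1 and the square gives $\pi/4 \approx 0.785$. On pixelated shapes the measured perimeter is only an estimate, so values for real nuclei can deviate from these ideal numbers.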

## D1 - From semantic segmentation to instance segmentation

Now we are ready to label the connected components of this image. This means that each object (nucleus in our case) will be assigned a unique number. To do this we can use the [`skimage.measure.label()`](https://scikit-image.org/docs/dev/api/skimage.measure.html) function.

In [None]:
# define function change_label
def change_label(label_image):
    """Randomly permute the nucleus labels (the background, label 0, stays 0)."""

    # the background (not a nucleus) keeps label 0
    change_label_dict = {0: 0}

    # find all the nucleus labels (everything except the background)
    nuclei_labels = [int(value) for value in np.unique(label_image) if value != 0]

    # draw a random permutation of the nucleus labels
    random_sample = sample(nuclei_labels, len(nuclei_labels))

    # map each original label to its new, randomly chosen label
    for old_label, new_label in zip(nuclei_labels, random_sample):
        change_label_dict[old_label] = new_label

    # relabel the image; every nucleus still has a unique label
    return np.vectorize(change_label_dict.get)(label_image)

In [None]:
def labeling_img(image):
    # Apply distance transform and watershed to separate touching objects
    distance = ndi.distance_transform_edt(image)
    local_maxi = morphology.local_maxima(distance)
    markers = label(local_maxi,connectivity=2)
    
    #Watershed segmentation to separate touching objects
    labels_ws = watershed(-distance, markers, mask=image)
    
    return  labels_ws


labeled_images = []
for img in segmented_images:
    labeled_images.append(labeling_img(img))

In [None]:
# Plot the results for comparison
fig, ax = plt.subplots(2, 3, figsize=(12, 12))

for idx in range(len(grayscale_images)):
    ax[idx, 0].imshow(grayscale_images[idx], cmap='gray')
    ax[idx, 0].set_title(f'Image {idx}')

    ax[idx, 1].imshow(segmented_images[idx], cmap='gray')
    ax[idx, 1].set_title(f'Segmented Image  {idx}')
    
    ax[idx, 2].imshow(change_label(labeled_images[idx]), cmap='nipy_spectral')
    ax[idx, 2].set_title(f'Labeled Image {idx}')
    

plt.tight_layout()
plt.show()


**(2) What is the difference between semantic segmentation and instance segmentation?**



**(3) What is the difference between `connectivity = 1` and `connectivity = 2`?**
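For a 2D image, `connectivity = 1` in `skimage.measure.label` treats only the four edge-sharing neighbors as connected, while `connectivity = 2` also includes the four diagonal neighbors. The sketch below shows the same distinction with `scipy.ndimage.label`, where the neighborhood is passed as a `structure` array. The toy mask is an assumption made for illustration.

```python
import numpy as np
from scipy import ndimage as ndi

# Three pixels that touch only at their corners
diagonal_mask = np.array([[1, 0, 0],
                          [0, 1, 0],
                          [0, 0, 1]], dtype=np.uint8)

# 4-connectivity (scipy default): diagonal pixels are separate objects
_, n_objects_4 = ndi.label(diagonal_mask)

# 8-connectivity: a full 3x3 structure also links diagonal neighbors
_, n_objects_8 = ndi.label(diagonal_mask, structure=np.ones((3, 3), dtype=int))

print(n_objects_4, n_objects_8)
```

With 4-connectivity the diagonal is split into three objects. With 8-connectivity it is a single object.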



## D2 - Counting Cells per Image

In [None]:

# count the nuclei for each image
cells_number = [len(np.unique(labels)) - 1 for labels in labeled_images]

# define a figure
fig, ax = plt.subplots(figsize = (5, 5))

# plot the barplot
ax.bar(x = ['image 0', 'image 1'],
       height = cells_number)

# define ylabel
ax.set_ylabel('Number of Cells')

# and xlabel
ax.set_xlabel('Images')

plt.show()

**(4) Why is `len(np.unique(labels)) - 1` the number of cells?**
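A small labeled array makes the counting rule concrete. The array below is a made-up example in which 0 is the background and each positive integer marks one object.

```python
import numpy as np

# Hypothetical label image: background 0 and three labeled objects
toy_labels = np.array([[0, 0, 1, 1],
                       [0, 2, 2, 0],
                       [3, 0, 0, 0]])

# np.unique returns every distinct value, including the background label 0
unique_values = np.unique(toy_labels)

# Subtract 1 to discard the background
n_cells = len(unique_values) - 1
print(unique_values, n_cells)
```

This relies on the background being present in the image. If an image had no background pixel at all, subtracting 1 would undercount by one object.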



## D3 - Extract object properties 


### D3 - 1) Area - Perimeter - Circularity 

In [None]:
# Compute region properties
dfs = []
idx = 0
for img in labeled_images:
    regions = regionprops(img)
    labels = [region.label for region in regions]
    area = [region.area for region in regions]
    perimeter = [region.perimeter for region in regions]
    circularities = [(4 * np.pi * region.area) / (region.perimeter ** 2) for region in regions]
    df = pd.DataFrame({
        'Label':labels,
        'Perimeter':perimeter,
        'Area': area,
        'Circularity': circularities,
        'Image_id':[f'image_{idx}' for i in range(len(regions))]
    })
    dfs.append(df.set_index('Label'))
    idx+=1

table_info = pd.concat(dfs,keys=[f"image_{k}" for k in range(len(labeled_images))])

table_info.head()


In [None]:
# create a figure with 3 subplots
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(10,5),
                         facecolor = 'w')

# plot a boxplot with seaborn
sns.boxplot(data = table_info[["Image_id", "Area"]],
                x = "Image_id",
                y = "Area",ax=axes[0])

# define ylabel
axes[0].set_ylabel("Size of cell")

# and xlabel
axes[0].set_xlabel('Images')

# plot a boxplot with seaborn
sns.boxplot(data = table_info[["Image_id", "Perimeter"]],
                x = "Image_id",
                y = "Perimeter",ax=axes[1])

# define ylabel
axes[1].set_ylabel("Perimeter of cell")

# and xlabel
axes[1].set_xlabel('Images')  
 
# plot a boxplot with seaborn
sns.boxplot(data = table_info[["Image_id", "Circularity"]],
                x = "Image_id",
                y = "Circularity",ax=axes[2])

# define ylabel
axes[2].set_ylabel("Circularity of cell")

# and xlabel
axes[2].set_xlabel('Images')   
# optimise the distance between the figures
plt.tight_layout()
plt.show()

### D3 - 2) Cells Density  

In [None]:

idx = 0
for label_img in labeled_images:

    
    # Compute the total area of all objects
    total_object_area = sum(region.area for region in regionprops(label_img))
    
    # Compute the total area of the image
    total_image_area = label_img.size
    
    # Calculate density
    density = total_object_area / total_image_area
    print(f'Cells density in Image {idx} = {density}')
    idx+=1
    



## E - Case Study

In this part we will analyze several regions of a slide taken from a single patient. You can find the entire slide [here](https://portal.gdc.cancer.gov/files/96fdca5b-86cb-4209-931e-986f4964e4b0). For this exercise, we have already cropped four regions of the image that represent different parts of the tumor and performed the required segmentation. You can find these images in the `case_study` folder.

The goal will be to automatically detect cell nuclei in an image, distinguishing them from everything else (mainly cytoplasm). We will then use this information to quantify different features of different tumor regions.

- Use what you have seen in section D to do the required analysis.

In [None]:

path_images = './case_study'
number_of_images= 4
case_study_images = [] 
case_study_segmented = []
for i in range(number_of_images):
    image = io.imread(f'{path_images}/processed_grayscale_image_{i}.png')
    print(image.shape)
    case_study_images.append(image)
    image = io.imread(f'{path_images}/segmented_image_{i}.png')
    case_study_segmented.append((image > 128).astype(np.uint8))

fig, axes = plt.subplots(nrows=1, ncols=4, figsize=(20, 15))
for i, ax in enumerate(axes):
    ax.imshow(case_study_images[i],cmap='gray')
    ax.set_title(f"Image {i}")
    ax.axis('off')



fig, axes = plt.subplots(nrows=1, ncols=4, figsize=(20, 15))
for i, ax in enumerate(axes):
    ax.imshow(case_study_segmented[i],cmap='gray')
    ax.set_title(f"Image {i}")
    ax.axis('off')

### E1 - Semantic segmentation

In [None]:
################



#   WRITE YOUR CODE HERE




################

### E2 - (A) Counting Nuclei per Image

In [None]:
################



#   WRITE YOUR CODE HERE




################

### E2 - (B) Region properties

**Area - Perimeter - Circularity** 

In [None]:
################



#   WRITE YOUR CODE HERE




################


In [None]:
################



#   WRITE YOUR CODE HERE




################

### E2 - (C) Nuclei Density


In [None]:
################



#   WRITE YOUR CODE HERE




################

**- Interpret the results**