# SICOM-SIGMA SEGMENTATION - RECOMPOSITION 2024/2025
In this lab we will process and play around with a famous state-of-the-art detection model: YOLO (You Only Look Once).
This lab is less guided and will challenge your autonomy and curiosity.

## Goals

The expected deliverable is either this notebook with the final images displayed at the end (the first mission is enough; the other two are bonuses), or a Python code file/folder zipped together with the pictures.

**First mission:**

Given a dataset in both visible and infrared modalities (10 images each), find and cut out all the people (30 ~ 60) and their bicycles/motorcycles from the images, and send them to Mars ('/data/planetary stages/mars.jpg').

    bonuses:
        - Adapt their respective hue to match the vibe on Mars
        - Scale their size according to distance (the further away, the smaller) to make the result more realistic
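
For the hue bonus, one simple option is to shift each prop's colors so that their statistics match those of the background. The sketch below uses a swapped-in technique, per-channel mean/standard-deviation color transfer (the idea behind Reinhard et al.'s color transfer, done here directly in RGB for brevity), rather than a true hue rotation. It assumes float images in [0, 1] with shape (H, W, 3) and a boolean (H, W) mask marking the prop; `match_color_stats` is a hypothetical helper, not part of ImageTensor.

```python
import numpy as np

def match_color_stats(prop, mask, background, eps=1e-6):
    """Per-channel mean/std color transfer (hypothetical helper).

    prop, background: float arrays in [0, 1], shape (H, W, 3); sizes may differ.
    mask: boolean array (H, W), True on the prop's pixels.
    Returns a copy of `prop` whose masked pixels take on the background's
    per-channel mean and standard deviation.
    """
    out = prop.copy()
    pixels = prop[mask]                       # (N, 3) pixels of the prop only
    if pixels.size == 0:
        return out
    src_mean, src_std = pixels.mean(axis=0), pixels.std(axis=0)
    bg = background.reshape(-1, 3)
    dst_mean, dst_std = bg.mean(axis=0), bg.std(axis=0)
    # Standardize with the prop's statistics, then re-color with the background's
    recolored = (pixels - src_mean) / (src_std + eps) * dst_std + dst_mean
    out[mask] = np.clip(recolored, 0.0, 1.0)
    return out
```

Only the masked pixels are modified, so the black surroundings of an isolated prop stay black.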



**Second mission (optional):**

Using the same images, segment the cars, motorcycles, trucks and other combustion-engine vehicles and park them properly in the parking lot ('/data/planetary stages/parking.jpg')
    bonus:
        - Adapt their orientation to the parking spot
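
For the orientation bonus, each vehicle patch has to be rotated to the angle of its parking spot (an angle you would measure on the parking image yourself). OpenCV's `cv2.getRotationMatrix2D` with `cv2.warpAffine` does this directly; the NumPy-only sketch below just shows the underlying inverse mapping with nearest-neighbour sampling. `rotate_patch` is a hypothetical helper.

```python
import numpy as np

def rotate_patch(patch, angle_deg):
    """Rotate an (H, W) or (H, W, C) patch by `angle_deg` degrees
    counter-clockwise (as displayed) about its center, nearest-neighbour.
    The output keeps the input size; pixels coming from outside are 0."""
    h, w = patch.shape[:2]
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    X, Y = xs - cx, ys - cy                   # output coords relative to the center (y points down)
    # Inverse mapping: for each output pixel, look up the source pixel it comes from
    src_x = np.rint(X * np.cos(theta) - Y * np.sin(theta) + cx).astype(int)
    src_y = np.rint(X * np.sin(theta) + Y * np.cos(theta) + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(patch)
    out[ys[valid], xs[valid]] = patch[src_y[valid], src_x[valid]]
    return out
```

Corners that rotate out of the frame are lost, so for arbitrary angles pad the patch first.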

**Last mission (optional):**

Make the traffic lights you found watch over all the misclassifications and other objects on the Moon ('/data/planetary stages/moon.jpg')
    
- Cars
- Pedestrians
- Bicycles
- Traffic lights 
- Other objects and misclassifications

Recompose 3 new images, using the given backgrounds (or others if you wish) with the segmented instances scattered across each image.
The new images must present **both** modalities fused (some simple fusion methods are proposed below).



## Set-up
For this mission, check how to install and use YOLO: https://github.com/ultralytics/ultralytics
We advise you to use a pretrained version of the model rather than training it yourself.

The documentation for segmentation can be found here: https://docs.ultralytics.com/guides/isolating-segmentation-objects/#how-do-i-isolate-objects-using-ultralytics-yolo11-for-segmentation-tasks

To make visualizing and manipulating the images/tensors easier, you can download and use this package:

In [None]:
from ImagesCameras import ImageTensor as im
import os
import glob
from ultralytics import YOLO
import cv2

**How to use ImageTensor**

- Create a new Image : ImageTensor(__source__), where __source__ is np.ndarray, torch.Tensor, str path to file...
- Stack 2 images : __image1.hstack(image2)__ or __image1.vstack(image2)__ for respectively horizontal stacking or vertical stacking
- Batch 2 images : __image1.batch(image2)__ or __image1.stack(image2)__, batch dimension will be added
- Display an image or batched images : __image.show(num='name of the window')__
- Change an image's colorspace: __image.COLORSPACE(colormap=None)__, with COLORSPACE in [RGB, GRAY, HSV, HSL, CMYK, XYZ, LAB, LUV, YCbCr], colormap in https://matplotlib.org/stable/gallery/color/colormap_reference.html
- Print image properties: __image.pprint()__
- Crop an image: __image.crop([x, y, h, w], center=False)__, with x, y the anchor point (top-left by default; if center is True, the patch is centered on it), h the height and w the width
- Apply a patch on an image: __image.apply_patch(patch, (x, y), in_place=True, center=False, shape=None)__, with a patch having the same batch size and number of channels, and the anchor point. The shape flag is either a float to scale the patch or an explicit new shape for the patch.

For more specific uses, ask us: a suitable method may already exist.


In [None]:
import torch
import kornia
from kornia.morphology import dilation, erosion
from torch import Tensor
import numpy as np
import torchvision.transforms.functional as TF
import torchvision.utils as vutils



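def fusion_overlapping_bbox(results, *args) -> tuple[Tensor, Tensor]:
    # Merge instances whose masks overlap once dilated by a 5x5 kernel (typically a person and the bicycle they ride).
    # `results` (and optional extra `args`) are Ultralytics segmentation results; returns the merged boxes
    # (xyxy union of each group's boxes) and the corresponding merged masks.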
    masks = results.masks.data.clone().detach()
    bbox = results.boxes.xyxy.clone().detach()
    if len(args) > 0:
        for arg in args:
            if masks is not None:
                masks = torch.cat([masks, arg.masks.data.clone().detach()], dim=0)
                bbox = torch.cat([bbox, arg.boxes.xyxy.clone().detach()])
            else:
                masks = arg.masks.data.clone().detach()
                bbox = arg.boxes.xyxy.clone().detach()

    assert masks is not None
    length = masks.shape[0]
    masks = im(masks, batched=length > 1)  # count the masks from all results, not only the first one

    kernel = torch.ones(5, 5).to(masks.device)
    masks_dilated = dilation(masks, kernel)
    masks = im(erosion(masks_dilated, kernel))

    connectivity = []

    for idx, mask in enumerate(masks_dilated[:-1]):
        list_idx = (np.argwhere((mask + masks_dilated[idx+1:, 0]).max(-1)[0].max(-1)[0].cpu().numpy()>1)+idx+1).tolist()
        list_idx = list_idx[0] if len(list_idx) > 0 else list_idx
        connectivity.append(list_idx)
    connectivity.append([])
    im_fus = []
    not_fus = []
    for idx in range(length):
        c = connectivity[idx]
        if len(c) > 0:
            new_idx = c.pop(0)
            c = [] if c is None else c
            if new_idx > idx:
                c.append(idx)
                connectivity[new_idx].extend(c)
            else:
                connectivity[new_idx].append(idx)
                im_fus.append(connectivity[new_idx])
        else:
            not_fus.append(idx)

    if len(im_fus) > 0:
        new_boxes = bbox[not_fus]
        list_masks = [masks.extract_from_batch(i) for i in not_fus]
        if len(list_masks) > 0:  # also keep a single non-fused mask, so masks stay aligned with new_boxes
            new_mask = im(torch.cat(list_masks, dim=0), batched=len(list_masks)>1)
        else:
            new_mask = None
        for indexes in im_fus:
            boxes_fused = torch.tensor([bbox[indexes][:, [0, 2]].min(),  bbox[indexes][:, [1, 3]].min(), bbox[indexes][:, [0, 2]].max(), bbox[indexes][:, [1, 3]].max()])[None]
            new_boxes = torch.cat([new_boxes, boxes_fused.to(new_boxes.device)], dim=0)
            list_new_masks = [masks.extract_from_batch(i) for i in indexes]
            new_mask_add = im(torch.sum(torch.cat(list_new_masks, dim=0), dim=0) > 0, batched=len(list_new_masks)>1)
            if new_mask is not None:
                new_mask = new_mask.batch(new_mask_add)
            else:
                new_mask = new_mask_add
    else:
        new_boxes = bbox
        new_mask = masks
    return new_boxes, im(new_mask)
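The grouping idea behind `fusion_overlapping_bbox` can be sketched more compactly with a union-find over pairwise overlaps of the dilated masks. This is a simplified NumPy-only sketch under our own assumptions (boolean masks of shape (N, H, W), boxes as an (N, 4) `x_min, y_min, x_max, y_max` array, a 3x3 cross dilation repeated `iterations` times); `dilate` and `merge_overlapping` are hypothetical helpers, not a drop-in replacement for the function above.

```python
import numpy as np

def dilate(mask, iterations=1):
    """Binary dilation with a 3x3 cross (4-neighbourhood), NumPy only."""
    out = mask.copy()
    for _ in range(iterations):
        grown = out.copy()
        grown[1:, :] |= out[:-1, :]   # spread downwards
        grown[:-1, :] |= out[1:, :]   # spread upwards
        grown[:, 1:] |= out[:, :-1]   # spread right
        grown[:, :-1] |= out[:, 1:]   # spread left
        out = grown
    return out

def merge_overlapping(masks, boxes, iterations=2):
    """Group instances whose dilated masks overlap; return merged (boxes, masks)."""
    n = len(masks)
    if n == 0:
        return boxes, masks
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    dilated = np.stack([dilate(m, iterations) for m in masks])
    for i in range(n):
        for j in range(i + 1, n):
            if np.any(dilated[i] & dilated[j]):
                parent[find(i)] = find(j)   # union the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    merged_boxes, merged_masks = [], []
    for idx in groups.values():
        b = boxes[idx]
        merged_boxes.append([b[:, 0].min(), b[:, 1].min(), b[:, 2].max(), b[:, 3].max()])
        merged_masks.append(np.any(masks[idx], axis=0))   # union of the group's masks
    return np.array(merged_boxes), np.stack(merged_masks)
```

Unlike the original, a union-find follows chains of overlaps transitively (A touches B, B touches C gives one group), whatever the order of the instances.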

In [None]:
# Load the data
path_data = os.getcwd() + '/data/'
images_visible_pathes = sorted(glob.glob(path_data + 'visible/*.jpg'))
images_infrared_pathes = sorted(glob.glob(path_data + 'infrared/*.jpg'))

images_visible = im.batch(*[im(f) for f in images_visible_pathes])
images_infrared = im.batch(*[im(f) for f in images_infrared_pathes])

mars = im(path_data + 'planetary stages/mars.jpg')
parking = im(path_data + 'planetary stages/parking.jpg')
moon = im(path_data + 'planetary stages/moon.jpg')

**Simple Fusion Methods**

Average with the colormapped infrared:
- Transform the IR grayscale image into a 3-channel image: image_infrared_colored = image_infrared.RGB('gray')
- Average the two modalities into a new image: fused_image = image_color/2 + image_infrared_colored/2

Average in a new colorspace representation:
- Transform the visible image into a new colorspace representation: image_visible_LAB = image_visible.LAB() (also works with LUV, HSV, HSL)
- Fuse the grayscale images in the lightness channel (L, channel 0, in LAB and LUV; for HSV/HSL the V/L channel comes last, assuming channels are stored in the order of the colorspace name): first copy, fused_image_LAB = image_visible_LAB.clone(), then fused_image_LAB[:, 0] = image_visible_LAB[:, 0]/2 + image_infrared[:, 0]/2
- Transform the result back into the RGB colorspace: fused_image = fused_image_LAB.RGB()

Or propose your own method if you prefer!
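Both recipes can be sketched in plain NumPy. For the second one, this sketch swaps LAB for YCbCr (ITU-R BT.601 full-range coefficients, as used in JPEG) because that conversion is linear and easy to write by hand; it illustrates the "fuse in the luminance channel" idea and is not what `ImageTensor.LAB()` computes. Inputs are assumed to be float arrays in [0, 1]: visible (H, W, 3) RGB and infrared (H, W); the function names are hypothetical.

```python
import numpy as np

# BT.601 full-range RGB -> YCbCr matrix (rows: Y, Cb, Cr); chroma offset 0.5 for [0, 1] data
RGB2YCBCR = np.array([
    [0.299, 0.587, 0.114],
    [-0.168736, -0.331264, 0.5],
    [0.5, -0.418688, -0.081312],
])
OFFSET = np.array([0.0, 0.5, 0.5])
YCBCR2RGB = np.linalg.inv(RGB2YCBCR)

def fuse_average(visible, infrared):
    """Method 1: average RGB with the gray infrared replicated on 3 channels."""
    ir_rgb = np.repeat(infrared[..., None], 3, axis=-1)
    return visible / 2 + ir_rgb / 2

def fuse_luminance(visible, infrared):
    """Method 2 (YCbCr variant): average only the luminance, keep the chroma."""
    ycbcr = visible @ RGB2YCBCR.T + OFFSET
    ycbcr[..., 0] = ycbcr[..., 0] / 2 + infrared / 2
    rgb = (ycbcr - OFFSET) @ YCBCR2RGB.T
    return np.clip(rgb, 0.0, 1.0)
```

The difference between the two: method 1 also pulls the chroma towards gray (colors get washed out), whereas method 2 keeps the visible image's Cb/Cr untouched and only mixes brightness. On a gray visible image both give the same result.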

In [None]:
# Visualize the data
images_visible.pprint()
images_infrared.pprint()
images_visible.show(num='Visible images')
images_infrared.show(num='Infrared images')
mars.show()
# moon.show()
# parking.show()


# --------------------------------- Image Layout -------------------------------- #
Modality: Visible
Image size: 1024 x 1280 (height x width)
Channel names: Red | Green | Blue
Pixel format: RGB | 3 x 8 bits
Batch size: 10
Layers: batch x channels x height x width || 10 x 3 x 1024 x 1280
# -------------------------------------------------------------------------------- #

# --------------------------------- Image Layout -------------------------------- #
Modality: Any
Image size: 1024 x 1280 (height x width)
Channel names: Any
Pixel format: GRAY | 1 x 8 bits
Batch size: 10
Layers: batch x channels x height x width || 10 x 1 x 1024 x 1280
# -------------------------------------------------------------------------------- #




### First fusion method

In [None]:
images_infrared_colored = images_infrared.RGB('gray')
print(images_infrared_colored.shape)

fused_images_1 = images_visible/2 + images_infrared_colored/2

fused_images_1.shape

torch.Size([10, 3, 1024, 1280])


torch.Size([10, 3, 1024, 1280])

### Second fusion method

In [None]:
images_visible_LAB = images_visible.LAB()
print(images_visible_LAB.shape)

fused_images_LAB = images_visible_LAB.clone()  # copy, otherwise the next line also overwrites images_visible_LAB

fused_images_LAB[:, 0] = images_visible_LAB[:, 0]/2 + images_infrared[:, 0]/2

fused_images_2 = fused_images_LAB.RGB()

fused_images_2.shape

torch.Size([10, 3, 1024, 1280])


torch.Size([10, 3, 1024, 1280])

### Visualisation

In [None]:
fused_images_1.show(num='Fused 1 images')
fused_images_2.show(num='Fused 2 images')

In [None]:
fused_image_1 = im(fused_images_1[0])
fused_image_2 = im(fused_images_2[0])


fused_image_1.show(num='Fused 1 image')
fused_image_2.show(num='Fused 2 image')


### Comparison

In [None]:
print(f"Standard deviation of fused image method 1 : {torch.std(fused_images_1)}")
print(f"Standard deviation of fused image method 2 : {torch.std(fused_images_2)}")
print('\n')

# MSE between each fused image and a linear combination of both modalities (here their plain sum: visible + colorized infrared):

e_1 = torch.mean( (fused_images_1 - (images_visible + images_infrared_colored))**2 )
e_2 = torch.mean( (fused_images_2 - (images_visible + images_infrared_colored))**2 )

print(f"MSE fused image method 1 : {e_1}")
print(f"MSE fused image method 2 : {e_2}")

Standard deviation of fused image method 1 : 0.12405599653720856
Standard deviation of fused image method 2 : 0.12017931044101715


MSE fused image method 1 : 0.05619974061846733
MSE fused image method 2 : 0.0647156685590744
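Standard deviation and a single MSE are coarse indicators. Two other easy checks, sketched here under our own assumptions (grayscale float images in [0, 1]; helper names hypothetical): the Shannon entropy of the intensity histogram (higher usually means more information kept), and the MSE of the fused image against each source separately, which shows which modality dominates the result.

```python
import numpy as np

def entropy(img, bins=256):
    """Shannon entropy (bits) of the intensity histogram of a [0, 1] image."""
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]                      # empty bins contribute nothing
    return float(-(p * np.log2(p)).sum())

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def fusion_report(fused_gray, visible_gray, infrared):
    """Entropy of the fused image and its MSE to each source modality."""
    return {
        "entropy": entropy(fused_gray),
        "mse_to_visible": mse(fused_gray, visible_gray),
        "mse_to_infrared": mse(fused_gray, infrared),
    }
```

For the tensors above, convert a grayscale version of each image with `.cpu().numpy()` before calling these helpers.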


# Modeling

In [None]:
# Plain (B, 3, H, W) float tensor in [0, 1], an input format accepted by YOLO's predict()
fused_images_2 = torch.Tensor(fused_images_2)


# YOLO11n

In [None]:
# Load a model
model_1 = YOLO('yolo11n.pt')

# Map class names to their ids, e.g. 'person' -> 0
dict_classes = {v: k for k, v in model_1.names.items()}

results_1 = model_1.predict(fused_images_2, show=True)


0: 1024x1280 6 persons, 1 handbag, 324.8ms
1: 1024x1280 5 persons, 2 bicycles, 2 cars, 1 motorcycle, 324.8ms
2: 1024x1280 7 persons, 1 car, 1 skateboard, 324.8ms
3: 1024x1280 4 persons, 5 motorcycles, 1 umbrella, 324.8ms
4: 1024x1280 5 persons, 4 traffic lights, 324.8ms
5: 1024x1280 5 persons, 3 traffic lights, 324.8ms
6: 1024x1280 7 persons, 1 car, 324.8ms
7: 1024x1280 (no detections), 324.8ms
8: 1024x1280 6 persons, 324.8ms
9: 1024x1280 1 person, 1 bicycle, 1 car, 324.8ms
Speed: 0.0ms preprocess, 324.8ms inference, 19.0ms postprocess per image at shape (1, 3, 1024, 1280)


**We can now segment all the people and bicycles in the different images**

In [None]:
wanted_classes = ['person', 'bicycle', 'motorcycle']

classes_to_detect = [v for k, v in dict_classes.items() if k in wanted_classes]

In [None]:
def retrieve_props_1(results, classes=classes_to_detect, confidence_level=0.9) :
    # Crop the bounding box of every detection of a wanted class above the confidence threshold
    props = []

    for res in results :

        img = im(res.orig_img)

        for xyxy, cls, conf in zip(res.boxes.xyxy.int(), res.boxes.cls.int(), res.boxes.conf) :

            if (cls.item() in classes) and (conf.item() > confidence_level) :

                x_min, y_min, x_max, y_max = xyxy

                cropped_img = img.crop([x_min, y_min, x_max, y_max], xyxy=True)

                props.append(cropped_img)

    print(f"Number of relevant characters detected: {len(props)}")

    return props
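`retrieve_props_1` filters detections by class and confidence one box at a time. With the boxes as arrays (in Ultralytics, `res.boxes.xyxy`, `res.boxes.cls` and `res.boxes.conf` are tensors that can be converted with `.cpu().numpy()`), the same filter becomes a single boolean mask. A NumPy sketch on made-up detections; `filter_detections` and `crop` are hypothetical helpers.

```python
import numpy as np

def filter_detections(xyxy, cls, conf, classes, min_conf):
    """Keep boxes whose class id is in `classes` and whose confidence > min_conf.

    xyxy: (N, 4) array of x_min, y_min, x_max, y_max; cls, conf: (N,) arrays.
    Returns the kept boxes as integer pixel coordinates and their class ids.
    """
    keep = np.isin(cls.astype(int), classes) & (conf > min_conf)
    return xyxy[keep].astype(int), cls[keep].astype(int)

def crop(image, box):
    """Crop an (H, W, C) image with an integer x_min, y_min, x_max, y_max box."""
    x_min, y_min, x_max, y_max = box
    return image[y_min:y_max, x_min:x_max]
```

Note that NumPy indexes rows (y) first, which is why `crop` slices `y_min:y_max` before `x_min:x_max`.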

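 - Well, YOLO11n is not made for segmentation: it is a detection model that only returns bounding boxes (`res.masks` is None), so we switch to YOLO11n-seg.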

# Modeling - YOLO11n-seg

In [None]:
# Load a model
model_2 = YOLO('models/yolo11n-seg.pt')

dict_classes = {v: k for k, v in model_2.names.items()}

results_2 = model_2.predict(fused_images_2, show=True)


0: 1024x1280 6 persons, 396.8ms
1: 1024x1280 7 persons, 3 bicycles, 2 cars, 1 motorcycle, 396.8ms
2: 1024x1280 6 persons, 1 car, 2 skateboards, 396.8ms
3: 1024x1280 6 persons, 3 motorcycles, 396.8ms
4: 1024x1280 5 persons, 3 traffic lights, 1 umbrella, 2 handbags, 396.8ms
5: 1024x1280 2 persons, 3 traffic lights, 396.8ms
6: 1024x1280 6 persons, 1 car, 396.8ms
7: 1024x1280 (no detections), 396.8ms
8: 1024x1280 6 persons, 1 car, 1 traffic light, 396.8ms
9: 1024x1280 1 person, 1 bicycle, 1 car, 1 traffic light, 396.8ms
Speed: 0.0ms preprocess, 396.8ms inference, 41.9ms postprocess per image at shape (1, 3, 1024, 1280)


In [None]:
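def retrieve_props_2(results, confidence_level=0.8) :
    # Isolate every wanted instance on a black, full-size frame using its segmentation polygon
    count = 0
    props = []
    centers = []

    for res in results :
        img = np.copy(res.orig_img)

        if res.masks is not None :  # masks is None when nothing was detected in the image
            for xy, cls, conf, xyxy in zip(res.masks.xy, res.boxes.cls.int(), res.boxes.conf, res.boxes.xyxy) :

                if (cls.item() in classes_to_detect) and (conf.item() > confidence_level) :
                    # Rasterize the polygon (original-image pixel coordinates) into a binary mask
                    b_mask = np.zeros(img.shape[:2], np.uint8)
                    contour = xy.astype(np.int32).reshape(-1, 1, 2)
                    cv2.drawContours(b_mask, [contour], -1, (255, 255, 255), cv2.FILLED)

                    # Keep only the instance pixels; everything else becomes black
                    mask3ch = cv2.cvtColor(b_mask, cv2.COLOR_GRAY2BGR)
                    isolated = im(cv2.bitwise_and(mask3ch, img))

                    props.append(isolated)
                    # Bounding-box center, used later as a depth cue
                    x_min, y_min, x_max, y_max = xyxy
                    centers.append( (int((x_max + x_min) / 2), int((y_max + y_min) / 2)) )
                    count += 1

    if count == 0 :
        raise ValueError("No prop matched the wanted classes and confidence level")

    # (N, 3, H, W) batch of isolated props
    images = im(torch.stack(props).squeeze(1))

    print(f"Number of relevant props detected : {count}")

    # All props merged into a single frame (overlapping pixels are summed)
    single_image = im(torch.sum(images, dim=0))

    return single_image, images, centers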
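Summing the isolated props, as `retrieve_props_2` does with `torch.sum`, doubles the pixel values wherever two instances overlap. A NumPy sketch of two small alternatives, under our own assumptions (boolean (H, W) masks, (H, W, C) images; helper names hypothetical): merging with an element-wise maximum, and anchoring each prop at its lowest mask row (roughly where it touches the ground) instead of the bounding-box center, which may be a better cue for depth-based scaling.

```python
import numpy as np

def isolate(image, mask):
    """Zero out everything outside a boolean (H, W) mask of an (H, W, C) image."""
    return image * mask[..., None]

def merge_props(props):
    """Merge full-frame isolated props; np.maximum avoids the value doubling
    that summing produces where two instances overlap."""
    return np.maximum.reduce(props)

def foot_point(mask):
    """(x, y) of the lowest row of the mask, horizontally centered on that row:
    a hypothetical ground-contact anchor for depth-based scaling."""
    ys, xs = np.nonzero(mask)
    y = ys.max()
    return int(xs[ys == y].mean()), int(y)
```

With uint8 images, the maximum also stays within range, whereas a sum of two 200-valued pixels would overflow.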

In [None]:
def process_perspective(images, centers, separation = 3) :
    # Shrink each isolated prop according to its vertical position (a crude depth cue:
    # higher in the frame = further away), then merge all props into one (1, 3, H, W) image
    _, height, width = images[0].shape

    # Row boundaries of the horizontal bands
    seps = [int((height / separation) * i) for i in range(1, separation+1)]

    def scale_and_shift(img, scale_factor, shift_y) :
        # Scale about the image center...
        scaling_matrix = torch.tensor([
                [scale_factor, 0, (1 - scale_factor) * width / 2],
                [0, scale_factor, (1 - scale_factor) * height / 2]
        ], dtype=torch.float32).unsqueeze(0)
        scaled_img = kornia.geometry.transform.warp_affine(
            img, scaling_matrix, (height, width), padding_mode="zeros")

        # ...then translate vertically by shift_y pixels
        translation_matrix = torch.tensor(
            [[1, 0, 0],
             [0, 1, shift_y]],
            dtype=torch.float32).unsqueeze(0)
        return kornia.geometry.transform.warp_affine(
            scaled_img, translation_matrix, (height, width), padding_mode="zeros")

    process_img = []

    for img, c in zip(images, centers) :
        img = im(img)

        # <= / else so that centers lying exactly on a boundary are not dropped
        if c[1] <= seps[0] :        # top band: furthest, smallest
            scale_factor, shift_y = 0.35, int(0.65 * seps[0])
        elif c[1] <= seps[1] :      # middle band
            scale_factor, shift_y = 0.5, int(0.55 * seps[0])
        else :                      # bottom band: closest, largest
            scale_factor = 0.8
            shift_y = int((1 - scale_factor) * seps[1])

        process_img.append(scale_and_shift(img, scale_factor, shift_y))

    processed_image = im(torch.stack(process_img))
    processed_image = im(torch.sum(processed_image, dim=0))

    return processed_image
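`process_perspective` uses three discrete bands, so two people a few pixels apart on either side of a boundary get very different sizes. A continuous alternative, sketched under our own assumptions: interpolate the scale factor linearly between a "far" value at the top of the frame and a "near" value at the bottom (the defaults 0.35 and 0.8 mirror the extreme bands above). `depth_scale` and `center_scaling_matrix` are hypothetical helpers; the matrix has the same form as `scaling_matrix` above, a scaling about the image center.

```python
import numpy as np

def depth_scale(y, height, far=0.35, near=0.8):
    """Scale factor for an object whose anchor row is y (0 = top of the frame)."""
    t = np.clip(y / height, 0.0, 1.0)
    return float(far + (near - far) * t)

def center_scaling_matrix(scale, height, width):
    """2x3 affine matrix scaling about the image center (the center maps to itself)."""
    return np.array([
        [scale, 0.0, (1 - scale) * width / 2],
        [0.0, scale, (1 - scale) * height / 2],
    ], dtype=np.float32)
```

The resulting matrix can be turned into a batched tensor with `torch.from_numpy(M)[None]` and passed to `kornia.geometry.transform.warp_affine` as in the function above.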

In [None]:
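def overlay_on_background(foreground_tensor, background_tensor, transparency=0.80):
    # foreground_tensor: (1, 3, H, W) merged props on black; background_tensor: (3, H', W')
    # `transparency` is in fact the opacity of the props over the background
    height, width = foreground_tensor.shape[2], foreground_tensor.shape[3]

    # Resize the background to the props' frame size
    background_tensor = TF.resize(background_tensor, (height, width))

    foreground = foreground_tensor.squeeze(0)
    # Clip before the uint8 cast: summed overlapping props can exceed 1 and would wrap around
    foreground_np = (foreground.permute(1, 2, 0).cpu().numpy().clip(0, 1) * 255).astype(np.uint8)
    # Recolor the props with an OpenCV colormap (returned in BGR order, converted to RGB)
    foreground_colored = cv2.applyColorMap(foreground_np, cv2.COLORMAP_OCEAN)
    foreground_colored = cv2.cvtColor(foreground_colored, cv2.COLOR_BGR2RGB)
    foreground_colored = torch.tensor(foreground_colored).permute(2, 0, 1) / 255.0

    # Near-black pixels do not belong to any prop: show the background there
    mask = (foreground < 0.04).all(dim=0)

    composite_image = (foreground_colored * transparency + background_tensor * (1 - transparency))
    composite_image[:, mask] = background_tensor[:, mask]
    return im(composite_image)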
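The compositing rule used by `overlay_on_background` can be written in a few lines of NumPy: pixels that are not (near) black in the foreground are alpha-blended with the background, all other pixels show the background unchanged. A minimal sketch, assuming float (H, W, 3) arrays in [0, 1] of the same size; `composite` is a hypothetical helper and leaves out the colormap step.

```python
import numpy as np

def composite(foreground, background, alpha=0.8, threshold=0.04):
    """Blend foreground over background wherever the foreground is not near-black.

    foreground, background: float arrays (H, W, 3) in [0, 1], same size.
    alpha: opacity of the foreground inside the objects (`transparency` above).
    """
    obj = (foreground >= threshold).any(axis=-1, keepdims=True)   # (H, W, 1) object mask
    blended = alpha * foreground + (1 - alpha) * background
    return np.where(obj, blended, background)
```

Passing the segmentation masks directly instead of thresholding on black would also keep genuinely dark object pixels (e.g. dark clothes) from being replaced by the background.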


In [None]:
image, set_images, centers = retrieve_props_2(results_2)

process_img = process_perspective(set_images, centers)

result_image = overlay_on_background(process_img, mars.squeeze(0))

result_image.show()

os.makedirs("outputs", exist_ok=True)  # save_image does not create missing folders
vutils.save_image(result_image, "outputs/mars_populated.png")

Number of relevant props detected : 21
