# Image Segmentation - Scene Understanding

# HW3 - Segmentation

### Come up with a Story
    0. You will work in groups of three. Come up with a team name.
    
    1. You are given a powerful segmentation model that simulates a human annotator labelling the classes listed below
    
    2. Come up with an interesting, specific segmentation use case for images you can take with your phone or gather from the internet
    
    3. State in which situation it can be used, e.g. a robot navigating through terrain, catching animals escaped from a zoo, a safari, collecting food from a table, etc.
    
    4. It can be related to your thesis as well
    

### Design Model
    1. You are given MobileNetV3 as the segmentation architecture. You may use any architecture you want; this one is recommended because it runs in real time on a GPU and can also be tested on a CPU

    2. Train the model on your collected image data (you can start from weights pretrained on different scenes)
    
    3. Split the data into a training part and a testing part, and validate your model on the testing part in terms of IoU against the Teacher labels and visualized outputs
    
    4. Specify some unique scenarios for testing and show the loss values and final segmentation for these cases
        - Discuss how it fails or succeeds
        - Try to explain why, and what would help improve the performance on these cases
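A minimal sketch of the IoU validation mentioned in step 3, assuming the prediction and the Teacher label are integer arrays of the same shape. The helper name `per_class_iou` is hypothetical and not part of the assignment code; classes absent from both arrays are reported as NaN so they do not distort the mean.

```python
import numpy as np

def per_class_iou(pred, target, num_classes):
    """Return one IoU value per class (NaN where the class appears in neither array)."""
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union > 0:
            ious[c] = np.logical_and(pred_c, target_c).sum() / union
    return ious

# Toy 2x3 example with three classes
pred = np.array([[0, 0, 1], [1, 2, 2]])
target = np.array([[0, 1, 1], [1, 2, 0]])
ious = per_class_iou(pred, target, num_classes=3)
mean_iou = np.nanmean(ious)  # mean over the classes that are present
print(ious, mean_iou)
```

In the notebook, `pred` would typically come from `output.argmax(dim=1)` converted to numpy, and `target` from the Teacher's grayscale label image.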

### Things that can help you
    - Strong Regularization
        1. Weight decay in torch optimizer
        2. Data augmentation
        3. Using pretrained model
        4. More training data from unique scenarios
        5. (Advanced) self-supervised pre-training
        
    - Server GPUs 
        1. Taylor and Cantor - ssh username@taylor.felk.cvut.cz or ssh username@cantor.felk.cvut.cz
        2. Video Tutorial in server.mp4
        3. Text guide on: https://cyber.felk.cvut.cz/cs/study/gpu-servers/
        
### Final Presentation
   
    
    1. Describe your idea in the shared sheet: https://docs.google.com/spreadsheets/d/1rvsg9ZgzmXiVJsiJvnpsy-yQ7N5WqADn10NL235eC1M/edit?usp=sharing
    
    2. Evaluation will take place on the day of the presentations, 17 or 18 December, depending on your parallel
    
### Evaluation

    1. Idea and preparation of data (Unique scenarios, Useful Teacher outputs, amount of training samples)
    
    2. Training - loss minimization, Validity of approach, Tweaks to training (Regularizations, Augmentation, ...)
    
    3. Examples and output overview and discussion
    
    4. Discussion of training times and speed of teacher and inference model. Is it sufficiently fast for the application?
    
    5. Presentation clarity and enthusiasm
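For point 4, a rough timing sketch may help. It assumes the thing being measured can be wrapped in a zero-argument callable; `mean_latency` and `dummy_inference` are hypothetical names, and the dummy workload only stands in for a real forward pass.

```python
import time

def mean_latency(fn, n_warmup=3, n_runs=20):
    """Average wall-clock time of fn() in seconds, after a few warm-up calls."""
    for _ in range(n_warmup):
        fn()
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    return (time.perf_counter() - start) / n_runs

# Stand-in workload; in the notebook this could be a forward pass such as model(x)
def dummy_inference():
    return sum(i * i for i in range(10_000))

latency = mean_latency(dummy_inference)
fps = 1.0 / latency
print(f"{latency * 1000:.2f} ms per call, about {fps:.1f} calls per second")
```

When timing on a GPU, CUDA calls are asynchronous, so calling `torch.cuda.synchronize()` inside the measured callable is usually needed for meaningful numbers.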
    
    

 <img src="username2/rgb/a.png" width="480"> <img src="username2/vis/a.png" width="480"> 

# Teacher Model - source of "ground truth" labels but very slow and only achievable with Facebook-level resources

### Classes
- from https://github.com/cocodataset/panopticapi/blob/master/panoptic_coco_categories.json

- Try out what the model can and cannot segment
    - Online demo: https://segment-anything.com/demo
    - Or try it directly on the server

- See which data the foundation model was trained on
    - https://cocodataset.org/#explore
    
### Running the teacher model
   - Store the images to be annotated on the Taylor server
       - <mark>/local/temporary/UROB/segmentation/students/YOUR_TEAM_NAME/rgb/*.png</mark>
   - Load the necessary modules and install the libraries locally into your user profile
       - source <mark>/local/temporary/UROB/segmentation/SEEM/demo_code/install.sh</mark>
   - Run the teacher model
       - python <mark>/local/temporary/UROB/segmentation/SEEM/demo_code/app.py --username YOUR_TEAM_NAME</mark>
       - If you start a new ssh session, you need to load the modules from install.sh again. You can run its first few commands or source it again.

### See the results
   - In <mark>/local/temporary/UROB/segmentation/students/YOUR_TEAM_NAME/seg/</mark>, you will find the segmentation output resized to <mark>512x512</mark>, with the label encoded as a grayscale value.
   - The grayscale value corresponds to the index in name_list below
   - You can get the segmentation class name with <mark>name_list[i]</mark>, where "i" is the pixel value
   - In <mark>/local/temporary/UROB/segmentation/students/YOUR_TEAM_NAME/vis/</mark>, you will find a human-readable overlay visualization of the segmentation
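The grayscale encoding described above can be decoded along these lines. This is a sketch under assumptions: `classes_in_mask` is a hypothetical helper, the three-entry name list is shortened for illustration, and the synthetic array stands in for a label image loaded from the seg/ folder (for example with `np.asarray(Image.open(...))`).

```python
import numpy as np

def classes_in_mask(mask, names):
    """Map each label index present in the mask to (class name, pixel count)."""
    indices, counts = np.unique(mask, return_counts=True)
    return {int(i): (names[int(i)], int(c)) for i, c in zip(indices, counts)}

# Shortened, hypothetical name list; in the notebook use the full name_list
names = ['person', 'bicycle', 'car']

# Synthetic stand-in for a grayscale label image from the seg/ folder
mask = np.array([[0, 0, 2], [2, 2, 2]], dtype=np.uint8)

summary = classes_in_mask(mask, names)
print(summary)  # {0: ('person', 2), 2: ('car', 4)}
```

If a label image contains a value outside the name list, the lookup raises an IndexError, which is a cheap way to notice unexpected pixel values.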
   
   

<img src="username2/rgb/a.png" width="400"> <img src="username2/seg/a.png" width="400"> <img src="username2/vis/a.png" width="400">

In [None]:
name_list = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush', 'banner', 'blanket', 'bridge', 'cardboard', 'counter', 'curtain', 'door-stuff', 'floor-wood', 'flower', 'fruit', 'gravel', 'house', 'light', 'mirror-stuff', 'net', 'pillow', 'platform', 'playingfield', 'railroad', 'river', 'road', 'roof', 'sand', 'sea', 'shelf', 'snow', 'stairs', 'tent', 'towel', 'wall-brick', 'wall-stone', 'wall-tile', 'wall-wood', 'water-other', 'window-blind', 'window-other', 'tree-merged', 'fence-merged', 'ceiling-merged', 'sky-other-merged', 'cabinet-merged', 'table-merged', 'floor-other-merged', 'pavement-merged', 'mountain-merged', 'grass-merged', 'dirt-merged', 'paper-merged', 'food-other-merged', 'building-other-merged', 'rock-merged', 'wall-other-merged', 'rug-merged']

print(name_list[20])

# Tutorial

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import torch
import torch.nn as nn

# Import the Model

In [None]:
!pip install fastseg --user
from fastseg import MobileV3Small
model = MobileV3Small.from_pretrained()
model

# Visualize Output

In [None]:
from fastseg.image.colorize import colorize, blend

# Open the image from a file and resize it to lower the memory footprint
img = Image.open('image.png').resize((1024, 512))

# Convert the PIL.Image into a numpy array
img_np = np.asarray(img)

# Create a torch tensor from the numpy array and add a leading dimension representing the batch size. Also cast to float, as required by torch
x = torch.tensor(img_np).unsqueeze(0).float()

# Permute the tensor dimensions to follow the torch convention: Batch Size x Channels x Height x Width
x = x.permute(0, 3, 1, 2)

# Normalize data from the range [0, 255] to [-1, 1]
x = (x / 255) * 2 - 1

# Forward pass: input image batch x, output per-pixel class scores (logits) with shape Batch Size x Number of Classes x Height x Width
output = model(x)

# Inspect the output for a single pixel
print("Following is for the first pixel [0,0] of first image in batch [0]: \n")
print('Logits: \n', output[0,:,0,0], '\n')
print('Probabilities: \n', output.softmax(dim=1)[0,:,0,0], '\n')
print('Prediction: \n', output.argmax(dim=1)[0,0,0], '\n')

In [None]:
# Compute the final segmentation prediction as the most probable class along dimension 1 (the class dimension)
# .detach().cpu().numpy() detaches the tensor from the computational graph, moves it to CPU memory and converts it to a numpy array
seg_np = output.argmax(dim=1)[0].detach().cpu().numpy()

# Function from fastseg to visualize images and output segmentation
seg_img = colorize(seg_np) # <---- input is numpy, output is PIL.Image
blended_img = blend(img, seg_img) # <---- input is PIL.Image in both arguments

# Concatenate images for simultaneous view
new_array = np.concatenate((np.asarray(blended_img), np.asarray(seg_img)), axis=1)

# Show image from PIL.Image class
combination = Image.fromarray(new_array)
# combination.show()

# Input Image - Output 
![alt text](input-output.png "i/o")

# Training Data
- Shuffle the data every epoch so the model does not overfit to the examples one by one in a fixed order
- Feed the inputs in batches, so that each update sees as much variety as possible at once, which also helps against overfitting
    - you can use https://pytorch.org/tutorials/beginner/basics/data_tutorial.html
- Use data augmentation to extend the training samples and thereby cover more configurations of objects
    - https://pytorch.org/vision/stable/transforms.html
    - https://pytorch.org/vision/0.12/auto_examples/plot_transforms.html#sphx-glr-auto-examples-plot-transforms-py
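The points above can be combined in a small sketch: shuffling and batching come from `DataLoader`, and the augmentation draws its random decision once so that the image and its mask stay aligned. `SegPairs` is a hypothetical in-memory dataset and the random tensors are placeholders for real images and Teacher labels.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SegPairs(Dataset):
    """Hypothetical dataset: images (N x 3 x H x W) and masks (N x H x W) held in memory."""

    def __init__(self, images, masks, flip_prob=0.5):
        self.images = images
        self.masks = masks
        self.flip_prob = flip_prob

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img, mask = self.images[idx], self.masks[idx]
        # Draw the random decision once and apply it to both tensors
        if torch.rand(1).item() < self.flip_prob:
            img = torch.flip(img, dims=[-1])    # horizontal flip of the image
            mask = torch.flip(mask, dims=[-1])  # identical flip of the mask
        return img, mask

# Placeholder data: 8 random images and label maps with 19 classes
images = torch.rand(8, 3, 32, 64)
masks = torch.randint(0, 19, (8, 32, 64))

# shuffle=True reshuffles the sample order at the start of every epoch
loader = DataLoader(SegPairs(images, masks), batch_size=4, shuffle=True)
for batch_img, batch_mask in loader:
    print(batch_img.shape, batch_mask.shape)
```

A horizontal flip is used here only because it is easy to keep consistent; other geometric augmentations need the same care to transform the mask together with the image.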

In [None]:
# Data augmentations already implemented in torchvision
import torchvision.transforms as T

# Example of rotating the input image to show different "views" of objects
# Warning: each call draws a new random angle, so the image and the segmentation mask are rotated differently!
rotater = T.RandomRotation(degrees=(0, 180))
orig_img = torch.from_numpy(np.asarray(Image.open('image.png').resize((1024, 512)))).permute(2, 0, 1).unsqueeze(0).float()

# Just visualization
seg_img = torch.from_numpy(seg_np).unsqueeze(0).float()
Image.fromarray(rotater(seg_img).to(torch.uint8).numpy()[0]).show()
# Just visualization
rotated_img = rotater(orig_img).to(torch.uint8)
Image.fromarray(rotated_img[0].permute(1,2,0).numpy()).show()

In [None]:
# Cross-entropy loss module. Basically softmax + negative log-likelihood.
# Softmax normalizes the output logits into probabilities between 0 and 1 that sum to 1 over the classes
# Negative log-likelihood penalizes a low probability of the correct class (e.g. 0.1) much more than a high one (e.g. 0.9)
# The weight argument sets the per-class penalization. When most pixels are background, the model tends to overfit to this majority class during training
# You should count the number of pixels per class and pass corresponding per-class weights to the CrossEntropyLoss torch module to avoid this.
CE = torch.nn.CrossEntropyLoss(reduction="none", weight=None)

# Use the model's own argmax predictions as labels (self-produced labels, for demonstration only)
labels = output.argmax(dim=1)

 <img src="neg_log.png" width="400"> 
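The per-class weighting suggested in the loss cell can be sketched as follows. Inverse pixel frequency is only one common choice and is an assumption here, as is the helper name `inverse_frequency_weights`; classes that never occur get weight 0.

```python
import numpy as np

def inverse_frequency_weights(label_maps, num_classes):
    """Per-class weights proportional to 1 / pixel frequency (0 for absent classes)."""
    counts = np.zeros(num_classes, dtype=np.int64)
    for labels in label_maps:
        counts += np.bincount(labels.ravel(), minlength=num_classes)[:num_classes]
    freq = counts / counts.sum()
    weights = np.zeros(num_classes)
    present = freq > 0
    weights[present] = 1.0 / freq[present]
    # Normalize so that the weights of the present classes average to 1
    weights[present] /= weights[present].mean()
    return weights

# Toy label map dominated by class 0 (think: background)
labels = np.array([[0, 0, 0, 0], [0, 0, 1, 2]])
weights = inverse_frequency_weights([labels], num_classes=4)
print(weights)
```

The result could then be passed to the loss as `torch.nn.CrossEntropyLoss(weight=torch.tensor(weights, dtype=torch.float32))`, with `num_classes` matching the number of output channels of the model.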

In [None]:
# Initialize the model; it can be created from the pretrained version (preferred). Here it is constructed directly, for educational purposes
num_classes = 19
model = MobileV3Small(num_classes=num_classes)

# Set the model to training mode (some layers are designed to behave differently during training and during inference, batch norm for example)
# Always train the model in training mode
model.train()

# Set up the optimizer, which updates the weights along the negative gradient of the computed loss
# Weight decay regularization discourages memorizing the training data - an analogy with memorizing the exam questions
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.0001)

# Multiple iterations (epochs)
for e in range(100):
    
    # Forward pass of the model. Input image x, output per-pixel class scores (logits) for each image in the batch
    # output dimensions: Batch Size x Number of Classes x H x W
    output = model(x)
    
    # Calculation of the loss function; we use the pytorch implementation of cross entropy (softmax + negative log-likelihood)
    loss = CE(output, labels)
    
    # Print the loss to monitor the model's performance during training (a metric such as Intersection-over-Union can be printed here as well)
    # Why is there a non-zero loss when learning on the self-produced labels?
    print(f'Epoch: {e:03d}', f'Loss: {loss.mean().item():.4f}')
    
    # This step is the most important. During the forward pass, torch records the performed operations in a computational graph kept in memory
    # Calling backward() computes the gradients of the loss with respect to the weights and accumulates them in each weight's .grad
    loss.mean().backward()
    
    # After we compute the gradients from backward(), each weight in the model will have the .grad value.
    # Optimizer will then use the gradient and learning rate to update the weights
    optimizer.step()
    
    # Check that the model has received gradients and can therefore "learn something"
    if e == 0:
        print("Gradient in the last layer on a specific weight: ", model.last.weight.grad[0,0,0,0])
        
    # Clean already used gradients to start over in the new iteration
    optimizer.zero_grad()

    # Visualization of the model's output at every iteration (the 'overfitting' directory must already exist)
    seg_np = output.argmax(dim=1)[0].detach().cpu().numpy()
    seg_img = colorize(seg_np)
    seg_img.save(f'overfitting/{e:03d}.png')


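# Saving weights (create the target directory first if it does not exist)
import os
os.makedirs('weights', exist_ok=True)
torch.save(model.state_dict(), 'weights/model.pth')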

In [None]:
# Loading trained model
model.load_state_dict(torch.load('weights/model.pth', map_location='cpu'))

# Set the model to eval mode
# Always test the model in eval mode
model.eval()