# Computer vision and deep learning - Laboratory 6

In this last laboratory, we will switch our focus from implementing and training neural networks to developing a machine learning application.
More specifically, you will learn how to convert your saved PyTorch model into a more portable format using TorchScript and how to create a simple demo application for your model.



In [1]:
%pip install gradio
# TorchScript is part of PyTorch (torch.jit), so there is no separate `torchscript` package to install.

Note: you may need to restart the kernel to use updated packages.

## Converting your model into portable TorchScript binaries


``TorchScript`` allows you to create serializable and optimizable models from PyTorch code and then use them in a process where there is no Python dependency.


When deploying our model in production systems, we might need to run it from another programming language (not Python), possibly on mobile or embedded devices. We also need a more lightweight environment than the development one.


Until now, when training a model we've saved checkpoints and reloaded the weights when needed into the development environment. As the name suggests, the checkpoints contain additional information (such as optimizer states) which allows you to resume the training process. However, none of this additional information is required during inference.
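
To make the difference concrete, here is a minimal sketch that compares the size of a full training checkpoint with that of the weights alone. The tiny `nn.Linear` model, the Adam optimizer, and the helper `serialized_size` are hypothetical stand-ins chosen only for illustration; your own checkpoints may store different fields.

```python
import io

import torch
import torch.nn as nn

# Hypothetical stand-in model and optimizer, used only for illustration.
model = nn.Linear(128, 64)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Take one optimization step so that the optimizer actually holds state
# (Adam keeps two moment estimates per parameter).
loss = model(torch.randn(8, 128)).sum()
loss.backward()
optimizer.step()

# A training checkpoint: weights plus everything needed to resume training.
checkpoint = {
    "epoch": 1,
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}


def serialized_size(obj) -> int:
    """Return the number of bytes torch.save writes for obj."""
    buffer = io.BytesIO()
    torch.save(obj, buffer)
    return buffer.getbuffer().nbytes


checkpoint_bytes = serialized_size(checkpoint)
weights_bytes = serialized_size(model.state_dict())

print(f"full checkpoint: {checkpoint_bytes} bytes")
print(f"weights only:    {weights_bytes} bytes")
```

With an optimizer such as Adam, the checkpoint is noticeably larger than the weights, because the optimizer state is stored next to them.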




``TorchScript`` allows you to create a lightweight, self-contained model artifact suitable for deployment via two different techniques: scripting and tracing. Both convert a PyTorch model into a form that can be optimized and deployed.




Tracing captures a model's execution trace: you pass example inputs through the model, and the operations that are executed get recorded. The TorchScript representation is built from these recorded operations. Consequently, tracing does not capture data-dependent control flow: if the model's behavior changes based on the input (for example through ``if`` statements or loops whose length depends on the data), only the path taken for the example inputs is recorded. Tracing works well for models with a fixed sequence of operations, but it can silently produce an incorrect model when the behavior is dynamic.
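
The limitation can be seen on a toy example. The sketch below assumes a hypothetical module, `SignDependent`, whose output depends on the sign of the input sum; it is traced with a positive input and then called with a negative one.

```python
import warnings

import torch
import torch.nn as nn


class SignDependent(nn.Module):
    """Toy module (hypothetical) whose behaviour depends on the input values."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.sum() > 0:
            return x + 10
        return x - 10


model = SignDependent()
positive = torch.ones(3)
negative = -torch.ones(3)

# Tracing runs the model once on the example input and records only the
# operations that were executed (here: the `x + 10` branch).
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the TracerWarning about the `if`
    traced = torch.jit.trace(model, positive)

eager_out = model(negative)    # takes the `x - 10` branch
traced_out = traced(negative)  # replays the recorded `x + 10` branch

print("eager :", eager_out)
print("traced:", traced_out)
```

The traced model returns a different result from the original model for the negative input, because the `if` statement was evaluated once at tracing time and is not part of the recorded graph.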




Scripting, on the other hand, compiles the model's Python source code (including control flow such as loops and ``if`` statements) directly into TorchScript. The result is a static representation of the entire model logic that can be saved and executed where no Python interpreter is available. Because control flow is preserved, scripting handles dynamic models correctly, which makes it more flexible than tracing; the price is that the model code must stick to the subset of Python that TorchScript supports.
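
As a minimal sketch of this behavior, the hypothetical toy module below contains both a data-dependent branch and a loop. After scripting, both branches remain available, and the compiled code can be inspected through the `code` attribute.

```python
import torch
import torch.nn as nn


class SignDependent(nn.Module):
    """Toy module (hypothetical) with a data-dependent branch and a loop."""

    def forward(self, x: torch.Tensor, steps: int) -> torch.Tensor:
        if x.sum() > 0:
            for i in range(steps):
                x = x + 1
        else:
            x = x - 1
        return x


scripted = torch.jit.script(SignDependent())

pos_out = scripted(torch.ones(3), 2)   # loop branch: adds 1 twice
neg_out = scripted(-torch.ones(3), 2)  # else branch: subtracts 1 once

print(pos_out, neg_out)

# The compiled TorchScript code still contains the `if` and the loop.
print(scripted.code)
```

Note that `torch.jit.script` needs access to the source code of the module, so the class should be defined in a file or a notebook cell rather than built dynamically.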


Both techniques aim to transform PyTorch models into TorchScript representations, making them efficient for deployment in various environments or for optimized execution, albeit with different approaches. The choice between scripting and tracing depends on the specific use case, model complexity, and deployment requirements.

You can check out the [documentation](https://pytorch.org/docs/stable/jit.html) for further details on ``TorchScript``.



Below is an example that converts a pre-trained ResNet-18 model from torchvision to TorchScript; the saved TorchScript model is then loaded and used for inference:

In [1]:
import torch
import torchvision.models as models


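# `pretrained=True` is deprecated in recent torchvision versions; use the weights enum instead
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

# Create a sample input tensor (change according to your model's input requirements)
# Note: torch.jit.script does not use it; an example input is only required by torch.jit.trace
example_input = torch.randn(1, 3, 224, 224)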

# Script the model
scripted_model = torch.jit.script(model)

# Save the scripted model to a file
scripted_model.save("scripted_resnet18.pt")


The main steps of the process are:
- load the pre-trained model and set it to evaluation mode with model.eval().
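- create a sample input tensor (example_input) that matches the expected input shape of the model; it is needed for tracing or for testing the converted model, while ```torch.jit.script()``` itself does not take an example input.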
- use ```torch.jit.script()``` to convert the model into a TorchScript representation.
- save the scripted model to a file using ```scripted_model.save()``` for later use or deployment.
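
The same steps can be sketched on a tiny stand-in model, which avoids downloading any weights and makes it easy to check that the saved TorchScript file reproduces the outputs of the original model. The network, the file name `scripted_tiny.pt`, and the input size are assumptions made only for this illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Tiny stand-in classifier (hypothetical), so no weights need to be downloaded.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)
model.eval()

example_input = torch.randn(1, 3, 32, 32)

# Script, save, and reload the model.
scripted_model = torch.jit.script(model)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "scripted_tiny.pt")
    scripted_model.save(path)
    reloaded_model = torch.jit.load(path)

# The reloaded TorchScript model should reproduce the eager outputs.
with torch.no_grad():
    eager_output = model(example_input)
    reloaded_output = reloaded_model(example_input)

outputs_match = torch.allclose(eager_output, reloaded_output, atol=1e-5)
print("outputs match:", outputs_match)
```

A check of this kind is a cheap safeguard before shipping a converted model, especially when the model was traced rather than scripted.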

Now, let's see how you can use the scripted model:

In [31]:
from io import BytesIO

import requests
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models import ResNet18_Weights



# Load the saved TorchScript model
model = torch.jit.load("scripted_resnet18.pt")


preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])


image_url = 'https://images.unsplash.com/photo-1611267254323-4db7b39c732c?q=80&w=1000&auto=format&fit=crop&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxzZWFyY2h8M3x8Y3V0ZSUyMGNhdHxlbnwwfHwwfHx8MA%3D%3D'
response = requests.get(image_url, timeout=30)
response.raise_for_status()  # fail early if the image could not be downloaded
image = Image.open(BytesIO(response.content)).convert('RGB')


input_tensor = preprocess(image)
input_batch = input_tensor.unsqueeze(0)  # Add batch dimension


with torch.no_grad():
    # run the scripted model
    output = model(input_batch)

weights = ResNet18_Weights.DEFAULT

class_names = weights.meta["categories"]

# Get the top 5 predictions
probabilities = torch.nn.functional.softmax(output[0], dim=0)
top5_prob, top5_catid = torch.topk(probabilities, 5)

# Display top 5 predicted classes and their probabilities
for i in range(top5_prob.size(0)):
    class_idx = top5_catid[i].item()
    print(f"Prediction: {class_names[class_idx]}, Probability: {top5_prob[i].item():.4f}")

Prediction: tabby, Probability: 0.7649
Prediction: tiger cat, Probability: 0.1408
Prediction: Egyptian cat, Probability: 0.0876
Prediction: Persian cat, Probability: 0.0024
Prediction: lynx, Probability: 0.0013


Optionally, you can also upload the TorchScript binary to ```wandb```. This links the model that is running in production to the training runs that you logged.

## Creating a simple UI with gradio


[Gradio](https://www.gradio.app/docs/interface) is an open-source Python library used for creating customizable UI components for machine learning models with just a few lines of code. It greatly simplifies the process of building web-based interfaces to interact with ML models without requiring extensive knowledge of web development and allows you to quickly build an MVP and get feedback from the users.


To get an application running, you just need to specify three parameters:
1. the function to wrap the interface around;
2. the desired input components;
3. the desired output components.


This is achieved through the ``gradio.Interface`` class, the central component in gradio, responsible for creating the user interface for your machine learning model.


```
import gradio as gr

# image_classifier is the function that runs your model on the input image
demo = gr.Interface(fn=image_classifier,
                    inputs="image",
                    outputs="label")
```


Once you've defined the ``gr.Interface``, call its ``launch()`` method to start the interface and make it accessible through a web browser.


```
demo.launch()
```


When the launch method is called, ```gradio``` launches a simple web server that serves the demo. If you specify ```share=True``` when calling the launch function, ```gradio``` will also create a public link that anyone can use to access the demo from their browser.


## Simple UI for image classification in gradio

Below you have an example of how you could use ```gradio``` to create a simple UI for an image classification problem.

In [32]:
import numpy as np
import gradio as gr

CLASSES = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

def softmax(x):
    # subtract the maximum for numerical stability
    exp_x = np.exp(x - np.max(x))
    return exp_x / exp_x.sum()


def classify_image(img):
    # TODO run a classification model to get the class scores
    prediction = softmax(np.random.randn(10, ))
    confidences = {CLASSES[i]: float(prediction[i]) for i in range(len(CLASSES))}
    return confidences

ui = gr.Interface(
    fn=classify_image,
    inputs=gr.Image(),
    outputs=gr.Label(num_top_classes=3),
    # TODO replace this path with some images from your device
    examples=['D:\\University-Projects\\Semester5\\Computer Vision and Deep Learning\\lab6\\elephant.jpg'],
)
ui.launch()



Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.




## Accessing the webcam with gradio

The example below takes the input images from your webcam.
The function wrapped by gradio uses a mask to blur the input image outside that mask. If you plan to do background blurring, the mask could be the segmentation mask predicted by your model.
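
As a minimal sketch of that last idea, the snippet below assumes a model that predicts one class index per pixel, with class 0 meaning background (an assumption; your label order may differ). The helper `composite_with_mask` is hypothetical, and a flat grey image stands in for a real blurred copy so that the example only needs NumPy.

```python
import numpy as np


def composite_with_mask(image, blurred, class_mask, background_class=0):
    """Keep `image` where the mask is foreground and `blurred` elsewhere."""
    foreground = class_mask != background_class  # (H, W) boolean mask
    # Add a channel axis so the mask broadcasts over the RGB channels.
    return np.where(foreground[..., np.newaxis], image, blurred)


# Toy 4x4 RGB image; a flat grey image stands in for a real blurred copy.
image = np.full((4, 4, 3), 200, dtype=np.uint8)
blurred = np.full((4, 4, 3), 50, dtype=np.uint8)

# Hypothetical prediction: class 1 in the central 2x2 block, background elsewhere.
class_mask = np.zeros((4, 4), dtype=np.int64)
class_mask[1:3, 1:3] = 1

result = composite_with_mask(image, blurred, class_mask)
print(result[..., 0])
```

The pixels inside the central block keep their original value, while all other pixels come from the stand-in blurred image.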



In [15]:
import cv2
import gradio as gr
import numpy as np

def blur_background(input_image):
    input_image = cv2.cvtColor(input_image, cv2.COLOR_RGB2BGR)

    # Generate a blank mask
    # TODO your code here: call a segmentation model to get predicted mask
    mask = np.zeros_like(input_image)

    # for demo purposes, we are going to create a synthetic segmentation mask:
    #  just a circular blob centered in the middle of the image
    center_x, center_y = mask.shape[1] // 2, mask.shape[0] // 2
    cv2.circle(mask, (center_x, center_y), 100, (255, 255, 255), -1)

    # Convert the mask to grayscale
    mask_gray = cv2.cvtColor(mask, cv2.COLOR_BGR2GRAY)
    mask_gray = mask_gray[:,:,np.newaxis]

    # apply a strong Gaussian blur to the areas outside the mask
    blurred = cv2.GaussianBlur(input_image, (51, 51), 0)
    result = np.where(mask_gray, input_image, blurred)

    # Convert the result back to RGB format for Gradio
    result = cv2.cvtColor(result, cv2.COLOR_BGR2RGB)
    return result


ui = gr.Interface(
    fn=blur_background,
    inputs=gr.Image(sources=["webcam"]),
    outputs="image",
    title="Image segmentation demo!"

)
ui.launch()

Running on local URL:  http://127.0.0.1:7861

To create a public link, set `share=True` in `launch()`.





## Laboratory assignment


Now you have all the knowledge required to build your own ML semantic segmentation application.


1. First, use TorchScript to obtain a model binary.
2. Using gradio, create a simple application that uses the semantic segmentation model that you developed. Feel free to define the scope and the functional requirements of your app.
3. __[Optional, independent work]__ Use a serverless cloud function on [AWS Lambda](https://aws.amazon.com/lambda/) (this requires an Amazon AWS account, for which you need to provide credit card details) to run the prediction and get the results.
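
One possible shape for the first two steps is sketched below. It is only a starting point under stated assumptions: the small `nn.Sequential` network is a hypothetical stand-in for your trained segmentation model, the three-colour `PALETTE` is arbitrary, and `segment` is an illustrative name. Gradio itself is left out of the code so that the inference function can be tested on its own.

```python
import numpy as np
import torch
import torch.nn as nn

# Stand-in for a trained segmentation network (hypothetical): any module that
# maps (N, 3, H, W) images to (N, num_classes, H, W) scores would do.
NUM_CLASSES = 3
network = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, NUM_CLASSES, kernel_size=1),
).eval()

# Step 1: obtain the TorchScript model
# (in the application you would call torch.jit.load on the saved file instead).
scripted_network = torch.jit.script(network)

# One RGB colour per class index (arbitrary palette chosen for illustration).
PALETTE = np.array([(0, 0, 0), (255, 0, 0), (0, 255, 0)], dtype=np.uint8)


def segment(image: np.ndarray) -> np.ndarray:
    """Map an (H, W, 3) uint8 RGB image to an (H, W, 3) uint8 colour mask."""
    tensor = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        scores = scripted_network(tensor.unsqueeze(0))    # (1, C, H, W)
    class_mask = scores.argmax(dim=1).squeeze(0).numpy()  # (H, W)
    return PALETTE[class_mask]


# Step 2: `segment` takes and returns a NumPy image, so it can be wrapped with
# gr.Interface(fn=segment, inputs=gr.Image(), outputs="image").
dummy_image = np.random.randint(0, 256, size=(64, 48, 3), dtype=np.uint8)
colour_mask = segment(dummy_image)
print(colour_mask.shape, colour_mask.dtype)
```

Keeping the inference function free of UI code makes it easier to reuse the same function later, for example inside a serverless handler.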


Congratulations, you've just completed all the practical work for Computer Vision and Deep Learning!
May your data always be clean, your models accurate, and your code bug-free!





In [2]:
import torch
import torch.nn as nn

class UNet(nn.Module):
    def __init__(self):
        super(UNet, self).__init__()
        # Encoder
        self.ec11 = nn.Conv2d(3, 64, kernel_size=3, padding=1)
        self.ec12 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
        self.mp1 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.ec21 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.ec22 = nn.Conv2d(128, 128, kernel_size=3, padding=1)
        self.mp2 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.ec31 = nn.Conv2d(128, 256, kernel_size=3, padding=1)
        self.ec32 = nn.Conv2d(256, 256, kernel_size=3, padding=1)
        self.mp3 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.ec41 = nn.Conv2d(256, 512, kernel_size=3, padding=1)
        self.ec42 = nn.Conv2d(512, 512, kernel_size=3, padding=1)
        self.mp4 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.ec51 = nn.Conv2d(512, 1024, kernel_size=3, padding=1)
        self.ec52 = nn.Conv2d(1024, 1024, kernel_size=3, padding=1)

        # Decoder
        self.upconv4 = nn.ConvTranspose2d(1024, 512, kernel_size=2, stride=2)
        self.dc41 = nn.Conv2d(1024, 512, kernel_size=3, padding=1)
        self.dc42 = nn.Conv2d(512, 512, kernel_size=3, padding=1)

        self.upconv3 = nn.ConvTranspose2d(512, 256, kernel_size=2, stride=2)
        self.dc31 = nn.Conv2d(512, 256, kernel_size=3, padding=1)
        self.dc32 = nn.Conv2d(256, 256, kernel_size=3, padding=1)

        self.upconv2 = nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2)
        self.dc21 = nn.Conv2d(256, 128, kernel_size=3, padding=1)
        self.dc22 = nn.Conv2d(128, 128, kernel_size=3, padding=1)

        self.upconv1 = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
        self.dc11 = nn.Conv2d(128, 64, kernel_size=3, padding=1)
        self.dc12 = nn.Conv2d(64, 64, kernel_size=3, padding=1)

        self.final_conv = nn.Conv2d(64, 3, kernel_size=1)

    def forward(self, x):
        # torch.relu is used instead of building a new nn.ReLU module on every call
        # Encoder
        ec11_out = torch.relu(self.ec11(x))
        ec12_out = torch.relu(self.ec12(ec11_out))
        pool1_out = self.mp1(ec12_out)

        ec21_out = torch.relu(self.ec21(pool1_out))
        ec22_out = torch.relu(self.ec22(ec21_out))
        pool2_out = self.mp2(ec22_out)

        ec31_out = torch.relu(self.ec31(pool2_out))
        ec32_out = torch.relu(self.ec32(ec31_out))
        pool3_out = self.mp3(ec32_out)

        ec41_out = torch.relu(self.ec41(pool3_out))
        ec42_out = torch.relu(self.ec42(ec41_out))
        pool4_out = self.mp4(ec42_out)

        ec51_out = torch.relu(self.ec51(pool4_out))
        ec52_out = torch.relu(self.ec52(ec51_out))

        # Decoder: upsample, concatenate the matching encoder features, convolve
        upconv4_out = self.upconv4(ec52_out)
        cat4 = torch.cat([ec42_out, upconv4_out], dim=1)
        dc41_out = torch.relu(self.dc41(cat4))
        dc42_out = torch.relu(self.dc42(dc41_out))

        upconv3_out = self.upconv3(dc42_out)
        cat3 = torch.cat([ec32_out, upconv3_out], dim=1)
        dc31_out = torch.relu(self.dc31(cat3))
        dc32_out = torch.relu(self.dc32(dc31_out))

        upconv2_out = self.upconv2(dc32_out)
        cat2 = torch.cat([ec22_out, upconv2_out], dim=1)
        dc21_out = torch.relu(self.dc21(cat2))
        dc22_out = torch.relu(self.dc22(dc21_out))

        upconv1_out = self.upconv1(dc22_out)
        cat1 = torch.cat([ec12_out, upconv1_out], dim=1)
        dc11_out = torch.relu(self.dc11(cat1))
        dc12_out = torch.relu(self.dc12(dc11_out))

        final_out = self.final_conv(dc12_out)

        return final_out

model = UNet()

In [5]:
# Load the trained weights (the same checkpoint file that the demo cell further below loads)
checkpoint = torch.load("model_epoch_10.pth", map_location="cpu")
model.load_state_dict(checkpoint)
model.eval()

In [7]:
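from PIL import Image
from torchvision import transforms

# Resize, crop to 224x224, and convert to a float tensor scaled to [0, 1]
transformations = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])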

def preprocess_image(image):
    image = Image.fromarray(image)

    image = transformations(image)

    image = image.unsqueeze(0)
    return image

In [8]:
def predict_segmentation(model, image):
    model.eval()
    with torch.no_grad():
        prediction = model(image)  # (1, num_classes, H, W) class scores

    # pick the highest-scoring class for every pixel and drop the batch dimension
    predicted_mask = prediction.argmax(dim=1).squeeze(0)

    return predicted_mask

In [9]:
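def visualize_segmentation(predicted_mask):
    # one RGB color per class index; uint8 so the result is a valid image
    label_colors = np.array([(255, 0, 0),
                             (0, 255, 0),
                             (0, 0, 255)], dtype=np.uint8)

    # integer-array indexing maps every class index to its color: (H, W) -> (H, W, 3)
    colored_mask = label_colors[predicted_mask]

    return colored_mask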

In [10]:
import gradio as gr
import numpy as np
import torch

model = UNet()
# map_location="cpu" lets the checkpoint load on machines without a GPU
model.load_state_dict(torch.load("model_epoch_10.pth", map_location="cpu"))
model.eval()

def inference(input_image):
    # Gradio may pass None (for example when the input is cleared)
    if input_image is None:
        return None

    # numpy image -> (1, 3, 224, 224) tensor scaled to [0, 1]
    input_tensor = preprocess_image(input_image)

    # (H, W) tensor of predicted class indices
    predicted_mask = predict_segmentation(model, input_tensor)

    # class indices -> RGB image that Gradio can display
    return visualize_segmentation(predicted_mask.numpy())

iface = gr.Interface(
    fn=inference,
    inputs=gr.Image(sources=["webcam"]),
    outputs="image",
    live=True,
    title="Real-time Image Segmentation (Background, Hair, Face)"
)

iface.launch()

Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
