## Do you see what I see? Using PyTorch Deep Learning Models

In this short project I start exploring pretrained deep learning models for object detection and image classification. I will mainly use PyTorch, and I will bring in MATLAB and Simulink for additional capabilities when exploring ways to improve the model.

In [1]:
import os
import numpy as np
import cv2  #pip install opencv-python
from PIL import Image, ImageDraw

### 1) Load an image and display it
Let's start with a very simple image of a street that includes several objects. Let's load it, convert it to grayscale, resize it to 224x224, and display the result:

In [2]:
# Load an image with OpenCV
cv2_image = cv2.imread("image_street.jpg")

# Convert the image to grayscale
grayscale_image = cv2.cvtColor(cv2_image, cv2.COLOR_BGR2GRAY)

# Resize the image
resized_image = cv2.resize(grayscale_image, (224, 224))

# Convert back to RGB for display
resized_image = cv2.cvtColor(resized_image, cv2.COLOR_GRAY2RGB)

# Display the processed image using PIL
processed_image = Image.fromarray(resized_image)
processed_image.show()
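To make the grayscale step less of a black box, here is a minimal sketch of the weighted sum that OpenCV documents for its RGB/BGR-to-gray conversion (Y = 0.299 R + 0.587 G + 0.114 B). The helper name `bgr_to_gray` is hypothetical and works on a single pixel in plain Python; it is meant as an illustration, not as a replacement for `cv2.cvtColor`.

```python
def bgr_to_gray(pixel):
    """Return the gray level of one 8-bit (B, G, R) pixel.

    Uses the luma weights documented by OpenCV for color-to-gray conversion.
    `bgr_to_gray` is a hypothetical helper used only for illustration.
    """
    b, g, r = pixel
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


# OpenCV stores channels in B, G, R order, so pure red is (0, 0, 255).
print(bgr_to_gray((255, 255, 255)))  # white
print(bgr_to_gray((0, 0, 255)))      # pure red
print(bgr_to_gray((255, 0, 0)))      # pure blue
```

Green contributes most to the result and blue least, which is why a saturated blue object looks much darker than a saturated green one after conversion.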

### 2) Use a pretrained PyTorch model to detect and classify objects in the image
First, I'd like to use a pretrained model to detect and label all the objects in the image. I've heard that Mask R-CNN with a ResNet-50 backbone is an architecture that can help me with this:

In [3]:
import torch
from torchvision import models, transforms

# Load the pre-trained Mask R-CNN model
# Note: newer torchvision releases prefer the `weights=` argument over `pretrained=True`
model = models.detection.maskrcnn_resnet50_fpn(pretrained=True)

model.eval()

# Image transformation
transform = transforms.Compose([
    transforms.ToTensor(),
])

# Load the processed image
image_tensor = transform(processed_image).unsqueeze(0)

# Perform inference
with torch.no_grad():
    predictions = model(image_tensor)

# Draw bounding boxes
draw = ImageDraw.Draw(processed_image)
# Convert the box tensor to plain Python floats before handing it to PIL
for x_min, y_min, x_max, y_max in predictions[0]['boxes'].tolist():
    draw.rectangle([(x_min, y_min), (x_max, y_max)], outline="red", width=3)

# Show the image with bounding boxes
processed_image.show()
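The loop above draws every box the model returns, including low-confidence ones. torchvision detection models return, for each image, a dictionary with `boxes`, `labels`, and `scores`, so one common refinement is to keep only detections above a score threshold. Below is a minimal sketch in plain Python, assuming the tensors have already been converted with `.tolist()`. The helper name `filter_detections`, the 0.5 threshold, and the sample values are all made up for illustration.

```python
def filter_detections(boxes, scores, labels, min_score=0.5):
    """Keep only the detections whose score is at least `min_score`.

    `boxes` is a list of [x_min, y_min, x_max, y_max] lists, while `scores`
    and `labels` are lists of the same length. Hypothetical helper.
    """
    kept = []
    for box, score, label in zip(boxes, scores, labels):
        if score >= min_score:
            kept.append({"box": box, "score": score, "label": label})
    return kept


# Made-up values standing in for predictions[0]['boxes'].tolist(), etc.
boxes = [[10.0, 20.0, 110.0, 200.0], [5.0, 5.0, 30.0, 40.0], [50.0, 60.0, 90.0, 120.0]]
scores = [0.98, 0.12, 0.67]
labels = [10, 1, 3]

confident = filter_detections(boxes, scores, labels, min_score=0.5)
for detection in confident:
    print(detection)
```

The right threshold depends on the image and the use case; a higher value gives fewer but more reliable boxes.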


### 3) Use a pretrained image classification model to obtain one label for the image
In the previous test I realized that what I need is image classification rather than detection. Classification is faster and gives just one prediction per image. I'll test ResNet-50, then trace the model to get a TorchScript version that PyTorch's just-in-time compiler can optimize and that I can save to disk:

In [4]:
import torch
from torchvision import models, transforms
from PIL import Image
import requests

# Load the pre-trained ResNet50 model
model = models.resnet50(pretrained=True)
model.eval()

# Define image transformation: resize, center crop, normalize, and convert to tensor
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
       mean=[0.485, 0.456, 0.406],
       std=[0.229, 0.224, 0.225]
    ),
])

# Load the image from a local file and force three channels,
# since Normalize below expects an RGB image
image_path = "image_street.jpg"
image = Image.open(image_path).convert("RGB")

# Apply the transformations to the image
image_tensor = transform(image).unsqueeze(0) # Add batch dimension

# Perform inference with the model
with torch.no_grad():
    outputs = model(image_tensor)

# Get the predicted class index
_, predicted_idx = outputs.max(1)

# Load the ImageNet labels from an online resource (one class name per line)
LABELS_URL = "https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt"
response = requests.get(LABELS_URL, timeout=10)
response.raise_for_status()  # Fail early if the download did not succeed
labels = response.text.splitlines()

# Print the top predicted label
predicted_label = labels[predicted_idx.item()]
print(f"Predicted Class: {predicted_label}")

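# Create a dummy input with the expected shape (batch, channels, height, width) for tracing
temp_input = torch.randn(1, 3, 224, 224)

# Trace the model to record its operations as a TorchScript graph
traced_model = torch.jit.trace(model, temp_input)

# Save the traced model to disk
traced_model.save("resnet50_traced.pt")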

Predicted Class: traffic light
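The cell above reports only the single highest-scoring class. The model's raw outputs are unnormalized scores (logits), so it can be useful to turn them into probabilities with a softmax and look at the top few candidates. Here is a minimal sketch in plain Python, assuming the logits have been converted to a list (for example with `outputs[0].tolist()`). The helper name `top_k_softmax` and the sample logits are hypothetical.

```python
import math


def top_k_softmax(logits, k=3):
    """Return the k most probable (index, probability) pairs.

    Applies a numerically stable softmax to a list of logits.
    Hypothetical helper used only for illustration.
    """
    largest = max(logits)
    # Subtracting the maximum avoids overflow in exp without changing the result
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    probs = [value / total for value in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


# Made-up logits for a tiny three-class example
for index, prob in top_k_softmax([1.0, 3.0, 2.0], k=2):
    print(f"class {index}: {prob:.3f}")
```

With the real model, the same result can be obtained directly on tensors with `torch.softmax(outputs, dim=1)` followed by `.topk(5)`, and each returned index can be looked up in `labels`.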


### 4) Let's bring in MATLAB for additional exploration and capabilities

In [5]:
import matlab.engine

In [6]:
# Start a new MATLAB session with the desktop visible
eng = matlab.engine.start_matlab('-desktop')

As a quick test of the engine, let's find the greatest common divisor of two numbers with the gcd function. Setting nargout=3 returns all three output arguments of gcd: the divisor G and the coefficients U and V that satisfy G = A*U + B*V.

In [7]:
t = eng.gcd(100.0,80.0,nargout=3)
print(t)

(20.0, 1.0, -1.0)
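To sanity-check that result without MATLAB, here is a minimal sketch of the extended Euclidean algorithm in plain Python. It returns a divisor `g` and coefficients `u`, `v` with `g == a*u + b*v`, the same relationship MATLAB documents for the three outputs of gcd. The function name `extended_gcd` is hypothetical. Such coefficients are not unique, so for other inputs this sketch may return a different valid pair than MATLAB does.

```python
def extended_gcd(a, b):
    """Return (g, u, v) such that g = gcd(a, b) and g == a*u + b*v.

    Iterative extended Euclidean algorithm for non-negative integers.
    Hypothetical helper used only for illustration.
    """
    old_r, r = a, b
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_u, u = u, old_u - quotient * u
        old_v, v = v, old_v - quotient * v
    return old_r, old_u, old_v


g, u, v = extended_gcd(100, 80)
print(g, u, v)
print(100 * u + 80 * v == g)  # The identity g == a*u + b*v holds
```

For 100 and 80 the identity reads 20 = 100*1 + 80*(-1), which agrees with the tuple returned by the engine.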


I can also open the documentation from here at any point:

In [8]:
eng.doc(nargout=0)

I can also call apps and tools from the MATLAB ecosystem. For example, let's take a look at the Deep Network Designer to examine the architecture of the ResNet 50 traced model I saved earlier:

In [None]:
eng.deepNetworkDesigner(nargout=0)

I can also access the MATLAB workspace and manipulate variables from my Python code:

In [None]:
import numpy as np
from PIL import Image
image = Image.open("image_street.jpg")
eng.workspace["image_array"] = np.array(image)

Now, rather than converting the PyTorch model, I'd like to test a model directly in Simulink. First, let's load the model I want to use, for instance MNASNet, which was designed to balance accuracy and latency:

In [None]:
# Load the pre-trained MNASNet model
model = models.mnasnet1_0(pretrained=True)

# Save the original (untraced) model object, architecture and weights together
torch.save(model, 'mnasnet1_0.pt')

I could start with an empty Simulink model, or I could use the documentation example I just saw, and edit it as needed:

In [None]:
import os
cwd = os.getcwd()
eng.openExample("deeplearning_shared/ClassifyImagesPyTorchModelPredictBlockExample","workDir",cwd,nargout=0)

When finished, call either the exit or the quit function to stop the MATLAB engine:

In [None]:
eng.quit()

This is just the start. Once I'm more familiar with transfer learning to expand the use of the pretrained models, I could use other sources of data for model training.