<a href="https://colab.research.google.com/github/SJCAAT/cv_workshop/blob/main/workshop.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# **Computer Vision Workshop I&D 2025**
Welcome to the workshop brought to you by members of the Computer Vision Capability group!

In this hands-on session, we'll explore how computers can "see" the world using machine learning techniques. You'll get an introduction to Computer Vision, learn how object detection works and use a powerful, real-time model called **YOLOv8 (You Only Look Once version 8)** to detect objects in images and video.

In this workshop we will be focussing on two main things:

**1) Image classification**

Here we implement a pre-trained model to classify images. You will have to import the necessary libraries and ensure the code refers to the correct images to get the model to work.

**2) Live detection**

After completing the image classification task, we will take it a step further by fine-tuning our model and using it for real-time object detection with our webcams.


By the end of this workshop you'll be able to:
- Understand basic concepts in computer vision.
- Perform object detection with YOLOv8 using the Ultralytics library.
- Run inference on static images and live webcam feeds.
- Explore how to fine-tune YOLOv8 on your own dataset.

Whether you're a beginner or have some experience with machine learning, this workshop is designed to be approachable and practical. 

Let's get started!

# 1) Image Classification
First, we import all the necessary libraries. Many of these are only needed to display images and use the webcam in a notebook, but the main library we are interested in is Ultralytics, which gives us access to a pre-trained model.

In [None]:
!pip install ultralytics
from ultralytics import YOLO

In [None]:
from io import BytesIO
from PIL import Image as pil_img
from IPython.display import Image, display, Javascript, update_display
from google.colab.output import eval_js
from google.colab import files # for uploading images to test on
from base64 import b64decode
import numpy as np

Now that we have imported all the necessary modules, we can load a test image. The cell below lets you upload any image stored on your local system, for example one you downloaded from the internet.
Alternatively, you may use one of our sample images. We have put a couple of images in the GitHub repository that you can download to your local machine and then upload.

In [None]:
# Upload your own image
uploaded = files.upload()
image_path = list(uploaded.keys())[0]
print(f"Uploaded image: {image_path}")
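
To make the upload cell a little less magical: `files.upload()` returns a dictionary whose keys are the uploaded file names and whose values are the file contents as bytes. The sketch below imitates that with a plain dictionary, so it runs without Colab. The file name and the bytes are made up for illustration.

```python
# Stand-in for the dictionary returned by files.upload():
# keys are file names, values are the raw file contents as bytes.
# The file name and bytes here are made up for illustration.
uploaded = {"street_scene.jpg": b"\xff\xd8\xff\xe0 fake jpeg bytes"}

# Same selection logic as the upload cell: take the first uploaded file name.
image_path = list(uploaded.keys())[0]

# The value stored under that key is the file content.
size_in_bytes = len(uploaded[image_path])
print(f"Uploaded image: {image_path} ({size_in_bytes} bytes)")
```

If you upload several files at once, only the first file name ends up in `image_path`.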

In our import statements we imported the 'Image' module from the PIL library as 'pil_img', which allows us to work with images. To actually read an image, this module provides the '.open()' function, which takes the path to an image file as input.

In the cell below, implement this function using the 'image_path' that you created in the previous cell.

Note: you only need to fill in the first line ('img = .......'). The second command ('img') is then used to display what you just entered.

In [None]:
# Assign the image to a variable
img = pil_img.
# Display your image
img

If all went well you should see your test image above. With our test image ready we can use a pre-trained YOLO model to classify objects within that image.

We start with choosing the model type. In this session we will be using a pre-trained YOLOv8 model from the ultralytics library. We need to start by initializing the model. Use the cell below as a starting point and complete it appropriately:

Hint: You need to load the pre-trained weights for YOLOv8, specifically the 'yolov8n.pt' file, using the YOLO() class.

In [None]:
# Load a pretrained YOLOv8 model
model = 'your_code_here'

Now that we have seen our image and loaded our pretrained model, let's use this model to do object recognition on our selected image.
In the cell below, apply the model to our chosen image and store the output in the 'results' variable.

We then immediately visualize the results with the command: 'results[0].show()'

Hint: instead of using pil_img.open() on our image_path, we now use model().

In [None]:
# Apply the model to the image
results = 'your_code_here'

# Visualize the model predictions
results[0].show()

Hooray! If you got this far, it means you've successfully used a YOLO model to detect and classify objects in a static image.
If there is spare time, feel free to test some more images that you can find on the web. Remember to change the image_path accordingly!

# 2) Live Detection

Now that we've seen the very basics of computer vision, let's take it a step further.

In this scenario we are going to attempt to improve the model we just used on our initial images, and prepare it for live detection!

To fine-tune the model we need a dataset of images to train on. Fortunately, plenty of these exist and we can simply use the built-in COCO128 dataset (coco128.yaml). Due to our time constraints we limit the number of epochs (an epoch is one complete pass through the entire training dataset) to three.

Using the same model as before, use the '.train' command on the coco128.yaml dataset for 3 epochs.

Hint: the '.train' method is a function that initiates the training process for the model. It can take several arguments but in our case we are only interested in the "data" and "epochs" arguments.
 - The 'data' argument takes a string as input.
 - The 'epochs' argument takes an integer as input.



In [None]:
# Train the model on the coco128 dataset
model.train(data='', epochs=)

The beauty of YOLO is that we don't have to search for the best model ourselves. Instead, it automatically stores the best weights it found during training, giving us easy access.

You can inspect the results of training in 'runs/detect/train/results.csv'.

If you notice a high loss and a low mAP50 (mean average precision at an IoU threshold of 0.5), this is an indication that training did not go well and the model may not have actually improved.
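
As a rough sketch of how you could check this in code rather than by eye, the snippet below compares the first and last epoch of a results file. The column names `train/box_loss` and `metrics/mAP50(B)` are assumptions based on typical Ultralytics output and may differ in your version, the helper `summarize_training` is hypothetical, and the numbers in `sample_csv` are made up.

```python
import csv
import io

def summarize_training(csv_text: str) -> dict:
    """Return first and last epoch values of one loss column and the mAP50 column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Strip whitespace from header names and values; some versions pad them.
    rows = [{k.strip(): v.strip() for k, v in row.items()} for row in reader]
    first, last = rows[0], rows[-1]
    return {
        "epochs": len(rows),
        # Column names below are assumptions; check the header of your own file.
        "box_loss_first": float(first["train/box_loss"]),
        "box_loss_last": float(last["train/box_loss"]),
        "map50_first": float(first["metrics/mAP50(B)"]),
        "map50_last": float(last["metrics/mAP50(B)"]),
    }

# Made-up numbers for illustration only, not real training output.
sample_csv = """epoch, train/box_loss, metrics/mAP50(B)
1, 1.25, 0.60
2, 1.20, 0.62
3, 1.15, 0.65
"""

summary = summarize_training(sample_csv)
# Training "went well" here means: loss went down and mAP50 went up.
improved = (summary["box_loss_last"] < summary["box_loss_first"]
            and summary["map50_last"] > summary["map50_first"])
print(summary, "improved:", improved)
```

To try it on your own run, pass in the contents of 'runs/detect/train/results.csv' instead of `sample_csv`. With only three epochs, small changes in either direction are to be expected.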

We assign our 'fine_tuned_model' to use the weights from our training process. The best weights can be found in the 'runs/detect/train/weights/best.pt' file.

If the results look promising, use the cell below to select the best weights so we can apply it for live detection!

If you notice a high loss and a low mAP50 and there is enough time, retrain the model, select the weights from the second training run, and adjust the path in the cell below accordingly (runs/detect/train2/weights/best.pt).
If there is not enough time to retrain, you may simply use the original pre-trained model we used in part 1.

Hint: instead of using the pretrained weights like before: YOLO('yolov8n.pt'), we adjust it to include our trained weights instead.

In [None]:
# Select the best weights found in training
fine_tuned_model = YOLO('your_code_here')

Now that we (hopefully) have our model fine-tuned, we can use it for live object detection using our webcams!

The cells below may look a bit daunting, but we do not expect you to fully understand what is going on. They are mainly there to make the webcam work inside the notebook. You may inspect the code if you want, but we have filled in everything so that it should work by simply executing the cells.


In [None]:
def take_photo(display_id: str, quality: float = 0.8) -> bytes:
    # js snippet to capture frame from webcam
    js = Javascript('''
        async function takePhoto(quality) {
            const div = document.createElement('div')
            const video = document.createElement('video')
            const stream = await navigator.mediaDevices.getUserMedia({video:true});

            div.appendChild(video);
            video.srcObject = stream;
            await video.play();
                    
            const canvas = document.createElement('canvas');
            canvas.width = video.videoWidth;
            canvas.height = video.videoHeight;
            canvas.getContext('2d').drawImage(video, 0, 0);
            stream.getVideoTracks()[0].stop();
            div.remove();
            return canvas.toDataURL('image/jpeg', quality);
        }
    ''')

    # evaluate js and retrieve returned binary image
    display(js, display_id=display_id)
    data = eval_js('takePhoto({})'.format(quality))
    binary = b64decode(data.split(',')[1])
    return binary
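
For the curious: the JavaScript function hands the frame back as a data URL, a string of the form `data:image/jpeg;base64,<payload>`, and the last two lines of `take_photo` turn that string back into raw bytes. The sketch below walks through the same decoding steps without a webcam. The bytes in `fake_jpeg` are made up and are not a valid image.

```python
from base64 import b64decode, b64encode

# Made-up payload standing in for JPEG bytes from the webcam.
fake_jpeg = b"\xff\xd8\xff\xe0example"
# Build a data URL shaped like the one canvas.toDataURL('image/jpeg') returns.
data_url = "data:image/jpeg;base64," + b64encode(fake_jpeg).decode("ascii")

# Same decoding steps as take_photo: drop the header, decode the rest.
header, payload = data_url.split(",", 1)
binary = b64decode(payload)

print(header)
print(binary == fake_jpeg)
```

Splitting on the first comma works because the base64 payload itself never contains a comma.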

In [None]:
def infer_image(model: YOLO, binary_img: bytes) -> pil_img.Image:
    # Decode the JPEG bytes into an image array and run inference on it
    img = np.array(pil_img.open(BytesIO(binary_img)))
    results = model(img, verbose=False)
    # With pil=True, plot() returns the annotated frame as a PIL image
    return results[0].plot(pil=True)

NOTE: If the training did not go well and your fine_tuned_model's performance is not an improvement, simply change the selected model in the cell below to be the pre-trained version we used on our initial image.

In line 4, where we see: model = fine_tuned_model, change fine_tuned_model to YOLO('yolov8n.pt').

In [None]:
# update the same display
display_id = 'sample_display'
# load your fine tuned model. If you are using the pre-trained model without fine tuning - change to: model = YOLO('yolov8n.pt')
model = fine_tuned_model 

while True:
    try:
        # read a frame from the webcam
        binary_img = take_photo(display_id='sample_display_2')
        # run inference on the frame
        result = infer_image(model, binary_img=binary_img)

        # show the frame with the inference results
        display(result, display_id=display_id)
    except Exception as err:
        # show error if user does not have a webcam or did not grant page permission
        print(str(err))
        # stop instead of retrying forever with the same error
        break