**Authors**:  

- Dr. Chloe Game ([CGame1](https://github.com/CGame1)) – University of Bergen (UiB)
- Dr. Nils Piechaud ([Npiechaud](https://github.com/Npiechaud)) – Institute of Marine Research (IMR)

# Deploying YOLO image classifier

This notebook contains Python code to deploy a [YOLO](https://docs.ultralytics.com/tasks/classify/) model to classify a set of images.

To simplify the process for inexperienced Python users, this code can be run online in Google Colab, though there are some [caveats](#caveats). This code can also be run in Jupyter Notebook or Visual Studio Code. However, this requires hardware (a GPU) and some additional setup that is not currently covered in this notebook.

> **Note:** This notebook is a supplement to the paper (TBC) that will guide a user through automated image classification with Machine Learning. Please refer to the paper for more details on terminology, methods, results and analysis.

![image](https://github.com/CGame1/Img_classificaton_guide/blob/main/docs/workflow.png?raw=true)
*Figure 1: Simplified and idealized diagram of an ML scenario. Each box represents a key task and corresponds to a section of this paper to aid comprehension.  While presented largely linearly for clarity, real-world ML workflows are often iterative and non-linear and the need to revisit specific sections may vary depending on the scenario.*

## Requirements

To run this notebook, you require the following:

1. A trained model as a `.pt` file that Colab can access, so it must either be online or uploaded to the Colab virtual machine (VM).
2. Some images to make predictions on, all in one directory that Colab can access.
3. A **Google account** (if using Google Colab) - If you do not have one, you can create one [here](https://accounts.google.com/signup).
4. **Storage space** for saving outputs - If permitted in Colab, these can be saved to your Google Drive. As a default, Google Drive contains 15 GB of free storage.

    - The models themselves are light (on the order of 100 MB at most).
    - The new images can take up space; how much depends on your dataset.
    - Users can also choose to export the images with the prediction overlay, which requires additional space.
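
Before starting, it can help to confirm that the first two requirements are met. The following is a minimal sketch, not part of the original workflow: the function name `check_requirements` and the list of image extensions are assumptions made for illustration, and the paths you pass in are your own.

```python
from pathlib import Path

# Assumed list of common image extensions; adjust to your own data.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def check_requirements(weights_path, image_dir):
    """Return a list of problems found; an empty list means both inputs look fine."""
    problems = []

    # Requirement 1: a trained model stored as a .pt file
    weights = Path(weights_path)
    if weights.suffix.lower() != ".pt":
        problems.append(f"Model file should be a .pt file: {weights}")
    elif not weights.is_file():
        problems.append(f"Model file not found: {weights}")

    # Requirement 2: one directory that contains the images
    folder = Path(image_dir)
    if not folder.is_dir():
        problems.append(f"Image directory not found: {folder}")
    elif not any(p.suffix.lower() in IMAGE_EXTENSIONS for p in folder.iterdir()):
        problems.append(f"No images found in: {folder}")

    return problems
```

Calling `check_requirements("path/to/best.pt", "path/to/images")` with your own paths returns an empty list when both the model file and at least one image are found.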


> **Tip:** It is recommended to first try the notebook with the public data. You can then try with your own data.

<mark>**Next Step:**</mark> You can now proceed to [How to use this notebook](#how-to-use-this-notebook)

## Contents

- [How to use this notebook](#how-to-use-this-notebook)
- [1. Prepare computer session](#1-prepare-computer-session-run-everytime) - **Run this every time**
- [2. Get model and make predictions](#2-get-model-and-make-predictions)
- [7. Manual Export from Colab](#7-manual-export-from-colab)
- [8. Troubleshooting](#8-troubleshooting)



## How to use this notebook


1. Click the **Open in Colab** button at the top of this notebook to open the notebook in Google Colab. This will open a new tab in your browser with the notebook loaded. If you are not already signed into Google, you will be prompted to do so.

![image](https://github.com/CGame1/Img_classificaton_guide/blob/main/docs/google_sign_in.png?raw=true)

*Figure 2: Google sign-in warning message.*




2. Follow the instructions in the notebook to prepare your computer session, load the model, make predictions on your images, and export the results.

3. Work linearly, using the play button on the left side of each cell to run it (see [How to run cells](https://www.youtube.com/watch?v=rsBiVxzmhG0)). It is not recommended to run all cells at once, in case of errors. A green tick will appear next to the cell when it has run successfully.




> **Note:** Colab will warn you about running this notebook as it is not authored by Google. This is a helpful reminder to check what public notebooks are doing before running them.

![image](https://github.com/CGame1/Img_classificaton_guide/blob/main/docs/warning_colab.png?raw=true)

*Figure 3: Colab warning message.*




4. To navigate through the notebook, you can use the **Table of Contents** on the left pane to jump to different sections if working in Colab. In Jupyter you can use the internal links in the notebook to jump to specific sections, for example in the **Contents** section above.

<img src="https://github.com/CGame1/Img_classificaton_guide/blob/main/docs/table_contents.PNG?raw=true" width="400">

*Figure 4: Location of Table of Contents in Colab*



5. **Very Important:** The only inputs you **must** make or verify are shown as a ***text field*** or ***option buttons***. Run the code cells below to see examples.  

In [None]:
import ipywidgets as widgets
from IPython.display import display

# Display UI
display(widgets.Text(value="This is a text field", layout=widgets.Layout(width="50%")))

In [None]:
import ipywidgets as widgets
from IPython.display import display

# Create a toggle button widget
toggle = widgets.ToggleButtons(
    options=["Option 1", "Option 2"],
    description="Select:",
    style={"description_width": "initial"},
)

# Display the widget
display(toggle)

> **Tip:** If you run into problems, please refer to the [8. Troubleshooting](#8-troublehooting) section at the end of this notebook.

<mark>**Next Step:**</mark> You can now proceed to [1. Prepare computer session](#1-prepare-computer-session-run-everytime)

## 1. Prepare computer session **(Run everytime)**

### 1.1 Detect & Enable GPU

This notebook is designed to run on a GPU for fast processing. If you are using Google Colab, a GPU should be detected automatically if one is available. Paying for the Pro version of Colab gives you access to more powerful GPUs.

In [None]:
import torch

if not torch.cuda.is_available():
    print("Warning: Not connected to a GPU! Predictions will be slower.")
    device = None
else:
    print(f"Connected to GPU: {torch.cuda.get_device_name(0)}")
    device = torch.cuda.current_device()


### 1.2. Install libraries (packages)

Google Colab has many libraries pre-installed. However, some of the packages used in this notebook may not be available. When running in Colab, this code installs `ultralytics` and then imports all the libraries used in this notebook.

> **Note:** This may take a few minutes

In [None]:
import sys

# Code for Colab. If running in jupyter notebook/rstudio/vs code, you will need to
# link to your own environment with these packages installed.
if "google.colab" in sys.modules:
    print("Running in Colab")

    try:
        !pip install ultralytics  # this will look like an error outside of colab. Just ignore.

        from google.colab import drive, files

    except Exception as err:
        # Report the actual problem instead of silently swallowing it
        print(f"Error installing ultralytics or importing the Colab helpers: {err}")
else:
    print("Not running in Colab")

# load rest of libraries for notebook
import csv
import cv2
import glob
from io import StringIO
from IPython.display import clear_output, display, Markdown
import ipywidgets as widgets
import json

from matplotlib.patches import Patch
from matplotlib import pyplot as plt

%matplotlib inline

import numpy as np
import os
import pandas as pd
from pathlib import Path, PureWindowsPath
from plotly import express as px
import random
import requests
import shutil
from sklearn.metrics import (
    ConfusionMatrixDisplay,
    classification_report,
    confusion_matrix,
    accuracy_score,
    average_precision_score,
    cohen_kappa_score,
    matthews_corrcoef,
)
from sklearn.model_selection import train_test_split, StratifiedKFold
import time
from tqdm.notebook import tqdm
import ultralytics
from ultralytics import YOLO
import urllib.request
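
To see which of the notebook's libraries are missing from an environment before running the imports above, a check along these lines can help. This is a minimal sketch using only the standard library; the helper name `find_missing` is an assumption for illustration, and the list contains only a subset of the modules imported in this notebook.

```python
import importlib.util


def find_missing(modules):
    """Return the top-level module names from `modules` that cannot be found."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


# Import names used by this notebook. Note that the import name is not always
# the pip package name (e.g. `cv2` is installed as `opencv-python` and
# `sklearn` as `scikit-learn`).
required = ["cv2", "pandas", "sklearn", "plotly", "tqdm", "ultralytics"]
missing = find_missing(required)
print("Missing packages:", missing or "none")
```

Outside Colab, any names reported as missing need to be installed in your own environment before the import cell will run.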


### 1.4. Mount G-drive

You will be prompted to sign in to your Google account and grant permission to access your Google Drive. This will allow you to save files to your Google Drive. Next time you can then simply link to Google Drive and continue where you left off.

![image](
https://github.com/CGame1/Img_classificaton_guide/blob/main/docs/gdrive_access.png?raw=true)

*Figure 5: G-drive access warning.*


> **Important:** If you refuse access to Google Drive ***or*** there are any issues accessing your Google Drive, all data and notebook outputs will be saved locally in the Colab session. Should you leave the session or if it becomes inactive, you may lose access to your files. You will therefore need to manually download any files you wish to keep.




In [None]:
# Check if running in Colab. Not relevant if running in Jupyter Notebook, for example.
base_path = os.getcwd()  # fallback: current working directory

if "google.colab" in sys.modules:
    try:
        # from google.colab import drive
        drive.mount("/content/drive")
    except Exception as err:
        print(f"Could not mount Google Drive: {err}")

    # Use Google Drive only if the mount succeeded
    if os.path.isdir("/content/drive"):
        base_path = "/content/drive/MyDrive"
    else:
        print(
            f"Google Drive not mounted. Using current working directory: {base_path}"
        )

else:
    print(
        f"Not running in Google Colab so Google Drive not mounted. Using current working directory: {base_path}"
    )


### 1.5. Set default directories

First verify the project name. There is no need to change this unless you are working on a project other than Schulz Bank. The project name is used to define the project directory, inside `base_path`, where all files related to the project will be saved. These paths can be changed in section ? if required.

The default directory structure is as follows:

- **ProjectName**
    - new images
        - ALL (the images to classify)
    - weights (the model weights, e.g. `best.pt`)




In [None]:
project_name = "shulz_bank"

# Add text field for entering the project name
text_input = widgets.Text(value=project_name, layout=widgets.Layout(width="50%"))

# Create a bold label for the text field
label = widgets.HTML("<b>Project name:</b>")

# Display UI
display(widgets.HBox([label, text_input]))

# Get default path
project_name = text_input.value


# Create function to update project name
def update_name(change):
    """Update the project name based on the text input.

    Args:
        change: The change event from the text input widget.

    Returns:
        str: The updated project name.
    """
    # define project name as global variable so it can be used elsewhere
    global project_name

    # Update name to value from the text input widget
    project_name = text_input.value

    return project_name


# Update path
text_input.observe(update_name, names="value")


In [None]:
# Create default paths
path = f"{base_path}/{project_name}"
NEW_img_path = f"{path}/new images/ALL"
weights_dir = f"{path}/weights"
weights_file_name = "best.pt"
weights_file = f"{weights_dir}/{weights_file_name}"


directories = [path, NEW_img_path, weights_dir]
# make the directories if they don't exist
for directory in directories:
    if not os.path.exists(directory):
        os.makedirs(directory)
        print(f"Created directory: {directory}")
    else:
        print(f"Directory already exists: {directory}")


## 2. Get model and make predictions

### 2.1 Download Model

There are three ways to provide the model weights:

- Download them from a URL, for example from Hugging Face. On Hugging Face, the link must end with *?download=true*. To get this link, right-click on the download button in your browser and select "Copy link address".
- Load them from your Google Drive.
- Load them from your desktop.
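
If you copied a Hugging Face link without the *?download=true* suffix, it can be added programmatically. The sketch below uses only the standard library; the function name `with_download_flag` and the example URL are placeholders chosen for illustration, not part of the original notebook.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_download_flag(url):
    """Return `url` with the query parameter download=true set."""
    parts = urlparse(url)
    # Keep any existing query parameters and set (or overwrite) `download`
    query = dict(parse_qsl(parts.query))
    query["download"] = "true"
    return urlunparse(parts._replace(query=urlencode(query)))


# Placeholder URL for illustration only
example = "https://huggingface.co/user/repo/resolve/main/best.pt"
print(with_download_flag(example))
```

The function leaves a link that already ends with *?download=true* unchanged, so it is safe to apply to any link you paste.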




In [None]:
# Paste the download URL here
weights_url = "https://huggingface.co/Npiechaud/Shultz_hab_class/resolve/main/Shulz_bank_yolov8m_cls/weights/best.pt?download=true"

# Check whether the weights file already exists and download it if needed
if not os.path.isfile(weights_file):
    # If the file doesn't exist, download it from the URL
    print(f"File not found: {weights_file}. Downloading...")
    urllib.request.urlretrieve(weights_url, weights_file)
    print(f"File downloaded and saved at: {weights_file}")
else:
    print(f"File already exists: {weights_file}")
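
A failed or interrupted download can leave behind a file that exists but is not a usable model. The following is a rough sanity check, assuming the weights were saved in PyTorch's default zip-based format (the default since PyTorch 1.6); the function name `looks_like_weights_file` and the size threshold are assumptions for illustration.

```python
import os
import zipfile


def looks_like_weights_file(path, min_bytes=1024):
    """Rough check that `path` is a complete, zip-based .pt file."""
    if not os.path.isfile(path):
        return False
    # An error page or an empty file is usually much smaller than a model
    if os.path.getsize(path) < min_bytes:
        return False
    # The zip index sits at the end of the file, so a truncated download
    # will usually fail this test
    return zipfile.is_zipfile(path)
```

If `looks_like_weights_file(weights_file)` returns `False`, deleting the file and running the download cell again is a reasonable first step.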


Alternatively, if you have already downloaded the `.pt` file before, or you have uploaded it from your local environment, set `weights_file` to its location in the first cell below. The second cell then loads the model and extracts the names of the classes.

In [None]:
weights_file = "/content/drive/MyDrive/shulz_bank/weights/best.pt"

In [None]:
# load model
model = YOLO(weights_file)
# model = YOLO("/content/drive/MyDrive/shulz_bank/weights/best - Copy.pt")

# make a table of label names (class index -> label name)
label_names = pd.DataFrame(list(model.names.items()), columns=["class", "label_name"])


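List the images in the new images directory to check that they can be found. The code below looks for `.png` files; change the extension if your images use a different format.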

In [None]:
for image in glob.glob(NEW_img_path + "/*.png"):
    print(image)
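
If you are not sure which extension your images use, counting the file extensions in the directory is a quick way to find out. This is a minimal sketch using the standard library; the function name `count_extensions` is an assumption for illustration.

```python
from collections import Counter
from pathlib import Path


def count_extensions(directory):
    """Count the files in `directory` by lower-cased file extension."""
    return Counter(
        p.suffix.lower() for p in Path(directory).iterdir() if p.is_file()
    )
```

For example, `count_extensions(NEW_img_path)` returns a counter of extensions, and the most common one is the extension to use in the `glob` pattern.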


### 2.2. Make predictions on New images

Run the model on every image in the new images directory and collect the scores of each class in a table.



In [None]:
save = True  # save the images with the predictions
images_dir_pred = (
    f"{NEW_img_path}_predictions"  # the directory where the predictions will be saved
)
if save and not os.path.isdir(images_dir_pred):
    # create the output directory for the images with the prediction overlay
    os.mkdir(images_dir_pred)
    print("making directory for predictions")


# collect one table of class scores per image
all_scores = []

# make predictions on all images in the folder
for image in glob.glob(NEW_img_path + "/*.png"):
    print(image)
    results = model.predict(
        # source: here a single image (a path to a folder is also accepted)
        source=image,
        task="classify",
        # do not let ultralytics save the annotated image; this is handled below
        save=False,
        # ask ultralytics to include the confidence in the saved text output
        save_conf=True,
        # save the predictions as text files in the ultralytics run directory
        save_txt=True,
    )

    # if save is True, save the image with the prediction overlay to images_dir_pred
    if save:
        results[0].save(
            image.replace(NEW_img_path, images_dir_pred).replace(".png", "_pred.png")
        )

    # copy the class table so that label_names itself is not modified
    df = label_names.copy()
    # add the score of each class, rounded to 3 decimal places
    df["score"] = [round(score, 3) for score in results[0].probs.cpu().data.tolist()]
    # add the name of the class with the highest score (the predicted class)
    df["class_prediciton"] = label_names["label_name"][results[0].probs.top1]
    # add the filename (image name) as the first column
    df.insert(0, "filename", os.path.basename(image))
    all_scores.append(df)

# combine the per-image tables into one table of labels
labels = pd.concat(all_scores, ignore_index=True) if all_scores else pd.DataFrame()
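
To make the structure of the resulting table concrete, the sketch below builds the same kind of long-format rows (one row per image and class) from made-up probabilities, using plain Python instead of a real model. The class names, probabilities, and the names `scores_to_rows` and `predicted_label` are invented for illustration.

```python
def scores_to_rows(filename, class_names, probs):
    """Build one row per class for a single image.

    class_names: dict mapping class index to label name
    probs: list of class probabilities, ordered by class index
    """
    # Index of the highest probability, i.e. the predicted class
    top1 = max(range(len(probs)), key=probs.__getitem__)
    return [
        {
            "filename": filename,
            "class": idx,
            "label_name": name,
            "score": round(probs[idx], 3),
            "predicted_label": class_names[top1],
        }
        for idx, name in class_names.items()
    ]


# Made-up class names and probabilities for a single image
names = {0: "coral", 1: "sand", 2: "sponge"}
rows = scores_to_rows("img_001.png", names, [0.1234, 0.0766, 0.8])
for row in rows:
    print(row)
```

Every row for an image repeats the predicted label, while the score column holds the probability of that row's own class.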


<mark>**Next Step:**</mark> If the predictions ran successfully, go to [2.3 Package and export predictions](#23-package-and-export-predictions).

### 2.3 Package and export predictions




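The previous step ran the predictions and extracted the predicted class and the scores for each image.

Here, the predictions are packed into tables for easy export and saved as CSV files: one in long format (one row per image and class) and one in wide format (one row per image, with one column per class).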


In [None]:
# all predictions

# add a column with model name
labels["model"] = "v8m_noKfold"

# save the path to the detections folders
predictions_dir = PureWindowsPath(str(model.predictor.save_dir)).as_posix()

# save the labels to a csv file
labels.to_csv(predictions_dir + "/class_scores.csv", index=False)

# put the labels table in wide format, keep the other columns
labels_wide = labels.pivot(
    index=["filename", "class_prediciton"], columns="label_name", values="score"
).reset_index()
# arrange by filename
labels_wide = labels_wide.sort_values(by=["filename"]).reset_index(drop=True)

# save the wide labels table
labels_wide.to_csv(predictions_dir + "/class_scores_wide.csv", index=False)
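
The long-to-wide step can be hard to picture, so here is the same reshaping idea in plain Python on a few made-up rows. This is only an illustration of what the pivot does, not a replacement for the pandas code above; the function name `pivot_wide` and the example values are assumptions.

```python
def pivot_wide(rows):
    """Turn long rows (one per image and class) into one record per image."""
    wide = {}
    for row in rows:
        # Start a new record the first time a filename is seen
        record = wide.setdefault(row["filename"], {"filename": row["filename"]})
        # Each class becomes its own column holding that class's score
        record[row["label_name"]] = row["score"]
    # Arrange by filename, as in the notebook
    return [wide[name] for name in sorted(wide)]


# Made-up long-format scores for two images and two classes
long_rows = [
    {"filename": "b.png", "label_name": "coral", "score": 0.2},
    {"filename": "b.png", "label_name": "sand", "score": 0.8},
    {"filename": "a.png", "label_name": "coral", "score": 0.9},
    {"filename": "a.png", "label_name": "sand", "score": 0.1},
]
for record in pivot_wide(long_rows):
    print(record)
```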


## 7. Manual Export from Colab

Use this section if you need to download files from Colab manually, i.e. if there is a reason you cannot add the files to Google Drive. Note that by default this ignores the **data** folder and just downloads the **runs** folder with all the results. If you wish to export a different folder, amend the path below.

In [None]:
# import ipywidgets as widgets
# from IPython.display import display

export_folder = f"/content/{project_name}/runs"

# Add text field for entering the export folder
text_input = widgets.Text(value=export_folder, layout=widgets.Layout(width="50%"))

# Create a bold label for the text field
label = widgets.HTML("<b>Export folder:</b>")

# Display UI
display(widgets.HBox([label, text_input]))

# Get default path
export_folder = text_input.value


# Create function to update export folder
def update_folder(change):
    """Update the export_folder based on the text input.

    Args:
        change: The change event from the text input widget.

    Returns:
        str: The updated export_folder.
    """
    # define export_folder as global variable so it can be used elsewhere
    global export_folder

    # Update name to value from the text input widget
    export_folder = text_input.value

    return export_folder


# Update path
text_input.observe(update_folder, names="value")


In [None]:
# from google.colab import drive, files
# import shutil

# #check if running in colab. Not relevant if running in jupyter notebook for example
if "google.colab" in sys.modules:
    # Make sure Google Drive is not mounted
    if not os.path.isdir("/content/drive"):
        print(f"Exporting {os.path.basename(export_folder)}.zip")

        # make zip file
        shutil.make_archive(
            os.path.basename(export_folder), "zip", root_dir=export_folder
        )

        # download
        files.download(f"{os.path.basename(export_folder)}.zip")

        print("Finished!")

    else:
        print("Google drive mounted. Your files should be there.")
else:
    print("Not running in Colab. Your files should be saved locally.")
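
The zipping step can also be tried on its own, outside Colab. The sketch below wraps `shutil.make_archive` in a small helper; the function name `zip_folder` and the choice to write the archive into a separate output directory are assumptions for illustration.

```python
import os
import shutil


def zip_folder(folder, out_dir):
    """Zip the contents of `folder` into `out_dir` and return the archive path."""
    # normpath removes a trailing slash, which would otherwise make
    # os.path.basename return an empty name
    name = os.path.basename(os.path.normpath(folder))
    return shutil.make_archive(os.path.join(out_dir, name), "zip", root_dir=folder)
```

Writing the archive to a directory other than the one being zipped avoids the archive ending up inside itself.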

## 8. Troubleshooting

### 8.1. General

> **Tip:** When encountering errors, you should not need to alter any code. Likely something has not run properly and a file/variable is missing or you have not got the correct paths.


| Error    | Solution |
|----------|----------|
| Missing variables, e.g. "base_path" is not defined | Re-run Sections [1](#prepare-computer-session-run-everytime) & [2](#3-load--prepare-data-run-everytime) where they are defined |
| Training Failed | Try again. Best to delete temporary image folder that was created to free up space |
| Auto Download failed    | Try again later, else try the manual approach instead.   |
| Issues with image copying  | Try deleting the partial copies/folders with errors and starting again|
| Missing hyperparameters in [Section 3.1.2](#312-load--update)  | Maybe you have a partially downloaded train.json. Download again.|
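
For the missing-variables case in the table above, a quick way to see which variables are not yet defined is to test for them by name. This is a minimal sketch; the helper name `missing_names` is an assumption, and the list holds variables defined earlier in this notebook.

```python
def missing_names(required, namespace):
    """Return the names from `required` that are not defined in `namespace`."""
    return [name for name in required if name not in namespace]


# Variables defined in Sections 1 and 2 of this notebook
required = ["base_path", "project_name", "weights_file", "model"]
print("Not yet defined:", missing_names(required, globals()))
```

Any name that is listed points to a cell in Section 1 or 2 that still needs to be run.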




### 8.2. Google Colab


#### Caveats

- Speed is variable. There seems to be more demand during US working hours.
- You may be disconnected from your session after a certain period of time. This can be due to:
  * Inactivity
  * Resource demands (i.e. number of users)
  * Having run the session for a long time


  > **Tip:** Free Colab is great for a test run, but for more serious usage it may be worth paying for Colab Pro to get access to more powerful GPUs and longer session times.





| Error    | Solution |
|----------|----------|
| Session crashed  | Re-run Sections [1](#prepare-computer-session-run-everytime) & [2](#3-load--prepare-data-run-everytime) |
| No GPUs available | Don't train models now. Focus on other tasks and try training again later |





> **Tip:** If you are unsure of the errors, it might be worth starting the session again, i.e. re-running Sections [1](#prepare-computer-session-run-everytime) & [2](#3-load--prepare-data-run-everytime). If you are still getting errors, try **Runtime** > **Disconnect and Delete runtime** to reset the session to its original state and clear all variables. You will then need to re-run Sections [1](#prepare-computer-session-run-everytime) & [2](#3-load--prepare-data-run-everytime) before continuing where you left off.


> **Note:** If you make an update to your Google Drive, there may be a short delay before the changes are reflected in Colab.