# NTUHPC Workshop: Accelerating ML Workflows with Aspire2A: The ML Part

This notebook is a continuation of the [Accelerating ML Workflows with Aspire2A](https://github.com/mrzzy/ntuhpc-workshops/tree/main/ml_aspire2a) workshop. Please refer to the README there for instructions on connecting to Aspire2A.

In this notebook, we will fine-tune the [Ultralytics YOLOv11](https://github.com/ultralytics/ultralytics) model to recognize Rock-Paper-Scissors gestures using the [Roboflow Rock-Paper-Scissors dataset](https://universe.roboflow.com/roboflow-58fyf/rock-paper-scissors-sxsw).

> ⚠️ Before running the notebook, ensure that you:
> - Upload this notebook to Aspire2A by following this [guide on Uploading](https://jupyterlab.readthedocs.io/en/stable/user/files.html#uploading-and-downloading) in the JupyterLab web interface.
> - Download the Rock-Paper-Scissors dataset to your computer and upload `rock-paper-scissors.v14i.yolov11.zip` to Aspire2A by following the same upload process.


## Running on Aspire2A
Check that we have access to Aspire2A's GPU:

In [1]:
!nvidia-smi

Tue Mar 25 11:49:58 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05             Driver Version: 535.154.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA A100-SXM4-40GB          On  | 00000000:03:00.0 Off |                    0 |
| N/A   42C    P0              51W / 400W |      0MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
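The same check can be scripted from Python, which is handy if later cells should fail early when no GPU is attached. The following is a minimal sketch, not part of the original workshop: `list_gpus` is a hypothetical helper name, and it assumes `nvidia-smi` is on the `PATH` of the Jupyter session.

```python
import shutil
import subprocess


def list_gpus():
    """Return one 'name, total memory' string per visible GPU, or [] if none."""
    # nvidia-smi is absent on machines without the NVIDIA driver installed.
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # A non-zero exit code usually means the driver could not reach a GPU.
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
print(gpus if gpus else "No GPU visible; check that the job requested one.")
```

An empty list suggests that the session was started without a GPU allocation.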
                                                                    

## Unpack Dataset
Unzip the dataset by running the command below:

In [2]:
!unzip -o rock-paper-scissors.v14i.yolov11.zip

Archive:  rock-paper-scissors.v14i.yolov11.zip
  inflating: README.dataset.txt      
  inflating: README.roboflow.txt     
  inflating: data.yaml               
 extracting: test/images/10e0gvm_jpg.rf.3b68a834fab647f30a57fc3ea92d4cd2.jpg  
 extracting: test/images/15208484cellblock_jpg.rf.95cbda1e169a66105fbf2aa22959a73b.jpg  
 extracting: test/images/19171_298_298_1_0_jpg.rf.0024dfb25d7b5a13a78e94fca47ef004.jpg  
 extracting: test/images/20061004021831_jpg.rf.8667d8aa5599deb901289c024eed4313.jpg  
 extracting: test/images/20220216_221550_jpg.rf.02a071a383151953fcf8671fc7fca3af.jpg  
 extracting: test/images/20220216_221819_jpg.rf.295ebb583293f91f74e1700f0ab0639a.jpg  
 extracting: test/images/20220216_221856_jpg.rf.c551cb3856f480cba36d6aa58e3300cd.jpg  
 extracting: test/images/20220216_222153_jpg.rf.a2bd5f6dd7833d67c9cb2e1d9ca298cc.jpg  
 extracting: test/images/20220216_222607_jpg.rf.2d3554cdf3b954df7e481bf1b22a1e47.jpg  
 extracting: test/images/CARDS_LIVINGROOM_B_T_frame_0124_jpg.
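To confirm that the upload was complete, it may help to count the images in each split directly from the archive. The sketch below is an illustration only: `count_images_per_split` is a hypothetical helper, and it assumes the `<split>/images/<file>.jpg` layout seen in the unzip log above.

```python
import zipfile
from collections import Counter
from pathlib import Path


def count_images_per_split(zip_path):
    """Count .jpg entries under each top-level folder of the archive."""
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Only count paths shaped like '<split>/images/<file>.jpg'.
            if len(parts) >= 3 and parts[1] == "images" and name.lower().endswith(".jpg"):
                counts[parts[0]] += 1
    return dict(counts)


archive_path = Path("rock-paper-scissors.v14i.yolov11.zip")
if archive_path.exists():
    print(count_images_per_split(archive_path))
```

The counts for the training and validation splits can then be compared with the image totals that the training log reports later in this notebook.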

## Setup Dependencies
Install Dependencies

In [3]:
%pip install ultralytics==8.3.96 opencv-python-headless==4.11.0.86

Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Note: you may need to restart the kernel to use updated packages.


Restart the Jupyter kernel so that the newly installed packages are available for import.

In [None]:
import os
os._exit(0)
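Once the kernel has restarted, one way to confirm that the pinned versions were picked up is to query the package metadata. This is a small sketch using the standard library's `importlib.metadata` (available from Python 3.8); `installed_version` is a hypothetical helper, and the expected versions are copied from the install cell above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Versions pinned in the %pip install cell.
expected = {"ultralytics": "8.3.96", "opencv-python-headless": "4.11.0.86"}
for name, wanted in expected.items():
    found = installed_version(name)
    status = "ok" if found == wanted else "mismatch"
    print(f"{name}: expected {wanted}, found {found} ({status})")
```

A `mismatch` here most likely means the install cell did not finish or that a different environment is active.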

Import Dependencies

In [1]:
from ultralytics import YOLO
from pathlib import Path



## Fine-Tune YOLO to Recognise Scissors Paper Stone
We will fine-tune the YOLOv11 model to recognize new classes: in this case, the hand gestures of "Scissors, Paper, Stone" (labelled Rock, Paper, and Scissors in the dataset).

### 1. Load Pretrained YOLO Model
We start by loading the pretrained YOLO model (`yolo11n.pt`). This model has already been trained on a large dataset (COCO), which makes it a strong starting point for transfer learning. Instead of training from scratch, we reuse the pretrained weights so that training on the new classes converges faster.

In [2]:
model = YOLO('yolo11n.pt')


### 2. Train the Model
We train the model for **5 epochs** using a **batch size of 64** on **GPU device 0**.
- `data.yaml` describes the dataset's structure, including the class labels and image paths.
- `epochs=5` sets the number of full passes over the training set.
- `imgsz=640` sets the target image size, so that every input is resized to consistent dimensions.
- `batch=64` sets the number of images processed per training step.
- `device=0` selects the first GPU, so that training runs on the GPU rather than the CPU.
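As a rough sanity check on these settings, the number of training steps per epoch can be estimated from the dataset size. The sketch below assumes the 6,455 training images reported in the training log further down, and that the final partial batch is kept rather than dropped.

```python
import math

train_images = 6455  # from the "train: Scanning ..." line of the training log
batch_size = 64      # the `batch` argument
epochs = 5           # the `epochs` argument

# Each step consumes one batch; the last batch of an epoch may be smaller.
steps_per_epoch = math.ceil(train_images / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, total_steps)  # 101 505
```

The value of 101 steps agrees with the `101/101` progress counter shown for each epoch in the training output.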



In [3]:
model.train(
    data=(Path.cwd() / 'data.yaml').absolute(),
    epochs=5,                  
    imgsz=640,
    batch=64,
    device=0
)

Ultralytics 8.3.96 🚀 Python-3.8.13 torch-1.12.0a0+bd13bc6 CUDA:0 (NVIDIA A100-SXM4-40GB, 40514MiB)
engine/trainer: task=detect, mode=train, model=yolo11n.pt, data=/home/users/ntu/zzhu018/ntuhpc-workshops/ml_aspire2a/data.yaml, epochs=5, time=None, patience=100, batch=64, imgsz=640, save=True, save_period=-1, cache=False, device=0, workers=8, project=None, name=train2, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False



Freezing layer 'model.23.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks...
AMP: checks passed ✅

train: Scanning /home/users/ntu/zzhu018/ntuhpc-workshops/ml_aspire2a/train/labels.cache... 6455 images, 2516 backgrounds, 0 corrupt: 100%|██████████| 6455/6455 [00:00<?, ?it/s]
val: Scanning /home/users/ntu/zzhu018/ntuhpc-workshops/ml_aspire2a/valid/labels.cache... 576 images, 238 backgrounds, 0 corrupt: 100%|██████████| 576/576 [00:00<?, ?it/s]


Plotting labels to /home/users/ntu/zzhu018/runs/detect/train2/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.001429, momentum=0.9) with parameter groups 81 weight(decay=0.0), 88 weight(decay=0.0005), 87 bias(decay=0.0)
TensorBoard: model graph visualization added ✅
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to /home/users/ntu/zzhu018/runs/detect/train2
Starting training for 5 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


        1/5      8.58G      1.236       3.35      1.435         77        640: 100%|██████████| 101/101 [00:20<00:00,  4.82it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 5/5 [00:01<00:00,  4.25it/s]


                   all        576        400      0.846     0.0192      0.234      0.117

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


        2/5      10.9G      1.231      2.313       1.37         65        640: 100%|██████████| 101/101 [00:19<00:00,  5.28it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 5/5 [00:01<00:00,  4.50it/s]

                   all        576        400      0.278      0.464      0.336      0.197






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


        3/5      10.9G      1.246       1.82      1.374         59        640: 100%|██████████| 101/101 [00:19<00:00,  5.31it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 5/5 [00:01<00:00,  4.66it/s]

                   all        576        400       0.53      0.411      0.453      0.256






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


        4/5      10.9G      1.157      1.426      1.305         65        640: 100%|██████████| 101/101 [00:19<00:00,  5.32it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 5/5 [00:01<00:00,  4.71it/s]

                   all        576        400       0.76      0.765      0.825      0.563






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


        5/5      10.9G      1.042      1.155      1.238         77        640: 100%|██████████| 101/101 [00:18<00:00,  5.33it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 5/5 [00:01<00:00,  4.87it/s]

                   all        576        400      0.855      0.849      0.901       0.67






5 epochs completed in 0.030 hours.
Optimizer stripped from /home/users/ntu/zzhu018/runs/detect/train2/weights/last.pt, 5.5MB
Optimizer stripped from /home/users/ntu/zzhu018/runs/detect/train2/weights/best.pt, 5.5MB

Validating /home/users/ntu/zzhu018/runs/detect/train2/weights/best.pt...
Ultralytics 8.3.96 🚀 Python-3.8.13 torch-1.12.0a0+bd13bc6 CUDA:0 (NVIDIA A100-SXM4-40GB, 40514MiB)
YOLO11n summary (fused): 100 layers, 2,582,737 parameters, 0 gradients, 6.3 GFLOPs


                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 5/5 [00:01<00:00,  3.10it/s]


                   all        576        400      0.856      0.849      0.901      0.671
                 Paper        132        139      0.936      0.738      0.883       0.65
                  Rock        121        141      0.917      0.894      0.933      0.702
              Scissors        116        120      0.715      0.917      0.886       0.66
Speed: 0.6ms preprocess, 0.4ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to /home/users/ntu/zzhu018/runs/detect/train2


ultralytics.utils.metrics.DetMetrics object with attributes:

ap_class_index: array([0, 1, 2])
box: ultralytics.utils.metrics.Metric object
confusion_matrix: <ultralytics.utils.metrics.ConfusionMatrix object at 0x1526b5bc6400>
curves: ['Precision-Recall(B)', 'F1-Confidence(B)', 'Precision-Confidence(B)', 'Recall-Confidence(B)']
curves_results: [[array([          0,    0.001001,    0.002002,    0.003003,    0.004004,    0.005005,    0.006006,    0.007007,    0.008008,    0.009009,     0.01001,    0.011011,    0.012012,    0.013013,    0.014014,    0.015015,    0.016016,    0.017017,    0.018018,    0.019019,     0.02002,    0.021021,    0.022022,    0.023023,
          0.024024,    0.025025,    0.026026,    0.027027,    0.028028,    0.029029,     0.03003,    0.031031,    0.032032,    0.033033,    0.034034,    0.035035,    0.036036,    0.037037,    0.038038,    0.039039,     0.04004,    0.041041,    0.042042,    0.043043,    0.044044,    0.045045,    0.046046,    0.047047,
          0.04

### 3. Save & Download the Model
- Save the model as `rps_model.pt` on Aspire2A by running the cell below.


In [4]:
model.save("rps_model.pt")

- Download the model file from Aspire2A to **your computer** by following this [guide on Downloading](https://jupyterlab.readthedocs.io/en/stable/user/files.html#uploading-and-downloading).
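If you want to be sure the file was not truncated during the download, one option is to compare a checksum computed on Aspire2A with one computed on your computer. This is an optional sketch rather than a workshop step; `sha256_of` is a hypothetical helper built on the standard library's `hashlib`.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


model_path = Path("rps_model.pt")
if model_path.exists():
    print(model_path.stat().st_size, "bytes", sha256_of(model_path))
```

Running the same snippet in both places should print identical digests if the transfer succeeded.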

That concludes the Jupyter Notebook portion of the workshop. To continue, follow the next section of the [README](https://github.com/mrzzy/ntuhpc-workshops/blob/main/ml_aspire2a/README.md#running-the-model-on-a-webcam).