# Project for CS5491 Artificial Intelligence
### NETWORK STRUCTURE OPTIMIZATION FOR YOLOV8 ALGORITHM

LAN JINGSEN 58158499

WANG GAOYU 58285673

This notebook is part of our project. We build on the official YOLOv8 code at https://github.com/ultralytics/ultralytics and modify it to run our experiments. The tutorial below shows how to replicate the implementation: defining the deformable convolution, enabling CBAM, adjusting the YOLOv8 network structure, and starting training. It reflects how we implemented the project at the code level. Because of the limited compute available on Colab, this tutorial does not run the full YOLOv8 training; the experimental results are presented in the report.

First, clone the official repository.

In [None]:
!git clone https://github.com/ultralytics/ultralytics.git

Next, install the Ultralytics package and its dependencies.

In [4]:
!pip install ultralytics



Since the official code does not define a deformable convolution block, we define the class in the "/content/ultralytics/ultralytics/nn/modules/block.py" file so that it can be used when the model is built.

In [None]:
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class DeformableConv(nn.Module):
    """Modulated deformable convolution with learned offsets and masks."""

    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0):
        super().__init__()
        # Offset branch: two coordinates per kernel tap
        self.offset_conv = nn.Conv2d(in_channels, 2 * kernel_size * kernel_size, kernel_size=kernel_size, stride=stride, padding=padding)
        # Mask branch: one modulation weight per kernel tap
        self.mask_conv = nn.Conv2d(in_channels, kernel_size * kernel_size, kernel_size=kernel_size, stride=stride, padding=padding)
        # Deformable convolution layer
        self.conv = DeformConv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=padding)

    def forward(self, x):
        # Generate offsets and masks
        offset = self.offset_conv(x)
        mask = self.mask_conv(x)
        # Squash offsets to (-1, 1) and masks to (0, 1)
        offset = 2 * torch.sigmoid(offset) - 1
        mask = torch.sigmoid(mask)
        # Pass the mask to DeformConv2d so it modulates every kernel tap.
        # Multiplying the output by the mask afterwards only broadcasts
        # correctly when kernel_size == 1.
        return self.conv(x, offset, mask)

class C2f_DCN(nn.Module):
    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
        super(C2f_DCN, self).__init__()
        # Calculate the number of hidden channels
        self.c = int(c2 * e)
        # First deformable convolution layer
        self.cv1 = DeformableConv(c1, 2 * self.c, kernel_size=1)
        # Second deformable convolution layer
        self.cv2 = DeformableConv((2 + n) * self.c, c2, kernel_size=1)
        # List of bottleneck modules
        self.m = nn.ModuleList(Bottleneck(self.c, self.c, shortcut, g, k=((3, 3), (3, 3)), e=1.0) for _ in range(n))

    def forward(self, x):
        # Pass input through the first deformable convolution layer
        y = list(self.cv1(x).chunk(2, 1))
        # Pass the output of the first layer through the bottleneck modules
        y.extend(m(y[-1]) for m in self.m)
        # Pass the concatenated output through the second deformable convolution layer
        return self.cv2(torch.cat(y, 1))

    def forward_split(self, x):
        # Pass input through the first deformable convolution layer
        y = list(self.cv1(x).split((self.c, self.c), 1))
        # Pass the output of the first layer through the bottleneck modules
        y.extend(m(y[-1]) for m in self.m)
        # Pass the concatenated output through the second deformable convolution layer
        return self.cv2(torch.cat(y, 1))
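
The channel bookkeeping in the two classes above can be checked by hand without running PyTorch. The sketch below is only an illustration: the helper names `dcn_aux_channels`, `conv_out_size` and `c2f_channel_flow` are hypothetical and do not exist in Ultralytics or torchvision, and the output-size formula assumes a dilation of 1.

```python
def dcn_aux_channels(kernel_size):
    """Channels the offset and mask branches produce for a k x k kernel."""
    taps = kernel_size * kernel_size
    # Two coordinates per kernel tap for the offsets, one weight per tap for the mask
    return {"offset": 2 * taps, "mask": taps}


def conv_out_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution (dilation assumed to be 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def c2f_channel_flow(c1, c2, n=1, e=0.5):
    """Channel counts at each stage of C2f_DCN.forward."""
    c = int(c2 * e)  # hidden channels, as in C2f_DCN.__init__
    return {
        "input": c1,
        "after_cv1": 2 * c,       # cv1 output, later split into two halves
        "concat": (2 + n) * c,    # two halves plus one output per bottleneck
        "output": c2,             # cv2 output
    }


print(dcn_aux_channels(1))
print(dcn_aux_channels(3))
print(conv_out_size(80, kernel_size=1))
print(c2f_channel_flow(128, 128, n=1))
```

With `kernel_size=1`, as used for `cv1` and `cv2` in `C2f_DCN`, the offset branch has 2 channels and the mask branch has 1, and the spatial size is unchanged.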

After defining C2f_DCN, we need to add it to the `__all__` tuple in this file. The complete `__all__` tuple should look like the following:


In [None]:
__all__ = (
    "DFL",
    "HGBlock",
    "HGStem",
    "SPP",
    "SPPF",
    "C1",
    "C2",
    "C3",
    "C2f",
    "C2f_DCN", # here !!!!!!
    "C2fAttn",
    "ImagePoolingAttn",
    "ContrastiveHead",
    "BNContrastiveHead",
    "C3x",
    "C3TR",
    "C3Ghost",
    "GhostBottleneck",
    "Bottleneck",
    "BottleneckCSP",
    "Proto",
    "RepC3",
    "ResNetLayer",
    "RepNCSPELAN4",
    "ADown",
    "SPPELAN",
    "CBFuse",
    "CBLinear",
    "Silence",
)

Next, import C2f_DCN and add it to `__all__` in `/content/ultralytics/ultralytics/nn/modules/__init__.py`:

In [None]:
from .block import (
    C1,
    C2,
    C3,
    C3TR,
    DFL,
    SPP,
    SPPELAN,
    SPPF,
    ADown,
    BNContrastiveHead,
    Bottleneck,
    BottleneckCSP,
    C2f,
    C2f_DCN, # here !!!!!!
    # ... (remaining imports unchanged)
)
__all__ = (
    "Conv",
    "Conv2",
    "LightConv",
    "RepConv",
    "DWConv",
    "DWConvTranspose2d",
    "ConvTranspose",
    "Focus",
    "GhostConv",
    "ChannelAttention",
    "SpatialAttention",
    "CBAM",
    "Concat",
    "TransformerLayer",
    "TransformerBlock",
    "MLPBlock",
    "LayerNorm2d",
    "DFL",
    "HGBlock",
    "HGStem",
    "SPP",
    "SPPF",
    "C1",
    "C2",
    "C3",
    "C2f",
    "C2f_DCN", # here !!!!!!
    # ... (remaining names unchanged)
)

Finally, import C2f_DCN in /content/ultralytics/ultralytics/nn/tasks.py:

In [None]:
from ultralytics.nn.modules import (
    AIFI,
    C1,
    C2,
    C3,
    C3TR,
    OBB,
    SPP,
    SPPELAN,
    SPPF,
    ADown,
    Bottleneck,
    BottleneckCSP,
    C2f,
    C2f_DCN, # here !!!
    C2fAttn,
    # ... (remaining imports unchanged)
)
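
The import into tasks.py matters because the module name in a model YAML file is a plain string that has to be turned back into a class. The sketch below assumes the parser does this by looking the name up in the namespace of tasks.py; it is a simplified stand-in, not the library's code, and `resolve_module` together with the placeholder classes is hypothetical.

```python
def resolve_module(name, namespace):
    """Return the class registered under `name`, or raise a readable error."""
    try:
        return namespace[name]
    except KeyError:
        raise KeyError(f"{name!r} is not imported where the parser looks it up") from None


class C2f:  # placeholder for the existing block
    pass


class C2f_DCN:  # placeholder for the new block
    pass


before = {"C2f": C2f}                      # namespace without the new import
after = {"C2f": C2f, "C2f_DCN": C2f_DCN}   # namespace after adding the import

print(resolve_module("C2f_DCN", after).__name__)

try:
    resolve_module("C2f_DCN", before)
except KeyError as err:
    print("lookup failed:", err)
```

Under this assumption, a YAML entry such as `C2f_DCN` can only be built once the name is visible in that namespace, which is what the added import line provides.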

Now we also need to enable CBAM. Unlike the deformable convolution, CBAM is already defined as a class in "/content/ultralytics/ultralytics/nn/modules/conv.py" in the official code and is already listed in the modules package, but it is not imported in tasks.py. Therefore, we only need to add CBAM to the import list there so that it can be used during model generation.

/content/ultralytics/ultralytics/nn/tasks.py

In [None]:
from ultralytics.nn.modules import (
    AIFI,
    CBAM, #here
    C1,
    C2,
    C3,
    C3TR,
    OBB,
    SPP,
    # ... (remaining imports unchanged)
)

Now we need to create a YAML file that defines our improved YOLOv8 architecture. We place it at /content/ultralytics/ultralytics/cfg/models/v8/yolov8s-dcn-cbam.yaml. As designed in the report, the file content is as follows:

In [None]:
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs


# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 3, C2f, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 6, C2f_DCN, [128,128,1, True]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 6, C2f_DCN, [256, 256,1,True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 3, C2f, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9

# YOLOv8.0n head
# Layer indices account for the inserted CBAM layers
head:
  - [-1, 1, CBAM, [512]] # 10
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 3, C2f, [512]] # 13

  - [-1, 1, CBAM, [256]] # 14
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 3, C2f, [256]] # 17 (P3/8-small)

  - [-1, 1, CBAM, [128]] # 18
  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 3, C2f, [512]] # 21 (P4/16-medium)

  - [-1, 1, CBAM, [256]] # 22
  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 9], 1, Concat, [1]] # cat head P5
  - [-1, 3, C2f, [1024]] # 25 (P5/32-large)

  - [[17, 21, 25], 1, Detect, [nc]] # Detect(P3, P4, P5)
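
The `scales` entry explains why the explicit channel arguments of C2f_DCN and CBAM (128, 256, 512) are half of the values written for the neighbouring Conv and C2f layers. The sketch below assumes the parser multiplies channel arguments of the standard modules by the width factor (capped at `max_channels` and rounded up to a multiple of 8) and multiplies repeat counts by the depth factor. The helper names are illustrative and the code is a hand calculation, not the library's implementation.

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scaled_channels(c, width, max_channels):
    """Output channels after applying the width factor (assumed rule)."""
    return make_divisible(min(c, max_channels) * width, 8)


def scaled_repeats(n, depth):
    """Repeat count after applying the depth factor (assumed rule)."""
    return max(round(n * depth), 1) if n > 1 else n


depth, width, max_channels = 0.33, 0.50, 1024  # the 's' scale from the YAML

for c in (64, 128, 256, 512, 1024):
    print(f"channels {c} -> {scaled_channels(c, width, max_channels)}")

for n in (1, 3, 6):
    print(f"repeats {n} -> {scaled_repeats(n, depth)}")
```

Under these assumptions, a Conv written with 256 channels outputs 128 channels at the `s` scale, which matches the 128 passed directly to the first C2f_DCN entry.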


Now we can write a training script to start our training.

In [None]:
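from ultralytics import YOLO

# Build the modified model from the custom YAML definition
model = YOLO(model="yolov8s-dcn-cbam.yaml")

# Dataset configuration file for Pascal VOC
data = "ultralytics/cfg/datasets/VOC.yaml"

# Start training
model.train(data=data, epochs=100, batch=16)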

In [2]:
!python /Users/senniko/Desktop/AI/ultralytics/train.py


Ultralytics YOLOv8.2.0 🚀 Python-3.9.19 torch-2.2.2 CPU (Apple M1 Pro)
engine/trainer: task=detect, mode=train, model=yolov8s-dcn-cbam.yaml, data=ultralytics/cfg/datasets/VOC.yaml, epochs=100, time=None, patience=100, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=train, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_c

The model starts training correctly, and the printed model architecture is exactly what we wanted. Because training takes a long time, we interrupt the process here. The experimental results can be found in the report.