![New Release: Accelerate YOLOv8](assets/yolov8.png)

# Accelerate Ultralytics YOLOv8 with Speedster

In [18]:
# model_name: str = "yolov8n-seg.pt"
model_name: str = 'yolov8n.pt'

Hi and welcome 👋

In this notebook we will see how, in just a few steps, you can reduce the inference latency of a deep learning model using Speedster, a module of the open-source library nebullvm.

With Speedster's latest API, you can speed up models by up to 10 times without any loss of accuracy (option A), or by up to 20-30 times if you specify how much accuracy you are willing to trade for an even lower response time (option B). To accelerate your model, Speedster uses deep learning compilers (in both options) and, in option B only, precision-reducing techniques such as quantization and half precision.
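
To make the difference between option A and option B concrete, here is a conceptual sketch of the trade-off: among several optimized variants, pick the fastest one whose accuracy loss stays within a chosen budget. The `Candidate` class, the `pick_fastest` helper and all numbers are hypothetical and only illustrate the idea; this is not Speedster's actual selection code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One optimized variant of a model (hypothetical structure, for illustration only)."""
    name: str
    latency_s: float    # measured seconds per inference
    metric_drop: float  # accuracy loss relative to the original model


def pick_fastest(candidates: List[Candidate], max_metric_drop: float) -> Optional[Candidate]:
    """Return the lowest-latency candidate whose accuracy loss is within the budget."""
    allowed = [c for c in candidates if c.metric_drop <= max_metric_drop]
    return min(allowed, key=lambda c: c.latency_s) if allowed else None


# Made-up numbers, chosen only to show the trade-off.
candidates = [
    Candidate("compiled_fp32", latency_s=0.050, metric_drop=0.0),
    Candidate("compiled_fp16", latency_s=0.030, metric_drop=0.02),
    Candidate("compiled_int8", latency_s=0.020, metric_drop=0.08),
]

option_a = pick_fastest(candidates, max_metric_drop=0.0)  # no accuracy loss allowed
option_b = pick_fastest(candidates, max_metric_drop=0.1)  # trade some accuracy for speed
print(option_a.name, option_b.name)
```

With a budget of zero only the lossless variant qualifies, while a larger budget lets the faster, lower-precision variants compete.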

Let's jump to the code.

In [19]:
%env CUDA_VISIBLE_DEVICES=0

env: CUDA_VISIBLE_DEVICES=0
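
The `%env` magic above only works inside IPython or Jupyter. In a plain Python script, a rough equivalent is to set the variable through `os.environ`; this sketch assumes it runs before any CUDA-using library has been initialized, since the variable generally has no effect afterwards.

```python
import os

# Expose only GPU 0 to this process. Set this before any library
# initializes CUDA (for example, before the first CUDA call in PyTorch).
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

print(os.environ["CUDA_VISIBLE_DEVICES"])
```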


## Setup

### Install Speedster

In [3]:
!pip install speedster



In [4]:
!python -m nebullvm.installers.auto_installer --frameworks torch --compilers all

2023-03-21 07:39:49 | INFO     | Running auto install of nebullvm dependencies

### Install Ultralytics YOLOv8

In [5]:
!pip install ultralytics




## Load YOLOv8

In [20]:
import torch
from optimized_yolo import OptimizedYOLO
from ultralytics import YOLO

yolo = YOLO(model_name)

Let's create some dummy test data and look at the original output:

In [21]:
test_data = torch.randn(1, 3, 640, 640)
yolo.model(test_data) # type: ignore

(tensor([[[6.08572e+00, 1.52113e+01, 1.89979e+01,  ..., 5.00097e+02, 5.30870e+02, 5.63326e+02],
          [6.47849e+00, 5.28680e+00, 4.31904e+00,  ..., 5.93075e+02, 5.95111e+02, 5.83680e+02],
          [1.22037e+01, 3.02355e+01, 3.56795e+01,  ..., 2.79548e+02, 2.17813e+02, 1.57350e+02],
          ...,
          [2.39736e-06, 2.30593e-06, 1.36928e-06,  ..., 2.73031e-06, 2.27376e-06, 2.61841e-06],
          [5.06497e-07, 3.91386e-07, 2.16624e-07,  ..., 1.12537e-06, 1.17422e-06, 1.24078e-06],
          [3.66924e-06, 3.12165e-06, 1.46373e-06,  ..., 2.33010e-06, 2.20597e-06, 2.09433e-06]]]),
 [tensor([[[[  8.35430,   4.35437,   2.47571,  ...,   1.64296,   7.18458,   5.24387],
            [  9.41179,   4.66860,   2.66888,  ...,   0.48668,   6.16137,   7.56632],
            [  8.84891,   5.14147,   2.92525,  ...,   2.56017,   6.27959,   7.80412],
            ...,
            [  9.26815,   5.26749,   2.72706,  ...,   4.12360,   6.40941,   9.01705],
            [  9.13272,   5.07713,   2.51412,

The original YOLOv8 model returns a tuple whose first element is a tensor and whose second element is a list of tensors. Speedster currently supports only models whose outputs are all tensors, so we need a wrapper that flattens the output:

In [22]:
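class YOLOWrapper(torch.nn.Module):
    """Flatten YOLOv8's (tensor, [tensors]) output into a plain tuple of tensors."""

    def __init__(self, yolo_model):
        super().__init__()
        self.model = yolo_model.model

    def forward(self, x, *args, **kwargs):
        res = self.model(x)
        # res[0] is a tensor and res[1] is a list of tensors: unpack the list.
        return (res[0], *res[1])


model_wrapper = YOLOWrapper(yolo)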

## YOLOv8 Optimization with GPU

We can now optimize the model using Speedster:

In [9]:
from speedster import optimize_model

# Provide some input data for the model    
input_data = [((torch.randn(1, 3, 640, 640), ), torch.tensor([0])) for i in range(100)]

# Run Speedster optimization
optimized_model = optimize_model(
  model_wrapper, input_data=input_data, metric_drop_ths=0.1, store_latencies=True
)

2023-03-21 07:40:24 | INFO     | Running Speedster on CPU
 Please install them to include them in the optimization pipeline.
 Please install them to include them in the optimization pipeline.
 Without them, some compilers may not work properly.
2023-03-21 07:41:23 | INFO     | Benchmark performance of original model
2023-03-21 07:42:29 | INFO     | Original model latency: 0.5072965788841247 sec/iter
2023-03-21 07:42:33 | INFO     | Optimizing with PytorchBackendCompiler and q_type: None.
2023-03-21 07:42:59 | INFO     | Optimized model latency: 0.5444438457489014 sec/iter
2023-03-21 07:42:59 | INFO     | Optimizing with PytorchBackendCompiler and q_type: QuantizationType.DYNAMIC.
2023-03-21 07:43:26 | INFO     | Optimized model latency: 0.5069680213928223 sec/iter
2023-03-21 07:43:26 | INF

We can finally restore the original output format by wrapping the optimized model in a new class:

In [14]:

optimized_wrapper = OptimizedYOLO(optimized_model)
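
`OptimizedYOLO` is imported from the local `optimized_yolo` module, whose source is not shown in this notebook. As a rough sketch of the core idea only, the class below regroups a flat tuple of outputs into the `(first, [rest...])` shape that `YOLOWrapper` flattened. The name `RestoreOutputs` and the stand-in model are hypothetical, and the real `OptimizedYOLO` may do considerably more.

```python
from typing import Any, Callable, List, Sequence, Tuple


class RestoreOutputs:
    """Hypothetical wrapper: regroups a flat output tuple into (first, [rest...])."""

    def __init__(self, flat_model: Callable[..., Sequence[Any]]):
        self.flat_model = flat_model

    def __call__(self, x: Any) -> Tuple[Any, List[Any]]:
        flat = self.flat_model(x)
        # YOLOWrapper returned (res[0], *res[1]); undo that flattening here.
        return flat[0], list(flat[1:])


# Stand-in for an optimized model: returns a flat tuple of four outputs.
def fake_flat_model(x):
    return (x, x + 1, x + 2, x + 3)


restored = RestoreOutputs(fake_flat_model)
head, extras = restored(10)
print(head, extras)  # 10 [11, 12, 13]
```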

In [11]:
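# Move the input to the GPU only when CUDA is available. On a CPU-only PyTorch
# build, test_data.cuda() raises "AssertionError: Torch not compiled with CUDA enabled".
device = "cuda" if torch.cuda.is_available() else "cpu"
optimized_wrapper(test_data.to(device))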
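
The optimization log in this section reports latencies in sec/iter. The sketch below shows one simple way such a number could be measured and how two latencies translate into a speedup factor. The helpers `seconds_per_iter` and `speedup` are hypothetical and are not Speedster's own benchmarking code; the three latency values are copied from the log above, which is truncated, so further candidates may have been evaluated.

```python
import time
from typing import Callable


def seconds_per_iter(fn: Callable[[], object], warmup: int = 3, iters: int = 20) -> float:
    """Average wall-clock seconds per call of fn(), measured after warm-up calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters


def speedup(baseline: float, optimized: float) -> float:
    """Values above 1.0 mean the optimized model is faster than the baseline."""
    return baseline / optimized


# Latencies copied from the optimization log above (sec/iter).
original = 0.5072965788841247
compiled = 0.5444438457489014    # PytorchBackendCompiler, q_type: None
dynamic_q = 0.5069680213928223   # PytorchBackendCompiler, QuantizationType.DYNAMIC
print(f"{speedup(original, compiled):.2f}x, {speedup(original, dynamic_q):.2f}x")

# Timing a stand-in workload; in the notebook this could be
# lambda: model_wrapper(test_data)
latency = seconds_per_iter(lambda: sum(i * i for i in range(10_000)))
print(f"{latency:.6f} sec/iter")
```

For these two candidates the ratio is at or slightly below 1, meaning neither was meaningfully faster than the original model in this run. When timing on a GPU, the device usually has to be synchronized before reading the clock, which this sketch does not do.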

## YOLOv8 Optimization with CPU

In [23]:
from speedster import optimize_model, save_model, load_model
from ultralytics import YOLO

yolo = YOLO(model_name)
model_wrapper = YOLOWrapper(yolo)

# Provide some input data for the model    
input_data = [((torch.randn(1, 3, 640, 640), ), torch.tensor([0])) for i in range(100)]

# Run Speedster optimization
optimized_model = optimize_model(
  model_wrapper, input_data=input_data, metric_drop_ths=0.1, store_latencies=True, device="cpu"
)

optimized_wrapper = OptimizedYOLO(optimized_model)

2023-03-21 08:16:13 | INFO     | Running Speedster on CPU
 Please install them to include them in the optimization pipeline.
 Please install them to include them in the optimization pipeline.
 Without them, some compilers may not work properly.
2023-03-21 08:16:38 | INFO     | Benchmark performance of original model
2023-03-21 08:17:07 | INFO     | Original model latency: 0.2228161597251892 sec/iter
2023-03-21 08:17:10 | INFO     | Optimizing with PytorchBackendCompiler and q_type: None.
2023-03-21 08:17:23 | INFO     | Optimized model latency: 0.1975259780883789 sec/iter
2023-03-21 08:17:23 | INFO     | Optimizing with PytorchBackendCompiler and q_type: QuantizationType.DYNAMIC.
2023-03-21 08:17:36 | INFO     | Optimized model latency: 0.2031230926513672 sec/iter
2023-03-21 08:17:36 | INF

In [24]:
optimized_wrapper(test_data)

(tensor([[[6.08573e+00, 1.52113e+01, 1.89979e+01,  ..., 5.00097e+02, 5.30870e+02, 5.63326e+02],
          [6.47850e+00, 5.28680e+00, 4.31905e+00,  ..., 5.93075e+02, 5.95111e+02, 5.83680e+02],
          [1.22037e+01, 3.02355e+01, 3.56795e+01,  ..., 2.79548e+02, 2.17813e+02, 1.57350e+02],
          ...,
          [2.39736e-06, 2.30594e-06, 1.36929e-06,  ..., 2.73031e-06, 2.27374e-06, 2.61840e-06],
          [5.06499e-07, 3.91384e-07, 2.16624e-07,  ..., 1.12537e-06, 1.17422e-06, 1.24078e-06],
          [3.66925e-06, 3.12165e-06, 1.46373e-06,  ..., 2.33010e-06, 2.20597e-06, 2.09432e-06]]]),
 [tensor([[[[  8.35431,   4.35437,   2.47571,  ...,   1.64295,   7.18458,   5.24387],
            [  9.41179,   4.66861,   2.66888,  ...,   0.48668,   6.16139,   7.56632],
            [  8.84891,   5.14147,   2.92525,  ...,   2.56016,   6.27959,   7.80411],
            ...,
            [  9.26816,   5.26749,   2.72706,  ...,   4.12358,   6.40942,   9.01704],
            [  9.13273,   5.07713,   2.51412,
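
The printed values of the original and the optimized outputs agree to roughly five significant digits. As a quick sanity check, the sketch below compares a few entries copied from the two printed outputs in this notebook; the tolerance is an arbitrary choice made for this illustration.

```python
import math

# (original, optimized) pairs copied from the two printed outputs in this notebook.
pairs = [
    (6.08572e+00, 6.08573e+00),
    (6.47849e+00, 6.47850e+00),
    (2.30593e-06, 2.30594e-06),
    (2.27376e-06, 2.27374e-06),
]

# rel_tol=1e-4 is an arbitrary tolerance picked for this illustration.
all_close = all(math.isclose(a, b, rel_tol=1e-4) for a, b in pairs)
print(all_close)
```

On the full tensors, `torch.allclose` with suitable `rtol` and `atol` values performs the same kind of element-wise check.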

## Save and reload the optimized model

We can save the optimized model to disk with a single line:

In [None]:
save_model(optimized_model, "model_save_path")

We can then load the model again. The save cell above must have been run first, otherwise `load_model` cannot find the saved files:

In [16]:
optimized_model = load_model("model_save_path")


FileNotFoundError: [Errno 2] No such file or directory: 'model_save_path/metadata.json'
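
The `FileNotFoundError` above suggests that the save directory did not exist when `load_model` ran, which is consistent with the save cell being marked `In [None]`, meaning it was not executed. A small guard like the one below can make this failure easier to diagnose. The helper `has_saved_model` is hypothetical; the only thing taken from the source is that `load_model` looked for a `metadata.json` file inside the save directory, and other files may be required as well.

```python
from pathlib import Path
import tempfile


def has_saved_model(save_dir: str) -> bool:
    """Return True if save_dir contains the metadata.json file named in the error above."""
    return (Path(save_dir) / "metadata.json").is_file()


# Demonstration with a temporary directory instead of a real saved model.
with tempfile.TemporaryDirectory() as tmp:
    before = has_saved_model(tmp)
    (Path(tmp) / "metadata.json").write_text("{}")
    after = has_saved_model(tmp)

print(before, after)  # False True
```

In the notebook, checking `has_saved_model("model_save_path")` before calling `load_model("model_save_path")` would show whether `save_model` still needs to be run.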

In [25]:
optimized_wrapper = OptimizedYOLO(optimized_model)

What an amazing result, right?!? Stay tuned for more cool content from the Nebuly team :) 

<center> 
    <a href="https://discord.com/invite/RbeQMu886J" target="_blank" style="text-decoration: none;"> Join the community </a> |
    <a href="https://nebuly.gitbook.io/nebuly/welcome/questions-and-contributions" target="_blank" style="text-decoration: none;"> Contribute to the library </a>
</center>

<center> 
    <a href="https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/speedster#key-concepts" target="_blank" style="text-decoration: none;"> How speedster works </a> •
    <a href="https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/speedster#documentation" target="_blank" style="text-decoration: none;"> Documentation </a> •
    <a href="https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/speedster#quick-start" target="_blank" style="text-decoration: none;"> Quick start </a> 
</center>