Framework for object detection and instance segmentation models from the YOLOv8 family


Requires Python 3.8 or later (3.11 recommended) and PyTorch (>=1.8). On machines with a CUDA-capable GPU, NVIDIA CUDA Toolkit 10.0 or later and the matching PyTorch build must be installed.

Other packages required:

  • TorchVision (>=0.9.0)
  • Matplotlib (>=3.2.2)
  • NumPy (>=1.22.2)
  • OpenCV (>=4.6.0)
  • Pillow (>=7.1.2)
  • PyYAML (>=5.3.1)
  • Requests (>=2.23.0)
  • SciPy (>=1.4.1)
  • tqdm (>=4.64.0)
  • Pandas (>=1.1.4)
  • Seaborn (>=0.11.0)
  • psutil
  • py-cpuinfo

These packages are listed in requirements.txt and can be installed with the following command:

pip install -r requirements.txt 
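
To confirm that an existing environment meets the minimum versions listed above, a check along the following lines may help. This is only a sketch: the distribution names (for example "opencv-python" for OpenCV) are assumptions about how the packages are published on PyPI, and check_environment is a hypothetical helper, not part of this repository.

```python
import re
from importlib import metadata

# Minimum versions copied from the list above.
# Keys are assumed PyPI distribution names, not names used by this repository.
MINIMUMS = {
    "torch": "1.8",
    "torchvision": "0.9.0",
    "numpy": "1.22.2",
    "opencv-python": "4.6.0",
    "tqdm": "4.64.0",
}


def parse_version(text):
    """Turn '1.22.2' or '2.1.0+cu118' into a tuple of ints, ignoring suffixes."""
    parts = []
    for chunk in text.split("+")[0].split("."):
        match = re.match(r"\d+", chunk)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True when the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment(minimums=MINIMUMS):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{name}: {installed} is older than {minimum}")
    return problems
```

Calling check_environment() after the pip command and printing the returned list gives a quick overview of anything that is missing or outdated.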

Repository structure

|_ configuration
    |_ training.yaml # training parameters
|_ models
    |_ data.yaml # data configuration file
    |_ # model weights file
|_ datasets
|_ inference_output
|_ training_output
|_ validation_results
|_ Code

Basic usage

Model weights and configuration

The framework can be used with custom YOLO models. By default, the CModelML class loads weights from the models directory - models/ A custom path can be used as well.

from ml_model.CModelML import CModelML as Model

# Default model initialization
c_Model = Model() # weights loaded from models/

# Model initialization with custom path
c_Model = Model('example_path\\')

# Model initialization with official YOLOv8 weights
c_Model = Model('') 

Additional parameters of the CModelML class can be adjusted:

  • f_Thresh - confidence score threshold [float]
  • s_ForceDevice - force a device (e.g. 'cpu', 'cuda:0') [str]
  • b_SAMPostProcess - enable additional post-processing with Segment Anything Model (SAM) [bool]
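
As an illustration of what a confidence score threshold such as f_Thresh usually means, the sketch below keeps only detections whose score reaches the threshold. The dictionary layout of a detection and the use of "greater than or equal" are assumptions made for this example; the way CModelML applies f_Thresh internally is not documented here.

```python
def filter_by_confidence(detections, threshold):
    """Keep detections whose 'score' is at least `threshold`.

    `detections` is a list of dicts with a 'score' key. This layout is
    hypothetical and only used for illustration.
    """
    return [det for det in detections if det["score"] >= threshold]
```

With a threshold of 0.75, a detection scored 0.9 would be kept and one scored 0.5 would be dropped.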

Inference and results

The CModelML class accepts either an ndarray in the standard OpenCV format (shape=(H,W,3), dtype=np.uint8) or a string with the path to an image in '.jpg', '.jpeg' or '.png' format.

import cv2 as cv
from ml_model.CModelML import CModelML as Model
c_Model = Model(s_PathWeights='', f_Thresh=0.75) # Initialize model

# perform inference using ndarray
image = cv.imread('example_path\\example_image.jpg') # load image using openCV
results = c_Model.Detect(image)

# perform inference using path to image
results = c_Model.Detect('example_path\\example_image.jpg')
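
Before calling Detect, it can be useful to check that an input matches one of the two accepted forms. The helper below is a hypothetical sketch, not part of the framework. It uses duck typing for the array check so that it runs without NumPy installed, and it treats file extensions case-insensitively, which is an assumption.

```python
from pathlib import Path

# Extensions named in the description above.
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png"}


def input_kind(source):
    """Return 'path', 'array', or None when the input matches neither form."""
    if isinstance(source, str):
        # Assumption: upper-case extensions such as '.JPG' are also acceptable.
        if Path(source).suffix.lower() in SUPPORTED_SUFFIXES:
            return "path"
        return None
    shape = getattr(source, "shape", None)
    dtype = getattr(source, "dtype", None)
    # An OpenCV image is an (H, W, 3) array of unsigned 8-bit values.
    if shape is not None and len(shape) == 3 and shape[2] == 3 and str(dtype) == "uint8":
        return "array"
    return None
```

An image loaded with cv.imread would be reported as 'array', while a grayscale (H, W) array or a '.bmp' path would return None.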

Detect returns its results as an instance of the ImageResults class.

When detecting or segmenting small objects in large images, tiling can help: the input image is divided into several smaller tiles, each tile is passed to the ML model, and the results are merged and reported for the full-resolution image.

from ml_model.CModelML import CModelML as Model
c_Model = Model(i_TileSize=500) # Initialize model with tiling enabled and tile shape of 500x500
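
The idea behind tiling can be sketched as follows. This is a simplified illustration with non-overlapping tiles and hypothetical function names; whether the framework overlaps tiles or how it merges detections that cross tile borders is not described here.

```python
def tile_boxes(height, width, tile_size):
    """Return (x_min, y_min, x_max, y_max) for each tile, row by row.

    Tiles on the right and bottom edges are clipped to the image size.
    """
    boxes = []
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            boxes.append((x, y, min(x + tile_size, width), min(y + tile_size, height)))
    return boxes


def to_full_image(box, tile_origin):
    """Shift a box found inside a tile back into full-image coordinates."""
    x_offset, y_offset = tile_origin
    x_min, y_min, x_max, y_max = box
    return (x_min + x_offset, y_min + y_offset, x_max + x_offset, y_max + y_offset)
```

For a 1000x1200 image and a tile size of 500, tile_boxes yields six tiles, the last column being only 200 pixels wide. A box detected at (10, 20, 30, 40) inside the tile whose origin is (500, 500) maps to (510, 520, 530, 540) in the full image.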

Use of prepared scripts

Prepare dataset

Input data structure:

|_ class_names.txt # plain-text list of class names, one per line
|_ data
    |_ file1.txt # label file should have the '.txt' extension
    |_ file1.jpg # image file should have '.jpg', '.jpeg' or '.png' extension

  1. Run
  2. Select the input folder with images and labels
  3. Select the output dataset folder in the desired directory, e.g. 'datasets/dataset-example'
  4. The system creates a new dataset with the YAML configuration file and train, val and test subsets.

Output data structure:

|_ data.yaml # dataset configuration file
|_ train
    |_ file1.txt
    |_ file1.jpg
|_ val
|_ test
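
The pairing and splitting step can be pictured with the sketch below. The 70/20/10 split, the fixed random seed and both function names are assumptions chosen for illustration; the proportions used by the actual script are not stated here.

```python
import random
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def paired_stems(filenames):
    """Return sorted file stems that have both a '.txt' label and an image."""
    labels = {Path(n).stem for n in filenames if Path(n).suffix.lower() == ".txt"}
    images = {Path(n).stem for n in filenames if Path(n).suffix.lower() in IMAGE_SUFFIXES}
    return sorted(labels & images)


def split_dataset(stems, train_pct=70, val_pct=20, seed=0):
    """Shuffle stems reproducibly and split them into train, val and test."""
    stems = sorted(stems)
    random.Random(seed).shuffle(stems)
    n_train = len(stems) * train_pct // 100
    n_val = len(stems) * val_pct // 100
    return {
        "train": stems[:n_train],
        "val": stems[n_train:n_train + n_val],
        "test": stems[n_train + n_val:],  # whatever remains
    }
```

Each stem in a subset then corresponds to one image and one label file copied into the matching train, val or test folder.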

Train model

  1. Run
  2. Select model size
  3. Select the output dataset folder in the desired directory, e.g. 'datasets/dataset-example'
  4. Training output is saved to the training_output directory

Parameters in

  • i_Epochs - number of training epochs
  • i_BatchSize - training batch size
  • f_ConfThreshTest - confidence threshold during testing

Advanced parameters are stored in configuration/training.yaml.

|_ 20230101_000000 # Folder with training date
    |_ plots # metrics 
    |_ test_inference # inference on test subset
    |_ weights
        |_ # best weights
        |_ # last epoch weights
        |_ data.yaml # dataset configuration file
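
Run folders such as 20230101_000000 appear to follow a year-month-day_hour-minute-second pattern. Assuming that pattern holds (it is inferred from the example name only), the most recent run can be located with a sketch like the one below; both helper names are hypothetical.

```python
import re
from datetime import datetime
from pathlib import Path


def run_folder(root, when):
    """Build a run folder path such as training_output/20230101_000000."""
    return Path(root) / when.strftime("%Y%m%d_%H%M%S")


def latest_run_name(names):
    """Return the newest timestamp-like name, or None if there is none.

    Names in this format sort chronologically when sorted as plain strings.
    """
    stamped = [n for n in names if re.fullmatch(r"\d{8}_\d{6}", n)]
    return max(stamped) if stamped else None
```

For example, latest_run_name can be fed the folder names found in training_output to pick the run whose weights should be copied into models/.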

Validate model

  1. Run
  2. Select dataset folder
  3. Validation output is saved to the validation_results directory

|_ 20230101_000000 # Folder with validation date
    |_ results.json # Validation numeric results 

Output file structure:

{
    "mean_ap": "mAP50:95",
    "mean_ap50": "mAP50",
    "ap50": {
        "class_name": "AP50"
    },
    "ap": {
        "class_name": "AP50:95"
    },
    "mean_precission": "MEAN_PRECISSION",
    "mean_recall": "MEAN_RECALL",
    "precission": {
        "class_name": "PRECISSION"
    },
    "recall": {
        "class_name": "RECALL"
    },
    "mean_f1": "F1",
    "f1": {
        "class_name": "F1"
    }
}

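A results.json file with the keys listed above can be inspected with a few lines of Python. The sketch below assumes that the per-class entries hold numeric values (or strings that parse as numbers); the helper names are hypothetical and not part of the framework.

```python
import json


def load_results(path):
    """Read a results.json file into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def rank_classes(results, metric="f1"):
    """Return (class_name, value) pairs for one metric, weakest class first.

    `metric` is one of the per-class keys, e.g. 'ap50', 'ap', 'recall' or 'f1'.
    """
    per_class = results[metric]
    pairs = [(name, float(value)) for name, value in per_class.items()]
    return sorted(pairs, key=lambda pair: pair[1])
```

Ranking by 'f1' or 'ap50' in this way makes it easy to spot which classes need more training data.
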
Inference and preview

  1. Run
  2. Select folder with input images
  3. Preview will be displayed in OpenCV GUI
  4. Results are saved to the inference_output directory as *.txt (YOLO) and *.json (COCO) files
  5. Pressing 's' during the preview saves the image file to disk; 'ESC' closes the script
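
For reference, the two bounding-box conventions mentioned in step 4 differ as follows: a YOLO text line stores the class index followed by the box center and size normalized to the image dimensions, while a COCO bbox stores the top-left corner plus width and height in pixels. The sketch below covers detection boxes only (segmentation outputs are not shown), and the function names are hypothetical.

```python
def to_yolo_line(class_id, box, image_width, image_height):
    """Format a pixel box (x_min, y_min, x_max, y_max) as a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    center_x = (x_min + x_max) / 2 / image_width
    center_y = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {center_x:.6f} {center_y:.6f} {width:.6f} {height:.6f}"


def to_coco_bbox(box):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to COCO [x, y, w, h]."""
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max - x_min, y_max - y_min]
```

A box spanning (100, 50) to (300, 150) in a 400x200 image becomes "0 0.500000 0.500000 0.500000 0.500000" in YOLO form and [100, 50, 200, 100] in COCO form.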

Local file locations:

  • Weights file: models/
  • Data configuration file: models/data.yaml

Parameters in

  • f_Thresh - confidence threshold value

Inference on webcam feed

  1. Run
  2. Your camera feed will be displayed in OpenCV GUI

Parameters in

  • f_Thresh - confidence threshold value
  • i_TargetFPS - target FPS value
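
A target FPS value is commonly enforced by sleeping for whatever is left of each frame's time budget. The sketch below shows that general technique; it is an assumption about how a parameter like i_TargetFPS could be applied, not a description of the script's actual loop, and both function names are hypothetical.

```python
import time


def remaining_delay(target_fps, elapsed_seconds):
    """Seconds left to wait so that one iteration lasts 1 / target_fps in total."""
    frame_budget = 1.0 / target_fps
    return max(0.0, frame_budget - elapsed_seconds)


def run_paced(process_frame, n_frames, target_fps):
    """Call process_frame n_frames times without exceeding target_fps."""
    for _ in range(n_frames):
        started = time.perf_counter()
        process_frame()  # e.g. grab a frame, run inference, show the preview
        time.sleep(remaining_delay(target_fps, time.perf_counter() - started))
    return n_frames
```

If inference alone takes longer than the frame budget, the delay is zero and the loop simply runs as fast as the model allows.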

Cross-evaluation model

  1. Run
  2. Select input folder with images and labels
  3. Select the output dataset folder in the desired directory, e.g. 'datasets/dataset-example'
  4. Select model size
  5. The system splits the data into N segments, prepares the models and performs cross-validation.
  6. Cross-validation output is saved to the validation_results directory

|_ CrossEval_20230101_000000 # Folder with validation date
    |_ results_final.json # Validation numeric results 

Parameters in

  • iNSegments - number of sub-datasets used during cross validation
  • i_Epochs - number of training epochs
  • i_BatchSize - training batch size
  • f_ConfThreshTest - confidence threshold during testing
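
The splitting into N segments can be pictured as standard k-fold cross-validation, in which each segment is held out once while the remaining segments are used for training. The sketch below illustrates that general scheme under this assumption; the function name is hypothetical and the script may shuffle or assign samples differently.

```python
def k_fold_splits(items, n_segments):
    """Return a list of (train_items, held_out_items) pairs, one per segment."""
    items = list(items)
    # Deal items round-robin so that segment sizes differ by at most one.
    segments = [items[i::n_segments] for i in range(n_segments)]
    splits = []
    for held_out in range(n_segments):
        train = [
            item
            for index, segment in enumerate(segments)
            if index != held_out
            for item in segment
        ]
        splits.append((train, segments[held_out]))
    return splits
```

With 10 samples and 3 segments, this produces three rounds with 4, 3 and 3 held-out samples, and every sample is held out exactly once.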