# Custom Data Generation for YOLOv7+ with VOCDataset and Training of YOLOv11
1. Take pictures
1. Prepare object images
1. Prepare environment
1. Organize the object images
1. Download VOCDataset for backing of object images
1. Overlay objects on backing images and generate YOLO labels
1. Train model
1. Upload to [tools.luxonis.com](https://tools.luxonis.com) to convert to OpenVINO format
1. Export to Raspberry Pi

## Take Pictures
When taking pictures of the object for training, use conditions and angles similar to what the robot will see in competition. Also make sure that nothing important other than the object appears in the image.

Good examples: *ADD IMAGES*

## Prepare Object Images
Once you have your images, head over to [remove.bg](https://remove.bg) and upload each one to remove its background. Then open each background-free image in GIMP or your photo editor of choice and crop it right up to the edges of the object, down to the pixel.

Good example: *ADD IMAGES*
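
If you would rather not crop every image by hand, the pixel-tight crop can probably be automated. The sketch below is an assumption-based alternative to the manual GIMP step, not part of the original workflow: it uses Pillow to find the bounding box of all non-transparent pixels and crops to it. The helper name `crop_to_content` is hypothetical.

```python
from PIL import Image

def crop_to_content(image):
    """Crop an image to the bounding box of its non-transparent pixels."""
    rgba = image.convert("RGBA")
    # getbbox() on the alpha band returns the box around every pixel with non-zero alpha
    bbox = rgba.getchannel("A").getbbox()
    if bbox is None:  # fully transparent image: there is nothing to crop to
        return rgba
    return rgba.crop(bbox)

# Demo on a synthetic image: a 100x80 transparent canvas with a 30x20 opaque red patch
canvas = Image.new("RGBA", (100, 80), (0, 0, 0, 0))
canvas.paste((255, 0, 0, 255), (10, 15, 40, 35))
print(crop_to_content(canvas).size)  # (30, 20)
```

For a real file you would open it with `Image.open(...)`, pass it through the helper, and save the result as a PNG so the transparency is kept.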

## Prepare Environment


Clone the Ultralytics GitHub repository, which contains YOLOv11, and create the other folders that we will need later:

In [4]:
import os
os.getcwd()

'd:\\ultralytics'

In [None]:
# !git clone --progress --verbose https://github.com/ultralytics/ultralytics

In [2]:
!mkdir objectImages
!mkdir ultralytics\Dataset\images\test
!mkdir ultralytics\Dataset\images\train
!mkdir ultralytics\Dataset\labels\test
!mkdir ultralytics\Dataset\labels\train
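
The `!mkdir` commands above use backslash paths, so they only work in a Windows shell. As a minimal cross-platform sketch (the helper name `create_dataset_dirs` is hypothetical), the same folders can be created from Python with `pathlib`:

```python
import tempfile
from pathlib import Path

def create_dataset_dirs(root="."):
    """Create objectImages plus the images/labels x test/train dataset folders."""
    root = Path(root)
    folders = [root / "objectImages"]
    for kind in ("images", "labels"):
        for split in ("test", "train"):
            folders.append(root / "ultralytics" / "Dataset" / kind / split)
    for folder in folders:
        # parents=True creates intermediate folders; exist_ok=True makes reruns harmless
        folder.mkdir(parents=True, exist_ok=True)
    return folders

# Demo in a throwaway directory; in the notebook you would call create_dataset_dirs(".")
with tempfile.TemporaryDirectory() as tmp:
    for folder in create_dataset_dirs(tmp):
        print(folder.relative_to(tmp))
```

Because of `exist_ok=True`, rerunning the cell does not raise an error when the folders already exist.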

Install dependencies for YOLOv11 and all other Ultralytics YOLO versions (the quotes keep the shell from treating `>=` as an output redirect):

In [1]:
!pip install ultralytics "tqdm>=4.41.0" pillow


## Organize Object Images
Currently the file structure of important files should look like:
```
ultralytics
├── ultralytics
|     ├── Dataset
|     |     ├── images
|     |     └── labels
|     └── (folders with all of the yolo code and other things we will get to later)  
├── objectImages
└── (more yolo files and folders)
```

Rename all of the images to `*class*_*number*`<br>
ex: `blue_1`, `cone_29`, `banana_0`

Take all of the images you just edited and put them in the folder named `objectImages`

Now edit the list below to contain all of your class names. We will need this later for generating the dataset and training the model.

In [3]:
classNames = ["blueBalloon", "redBalloon"] # Edit
import os
CLASSES = {key: index for index, key in enumerate(classNames)} # DON'T EDIT
os.environ['CLASSES'] = str(CLASSES)
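
The data generation script later takes the class of each object image from the part of its file name before the first underscore, so every prefix has to match an entry in `classNames` exactly. A small check like the following sketch (the helper names `class_of` and `find_unknown_classes` are hypothetical) can catch misnamed files early:

```python
from pathlib import Path

def class_of(filename):
    """Return the class prefix of a file named <class>_<number>.<ext>."""
    return Path(filename).stem.split("_")[0]

def find_unknown_classes(filenames, class_names):
    """Return the file names whose class prefix is not in class_names."""
    known = set(class_names)
    return sorted(name for name in filenames if class_of(name) not in known)

files = ["blueBalloon_0.png", "redBalloon_12.png", "cone_3.png"]
print(find_unknown_classes(files, ["blueBalloon", "redBalloon"]))  # ['cone_3.png']
```

In the notebook, you could pass `os.listdir("objectImages")` and the `classNames` list to `find_unknown_classes` and rename any file it reports.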

## Download VOCDataset for Backing of Object Images
We use the PASCAL VOC (Visual Object Classes) dataset, which contains many labeled images created for the PASCAL VOC object recognition challenges. We only extract the images from a few of its releases to use as backing (background) images for our YOLO training images.

Download the 2007 and 2012 VOCDataset and put them in a separate directory (may take 10+ minutes depending on wifi):

In [None]:
!mkdir VOCdevkit
!wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar -O ./VOCdevkit/VOCtrainval_06-Nov-2007.tar
!wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar -O ./VOCdevkit/VOCtest_06-Nov-2007.tar
!wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar -O ./VOCdevkit/VOCtrainval_11-May-2012.tar
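
`wget` may not be available in every shell (for example a default Windows command prompt). As a hedged alternative, the same three archives can be fetched with Python's standard library. This is a sketch: the helper names `destination_for` and `download_all` are hypothetical, and the download call is left commented out so the cell does nothing until you opt in.

```python
import os
import urllib.request

# Same URLs as the wget commands above
VOC_URLS = [
    "http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar",
    "http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar",
    "http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar",
]

def destination_for(url, dest_dir="./VOCdevkit"):
    """Local path for an archive: dest_dir plus the file name at the end of the URL."""
    return os.path.join(dest_dir, url.rsplit("/", 1)[-1])

def download_all(urls=VOC_URLS, dest_dir="./VOCdevkit"):
    """Download each archive unless a file with that name already exists."""
    os.makedirs(dest_dir, exist_ok=True)
    for url in urls:
        target = destination_for(url, dest_dir)
        if os.path.exists(target):
            # Assumes an existing file is complete; delete it to force a fresh download
            print(f"Skipping {target} (already exists)")
            continue
        print(f"Downloading {url}")
        urllib.request.urlretrieve(url, target)

# download_all()  # uncomment to start the download
```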

Extract the downloaded .tar archive files (should create folders `VOC2007` and `VOC2012` in `VOCdevkit`):

In [None]:
!tar -xvf ./VOCdevkit/VOCtrainval_06-Nov-2007.tar
!tar -xvf ./VOCdevkit/VOCtest_06-Nov-2007.tar
!tar -xvf ./VOCdevkit/VOCtrainval_11-May-2012.tar
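
If the `tar` command is not available, Python's built-in `tarfile` module can do the same extraction. This is a minimal sketch with a hypothetical helper name, `extract_archive`. It extracts into the current directory by default, on the assumption (consistent with the commands above) that each archive already contains a top-level `VOCdevkit` folder.

```python
import tarfile

def extract_archive(archive_path, dest_dir="."):
    """Extract every member of a .tar archive into dest_dir."""
    with tarfile.open(archive_path) as tar:
        tar.extractall(dest_dir)

# Example calls, commented out because the archives must be downloaded first:
# extract_archive("./VOCdevkit/VOCtrainval_06-Nov-2007.tar")
# extract_archive("./VOCdevkit/VOCtest_06-Nov-2007.tar")
# extract_archive("./VOCdevkit/VOCtrainval_11-May-2012.tar")
```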

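Now we sort the VOC images into `train` and `test` folders of backing images. The code is adapted from a VOC-to-YOLO conversion script, but here only the images are moved; the VOC labels are not used.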

In [None]:
import xml.etree.ElementTree as ET
from tqdm import tqdm as progressBar
import os
import shutil
import argparse

# VOC dataset (refer https://github.com/meituan/YOLOv6/blob/main/yolov6/data/voc2yolo.py)
# VOC2007 trainval: 446MB, 5012 images
# VOC2007 test:     438MB, 4953 images
# VOC2012 trainval: 1.95GB, 17126 images

DATASET_YEARS = ('2012', 'train'), ('2012', 'val'), ('2007', 'train'), ('2007', 'val'), ('2007', 'test')

def voc2yolo(voc_path):
    for year, image_set in DATASET_YEARS:
        imgs_path = os.path.join(voc_path, 'images', f'{image_set}') # ./VOCdevkit/images/train
        print("Searching for image in:", imgs_path)

        # Separate images into train, test, val based on whether the image ID is listed in that split's text file
        # Makes it work with multiple files
        try:
            with open(os.path.join(voc_path, f'VOC{year}/ImageSets/Main/{image_set}.txt'), 'r') as f: # ./VOCdevkit/VOC2007/ImageSets/Main/train.txt
                image_ids = f.read().strip().split() # strips surrounding whitespace and splits the IDs on whitespace (one ID per "\n"-terminated line)
            if not os.path.exists(imgs_path): # Create the image path that we needed earlier
                os.makedirs(imgs_path)
                print(f'Creating {imgs_path}')

            # move each listed image into the train, val, or test folder
            for image_id in progressBar(image_ids, desc=f'{image_set}{year}'):
                src = os.path.join(voc_path, f'VOC{year}/JPEGImages/{image_id}.jpg')  # old img path
                if os.path.exists(src):
                    shutil.move(src, imgs_path)       # move image to new image path
        except Exception as e:
            print(f'[Warning]: {e} {year} {image_set} convert fail!')

    reorganizeImagePaths(voc_path)

def reorganizeImagePaths(voc_path):
    '''
    Generate backing dataset structure:
    train: # train images 16551 images
        - images/train
        - images/val
    val: # val images (relative to 'path')  4952 images
        - images/test
    '''
    dataset_root = os.path.join(voc_path, 'backingImages')
    print('='*20)
    print(f'dataset_root: {dataset_root}')
    if not os.path.exists(dataset_root):
        os.makedirs(dataset_root)
        print('Creating')
    
    file_structure = {'train': ['train', 'val'], 'test':['test']}
    for data_type, data_list in file_structure.items():
        for data_name in data_list:
            ori_path = os.path.join(voc_path, "images", data_name)
            new_path = os.path.join(dataset_root, data_type)
            if not os.path.exists(new_path):
                os.makedirs(new_path)

            print(f'[INFO]: Moving {ori_path} to {new_path}')
            for file in progressBar(os.listdir(ori_path)):
                shutil.move(os.path.join(ori_path, file), new_path)
    if os.path.exists(os.path.join(voc_path, "images")):
        shutil.rmtree(os.path.join(voc_path, "images"))

Now we run the code we just wrote on the directory of the VOC images:

In [None]:
voc2yolo('./VOCdevkit/')
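
After the cell above finishes, it can be worth confirming that the backing images ended up where the later steps expect them. The following sketch (the helper name `count_images` is hypothetical) counts the image files in each `backingImages` split; the totals can be compared against the counts quoted in the code comments above.

```python
import os

def count_images(folder, extensions=(".jpg", ".jpeg", ".png")):
    """Count files in folder whose names end with one of the given image extensions."""
    if not os.path.isdir(folder):
        return 0  # a missing folder means nothing was moved there
    return sum(1 for name in os.listdir(folder) if name.lower().endswith(extensions))

for split in ("train", "test"):
    folder = os.path.join("./VOCdevkit/backingImages", split)
    print(split, count_images(folder))
```

A count of 0 for either split suggests that the download, extraction, or move step did not complete.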

## Overlay objects on backing images and generate YOLO labels
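
Each generated image gets a label file with one line in YOLO format: the class index followed by the box center and size, all expressed as fractions of the image width and height. The script in this section computes those values from the paste position and the scaled object size. The arithmetic can be sketched in isolation as follows; the function name `yolo_label` is hypothetical and the numbers are made up for illustration.

```python
def yolo_label(class_id, x, y, box_w, box_h, img_w, img_h):
    """Build a YOLO label line from a pixel box (top-left x, y, width, height)."""
    x_center = (x + box_w / 2.0) / img_w   # box center as a fraction of image width
    y_center = (y + box_h / 2.0) / img_h   # box center as a fraction of image height
    width = box_w / img_w                  # box width as a fraction of image width
    height = box_h / img_h                 # box height as a fraction of image height
    values = [round(v, 6) for v in (x_center, y_center, width, height)]
    return f"{class_id} {values[0]} {values[1]} {values[2]} {values[3]}"

# A 200x100 object pasted at (100, 50) on a 500x400 background, class index 0
print(yolo_label(0, 100, 50, 200, 100, 500, 400))  # 0 0.4 0.25 0.4 0.25
```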

In [5]:
%%writefile ./dataGen.py
from PIL import Image
import random
import os
from tqdm import tqdm as progressBar
import threading
import multiprocessing
import concurrent.futures
import argparse

classNames = ["blueBalloon", "redBalloon"] # Edit
CLASSES = {key: index for index, key in enumerate(classNames)} # DON'T EDIT

# prompt: write a function using PIL to take a first PNG image with a transparent background, scale it down a certain percentage and paste it on a second PNG image and return the result.
def stackAndScaleImage(objectImage, backgroundImage, scalePercent, position):
  """
  Opens an object PNG, scales it by scalePercent, and pastes it onto a background image.

  Args:
  objectImage: Path to the object image (PNG with a transparent background).
  backgroundImage: Path to the background image.
  scalePercent: Scale factor applied to the object, as a float (e.g. 0.5 halves its width and height).
  position: Tuple of (x, y) coordinates to paste the scaled image.
 
  Returns:
  A PIL Image object with the object pasted on the background.
  """

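  objectImage = Image.open(objectImage).convert("RGBA")       # RGBA guarantees an alpha channel to use as the paste mask
  backgroundImage = Image.open(backgroundImage).convert("RGB") # normalize palette or grayscale backgrounds to RGB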

  # Scale down the first image
  width, height = objectImage.size                     # extract the width and height of object
  newWidth = int(width * scalePercent)            # create a new width for object based on the scalePercent and makes it an int
  newHeight = int(height * scalePercent)          # create a new height for object based on the scalePercent and makes it an int
  objectScaled = objectImage.resize((newWidth, newHeight)) # use the resize method of a PIL Image to scale object to the new_width and new_height
 
  #==== Stretch the height randomly to add more variation to the data
  # NOTE: the caller computes the YOLO label height before this stretch is applied, so the label can differ slightly from the pasted height
  width, height = objectScaled.size # Reset the width and height variables to be the new width and height of object after scaling
  randomHeightScale = random.uniform(0.8, 1.2) # Choose a random height scale factor between 0.8 and 1.2
  newHeight = int(height * randomHeightScale) # Calculate the newHeight based on the random scale factor above
  if newHeight > backgroundImage.height:
    newHeight = backgroundImage.height
  objectScaled = objectScaled.resize((width, newHeight)) # Resize the image just like above
  #====

  backgroundImage.paste(objectScaled, position, objectScaled) # Paste the scaled image (object_scaled) on the second image without scaling the second image. (second parameter makes it so that the pixels with no value meant to be clear stay clear and aren't black)

  return backgroundImage # Return the final stacked image

# prompt: write a function called combine_images which uses the stack_scaled_images function defined above and scale object to a random value between some minimum percentage and some maximum percentage of the size of background.  The position for object to be pasted onto background is randomly selected within the bounds of background
# SCALE PERCENTAGE IS HOW MUCH OF THE FRAME YOU WANT TO TAKE UP, NOT HOW MUCH YOU WANT TO SCALE THE OBJECT IMAGE DOWN
def selectScaleAndCreateYoloLabels(objectPath, backgroundPath, minSizePercent, maxSizePercent, objectClassStr):
  """
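  Combines two images by pasting a scaled version of an object onto a background.

  SCALES RELATIVE TO THE SHORTEST SIDE OF THE BACKGROUND

  Args:
    objectPath: Path to the object image.
    backgroundPath: Path to the background image.
    minSizePercent: Minimum fraction (0.0 to 1.0) of the background's shortest side for the object to span.
    maxSizePercent: Maximum fraction (0.0 to 1.0) of the background's shortest side for the object to span.
    objectClassStr: Class name of the object; must be a key of CLASSES.

  Returns:
    A tuple (combinedImage, debugParameters, pasteParametersYolo): the combined PIL image, the paste parameters in pixels, and the YOLO label values.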
  """

  object = Image.open(objectPath) # Open the first image as object
  background = Image.open(backgroundPath) # Open the second image as background

  objectWidth, objectHeight = object.size # Extract the width and height of object as objectWidth and objectHeight
  backgroundWidth, backgroundHeight = background.size # Extract the width and height of background as backgroundWidth and backgroundHeight

  # Choose a random scalePercent between the minimum and maximum values provided as parameters
  # Keep both at or below 1.0 so the object is never scaled larger than the background's shortest side
  scalePercent = random.uniform(minSizePercent, maxSizePercent)

  # baseScale makes the object span the background's shortest side (width or height) before the random scalePercent is applied
  baseScale = backgroundWidth/objectWidth if backgroundWidth <= backgroundHeight else backgroundHeight/objectHeight # scale based on shortest side of background
  """
  print({
    "backgroundWidth": backgroundWidth,
    "object.width": object.width,
    "objectWidth": objectWidth,
    "baseScale": baseScale,
    "scalePercent": scalePercent,
    "baseScale * scalePercent": baseScale * scalePercent,
    "object.width * scalePercent/100": object.width * scalePercent/100,
    "object.height * scalePercent/100": object.height * scalePercent/100,
  })
  """

  scalePercent = baseScale * scalePercent # fix the scalePercent to include the baseScale between the 2 images and be proportional

  # Choose a random position for object in background
  # Predict the pasted size of object so a random position can be chosen inside background.
  # Clamp to the background size: background may not be tall (or wide) enough to fit object even after scaling,
  # which previously made randint try to select from 0 to a negative number.
  widthOfObjectAfterScaling = min(int(object.width * scalePercent), backgroundWidth)    # predicted visible width of object after scaling
  heightOfObjectAfterScaling = min(int(object.height * scalePercent), backgroundHeight) # predicted visible height of object after scaling
  x = random.randint(0, backgroundWidth - widthOfObjectAfterScaling)     # select random x position for object in background
  y = random.randint(0, backgroundHeight - heightOfObjectAfterScaling)   # select random y position for object in background

  combinedImage = stackAndScaleImage(objectPath, backgroundPath, scalePercent, (x, y)) # Use the stackAndScaleImage function to combine the two images

  # Collect the paste parameters (in pixels) for debugging
  debugParameters = {
    "width_background": backgroundWidth,
    "height_background": backgroundHeight,
    "paste_x": x,
    "paste_y": y,
    "paste_width": widthOfObjectAfterScaling,
    "paste_height": heightOfObjectAfterScaling,
  }

  # https://docs.cogniflow.ai/en/article/how-to-create-a-dataset-for-object-detection-using-the-yolo-labeling-format-1tahk19/
  # YOLO labeling parameters
  pasteParametersYolo = {
    "objectClassNum": CLASSES[objectClassStr],
    "x_center": (x + widthOfObjectAfterScaling/2.0) / backgroundWidth, # calculate what percentage of the width of background the center of scaled object will be
    "y_center": (y + heightOfObjectAfterScaling/2.0) / backgroundHeight, # calculate what percentage of the height of background the center of scaled object will be
    "width": widthOfObjectAfterScaling / backgroundWidth, # calculate what percentage of the width of background does scaled object take
    "height": heightOfObjectAfterScaling / backgroundHeight, # calculate what percentage of the height of background does scaled object take
  }

  return (combinedImage, debugParameters, pasteParametersYolo) # return the PIL image object after stacking, the parameters used for pasting, and the parameters used for pasting in a YOLO suitable notation


# prompt: write a function which uses the combine_images_james function defined above to randomly select an object from directory1 and combine it from a random background from directory2
def combineRandomImages(directory1, directory2, minSizePercent, maxSizePercent):
  """
  Pastes a random object image from directory1 onto a random background image from directory2 using selectScaleAndCreateYoloLabels.

  Args:
    directory1: Path to the directory of object images, named <class>_<number>.
    directory2: Path to the directory of background images.
    minSizePercent: Minimum fraction of the background's shortest side for the object to span.
    maxSizePercent: Maximum fraction of the background's shortest side for the object to span.

  Returns:
    A tuple (combinedImage, debugParameters, pasteParametersYolo).
  """

  # Get all files in directories. If one isn't an image it could break
  objectFiles = os.listdir(directory1) # Get a list of all files in the first directory
  backgroundFiles = os.listdir(directory2) # Get a list of all files in the second directory

  imageChosen = random.choice(objectFiles) # Choose a random object file
  objectClass = imageChosen.split("_")[0] # The class name is the part of the file name before the first underscore

  # Build the full paths
  objectPath = os.path.join(directory1, imageChosen) # Path of the chosen object file in the first directory
  backgroundPath = os.path.join(directory2, random.choice(backgroundFiles)) # Choose a random file from the second directory

  # Use the selectScaleAndCreateYoloLabels function to combine the two images.
  combinedImage, debugParameters, pasteParametersYolo = selectScaleAndCreateYoloLabels(objectPath, backgroundPath, minSizePercent, maxSizePercent, objectClass)

  # print(pasteParametersYolo)

  return (combinedImage, debugParameters, pasteParametersYolo) # pass through all of the return values from selectScaleAndCreateYoloLabels

# def makeImage(testOrTrain="test", numImages=10, minSizePercent=.05, maxSizePercent=.8, i=-1):
def makeImage(args):
  (testOrTrain, numImages, minSizePercent, maxSizePercent, i) = args
  if i == -1:
    raise ValueError("Invalid i Value")
  # Combine the images from directory1 (game object) with images from directory2 (backgrounds).
  combinedImage, debugParameters, pasteParametersYolo = combineRandomImages("./objectImages", "./VOCdevkit/backingImages/"+testOrTrain, minSizePercent, maxSizePercent)
  # Figure out a file name based on the current iteration and type of dataset
  baseFilename = f"{testOrTrain}_{i:06d}" # Zero-padded to 6 digits (indices 0 to 999999)
  # Save the image to the specified folder based on type of data set and use the above created filename
  combinedImage.save('./ultralytics/Dataset/images/'+testOrTrain+'/'+baseFilename+'.png')

  # Open/create a text file with the same name as the image and write the YOLO label to it
  with open('./ultralytics/Dataset/labels/'+testOrTrain+'/'+baseFilename+'.txt', "w") as f:
    yoloData = pasteParametersYolo
    f.write(f"{yoloData['objectClassNum']} {round(yoloData['x_center'],6)} {round(yoloData['y_center'],6)} {round(yoloData['width'],6)} {round(yoloData['height'],6)}") # one line: class x_center y_center width height, each value rounded to 6 decimal places


# Main entry point that ties together all of the other functions
if __name__ == "__main__":
  # CLASSES comes from the classNames list at the top of this file, so it is not a command-line argument
  parser = argparse.ArgumentParser(description="Generate synthetic YOLO images and labels")
  parser.add_argument("--testOrTrain", default="test", type=str.lower, choices=["test", "train"])
  parser.add_argument("--numImages", default=2, type=int)
  parser.add_argument("--minSizePercent", default=.05, type=float)
  parser.add_argument("--maxSizePercent", default=.8, type=float)

  inputArgs = parser.parse_args()
  testOrTrain = inputArgs.testOrTrain
  numImages = inputArgs.numImages
  minSizePercent = inputArgs.minSizePercent
  maxSizePercent = inputArgs.maxSizePercent

  args = [(testOrTrain, numImages, minSizePercent, maxSizePercent, i) for i in range(numImages)] # one argument tuple per image

  with multiprocessing.Pool() as p:
    p.map(makeImage, args) # generate the images in parallel, one task per image

Writing ./dataGen.py


In [6]:
!python dataGen.py --testOrTrain test --numImages 100 --minSizePercent .5 --maxSizePercent .8


In [4]:
%cd /ultralytics/
import os
os.getcwd()
# os.listdir("../ultralytics/objectImages")

d:\ultralytics


'd:\\ultralytics'


In [None]:
from dataGen import makeImage # dataGen.py was written by the %%writefile cell above

# Smoke test: generate one test image (index 0) inside the notebook process
makeImage(("test", 1, .05, .8, 0)) # (testOrTrain, numImages, minSizePercent, maxSizePercent, i)

# For the full dataset, call makeImage once per image index:
# for i in range(20000): makeImage(("train", 20000, .05, .4, i)) # Change this number to change the number of training images that will be created
# for i in range(5000): makeImage(("test", 5000, .05, .4, i)) # Change this number to change the number of test/val images that will be created
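
Before training on thousands of generated images, it helps to check a few labels by eye. The sketch below is an assumption-based helper, not part of the original notebook: it converts a YOLO label line back to a pixel box and draws it on a copy of the image with Pillow. The function names `yolo_to_pixel_box` and `draw_label` are hypothetical.

```python
from PIL import Image, ImageDraw

def yolo_to_pixel_box(label_line, img_w, img_h):
    """Convert 'class x_center y_center width height' to (class_id, (left, top, right, bottom))."""
    class_id, xc, yc, w, h = label_line.split()
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    box = (round(xc - w / 2), round(yc - h / 2), round(xc + w / 2), round(yc + h / 2))
    return int(class_id), box

def draw_label(image, label_line):
    """Return an RGB copy of image with the labeled box outlined in red."""
    _, box = yolo_to_pixel_box(label_line, *image.size)
    preview = image.convert("RGB")  # convert() returns a new image, so the input is untouched
    ImageDraw.Draw(preview).rectangle(box, outline=(255, 0, 0), width=2)
    return preview

print(yolo_to_pixel_box("0 0.4 0.25 0.4 0.25", 500, 400))  # (0, (100, 50, 300, 150))
```

To inspect a generated sample, open an image such as `./ultralytics/Dataset/images/test/test_000000.png`, read the first line of the matching `.txt` file in `labels/test`, and display or save the result of `draw_label`.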
