Collecting Custom Data

Vương Tuấn Khanh edited this page Aug 26, 2021 · 1 revision

📚 This guide explains how to collect and process custom data. 🚀 UPDATED 26 August 2021.

Step 1: Gathering and Labeling a Custom Dataset

To create a custom object detector, you need a good dataset of images and labels so that the detector can be trained efficiently to detect objects.

This can be done in two ways: by using images from Google's Open Images Dataset, or by creating your very own dataset and using an annotation tool to manually draw labels. (I recommend the first way!)

Method 1: Using Google's Open Images Dataset (RECOMMENDED)

I recommend this method because you can gather thousands of images and auto-generate their labels within minutes! Gathering a dataset from Google's Open Images Dataset and using the OIDv4 toolkit to generate labels is easy and time efficient. The dataset contains labeled images for over 600 classes! Explore the Dataset Here!

Creating a Custom YOLOv3 Dataset (Video)

Here is a link to The AI Guy's GitHub repository for the OIDv4 toolkit! Github Repo

For this tutorial I will be creating a safari animal object detector using data from Google's Open Images Dataset. I ran the following commands within the toolkit.

python main.py downloader --classes Elephant Giraffe Hippopotamus Tiger Zebra --type_csv train --limit 300 --multiclasses 1

This downloads up to 300 images for each of the Elephant, Giraffe, Hippopotamus, Tiger, and Zebra classes, so up to 1500 images in total, and saves them all to one folder.

Within the root OIDv4_ToolKit folder open the file classes.txt and edit it to have the classes you just downloaded, one per line.
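If you would rather script this step than edit the file by hand, here is a minimal sketch. The helper name write_classes is hypothetical, and I am assuming that the order of the lines in classes.txt is what decides each class's numeric id in the converted labels, so keep the order consistent once you start training.

```python
from pathlib import Path

def write_classes(path, class_names):
    """Write one class name per line (hypothetical helper, not part of the toolkit)."""
    text = "\n".join(class_names) + "\n"
    Path(path).write_text(text, encoding="utf-8")
    return text

# The five safari classes downloaded in this tutorial.
SAFARI_CLASSES = ["Elephant", "Giraffe", "Hippopotamus", "Tiger", "Zebra"]

if __name__ == "__main__":
    # Run this from inside the root OIDv4_ToolKit folder.
    write_classes("classes.txt", SAFARI_CLASSES)
```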

Now convert the image annotations:

python convert_annotations.py

This converts all labels to YOLOv3 format which can now be used by darknet to properly train our custom object detector.
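To give an idea of what the conversion does, here is a minimal sketch of the arithmetic behind a YOLO label line. It is not the toolkit's actual code. I am assuming the original label stores each box as pixel coordinates (left, top, right, bottom), and the function name to_yolo_line is hypothetical. A YOLO label line holds the class id followed by the box center, width, and height, each divided by the image size so that the values fall between 0 and 1.

```python
def to_yolo_line(class_id, left, top, right, bottom, img_w, img_h):
    """Convert a pixel-space box into a YOLO label line (illustrative helper)."""
    # Box center, normalized by the image dimensions.
    x_center = (left + right) / 2.0 / img_w
    y_center = (top + bottom) / 2.0 / img_h
    # Box size, normalized by the image dimensions.
    width = (right - left) / img_w
    height = (bottom - top) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

if __name__ == "__main__":
    # A 200x200 pixel box inside a 640x480 image, class id 0.
    print(to_yolo_line(0, 100, 200, 300, 400, 640, 480))
    # prints: 0 0.312500 0.625000 0.312500 0.416667
```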

Remove the old Label folder in the OIDv4 toolkit, which contains the non-YOLOv3 formatted labels, by running the command below. (Your file path will have a different name in place of Elephant_Giraffe.. depending on which classes you downloaded.)

rm -r OID/Dataset/train/Elephant_Giraffe_Hippopotamus_Tiger_Zebra/Label/

If this command doesn't work on your machine, go to the folder containing Label, right-click it, and hit Delete to remove it manually.

The folder with all your images and annotations should now look like this. Each image should have a text file with the same name.
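A quick way to check the "same name" rule is a small script like the sketch below. The function name find_unlabeled_images and the list of image extensions are my own assumptions, and the folder path is just the one from this tutorial, so change it to match the classes you downloaded.

```python
from pathlib import Path

def find_unlabeled_images(folder, image_exts=(".jpg", ".jpeg", ".png")):
    """Return the names of images that have no .txt label file next to them."""
    missing = []
    for path in sorted(Path(folder).iterdir()):
        is_image = path.suffix.lower() in image_exts
        if is_image and not path.with_suffix(".txt").exists():
            missing.append(path.name)
    return missing

if __name__ == "__main__":
    # Example path from this tutorial; yours depends on the classes you chose.
    dataset_dir = Path("OID/Dataset/train/Elephant_Giraffe_Hippopotamus_Tiger_Zebra")
    if dataset_dir.is_dir():
        unlabeled = find_unlabeled_images(dataset_dir)
        print(f"{len(unlabeled)} image(s) without a label file")
```

An empty result means every image has a matching text file.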

You have successfully generated a custom YOLOv3 dataset! Congrats!

Method 2: Manually Labeling Images with Annotation Tool

If you can't find the proper images or classes within Google's Open Images Dataset, then you will have to use an annotation tool to manually draw your labels, which can be a tiresome process.

In an earlier video, I walk through how to mass download images from Google Images and how to use LabelImg, an annotation tool, to create a custom dataset for YOLOv3. Hit the link below from The AI Guy to learn how.

Create Labels and Annotations for Custom YOLOv3 Dataset (Video)

After following the tutorial video you should now have a folder with images and text files like the one above.

You have successfully generated a custom YOLOv3 dataset! Congrats!

Step 2: Moving Your Custom Dataset Into Your Cloud VM

Now that your dataset is properly formatted for training, we need to move it into the cloud VM so that it is available when the time comes to train.

I recommend renaming the folder with your images and text files on your local machine to 'obj' and then creating a .zip file of the 'obj' folder. Then I recommend uploading the zip to your Google Drive. So you should now have obj.zip somewhere in your Google Drive.

This will greatly reduce the time it takes to transfer our dataset into our cloud VM.
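If you prefer to create the zip from a script on your local machine, here is a minimal sketch using Python's standard shutil module. The helper name zip_dataset is hypothetical. It keeps the 'obj' folder as the top-level entry inside the archive, so unzipping it into data/ should give you data/obj.

```python
import shutil
from pathlib import Path

def zip_dataset(obj_dir):
    """Create obj.zip next to the 'obj' folder and return the archive path."""
    obj_dir = Path(obj_dir).resolve()
    return shutil.make_archive(
        base_name=str(obj_dir.parent / obj_dir.name),  # ".zip" is appended for us
        format="zip",
        root_dir=str(obj_dir.parent),  # archive paths are relative to this folder
        base_dir=obj_dir.name,         # keep "obj/" as the top-level folder
    )

if __name__ == "__main__":
    # Assumes the renamed 'obj' folder sits in the current directory.
    if Path("obj").is_dir():
        print(zip_dataset("obj"))
```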

Now we can copy the zip into the cloud VM and unzip it there.

# this is where my zip is stored (I created a yolov3 folder where I will get my required files from)
!ls /mydrive/yolov3

# copy the .zip file into the root directory of cloud VM
!cp /mydrive/yolov3/obj.zip ../

# unzip the zip file and its contents should now be in /darknet/data/obj
!unzip ../obj.zip -d data/