A notebook used to collect images through the user's webcam and label them using [Label Studio](https://labelstud.io).

This notebook is inspired by the notebook in this [GitHub repo](https://github.com/nicknochnack/TFODCourse) by [Nicholas Renotte](https://youtu.be/yqkISICHH-U?list=PLEG-yoFSLEJ7inw5rNtyqaFAlOHkyWU3t).

# 1. Import Packages

In [1]:
# cv2 is for opencv, which is used for image operations
# such as reading and processing images
import cv2
import uuid
import os
import time
import yaml

In [2]:
# uuid is used to generate unique filenames later
str(uuid.uuid1())

'752daa19-f9cd-11eb-9999-1cbfce4b3bed'
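
As a minimal sketch of how such an ID might be used later, the snippet below builds a unique image filename per class. The `make_filename` helper, the `collectedimages` default folder, and the `<label>.<uuid>.jpg` naming pattern are assumptions for illustration, not something this notebook defines.

```python
import os
import uuid


def make_filename(label, image_dir="collectedimages"):
    """Return a unique JPG path for an image of the given class (hypothetical helper)."""
    # uuid1() is built from the host ID and the current time, so repeated calls differ
    return os.path.join(image_dir, f"{label}.{uuid.uuid1()}.jpg")


print(make_filename("5"))
```

Keeping the class name at the front of the filename makes it easy to see which coin an image was meant to show when browsing the folder.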

# 2. Define Specifications for Images to Collect

In [3]:
# defining the class names for the dataset, 
# e.g. '5' is for 5 cent coins
CLASS_NAMES = ['5', '10', '20', '50']
# define the number of images to be collected for each class
IMAGES_PER_CLASS = 10

# 3. Setup Folders 

In [4]:
IMAGE_DIR = os.path.join('Tensorflow', 'workspace', 'images', 'collectedimages')

# create the folder to save the images
if not os.path.exists(IMAGE_DIR):
    os.makedirs(IMAGE_DIR)

# create a folder for each class of images
# for label in CLASS_NAMES:
#     path = os.path.join(IMAGE_DIR, label)
#     if not os.path.exists(path):
#         os.makedirs(path)

In [5]:
IMAGE_DIR

'Tensorflow\\workspace\\images\\collectedimages'

# 4. Collecting images with your phone

After taking photos of the coins with your phone, convert them to JPG format for easier processing if necessary.

NOTE: If you are using an iPhone, you will run into problems with its images because they are saved in HEIC format. I recommend using [this website](https://freetoolonline.com/heic-to-jpg.html) to convert all of them into JPG format.

# 5. Image Labelling

1. Open Anaconda Prompt and activate your environment with `conda activate tfod`
2. Then run Label Studio by entering `label-studio`
3. Enter "n" if an error about JSON support appears
4. Create a local account, and keep a note of the email and the password
5. Click the "Create" button at the top right to create a new project
6. Enter a project name such as "Coin Detection - Phone" and click the "Save" button
7. Click "Go to import"
8. Go to your phone images folder, drag all the images into Label Studio, and click "Import"
9. Click "Label All Tasks" and you will be prompted to set up the labelling configuration
10. Click "Go to setup" > "Browse Templates" > "Computer Vision" > "Object Detection with Bounding Boxes"
11. Delete both the existing labels ("Airplane" and "Car") by clicking the red "X" beside them
12. Add your label names by typing them into the textbox under "Add label names" (see example in image below), and click "Add" > "Save"

![add-labelnames](images/add-labelnames.png)

13. Now click "Label All Tasks" and start labelling the bounding boxes!
14. Select the **class**, such as '5', and draw the box around the coin. See the example below.
15. After labelling each image, click "Submit" on the right and continue until Label Studio says you have finished labelling.
16. Finally, click "Export" at the top right and make sure to select "Pascal VOC XML" before clicking "Export". Then you are done labelling!

![label-example](images/label-example.png)
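
Steps 10 to 12 can also be done by pasting an XML config into the code view of the labelling setup. The sketch below generates such a config from the class names. The `build_label_config` helper is hypothetical, and the tag layout (`Image` plus `RectangleLabels`) follows what I understand the bounding-box template to use, so compare it against the template shown in your own Label Studio version before relying on it.

```python
from xml.sax.saxutils import quoteattr

CLASS_NAMES = ['5', '10', '20', '50']


def build_label_config(class_names):
    """Build a bounding-box labelling config string (hypothetical helper)."""
    # quoteattr() wraps each name in quotes and escapes any special characters
    labels = "\n".join(
        f"    <Label value={quoteattr(name)}/>" for name in class_names
    )
    return (
        "<View>\n"
        '  <Image name="image" value="$image"/>\n'
        '  <RectangleLabels name="label" toName="image">\n'
        f"{labels}\n"
        "  </RectangleLabels>\n"
        "</View>"
    )


print(build_label_config(CLASS_NAMES))
```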


Best practices on how to label images properly:
1. Label every object of interest in every image
2. Label the entirety of an object **tightly**, without cutting off any part of it
3. Label occluded objects as if they were fully visible
4. Create specific label names that are not too general

Refer to the [link here](https://blog.roboflow.com/tips-for-how-to-label-images/) for more details.



Finally, use the code below to extract the zip file exported from Label Studio into the desired directory, `IMAGE_DIR`.

In [52]:
from zipfile import ZipFile

# path to the zip file exported from Label Studio; change this to your own path
ANNOTATION_ZIP = r"T:\New Download Folder\project-5-at-2021-08-10-22-05-2bb3399d.zip"

with ZipFile(ANNOTATION_ZIP, 'r') as zipObj:
    # extract all the contents of the zip file to IMAGE_DIR
    zipObj.extractall(IMAGE_DIR)
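
To sanity-check the export, the annotations can be read back with the standard library. This is a minimal sketch assuming the usual Pascal VOC layout (`object` elements containing `name` and a `bndbox` with `xmin`, `ymin`, `xmax`, `ymax`). The sample XML and the `read_voc_boxes` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# made-up annotation in the usual Pascal VOC layout
SAMPLE_XML = """<annotation>
  <filename>coins.jpg</filename>
  <object>
    <name>5</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>120</ymax></bndbox>
  </object>
  <object>
    <name>50</name>
    <bndbox><xmin>200</xmin><ymin>40</ymin><xmax>330</xmax><ymax>170</ymax></bndbox>
  </object>
</annotation>"""


def read_voc_boxes(xml_text):
    """Return a list of (label, xmin, ymin, xmax, ymax) tuples (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        # coordinates may be written as floats, so go through float() first
        coords = tuple(
            int(float(box.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((obj.findtext("name"), *coords))
    return boxes


print(read_voc_boxes(SAMPLE_XML))
```

For a real file, read its contents first and pass the text in, for example `read_voc_boxes(open(path).read())`.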

# 6. Split the dataset and copy it into train and test folders

We will split the dataset into training and testing sets with a ratio of 85% : 15%.

In [46]:
TEST_SIZE = 0.15
TRAIN_SIZE = round(1 - TEST_SIZE, 2)

# change this random seed if the split of images is not good
# i.e. the test set should contain at least one image of multiple coins together
RANDOM_SEED = 41

In [54]:
# list_images comes from the imutils package (pip install imutils)
from imutils import paths
from sklearn.model_selection import train_test_split

# get the image paths and sort them
image_paths = sorted(paths.list_images(os.path.join(IMAGE_DIR, "images")))
print(f"Total images = {len(image_paths)}")

# directory to annotation folder, don't need to change this
ANNOTATION_DIR = os.path.join(IMAGE_DIR, "Annotations")
# get the label filenames and sort them to align with image paths
label_paths = sorted(os.listdir(ANNOTATION_DIR))
# prepend the directory to each filename to get full paths
label_paths = [os.path.join(ANNOTATION_DIR, i) for i in label_paths]

# split the dataset into train:test with a ratio of 85%:15%
print("Splitting into train:test dataset"
    f" with ratio of {TRAIN_SIZE:.2f}:{TEST_SIZE:.2f}")
X_train, X_test, y_train, y_test = train_test_split(
    image_paths, label_paths, test_size=TEST_SIZE, random_state=RANDOM_SEED
)
print(f"Total training images = {len(X_train)}")
print(f"Total testing images = {len(X_test)}")

Total images = 29
Splitting into train:test dataset with ratio of 0.85:0.15
Total training images = 24
Total testing images = 5
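
The split above relies on the sorted image and annotation lists lining up one-to-one. A quick check along these lines can catch a missing or extra file before splitting. `find_unpaired` is a hypothetical helper, and it assumes each annotation file shares its image's filename stem.

```python
import os


def find_unpaired(image_paths, label_paths):
    """Return (stems without a label, stems without an image) (hypothetical helper)."""
    def stem(path):
        # filename without its directory and extension
        return os.path.splitext(os.path.basename(path))[0]

    image_stems = {stem(p) for p in image_paths}
    label_stems = {stem(p) for p in label_paths}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)


# made-up paths: image "b" has no matching annotation
missing_labels, missing_images = find_unpaired(
    ["images/a.jpg", "images/b.jpg"], ["Annotations/a.xml"]
)
print(missing_labels, missing_images)
```

If either list is non-empty, the sorted lists are not aligned and the split would pair some images with the wrong annotation.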


In [56]:
import shutil

SPLIT_IMAGE_DIR = os.path.join('Tensorflow', 'workspace', 'images')

def copy_images(image_paths, label_paths, data_type):
    assert data_type in ("train", "valid", "test")
    image_dest = os.path.join(SPLIT_IMAGE_DIR, data_type)

    print(f"[INFO] Copying files from {IMAGE_DIR} to {image_dest}")
    if os.path.exists(image_dest):
        # remove the existing folder and everything in it
        shutil.rmtree(image_dest, ignore_errors=False)

    # create a fresh destination directory
    os.makedirs(image_dest)

    for image_path, label_path in zip(image_paths, label_paths):
        # copy each image and its annotation file to the new directory
        shutil.copy2(image_path, image_dest)
        shutil.copy2(label_path, image_dest)

copy_images(X_train, y_train, "train")
copy_images(X_test, y_test, "test")
print("[INFO] Files copied successfully.")

[INFO] Copying files from Tensorflow\workspace\images\collectedimages to Tensorflow\workspace\images\train
[INFO] Copying files from Tensorflow\workspace\images\collectedimages to Tensorflow\workspace\images\test
[INFO] Files copied successfully.
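
`copy_images` also accepts `"valid"`, although only the train and test sets are copied here. If a validation set is wanted as well, one way to sketch a three-way split is shown below. It uses only the standard library rather than `train_test_split`, and the `three_way_split` helper, its default fractions, and the made-up filenames are all assumptions.

```python
import random


def three_way_split(pairs, valid_frac=0.15, test_frac=0.05, seed=41):
    """Shuffle (image, label) pairs and split them three ways (hypothetical helper)."""
    pairs = list(pairs)
    # a seeded Random instance keeps the split reproducible
    random.Random(seed).shuffle(pairs)
    # keep at least one item in the validation and test sets
    n_test = max(1, round(len(pairs) * test_frac))
    n_valid = max(1, round(len(pairs) * valid_frac))
    test = pairs[:n_test]
    valid = pairs[n_test:n_test + n_valid]
    train = pairs[n_test + n_valid:]
    return train, valid, test


# made-up filenames standing in for the real image and annotation paths
pairs = [(f"img{i}.jpg", f"img{i}.xml") for i in range(29)]
train, valid, test = three_way_split(pairs)
print(len(train), len(valid), len(test))
```

Each split can then be passed on in the same way as before, for example `copy_images([p[0] for p in valid], [p[1] for p in valid], "valid")`.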


# OPTIONAL - 7. Compress them for Colab Training

In [None]:
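ARCHIVE_PATH = os.path.join('Tensorflow', 'workspace', 'images', 'archive.tar.gz')
# folders created by copy_images() in the previous section
TRAIN_PATH = os.path.join(SPLIT_IMAGE_DIR, 'train')
TEST_PATH = os.path.join(SPLIT_IMAGE_DIR, 'test')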

In [None]:
!tar -czf {ARCHIVE_PATH} {TRAIN_PATH} {TEST_PATH}
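
The `!tar` command depends on a `tar` executable being available in the shell. As an alternative sketch, Python's built-in `tarfile` module can build the same kind of archive. The `make_archive` helper name is hypothetical, and the demo uses a temporary folder rather than the real dataset.

```python
import os
import tarfile
import tempfile


def make_archive(archive_path, folders):
    """Write the given folders into a gzip-compressed tar file (hypothetical helper)."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for folder in folders:
            # arcname stores only the folder name, not its full path
            tar.add(folder, arcname=os.path.basename(folder))


# demo on a throwaway folder containing a single dummy annotation file
with tempfile.TemporaryDirectory() as tmp:
    train_dir = os.path.join(tmp, "train")
    os.makedirs(train_dir)
    with open(os.path.join(train_dir, "sample.xml"), "w") as f:
        f.write("<annotation/>")

    archive = os.path.join(tmp, "archive.tar.gz")
    make_archive(archive, [train_dir])

    with tarfile.open(archive) as tar:
        names = sorted(tar.getnames())
    print(names)
```

Unlike the `tar` command above, which stores each folder under the path given on the command line, this sketch stores only the folder names (`train`, `test`), so adjust `arcname` if the training script expects the full `Tensorflow/workspace/images/...` layout.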