# Data Augmentation on the PCB Defect Dataset

This notebook contains all the code needed to apply TLT augmentation to subsets of the PCB defect dataset, showing how augmentation can improve KPIs for small datasets.

This notebook must be run in the TLT Stream Analytics container, which can be found here: https://ngc.nvidia.com/catalog/containers/nvidia:tlt-streamanalytics

The GitHub README has steps on how to pull and launch the container.

This notebook also requires preprocess_pcb.py to be in the same directory.

This notebook takes the following steps:
1) Download and unpack the PCB defect dataset

2) Convert the dataset to KITTI format

3) Split the dataset into test and train subsets

4) Generate offline augmentation spec files and apply augmentation to the training sets

5) Generate TFRecords for the test and training sets

6) Download the pretrained object detection weights needed for the trainings

7) Launch trainings and evaluation

The last section of this notebook contains all the commands needed to run training and evaluation on all 6 datasets.  
Steps 1-6 only need to run once. The trainings in step 7 can be run in any order once steps 1-6 have completed successfully.  
A common test set of 500 images is used for validation in all trainings. The 6 training datasets are:

100 subset x1  
100 subset x10  
100 subset x20  
500 subset x1  
500 subset x10  
500 subset x20  
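
As a rough guide to the sizes involved, the sketch below computes how many training images each dataset should contain. It assumes that every augmentation pass emits exactly one augmented image per source image, so an xN dataset holds the originals plus N - 1 augmented copies. The helper name `expected_images` is made up for this illustration.

```python
def expected_images(base_images, multiplier):
    """Images in an xN dataset: the originals plus (N - 1) augmented copies.

    Assumption: each augmentation pass yields one output image per input image.
    """
    return base_images * multiplier


for base in (100, 500):
    for mult in (1, 10, 20):
        print(f"{base} subset x{mult}: {expected_images(base, mult)} training images")
```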


## Download and unpack the PCB defect dataset

In [None]:
!python3 -m pip install matplotlib==3.3.3

In [None]:
%cd /datasets/pcb_defect

In [None]:
!wget https://www.dropbox.com/s/h0f39nyotddibsb/VOC_PCB.zip 
!unzip VOC_PCB.zip

## Convert the dataset to KITTI format

In [None]:
import os
from preprocess_pcb import convert_annotation, create_subset

In [None]:
os.makedirs("original/images", exist_ok=True)
os.makedirs("original/labels", exist_ok=True)
!cp -r VOC_PCB/JPEGImages/. original/images

In [None]:
# Set up paths for the VOC XML labels and the KITTI label output folder
xml_label_path = "VOC_PCB/Annotations"
kitti_label_output = "original/labels"

# Convert each XML label to KITTI format and write it to the output folder
for x in os.listdir(xml_label_path):
    current_label_path = os.path.join(xml_label_path, x)
    convert_annotation(current_label_path, kitti_label_output)
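
The body of `convert_annotation` lives in preprocess_pcb.py and is not shown here. As a hedged illustration of what a Pascal VOC to KITTI conversion generally involves, the sketch below parses one VOC XML string and emits 15-field KITTI label lines, filling the fields that VOC does not provide with zeros. The function name `voc_to_kitti_lines` and the sample annotation are made up for this example and may differ from what the real helper does.

```python
import xml.etree.ElementTree as ET


def voc_to_kitti_lines(xml_text):
    """Return one KITTI label line per <object> in a Pascal VOC annotation."""
    root = ET.fromstring(xml_text)
    lines = []
    for obj in root.iter("object"):
        # KITTI fields are space separated, so class names must not contain spaces
        name = obj.findtext("name").strip().replace(" ", "_")
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # KITTI order: type, truncated, occluded, alpha, bbox (4 values),
        # dimensions (3), location (3), rotation_y. Unused fields are zeroed.
        lines.append(
            f"{name} 0.00 0 0.00 {xmin:.2f} {ymin:.2f} {xmax:.2f} {ymax:.2f} "
            "0.00 0.00 0.00 0.00 0.00 0.00 0.00"
        )
    return lines


# Illustrative annotation, not taken from the dataset
sample = """<annotation>
  <object>
    <name>missing_hole</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>50</xmax><ymax>60</ymax></bndbox>
  </object>
</annotation>"""

for line in voc_to_kitti_lines(sample):
    print(line)
```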

## Split the dataset into test and train subsets

In [None]:
test_500 = "/tlt_exp/pcb_data_aug/test_500_list.txt"
train_100 = "/tlt_exp/pcb_data_aug/train_100_list.txt"
train_500 = "/tlt_exp/pcb_data_aug/train_500_list.txt"


os.makedirs("500_subset_test_x1", exist_ok=True)
os.makedirs("100_subset_train_x1", exist_ok=True)
os.makedirs("500_subset_train_x1", exist_ok=True)

create_subset("original", test_500, "500_subset_test_x1")
create_subset("original", train_100, "100_subset_train_x1")
create_subset("original", train_500, "500_subset_train_x1")
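
`create_subset` is also defined in preprocess_pcb.py. A minimal sketch of the idea, under stated assumptions, is below: it assumes the list file holds one image name per line (with or without an extension), that images are `.jpg` files under `images/`, and that labels are `.txt` files under `labels/`. The name `copy_subset` is hypothetical, and the real helper may behave differently.

```python
import os
import shutil


def copy_subset(source_dir, list_file, output_dir):
    """Copy the listed image/label pairs from source_dir into output_dir.

    Assumption: list_file has one image name per line; blank lines are skipped.
    Returns the number of pairs copied.
    """
    with open(list_file) as f:
        names = [os.path.splitext(line.strip())[0] for line in f if line.strip()]

    for sub in ("images", "labels"):
        os.makedirs(os.path.join(output_dir, sub), exist_ok=True)

    for name in names:
        shutil.copy(os.path.join(source_dir, "images", name + ".jpg"),
                    os.path.join(output_dir, "images"))
        shutil.copy(os.path.join(source_dir, "labels", name + ".txt"),
                    os.path.join(output_dir, "labels"))
    return len(names)
```

Under these assumptions, a call such as `copy_subset("original", train_100, "100_subset_train_x1")` would mirror the cell above.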

## Generate offline augmentation spec files and apply augmentation to the training sets

In [None]:
from preprocess_pcb import gen_random_aug_spec, combine_kitti, visualize_images
from random import randint

In [None]:
def generate_augments(dataset_folder, output_folder, num_augments):
    """Run num_augments augmentation passes and build cumulative combined_xN datasets."""
    for i in range(num_augments):
        # Write a random augmentation spec to spec_out and print it
        spec_out = os.path.join(output_folder, "aug_spec" + str(i) + ".txt")
        gen_random_aug_spec(600,600,"jpg", spec_out)
        !cat $spec_out

        # Apply the spec to the original dataset
        aug_folder = os.path.join(output_folder, "aug" + str(i))
        !augment -a $spec_out -o $aug_folder -d $dataset_folder

        # Merge the new augmented copy into the running combined dataset
        if i == 0:
            d1 = dataset_folder
            d2 = aug_folder
            d3 = os.path.join(output_folder, "combined_x2")
            combine_kitti(d1,d2,d3)
        else:
            d1 = os.path.join(output_folder, "combined_x" + str(i+1))
            d2 = aug_folder
            d3 = os.path.join(output_folder, "combined_x" + str(i+2))
            combine_kitti(d1,d2,d3)
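
The folder naming in `generate_augments` is easy to misread, so the sketch below reproduces only the bookkeeping, without running any augmentation. It shows why 19 passes are needed to reach `combined_x20`: pass `i` merges the previous combined dataset with `aug{i}` into `combined_x{i + 2}`. The helper `combine_plan` is made up for this illustration and joins paths with `/` for brevity.

```python
def combine_plan(dataset_folder, output_folder, num_augments):
    """List the (input, augmented, combined) folders used by each pass."""
    plan = []
    previous = dataset_folder  # pass 0 starts from the original dataset
    for i in range(num_augments):
        aug = f"{output_folder}/aug{i}"
        combined = f"{output_folder}/combined_x{i + 2}"
        plan.append((previous, aug, combined))
        previous = combined  # the next pass builds on this result
    return plan


plan = combine_plan("100_subset_train_x1", "100_subset_train_aug", 19)
print(plan[8])   # the pass that produces combined_x10
print(plan[-1])  # the final pass, which produces combined_x20
```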

In [None]:
dataset_folder = "100_subset_train_x1" #folder for the existing dataset to be augmented. This folder will not be modified
output_folder = "100_subset_train_aug" #folder for the augmented output. Does not need to exist
num_augments = 19 #number of augmented datasets to generate
os.makedirs(output_folder, exist_ok=True)

generate_augments(dataset_folder, output_folder, num_augments)

In [None]:
aug_choice = str(randint(0,num_augments-1))
visualize_images(os.path.join(output_folder, "aug"+aug_choice+"/images"), num_images=8)

In [None]:
dataset_folder = "500_subset_train_x1" #folder for the existing dataset to be augmented. This folder will not be modified
output_folder = "500_subset_train_aug" #folder for the augmented output. Does not need to exist
num_augments = 19 #number of augmented datasets to generate
os.makedirs(output_folder, exist_ok=True)

generate_augments(dataset_folder, output_folder, num_augments)

In [None]:
aug_choice = str(randint(0,num_augments-1))
visualize_images(os.path.join(output_folder, "aug"+aug_choice+"/images"), num_images=8)

Move the x10 and x20 combined datasets into the datasets folder.


In [None]:
!mv 100_subset_train_aug/combined_x10 /datasets/100_subset_train_x10
!mv 100_subset_train_aug/combined_x20 /datasets/100_subset_train_x20

!mv 500_subset_train_aug/combined_x10 /datasets/500_subset_train_x10
!mv 500_subset_train_aug/combined_x20 /datasets/500_subset_train_x20
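
Before generating TFRecords, it can be worth confirming that each moved dataset has a label file for every image. The sketch below is one way to do that; it assumes the KITTI layout used in this notebook (`images/` and `labels/` subfolders with matching base names), and the helper name `check_kitti_folder` is hypothetical.

```python
import os


def check_kitti_folder(folder):
    """Return (image count, label count, image names that have no label)."""
    images = {os.path.splitext(f)[0] for f in os.listdir(os.path.join(folder, "images"))}
    labels = {os.path.splitext(f)[0] for f in os.listdir(os.path.join(folder, "labels"))}
    return len(images), len(labels), sorted(images - labels)


# Only runs if the folder from the cell above exists on this machine
target = "/datasets/100_subset_train_x20"
if os.path.isdir(target):
    print(check_kitti_folder(target))
```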

## Generate TFRecords for the test and training sets

In [None]:
def gen_tf_spec(dataset_path):

    spec_str = f"""
    kitti_config {{
      root_directory_path: "/datasets/pcb_data_aug/{dataset_path}"
      image_dir_name: "images"
      label_dir_name: "labels"
      image_extension: ".jpg"
      partition_mode: "random"
      num_partitions: 2
      val_split: 20
      num_shards: 10
    }}
    """
    return spec_str
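
The doubled braces in `gen_tf_spec` are f-string escapes: `{{` and `}}` render as literal braces, while `{dataset_path}` is substituted. The reduced template below, which keeps a single field and uses a made-up function name, shows the text that ends up in the spec file.

```python
def render_root_only(dataset_path):
    # Reduced, illustrative template; the spec in this notebook carries more fields
    return f"""kitti_config {{
  root_directory_path: "/datasets/pcb_data_aug/{dataset_path}"
}}"""


print(render_root_only("100_subset_train_x1"))
```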

In [None]:
dataset_paths = [
    "500_subset_test_x1",
    "500_subset_train_x1",
    "500_subset_train_x10",
    "500_subset_train_x20",
    "100_subset_train_x1",
    "100_subset_train_x10",
    "100_subset_train_x20",
]
for path in dataset_paths:
    record_path = os.path.join("/datasets/pcb_data_aug", path, "tfrecord_spec.txt")
    record_output = os.path.join("/datasets/pcb_data_aug", path, "tfrecords_rcnn/")
    print("************" + record_path)
    with open(record_path, "w+") as spec:
        spec.write(gen_tf_spec(path))
    !detectnet_v2 dataset_convert -d $record_path -o $record_output

## Download the pretrained object detection weights needed for the trainings

In [None]:
os.makedirs("/tlt_exp/models/fasterRCNN", exist_ok=True)
%cd /tlt_exp/models/fasterRCNN
!ngc registry model download-version "nvidia/tlt_pretrained_object_detection:resnet18"

## Launch trainings and evaluation

Each cell in this section trains and evaluates on one dataset in the experiment. The six datasets are run under two experiment folders, offline_online_aug and offline_aug, and the results are written to the respective experiment folder.

The trainings may take several hours depending on your hardware. 

In [None]:
%cd /tlt_exp/pcb_data_aug/experiments

In [None]:
!faster_rcnn train -e offline_online_aug/100_subset_train_x1/training_spec.txt -k tlt_encode
!faster_rcnn evaluate -e offline_online_aug/100_subset_train_x1/training_spec.txt -k tlt_encode --log_file offline_online_aug/100_subset_train_x1/eval_log.txt

In [None]:
!faster_rcnn train -e offline_online_aug/100_subset_train_x10/training_spec.txt -k tlt_encode
!faster_rcnn evaluate -e offline_online_aug/100_subset_train_x10/training_spec.txt -k tlt_encode --log_file offline_online_aug/100_subset_train_x10/eval_log.txt

In [None]:
!faster_rcnn train -e offline_online_aug/100_subset_train_x20/training_spec.txt -k tlt_encode
!faster_rcnn evaluate -e offline_online_aug/100_subset_train_x20/training_spec.txt -k tlt_encode --log_file offline_online_aug/100_subset_train_x20/eval_log.txt

In [None]:
!faster_rcnn train -e offline_online_aug/500_subset_train_x1/training_spec.txt -k tlt_encode
!faster_rcnn evaluate -e offline_online_aug/500_subset_train_x1/training_spec.txt -k tlt_encode --log_file offline_online_aug/500_subset_train_x1/eval_log.txt

In [None]:
!faster_rcnn train -e offline_online_aug/500_subset_train_x10/training_spec.txt -k tlt_encode
!faster_rcnn evaluate -e offline_online_aug/500_subset_train_x10/training_spec.txt -k tlt_encode --log_file offline_online_aug/500_subset_train_x10/eval_log.txt

In [None]:
!faster_rcnn train -e offline_online_aug/500_subset_train_x20/training_spec.txt -k tlt_encode
!faster_rcnn evaluate -e offline_online_aug/500_subset_train_x20/training_spec.txt -k tlt_encode --log_file offline_online_aug/500_subset_train_x20/eval_log.txt

In [None]:
!faster_rcnn train -e offline_aug/100_subset_train_x1/training_spec.txt -k tlt_encode
!faster_rcnn evaluate -e offline_aug/100_subset_train_x1/training_spec.txt -k tlt_encode --log_file offline_aug/100_subset_train_x1/eval_log.txt

In [None]:
!faster_rcnn train -e offline_aug/100_subset_train_x10/training_spec.txt -k tlt_encode
!faster_rcnn evaluate -e offline_aug/100_subset_train_x10/training_spec.txt -k tlt_encode --log_file offline_aug/100_subset_train_x10/eval_log.txt

In [None]:
!faster_rcnn train -e offline_aug/100_subset_train_x20/training_spec.txt -k tlt_encode
!faster_rcnn evaluate -e offline_aug/100_subset_train_x20/training_spec.txt -k tlt_encode --log_file offline_aug/100_subset_train_x20/eval_log.txt

In [None]:
!faster_rcnn train -e offline_aug/500_subset_train_x1/training_spec.txt -k tlt_encode
!faster_rcnn evaluate -e offline_aug/500_subset_train_x1/training_spec.txt -k tlt_encode --log_file offline_aug/500_subset_train_x1/eval_log.txt

In [None]:
!faster_rcnn train -e offline_aug/500_subset_train_x10/training_spec.txt -k tlt_encode
!faster_rcnn evaluate -e offline_aug/500_subset_train_x10/training_spec.txt -k tlt_encode --log_file offline_aug/500_subset_train_x10/eval_log.txt

In [None]:
!faster_rcnn train -e offline_aug/500_subset_train_x20/training_spec.txt -k tlt_encode
!faster_rcnn evaluate -e offline_aug/500_subset_train_x20/training_spec.txt -k tlt_encode --log_file offline_aug/500_subset_train_x20/eval_log.txt
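
The twelve cells above all follow one pattern, so the command strings can also be generated instead of typed out. The sketch below only builds and prints the strings; it does not launch any training. The names `EXPERIMENTS`, `DATASETS`, and `build_commands` are introduced for this illustration, and the flags are copied from the cells above.

```python
EXPERIMENTS = ["offline_online_aug", "offline_aug"]
DATASETS = [
    f"{size}_subset_train_x{mult}" for size in (100, 500) for mult in (1, 10, 20)
]


def build_commands(experiment, dataset, key="tlt_encode"):
    """Return the (train, evaluate) command strings for one experiment/dataset pair."""
    spec = f"{experiment}/{dataset}/training_spec.txt"
    log = f"{experiment}/{dataset}/eval_log.txt"
    train = f"faster_rcnn train -e {spec} -k {key}"
    evaluate = f"faster_rcnn evaluate -e {spec} -k {key} --log_file {log}"
    return train, evaluate


for experiment in EXPERIMENTS:
    for dataset in DATASETS:
        for command in build_commands(experiment, dataset):
            print(command)
```

Printing the commands first makes it easy to review them before pasting any into a cell with a leading `!`.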