# Image Preprocessing

This notebook prepares the images that will be loaded into the model.

Steps for the image preprocessing:
1. Setup Project
2. TFOD Utils & Google Images Downloader
3. Collect Images
4. Image Labeling
5. Synthetic Images
6. Split into train and test

## 1. Setup Project

<div class="alert alert-block alert-warning">
<b>Attention:</b> Set label names
</div>


In [None]:
import os
import shutil

# Labels of the images
labels = ['Pepsi', 'Cola']
number_classes = len(labels)

# Define paths
images_root_path = os.path.join('Tensorflow', 'workspace', 'images')
images_collected_path = os.path.join(images_root_path, 'collectedimages')
train_path = os.path.join(images_root_path, 'train')
test_path = os.path.join(images_root_path, 'test')

# Create the collectedimages folder
os.makedirs(images_collected_path, exist_ok=True)

# Create one folder per class
for label in labels:
    os.makedirs(os.path.join(images_collected_path, label), exist_ok=True)

# Create the folders for the train and test sets
for path in (train_path, test_path):
    os.makedirs(path, exist_ok=True)

## 2. TFOD Utils & Google Images Downloader
Clone the repositories that provide the helper scripts used in the later steps.


In [None]:
# Path for the scripts folder
scripts_path = os.path.join('Tensorflow', 'scripts')
split_script = os.path.join(scripts_path, 'train_test_split.py')

# Clone the TFOD-utils repo if the scripts folder is missing or empty
os.makedirs(scripts_path, exist_ok=True)
if not any(os.scandir(scripts_path)):
    !git clone https://github.com/JPCLima/TFOD-utils {scripts_path}

# Path for the Google-Images-Downloader repo
google_images_path = os.path.join('Tensorflow', 'workspace', 'images', 'googleImages')
gi_downloader_script = os.path.join(google_images_path, 'gi_downloader.py')

# Clone the Google-Images-Downloader repo if its folder is missing or empty
os.makedirs(google_images_path, exist_ok=True)
if not any(os.scandir(google_images_path)):
    !git clone https://github.com/JPCLima/Google-Images-Downloader {google_images_path}
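
The "clone only when the target folder is missing or empty" check is used for both repositories, so it could be factored into a small helper. The sketch below is not part of either repository, and `needs_clone` is a hypothetical name chosen for illustration.

```python
import os

def needs_clone(path):
    """Return True when `path` is missing or is an empty directory."""
    if not os.path.isdir(path):
        return True
    # os.scandir yields one entry per file or folder; none means empty
    with os.scandir(path) as entries:
        return not any(entries)
```

In a notebook cell this could guard each clone, for example `if needs_clone(scripts_path):` followed by the `!git clone` line.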


## 3. Collect Images

Images are collected in two ways:
* Take photos with a phone or capture them from a webcam
* Download images from Google - [Repository](https://github.com/JPCLima/Google-Images-Downloader)


In [None]:
!pip install beautifulsoup4 requests

In [None]:
# Search keywords, one per class
search_words = ['cola-can', 'pepsi-can']

# Google images path
gi_downloads_path = os.path.join('Tensorflow', 'workspace', 'images', 'googleImages', 'downloads')

# Create the downloads folder for the Google images
os.makedirs(gi_downloads_path, exist_ok=True)

# Create one folder per keyword and download 100 images into it
for word in search_words:
    path = os.path.join(gi_downloads_path, word)
    os.makedirs(path, exist_ok=True)
    !cd {google_images_path} && python gi_downloader.py -f {os.path.abspath(path)} -k {word} -n 100

In [None]:
import cv2
import time
import uuid

# Capture images from the default camera (device 0)
number_imgs = 10
for label in labels:
    cap = cv2.VideoCapture(0)
    print('Collecting images for {}'.format(label))
    time.sleep(5)
    for imgnum in range(number_imgs):
        print('Collecting image {}'.format(imgnum))
        ret, frame = cap.read()
        # Stop if the camera did not return a frame
        if not ret:
            print('Could not read a frame from the camera')
            break
        imgname = os.path.join(images_collected_path, label, '{}.{}.jpg'.format(label, uuid.uuid1()))
        cv2.imwrite(imgname, frame)
        cv2.imshow('frame', frame)
        time.sleep(5)

        # Press 'q' to skip to the next label
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    # Release the camera before moving on to the next label
    cap.release()
cv2.destroyAllWindows()
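
After collecting, it can help to confirm that every class folder actually received images. The following is a minimal sketch that assumes the one-folder-per-label layout created in the setup step; `count_images_per_label` and `IMAGE_EXTENSIONS` are hypothetical names, not part of the cloned repositories.

```python
import os

# File extensions treated as images (an assumption; extend as needed)
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')

def count_images_per_label(root, labels):
    """Map each label to the number of image files in root/<label>."""
    counts = {}
    for label in labels:
        folder = os.path.join(root, label)
        if not os.path.isdir(folder):
            # A missing folder counts as zero images
            counts[label] = 0
            continue
        counts[label] = sum(
            1 for name in os.listdir(folder)
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts
```

Calling `count_images_per_label(images_collected_path, labels)` returns a dictionary keyed by label, which makes an empty or under-filled class easy to spot.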

## 4. Image Labeling

In [None]:
# Install dependencies
!pip install --upgrade pyqt5 lxml

In [None]:
import os
# Store the path of the labelImg program
labelimg_path = os.path.join('Tensorflow', 'labelimg')

# Download labelImg if the folder does not exist
if not os.path.exists(labelimg_path):
    !mkdir {labelimg_path}
    !git clone https://github.com/tzutalin/labelImg {labelimg_path}
        
# Compile the Qt resources into libs/resources.py
!cd {labelimg_path} && pyrcc5 -o libs/resources.py resources.qrc

<div class="alert alert-block alert-warning">
<b>Attention:</b> Open LabelImg 
</div>

In [None]:
labelimg_path = os.path.join('Tensorflow', 'labelimg')
# Open labelImg
!cd {labelimg_path} && python labelImg.py
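
Before moving on, it is worth checking that no image was skipped during labeling. The sketch below assumes the annotations were saved in Pascal VOC format, as an `.xml` file next to each image with the same base name; `find_unlabeled_images` is a hypothetical helper, not part of labelImg.

```python
import os

def find_unlabeled_images(folder, image_extensions=('.jpg', '.jpeg', '.png')):
    """Return sorted image file names in `folder` that have no same-named .xml file."""
    names = os.listdir(folder)
    # Base names (without extension) of all annotation files
    annotated = {
        os.path.splitext(name)[0]
        for name in names
        if name.lower().endswith('.xml')
    }
    return sorted(
        name for name in names
        if name.lower().endswith(image_extensions)
        and os.path.splitext(name)[0] not in annotated
    )
```

Running it on each class folder, for example `find_unlabeled_images(os.path.join(images_collected_path, 'Pepsi'))`, lists the images that still need a bounding box.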

## 5. Synthetic Images

<div class="alert alert-block alert-warning">
<b>Attention:</b> The bg and masks folders must already exist and contain all the images needed to generate new ones. The script does not create the bg and masks folders.
</div>

In [None]:
!cd {scripts_path} && python generate_synthetic_images.py -p {os.path.abspath(images_root_path)} -n 100

## 6. Split into train and test

<div class="alert alert-block alert-info">
<b>Tip:</b> Split the data 9:1 (train:test)</div>

In [None]:
# Split the photos taken with the phone
!cd {scripts_path} && python train_test_split.py -c {os.path.abspath(images_collected_path)} -t {os.path.abspath(train_path)} -e {os.path.abspath(test_path)} 

In [None]:
# Split images downloaded from google
!cd {scripts_path} && python train_test_split.py -c {os.path.abspath(gi_downloads_path)} -t {os.path.abspath(train_path)} -e {os.path.abspath(test_path)}

In [None]:
# Random images path
random_images_path = os.path.join('Tensorflow','workspace', 'images', 'random_images')
# Split random images
!cd {scripts_path} && python train_test_split.py -c {os.path.abspath(random_images_path)} -t {os.path.abspath(train_path)} -e {os.path.abspath(test_path)}
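
To make the 9:1 ratio concrete, here is a minimal sketch of a shuffled train/test split over image names. It is not the implementation of `train_test_split.py`, whose internals are not shown in this notebook; `split_train_test`, `test_fraction` and `seed` are hypothetical names used only for illustration.

```python
import random

def split_train_test(items, test_fraction=0.1, seed=42):
    """Shuffle `items` and return (train, test) lists.

    With test_fraction=0.1 the result is a 9:1 train:test split.
    """
    shuffled = list(items)
    # A fixed seed keeps the split reproducible between runs
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

When the items are image base names, each image and its annotation file would then be copied together into the train or test folder, so that no image is separated from its labels.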