# Image Preprocessing

This notebook prepares the images that will be used to train and test the model.

Steps for the image preprocessing: 
1. Collect images 
2. Setup Folders
3. TFOD Utils & Google Images Downloader
4. Collect Images
5. Image Labeling  

    5.1. Download labelImg by tzutalin  
    5.2. Label all the images using labelImg
    
    
6. Split into train and test       
7. Compress images

In [None]:
import os
import shutil
import math

## 1. Collect Images

In this step we define the classes of images to collect for training and testing the model. The images are stored in <i>Tensorflow->workspace->images->collectedimages</i>. The project will use 5 classes of drinks, chosen from the following candidates:
* Redbull
* Coca-cola
* Coca-cola Zero
* Coca-cola light
* Pepsi
* Pepsi Max
* 7-up
* Ice-tea
* TODO: Choose 5 from these

In [None]:
# Labels of the images
labels = ['Cola', 'Pepsi']
number_classes = len(labels)

## 2. Setup Folders

Create a folder to store the collected images, with one sub-folder per class, plus one folder each for the train and test sets.


In [None]:
images_collected_path  = os.path.join('Tensorflow', 'workspace', 'images', 'collectedimages')
images_root_path  = os.path.join('Tensorflow', 'workspace', 'images')

# Archive Path
archive_path = os.path.join('Tensorflow', 'workspace', 'images', 'archive.tar.gz')

# Training and test Paths
train_path = os.path.join('Tensorflow', 'workspace', 'images', 'train')
test_path = os.path.join('Tensorflow', 'workspace', 'images', 'test')

# Create folder collectedimages
os.makedirs(images_collected_path, exist_ok=True)

# Create folders for the different classes
for label in labels:
    os.makedirs(os.path.join(images_collected_path, label), exist_ok=True)

# Create folders for the train and test sets
for split_set in ['test', 'train']:
    os.makedirs(os.path.join(images_root_path, split_set), exist_ok=True)
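
The folder setup above can also be wrapped in a reusable function. The following is a minimal sketch using `pathlib`, not part of the original notebook; the helper name `create_image_folders` is hypothetical, and it assumes the same layout as above (`collectedimages/<label>`, `train`, `test`) under a single root folder.

```python
from pathlib import Path


def create_image_folders(images_root, labels):
    """Create collectedimages/<label>, train and test under images_root.

    Hypothetical helper that mirrors the layout used in this notebook.
    Returns the list of folders that exist after the call.
    """
    images_root = Path(images_root)
    folders = [images_root / "collectedimages" / label for label in labels]
    folders += [images_root / "train", images_root / "test"]
    for folder in folders:
        # parents=True builds any missing parent folders;
        # exist_ok=True makes the call safe to repeat.
        folder.mkdir(parents=True, exist_ok=True)
    return folders
```

In this notebook it could be called as `create_image_folders(images_root_path, labels)`.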

## 3. TFOD Utils & Google Images Downloader
Clone the repositories that provide the helper scripts used in this notebook.


In [None]:
# Path for the scripts folder
scripts_path = os.path.join('Tensorflow', 'scripts')
split_script = os.path.join(scripts_path, 'train_test_split.py')

# Clone the TFOD-utils repo if the scripts folder is missing or empty
os.makedirs(scripts_path, exist_ok=True)
if not any(os.scandir(scripts_path)):
    !git clone https://github.com/JPCLima/TFOD-utils {scripts_path}

# Path for the Google-Images-Downloader repo and its script
google_images_path = os.path.join('Tensorflow', 'workspace', 'images', 'googleImages')
gi_downloader_script = os.path.join(google_images_path, 'gi_downloader.py')

# Clone the Google-Images-Downloader repo if its folder is missing or empty
os.makedirs(google_images_path, exist_ok=True)
if not any(os.scandir(google_images_path)):
    !git clone https://github.com/JPCLima/Google-Images-Downloader {google_images_path}


## 4. Collect Images

Images are collected in two ways:
* Take photos with a phone
* Download images from Google - [Repository](https://github.com/JPCLima/Google-Images-Downloader)


In [None]:
!pip install beautifulsoup4 requests

In [None]:
# Collect Images for each of the classes
search_words = ['cola-can', 'pepsi-can']

# Google images path
gi_downloads_path = os.path.join('Tensorflow','workspace', 'images', 'googleImages', 'downloads')

# Create the downloads folder for the Google images
if not os.path.exists(gi_downloads_path):
    os.mkdir(gi_downloads_path)

# Create a folder for each search word and download the images into it
for word in search_words:
    path = os.path.join(gi_downloads_path, word)
    if not os.path.exists(path):
        os.makedirs(path)
    !cd {google_images_path} && python gi_downloader.py -f {os.path.abspath(path)} -k {word} -n 2
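
After the download it can be useful to check how many images each class folder actually received. The following is a minimal sketch, not part of the downloader repository; the helper name `count_images` and the set of extensions in `IMAGE_EXTENSIONS` are assumptions and may need adjusting to the files the downloader saves.

```python
from pathlib import Path

# Assumed image extensions; adjust to what the downloader actually saves.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images(root):
    """Return {sub-folder name: number of image files} for each folder in root."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
            )
    return counts
```

For example, `count_images(gi_downloads_path)` would return one entry per search word.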

## 5. Image Labeling
Label the images with labelImg.
1. Download the repository
2. Label the images

### 5.1. Download labelImg

In [None]:
# Install dependencies
!pip install --upgrade pyqt5 lxml

In [None]:
# Store the path of the labelImg program
labelimg_path = os.path.join('Tensorflow', 'labelimg')

# Download labelImg if the folder does not exist
if not os.path.exists(labelimg_path):
    !mkdir {labelimg_path}
    !git clone https://github.com/tzutalin/labelImg {labelimg_path}

# Compile the Qt resources (run from the labelImg folder)
!cd {labelimg_path} && pyrcc5 -o libs/resources.py resources.qrc

### 5.2. Label the Images

In [None]:
# Open labelImg
!cd {labelimg_path} && python labelImg.py
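
Once labeling is done, it may help to verify that no image was skipped. The sketch below assumes that the annotations were saved as `.xml` files next to each image with the same base name (for example `can1.jpg` and `can1.xml`); if they were saved elsewhere, the check needs adapting. The helper name `find_unlabeled_images` and the extension set are hypothetical.

```python
from pathlib import Path

# Assumed image extensions; adjust if other formats are used.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_unlabeled_images(folder):
    """Return the images under folder (searched recursively) that have
    no annotation file with the same base name and a .xml extension."""
    missing = []
    for image in sorted(Path(folder).rglob("*")):
        if image.is_file() and image.suffix.lower() in IMAGE_EXTENSIONS:
            if not image.with_suffix(".xml").exists():
                missing.append(image)
    return missing
```

Calling `find_unlabeled_images(images_collected_path)` returns an empty list when every image has an annotation.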

## 6. Split into train and test

Split the collected data into:
* **Train set** 80%
* **Test set** 20%

In [None]:
# Split the photos taken with the phone
!cd {scripts_path} && python train_test_split.py -c {os.path.abspath(images_collected_path)} -t {os.path.abspath(train_path)} -e {os.path.abspath(test_path)}

In [None]:
# Split the images downloaded from Google
!cd {scripts_path} && python train_test_split.py -c {os.path.abspath(gi_downloads_path)} -t {os.path.abspath(train_path)} -e {os.path.abspath(test_path)}
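
To illustrate what an 80/20 split involves, here is a minimal sketch in plain Python. It is not the code of `train_test_split.py`, whose implementation may differ. The function name `split_train_test`, the `test_ratio` and `seed` parameters, and the assumption that all images sit in one flat folder with their `.xml` annotations beside them are all hypothetical.

```python
import math
import random
import shutil
from pathlib import Path


def split_train_test(source_dir, train_dir, test_dir, test_ratio=0.2, seed=42):
    """Copy images (and same-named .xml annotations) into train/test folders.

    Illustrative only. Returns (number of train images, number of test images).
    """
    images = sorted(
        p for p in Path(source_dir).iterdir()
        if p.is_file() and p.suffix.lower() in {".jpg", ".jpeg", ".png"}
    )
    random.Random(seed).shuffle(images)  # reproducible shuffle
    n_test = math.ceil(len(images) * test_ratio)  # round the test share up

    for index, image in enumerate(images):
        # The first n_test shuffled images go to test, the rest to train
        target = Path(test_dir if index < n_test else train_dir)
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy(image, target / image.name)
        annotation = image.with_suffix(".xml")
        if annotation.exists():
            shutil.copy(annotation, target / annotation.name)
    return len(images) - n_test, n_test
```

The sketch copies files rather than moving them, so the collected images stay in place and the split can be repeated with a different ratio or seed.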

## 7. Compress images

Compress the train and test images so they can be loaded in Google Colab. The archive is saved in the images folder with the name archive.tar.gz.

In [None]:
!tar -czf {archive_path} {train_path} {test_path}
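
If the `tar` command is not available (for example on some Windows setups), a similar archive can be produced with Python's standard `tarfile` module. This is a minimal sketch, not part of the original notebook; the helper name `make_archive` is hypothetical.

```python
import tarfile
from pathlib import Path


def make_archive(archive_path, folders):
    """Write a gzip-compressed tar archive containing the given folders."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for folder in folders:
            # arcname stores only the folder name (e.g. "train") in the archive
            tar.add(folder, arcname=Path(folder).name)
    return archive_path
```

It could be called as `make_archive(archive_path, [train_path, test_path])`. Note one difference from the command above: the sketch stores the folders as `train/` and `test/` at the top level of the archive, while the `tar` command keeps the paths as they were passed to it.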